
Data analysis with Python -- NumPy basics: universal functions and array-based data processing

Editor: Python

Contents

  • 1 Universal functions
  • 2 Using arrays for data processing
    • 2.1 The np.meshgrid() function
    • 2.2 Expressing conditional logic as array operations
    • 2.3 Mathematical and statistical methods
    • 2.4 Methods for Boolean arrays

1 Universal functions

A universal function (ufunc) is a function that performs element-wise operations on the data in an ndarray.

  • For instance, sqrt and exp:
In [137]: arr = np.arange(10)
In [138]: arr
Out[138]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [139]: np.sqrt(arr)
Out[139]:
array([ 0. , 1. , 1.4142, 1.7321, 2. , 2.2361, 2.4495,
2.6458, 2.8284, 3. ])
In [140]: np.exp(arr)
Out[140]:
array([ 1. , 2.7183, 7.3891, 20.0855, 54.5982,
148.4132, 403.4288, 1096.6332, 2980.958 , 8103.0839])
  • Other functions, such as add or maximum, take two arrays (these are called binary ufuncs) and return a single array as the result:
In [141]: x = np.random.randn(8)
In [142]: y = np.random.randn(8)
In [143]: x
Out[143]:
array([-0.0119, 1.0048, 1.3272, -0.9193, -1.5491, 0.0222, 0.7584,
-0.6605])
In [144]: y
Out[144]:
array([ 0.8626, -0.01 , 0.05 , 0.6702, 0.853 , -0.9559, -0.0235,
-2.3042])
In [145]: np.maximum(x, y)
Out[145]:
array([ 0.8626, 1.0048, 1.3272, 0.6702, 0.853 , 0.0222, 0.7584,
-0.6605])
  • A few ufuncs return multiple arrays, although this is uncommon. For example, modf returns the fractional and integer parts of a floating-point array:
In [146]: arr = np.random.randn(7) * 5
In [147]: arr
Out[147]: array([-3.2623, -6.0915, -6.663 , 5.3731, 3.6182, 3.45 , 5.0077])
In [148]: remainder, whole_part = np.modf(arr)
In [149]: remainder
Out[149]: array([-0.2623, -0.0915, -0.663 , 0.3731,
0.6182, 0.45 , 0.0077])
In [150]: whole_part
Out[150]: array([-3., -6., -6., 5., 3., 3., 5.])
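
The random inputs above make the results hard to check by hand. The following is a minimal sketch with fixed inputs (the arrays a, b, and values are made up for illustration) that exercises the same binary ufuncs and modf:

```python
import numpy as np

# Illustrative fixed inputs (not from the session above).
a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 3.0, 10.0])

# Binary ufuncs take two arrays and return one array.
total = np.add(a, b)        # element-wise sum, same as a + b
largest = np.maximum(a, b)  # element-wise maximum of the two inputs

# modf returns two arrays: fractional parts and integer parts.
values = np.array([-2.5, 0.25, 3.75])
frac, whole = np.modf(values)

print(total)    # [ 3.  7. 19.]
print(largest)  # [ 2.  4. 10.]
print(frac)     # [-0.5   0.25  0.75]
print(whole)    # [-2.  0.  3.]
```

Note that both parts returned by modf carry the sign of the input, so -2.5 splits into -0.5 and -2.0.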

2 Using arrays for data processing

NumPy arrays let you express many data processing tasks as concise array expressions that would otherwise require writing loops.

Replacing explicit loops with array expressions is commonly called vectorization.
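
To make the idea concrete, here is a small sketch that computes x^2 + 1 for every element, first with an explicit Python loop and then with a single array expression. The array data and the formula are chosen only for illustration:

```python
import numpy as np

data = np.arange(1, 6, dtype=float)  # [1., 2., 3., 4., 5.]

# Loop version: every element is handled by the Python interpreter.
squares_loop = np.empty_like(data)
for i in range(len(data)):
    squares_loop[i] = data[i] ** 2 + 1

# Vectorized version: one expression applied to the whole array.
squares_vec = data ** 2 + 1

print(squares_vec)                               # [ 2.  5. 10. 17. 26.]
print(np.array_equal(squares_loop, squares_vec)) # True
```

Both versions produce the same values; the vectorized form is shorter and, on large arrays, typically much faster because the loop runs in compiled code.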

2.1 The np.meshgrid() function

np.meshgrid() returns coordinate matrices built from the given coordinate vectors.

For example:
Given the x coordinates [1, 2, 3] and the y coordinates [7, 8],
the function returns

[array([[1, 2, 3], [1, 2, 3]]), array([[7, 7, 7], [8, 8, 8]])]

Pairing the two matrices element by element gives six points: (1, 7) (2, 7) (3, 7) (1, 8) (2, 8) (3, 8)

# coding: utf-8
import numpy as np
# Coordinate vector for the x axis
a = np.array([1, 2, 3])
# Coordinate vector for the y axis
b = np.array([7, 8])
# Build the coordinate matrices from the coordinate vectors.
# The result is a sequence of two arrays (a list or a tuple, depending on the NumPy version):
# the first holds the x values, the second holds the y values.
res = np.meshgrid(a, b)
# Result: [array([[1, 2, 3], [1, 2, 3]]), array([[7, 7, 7], [8, 8, 8]])]
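
As a quick check of the description above, this sketch unpacks the two coordinate matrices and pairs them up to recover the six points. The names xx, yy, and points are chosen here for illustration:

```python
import numpy as np

a = np.array([1, 2, 3])  # x coordinates
b = np.array([7, 8])     # y coordinates

xx, yy = np.meshgrid(a, b)

# With the default indexing, both matrices have len(b) rows and len(a) columns.
print(xx.shape, yy.shape)  # (2, 3) (2, 3)

# Flatten both matrices and pair the entries to list the grid points.
points = list(zip(xx.ravel().tolist(), yy.ravel().tolist()))
print(points)
# [(1, 7), (2, 7), (3, 7), (1, 8), (2, 8), (3, 8)]
```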
  • Another example: suppose you want to evaluate the function sqrt(x^2 + y^2) over a regular grid of values.
In [155]: points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
In [156]: xs, ys = np.meshgrid(points, points)
In [157]: ys
Out[157]:
array([[-5. , -5. , -5. , ..., -5. , -5. , -5. ],
[-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
[-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
...,
[ 4.97, 4.97, 4.97, ..., 4.97, 4.97, 4.97],
[ 4.98, 4.98, 4.98, ..., 4.98, 4.98, 4.98],
[ 4.99, 4.99, 4.99, ..., 4.99, 4.99, 4.99]])
  • Evaluating the function is now straightforward: write the expression as if the two arrays were two floating-point numbers:
In [158]: z = np.sqrt(xs ** 2 + ys ** 2)
In [159]: z
Out[159]:
array([[ 7.0711, 7.064 , 7.0569, ..., 7.0499, 7.0569, 7.064 ],
[ 7.064 , 7.0569, 7.0499, ..., 7.0428, 7.0499, 7.0569],
[ 7.0569, 7.0499, 7.0428, ..., 7.0357, 7.0428, 7.0499],
...,
[ 7.0499, 7.0428, 7.0357, ..., 7.0286, 7.0357, 7.0428],
[ 7.0569, 7.0499, 7.0428, ..., 7.0357, 7.0428, 7.0499],
[ 7.064 , 7.0569, 7.0499, ..., 7.0428, 7.0499, 7.0569]])
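
The 1000 x 1000 grid is too large to inspect by eye. The sketch below repeats the same computation on a 5 x 5 integer grid (an illustrative choice) and compares the result with np.hypot, a NumPy ufunc that computes the same quantity:

```python
import numpy as np

points = np.arange(-2, 3)  # [-2, -1, 0, 1, 2]
xs, ys = np.meshgrid(points, points)

# Distance of every grid point from the origin, written as one array expression.
z = np.sqrt(xs ** 2 + ys ** 2)

print(z.shape)   # (5, 5)
print(z[2, 2])   # 0.0, the grid point at the origin
print(np.allclose(z, np.hypot(xs, ys)))  # True
```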

2.2 Expressing conditional logic as array operations

  • The numpy.where function is a vectorized version of the ternary expression x if condition else y.

First, create two arrays of values and one Boolean array:

In [165]: xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
In [166]: yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
In [167]: cond = np.array([True, False, True, True, False])
  • The goal is to pick values from xarr and yarr according to cond: take the value from xarr where cond is True, and otherwise take the value from yarr.

Using a list comprehension:

In [168]: result = [(x if c else y)
.....: for x, y, c in zip(xarr, yarr, cond)]
In [169]: result
Out[169]: [1.1000000000000001, 2.2000000000000002, 1.3, 1.3999999999999999, 2.5]
  • The list comprehension above is slow for large arrays, because all the work is done in interpreted Python code, and it does not work with multidimensional arrays.

  • np.where() handles this concisely:

In [170]: result = np.where(cond, xarr, yarr)
In [171]: result
Out[171]: array([ 1.1, 2.2, 1.3, 1.4, 2.5])
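
Unlike the list comprehension, np.where also works on multidimensional arrays. A minimal sketch with 2 x 2 inputs (the arrays here are invented for illustration):

```python
import numpy as np

cond = np.array([[True, False],
                 [False, True]])
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([[10.0, 20.0],
              [30.0, 40.0]])

# Take the element from x where cond is True, otherwise from y.
result = np.where(cond, x, y)
print(result)
# [[ 1. 20.]
#  [30.  4.]]
```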
  • In data analysis, where is typically used to produce a new array of values based on another array.

For example, first create an array arr:

In [172]: arr = np.random.randn(4, 4)
In [173]: arr
Out[173]:
array([[-0.5031, -0.6223, -0.9212, -0.7262],
[ 0.2229, 0.0513, -1.1577, 0.8167],
[ 0.4336, 1.0107, 1.8249, -0.9975],
[ 0.8506, -0.1316, 0.9124, 0.1882]])

Next, the vectorized expression arr > 0 produces a Boolean array. (This step is not required; it is shown only to make np.where() easier to understand.)


In [174]: arr > 0
Out[174]:
array([[False, False, False, False],
[ True, True, False, True],
[ True, True, True, False],
[ True, False, True, True]], dtype=bool)
  • Requirement: given a matrix, replace every positive value with 2 and every other value (the negative ones here) with -2.

In [175]: np.where(arr > 0, 2, -2)
Out[175]:
array([[-2, -2, -2, -2],
[ 2, 2, -2, 2],
[ 2, 2, 2, -2],
[ 2, -2, 2, 2]])
  • Scalars and arrays can be combined. For example, replace only the positive values with 2 and leave the rest unchanged:
In [176]: np.where(arr > 0, 2, arr) # set only positive values to 2
Out[176]:
array([[-0.5031, -0.6223, -0.9212, -0.7262],
[ 2. , 2. , -1.1577, 2. ],
[ 2. , 2. , 2. , -0.9975],
[ 2. , -0.1316, 2. , 2. ]])
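
The two calls above can be reproduced on a small fixed array, which also shows how a zero is treated. The array below is made up for illustration:

```python
import numpy as np

arr = np.array([[-1.5, 0.0],
                [2.5, 0.5]])

# Two scalars: 2 where the value is positive, -2 everywhere else.
# Zero is not greater than 0, so it also maps to -2.
signs = np.where(arr > 0, 2, -2)

# A scalar and an array: positives become 2, other values are kept as they are.
capped = np.where(arr > 0, 2, arr)

print(signs)
# [[-2 -2]
#  [ 2  2]]
print(capped)
# [[-1.5  0. ]
#  [ 2.   2. ]]
```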

2.3 Mathematical and statistical methods

  • The methods I find most useful are argmin, argmax, cumsum, and cumprod.

2.4 Methods for Boolean arrays

In the methods above, Boolean values are coerced to 1 (True) and 0 (False). As a result, sum is often used to count the True values in a Boolean array:
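
A brief sketch of what these four methods return, using a small array chosen for illustration:

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5])

idx_min = arr.argmin()     # index of the first smallest value
idx_max = arr.argmax()     # index of the first largest value
running_sum = arr.cumsum()    # cumulative sum
running_prod = arr.cumprod()  # cumulative product

print(idx_min, idx_max)  # 1 4
print(running_sum)       # [ 3  4  8  9 14]
print(running_prod)      # [ 3  3 12 12 60]
```

When the minimum or maximum occurs more than once, argmin and argmax return the index of the first occurrence, which is why argmin gives 1 rather than 3 here.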

In [190]: arr = np.random.randn(100)
In [191]: (arr > 0).sum() # Number of positive values
Out[191]: 42
  • any tests whether one or more values in an array are True, while all checks whether every value is True:
In [192]: bools = np.array([False, False, True, False])
In [193]: bools.any()
Out[193]: True
In [194]: bools.all()
Out[194]: False
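
These methods also accept an axis argument, which is handy for two-dimensional data. A minimal sketch on a fixed 2 x 3 array (the values are illustrative):

```python
import numpy as np

arr = np.array([[1, -2, 3],
                [-4, -5, 6]])
positive = arr > 0  # Boolean array of the same shape

count_all = positive.sum()            # total number of positive values
count_per_col = positive.sum(axis=0)  # positive values in each column
any_per_row = positive.any(axis=1)    # does each row contain a positive value?
all_per_col = positive.all(axis=0)    # is each column entirely positive?

print(count_all)      # 3
print(count_per_col)  # [1 0 2]
print(any_per_row)    # [ True  True]
print(all_per_col)    # [False False  True]
```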
