Python for data analysis: NumPy basics (universal functions, using arrays for data processing)

Editor: Python

Contents

1 Universal functions
2 Using arrays for data processing
- 2.1 np.meshgrid() function
- 2.2 Expressing conditional logic as array operations
- 2.3 Mathematical and statistical methods
- 2.4 Methods for Boolean arrays

1 Universal functions

A universal function (ufunc) is a function that performs element-wise operations on the data in an ndarray.

For example, sqrt and exp (the examples assume import numpy as np):

In [137]: arr = np.arange(10)
In [138]: arr
Out[138]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [139]: np.sqrt(arr)
Out[139]:
array([ 0. , 1. , 1.4142, 1.7321, 2. , 2.2361, 2.4495,
2.6458, 2.8284, 3. ])
In [140]: np.exp(arr)
Out[140]:
array([ 1. , 2.7183, 7.3891, 20.0855, 54.5982,
148.4132, 403.4288, 1096.6332, 2980.958 , 8103.0839])

Other functions, such as add or maximum, take two arrays (these are called binary ufuncs) and return a single array as the result:

In [141]: x = np.random.randn(8)
In [142]: y = np.random.randn(8)
In [143]: x
Out[143]:
array([-0.0119, 1.0048, 1.3272, -0.9193, -1.5491, 0.0222, 0.7584,
-0.6605])
In [144]: y
Out[144]:
array([ 0.8626, -0.01 , 0.05 , 0.6702, 0.853 , -0.9559, -0.0235,
-2.3042])
In [145]: np.maximum(x, y)
Out[145]:
array([ 0.8626, 1.0048, 1.3272, 0.6702, 0.853 , 0.0222, 0.7584,
-0.6605])
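
Because the transcript above uses random input, a small deterministic sketch may make the element-wise pairing easier to check by hand. The arrays x and y below are made-up values chosen for illustration, not data from the article:

```python
import numpy as np

# Illustrative inputs (not from the article)
x = np.array([1.0, -2.0, 3.5, 0.0])
y = np.array([0.5, 4.0, -1.0, 0.0])

# np.add computes x[i] + y[i] at every position, the same as x + y
total = np.add(x, y)

# np.maximum keeps the larger of x[i] and y[i] at every position
larger = np.maximum(x, y)

print(total.tolist())   # [1.5, 2.0, 2.5, 0.0]
print(larger.tolist())  # [1.0, 4.0, 3.5, 0.0]
```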

A few ufuncs return multiple arrays, though this is uncommon. For example, modf returns the fractional and integer parts of a floating-point array:

In [146]: arr = np.random.randn(7) * 5
In [147]: arr
Out[147]: array([-3.2623, -6.0915, -6.663 , 5.3731, 3.6182, 3.45 , 5.0077])
In [148]: remainder, whole_part = np.modf(arr)
In [149]: remainder
Out[149]: array([-0.2623, -0.0915, -0.663 , 0.3731,
0.6182, 0.45 , 0.0077])
In [150]: whole_part
Out[150]: array([-3., -6., -6., 5., 3., 3., 5.])

2 Using arrays for data processing

NumPy arrays let you express many data processing tasks as concise array expressions that would otherwise require writing loops.

Replacing explicit loops with array expressions is commonly called vectorization.
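
As a minimal sketch of what vectorization means in practice, the snippet below computes the same result twice: once with an explicit Python loop and once with a single array expression. The formula v ** 2 + 1 and the variable names are arbitrary choices for illustration:

```python
import numpy as np

values = np.arange(5, dtype=float)  # [0., 1., 2., 3., 4.]

# Loop version: one Python-level operation per element
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] ** 2 + 1

# Vectorized version: one array expression, evaluated inside NumPy
vectorized = values ** 2 + 1

print(vectorized.tolist())                 # [1.0, 2.0, 5.0, 10.0, 17.0]
print(np.array_equal(looped, vectorized))  # True
```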

2.1 np.meshgrid() function

np.meshgrid() returns coordinate matrices built from the given coordinate vectors.

For example, given the x coordinates [1, 2, 3] and the y coordinates [7, 8], it returns

[array([[1, 2, 3], [1, 2, 3]]), array([[7, 7, 7], [8, 8, 8]])]

Together the two matrices represent six points: (1, 7), (2, 7), (3, 7), (1, 8), (2, 8), (3, 8).

# coding: utf-8
import numpy as np
# x-coordinate vector
a = np.array([1, 2, 3])
# y-coordinate vector
b = np.array([7, 8])
# Build the coordinate matrices from the coordinate vectors.
# The result holds two arrays: the first contains the x value of every
# grid point, the second contains the y value.
res = np.meshgrid(a, b)
# Result: [array([[1, 2, 3], [1, 2, 3]]), array([[7, 7, 7], [8, 8, 8]])]
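
To confirm that the two matrices really describe the six grid points, one option is to flatten both and pair them up. This is only a sketch; the names xs, ys and grid_points are introduced here for illustration:

```python
import numpy as np

a = np.array([1, 2, 3])  # x coordinates
b = np.array([7, 8])     # y coordinates

xs, ys = np.meshgrid(a, b)

# With the default indexing, both matrices have shape (len(b), len(a))
print(xs.shape, ys.shape)  # (2, 3) (2, 3)

# Pair the matrices element by element to recover the grid points
grid_points = list(zip(xs.ravel().tolist(), ys.ravel().tolist()))
print(grid_points)
# [(1, 7), (2, 7), (3, 7), (1, 8), (2, 8), (3, 8)]
```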

Another example: suppose we want to evaluate the function sqrt(x^2 + y^2) over a regular grid of values.

In [155]: points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
In [156]: xs, ys = np.meshgrid(points, points)
In [157]: ys
Out[157]:
array([[-5. , -5. , -5. , ..., -5. , -5. , -5. ],
[-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
[-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
...,
[ 4.97, 4.97, 4.97, ..., 4.97, 4.97, 4.97],
[ 4.98, 4.98, 4.98, ..., 4.98, 4.98, 4.98],
[ 4.99, 4.99, 4.99, ..., 4.99, 4.99, 4.99]])

Evaluating the function is now simple: write the same expression you would write for two floating-point numbers, using the two arrays instead:

In [158]: z = np.sqrt(xs ** 2 + ys ** 2)
In [159]: z
Out[159]:
array([[ 7.0711, 7.064 , 7.0569, ..., 7.0499, 7.0569, 7.064 ],
[ 7.064 , 7.0569, 7.0499, ..., 7.0428, 7.0499, 7.0569],
[ 7.0569, 7.0499, 7.0428, ..., 7.0357, 7.0428, 7.0499],
...,
[ 7.0499, 7.0428, 7.0357, ..., 7.0286, 7.0357, 7.0428],
[ 7.0569, 7.0499, 7.0428, ..., 7.0357, 7.0428, 7.0499],
[ 7.064 , 7.0569, 7.0499, ..., 7.0428, 7.0499, 7.0569]])
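
A quick way to convince yourself that the array expression matches the scalar formula is to spot-check one entry. The indices i and j below are arbitrary picks for illustration:

```python
import math
import numpy as np

points = np.arange(-5, 5, 0.01)
xs, ys = np.meshgrid(points, points)
z = np.sqrt(xs ** 2 + ys ** 2)

# z[i, j] is the distance from the origin of the point (points[j], points[i])
i, j = 10, 250
expected = math.sqrt(points[j] ** 2 + points[i] ** 2)

print(z.shape)                          # (1000, 1000)
print(math.isclose(z[i, j], expected))  # True
```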

2.2 Expressing conditional logic as array operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y.

First, create two arrays of values and a Boolean array:

In [165]: xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
In [166]: yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
In [167]: cond = np.array([True, False, True, True, False])

Suppose we want to select values from xarr and yarr based on cond: take the value from xarr when the corresponding value in cond is True, and from yarr otherwise.

With a list comprehension:

In [168]: result = [(x if c else y)
.....: for x, y, c in zip(xarr, yarr, cond)]
In [169]: result
Out[169]: [1.1000000000000001, 2.2000000000000002, 1.3, 1.3999999999999999, 2.5]

This list comprehension is slow for large arrays, because all of the work is done by the Python interpreter, and it does not work with multidimensional arrays.
np.where() handles both cases concisely:

In [170]: result = np.where(cond, xarr, yarr)
In [171]: result
Out[171]: array([ 1.1, 2.2, 1.3, 1.4, 2.5])
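
The same call works unchanged on multidimensional input, which is where the list comprehension breaks down. A small sketch with 2-D arrays, whose values are made up for illustration:

```python
import numpy as np

cond = np.array([[True, False],
                 [False, True]])
xarr = np.array([[1.1, 1.2],
                 [1.3, 1.4]])
yarr = np.array([[2.1, 2.2],
                 [2.3, 2.4]])

# Take from xarr where cond is True, otherwise from yarr
result = np.where(cond, xarr, yarr)
print(result.tolist())  # [[1.1, 2.2], [2.3, 1.4]]
```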

In data analysis, where is typically used to produce a new array of values based on another array.

As an example, first create an array arr:

In [172]: arr = np.random.randn(4, 4)
In [173]: arr
Out[173]:
array([[-0.5031, -0.6223, -0.9212, -0.7262],
[ 0.2229, 0.0513, -1.1577, 0.8167],
[ 0.4336, 1.0107, 1.8249, -0.9975],
[ 0.8506, -0.1316, 0.9124, 0.1882]])

Next, the vectorized expression arr > 0 yields a Boolean array. (This step is not required; it is shown only to make np.where() easier to follow.)


In [174]: arr > 0
Out[174]:
array([[False, False, False, False],
[ True, True, False, True],
[ True, True, True, False],
[ True, False, True, True]], dtype=bool)

Goal: in a matrix, replace all positive values with 2 and all other values (here, the negative ones) with -2.


In [175]: np.where(arr > 0, 2, -2)
Out[175]:
array([[-2, -2, -2, -2],
[ 2, 2, -2, 2],
[ 2, 2, 2, -2],
[ 2, -2, 2, 2]])

Scalars and arrays can be combined in np.where. For example, replace only the positive values with 2 and leave the rest unchanged:

In [176]: np.where(arr > 0, 2, arr) # set only positive values to 2
Out[176]:
array([[-0.5031, -0.6223, -0.9212, -0.7262],
[ 2. , 2. , -1.1577, 2. ],
[ 2. , 2. , 2. , -0.9975],
[ 2. , -0.1316, 2. , 2. ]])
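
One detail worth keeping in mind is that arr > 0 is False for zeros, so they fall into the "otherwise" branch. The sketch below uses a small hand-written array to show this, and nests a second np.where as one possible way to give zeros their own value; the nesting is an illustration, not something the article itself does:

```python
import numpy as np

# Illustrative data containing zeros (not from the article)
arr = np.array([[-1.5, 0.0, 2.5],
                [3.0, -0.5, 0.0]])

# arr > 0 is False for 0.0, so zeros are replaced with -2 as well
signs = np.where(arr > 0, 2, -2)
print(signs.tolist())      # [[-2, -2, 2], [2, -2, -2]]

# Nesting np.where handles three cases: positive, negative, zero
three_way = np.where(arr > 0, 2, np.where(arr < 0, -2, 0))
print(three_way.tolist())  # [[-2, 0, 2], [2, -2, 0]]
```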

2.3 Mathematical and statistical methods

The methods I find most useful are argmin, argmax, cumsum, and cumprod.
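
A short sketch of these four methods on a small hand-written array (the values are arbitrary, chosen so the results are easy to verify by hand):

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5])

# argmin/argmax return the index of the first smallest/largest element
print(arr.argmin())  # 1
print(arr.argmax())  # 4

# cumsum/cumprod return running totals and running products
print(arr.cumsum().tolist())   # [3, 4, 8, 9, 14]
print(arr.cumprod().tolist())  # [3, 3, 12, 12, 60]

# On a 2-D array, axis=0 accumulates down each column
m = np.array([[1, 2],
              [3, 4]])
print(m.cumsum(axis=0).tolist())  # [[1, 2], [4, 6]]
```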

2.4 Methods for Boolean arrays

In the statistical methods above (such as sum), Boolean values are coerced to 1 (True) and 0 (False). As a result, sum is often used to count the True values in a Boolean array:

In [190]: arr = np.random.randn(100)
In [191]: (arr > 0).sum() # Number of positive values
Out[191]: 42

any tests whether one or more values in an array are True, while all checks whether every value in the array is True:

In [192]: bools = np.array([False, False, True, False])
In [193]: bools.any()
Out[193]: True
In [194]: bools.all()
Out[194]: False
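
Since the counting example above depends on random data, here is a deterministic sketch of sum, any and all on Boolean arrays. The input values and the names positive and flags are made up for illustration:

```python
import numpy as np

arr = np.array([-1.2, 0.4, 3.1, -0.7, 2.2])
positive = arr > 0  # [False, True, True, False, True]

# Each True counts as 1, so sum gives the number of positive values
print(int(positive.sum()))  # 3

print(bool(positive.any()))  # True  (at least one value is positive)
print(bool(positive.all()))  # False (not every value is positive)

# With an axis argument, any/all are evaluated per row or per column
flags = np.array([[True, False],
                  [True, True]])
print(flags.any(axis=1).tolist())  # [True, True]
print(flags.all(axis=1).tolist())  # [False, True]
```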
