程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

Python data analysis science library pandas (statistical analysis and decision)

編輯:Python

One 、Pandas

  • It is selected by NumPy Design for the foundation , In data analysis Pandas and NumPy These two modules are often used together .
    Pandas It is the preferred library for data processing and data analysis .
    Pandas It provides a large number of standard data models and functions and methods required for efficient operation of large data sets ,
    Is making Python One of the important factors to become an efficient and powerful data analysis tool .
    Use Pandas library , You need to import it and name it pd:
import pandas as pd

Two 、pandas data structure

  • 1)Series, One dimensional array with label ;
    2)DatetimeIndex, The time series ;
    3)DataFrame, Two dimensional table structure with label and variable size ;
    4)Panel, Three dimensional array with label and variable size .

3、 ... and 、series object

  • 1)Series:pandas An object that provides a dictionary structure similar to a one-dimensional array , Used to store a row or column of data .
    2)Series The internal structure of an object is composed of two interrelated arrays , Which is used to store data ( Instant value ) Yes. value Main array ,
    Each element of the main array has a tag associated with it ( Index ), These tags are stored in another called Index In the array .
    3) If the index is not explicitly specified at the time of creation, it is automatically used from 0 Start with a nonnegative integer as the index .
    4)pd.Series(data=None, index=None, dtype=None, name=None, copy=False)

  • data: The data that produces the sequence , It can be ndarray、list Or a dictionary
    index: catalog index , The number of index values must be the same as data The length of the data is the same . If you omit index Parameters , Its index 0, 1, 2, ….
    dtype: The data type of the sequence element , If you omit pandas Infer data types from data types .
    name: The name of the sequence
    copy: Copy the data , Generate a copy of the data . The default value is false

''' 1.Series Object creation (1) Create with list Series Series([ data 1, data 2,…],index=[ Indexes 1, Indexes 2,…]) If you omit index, The index is automatically added '''
import pandas as pd
slist1 = pd.Series([1, 2, 3, 4])
# Automatically add index , Index from 0 Start 
slist2 = pd.Series([1,2,3,4],index=['a','b','c','d'])
slist3 = pd.Series([1,2,3,4],[chr(i+65) for i in range(4)])
print(slist1,slist2,slist3,sep='\n')
#(2) Use range establish Series
#Series(range(strat,stop,step),index=[ Indexes 1, Indexes 2,…])
# Omit index, Automatically add index , Index from 0 Start 
srange1 = pd.Series(range(3))
srange2 = pd.Series(range(2,10,3))
srange3 = pd.Series(range(2,10,2),index=[i for i in range(2,10,2) ])
print(srange1,srange2,srange3,sep='\n')
#(3) Use numpy One dimensional array creation Series
import numpy as np
snumpy1 = pd.Series(np.array([1, 2, 3, 4]))
snumpy2 = pd.Series(np.arange(6,10))
snumpy3 = pd.Series(np.linspace(20,30,5))
#20-30 It is divided into 5 A floating point number 
print(snumpy1,snumpy2,snumpy3,sep='\n')
# Created by array Series when , Reference to the object .
aa = np.array([2,3,4,5])
aas = pd.Series(aa)
aas1 = pd.Series(aa,copy=True)
print(aa,aas,sep='\n')
aa[3] = 999
aas[0] = 1000
print(aa,aas,aas1,sep='\n')
#(4) Create with dictionary Series, The key of the dictionary is the index 
sdict1 = pd.Series( {
'Python Programming ':95,' data structure ':90,' Advanced mathematics ':88} )# Tuples 
sdict2 = pd.Series(dict(python Programming =92, Advanced mathematics =89, data structure =96)) # Convert to tuple -> Indexes 
print(sdict1,sdict2,sep='\n')
# 2.Series View the value and index of 
# utilize values and index You can see Series Values and labels 
print(sdict1.index)
print(sdict1.values)
print(slist2.index)
print(sdict2.values)
# 3. Access and modify by index or slice Series Value 
sdict1 = pd.Series({
'python Programming ':85,' data structure ':87,' Advanced mathematics ':78})
print('\n','sdict1 Raw data '.center(20, '='))
print(sdict1)
sdict1['python Programming ']=98
# How to modify data 
print('\n'," modify sdict1['python Programming ']=89 Post data ".center(30, '='))
print(sdict1)
sdict1[' College English ']=87 # If the index does not exist , Then add the data 
print('\n','sdict1 Data after adding '.center(20, '='))
print(sdict1)
# You can also use the default index for modification and access 
sdict1[0:2]=0 # Local modification in the form of slices 
sdict1[3]=999
print('\n','sdict1[0:2]=0 Slice modification sdict1[3]=999 Data after index modification '.center(40, '='),sdict1,sep='\n')
# 4.Series Operations between objects 
#(1) Arithmetic operation and mathematical function operation 
ss = pd.Series(np.linspace(-2,2,4))
print('\n','ss The original data '.center(14, '='),ss,sep='\n')
print('\n',' Yes ss Calculate the absolute value of all data , Retain 3 Decimal place '.center(20, '='),round(abs(ss),3),sep='\n')
print(np.sqrt(abs(ss)))
# round() Method returns a floating-point number x Round the value of . Its syntax :round( x [, n] )
# Parameters :x -- Numerical expression ;n -- Numerical expression , From the decimal places .
# abs Function to find the absolute value 
print('\n','ss The arithmetic operation of '.center(14, '='))
print(' Add +5:',ss+5,' reduce -5:',ss-5,' ride *5',ss*5,' except /5',ss/5,sep='\n')
# (2) Select the elements that meet the criteria 
ss = pd.Series([np.random.randint(1,50) for i in range(10)])
# Screening is greater than 25 All the numbers of 
ss1 = ss[ss>25]
print('\n',ss,ss1,sep='\n')
# Screening can be 5 Divisible number 
ss2 = ss[ss%5==0]
print(' Screening can be 5 Divisible number \n',ss2)
# select 20 To 30 Number between 
print(' select 20 To 30 Number between '.center(20,'='))
ss3 = ss[(20<=ss) & (ss<=30)]
print(ss3)
# Screen out less than 20 And greater than 40 Number of numbers 
print(' Screen out less than 20 And greater than 40 Number of numbers '.center(20,'='))
ss3 = ss[(40<=ss) | (ss<=20)]
print(ss3)
# (3) Modify the index reset_index(drop=True)
ss_index1 = ss1.reset_index()
print(ss_index1,type(ss_index1))
ss_index2 = ss1.reset_index(drop=True)
# Delete the original index 
print(ss_index2,type(ss_index2))
# 5.Series Object statistical operation 
stu1 = pd.Series({
'python Programming ':85,' data structure ':87,' Advanced mathematics ':96,' College English ':76})
# Find the maximum index and the minimum index 
print('stu1 Index of maximum value '.center(20, '='))
print(stu1.idxmax())
print('stu1 Index of minimum value '.center(20, '='))
print(stu1.idxmin())
# Calculating mean mean()
print('\n mean value :',stu1.mean())
print('stu1 Larger than average data in '.center(30,'='))
stu_mean=stu1[stu1>stu1.mean()]
print(stu_mean)
# Determine whether an element exists isin()
nn = [i for i in range(80,90)]
nn1 = stu1.isin(nn)
# Its return value is True or False
print(' With bool Value of output 80--89 Data between '.center(30, '='))
print(nn1)
print(' Only the output bool by True The data of '.center(30, '='))
print(nn1[nn1.values==True]) # The output value is True The data of 
nn1_index = nn1[nn1.values==True].index
print(' Output True Index of data '.center(30, '='))
print(nn1_index)
print(list(nn1_index))
# Project analysis 
import pandas as pd
num = pd.Series([np.random.randint(1,10) for i in range(20)])
print('\n primary Series\n',num)
# take Series Delete duplicate elements in , duplicate removal .
print(' take Series De duplication of elements in unique()'.center(30,'='))
print(num.unique())
# Count the number of occurrences of elements value_counts()
print(' Count the number of occurrences of elements value_counts()'.center(30,'='))
print(num.value_counts())
# Sort , Sort by value sort_values(ascending=True), Sort ascending by default , primary Series unchanged 
print(' Sort by value sort_values()'.center(30,'='))
print(num.sort_values())
# Sort by value sort_values(ascending=False), Sort in descending order , primary Series unchanged 
print(' Sort by value sort_values(ascending=False)'.center(30,'='))
print(num.sort_values(ascending=False))

The operation results are as follows :

  • 0 1
    1 2
    2 3
    3 4
    dtype: int64
    a 1
    b 2
    c 3
    d 4
    dtype: int64
    A 1
    B 2
    C 3
    D 4
    dtype: int64
    0 0
    1 1
    2 2
    dtype: int64
    0 2
    1 5
    2 8
    dtype: int64
    2 2
    4 4
    6 6
    8 8
    dtype: int64
    0 1
    1 2
    2 3
    3 4
    dtype: int32
    0 6
    1 7
    2 8
    3 9
    dtype: int32
    0 20.0
    1 22.5
    2 25.0
    3 27.5
    4 30.0
    dtype: float64
    [2 3 4 5]
    0 2
    1 3
    2 4
    3 5
    dtype: int32
    [1000 3 4 999]
    0 1000
    1 3
    2 4
    3 999
    dtype: int32
    0 2
    1 3
    2 4
    3 5
    dtype: int32
    Python Programming 95
    data structure 90
    Advanced mathematics 88
    dtype: int64
    python Programming 92
    Advanced mathematics 89
    data structure 96
    dtype: int64
    Index([‘Python Programming ’, ‘ data structure ’, ‘ Advanced mathematics ’], dtype=‘object’)
    [95 90 88]
    Index([‘a’, ‘b’, ‘c’, ‘d’], dtype=‘object’)
    [92 89 96]

  • =sdict1 Raw data =
    python Programming 85
    data structure 87
    Advanced mathematics 78
    dtype: int64

  • = modify sdict1[‘python Programming ’]=89 Post data =
    python Programming 98
    data structure 87
    Advanced mathematics 78
    dtype: int64

  • sdict1 Data after adding =
    python Programming 98
    data structure 87
    Advanced mathematics 78
    College English 87
    dtype: int64

  • =sdict1[0:2]=0 Slice modification sdict1[3]=999 Data after index modification =
    python Programming 0
    data structure 0
    Advanced mathematics 78
    College English 999
    dtype: int64

  • ss The original data =
    0 -2.000000
    1 -0.666667
    2 0.666667
    3 2.000000
    dtype: float64

  • = Yes ss Calculate the absolute value of all data , Retain 3 Decimal place =
    0 2.000
    1 0.667
    2 0.667
    3 2.000
    dtype: float64
    0 1.414214
    1 0.816497
    2 0.816497
    3 1.414214
    dtype: float64

  • =ss The arithmetic operation of ==
    Add +5:
    0 3.000000
    1 4.333333
    2 5.666667
    3 7.000000
    dtype: float64
    reduce -5:
    0 -7.000000
    1 -5.666667
    2 -4.333333
    3 -3.000000
    dtype: float64
    ride *5
    0 -10.000000
    1 -3.333333
    2 3.333333
    3 10.000000
    dtype: float64
    except /5
    0 -0.400000
    1 -0.133333
    2 0.133333
    3 0.400000
    dtype: float64

  • 0 28
    1 29
    2 33
    3 25
    4 21
    5 46
    6 17
    7 13
    8 13
    9 20
    dtype: int64
    0 28
    1 29
    2 33
    5 46
    dtype: int64
    Screening can be 5 Divisible number
    3 25
    9 20
    dtype: int64
    select 20 To 30 Number between
    0 28
    1 29
    3 25
    4 21
    9 20
    dtype: int64
    = Screen out less than 20 And greater than 40 Number of numbers =
    5 46
    6 17
    7 13
    8 13
    9 20
    dtype: int64
    index 0
    0 0 28
    1 1 29
    2 2 33
    3 5 46 <class ‘pandas.core.frame.DataFrame’>
    0 28
    1 29
    2 33
    3 46
    dtype: int64 <class ‘pandas.core.series.Series’>
    =stu1 Index of maximum value =
    Advanced mathematics
    =stu1 Index of minimum value =
    College English

  • mean value : 86.0
    stu1 Larger than average data in =
    data structure 87
    Advanced mathematics 96
    dtype: int64
    With bool Value of output 80–89 Data between
    python Programming True
    data structure True
    Advanced mathematics False
    College English False
    dtype: bool
    = Only the output bool by True The data of ==
    python Programming True
    data structure True
    dtype: bool
    = Output True Index of data =
    Index([‘python Programming ’, ‘ data structure ’], dtype=‘object’)
    [‘python Programming ’, ‘ data structure ’]

  • primary Series
    0 4
    1 7
    2 4
    3 5
    4 1
    5 1
    6 5
    7 8
    8 6
    9 7
    10 4
    11 8
    12 9
    13 1
    14 6
    15 5
    16 6
    17 7
    18 2
    19 1
    dtype: int64
    take Series De duplication of elements in unique()=
    [4 7 5 1 8 6 9 2]
    = Count the number of occurrences of elements value_counts()=
    1 4
    7 3
    6 3
    5 3
    4 3
    8 2
    9 1
    2 1
    dtype: int64
    Sort by value sort_values()
    19 1
    13 1
    4 1
    5 1
    18 2
    10 4
    0 4
    2 4
    3 5
    15 5
    6 5
    8 6
    14 6
    16 6
    1 7
    17 7
    9 7
    7 8
    11 8
    12 9
    dtype: int64
    Sort by value sort_values(ascending=False)
    12 9
    11 8
    7 8
    9 7
    17 7
    1 7
    16 6
    14 6
    8 6
    6 5
    15 5
    3 5
    2 4
    0 4
    10 4
    18 2
    5 1
    4 1
    13 1
    19 1
    dtype: int64


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved