
[Python] A detailed guide to handling missing values in pandas

Editor: Python

This article walks through the common operations for handling missing values (missing data) in pandas.
Missing-value handling comes up constantly in data analysis and data cleaning.
In memory, pandas represents a missing value as NaN (Python's None is also treated as missing). When reading a file, it parses the following strings as NaN by default:
'', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN',
'-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA',
'NULL', 'NaN', 'n/a', 'nan', 'null'

1、Things to note about missing values in pandas

In pandas and numpy, any two missing values compare as not equal (np.nan != np.nan)

The two NaN values in the following example are not equal:

In [224]: df1.iloc[3:,0].values  # Extract the NaN from column 'one'
Out[224]: array([nan])
In [225]: df1.iloc[2:3,1].values  # Extract the NaN from column 'two'
Out[225]: array([nan])
In [226]: df1.iloc[3:,0].values == df1.iloc[2:3,1].values  # The two NaN values are not equal
Out[226]: array([False])
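
Because NaN never equals anything, an equality test cannot be used to locate missing values. The following is a minimal sketch of that pitfall and of the usual alternative; the Series `s` is a made-up example, not data from this article.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
nan_equals_itself = np.nan == np.nan

# Hypothetical sample Series with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

# An equality test never finds the missing entry ...
found_by_equality = (s == np.nan).tolist()

# ... whereas isna() (or np.isnan for float arrays) does.
found_by_isna = s.isna().tolist()
found_by_isnan = np.isnan(s.to_numpy()).tolist()

print(nan_equals_itself, found_by_equality, found_by_isna)
```

In practice, reach for isna() whenever the question is "is this value missing?".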

Which values pandas treats as missing when reading a file

With default settings, readers such as read_csv convert the marker strings listed at the start of this article ('', 'NA', 'N/A', 'NULL', 'NaN', 'null', and so on) to NaN. The na_values and keep_default_na parameters of read_csv extend or disable this default list.
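
As an illustration of this parsing behavior, here is a small sketch that reads an in-memory CSV. The column names and values are invented for the example; `na_values` and `keep_default_na` are existing read_csv parameters.

```python
import io
import pandas as pd

# Hypothetical CSV content: 'NA' and 'null' are among the default markers.
csv_text = "name,score\nalice,NA\nbob,null\ncarol,3.5\n"

# Default settings: both markers are parsed as NaN.
default_df = pd.read_csv(io.StringIO(csv_text))
default_missing = default_df["score"].isna().tolist()

# keep_default_na=False disables the default list, so only the strings
# passed in na_values ('null' here) are treated as missing.
custom_df = pd.read_csv(
    io.StringIO(csv_text), keep_default_na=False, na_values=["null"]
)
custom_missing = custom_df["score"].isna().tolist()

print(default_missing)
print(custom_missing)
```

Disabling the defaults is useful when a string such as 'NA' is real data (for example a country or region code) rather than a gap.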

2、Operations on missing values in pandas

Determine which values in a pandas.DataFrame are missing: the isna method

# Define a sample DataFrame (assumes: import numpy as np; import pandas as pd)
In [47]: d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
In [48]: df = pd.DataFrame(d)
In [49]: df
Out[49]:
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
In [120]: df.isna()  # Returns a DataFrame of the same shape, filled with bool values
Out[120]:
     one    two
a  False  False
b  False  False
c  False  False
d   True  False
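
The boolean mask returned by isna is usually aggregated rather than read cell by cell. The sketch below rebuilds the same sample data with a plain constructor (an assumption made for brevity) and counts the missing values per column.

```python
import numpy as np
import pandas as pd

# Same values as the sample DataFrame above.
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, np.nan], "two": [1.0, 2.0, 3.0, 4.0]},
    index=list("abcd"),
)

# True counts as 1, so summing the mask counts missing values per column.
missing_per_column = df.isna().sum()

# Index labels of the rows that contain at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)].index.tolist()

# notna() is the element-wise inverse of isna().
present_mask = df.notna()

print(missing_per_column)
print(rows_with_missing)
```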

Delete rows that contain missing values from a pandas.DataFrame: dropna(axis=0)

In [67]: df
Out[67]:
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
In [68]: df.dropna()  # axis=0 is the default
Out[68]:
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0

Delete columns that contain missing values from a pandas.DataFrame: dropna(axis=1)

In [72]: df.dropna(axis=1)
Out[72]:
   two
a  1.0
b  2.0
c  3.0
d  4.0
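
Two details of dropna are easy to miss: it returns a new DataFrame and leaves the original untouched, and its `subset` parameter restricts which columns are checked. A minimal sketch, rebuilding the sample data with a plain constructor:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, np.nan], "two": [1.0, 2.0, 3.0, 4.0]},
    index=list("abcd"),
)

# Both calls return new objects; df itself still has 4 rows and 2 columns.
rows_dropped = df.dropna()          # drops row 'd'
cols_dropped = df.dropna(axis=1)    # drops column 'one'

# Only look at column 'two' when deciding which rows to drop.
# 'two' has no missing values, so every row is kept.
subset_dropped = df.dropna(subset=["two"])

print(rows_dropped.index.tolist(), cols_dropped.columns.tolist())
print(df.shape, subset_dropped.shape)
```

To keep the result, assign it back, for example `df = df.dropna()`.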

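Delete every row of a pandas.DataFrame that contains at least one missing value: dropna(how='any'). This is the default behavior; because the new column below is entirely NaN, every row is removed.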

In [97]: df['three'] = np.nan  # Add a new column whose values are all NaN
In [98]: df
Out[98]:
   one  two  three
a  1.0  1.0    NaN
b  2.0  2.0    NaN
c  3.0  3.0    NaN
d  NaN  4.0    NaN
In [99]: df.dropna(how='any')
Out[99]:
Empty DataFrame  # All rows deleted
Columns: [one, two, three]
Index: []
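
Note that how='any' acts along one axis at a time; it does not drop rows and columns in the same call. The sketch below contrasts the two axes on the same three-column data (rebuilt with a plain constructor for brevity).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "one": [1.0, 2.0, 3.0, np.nan],
        "two": [1.0, 2.0, 3.0, 4.0],
        "three": [np.nan] * 4,
    },
    index=list("abcd"),
)

# axis=0 (default): every row has a NaN in 'three', so no rows survive,
# but all three column labels are kept.
by_rows = df.dropna(how="any")

# axis=1: columns 'one' and 'three' contain NaN, so only 'two' survives.
by_cols = df.dropna(axis=1, how="any")

print(by_rows.shape, by_cols.shape)
```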

Delete rows of a pandas.DataFrame whose values are all missing: dropna(axis=0, how='all'). No row here is entirely NaN, so nothing is removed.

In [101]: df.dropna(axis=0,how='all')
Out[101]:
   one  two  three
a  1.0  1.0    NaN
b  2.0  2.0    NaN
c  3.0  3.0    NaN
d  NaN  4.0    NaN

Delete columns of a pandas.DataFrame whose values are all missing: dropna(axis=1, how='all')

In [102]: df.dropna(axis=1,how='all')
Out[102]:
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
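
Between how='any' and how='all' there is a middle ground: the `thresh` parameter of dropna keeps a row only if it has at least that many non-missing values. A sketch on the same three-column data, under the assumption that two valid values per row is the desired cutoff:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "one": [1.0, 2.0, 3.0, np.nan],
        "two": [1.0, 2.0, 3.0, 4.0],
        "three": [np.nan] * 4,
    },
    index=list("abcd"),
)

# Rows a, b, c have two non-missing values; row d has only one.
at_least_two = df.dropna(thresh=2)

# A common cleanup order: first drop all-NaN columns, then incomplete rows.
cleaned = df.dropna(axis=1, how="all").dropna()

print(at_least_two.index.tolist())
print(cleaned)
```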

Fill the missing values of a pandas.DataFrame with a given value: fillna(value)

In [103]: df.fillna(666)  # Fill with 666
Out[103]:
     one  two  three
a    1.0  1.0  666.0
b    2.0  2.0  666.0
c    3.0  3.0  666.0
d  666.0  4.0  666.0
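
A constant such as 666 is rarely the value one wants in real data. As one possible variation (an assumption, not something the original example does), the fill value can be computed from the data itself, for instance each column's mean:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, np.nan], "two": [1.0, 2.0, 3.0, 4.0]},
    index=list("abcd"),
)

# df.mean() skips NaN by default and returns one value per column;
# fillna aligns that Series with the column labels.
filled = df.fillna(df.mean())

# fillna returns a new DataFrame; the original still has its NaN.
original_missing = int(df["one"].isna().sum())

print(filled)
```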

Fill the missing values of a pandas.DataFrame with the value from the previous column: fillna(axis=1, method='ffill')

# To fill from the next column instead, use fillna(axis=1, method='bfill')
In [109]: df.fillna(axis=1,method='ffill')
Out[109]:
   one  two  three
a  1.0  1.0    1.0
b  2.0  2.0    2.0
c  3.0  3.0    3.0
d  NaN  4.0    4.0
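
Recent pandas releases deprecate the `method` argument of fillna in favor of the dedicated DataFrame.ffill and DataFrame.bfill methods. The sketch below shows the column-wise fills with those methods on the same data (rebuilt with a plain constructor).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "one": [1.0, 2.0, 3.0, np.nan],
        "two": [1.0, 2.0, 3.0, 4.0],
        "three": [np.nan] * 4,
    },
    index=list("abcd"),
)

# Forward fill along each row: a NaN takes the value of the column to its left.
# 'one' in row d has nothing to its left, so it stays NaN.
ffilled = df.ffill(axis=1)

# Backward fill along each row: a NaN takes the value of the column to its right.
# 'three' is the last column, so it stays NaN.
bfilled = df.bfill(axis=1)

print(ffilled)
print(bfilled)
```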

Fill the missing values of a pandas.DataFrame with the value from the previous row: fillna(axis=0, method='ffill')

# To fill from the next row instead, use fillna(axis=0, method='bfill')
In [110]: df.fillna(method='ffill')
Out[110]:
   one  two  three
a  1.0  1.0    NaN
b  2.0  2.0    NaN
c  3.0  3.0    NaN
d  3.0  4.0    NaN
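
Row-wise filling propagates a value across a run of consecutive gaps unless told otherwise. This sketch uses a made-up Series with two adjacent gaps to compare forward fill, backward fill, and the `limit` parameter, which caps how many consecutive NaN values are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()           # carries 1.0 forward into both gaps
backward = s.bfill()          # carries 4.0 backward into both gaps
limited = s.ffill(limit=1)    # fills only the first gap of the run

print(forward.tolist())
print(backward.tolist())
print(limited.tolist())
```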

Use a dictionary to fill the missing values of specific columns in a pandas.DataFrame

In [112]: df.fillna({'one':666})  # Fill the NaN values in column 'one'
Out[112]:
     one  two  three
a    1.0  1.0    NaN
b    2.0  2.0    NaN
c    3.0  3.0    NaN
d  666.0  4.0    NaN
In [113]: df.fillna({'three':666})
Out[113]:
   one  two  three
a  1.0  1.0  666.0
b  2.0  2.0  666.0
c  3.0  3.0  666.0
d  NaN  4.0  666.0
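
The dictionary can name several columns at once, each with its own fill value; columns that are not listed are left alone. A minimal sketch, with fill values 0 and -1 chosen purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "one": [1.0, 2.0, 3.0, np.nan],
        "two": [1.0, 2.0, 3.0, 4.0],
        "three": [np.nan] * 4,
    },
    index=list("abcd"),
)

# Keys are column labels, values are the per-column fill values.
filled = df.fillna({"one": 0, "three": -1})

print(filled)
```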

3、References

https://pandas.pydata.org/pandas-docs/stable/reference/frame.html?highlight=missing
