
Python Basics: Handling Missing Values (16)

Editor: Python

「This is my 16th day participating in the 2022 First Writing Challenge. Event details: 2022 First Writing Challenge」

Last time we covered how to deal with duplicate values; today let's talk about missing values. Missing values mainly have two kinds of causes: mechanical and human. Mechanical causes include a failed storage device, or a machine fault that prevents data from being collected for some period of time. Human causes are more varied, for example someone deliberately withholding information.

First, let's build a DataFrame that contains missing values:

import numpy as np
import pandas as pd
# Two rows, three columns; np.nan marks each missing entry
data = pd.DataFrame([[1, np.nan, 3], [np.nan, 5, np.nan]], columns=['a', 'b', 'c'])
print(data)

See it? np.nan is NumPy's NaN ("not a number") value, which pandas uses to represent a missing (null) value.
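A quick look at how np.nan behaves helps explain why pandas needs dedicated functions for detecting it, such as the isna() method discussed next. This is a small illustrative sketch; the variable names are only examples.

```python
import numpy as np
import pandas as pd

value = np.nan
print(type(value))       # <class 'float'>
print(value == value)    # False: NaN never compares equal, even to itself
print(np.isnan(value))   # True: use a dedicated check instead of ==

# None in a numeric column is also stored as NaN
s = pd.Series([1, None])
print(s.dtype)           # float64
print(s.isna().tolist()) # [False, True]
```

Because `==` cannot find NaN, filtering with something like `data == np.nan` silently matches nothing.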

To check for missing values, pandas offers not one but two methods: isnull() and isna(). Let's try each of them:

import numpy as np
import pandas as pd
data = pd.DataFrame([[1, np.nan, 3], [np.nan, 5, np.nan]], columns=['a', 'b', 'c'])
# Both calls return a DataFrame of booleans: True where a value is missing
print(data.isnull())
print(data.isna())

As you can see, both methods check every value for missingness, returning True where the value is null and False otherwise. In fact, isnull() is an alias of isna(), so the two always give the same result.
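Because isna() returns a boolean DataFrame, it combines naturally with other methods. A common follow-up (a sketch, not part of the original walkthrough) is counting missing values per column with sum(), since each True counts as 1.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, np.nan, 3], [np.nan, 5, np.nan]], columns=['a', 'b', 'c'])

# isnull() is an alias of isna(), so the two masks are identical
print(data.isnull().equals(data.isna()))  # True

# Summing a boolean mask counts the True values in each column
null_counts = data.isna().sum()
print(null_counts.tolist())  # [1, 1, 1]

# Total number of missing cells in the whole DataFrame
print(int(null_counts.sum()))  # 3
```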

There are generally two ways to handle null values: delete them or fill them in. Let's start with deletion, which is done with the dropna() function. Note that, by default, it deletes the entire row for every row that contains a null value. For example:

import numpy as np
import pandas as pd
data = pd.DataFrame([[1, np.nan, 3], [np.nan, 5, np.nan]], columns=['a', 'b', 'c'])
# By default, dropna() drops every row that contains at least one NaN
print(data.dropna())

After calling dropna() in the example above, nothing is left! Both rows contain at least one NaN, so both are dropped and an empty DataFrame remains.
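To see why nothing survives, we can rebuild the row mask that dropna() applies by default (how='any'): a row is dropped if any of its values is missing. The how='all' variant at the end, which drops a row only when it is entirely empty, is shown for contrast. A minimal sketch:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, np.nan, 3], [np.nan, 5, np.nan]], columns=['a', 'b', 'c'])

# True for each row that contains at least one NaN
has_null = data.isna().any(axis=1)
print(has_null.tolist())  # [True, True]

# Every row is flagged, so the default dropna() returns an empty DataFrame
result = data.dropna()
print(result.empty)           # True
print(list(result.columns))   # ['a', 'b', 'c']: the columns are kept

# how='all' drops a row only when every value in it is missing; no row qualifies here
print(len(data.dropna(how='all')))  # 2
```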

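If that is too aggressive, we can control how much missing data a row may have before it is deleted, using the thresh parameter of dropna(). Note that thresh counts non-null values: thresh=2 keeps only rows with at least 2 non-null values and deletes the rest.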
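Applied to the same data, thresh=2 keeps row 0 (two non-null values, 1 and 3) and drops row 1 (only one non-null value, 5). A short sketch, with a hand-computed count of non-null values per row for comparison:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, np.nan, 3], [np.nan, 5, np.nan]], columns=['a', 'b', 'c'])

# thresh=2: keep a row only if it has at least 2 non-null values
kept = data.dropna(thresh=2)
print(kept)
print(kept.index.tolist())  # [0]

# The count thresh is compared against: non-null values per row
print(data.notna().sum(axis=1).tolist())  # [2, 1]
```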

There are many ways to fill in null values, such as with the mean or the median, and both can be done with the fillna() function. For example, to fill the data above with the mean of each column, the code is as follows:

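import numpy as np
import pandas as pd
data = pd.DataFrame([[1, np.nan, 3], [np.nan, 5, np.nan]], columns=['a', 'b', 'c'])
# data.mean() gives each column's mean (NaN is skipped); fillna() fills that column's gaps with it
print(data.fillna(data.mean()))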

Running the code shows that each null value has been filled with the mean of its column: 1.0 for a, 5.0 for b and 3.0 for c.
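Median filling works the same way. Here is a sketch of it alongside the mean version; in this tiny DataFrame each column has only one non-null value, so mean and median happen to coincide. Passing a dict to fillna(), shown last, fills only the columns you name (the fill value 0 is just an example).

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, np.nan, 3], [np.nan, 5, np.nan]], columns=['a', 'b', 'c'])

# mean() and median() skip NaN by default, giving one fill value per column
print(data.mean().tolist())    # [1.0, 5.0, 3.0]
print(data.median().tolist())  # [1.0, 5.0, 3.0]

mean_filled = data.fillna(data.mean())
median_filled = data.fillna(data.median())
print(mean_filled.isna().sum().sum())  # 0: no gaps left

# A dict maps column labels to fill values; other columns keep their NaN
only_a = data.fillna({'a': 0})
print(only_a.isna().sum().tolist())  # [0, 1, 1]
```

On real data with skewed values or outliers, the median is often the safer choice, since a few extreme values can pull the mean far from typical entries.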

