
pandas deduplication with drop_duplicates: keeping the first or the last occurrence

Editor: Python


  • subset parameter
  • keep parameter
  • inplace parameter
  • Example

The drop_duplicates() function in pandas removes duplicate rows from a DataFrame. It also lets you choose whether the first or the last row of each set of duplicates is kept.

DataFrame.drop_duplicates(self, subset=None, keep='first', inplace=False)

The signature above has three parameters: subset, keep and inplace. (pandas 1.0 and later also accept ignore_index, which is not covered here.)

subset parameter

subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns

The subset parameter selects the columns used to decide whether two rows are duplicates. It takes a column label or a sequence of column labels. If it is not set, all columns are compared.
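
As a minimal sketch of how subset changes what counts as a duplicate, consider the small frame below. The data and the names df, all_cols, by_city and by_city_year are made up for illustration and are not part of the article's example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Oslo"],
    "year": [2020, 2021, 2020, 2020],
})

# No subset: a row is dropped only if every column matches an earlier row.
all_cols = df.drop_duplicates()

# A single column label: rows are compared on "city" alone.
by_city = df.drop_duplicates(subset="city")

# A list of labels: rows are compared on the ("city", "year") pair.
by_city_year = df.drop_duplicates(subset=["city", "year"])

print(all_cols.index.tolist())      # [0, 1, 2]
print(by_city.index.tolist())       # [0, 2]
print(by_city_year.index.tolist())  # [0, 1, 2]
```

Because this frame has only the two columns, passing both of them as subset behaves the same as passing no subset at all.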

keep parameter

keep : {'first', 'last', False}, default 'first'
first : Drop duplicates except for the first occurrence.
last : Drop duplicates except for the last occurrence.
False : Drop all duplicates.

keep accepts one of three values; the default is 'first'.
'first' keeps the first occurrence in each set of duplicates and drops the rest.
'last' keeps the last occurrence in each set of duplicates and drops the rest.
False drops every row that has a duplicate, keeping none of them.
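
One way to see what each keep value does is to look at DataFrame.duplicated(), which takes the same keep argument and marks with True the rows that drop_duplicates() would remove. The sketch below uses a made-up one-column frame; the names df, flag_first, flag_last and flag_none are illustrative.

```python
import pandas as pd

# Hypothetical data: the value "a" appears at positions 0, 2 and 3.
df = pd.DataFrame({"key": ["a", "b", "a", "a"]})

# True marks a row that would be dropped under the given keep setting.
flag_first = df.duplicated(keep="first").tolist()
flag_last = df.duplicated(keep="last").tolist()
flag_none = df.duplicated(keep=False).tolist()

print(flag_first)  # [False, False, True, True]
print(flag_last)   # [True, False, True, False]
print(flag_none)   # [True, False, True, True]
```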

inplace parameter

inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

inplace can be True or False; the default is False.
True removes the duplicates in place: the original DataFrame is modified and the call returns None.
False returns a new DataFrame and leaves the original variable unchanged.
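
The difference between the two settings can be sketched as follows. The frame and the names df, deduped, rows_before and result are made up for illustration.

```python
import pandas as pd

# Hypothetical data with one duplicated row.
df = pd.DataFrame({"x": [1, 1, 2]})

# inplace=False (the default): a new DataFrame comes back, df is untouched.
deduped = df.drop_duplicates()
rows_before = len(df)

# inplace=True: df itself is modified and the call returns None.
result = df.drop_duplicates(inplace=True)

print(len(deduped), rows_before)  # 2 3
print(result, len(df))            # None 2
```

A common mistake is writing df = df.drop_duplicates(inplace=True), which replaces df with None.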

Example

import pandas as pd
data = pd.DataFrame([[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]], columns=['id', 'name', 'age'])

The data is

   id  name  age
0   1  Wang   20
1   2    Li   20
2   1  Wang   21
3   1  Wang   20

Rows 0 and 3 are clearly duplicates of each other. Calling drop_duplicates() with its default arguments removes the duplicate:

print(data.drop_duplicates())

The result is

   id  name  age
0   1  Wang   20
1   2    Li   20
2   1  Wang   21

By default, row 0 is kept and row 3 is removed. Setting keep to 'last' keeps the last occurrence instead:

print(data.drop_duplicates(keep='last'))

The result is

   id  name  age
1   2    Li   20
2   1  Wang   21
3   1  Wang   20

Starting again from the original dataset:

   id  name  age
0   1  Wang   20
1   2    Li   20
2   1  Wang   21
3   1  Wang   20

To treat rows with the same id and name as duplicates, regardless of age, pass those columns as subset:

print(data.drop_duplicates(['id', 'name']))

This gives:

   id  name  age
0   1  Wang   20
1   2    Li   20

To drop every row that has a duplicate, keeping none of them, use keep=False:

print(data.drop_duplicates(['id', 'name'], keep=False))

This gives:

   id  name  age
1   2    Li   20
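
subset and keep are sometimes combined with a sort to choose which row of each group survives. The sketch below assumes that the row to keep for each (id, name) pair is the one with the highest age; that rule, and the names by_age and oldest, are assumptions made for illustration rather than something the example above requires.

```python
import pandas as pd

data = pd.DataFrame(
    [[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]],
    columns=['id', 'name', 'age'],
)

# Sort so that, within each (id, name) pair, the highest age comes last.
by_age = data.sort_values('age', kind='stable')

# keep='last' then retains the highest-age row of each pair.
oldest = by_age.drop_duplicates(subset=['id', 'name'], keep='last')

print(oldest.index.tolist())   # [1, 2]
print(oldest['age'].tolist())  # [20, 21]
```

Sorting in the opposite direction and using keep='first' would give the same rows; which form reads better is a matter of taste.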
