
Python data processing: DataFrame sorting and ranking


The pandas DataFrame greatly simplifies many tedious operations in data analysis. It is a tabular data structure in which each column represents a variable and each row is a record. In short, a DataFrame is a collection of Series that share the same index.
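As a small illustration of a DataFrame being a set of Series that share one index, the sketch below builds a frame from two Series. The column names `height` and `weight` and all values are made up for this example.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative data only).
height = pd.Series([170, 165, 180], index=["a", "b", "c"])
weight = pd.Series([65, 55, 80], index=["a", "b", "c"])

# Combining them yields a DataFrame: one column per Series, one row per label.
df = pd.DataFrame({"height": height, "weight": weight})

# Selecting a column gives back a Series that carries the shared index.
col = df["height"]
print(type(col).__name__)   # Series
print(list(col.index))      # ['a', 'b', 'c']
```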

Sorting operations on a DataFrame fall into three categories:

  1. Sorting by index: sort_index();
  2. Sorting by value: sort_values();
  3. Ranking values: rank().

(1) Sorting by index


Index sorting orders the row index or the column index in ascending or descending order, using df.sort_index(axis=, ascending=, inplace=). These three parameters deserve particular attention. axis selects whether the row index (axis=0, the default) or the column index (axis=1) is sorted; ascending selects ascending order (True, the default) or descending order (False); inplace controls whether df itself is modified (True) or a sorted copy is returned (False, the default).


import numpy as np
import pandas as pd

dates = ['2022-01-01','2022-09-02','2022-01-03','2022-01-04','2022-01-05','2022-01-06']
dates = pd.to_datetime(dates)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
# By default, rows are sorted in ascending order of the row index
df.sort_index()
Out[48]:
                   A         B         C         D
2022-01-01  0.096360  0.390240 -1.272864 -0.248937
2022-01-03  2.085985 -1.026902  0.089471  0.049253
2022-01-04  0.459439  1.356780 -0.327171  0.735977
2022-01-05  0.625936 -1.434436  0.289198 -1.308614
2022-01-06  0.306561 -0.718824 -1.639355 -0.287135
2022-09-02  0.043364  2.206094  0.853971  2.067719
# Sort the columns by column label in descending order; the row order is left unchanged
df.sort_index(axis=1,ascending=False)
Out[49]:
                   D         C         B         A
2022-01-01 -0.248937 -1.272864  0.390240  0.096360
2022-09-02  2.067719  0.853971  2.206094  0.043364
2022-01-03  0.049253  0.089471 -1.026902  2.085985
2022-01-04  0.735977 -0.327171  1.356780  0.459439
2022-01-05 -1.308614  0.289198 -1.434436  0.625936
2022-01-06 -0.287135 -1.639355 -0.718824  0.306561
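
The inplace parameter is not exercised in the example above. The following is a minimal sketch on a small made-up frame (the labels and values are assumptions for illustration). It shows that sort_index returns a sorted copy by default and modifies df only when inplace=True.

```python
import pandas as pd

# Small illustrative frame with unsorted row labels and unsorted column labels.
df = pd.DataFrame(
    {"B": [1, 2, 3], "A": [4, 5, 6]},
    index=["c", "a", "b"],
)

# Default: a sorted copy is returned and df itself is unchanged.
by_row = df.sort_index()
print(list(by_row.index))    # ['a', 'b', 'c']
print(list(df.index))        # ['c', 'a', 'b']

# axis=1 sorts the column labels instead of the row labels.
by_col = df.sort_index(axis=1)
print(list(by_col.columns))  # ['A', 'B']

# inplace=True modifies df and returns None.
result = df.sort_index(ascending=False, inplace=True)
print(result)                # None
print(list(df.index))        # ['c', 'b', 'a']
```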

(2) Sorting by value


Sorting by value uses df.sort_values(by=, axis=, ascending=, inplace=, na_position=).
Note: with axis=0 (the default), by names one or more columns and the rows are reordered according to the values in those columns; with axis=1, by names one or more row labels and the columns are reordered according to the values in those rows. ascending=True (the default) sorts in ascending order and ascending=False in descending order. na_position sets where missing values are placed: 'first' puts them at the beginning and 'last' (the default) puts them at the end.

df = pd.DataFrame(np.random.randn(6,4),index=['a','b','c','d','e','f'],columns=list('ABCD'))
# Sort the rows in descending order of column A; whole rows move together, and the other columns are not used as sort keys
df.sort_values(by='A',axis=0,ascending=False,inplace=True)
# Sort the columns in descending order of the values in row b; the other rows are not used as sort keys
df.sort_values(by='b',axis=1,ascending=False,inplace=True)
# Sort by the values of multiple columns in descending order (A first, then B to break ties)
df.sort_values(by=['A','B'],axis=0,ascending=False,inplace=True)
# Add a row g with only column A set; its other columns are NaN
df.loc['g','A']=1
# Sort by column B in descending order, placing the row with the missing value first
df.sort_values(by='B',axis=0,ascending=False,inplace=True,na_position="first")
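
Because the frame above is filled with random numbers, the effect of the parameters is hard to verify by eye. The sketch below repeats the same ideas on a small fixed frame whose values are made up for illustration. It sorts on two keys with a different direction for each, then uses na_position to move a missing value to the top.

```python
import numpy as np
import pandas as pd

# Illustrative data: column A contains a tie and a NaN, column C contains a NaN.
df = pd.DataFrame(
    {
        "A": [2.0, 1.0, 2.0, np.nan],
        "B": [1.0, 5.0, 3.0, 4.0],
        "C": [3.0, np.nan, 1.0, 2.0],
    },
    index=["a", "b", "c", "d"],
)

# Two sort keys: A descending, ties broken by B ascending.
# Rows with NaN in the sort key go last by default.
multi = df.sort_values(by=["A", "B"], ascending=[False, True])
print(list(multi.index))     # ['a', 'c', 'b', 'd']

# na_position="first" moves the row whose sort key is missing to the top.
na_first = df.sort_values(by="C", na_position="first")
print(list(na_first.index))  # ['b', 'c', 'd', 'a']
```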

(3) Ranking values

Ranking uses rank(axis=, method=, ascending=, na_option=).
Note: axis=0 (the default) ranks the values within each column, and axis=1 ranks the values within each row. ascending=True (the default) gives rank 1 to the smallest value, and ascending=False gives rank 1 to the largest. method controls how tied values are ranked. method='average' (the default) gives each tied value the average of the ranks the group occupies; method='min' gives each the lowest rank in the group; method='max' gives each the highest rank in the group; method='first' ranks tied values in the order they appear in the data; method='dense' works like 'min', except that the rank increases by exactly 1 from one group to the next, so no ranks are skipped.

df['A']
Out[74]:
a 0.0
b 1.0
c 1.0
d 1.0
e 1.0
f 1.0
g 3.0
h 3.0
df['A'].rank()
Out[75]:
a 1.0
b 4.0
c 4.0
d 4.0
e 4.0
f 4.0
g 7.5
h 7.5
df['A'].rank(method='first')
Out[76]:
a 1.0
b 2.0
c 3.0
d 4.0
e 5.0
f 6.0
g 7.0
h 8.0
df['A'].rank(method='average')
Out[77]:
a 1.0
b 4.0
c 4.0
d 4.0
e 4.0
f 4.0
g 7.5
h 7.5
df['A'].rank(method='min')
Out[78]:
a 1.0
b 2.0
c 2.0
d 2.0
e 2.0
f 2.0
g 7.0
h 7.0
df['A'].rank(method='max')
Out[79]:
a 1.0
b 6.0
c 6.0
d 6.0
e 6.0
f 6.0
g 8.0
h 8.0
df['A'].rank(method='dense')
Out[80]:
a 1.0
b 2.0
c 2.0
d 2.0
e 2.0
f 2.0
g 3.0
h 3.0
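
The outputs above can be compared side by side with a shorter Series. This is a minimal sketch: the four values, the index labels and the variable names are assumptions made for illustration. It also shows ascending=False and the na_option parameter, which the examples above do not cover.

```python
import numpy as np
import pandas as pd

# Illustrative data with one tie (the two 1.0 values).
s = pd.Series([0.0, 1.0, 1.0, 3.0], index=list("abcd"))

# One column per tie-breaking method.
methods = ["average", "min", "max", "first", "dense"]
table = pd.DataFrame({m: s.rank(method=m) for m in methods})
print(table)

# ascending=False gives rank 1 to the largest value.
desc = s.rank(ascending=False, method="min")
print(list(desc))    # [4.0, 2.0, 2.0, 1.0]

# Missing values keep a NaN rank by default (na_option='keep');
# na_option='bottom' assigns them the highest rank instead.
with_nan = pd.Series([2.0, np.nan, 1.0])
kept = with_nan.rank()
bottom = with_nan.rank(na_option="bottom")
print(list(bottom))  # [2.0, 3.0, 1.0]
```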
