
Pandas Knowledge Points: The Aggregation Function agg() in Detail

Editor: Python


pandas provides several aggregation functions that quickly and concisely combine the results of multiple functions into one result.
This article covers DataFrame.aggregate(), also available under the alias DataFrame.agg(). aggregate() and agg() are the same function; only the names differ.
agg() parameters and usage
agg(self, func=None, axis=0, *args, **kwargs):

  • func: the function used to aggregate the data, such as max(), mean(), or count(). It must work either when passed a DataFrame or when passed to DataFrame.apply().
    func accepts a function, a function name as a string, a list of functions, or a dictionary mapping row/column labels to functions.
  • axis: whether to aggregate by column or by row. 0 or 'index' applies the aggregation function to each column; 1 or 'columns' applies it to each row.
  • *args: positional arguments passed to func.
  • **kwargs: keyword arguments passed to func.
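As a minimal sketch of the axis and **kwargs parameters, the snippet below rebuilds the sample DataFrame used throughout this article and aggregates it in both directions. The helper scaled_sum and its factor parameter are hypothetical names introduced only for illustration; they are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# axis=0 (the default): one result per column
col_sums = df.agg('sum')
# axis=1: one result per row
row_sums = df.agg('sum', axis=1)


def scaled_sum(s, factor=1):
    """Hypothetical helper: sum of a Series multiplied by `factor`."""
    return s.sum() * factor


# factor=2 is forwarded to scaled_sum as a keyword argument
doubled = df.agg(scaled_sum, factor=2)

print(col_sums.tolist())
print(row_sums.tolist())
print(doubled.tolist())
```

Keyword arguments that agg() does not recognise itself are handed on to func, which is why factor reaches scaled_sum unchanged.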

agg() returns one of three types: a scalar, a Series, or a DataFrame.

  • scalar: returned when Series.agg() aggregates with a single function.
  • Series: returned when DataFrame.agg() aggregates with a single function, or when Series.agg() aggregates with multiple functions.
  • DataFrame: returned when DataFrame.agg() aggregates with multiple functions.
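The three return types can be checked directly. This is a small sketch on the same sample data as the rest of the article; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])
s = df['Col-1']

scalar_result = s.agg('sum')               # Series + one function
series_from_df = df.agg('sum')             # DataFrame + one function
series_from_s = s.agg(['sum', 'max'])      # Series + several functions
frame_result = df.agg(['sum', 'max'])      # DataFrame + several functions

for name, value in [('Series, one func', scalar_result),
                    ('DataFrame, one func', series_from_df),
                    ('Series, many funcs', series_from_s),
                    ('DataFrame, many funcs', frame_result)]:
    print(name, '->', type(value).__name__)
```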

Passing a single function

# coding=utf-8
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
        'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])
print(df)
   Col-1  Col-2  Col-3  Col-4
A      1      2      9      3
B      3      4      8      6
C      5      6      7      9
res1 = df.agg(np.mean)
print('-' * 30, '\n', res1, sep='')
res2 = df.mean()  # call the DataFrame's own mean() method
print('-' * 30, '\n', res2, sep='')
res3 = df['Col-1'].agg(np.mean)
print('-' * 30, '\n', res3, sep='')
------------------------------
Col-1    3.0
Col-2    4.0
Col-3    8.0
Col-4    6.0
dtype: float64
------------------------------
Col-1    3.0
Col-2    4.0
Col-3    8.0
Col-4    6.0
dtype: float64
------------------------------
3.0

When a single function is applied to a DataFrame, agg() gives the same result as apply(), and calling the DataFrame's own method (here df.mean()) has the same effect.
For details on apply(), see: Pandas Knowledge Points: The Row/Column Batch-Processing Function apply in Detail
When a single function is passed to agg() on a Series object, the aggregated result is a scalar, that is, a single value.
Multiple ways to pass func
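The equivalence described above can be verified with a short sketch. It passes the function name as the string 'mean' instead of np.mean, which, as far as I know, also avoids a FutureWarning that recent pandas 2.x releases emit for NumPy callables.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

via_agg = df.agg('mean')
via_apply = df.apply(lambda s: s.mean())
via_method = df.mean()

# Both comparisons raise AssertionError if the results differ
pd.testing.assert_series_equal(via_agg, via_apply)
pd.testing.assert_series_equal(via_agg, via_method)

# A single function on a Series yields one value
col_mean = df['Col-1'].agg('mean')
print(col_mean)
```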

# Pass the functions as a list
res4 = df.agg([np.mean, np.max, np.sum])
print('-' * 30, '\n', res4, sep='')
# Pass the functions as a dictionary keyed by column
res5 = df.agg({
    'Col-1': [sum, max], 'Col-2': [sum, min], 'Col-3': [max, min]})
print('-' * 30, '\n', res5, sep='')
# Pass the function names as strings
res6 = df.agg({
    'Col-1': ['sum', 'max'], 'Col-2': ['sum', 'min'], 'Col-3': ['max', 'min']})
print('-' * 30, '\n', res6, sep='')
------------------------------
      Col-1  Col-2  Col-3  Col-4
mean    3.0    4.0    8.0    6.0
amax    5.0    6.0    9.0    9.0
sum     9.0   12.0   24.0   18.0
------------------------------
     Col-1  Col-2  Col-3
sum    9.0   12.0    NaN
max    5.0    NaN    9.0
min    NaN    2.0    7.0
------------------------------
     Col-1  Col-2  Col-3
sum    9.0   12.0    NaN
max    5.0    NaN    9.0
min    NaN    2.0    7.0

In agg(), multiple functions can be passed as a list. The results of applying each function to each column are combined into one DataFrame, whose index holds the corresponding function names. (np.max appears as amax here because that was the NumPy function's name in the versions used; other versions may show max.)
A dictionary can also assign aggregation functions per column/row. The results for the specified columns/rows are combined into one DataFrame, and positions where a column/row has no corresponding function are filled with NaN.
In the cases above, each function can be replaced with its name as a string, and the result is the same.
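A small sketch of the dictionary form, on the same sample data. It also shows a variation not used above: when every dictionary value is a single function name instead of a list, the result is a Series with one entry per column.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# One function per column: the result is a Series
per_col = df.agg({'Col-1': 'sum', 'Col-2': 'max'})

# Lists of functions per column: the result is a DataFrame,
# with NaN wherever a function was not requested for a column
mixed = df.agg({'Col-1': ['sum', 'max'], 'Col-2': ['sum', 'min']})

print(per_col)
print(mixed)
```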

# Pass (column/row label, function) tuples as keyword arguments
res7 = df.agg(X=('Col-1', 'sum'), Y=('Col-2', 'max'), Z=('Col-3', 'min'),)
print('-' * 30, '\n', res7, sep='')
res8 = df.agg(X=('Col-1', 'sum'), Y=('Col-2', 'max'), Zmin=('Col-3', 'min'), Zmax=('Col-3', 'max'))
print('-' * 30, '\n', res8, sep='')
------------------------------
   Col-1  Col-2  Col-3
X    9.0    NaN    NaN
Y    NaN    6.0    NaN
Z    NaN    NaN    7.0
------------------------------
      Col-1  Col-2  Col-3
X       9.0    NaN    NaN
Y       NaN    6.0    NaN
Zmin    NaN    NaN    7.0
Zmax    NaN    NaN    9.0

agg() also accepts keyword arguments in which a tuple of a column/row label and a function is assigned to a custom index name. The index of the resulting DataFrame uses these custom names.
When passing functions this way, each tuple must contain exactly two elements: a column/row label and one function. Several functions cannot be given in one tuple; to apply multiple functions to the same column/row, assign one tuple per function.
Passing custom and anonymous functions
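To illustrate the one-tuple-per-function rule, the sketch below applies two functions to the same column and reads the results back by their custom labels. The labels c1_min and c1_max are arbitrary names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# Two functions on the same column need two separate tuples
res = df.agg(c1_min=('Col-1', 'min'), c1_max=('Col-1', 'max'))

# The custom labels become the index of the result
spread = res.loc['c1_max', 'Col-1'] - res.loc['c1_min', 'Col-1']
print(res)
print(spread)
```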

def fake_mean(s):
    # midpoint of the max and min, not a true mean
    return (s.max()+s.min())/2
res9 = df.agg([fake_mean, lambda x: x.mean()])
print('-' * 40, '\n', res9, sep='')
res10 = df.agg([fake_mean, lambda x: x.max(), lambda x: x.min()])
print('-' * 40, '\n', res10, sep='')
----------------------------------------
           Col-1  Col-2  Col-3  Col-4
fake_mean    3.0    4.0    8.0    6.0
<lambda>     3.0    4.0    8.0    6.0
----------------------------------------
           Col-1  Col-2  Col-3  Col-4
fake_mean    3.0    4.0    8.0    6.0
<lambda>     5.0    6.0    9.0    9.0
<lambda>     1.0    2.0    7.0    3.0

When custom and anonymous functions are passed, the index of the aggregated result also shows the function names. Anonymous functions appear as <lambda>, and several anonymous functions all appear as <lambda>.
Note that only anonymous functions may be repeated in the list. Passing the same custom or built-in function more than once raises SpecificationError: Function names must be unique if there is no new column names assigned.
Reproducing the effect of describe() with agg()
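The uniqueness rule can be observed with a short sketch. It assumes pandas 1.5 or newer, where SpecificationError is importable from pandas.errors; the flag duplicate_rejected is an illustrative name.

```python
import pandas as pd
from pandas.errors import SpecificationError  # assumes pandas >= 1.5

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# The same function listed twice is rejected
try:
    df.agg(['min', 'min'])
    duplicate_rejected = False
except SpecificationError as exc:
    duplicate_rejected = True
    print('rejected:', exc)

# Two distinct lambdas are accepted, although both are labelled <lambda>
two_lambdas = df.agg([lambda x: x.max(), lambda x: x.min()])
print(two_lambdas)
```

Because both rows carry the same label, selecting by position (for example with iloc) is the safer way to read them back.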

print(df.describe())
       Col-1  Col-2  Col-3  Col-4
count    3.0    3.0    3.0    3.0
mean     3.0    4.0    8.0    6.0
std      2.0    2.0    1.0    3.0
min      1.0    2.0    7.0    3.0
25%      2.0    3.0    7.5    4.5
50%      3.0    4.0    8.0    6.0
75%      4.0    5.0    8.5    7.5
max      5.0    6.0    9.0    9.0

describe() reports the count, mean, standard deviation, minimum, first quartile (25%), median, third quartile (75%), and maximum.

from functools import partial
# 20th percentile
per_20 = partial(pd.Series.quantile, q=0.2)
per_20.__name__ = '20%'
# 80th percentile
per_80 = partial(pd.Series.quantile, q=0.8)
per_80.__name__ = '80%'
res11 = df.agg([np.min, per_20, np.median, per_80, np.max])
print('-' * 40, '\n', res11, sep='')
----------------------------------------
        Col-1  Col-2  Col-3  Col-4
amin      1.0    2.0    7.0    3.0
20%       1.8    2.8    7.4    4.2
median    3.0    4.0    8.0    6.0
80%       4.2    5.2    8.6    7.8
amax      5.0    6.0    9.0    9.0

agg() can reproduce the effect of describe(): just pass the functions together to agg(). This lets us add to or trim the contents of describe() as needed.
In the example above, pd.Series.quantile() is the pandas function for computing quantiles. By default it returns the median; setting the q parameter computes other quantiles.
partial() is a function in functools, part of the Python standard library. It fixes argument values of the function passed to it; above, it fixes the q parameter of quantile() to 0.2 and 0.8.
Using agg() together with groupby()
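The partial() pattern above can be wrapped in a small factory so that each percentile does not need its own pair of lines. This is only a sketch: percentile is a hypothetical helper name, and the label format is an arbitrary choice.

```python
from functools import partial

import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])


def percentile(q):
    """Hypothetical helper: a quantile function labelled like '20%'."""
    func = partial(pd.Series.quantile, q=q)
    # agg() uses __name__ as the row label in the result
    func.__name__ = f'{q:.0%}'
    return func


summary = df.agg(['min', percentile(0.2), 'median', percentile(0.8), 'max'])
print(summary)
```

Passing 'min', 'median', and 'max' as strings keeps the row labels independent of how NumPy names its functions.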

# Group with groupby() first, then aggregate with agg()
res12 = df.groupby('Col-1').agg([np.min, np.max])
print('-' * 40, '\n', res12, sep='')
# Aggregate only one column after grouping
res13 = df.groupby('Col-1').agg({
    'Col-2': [np.min, np.mean, np.max]})
print('-' * 40, '\n', res13, sep='')
----------------------------------------
      Col-2      Col-3      Col-4
       amin amax  amin amax  amin amax
Col-1
1         2    2     9    9     3    3
3         4    4     8    8     6    6
5         6    6     7    7     9    9
----------------------------------------
      Col-2
       amin mean amax
Col-1
1         2  2.0    2
3         4  4.0    4
5         6  6.0    6

agg() is often used right after the grouping function groupby(): group first, then aggregate. After grouping, you can aggregate all columns or only the columns you need.
For details on groupby(), see: Pandas Knowledge Points: The Grouping Function groupby in Detail
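In the sample data every value of Col-1 is unique, so each group holds a single row and the aggregation has little to do. The sketch below uses a made-up table with repeated keys (the column names team, score, and bonus are assumptions for illustration) to show several rows being combined per group.

```python
import pandas as pd

# Hypothetical data with repeated group keys
scores = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b', 'b'],
    'score': [1, 2, 3, 4, 5],
    'bonus': [10, 20, 30, 40, 50],
})

# Different functions per column; list values produce two-level column labels
grouped = scores.groupby('team').agg({'score': ['min', 'max'], 'bonus': 'sum'})
print(grouped)

# Optionally flatten the two-level labels into single strings
flat = grouped.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat)
```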

res14 = df.groupby('Col-1').agg(
    c2_min=pd.NamedAgg(column='Col-2', aggfunc='min'),
    c3_min=pd.NamedAgg(column='Col-3', aggfunc='min'),
    c2_sum=pd.NamedAgg(column='Col-2', aggfunc='sum'),
    c3_sum=pd.NamedAgg(column='Col-3', aggfunc='sum'),
    c4_sum=pd.NamedAgg(column='Col-4', aggfunc='sum')
)
print('-' * 40, '\n', res14, sep='')
----------------------------------------
       c2_min  c3_min  c2_sum  c3_sum  c4_sum
Col-1
1           2       9       2       9       3
3           4       8       4       8       6
5           6       7       6       7       9

pd.NamedAgg defines an aggregation more precisely. It has two fields, column and aggfunc: column sets the column to aggregate, and aggfunc sets the function used for the aggregation.
With pd.NamedAgg, each combination of column and aggfunc can be given a custom name, which becomes the column name in the aggregated result.
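In a grouped aggregation, a plain (column, function) tuple can be used in place of pd.NamedAgg. The sketch below compares the two spellings on made-up data; the column names team and score and the output names total and best are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with repeated group keys
scores = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b', 'b'],
    'score': [1, 2, 3, 4, 5],
})
groups = scores.groupby('team')

with_namedagg = groups.agg(
    total=pd.NamedAgg(column='score', aggfunc='sum'),
    best=pd.NamedAgg(column='score', aggfunc='max'),
)
with_tuples = groups.agg(total=('score', 'sum'), best=('score', 'max'))

# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(with_namedagg, with_tuples)
print(with_tuples)
```

pd.NamedAgg spells out the field names, which can make long aggregation specifications easier to read, while tuples are shorter to type.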
That covers the usage of the aggregation function agg() in pandas. If this article helped you, feel free to like, comment, or bookmark it, or follow and contact me to discuss further.


