Pandas Notes: A Detailed Guide to the Aggregation Function agg()
Pandas provides a number of aggregation functions. With agg(), you can run one or several of them in a single, concise call, and the results are collected into one object.
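Parameters and usage of agg()
agg(func=None, axis=0, *args, **kwargs)
func: the function(s) used to aggregate the data, such as max(), mean() or count(). Each function must work either when called on a DataFrame or when passed to DataFrame.apply(). func may be a function, a function name given as a string, a list of these, or a dict mapping column labels to them.
axis: whether to aggregate by column or by row. 0 or 'index' applies the function(s) to each column; 1 or 'columns' applies them to each row.
*args: positional arguments passed on to func.
**kwargs: keyword arguments passed on to func.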
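The axis and **kwargs parameters are easy to overlook, so here is a minimal sketch of both, assuming the same four-column df used throughout this article. scaled_sum() and its factor parameter are hypothetical, made up only to show that extra keyword arguments are forwarded to func.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# axis=1 (or 'columns'): aggregate across each row instead of down each column
row_sums = df.agg('sum', axis=1)
print(row_sums)  # values: A=15, B=21, C=27


# Hypothetical function: the extra keyword argument 'factor' is passed in by agg()
def scaled_sum(s, factor):
    return s.sum() * factor


scaled = df.agg(scaled_sum, factor=10)
print(scaled)  # values: Col-1=90, Col-2=120, Col-3=240, Col-4=180
```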
The return value is one of three types: a scalar, a Series, or a DataFrame.
scalar: returned when Series.agg() is called with a single function.
Series: returned when DataFrame.agg() is called with a single function, or when Series.agg() is called with several functions.
DataFrame: returned when DataFrame.agg() is called with several functions.
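The three cases above can be checked directly. A small sketch, again assuming the article's df; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

scalar = df['Col-1'].agg('sum')                  # Series + one function -> scalar
s_from_series = df['Col-1'].agg(['sum', 'max'])  # Series + several functions -> Series
s_from_frame = df.agg('sum')                     # DataFrame + one function -> Series
frame = df.agg(['sum', 'max'])                   # DataFrame + several functions -> DataFrame

for label, obj in [('scalar', scalar), ('s_from_series', s_from_series),
                   ('s_from_frame', s_from_frame), ('frame', frame)]:
    print(label, type(obj).__name__)
```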
Passing a single function
# coding=utf-8
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])
print(df)
   Col-1  Col-2  Col-3  Col-4
A      1      2      9      3
B      3      4      8      6
C      5      6      7      9
res1 = df.agg(np.mean)
print('-' * 30, '\n', res1, sep='')
res2 = df.mean()  # call the DataFrame's own mean() method directly
print('-' * 30, '\n', res2, sep='')
res3 = df['Col-1'].agg(np.mean)
print('-' * 30, '\n', res3, sep='')
------------------------------
Col-1    3.0
Col-2    4.0
Col-3    8.0
Col-4    6.0
dtype: float64
------------------------------
Col-1    3.0
Col-2    4.0
Col-3    8.0
Col-4    6.0
dtype: float64
------------------------------
3.0
Multiple ways of passing func
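Besides NumPy callables such as np.mean, a function can be named by a string, as some examples in this section also do. A quick sketch, assuming the article's df: the string 'mean' gives the same result as calling df.mean() directly. Newer pandas versions may also warn when a NumPy callable is passed and suggest the string name instead, which is one reason to prefer strings.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

by_name = df.agg('mean')          # function named by a string
by_method = df.mean()             # DataFrame method called directly
print(by_name.equals(by_method))  # True

mixed = df.agg(['mean', 'max'])   # list of names -> one result row per name
print(mixed)
```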
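# pass a list of functions: one result row per function
# note: depending on the NumPy version, np.max may be labelled 'amax' (as in the output below) or 'max'
res4 = df.agg([np.mean, np.max, np.sum])
print('-' * 30, '\n', res4, sep='')
# pass a dict: column label -> function(s); columns not listed (Col-4) are left out
res5 = df.agg({'Col-1': [sum, max], 'Col-2': [sum, min], 'Col-3': [max, min]})
print('-' * 30, '\n', res5, sep='')
# same as above, but with the function names given as strings
res6 = df.agg({'Col-1': ['sum', 'max'], 'Col-2': ['sum', 'min'], 'Col-3': ['max', 'min']})
print('-' * 30, '\n', res6, sep='')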
------------------------------
      Col-1  Col-2  Col-3  Col-4
mean    3.0    4.0    8.0    6.0
amax    5.0    6.0    9.0    9.0
sum     9.0   12.0   24.0   18.0
------------------------------
     Col-1  Col-2  Col-3
sum    9.0   12.0    NaN
max    5.0    NaN    9.0
min    NaN    2.0    7.0
------------------------------
     Col-1  Col-2  Col-3
sum    9.0   12.0    NaN
max    5.0    NaN    9.0
min    NaN    2.0    7.0
# pass (column, function) tuples as keyword arguments; each keyword becomes a row label
res7 = df.agg(X=('Col-1', 'sum'), Y=('Col-2', 'max'), Z=('Col-3', 'min'))
print('-' * 30, '\n', res7, sep='')
res8 = df.agg(X=('Col-1', 'sum'), Y=('Col-2', 'max'), Zmin=('Col-3', 'min'), Zmax=('Col-3', 'max'))
print('-' * 30, '\n', res8, sep='')
------------------------------
   Col-1  Col-2  Col-3
X    9.0    NaN    NaN
Y    NaN    6.0    NaN
Z    NaN    NaN    7.0
------------------------------
      Col-1  Col-2  Col-3
X       9.0    NaN    NaN
Y       NaN    6.0    NaN
Zmin    NaN    NaN    7.0
Zmax    NaN    NaN    9.0
Passing custom functions and anonymous (lambda) functions
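agg() labels each result row with the function's __name__, which is why every lambda in this section shows up as <lambda>. One way to get readable labels is to set __name__ yourself, the same trick the describe() section uses with partial. A sketch with a hypothetical helper with_name(), assuming the article's df:

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])


def with_name(func, name):
    """Hypothetical helper: give a function (e.g. a lambda) a readable agg() label."""
    func.__name__ = name
    return func


res = df.agg([
    with_name(lambda s: s.max(), 'col_max'),
    with_name(lambda s: s.max() - s.min(), 'col_range'),
])
print(res)  # rows are labelled col_max and col_range instead of <lambda>
```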
def fake_mean(s):
    # midrange (max + min) / 2; it equals the mean here only because every column is evenly spaced
    return (s.max() + s.min()) / 2
res9 = df.agg([fake_mean, lambda x: x.mean()])
print('-' * 40, '\n', res9, sep='')
res10 = df.agg([fake_mean, lambda x: x.max(), lambda x: x.min()])
print('-' * 40, '\n', res10, sep='')
----------------------------------------
           Col-1  Col-2  Col-3  Col-4
fake_mean    3.0    4.0    8.0    6.0
<lambda>     3.0    4.0    8.0    6.0
----------------------------------------
           Col-1  Col-2  Col-3  Col-4
fake_mean    3.0    4.0    8.0    6.0
<lambda>     5.0    6.0    9.0    9.0
<lambda>     1.0    2.0    7.0    3.0
Reproducing the output of describe() with custom functions
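The approach in this section builds percentile functions with functools.partial and labels them through __name__. A sketch that generalizes the idea with a hypothetical helper make_quantile(), cross-checked against the built-in DataFrame.quantile(), which accepts a list of quantiles (its row labels are 0.2, 0.5, 0.8 rather than percent strings):

```python
from functools import partial

import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])


def make_quantile(q):
    """Hypothetical helper: a quantile function whose agg() label is e.g. '20%'."""
    func = partial(pd.Series.quantile, q=q)
    func.__name__ = f'{q:.0%}'  # 0.2 -> '20%'
    return func


qs = [0.2, 0.5, 0.8]
custom = df.agg([make_quantile(q) for q in qs])
builtin = df.quantile(qs)  # built-in equivalent for comparison
print(custom)
print(builtin)
```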
print(df.describe())
       Col-1  Col-2  Col-3  Col-4
count    3.0    3.0    3.0    3.0
mean     3.0    4.0    8.0    6.0
std      2.0    2.0    1.0    3.0
min      1.0    2.0    7.0    3.0
25%      2.0    3.0    7.5    4.5
50%      3.0    4.0    8.0    6.0
75%      4.0    5.0    8.5    7.5
max      5.0    6.0    9.0    9.0
from functools import partial
# 20th percentile
per_20 = partial(pd.Series.quantile, q=0.2)
per_20.__name__ = '20%'  # agg() uses __name__ as the row label
# 80th percentile
per_80 = partial(pd.Series.quantile, q=0.8)
per_80.__name__ = '80%'
res11 = df.agg([np.min, per_20, np.median, per_80, np.max])
print('-' * 40, '\n', res11, sep='')
----------------------------------------
        Col-1  Col-2  Col-3  Col-4
amin      1.0    2.0    7.0    3.0
20%       1.8    2.8    7.4    4.2
median    3.0    4.0    8.0    6.0
80%       4.2    5.2    8.6    7.8
amax      5.0    6.0    9.0    9.0
Combining agg() with groupby()
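In the examples of this section every value of Col-1 is unique, so each group holds a single row and the min and max of a group are the same number. A sketch on hypothetical data with repeated keys (the team and amount columns are made up for illustration), using groupby().agg() with (column, function) tuples for named aggregation, shows rows actually being combined:

```python
import pandas as pd

# Hypothetical data: the key 'team' repeats, so each group has several rows
sales = pd.DataFrame({'team': ['x', 'x', 'y', 'y', 'y'],
                      'amount': [10, 30, 5, 15, 25]})

stats = sales.groupby('team').agg(
    total=('amount', 'sum'),
    smallest=('amount', 'min'),
    largest=('amount', 'max'),
)
print(stats)  # x: total 40, smallest 10, largest 30; y: total 45, smallest 5, largest 25
```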
# group with groupby() first, then aggregate with agg()
res12 = df.groupby('Col-1').agg([np.min, np.max])
print('-' * 40, '\n', res12, sep='')
# after grouping, aggregate only one column
res13 = df.groupby('Col-1').agg({'Col-2': [np.min, np.mean, np.max]})
print('-' * 40, '\n', res13, sep='')
----------------------------------------
       Col-2        Col-3        Col-4
        amin  amax   amin  amax   amin  amax
Col-1
1          2     2      9     9      3     3
3          4     4      8     8      6     6
5          6     6      7     7      9     9
----------------------------------------
       Col-2
        amin  mean  amax
Col-1
1          2   2.0     2
3          4   4.0     4
5          6   6.0     6
res14 = df.groupby('Col-1').agg(
    c2_min=pd.NamedAgg(column='Col-2', aggfunc='min'),
    c3_min=pd.NamedAgg(column='Col-3', aggfunc='min'),
    c2_sum=pd.NamedAgg(column='Col-2', aggfunc='sum'),
    c3_sum=pd.NamedAgg(column='Col-3', aggfunc='sum'),
    c4_sum=pd.NamedAgg(column='Col-4', aggfunc='sum')
)
print('-' * 40, '\n', res14, sep='')
----------------------------------------
       c2_min  c3_min  c2_sum  c3_sum  c4_sum
Col-1
1           2       9       2       9       3
3           4       8       4       8       6
5           6       7       6       7       9
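pd.NamedAgg is a named tuple of (column, aggfunc), so a plain tuple can be used in its place. A short check, assuming the same df as above and reusing two of the output names from the last example:

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

res_named = df.groupby('Col-1').agg(
    c2_min=pd.NamedAgg(column='Col-2', aggfunc='min'),
    c4_sum=pd.NamedAgg(column='Col-4', aggfunc='sum'),
)
res_tuple = df.groupby('Col-1').agg(
    c2_min=('Col-2', 'min'),
    c4_sum=('Col-4', 'sum'),
)
print(res_named.equals(res_tuple))  # True
```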
