
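Pandas provides several aggregation functions. An aggregation function quickly and concisely combines the results of several functions into a single result.

This article covers DataFrame.aggregate(), whose alias is DataFrame.agg(). aggregate() and agg() are the same function under two names.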
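A minimal check, assuming the same four-column df used throughout this article (redefined here so the snippet runs on its own): calling the two names with the same arguments should give identical results.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# aggregate() and agg() are the same method, so the results are identical
via_agg = df.agg(['sum', 'max'])
via_aggregate = df.aggregate(['sum', 'max'])
print(via_agg.equals(via_aggregate))  # True
```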

Parameters and usage of agg()

agg(self, func=None, axis=0, *args, **kwargs):

func: the function(s) used to aggregate the data, such as max(), mean(), or count(). The function must work either when passed a DataFrame or when passed to DataFrame.apply().

func accepts a function, a function name as a string, a list of functions (or function names), or a dictionary mapping row/column labels to functions.

axis: whether to aggregate by column or by row. 0 or 'index' applies the function to each column; 1 or 'columns' applies it to each row.

*args: positional arguments passed on to func.

**kwargs: keyword arguments passed on to func.
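A small sketch of axis, *args and **kwargs, again redefining the article's df. The helper value_range and its offset parameter are hypothetical, introduced only to show how extra arguments reach func.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# axis=0 (default): one result per column
col_sums = df.agg('sum')                  # Col-1 9, Col-2 12, Col-3 24, Col-4 18
# axis=1 / 'columns': one result per row
row_sums = df.agg('sum', axis='columns')  # A 15, B 21, C 27


def value_range(s, offset=0):
    """Hypothetical helper: max minus min of a Series, plus an optional offset."""
    return s.max() - s.min() + offset


# offset=10 is forwarded to value_range as a keyword argument (**kwargs)
ranges_kw = df.agg(value_range, offset=10)
# the same value forwarded positionally (*args); axis has to be given first
ranges_pos = df.agg(value_range, 0, 10)
print(col_sums, row_sums, ranges_kw, ranges_pos, sep='\n')
```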

The return value is one of three types: a scalar, a Series, or a DataFrame.

scalar: returned when Series.agg() is called with a single function.

Series: returned when DataFrame.agg() is called with a single function, or when Series.agg() is called with multiple functions.

DataFrame: returned when DataFrame.agg() is called with multiple functions.
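The four cases above can be checked directly with type tests. A sketch using the article's df:

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])
s = df['Col-1']

scalar_result = s.agg('sum')                 # Series + one function
series_from_series = s.agg(['sum', 'max'])   # Series + several functions
series_from_frame = df.agg('sum')            # DataFrame + one function
frame_result = df.agg(['sum', 'max'])        # DataFrame + several functions

for name, value in [('scalar_result', scalar_result),
                    ('series_from_series', series_from_series),
                    ('series_from_frame', series_from_frame),
                    ('frame_result', frame_result)]:
    print(f'{name}: {type(value).__name__}')
```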

Passing a single function

# coding=utf-8
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])
print(df)

   Col-1  Col-2  Col-3  Col-4
A      1      2      9      3
B      3      4      8      6
C      5      6      7      9

res1 = df.agg(np.mean)
print('-' * 30, '\n', res1, sep='')
res2 = df.mean()  # call the DataFrame's own mean() method
print('-' * 30, '\n', res2, sep='')
res3 = df['Col-1'].agg(np.mean)
print('-' * 30, '\n', res3, sep='')

------------------------------
Col-1    3.0
Col-2    4.0
Col-3    8.0
Col-4    6.0
dtype: float64
------------------------------
Col-1    3.0
Col-2    4.0
Col-3    8.0
Col-4    6.0
dtype: float64
------------------------------
3.0

When a single function is applied to a DataFrame, agg() returns the same result as apply(). Calling the DataFrame's own method (here df.mean()) gives the same result as well.

For details on apply(), see: Pandas Knowledge Points: A Detailed Guide to the Row/Column Batch Function apply

When a Series is passed a single function in agg(), the result is a scalar, i.e. a single value.
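A sketch of the equivalence described above, using the article's df. It passes the string 'mean' rather than np.mean, since recent pandas releases may emit a FutureWarning when a NumPy function is handed to agg(); the numeric result is the same.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# Three ways to get the per-column mean of df
by_agg = df.agg('mean')                      # function name as a string
by_apply = df.apply(lambda col: col.mean())  # apply() with a callable
by_method = df.mean()                        # the DataFrame's own mean() method

print(by_agg.equals(by_apply) and by_agg.equals(by_method))  # True

# On a single column (a Series), the same call collapses to one value
print(df['Col-1'].agg('mean'))  # 3.0
```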

Different ways to pass func

# Pass the functions as a list
res4 = df.agg([np.mean, np.max, np.sum])
print('-' * 30, '\n', res4, sep='')
# Pass a dictionary mapping columns to functions
res5 = df.agg({'Col-1': [sum, max], 'Col-2': [sum, min], 'Col-3': [max, min]})
print('-' * 30, '\n', res5, sep='')
# Pass the function names as strings
res6 = df.agg({'Col-1': ['sum', 'max'], 'Col-2': ['sum', 'min'], 'Col-3': ['max', 'min']})
print('-' * 30, '\n', res6, sep='')

------------------------------
      Col-1  Col-2  Col-3  Col-4
mean    3.0    4.0    8.0    6.0
amax    5.0    6.0    9.0    9.0
sum     9.0   12.0   24.0   18.0
------------------------------
     Col-1  Col-2  Col-3
sum    9.0   12.0    NaN
max    5.0    NaN    9.0
min    NaN    2.0    7.0
------------------------------
     Col-1  Col-2  Col-3
sum    9.0   12.0    NaN
max    5.0    NaN    9.0
min    NaN    2.0    7.0

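In agg(), multiple functions can be passed as a list. The result of each function on each column is combined into one DataFrame whose index consists of the function names (the label comes from each function's __name__, which is why np.max shows up as amax in this output).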

You can also use a dictionary to assign functions to specific columns (or rows). The results are combined into one DataFrame, and positions where a column/row has no matching function are filled with NaN.

In all of the cases above, the functions can also be passed by name as strings, with the same result.
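A sketch of two details from the paragraphs above, using the article's df: string names become the index labels verbatim, and with axis=1 the dictionary keys refer to row labels (here 'A' and 'B') instead of column labels.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# With string names, the result index uses exactly those names
by_name = df.agg(['mean', 'max', 'sum'])
print(by_name)

# With axis=1, dictionary keys are row labels; the result has one row per
# selected row and NaN wherever a row has no matching function
by_row = df.agg({'A': ['sum', 'max'], 'B': ['min']}, axis=1)
print(by_row)
```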

# Pass (column/row, function) tuples as keyword arguments
res7 = df.agg(X=('Col-1', 'sum'), Y=('Col-2', 'max'), Z=('Col-3', 'min'))
print('-' * 30, '\n', res7, sep='')
res8 = df.agg(X=('Col-1', 'sum'), Y=('Col-2', 'max'), Zmin=('Col-3', 'min'), Zmax=('Col-3', 'max'))
print('-' * 30, '\n', res8, sep='')

------------------------------
   Col-1  Col-2  Col-3
X    9.0    NaN    NaN
Y    NaN    6.0    NaN
Z    NaN    NaN    7.0
------------------------------
      Col-1  Col-2  Col-3
X       9.0    NaN    NaN
Y       NaN    6.0    NaN
Zmin    NaN    NaN    7.0
Zmax    NaN    NaN    9.0

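agg() also accepts keyword arguments whose values are (column/row, function) tuples. Each keyword becomes a custom index label, so the rows of the resulting DataFrame are named after the keywords.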

With this form, each tuple holds exactly two elements: a column/row label and one function. You cannot pass several functions in one tuple; to apply several functions to the same column/row, use several keywords, one tuple each.
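To apply two functions to the same column with keyword tuples, one keyword per function is needed. This sketch, using the article's df, compares that with the dictionary form, which gives the same numbers but labels the rows with the function names:

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# One (column, function) pair per keyword: two keywords for two functions
named = df.agg(c1_sum=('Col-1', 'sum'), c1_max=('Col-1', 'max'))
print(named)

# Equivalent dictionary form; here the row labels are the function names
plain = df.agg({'Col-1': ['sum', 'max']})
print(plain)
```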

Passing custom functions and anonymous functions

def fake_mean(s):
    return (s.max()+s.min())/2


res9 = df.agg([fake_mean, lambda x: x.mean()])
print('-' * 40, '\n', res9, sep='')
res10 = df.agg([fake_mean, lambda x: x.max(), lambda x: x.min()])
print('-' * 40, '\n', res10, sep='')

----------------------------------------
           Col-1  Col-2  Col-3  Col-4
fake_mean    3.0    4.0    8.0    6.0
<lambda>     3.0    4.0    8.0    6.0
----------------------------------------
           Col-1  Col-2  Col-3  Col-4
fake_mean    3.0    4.0    8.0    6.0
<lambda>     5.0    6.0    9.0    9.0
<lambda>     1.0    2.0    7.0    3.0

With custom and anonymous functions, the index labels in the result are again the function names. An anonymous function is labeled <lambda>, and when there are several anonymous functions, each of them is labeled <lambda>.

Note that only anonymous functions may be repeated. Custom functions, built-in functions, and the like must not be repeated, otherwise agg() raises SpecificationError: Function names must be unique if there is no new column names assigned.
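Because the labels come from each function's __name__, one way to avoid a row of identical <lambda> labels is to set __name__ yourself, the same trick the describe() example in this article uses for partial objects. The helper labelled is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])


def labelled(func, name):
    """Hypothetical helper: give a function a custom __name__ for agg() labels."""
    func.__name__ = name
    return func


res = df.agg([
    labelled(lambda s: s.max() - s.min(), 'range'),
    labelled(lambda s: s.mean(), 'avg'),
])
print(res)
```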

Reproducing describe() with agg()

print(df.describe())

       Col-1  Col-2  Col-3  Col-4
count    3.0    3.0    3.0    3.0
mean     3.0    4.0    8.0    6.0
std      2.0    2.0    1.0    3.0
min      1.0    2.0    7.0    3.0
25%      2.0    3.0    7.5    4.5
50%      3.0    4.0    8.0    6.0
75%      4.0    5.0    8.5    7.5
max      5.0    6.0    9.0    9.0

describe() reports the count, mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum.

from functools import partial

# 20th percentile
per_20 = partial(pd.Series.quantile, q=0.2)
per_20.__name__ = '20%'
# 80th percentile
per_80 = partial(pd.Series.quantile, q=0.8)
per_80.__name__ = '80%'
res11 = df.agg([np.min, per_20, np.median, per_80, np.max])
print('-' * 40, '\n', res11, sep='')

        Col-1  Col-2  Col-3  Col-4
amin      1.0    2.0    7.0    3.0
20%       1.8    2.8    7.4    4.2
median    3.0    4.0    8.0    6.0
80%       4.2    5.2    8.6    7.8
amax      5.0    6.0    9.0    9.0

Combining functions and passing them to agg() reproduces the effect of describe(), so you can add to or trim the contents of describe() as needed.

In the example above, pd.Series.quantile() is the pandas function for computing quantiles. By default it computes the median (q=0.5); set the q parameter to compute other quantiles.

partial() comes from functools in the Python standard library. It fixes the values of some arguments of the function passed to it; above, it fixes quantile()'s q parameter to 0.2 and 0.8 respectively.
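Building on the partial() approach, a sketch of a trimmed-down describe() on the article's df. The helper quantile_func is hypothetical; it wraps the partial and __name__ steps so each quantile gets a label like '20%'. As a cross-check, describe() itself accepts a percentiles argument, so the two 20%/80% rows should agree:

```python
from functools import partial

import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])


def quantile_func(q):
    """Hypothetical helper: Series.quantile with q fixed and a label such as '20%'."""
    func = partial(pd.Series.quantile, q=q)
    func.__name__ = f'{q:.0%}'
    return func


# A trimmed-down describe(): count, mean and the 20th/80th percentiles only
custom = df.agg(['count', 'mean', quantile_func(0.2), quantile_func(0.8)])
print(custom)

# describe() with custom percentiles (it also adds the 50% row)
reference = df.describe(percentiles=[0.2, 0.8])
print(reference.loc[['count', 'mean', '20%', '80%']])
```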

Combining with groupby

# Group with groupby() first, then aggregate with agg()
res12 = df.groupby('Col-1').agg([np.min, np.max])
print('-' * 40, '\n', res12, sep='')
# After grouping, aggregate only one column
res13 = df.groupby('Col-1').agg({'Col-2': [np.min, np.mean, np.max]})
print('-' * 40, '\n', res13, sep='')

----------------------------------------
      Col-2      Col-3      Col-4     
       amin amax  amin amax  amin amax
Col-1                                 
1         2    2     9    9     3    3
3         4    4     8    8     6    6
5         6    6     7    7     9    9
----------------------------------------
      Col-2          
       amin mean amax
Col-1                
1         2  2.0    2
3         4  4.0    4
5         6  6.0    6

agg() is often used right after groupby(): group first, then aggregate. After grouping, you can aggregate every column, or only the columns you need.

For details on groupby(), see: Pandas Knowledge Points: A Detailed Guide to the Grouping Function groupby
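Because every value of Col-1 in the sample df is unique, each group above holds a single row, which makes min and max identical. A sketch with hypothetical data (the scores frame and its team/score/minutes columns are made up for illustration) where groups hold several rows and only the listed columns are aggregated:

```python
import pandas as pd

# Hypothetical data: 'team' has repeated values, so groups have several rows
scores = pd.DataFrame({
    'team': ['x', 'x', 'y'],
    'score': [1, 3, 5],
    'minutes': [2, 4, 6],
})

# Different functions per column; columns not listed would be left out
summary = scores.groupby('team').agg({'score': ['min', 'max'], 'minutes': ['sum']})
print(summary)
```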

res14 = df.groupby('Col-1').agg(
    c2_min=pd.NamedAgg(column='Col-2', aggfunc='min'),
    c3_min=pd.NamedAgg(column='Col-3', aggfunc='min'),
    c2_sum=pd.NamedAgg(column='Col-2', aggfunc='sum'),
    c3_sum=pd.NamedAgg(column='Col-3', aggfunc='sum'),
    c4_sum=pd.NamedAgg(column='Col-4', aggfunc='sum')
)
print('-' * 40, '\n', res14, sep='')

----------------------------------------
       c2_min  c3_min  c2_sum  c3_sum  c4_sum
Col-1                                        
1           2       9       2       9       3
3           4       8       4       8       6
5           6       7       6       7       9

pd.NamedAgg defines an aggregation more precisely. It has two fields: column, the column to aggregate, and aggfunc, the function used to aggregate it.

With pd.NamedAgg, each column/aggfunc combination gets a custom name, which becomes the column name in the aggregated result.
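pd.NamedAgg is a named tuple with the fields column and aggfunc, so plain (column, aggfunc) tuples can be used in its place. A sketch comparing both forms on the article's df:

```python
import pandas as pd

df = pd.DataFrame(
    {'Col-1': [1, 3, 5], 'Col-2': [2, 4, 6],
     'Col-3': [9, 8, 7], 'Col-4': [3, 6, 9]},
    index=['A', 'B', 'C'])

# Explicit pd.NamedAgg objects
with_namedagg = df.groupby('Col-1').agg(
    c2_min=pd.NamedAgg(column='Col-2', aggfunc='min'),
    c4_sum=pd.NamedAgg(column='Col-4', aggfunc='sum'),
)
# Plain (column, aggfunc) tuples
with_tuples = df.groupby('Col-1').agg(
    c2_min=('Col-2', 'min'),
    c4_sum=('Col-4', 'sum'),
)
print(with_namedagg.equals(with_tuples))  # True
```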

That covers the usage of agg() in pandas. If you found this article helpful, feel free to like and bookmark it, or follow me to discuss.


Pandas Knowledge Points: A Detailed Guide to the Aggregation Function agg