[Python] 20 Practical Pandas Examples, Packed with Useful Tips
This article covers data filtering in pandas. The author has written a similar article before, but that one dealt with filtering text data; feel free to look it up if you are interested.
import pandas as pd
df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "profession": ["Electrical engineer", "Mechanical engineer",
                   "Data scientist", "Accountant", "Athlete"],
    "date_of_birth": ["1998-11-01", "2002-08-14", "1996-01-12",
                      "2002-10-24", "2004-04-05"],
    "group": ["A", "B", "B", "A", "C"]
})
df
output
    name  note           profession date_of_birth group
0   John    92  Electrical engineer    1998-11-01     A
1   Jane    94  Mechanical engineer    2002-08-14     B
2  Emily    87       Data scientist    1996-01-12     B
3   Lisa    82           Accountant    2002-10-24     A
4   Matt    90              Athlete    2004-04-05     C
Selecting several columns from the table
The code is as follows:
df[["name", "note"]]
output
    name  note
0   John    92
1   Jane    94
2  Emily    87
3   Lisa    82
4   Matt    90
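Then selecting several rows
Building on the column selection above, we also select several rows. Note that loc slices by label and includes the end label, so :3 returns the rows labeled 0 through 3. The code is as follows:
df.loc[:3, ["name", "note"]]
output
    name  note
0   John    92
1   Jane    94
2  Emily    87
3   Lisa    82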
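Filtering data by integer position
Here we use the iloc indexer, which selects by integer position and excludes the end of a slice. The code below takes the first three rows of the third column:
df.iloc[:3, 2]
output
0    Electrical engineer
1    Mechanical engineer
2         Data scientist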
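The difference between the two indexers is easy to miss, so here is a minimal sketch that puts them side by side. It rebuilds only the name and note columns from the table above; the variable names by_label and by_position are illustrative, not from the original article.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# loc slices by label and includes the end label: rows 0, 1, 2 and 3.
by_label = df.loc[:3, ["name", "note"]]

# iloc slices by integer position and excludes the end: rows 0, 1 and 2.
by_position = df.iloc[:3, [0, 1]]

print(len(by_label), len(by_position))
```

With the default integer index the labels and positions happen to coincide, which is why the same slice :3 yields a different number of rows depending on the indexer.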
Filtering data with comparison operators
df[df.note > 90]
output
   name  note           profession date_of_birth group
0  John    92  Electrical engineer    1998-11-01     A
1  Jane    94  Mechanical engineer    2002-08-14     B
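It may help to see what the comparison produces before it is used as a filter. The sketch below, which rebuilds just two columns of the example table, stores the comparison in a variable (the names mask and high are assumptions made for illustration) and then indexes with it.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# The comparison yields a boolean Series aligned with the rows of df.
mask = df["note"] > 90
print(mask.tolist())

# Indexing with the boolean Series keeps only the rows marked True.
high = df[mask]
print(high)
```

Here df["note"] is used instead of df.note; both refer to the same column, but bracket access also works for column names that are not valid Python identifiers.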
The dt accessor
The dt accessor is used to work with datetime data. We first need to convert the strings (or data of another type) to a datetime type, and then process them. The code is as follows:
df.date_of_birth = df.date_of_birth.astype("datetime64[ns]")
df[df.date_of_birth.dt.month == 11]
output
   name  note           profession date_of_birth group
0  John    92  Electrical engineer    1998-11-01     A
Alternatively, we can filter by year:
df[df.date_of_birth.dt.year > 2000]
output
   name  note           profession date_of_birth group
1  Jane    94  Mechanical engineer    2002-08-14     B
3  Lisa    82           Accountant    2002-10-24     A
4  Matt    90              Athlete    2004-04-05     C
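As a possible alternative to astype, the conversion can also be done with pd.to_datetime, which accepts an explicit format string. The sketch below is one way to write it, assuming the dates are all in YYYY-MM-DD form as in the example table; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "date_of_birth": ["1998-11-01", "2002-08-14", "1996-01-12",
                      "2002-10-24", "2004-04-05"],
})

# Parse the strings into datetime values; the format is stated explicitly.
df["date_of_birth"] = pd.to_datetime(df["date_of_birth"], format="%Y-%m-%d")

# Once the column is datetime, .dt exposes components such as month and year.
born_in_november = df[df["date_of_birth"].dt.month == 11]
born_after_2000 = df[df["date_of_birth"].dt.year > 2000]

print(born_in_november)
print(born_after_2000)
```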
Filtering data on the intersection of multiple conditions
df[(df.date_of_birth.dt.year > 2000) &
   (df.profession.str.contains("engineer"))]
output
   name  note           profession date_of_birth group
1  Jane    94  Mechanical engineer    2002-08-14     B
Filtering data on the union of multiple conditions
When the conditions should be combined as a union (a row qualifies if it meets any one of them), the code is as follows:
df[(df.note > 90) | (df.profession == "Data scientist")]
output
    name  note           profession date_of_birth group
0   John    92  Electrical engineer    1998-11-01     A
1   Jane    94  Mechanical engineer    2002-08-14     B
2  Emily    87       Data scientist    1996-01-12     B
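The intersection and union filters follow the same pattern, which the sketch below tries to make explicit by naming each condition first. The conditions chosen here (group B, note above 90) and all variable names are assumptions for illustration; the negation with ~ is not in the original article but uses the same mechanism.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "group": ["A", "B", "B", "A", "C"],
})

in_group_b = df["group"] == "B"
high_note = df["note"] > 90

# & keeps rows where both conditions hold (intersection).
both = df[in_group_b & high_note]

# | keeps rows where at least one condition holds (union).
either = df[in_group_b | high_note]

# ~ inverts a condition; here it keeps rows matching neither one.
neither = df[~(in_group_b | high_note)]

print(both["name"].tolist(), either["name"].tolist(), neither["name"].tolist())
```

When the conditions are written inline rather than stored in variables, each comparison needs its own parentheses, because & and | bind more tightly than > and == in Python.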
Filtering data with the query method
The query method in pandas can also filter data; we pass the filter condition in as a string:
df.query("note > 90")
output
   name  note           profession date_of_birth group
0  John    92  Electrical engineer    1998-11-01     A
1  Jane    94  Mechanical engineer    2002-08-14     B
Or, with two conditions:
df.query("group=='A' and note > 89")
output
   name  note           profession date_of_birth group
0  John    92  Electrical engineer    1998-11-01     A
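When the threshold lives in a Python variable rather than in the string itself, query can refer to it with an @ prefix. The following is a minimal sketch of that usage on a cut-down copy of the example table; the variable name threshold is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "group": ["A", "B", "B", "A", "C"],
})

threshold = 89

# @threshold refers to the Python variable defined above,
# while group and note refer to columns of df.
result = df.query("group == 'A' and note > @threshold")

# The same filter written with boolean indexing, for comparison.
same = df[(df["group"] == "A") & (df["note"] > threshold)]

print(result)
```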
Filtering data with the nsmallest and nlargest methods
The nsmallest and nlargest methods in pandas return the rows with the smallest and the largest values in a given column. The code is as follows:
df.nsmallest(2, "note")
output
    name  note      profession date_of_birth group
3   Lisa    82      Accountant    2002-10-24     A
2  Emily    87  Data scientist    1996-01-12     B
df.nlargest(2, "note")
output
   name  note           profession date_of_birth group
1  Jane    94  Mechanical engineer    2002-08-14     B
0  John    92  Electrical engineer    1998-11-01     A
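One way to read these two methods is as a shortcut for sorting and then taking the first rows. The sketch below compares the two approaches on the note column; it assumes there are no ties among the selected values, as is the case in the example data, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# The two rows with the smallest note values.
lowest = df.nsmallest(2, "note")

# Sorting in ascending order and taking the first two rows selects the same rows.
lowest_via_sort = df.sort_values("note").head(2)

# The two rows with the largest note values, and the sort-based equivalent.
highest = df.nlargest(2, "note")
highest_via_sort = df.sort_values("note", ascending=False).head(2)

print(lowest)
print(highest)
```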
The isna() method
The isna() method lets us filter for the rows that contain missing values. First we set a value in the table to null (this needs NumPy for np.nan):
import numpy as np
df.loc[0, "profession"] = np.nan
df[df.profession.isna()]
output
   name  note profession date_of_birth group
0  John    92        NaN    1998-11-01     A
The notna() method
The notna() method does the opposite of isna() above: it lets us filter for the rows whose values are not missing. The code is as follows:
df[df.profession.notna()]
output
    name  note           profession date_of_birth group
1   Jane    94  Mechanical engineer    2002-08-14     B
2  Emily    87       Data scientist    1996-01-12     B
3   Lisa    82           Accountant    2002-10-24     A
4   Matt    90              Athlete    2004-04-05     C
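The two methods split the table into complementary parts, which the following sketch checks on a reduced copy of the example data. The dropna call is not in the original article; it is included as a commonly used shortcut that should give the same rows as the notna() filter. Variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "profession": ["Electrical engineer", "Mechanical engineer",
                   "Data scientist", "Accountant", "Athlete"],
})

# Blank out one value so there is something to find.
df.loc[0, "profession"] = np.nan

missing = df[df["profession"].isna()]    # rows where profession is missing
present = df[df["profession"].notna()]   # rows where profession is filled in

# dropna with subset drops the rows that are missing a value in that column.
dropped = df.dropna(subset=["profession"])

print(len(missing), len(present))
```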
The assign method
The assign method in pandas adds a column to the dataset. It returns a new DataFrame and leaves the original unchanged. Here np refers to NumPy, and because the scores are random, the values will differ on each run:
df_1 = df.assign(score=np.random.randint(0, 100, size=5))
df_1
output
    name  note           profession date_of_birth group  score
0   John    92  Electrical engineer    1998-11-01     A     19
1   Jane    94  Mechanical engineer    2002-08-14     B     84
2  Emily    87       Data scientist    1996-01-12     B     68
3   Lisa    82           Accountant    2002-10-24     A     70
4   Matt    90              Athlete    2004-04-05     C     39
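Two details of assign are worth a short sketch: it does not modify the original DataFrame, and it also accepts a function that computes the new column from existing ones. The example below uses a seeded NumPy generator so the scores are repeatable; the seed, the passed column, and the cutoff of 90 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# A seeded generator makes the random scores reproducible between runs.
rng = np.random.default_rng(seed=0)
df_1 = df.assign(score=rng.integers(0, 100, size=len(df)))

# assign also accepts a callable, which receives the DataFrame being built.
df_2 = df.assign(passed=lambda d: d["note"] >= 90)

# df itself still has only its original two columns.
print(df.columns.tolist())
print(df_2)
```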
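The explode method
Translated literally, explode() means "to blow up". We often run into datasets like this one:
  Name                           Hobby
0   呂布  [basketball, gaming, milk tea]
1   貂蟬                [coding, movies]
2   趙云                [music, fitness]
Each row of the Hobby column holds its values together in a list. The explode() method splits these lists apart so that each element gets its own row. The code is as follows:
df.explode('Hobby')
output
  Name       Hobby
0   呂布  basketball
0   呂布      gaming
0   呂布    milk tea
1   貂蟬      coding
1   貂蟬      movies
2   趙云       music
2   趙云     fitness
Once the data is expanded, duplicate rows may appear, and the index values repeat as well. We can drop the duplicates and reset the index:
df.explode('Hobby').drop_duplicates().reset_index(drop=True)
output
  Name       Hobby
0   呂布  basketball
1   呂布      gaming
2   呂布    milk tea
3   貂蟬      coding
4   貂蟬      movies
5   趙云       music
6   趙云     fitness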
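The article does not show how the hobby table was built, so the sketch below is one possible construction, with the hobby strings written in English and the DataFrame named hobbies to avoid clashing with the earlier df; both choices are assumptions. It then walks through the explode and reset_index steps.

```python
import pandas as pd

# One row per person, with the hobbies stored as a Python list.
hobbies = pd.DataFrame({
    "Name": ["呂布", "貂蟬", "趙云"],
    "Hobby": [
        ["basketball", "gaming", "milk tea"],
        ["coding", "movies"],
        ["music", "fitness"],
    ],
})

# explode gives every list element its own row and repeats the original index.
exploded = hobbies.explode("Hobby")
print(exploded.index.tolist())

# drop_duplicates removes repeated (Name, Hobby) pairs, if any;
# reset_index(drop=True) replaces the repeated index with 0..n-1.
tidy = exploded.drop_duplicates().reset_index(drop=True)
print(tidy)
```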