
          【Python】20 Practical Pandas Data Examples, Packed with Useful Tips

          About 5,100 characters, roughly an 11-minute read · 2022-03-10 22:29

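          Today we'll look at filtering data in pandas. I've written a similar article before, but it focused on filtering text data; take a look if you're interested.
          Below I walk through about 20 examples of data-filtering methods in detail. First, let's create the dataset we'll use: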
          import numpy as np  # used later for np.nan and np.random
          import pandas as pd

          df = pd.DataFrame({
              "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
              "note": [92, 94, 87, 82, 90],
              "profession": ["Electrical engineer", "Mechanical engineer",
                             "Data scientist", "Accountant", "Athlete"],
              "date_of_birth": ["1998-11-01", "2002-08-14", "1996-01-12",
                                "2002-10-24", "2004-04-05"],
              "group": ["A", "B", "B", "A", "C"]
          })

          output

              name  note           profession date_of_birth group
          0   John    92  Electrical engineer    1998-11-01     A
          1   Jane    94  Mechanical engineer    2002-08-14     B
          2  Emily    87       Data scientist    1996-01-12     B
          3   Lisa    82           Accountant    2002-10-24     A
          4   Matt    90              Athlete    2004-04-05     C

          Selecting specific columns

          Pass a list of column names inside the brackets:

          df[["name","note"]]

          output

              name  note
          0   John    92
          1   Jane    94
          2  Emily    87
          3   Lisa    82
          4   Matt    90
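Whether you use one pair of brackets or two changes the type of the result. A minimal sketch, assuming a cut-down copy of the article's dataset; the variable names as_series and as_frame are illustrative only:

```python
import pandas as pd

# A cut-down copy of the article's dataset (only the columns needed here)
df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# Single brackets with one label return a Series
as_series = df["note"]
# Double brackets (a list of labels) return a DataFrame, even for one column
as_frame = df[["note"]]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

Use the list form whenever the next step expects a DataFrame.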

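          Then selecting specific rows

          Building on the result above, we narrow it down further to certain rows. Note that loc selects by label and its slices include the end label, so :3 returns rows 0 through 3:

          df.loc[:3, ["name", "note"]]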

          output

              name  note
          0   John    92
          1   Jane    94
          2  Emily    87
          3   Lisa    82

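          Filtering by position

          Here we use the iloc method, which selects by integer position. Unlike loc, its slices exclude the end position, so :3 gives positions 0, 1 and 2; column position 2 is profession:

          df.iloc[:3, 2]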

          output

          0    Electrical engineer
          1    Mechanical engineer
          2         Data scientist
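Because the default index here is 0 to 4, loc and iloc can look interchangeable, but their slice endpoints differ. A small sketch on a reduced copy of the dataset (by_label and by_position are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# loc slices by label and INCLUDES the end label 3, giving 4 rows
by_label = df.loc[:3, "name"]
# iloc slices by position and EXCLUDES the end position 3, giving 3 rows
by_position = df.iloc[:3, 0]

print(len(by_label), len(by_position))  # 4 3
```

With a non-default index (for example, after filtering), the two can select entirely different rows, so pick the one that matches what you mean.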

          Filtering with comparison operators

          df[df.note > 90]

          output

             name  note           profession date_of_birth group
          0  John    92  Electrical engineer    1998-11-01     A
          1  Jane    94  Mechanical engineer    2002-08-14     B
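A comparison such as df.note > 90 produces a boolean Series with one True/False per row, and that mask can be passed to loc to filter rows and choose columns in one step. A sketch, assuming a reduced dataset; mask and high are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "group": ["A", "B", "B", "A", "C"],
})

# The comparison yields a boolean Series aligned with df's rows
mask = df["note"] > 90
print(mask.tolist())  # [True, True, False, False, False]

# Passing the mask to .loc filters rows and picks columns at the same time
high = df.loc[mask, ["name", "note"]]
print(high)
```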

          The dt accessor

          The dt accessor works with datetime data. First we need to convert the strings (or data of another type) into datetime values, and then we can filter on them:
          df["date_of_birth"] = pd.to_datetime(df["date_of_birth"])
          df[df.date_of_birth.dt.month == 11]

          output

             name  note           profession date_of_birth group
          0  John    92  Electrical engineer    1998-11-01     A

          Or filter by year instead:

          df[df.date_of_birth.dt.year > 2000]

          output

             name  note           profession date_of_birth group
          1  Jane    94  Mechanical engineer    2002-08-14     B
          3  Lisa    82           Accountant    2002-10-24     A
          4  Matt    90              Athlete    2004-04-05     C
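The sketch below, on a reduced copy of the dataset, parses the dates with pd.to_datetime and shows two ways to filter: on a component exposed by dt, and by comparing the column against a pd.Timestamp directly. Variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "date_of_birth": ["1998-11-01", "2002-08-14", "1996-01-12",
                      "2002-10-24", "2004-04-05"],
})

# Parse the strings into datetime64 values so the .dt accessor is available
df["date_of_birth"] = pd.to_datetime(df["date_of_birth"])

# Components exposed by .dt can be compared like any other column
born_2002 = df[df["date_of_birth"].dt.year == 2002]

# A datetime column can also be compared directly against a Timestamp
before_2000 = df[df["date_of_birth"] < pd.Timestamp("2000-01-01")]

print(born_2002["name"].tolist())    # ['Jane', 'Lisa']
print(before_2000["name"].tolist())  # ['John', 'Emily']
```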

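          Filtering on several conditions at once (intersection)

          When rows must satisfy several conditions at the same time, combine the conditions with & and wrap each one in parentheses:
          df[(df.date_of_birth.dt.year > 2000) &
             (df.profession.str.contains("engineer"))]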

          output

             name  note           profession date_of_birth group
          1  Jane    94  Mechanical engineer    2002-08-14     B
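In Python, & binds more tightly than comparison operators such as >, which is why each condition needs its own parentheses. The sketch below also uses ~ to negate a mask and case=False to make str.contains case-insensitive; it assumes a reduced dataset, and the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "profession": ["Electrical engineer", "Mechanical engineer",
                   "Data scientist", "Accountant", "Athlete"],
    "note": [92, 94, 87, 82, 90],
})

# case=False matches "engineer" regardless of capitalization
is_engineer = df["profession"].str.contains("engineer", case=False)

# & binds more tightly than >, so the comparison must be parenthesized
engineers_over_92 = df[is_engineer & (df["note"] > 92)]

# ~ negates a boolean mask: everyone who is NOT an engineer
non_engineers = df[~is_engineer]

print(engineers_over_92["name"].tolist())  # ['Jane']
print(non_engineers["name"].tolist())      # ['Emily', 'Lisa', 'Matt']
```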

          Filtering on any of several conditions (union)

          When a row only needs to satisfy one of several conditions, combine them with |:

          df[(df.note > 90) | (df.profession == "Data scientist")]

          output

              name  note           profession date_of_birth group
          0   John    92  Electrical engineer    1998-11-01     A
          1   Jane    94  Mechanical engineer    2002-08-14     B
          2  Emily    87       Data scientist    1996-01-12     B
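When several | conditions all test one column for equality, isin() is a more compact alternative. A minimal sketch on a reduced dataset (chained and with_isin are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "group": ["A", "B", "B", "A", "C"],
})

# Chaining | for several equality tests on one column...
chained = df[(df["group"] == "A") | (df["group"] == "C")]
# ...selects the same rows as a single isin() call with the allowed values
with_isin = df[df["group"].isin(["A", "C"])]

print(with_isin["name"].tolist())  # ['John', 'Lisa', 'Matt']
print(chained.equals(with_isin))   # True
```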

          Filtering with the query method

          The query method in pandas can also filter data; we pass it the filter condition as a string:

          df.query("note > 90")

          output

             name  note           profession date_of_birth group
          0  John    92  Electrical engineer    1998-11-01     A
          1  Jane    94  Mechanical engineer    2002-08-14     B

          Conditions can also be combined with and inside the string:

          df.query("group == 'A' and note > 89")

          output

             name  note           profession date_of_birth group
          0  John    92  Electrical engineer    1998-11-01     A
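query() can also reference local Python variables by prefixing them with @, which avoids building the condition string by hand. A sketch, assuming a reduced dataset; threshold and wanted_groups are hypothetical variables:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "group": ["A", "B", "B", "A", "C"],
})

# Local variables are referenced inside the query string with @
threshold = 88
wanted_groups = ["A", "B"]
result = df.query("note > @threshold and group in @wanted_groups")

print(result["name"].tolist())  # ['John', 'Jane']
```

Matt also scores above 88, but his group C is not in wanted_groups, so he is excluded.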

          Filtering with nsmallest and nlargest

          The nsmallest and nlargest methods in pandas return the rows with the n smallest or n largest values in a given column:
          df.nsmallest(2, "note")

          output

              name  note      profession date_of_birth group
          3   Lisa    82      Accountant    2002-10-24     A
          2  Emily    87  Data scientist    1996-01-12     B

          df.nlargest(2, "note")

          output

             name  note           profession date_of_birth group
          1  Jane    94  Mechanical engineer    2002-08-14     B
          0  John    92  Electrical engineer    1998-11-01     A
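When there are no ties, nlargest(n, column) picks the same rows as sorting in descending order and taking the first n, but it states the intent more directly. A small check on a reduced dataset (top2 and top2_sorted are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# nlargest returns the top-n rows by "note", largest first
top2 = df.nlargest(2, "note")
# The same rows via an explicit descending sort followed by head()
top2_sorted = df.sort_values("note", ascending=False).head(2)

print(top2["name"].tolist())     # ['Jane', 'John']
print(top2.equals(top2_sorted))  # True
```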

          The isna() method

          The isna() method picks out missing values. First, we set one of the values in the table to missing:
          df.loc[0, "profession"] = np.nan
          df[df.profession.isna()]

          output

             name  note profession date_of_birth group
          0  John    92        NaN    1998-11-01     A
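Calling isna() on the whole DataFrame is also a quick way to audit missing data. The sketch below, on a three-row stand-in for the dataset, counts missing values per column and keeps rows that have a gap anywhere; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily"],
    "profession": ["Electrical engineer", "Mechanical engineer", "Data scientist"],
})
df.loc[0, "profession"] = np.nan

# isna() on the frame gives a boolean frame; sum() counts True values per column
missing_per_column = df.isna().sum()
print(int(missing_per_column["profession"]))  # 1

# any(axis=1) flags rows with at least one missing value in any column
rows_with_gaps = df[df.isna().any(axis=1)]
print(rows_with_gaps["name"].tolist())  # ['John']
```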

          The notna() method

          The notna() method does the opposite of isna(): it keeps the rows whose values are not missing:
          df[df.profession.notna()]

          output

              name  note           profession date_of_birth group
          1   Jane    94  Mechanical engineer    2002-08-14     B
          2  Emily    87       Data scientist    1996-01-12     B
          3   Lisa    82           Accountant    2002-10-24     A
          4   Matt    90              Athlete    2004-04-05     C
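For the common case of dropping rows with a missing value in a given column, dropna(subset=...) gives the same result as filtering with notna(). A sketch on a small stand-in dataset (kept_by_mask and kept_by_dropna are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily"],
    "profession": [np.nan, "Mechanical engineer", "Data scientist"],
})

# Boolean filtering with notna() keeps rows whose profession is present
kept_by_mask = df[df["profession"].notna()]
# dropna(subset=...) drops rows with a missing value in the listed column(s)
kept_by_dropna = df.dropna(subset=["profession"])

print(kept_by_mask["name"].tolist())        # ['Jane', 'Emily']
print(kept_by_mask.equals(kept_by_dropna))  # True
```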

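          The assign method

          The assign method in pandas adds a new column to a dataset. It returns a new DataFrame and leaves df itself unchanged. The score column below is filled with random integers from 0 to 99, so your values will differ:

          df_1 = df.assign(score=np.random.randint(0, 100, size=5))
          df_1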

          output

              name  note           profession date_of_birth group  score
          0   John    92                  NaN    1998-11-01     A     19
          1   Jane    94  Mechanical engineer    2002-08-14     B     84
          2  Emily    87       Data scientist    1996-01-12     B     68
          3   Lisa    82           Accountant    2002-10-24     A     70
          4   Matt    90              Athlete    2004-04-05     C     39
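Two details of assign are worth seeing in code: a keyword argument can be a callable that receives the DataFrame being built, so it can use columns created earlier in the same call, and seeding a generator makes the random column reproducible. A sketch under those assumptions; the total column and the seed value are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

# A seeded generator makes the "random" column identical between runs
rng = np.random.default_rng(seed=0)

# assign returns a NEW DataFrame; df itself is left unchanged
df_1 = df.assign(
    score=rng.integers(0, 100, size=len(df)),  # integers in [0, 100)
    # A callable receives the frame being built, so it can use "score"
    total=lambda d: d["note"] + d["score"],
)

print("score" in df.columns)  # False
print(df_1)
```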

          The explode method

          Taken literally, explode means to blow something apart. We often come across datasets like this one:

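             Name                                          Hobby
          0    呂布  [play basketball, play games, drink milk tea]
          1    貂蟬                     [write code, watch movies]
          2    趙云                    [listen to music, work out]

          Each row of the Hobby column bundles several values together in a list, and explode() splits those bundled values into separate rows:

          df.explode('Hobby')

          output
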
             Name            Hobby
          0    呂布  play basketball
          0    呂布       play games
          0    呂布   drink milk tea
          1    貂蟬       write code
          1    貂蟬     watch movies
          2    趙云  listen to music
          2    趙云         work out
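The article does not show how the Hobby table is built, so here is one way to construct it and reproduce the result, assuming the Hobby cells are plain Python lists (the hobby strings are the English renderings used in the tables). The ignore_index=True option of explode() renumbers the result in a single step:

```python
import pandas as pd

# Each Hobby cell holds a Python list
df = pd.DataFrame({
    "Name": ["呂布", "貂蟬", "趙云"],
    "Hobby": [["play basketball", "play games", "drink milk tea"],
              ["write code", "watch movies"],
              ["listen to music", "work out"]],
})

# explode() gives each list element its own row and repeats the other
# columns; the original index labels are kept, so they repeat too
exploded = df.explode("Hobby")
print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 2, 2]

# ignore_index=True numbers the resulting rows 0..n-1 instead
renumbered = df.explode("Hobby", ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]
```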

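          Of course, once the lists are expanded, the data may contain duplicate rows. drop_duplicates() removes them, and reset_index(drop=True) renumbers the index: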

          df.explode('Hobby').drop_duplicates().reset_index(drop=True)

          output

             Name            Hobby
          0    呂布  play basketball
          1    呂布       play games
          2    呂布   drink milk tea
          3    貂蟬       write code
          4    貂蟬     watch movies
          5    趙云  listen to music
          6    趙云         work out