Key points: 12 efficient NumPy & Pandas tricks!

Selected from Towards Data Science; compiled by 機器之心 (Synced)

import numpy as np

x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])
index_val = np.argpartition(x, -4)[-4:]
index_val
array([1, 8, 2, 0], dtype=int64)
np.sort(x[index_val])
array([10, 12, 12, 16])
array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])
# with a relative tolerance of 0.1, it should return False:
np.allclose(array1, array2, 0.1)
False
# with a relative tolerance of 0.2, it should return True:
np.allclose(array1, array2, 0.2)
True
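The third positional argument of `np.allclose` is the relative tolerance `rtol`, and the element-wise check is `|a - b| <= atol + rtol * |b|`, with `atol` defaulting to `1e-08`. A minimal sketch that reproduces the two results above by hand; the helper name `manual_allclose` is hypothetical and exists only for illustration:

```python
import numpy as np

array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])


def manual_allclose(a, b, rtol=1e-05, atol=1e-08):
    """Hypothetical helper that spells out the documented np.allclose rule."""
    return bool(np.all(np.abs(a - b) <= atol + rtol * np.abs(b)))


# rtol=0.1 fails on the pair (0.17, 0.19): the gap 0.02 exceeds 0.1 * 0.19
strict = manual_allclose(array1, array2, rtol=0.1)
# rtol=0.2 passes for every pair
loose = manual_allclose(array1, array2, rtol=0.2)
print(strict, loose)
```

Because the tolerance scales with the second array, swapping the arguments can change the result for borderline values.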
x = np.array([3, 17, 14, 23, 2, 2, 6, 8, 1, 2, 16, 0])
np.clip(x, 2, 5)
array([3, 5, 5, 5, 2, 2, 5, 5, 2, 2, 5, 2])
# Random integers
array = np.random.randint(20, size=12)
array
array([ 0,  1,  8, 19, 16, 18, 10, 11,  2, 13, 14,  3])
# Divide by 2 and check if remainder is 1
cond = np.mod(array, 2) == 1
cond
array([False,  True, False,  True, False, False, False,  True, False,  True, False,  True])
# Use extract to get the values
np.extract(cond, array)
array([ 1, 19, 11, 13,  3])
# Apply condition on extract directly
np.extract(((array < 3) | (array > 15)), array)
array([ 0,  1, 19, 16, 18,  2])
y = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
# Where y is greater than 5, returns index position
np.where(y > 5)
(array([2, 3, 5, 7, 8], dtype=int64),)
# The first value replaces the elements that match the condition,
# the second replaces the elements that do not
np.where(y > 5, "Hit", "Miss")
array(['Miss', 'Miss', 'Hit', 'Hit', 'Miss', 'Hit', 'Miss', 'Hit', 'Hit'], dtype='<U4')
a = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
print("50th Percentile of a, axis = 0 : ",
      np.percentile(a, 50, axis=0))
50th Percentile of a, axis = 0 :  6.0
b = np.array([[10, 7, 4], [3, 2, 1]])
print("30th Percentile of b, axis = 0 : ",
      np.percentile(b, 30, axis=0))
30th Percentile of b, axis = 0 :  [5.1 3.5 1.9]

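Tabular data with heterogeneously typed columns, as in an SQL table or Excel spreadsheet;
Ordered and unordered (not necessarily fixed-frequency) time series data;
Arbitrary matrix data (homogeneously or heterogeneously typed) with row and column labels;
Any other form of statistical data set. In fact, the data need not be labeled at all to be placed into a Pandas data structure.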
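As a rough illustration of the four kinds of data listed above, the following sketch builds one small object of each kind. All column names, labels, and values are made up for the example:

```python
import numpy as np
import pandas as pd

# Tabular data with heterogeneously typed columns (text, float, bool)
table = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [1.5, 2.0, 3.25],
    "passed": [True, False, True],
})

# Time series whose timestamps are not at a fixed frequency
ts = pd.Series(
    [10, 12, 9],
    index=pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-10"]),
)

# Matrix data with row and column labels
matrix = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["x", "y", "z"],
)

# Unlabeled data: pandas falls back to a default integer index
unlabeled = pd.Series([3, 1, 4])

print(table.dtypes)
print(matrix.loc["r2", "y"])
```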
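Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data;
Size mutability: columns can be inserted into and deleted from DataFrame and higher-dimensional objects;
Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. align the data automatically;
Flexible group-by functionality for split-apply-combine operations on data sets, for both aggregating and transforming data;
Easy conversion of ragged, differently indexed data in other Python and NumPy data structures into DataFrame objects;
Intelligent label-based slicing, indexing, and subsetting of large data sets;
More intuitive merging and joining of data sets;
More flexible reshaping and pivoting of data sets;
Hierarchical labeling of axes (possibly with multiple labels per tick);
Robust IO tools for loading data from flat files (CSV and delimited), Excel files, and databases, and for saving / loading data in HDF5 format;
Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging, and so on.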
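A few of the features listed above can be sketched in a handful of lines: label alignment, NaN handling, split-apply-combine, and adding or dropping columns. The data and the names `s1`, `s2`, and `scores` are invented for the example:

```python
import pandas as pd

# Automatic alignment: addition matches labels, not positions
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
total = s1 + s2  # labels present in only one series become NaN

# Missing data: NaN values can be counted or filled
n_missing = int(total.isna().sum())
filled = total.fillna(0.0)

# Split-apply-combine: group rows, then aggregate each group
scores = pd.DataFrame({"group": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
sums = scores.groupby("group")["value"].sum()

# Size mutability: insert a column, then delete it again
scores["double"] = scores["value"] * 2
scores = scores.drop(columns=["double"])

print(total)
print(sums)
```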
import io
import requests
import pandas as pd

# I am using this online data set just to make things easier for you guys
url = "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/datasets/AirPassengers.csv"
s = requests.get(url).content
# read only first 10 rows
df = pd.read_csv(io.StringIO(s.decode("utf-8")), nrows=10, index_col=0)
# create a dataframe
dframe = pd.DataFrame(np.random.randn(4, 3), columns=list("bde"), index=["India", "USA", "China", "Russia"])
# compute a formatted string from each floating point value in frame
changefn = lambda x: "%.2f" % x
# Make changes element-wise
dframe["d"].map(changefn)
# max minus min lambda fn
fn = lambda x: x.max() - x.min()
# Apply this on dframe that we've just created above
dframe.apply(fn)
# Using the dataframe we created for read_csv
filter1 = df["value"].isin([112])
filter2 = df["time"].isin([1949.000000])
df[filter1 & filter2]
# creating sample series
data = pd.Series(["India", "Pakistan", "China", "Mongolia"])
# The assignment issue we face: data1 is just another name for data
data1 = data
# Change a value
data1[0] = "USA"
# Also changes the value in the original series
data
# To prevent that, we use copy()
# creating copy of series
new = data.copy()
# assigning new values
new[1] = "Changed value"
# printing data
print(new)
print(data)
# We'll use the same dataframe that we used for read_csv
framex = df.select_dtypes(include="float64")
# Returns only time column
# Create a sample dataframe
school = pd.DataFrame({"A": ["Jay", "Usher", "Nicky", "Romero", "Will"],
                       "B": ["Masters", "Graduate", "Graduate", "Masters", "Graduate"],
                       "C": [26, 22, 20, 23, 24]})
# Let's create a pivot table to segregate students based on age and course
table = pd.pivot_table(school, values="A", index=["B", "C"],
                       columns=["B"], aggfunc=np.sum, fill_value="Not Available")
table
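The pivot above uses column B for both the index and the columns and sums a column of names, so it may not run unchanged on every pandas version. A minimal sketch of a simpler variant, assuming the goal is the average age (column C) per course (column B); the name `age_by_course` is illustrative and not from the original:

```python
import pandas as pd

school = pd.DataFrame({
    "A": ["Jay", "Usher", "Nicky", "Romero", "Will"],
    "B": ["Masters", "Graduate", "Graduate", "Masters", "Graduate"],
    "C": [26, 22, 20, 23, 24],
})

# One row per course, holding the mean of the numeric age column
age_by_course = pd.pivot_table(school, values="C", index="B", aggfunc="mean")
print(age_by_course)
```

Aggregating a numeric column keeps the result well defined, while the names in column A are better kept as plain rows than summed.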
