今天為大家分析一個(gè)英國(guó)在線零售商的交易數(shù)據(jù)集
img著重為大家介紹一下如何運(yùn)用RFM模型對(duì)客戶進(jìn)行分類

并且解決如下問(wèn)題:
1.以星期為單位,周幾的銷售額最高?請(qǐng)可視化顯示。
2.利用 RFM 模型,對(duì) United Kingdom 的用戶進(jìn)行分類,分為如下所述的六類。
a) 選用合適的圖形,可視化展示每一類用戶數(shù)占總數(shù)的比例;可視化展示每一類用戶消費(fèi)額占總消費(fèi)額的比例。
b) 對(duì)每一類用戶,顯示其每個(gè)月的消費(fèi)總額的變化情況。(顯示在一個(gè) figure 中
以下是具體的代碼
import?pandas?as?pd
讀取數(shù)據(jù),并且對(duì)數(shù)據(jù)格式作出一些調(diào)整,方便后續(xù)分析
#?讀取并且查看數(shù)據(jù)
df?=?pd.read_csv("Online?Retail(1).csv")
df.head()
| InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010/12/1 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010/12/1 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010/12/1 8:26 | 3.39 | 17850.0 | United Kingdom |
#?將時(shí)間數(shù)據(jù)設(shè)置為pandas的數(shù)據(jù)格式
df["InvoiceDate"]?=?pd.to_datetime(df["InvoiceDate"])
#?注:由于最近一次購(gòu)物(Recency)是針對(duì)某個(gè)時(shí)間點(diǎn)計(jì)算的,而最后訂貨日期是 2011-12-09,因此
#?我們把 2011-12-10?當(dāng)作今天,來(lái)計(jì)算 Recency。
df["target_time"]?=?pd.to_datetime("2011-12-10")
#?計(jì)算?Recency
df["Recency"]?=?pd.to_datetime(df["target_time"])?-?pd.to_datetime(df["InvoiceDate"])
#?計(jì)算每個(gè)訂單的總金額
df["total"]?=?df["Quantity"]?*?df["UnitPrice"]
df.head()
| InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | target_time | Recency | total |
|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 15.30 |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 22.00 |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 |
設(shè)置分析所需的星期、月份等屬性,方便按此聚合
df["星期"]?=?df["InvoiceDate"].dt.dayofweek+1
df["月份"]?=?df["InvoiceDate"].dt.month
df.head()
| InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | target_time | Recency | total | 星期 | 月份 |
|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 15.30 | 3 | 12 |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 | 3 | 12 |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 22.00 | 3 | 12 |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 | 3 | 12 |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 | 3 | 12 |
按星期聚合總銷售額
df_by_w?=?df.groupby("星期").agg({"total":"sum"}).reset_index()
df_by_w
| 星期 | total |
|---|
| 0 | 1 | 1588609.431 |
| 1 | 2 | 1966182.791 |
| 2 | 3 | 1734147.010 |
| 3 | 4 | 2112519.000 |
| 4 | 5 | 1540610.811 |
| 5 | 7 | 805678.891 |
from?pyecharts?import?options?as?opts
from?pyecharts.charts?import?Bar
bar?=?(
????Bar()
????.add_xaxis(df_by_w["星期"].to_list())
????.add_yaxis("銷售額",?df_by_w["total"].to_list())
????.set_global_opts(title_opts=opts.TitleOpts(title="銷售額對(duì)比圖"))
)
以星期為單位,周幾的銷售額最高?答案是周四
bar.render_notebook()
img利用 RFM 模型,對(duì) United Kingdom 的用戶進(jìn)行分類
按客戶ID聚合,對(duì)發(fā)票訂單號(hào)進(jìn)行計(jì)數(shù),對(duì)每個(gè)發(fā)票的銷售額進(jìn)行匯總,對(duì)recency取最小值(作為該顧客的recency值)
df_by_c?=?df.groupby("CustomerID").agg({"InvoiceNo":"count",?"total":"sum",?"Recency":"min"}).reset_index()
df_by_c.columns?=?["CustomerID","Frequency","Monetary","Recency"]
df_by_c.head()
| CustomerID | Frequency | Monetary | Recency |
|---|
| 0 | 12346.0 | 2 | 0.00 | 325 days 13:43:00 |
| 1 | 12347.0 | 182 | 4310.00 | 2 days 08:08:00 |
| 2 | 12348.0 | 31 | 1797.24 | 75 days 10:47:00 |
| 3 | 12349.0 | 73 | 1757.55 | 18 days 14:09:00 |
| 4 | 12350.0 | 17 | 334.40 | 310 days 07:59:00 |
求出各列的四分位數(shù)
df_by_c["Frequency"].quantile([0.25,0.5,0.75])
0.25?????17.0
0.50?????42.0
0.75????102.0
Name:?Frequency,?dtype:?float64
df_by_c["Monetary"].quantile([0.25,0.5,0.75])
0.25?????293.3625
0.50?????648.0750
0.75????1611.7250
Name:?Monetary,?dtype:?float64
將Recency這列的數(shù)據(jù)格式轉(zhuǎn)為整數(shù)(取day這個(gè)屬性就可以了)
df_by_c["Recency"]?=?df_by_c["Recency"].dt.days
df_by_c["Recency"].quantile([0.25,0.5,0.75])
0.25?????16.0
0.50?????50.0
0.75????143.0
Name:?Recency,?dtype:?float64
按照剛才求得的各項(xiàng)指標(biāo)的四分位數(shù),制定相應(yīng)的函數(shù),計(jì)算RFM的值
def?decide_F(score):
????if?score>102:
????????return?1
????elif?score>42:
????????return?2
????elif?score>17:
????????return?3
????else:
????????return?4
????
def?decide_M(score):
????if?score>1611.725:
????????return?1
????elif?score>648.075:
????????return?2
????elif?score>293.3625:
????????return?3
????else:
????????return?4
????
def?decide_R(score):
????if?score<16:
????????return?1
????elif?score<50:
????????return?2
????elif?score<143:
????????return?3
????else:
????????return?4
df_by_c["F"]?=?df_by_c.apply(lambda?x:?decide_F(x.Frequency),?axis?=?1)
df_by_c["M"]?=?df_by_c.apply(lambda?x:?decide_F(x.Monetary),?axis?=?1)
df_by_c["R"]?=?df_by_c.apply(lambda?x:?decide_R(x.Recency),?axis?=?1)
看一下求得的值
df_by_c.head()
| CustomerID | Frequency | Monetary | Recency | F | M | R |
|---|
| 0 | 12346.0 | 2 | 0.00 | 325 | 4 | 4 | 4 |
| 1 | 12347.0 | 182 | 4310.00 | 2 | 1 | 1 | 1 |
| 2 | 12348.0 | 31 | 1797.24 | 75 | 3 | 1 | 3 |
| 3 | 12349.0 | 73 | 1757.55 | 18 | 2 | 1 | 2 |
| 4 | 12350.0 | 17 | 334.40 | 310 | 4 | 1 | 4 |
根據(jù)定義,制定對(duì)客戶進(jìn)行類型劃分的函數(shù)
def?decide_type(f,?m,?r):
????if?f?==?1?and?m?==?1?and?r?==?1:
????????return?"最佳客戶"
????if?f?==?1?and?m?==?1?and?r?==?3:
????????return?"近流失客戶"
????if?f?==?1?and?m?==?1?and?r?==?4:
????????return?"流失客戶"
????if?f?==?4?and?m?==?4?and?r?==?4:
????????return?"流失廉價(jià)客戶"
????if?f?==?1:
????????return?"忠誠(chéng)客戶"
????if?m?==?1:
????????return?"大金主"
????????
????
df_by_c["客戶類型"]?=??df_by_c.apply(lambda?x:?decide_type(x.F,?x.M?,x.R),?axis?=?1)
df_by_c.head()
| CustomerID | Frequency | Monetary | Recency | F | M | R | 客戶類型 |
|---|
| 0 | 12346.0 | 2 | 0.00 | 325 | 4 | 4 | 4 | 流失廉價(jià)客戶 |
| 1 | 12347.0 | 182 | 4310.00 | 2 | 1 | 1 | 1 | 最佳客戶 |
| 2 | 12348.0 | 31 | 1797.24 | 75 | 3 | 1 | 3 | 大金主 |
| 3 | 12349.0 | 73 | 1757.55 | 18 | 2 | 1 | 2 | 大金主 |
| 4 | 12350.0 | 17 | 334.40 | 310 | 4 | 1 | 4 | 大金主 |
df_by_c_count?=?df_by_c.groupby("客戶類型").size().sort_values(ascending=False)
df_by_c_count
客戶類型
大金主???????3064
最佳客戶???????533
忠誠(chéng)客戶???????347
近流失客戶??????154
流失客戶????????47
流失廉價(jià)客戶??????39
dtype:?int64
datas?=?list(zip(df_by_c_count.index.to_list(),?df_by_c_count.to_list()))
datas
[('大金主',?3064),
?('最佳客戶',?533),
?('忠誠(chéng)客戶',?347),
?('近流失客戶',?154),
?('流失客戶',?47),
?('流失廉價(jià)客戶',?39)]
from?pyecharts?import?options?as?opts
from?pyecharts.charts?import?Pie
def?create_pie(datas,?title)?->?Pie:
????"""?創(chuàng)建餅圖對(duì)象
????文檔地址:https://pyecharts.org/#/zh-cn/basic_charts?id=pie%ef%bc%9a%e9%a5%bc%e5%9b%be
????@param?datas:?數(shù)據(jù),形式為[('類型1',?數(shù)據(jù)1),?('類型2',?數(shù)據(jù)2),?('類型3',?數(shù)據(jù)3)]
????@param?title:?圖表的標(biāo)題
????"""
????pie?=?Pie()
????pie.add("",?datas)
????pie.set_global_opts(
????????title_opts=opts.TitleOpts(title=title),
????????legend_opts=opts.LegendOpts(pos_right="right")
????)
????pie.set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:?{c}:?go7utgvlrp%"))
????return?pie
a)可視化展示每一類用戶數(shù)占總數(shù)的比例
pie?=?create_pie(datas,?"餅圖-客戶類型對(duì)比")
pie.render_notebook()
img取出"CustomerID", ?"客戶類型"這兩列,準(zhǔn)備與df合并
df1?=?df_by_c[["CustomerID",??"客戶類型"]]
new?=?pd.merge(df,?df1,?on="CustomerID")
new.head()
| InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | target_time | Recency | total | 星期 | 月份 | 客戶類型 |
|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 15.30 | 3 | 12 | 流失客戶 |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 | 3 | 12 | 流失客戶 |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 22.00 | 3 | 12 | 流失客戶 |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 | 3 | 12 | 流失客戶 |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 2011-12-10 | 373 days 15:34:00 | 20.34 | 3 | 12 | 流失客戶 |
by_c_t?=?new.groupby("客戶類型")["total"].sum().sort_values(ascending=False)
by_c_t
客戶類型
最佳客戶??????4106703.970
大金主???????2472418.943
忠誠(chéng)客戶??????1270195.150
近流失客戶??????364926.870
流失客戶????????86803.711
流失廉價(jià)客戶?????-11011.550
Name:?total,?dtype:?float64
datas?=?list(zip(by_c_t.index.to_list(),?by_c_t.to_list()))
datas
[('最佳客戶',?4106703.970000173),
?('大金主',?2472418.9429999804),
?('忠誠(chéng)客戶',?1270195.1499999189),
?('近流失客戶',?364926.870000011),
?('流失客戶',?86803.71099999893),
?('流失廉價(jià)客戶',?-11011.549999999996)]
b) 可視化展示每一類用戶消費(fèi)額占總消費(fèi)額的比例
pie?=?create_pie(datas,?"餅圖-客戶銷售額對(duì)比")
pie.render_notebook()
img