A case I have often asked candidates about recently

A/B comparison: Mann–Whitney U test;
A/B/N comparison: Kruskal–Wallis test.
Does that mean we are 95% sure that the treatment group is better than the control group?
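As a rough illustration of the two tests named above, here is a minimal sketch using SciPy. The per-user payment samples are synthetic (drawn from exponential distributions with made-up scales) and the variable names are hypothetical; only `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` are real library calls.

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic per-user payment amounts (assumption: made-up data for illustration)
control = rng.exponential(scale=10.0, size=500)
treatment = rng.exponential(scale=11.0, size=500)
variant_c = rng.exponential(scale=10.5, size=500)

# A/B: two-sided Mann-Whitney U test on two independent samples
u_stat, u_p = mannwhitneyu(control, treatment, alternative="two-sided")

# A/B/N: Kruskal-Wallis H test on three or more independent samples
h_stat, h_p = kruskal(control, treatment, variant_c)

print(f"Mann-Whitney U={u_stat:.1f}, p={u_p:.4f}")
print(f"Kruskal-Wallis H={h_stat:.3f}, p={h_p:.4f}")
```

A p-value below 0.05 says that data this extreme would be unlikely if the groups shared the same distribution. It is not the probability that the treatment group is better, which is the point the question above is probing.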



Revenue per active user = paid conversion rate × average order value per paying user
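A small worked example of this decomposition, using hypothetical numbers that are not from the original experiment:

```python
# Hypothetical numbers, chosen only to illustrate the decomposition
active_users = 10_000
paying_users = 250
total_revenue = 5_000.0

conversion_rate = paying_users / active_users            # 0.025
revenue_per_sale = total_revenue / paying_users          # 20.0
revenue_per_active_user = total_revenue / active_users   # 0.5

# The product of the two factors recovers revenue per active user
assert abs(conversion_rate * revenue_per_sale - revenue_per_active_user) < 1e-12
print(conversion_rate, revenue_per_sale, revenue_per_active_user)
```

Splitting the metric this way lets each factor get its own model: a proportion for the conversion rate and a positive continuous quantity for the order value.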

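For the conversion rate, we can generally use a Beta distribution as the prior, update it with the experiment data to obtain the posterior, and then use sampling to compute the probability of uplift that we care about. (Finally, we can also check whether the assumed distribution is reasonable; if it is not, the prior needs to be changed.)
For the average order value, we use a Gamma distribution as the prior, update it with the experiment data to obtain the posterior, and again use sampling to compute the probability of uplift. (In the code, the Gamma distribution is placed on the rate parameter of the order-value model, so the average order value is the reciprocal of a sampled rate.)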
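For reference, the conjugate updates behind this approach can be written out as follows. This is a sketch under stated assumptions: $n$ users of whom $k$ paid, order values $x_1, \dots, x_k$ modeled as exponential with rate $\lambda$ (an assumption suggested by the "Rate Parameter" axis label in the code), and a Gamma prior parameterized by shape $a_0$ and scale $\theta_0$. These symbols are introduced here for illustration and do not appear in the original.

```latex
\begin{align}
p \sim \mathrm{Beta}(\alpha, \beta),\quad k \mid p \sim \mathrm{Binomial}(n, p)
  \;&\Longrightarrow\;
  p \mid k \sim \mathrm{Beta}(\alpha + k,\ \beta + n - k) \\
\lambda \sim \mathrm{Gamma}(a_0, \theta_0),\quad x_i \mid \lambda \sim \mathrm{Exp}(\lambda)
  \;&\Longrightarrow\;
  \lambda \mid x \sim \mathrm{Gamma}\!\left(a_0 + k,\ \frac{\theta_0}{1 + \theta_0 \sum_{i=1}^{k} x_i}\right) \\
\text{revenue per active user} \;&=\; p \cdot \mathbb{E}[x \mid \lambda] \;=\; \frac{p}{\lambda}
\end{align}
```

The last line is why posterior samples of the conversion rate are divided by posterior samples of the rate parameter when the two are combined.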

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

# conversion_rate is the prior estimate of the conversion rate and must be defined beforehand.
# alpha and beta are both kept small here, so the prior is weak.
prior_alpha = round(conversion_rate, 2) + 0.1
prior_beta = 0.1 + 1 - round(conversion_rate, 2)

# Assumed prior distribution of the conversion rate
prior = beta(prior_alpha, prior_beta)

# Plot the prior to see what it looks like
fig, ax = plt.subplots(1, 1)
x = np.linspace(0, 1, 1000)
ax.plot(x, prior.pdf(x), label=f'prior Beta({prior_alpha:.2f}, {prior_beta:.2f})')
ax.set_xlabel('Conversion Probability')
ax.set_ylabel('Density')
ax.set_title('Chosen Prior')
ax.legend()

# Aggregate per test group: unique users, number of non-zero payments, total payment amount
results = test_data.groupby('test_group').agg(
    sampleSize=('imei', 'nunique'),
    converted=('pay_amt', np.count_nonzero),
    pay_amt=('pay_amt', 'sum'),
)
results['conversionRate'] = results['converted'] / results['sampleSize']
results['revenuePerSale'] = results['pay_amt'] / results['converted']

# Update the conversion-rate distribution with the experiment data.
# '對照組' (control group) and '實驗組' (treatment group) are the group labels stored in test_data.
control = beta(prior_alpha + results.loc['對照組', 'converted'],
               prior_beta + results.loc['對照組', 'sampleSize'] - results.loc['對照組', 'converted'])
treatment = beta(prior_alpha + results.loc['實驗組', 'converted'],
                 prior_beta + results.loc['實驗組', 'sampleSize'] - results.loc['實驗組', 'converted'])

plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly

fig, ax = plt.subplots()
x = np.linspace(0, 0.05, 3000)
ax.plot(x, control.pdf(x), label='對照組')
ax.plot(x, treatment.pdf(x), label='實驗組')
ax.set_xlabel('Conversion Probability')
ax.set_ylabel('Density')
ax.set_title('Experiment Posteriors')
ax.legend()

# Draw posterior samples of the conversion rate for each group
control_simulation = np.random.beta(
    prior_alpha + results.loc['對照組', 'converted'],
    prior_beta + results.loc['對照組', 'sampleSize'] - results.loc['對照組', 'converted'],
    size=30000)
treatment_simulation = np.random.beta(
    prior_alpha + results.loc['實驗組', 'converted'],
    prior_beta + results.loc['實驗組', 'sampleSize'] - results.loc['實驗組', 'converted'],
    size=30000)

# Share of draws in which the treatment conversion rate is at least the control's
chance_of_beating_control = np.mean(control_simulation <= treatment_simulation)
print(f'Chance of treatment beating control is {chance_of_beating_control:.4f}')
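
from scipy.stats import gamma

# Posterior of the rate parameter for each group: shape = 1 + number of conversions.
# converted * revenuePerSale equals the group's total payment amount.
# Note: for a Gamma(shape=1, scale=theta) prior, the conjugate posterior scale is
# theta / (1 + theta * total payment amount). The constants 10 and 0.1 are kept as in
# the original and imply different values of theta, so check them against the intended prior.
control_rr = gamma(
    a=(1 + results.loc['對照組', 'converted']),
    scale=(10 / (1 + 0.1 * results.loc['對照組', 'converted'] * results.loc['對照組', 'revenuePerSale'])))
treatment_rr = gamma(
    a=(1 + results.loc['實驗組', 'converted']),
    scale=(10 / (1 + 0.1 * results.loc['實驗組', 'converted'] * results.loc['實驗組', 'revenuePerSale'])))

fig, ax = plt.subplots()
x = np.linspace(0, 4, 5000)
ax.plot(x, control_rr.pdf(x), label='對照組')
ax.plot(x, treatment_rr.pdf(x), label='實驗組')
ax.set_xlabel('Rate Parameter')
ax.set_ylabel('Density')
ax.set_title('Experiment Posteriors')
ax.legend()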

# Posterior samples of the conversion rate, using the same prior_alpha / prior_beta as above
# (the original hard-coded 7 and 15 here, which did not match the chosen prior)
control_conversion_simulation = np.random.beta(
    prior_alpha + results.loc['對照組', 'converted'],
    prior_beta + results.loc['對照組', 'sampleSize'] - results.loc['對照組', 'converted'],
    size=100000)
treatment_conversion_simulation = np.random.beta(
    prior_alpha + results.loc['實驗組', 'converted'],
    prior_beta + results.loc['實驗組', 'sampleSize'] - results.loc['實驗組', 'converted'],
    size=100000)

# Posterior samples of the rate parameter of the order-value model
control_revenue_simulation = np.random.gamma(
    shape=(1 + results.loc['對照組', 'converted']),
    scale=(10 / (1 + 0.1 * results.loc['對照組', 'converted'] * results.loc['對照組', 'revenuePerSale'])),
    size=100000)
treatment_revenue_simulation = np.random.gamma(
    shape=(1 + results.loc['實驗組', 'converted']),
    scale=(10 / (1 + 0.1 * results.loc['實驗組', 'converted'] * results.loc['實驗組', 'revenuePerSale'])),
    size=100000)

# Revenue per user = conversion rate / rate parameter (i.e. conversion rate * mean order value)
control_avg_purchase = control_conversion_simulation / control_revenue_simulation
treatment_avg_purchase = treatment_conversion_simulation / treatment_revenue_simulation

fig, ax = plt.subplots()
ax.hist(control_avg_purchase, density=True, label='對照組', histtype='stepfilled', bins=100)
ax.hist(treatment_avg_purchase, density=True, label='實驗組', histtype='stepfilled', bins=100)
ax.set_xlabel('Avg Revenue per User')
ax.set_ylabel('Density')
ax.set_title('Experiment Posteriors')
ax.legend()
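
The histograms show the two posteriors, but the quantity of interest is a single number: the probability that the treatment group's revenue per user exceeds the control's. The following is a minimal, self-contained sketch of that last step. The function name, the group counts, and the prior defaults (a uniform Beta(1, 1) and a Gamma with shape 1 and scale 10) are all assumptions made for illustration, and the Gamma scale uses the standard conjugate form, which may differ from the constants in the code above.

```python
import numpy as np

def prob_treatment_beats_control(control, treatment,
                                 prior_alpha=1.0, prior_beta=1.0,
                                 prior_shape=1.0, prior_scale=10.0,
                                 size=100_000, seed=0):
    """Estimate P(revenue per user in treatment > control) by posterior sampling."""
    rng = np.random.default_rng(seed)

    def revenue_per_user(group):
        n, k, revenue = group["sampleSize"], group["converted"], group["pay_amt"]
        # Beta posterior of the conversion rate
        conversion = rng.beta(prior_alpha + k, prior_beta + n - k, size=size)
        # Gamma posterior of the exponential rate parameter (conjugate update)
        rate = rng.gamma(shape=prior_shape + k,
                         scale=prior_scale / (1 + prior_scale * revenue),
                         size=size)
        # Mean order value is 1 / rate, so revenue per user is conversion / rate
        return conversion / rate

    return float(np.mean(revenue_per_user(treatment) > revenue_per_user(control)))

# Hypothetical aggregated results, not taken from the original experiment
control = {"sampleSize": 10_000, "converted": 250, "pay_amt": 5_000.0}
treatment = {"sampleSize": 10_000, "converted": 300, "pay_amt": 6_300.0}

p = prob_treatment_beats_control(control, treatment)
print(f"P(treatment > control) = {p:.3f}")
```

Unlike a p-value, this number can be read directly as the posterior probability that the treatment is better, given the model and the priors.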



