I. Thompson Sampling Algorithm
-
Principle
-
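As a brief sketch of the idea, read off the code in the implementation section rather than taken from a separate derivation: each ad i is treated as a Bernoulli arm with an unknown click probability. Assuming a uniform Beta(1, 1) prior (which is what the `+ 1` terms in the code amount to), round n draws one sample per ad from its Beta posterior and shows the ad with the largest sample. The symbols below are notation introduced here for illustration: N_i^1(n) and N_i^0(n) are the numbers of times ad i has received reward 1 and reward 0 before round n.

```latex
\[
\theta_i(n) \sim \mathrm{Beta}\bigl(N_i^{1}(n) + 1,\; N_i^{0}(n) + 1\bigr),
\qquad
a(n) = \arg\max_{1 \le i \le d} \theta_i(n)
\]
```

After the reward of the chosen ad a(n) is observed, only that ad's counter is updated: N^1 if the reward is 1, N^0 otherwise.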
Code Implementation
Data
Ad 1  Ad 2  Ad 3  Ad 4  Ad 5  Ad 6  Ad 7  Ad 8  Ad 9  Ad 10
1     0     0     0     1     0     0     0     1     0
0     0     0     0     0     0     0     0     1     0
0     0     0     0     0     0     0     0     0     0
0     1     0     0     0     0     0     1     0     0
...
    import random

    import pandas as pd
    from matplotlib import pyplot as plt

    # Each row is one round and each column one ad; a 1 means the ad was clicked.
    dataset = pd.read_csv("Ads_CTR_Optimisation.csv")

    d = 10     # number of ads
    N = 10000  # number of rounds
    numbers_of_rewards_1 = [0] * d  # times each ad received reward 1
    numbers_of_rewards_0 = [0] * d  # times each ad received reward 0
    ads_selected = []
    total_reward = 0

    for n in range(N):
        ad = 0
        max_random = 0
        for i in range(d):
            # Sample a click probability for ad i from Beta(clicks + 1, non-clicks + 1).
            random_beta = random.betavariate(numbers_of_rewards_1[i] + 1,
                                             numbers_of_rewards_0[i] + 1)
            if random_beta > max_random:
                max_random = random_beta
                ad = i
        ads_selected.append(ad)
        # Observe the reward of the chosen ad and update only its counters.
        reward = dataset.values[n, ad]
        if reward:
            numbers_of_rewards_1[ad] += 1
        else:
            numbers_of_rewards_0[ad] += 1
        total_reward += reward

    print(total_reward)

    plt.hist(ads_selected)
    plt.title("Histogram of ads selections")
    plt.xlabel("Ads")
    plt.ylabel("Number of times each ad was selected")
    plt.show()
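The histogram shows that ad 4 accounts for roughly 90% of all selections. Note that the histogram axis uses the zero-based index from the code, so ad 4 presumably corresponds to the "Ad 5" column. Over 10,000 rounds the total number of clicks was 2603. This figure changes from run to run because the Beta samples are random, but it stays close to 2600, which is a very large improvement over the upper confidence bound algorithm.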
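To try the comparison without the CSV file, the sketch below runs both strategies on a synthetic reward table using only the standard library. The click probabilities in `click_probs`, the seed, the helper names, and the choice of the textbook UCB1 bound `sqrt(2 ln n / N_i)` are all assumptions made for illustration. The UCB variant that the comparison above refers to may use a different bound, so the totals printed here are not expected to match the 2603 reported above.

```python
import math
import random


def simulate_rewards(click_probs, n_rounds, rng):
    """Build a synthetic 0/1 reward table: one row per round, one column per ad."""
    return [[1 if rng.random() < p else 0 for p in click_probs]
            for _ in range(n_rounds)]


def thompson_sampling(rewards, rng):
    """Each round, sample every ad's Beta posterior and show the ad with the largest sample."""
    d = len(rewards[0])
    wins = [0] * d    # times the chosen ad received reward 1
    losses = [0] * d  # times the chosen ad received reward 0
    counts = [0] * d  # times each ad was shown
    total = 0
    for row in rewards:
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(d)]
        ad = max(range(d), key=samples.__getitem__)
        reward = row[ad]
        if reward:
            wins[ad] += 1
        else:
            losses[ad] += 1
        counts[ad] += 1
        total += reward
    return total, counts


def ucb1(rewards):
    """Textbook UCB1 (an assumption here): average reward plus sqrt(2 ln n / N_i)."""
    d = len(rewards[0])
    counts = [0] * d  # times each ad was shown
    sums = [0] * d    # total reward collected by each ad
    total = 0
    for n, row in enumerate(rewards, start=1):
        if n <= d:
            ad = n - 1  # show every ad once before trusting the bound
        else:
            ad = max(range(d),
                     key=lambda i: sums[i] / counts[i]
                     + math.sqrt(2 * math.log(n) / counts[i]))
        reward = row[ad]
        counts[ad] += 1
        sums[ad] += reward
        total += reward
    return total, counts


rng = random.Random(0)                   # fixed seed so a run can be repeated
click_probs = [0.05, 0.10, 0.30, 0.08]   # hypothetical click-through rates
rewards = simulate_rewards(click_probs, 5000, rng)

ts_total, ts_counts = thompson_sampling(rewards, rng)
ucb_total, ucb_counts = ucb1(rewards)

print("Thompson sampling:", ts_total, ts_counts)
print("UCB1:             ", ucb_total, ucb_counts)
```

Changing the seed changes both totals, which mirrors the run-to-run variation noted above; keeping it fixed makes a run repeatable. Both strategies see the same reward table, so any difference in the totals comes from how each one chooses ads.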
-
Thompson Sampling (TSA) vs Upper Confidence Bound (UCB)