1. Upper Confidence Bound (UCB) Algorithm
-
Principle
-
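The quantities used by the UCB code later in this post can be written compactly as follows. The notation is taken from that code's docstring, and the constant 3/2 matches what the code computes; treat this as a summary of the post's own formulas rather than a general derivation.

```latex
% N_i(n): number of times ad i was selected before round n
% R_i(n): total reward (clicks) of ad i before round n
\begin{align*}
\bar{r}_i(n) &= \frac{R_i(n)}{N_i(n)} \\
\Delta_i(n) &= \sqrt{\frac{3 \log n}{2\, N_i(n)}} \\
\mathrm{UCB}_i(n) &= \bar{r}_i(n) + \Delta_i(n) \\
\text{ad selected in round } n &= \arg\max_i \ \mathrm{UCB}_i(n)
\end{align*}
```

Because N_i(n) sits in the denominator of the second formula, an ad that has been shown only a few times has a wide interval and a high upper bound, so it keeps getting tried until its estimate tightens.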
Code implementation
Data:
Ad 1  Ad 2  Ad 3  Ad 4  Ad 5  Ad 6  Ad 7  Ad 8  Ad 9  Ad 10
1     0     0     0     1     0     0     0     1     0
0     0     0     0     0     0     0     0     1     0
0     0     0     0     0     0     0     0     0     0
0     1     0     0     0     0     0     1     0     0
...
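To make the role of this table concrete, here is a small sketch of how the scripts in this post read a reward from it: the row is the round, the column is the ad that was shown, and the cell is 1 for a click and 0 otherwise. It uses only the four sample rows shown above, copied into plain lists; the helper names `reward_for` and `clicks_per_ad` are illustrative and are not part of the original code.

```python
# The four sample rows shown above, one list per round (columns Ad 1 .. Ad 10).
sample_rows = [
    [1, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
]


def reward_for(rows, n, ad):
    """Reward for showing the ad with zero-based index `ad` in round `n` (1 = click)."""
    return rows[n][ad]


def clicks_per_ad(rows):
    """Column sums: in how many of the given rounds each ad would have been clicked."""
    return [sum(column) for column in zip(*rows)]


print(reward_for(sample_rows, 0, 4))  # round 0, column Ad 5
print(clicks_per_ad(sample_rows))
```

Note that the code indexes ads from 0, so index 4 corresponds to the column labelled Ad 5.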
Random ad selection:
    import random

    import matplotlib.pyplot as plt
    import pandas as pd

    # Importing the dataset
    dataset = pd.read_csv('Ads_CTR_Optimisation.csv')

    # Implementing Random Selection
    N = 10000
    d = 10
    ads_selected = []
    total_reward = 0
    for n in range(0, N):
        ad = random.randrange(d)
        ads_selected.append(ad)
        reward = dataset.values[n, ad]
        total_reward = total_reward + reward

    # Visualising the results
    plt.hist(ads_selected)
    plt.title('Histogram of ads selections')
    plt.xlabel('Ads')
    plt.ylabel('Number of times each ad was selected')
    plt.show()
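The histogram shows that every ad is selected roughly the same number of times. The 10,000 impressions yield only 1,247 clicks, which is low.
Upper Confidence Bound (reinforcement learning, upper confidence bound method):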
    from math import log, sqrt

    import pandas as pd
    from matplotlib import pyplot as plt

    dataset = pd.read_csv("Ads_CTR_Optimisation.csv")

    """
    1. In round n, for each ad i, we track two quantities:
       . N_i(n): the number of times ad i was selected before round n
       . R_i(n): the total reward of ad i before round n, i.e. its number of clicks
    2. From these two quantities we compute:
       . the average reward of ad i before round n: r_i(n) = R_i(n) / N_i(n)
       . the confidence interval in round n: [r_i(n) - delta_i(n), r_i(n) + delta_i(n)],
         where delta_i(n) = sqrt(3 * log(n) / (2 * N_i(n)))
    3. We select the ad i with the largest UCB (upper confidence bound),
       where the UCB is r_i(n) + delta_i(n)
    """

    d = 10
    N = 10000
    numbers_of_selections = [0] * d
    sums_of_rewards = [0] * d
    ads_selected = []
    total_reward = 0
    for n in range(N):
        ad = 0
        max_upper_bound = 0
        for i in range(d):
            if numbers_of_selections[i] > 0:
                average_reward = sums_of_rewards[i] / numbers_of_selections[i]
                # n is zero-based here, so log(n + 1) is the log(n) of the formula above
                delta_i = sqrt(3 / 2 * log(n + 1) / numbers_of_selections[i])
                upper_bound = average_reward + delta_i
            else:
                # An ad that has never been shown gets an infinite bound so it is tried first
                upper_bound = float("inf")
            if upper_bound > max_upper_bound:
                max_upper_bound = upper_bound
                ad = i
        ads_selected.append(ad)
        reward = dataset.values[n, ad]
        numbers_of_selections[ad] += 1
        sums_of_rewards[ad] += reward
        total_reward += reward

    # print(total_reward)
    # print(ads_selected)
    plt.hist(ads_selected)
    plt.title("Histogram of ads selections")
    plt.xlabel("Ads")
    plt.ylabel("Number of times each ad was selected")
    plt.show()
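The histogram shows that ad 4 (the zero-based index used in the code) is selected far more often than the others. The 10,000 impressions yield 2,178 clicks, a large improvement over the 1,247 clicks from random selection.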
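For readers without the CSV file, the same comparison can be sketched on simulated data. The block below is a minimal sketch under stated assumptions, not the post's original script: the click rates in `CLICK_RATES` are made up for illustration, and `ucb_choose` and `simulate` are hypothetical helper names. The selection rule is the one from the script above, with the bound sqrt(3 * log(n) / (2 * N_i)).

```python
import random
from math import log, sqrt

# Hypothetical click-through rates for three simulated ads (an assumption,
# not taken from Ads_CTR_Optimisation.csv).
CLICK_RATES = [0.05, 0.10, 0.30]


def ucb_choose(counts, sums, n):
    """Return the ad index with the largest upper confidence bound in round n (1-based)."""
    best_ad, best_bound = 0, -1.0
    for i, (count, total) in enumerate(zip(counts, sums)):
        if count == 0:
            return i  # an ad never shown so far is tried first
        bound = total / count + sqrt(3 * log(n) / (2 * count))
        if bound > best_bound:
            best_ad, best_bound = i, bound
    return best_ad


def simulate(click_rates, rounds, policy, seed=0):
    """Run `rounds` impressions with policy "ucb" or "random"; return (counts, total clicks)."""
    rng = random.Random(seed)
    d = len(click_rates)
    counts, sums = [0] * d, [0] * d
    for n in range(1, rounds + 1):
        if policy == "ucb":
            ad = ucb_choose(counts, sums, n)
        else:
            ad = rng.randrange(d)
        # Simulated click: 1 with probability click_rates[ad], else 0
        reward = 1 if rng.random() < click_rates[ad] else 0
        counts[ad] += 1
        sums[ad] += reward
    return counts, sum(sums)


ucb_counts, ucb_clicks = simulate(CLICK_RATES, 5000, "ucb")
random_counts, random_clicks = simulate(CLICK_RATES, 5000, "random")
print("UCB   :", ucb_counts, ucb_clicks)
print("Random:", random_counts, random_clicks)
```

With these rates, UCB should concentrate most impressions on the ad with the highest rate, while random selection spreads them evenly, which mirrors the two histograms described above.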