UCB (Upper Confidence Bound)
Slot-machine (multi-armed bandit) reward/penalty mechanism
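In the multi-armed bandit view, each ad is one slot-machine arm. Showing an ad to a user pays a reward of 1 (the user responds) or 0 (no response), and each ad has its own unknown response probability. Below is a minimal sketch of such an environment, assuming Bernoulli (0/1) rewards. The class name `BernoulliBandit` and the probabilities in the usage lines are hypothetical and are not taken from any dataset.

```python
import random


class BernoulliBandit:
    """Hypothetical environment: arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.rng = random.Random(seed)  # private RNG so runs are reproducible

    @property
    def n_arms(self):
        return len(self.probs)

    def pull(self, arm):
        """Pull one arm (show one ad) and return the 0/1 reward."""
        return 1 if self.rng.random() < self.probs[arm] else 0


# Usage: 3 hypothetical ads with made-up response rates.
bandit = BernoulliBandit([0.05, 0.12, 0.20], seed=0)
rewards = [bandit.pull(2) for _ in range(1000)]
print(sum(rewards) / len(rewards))  # empirical rate, roughly 0.20
```

The algorithm only ever sees the returned 0/1 value. It never sees `probs`, and estimating those probabilities is the whole problem.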
A simple business case:
Deliver targeted ads to users and find the best ad-placement strategy.
ad = 10 (number of ads)
consumer = 10000 (number of users; one user per round)
- Collect user feedback (1 = the user responded to the ad, 0 = no response):
Ad 1,Ad 2,Ad 3,Ad 4,Ad 5,Ad 6,Ad 7,Ad 8,Ad 9,Ad 10
consumer1 1,0,0,0,1,0,0,0,1,0
consumer2 0,0,0,0,0,0,0,0,1,0
consumer3 0,0,0,0,0,0,0,0,0,0
consumer4 0,1,0,0,0,0,0,1,0,0
consumer5 0,0,0,0,0,0,0,0,0,0
consumer6 1,1,0,0,0,0,0,0,0,0
consumer7 0,0,0,1,0,0,0,0,0,0
consumer8 1,1,0,0,1,0,0,0,0,0
... (one row per consumer, 10000 rows in total)
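The table is the reward matrix of the problem. Row n is user n, column j is ad j, and the entry is the reward earned if ad j is the one shown to that user. The small sketch below uses only the eight rows printed above. It stores the rows as lists and totals each ad's rewards in hindsight. The names `feedback`, `reward`, and `totals` are illustrative. A bandit algorithm never sees a whole row: each round it shows one ad and learns only that single entry.

```python
# The eight rows shown above (consumer1..consumer8); columns are Ad 1..Ad 10.
feedback = [
    [1, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
]


def reward(n, j):
    """Reward for showing ad j (0-based) to consumer n (0-based)."""
    return feedback[n][j]


# Column sums: how many of these eight users would have responded to each ad.
totals = [sum(row[j] for row in feedback) for j in range(10)]
print(totals)  # [3, 3, 0, 1, 2, 0, 0, 1, 2, 0]
```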
# Importing the dataset
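The notes do not show the loading code or name the data file. The sketch below assumes the feedback is stored as a CSV file with a header row `Ad 1,...,Ad 10`, followed by one 0/1 row per user and no user-name column. The function name `load_feedback` is hypothetical. It uses only the standard `csv` module, so it accepts any text file object.

```python
import csv
import io


def load_feedback(f):
    """Read a 0/1 feedback CSV from a text file object.

    Returns (ad_names, rows): ad_names from the header row, and rows as
    lists of ints, one list per user.
    """
    reader = csv.reader(f)
    ad_names = [name.strip() for name in next(reader)]
    rows = [[int(v) for v in line] for line in reader if line]  # skip blank lines
    return ad_names, rows


# Usage with a tiny hypothetical in-memory sample:
sample = io.StringIO("Ad 1,Ad 2,Ad 3\n1,0,0\n0,0,1\n")
ads, rows = load_feedback(sample)
print(ads, rows)  # ['Ad 1', 'Ad 2', 'Ad 3'] [[1, 0, 0], [0, 0, 1]]
```

For a real file, open it with `open(path, newline="")` and pass the file object. The path itself is whatever your data uses.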
# Implementing UCB
Key components:
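average_reward  // average reward of ad i so far: sums_of_rewards[i] / numbers_of_selections[i]
delta_i         // confidence-interval half-width (exploration bonus) for ad i's 0/1 (Bernoulli) reward; one common choice is sqrt(3/2 * ln(n) / N_i(n)), where N_i(n) is the number of times ad i has been selected up to round n
upper_bound     // upper confidence bound: average_reward + delta_i; each round, the ad with the largest upper_bound is selected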
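The sketch below combines the three quantities into one ad's score for a single round. It assumes the exploration term sqrt(3/2 * ln(n) / N_i). The classic UCB1 formulation uses sqrt(2 * ln(n) / N_i) instead. The function name `ucb_score` is hypothetical, and the worked numbers are made up for illustration.

```python
import math


def ucb_score(sum_of_rewards, number_of_selections, n):
    """Upper confidence bound of one ad after n rounds (n >= 1).

    average_reward = sum_of_rewards / number_of_selections
    delta_i        = sqrt(3/2 * ln(n) / number_of_selections)  (assumed constant 3/2)
    upper_bound    = average_reward + delta_i
    An ad that has never been selected gets +inf so it is tried first.
    """
    if number_of_selections == 0:
        return math.inf
    average_reward = sum_of_rewards / number_of_selections
    delta_i = math.sqrt(1.5 * math.log(n) / number_of_selections)
    return average_reward + delta_i


# Worked example after n = 100 rounds (hypothetical counts):
# ad A: 10 selections, 3 rewards  -> 0.3 + sqrt(1.5 * ln 100 / 10) ≈ 0.3 + 0.831 = 1.131
# ad B: 40 selections, 16 rewards -> 0.4 + sqrt(1.5 * ln 100 / 40) ≈ 0.4 + 0.416 = 0.816
print(ucb_score(3, 10, 100), ucb_score(16, 40, 100))
```

Ad A wins this round despite its lower average reward, because it has been tried less often and its confidence interval is wider. That is how UCB balances exploring uncertain ads against exploiting the best-looking one.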
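import math
# Assumes N, d, numbers_of_selections and sums_of_rewards (lists of d zeros) are already defined
for n in range(N):
    ad, max_upper_bound = 0, -math.inf
    for i in range(d):
        if numbers_of_selections[i] > 0:
            average_reward = sums_of_rewards[i] / numbers_of_selections[i]
            # assumed exploration constant 3/2 (UCB1 uses 2); n + 1 because n starts at 0
            delta_i = math.sqrt(3 / 2 * math.log(n + 1) / numbers_of_selections[i])
            upper_bound = average_reward + delta_i
        else:
            upper_bound = math.inf  # ad never selected yet: try it first
        if upper_bound > max_upper_bound:
            max_upper_bound, ad = upper_bound, i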
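The selection loop above stops before the reward is observed and the counters are updated. The sketch below is a complete, self-contained version of the whole procedure. It assumes the feedback table is available as a list of 0/1 rows, one per user, and uses the exploration term sqrt(3/2 * ln(n) / N_i). Each round it picks the ad with the largest upper bound, reads that single table entry as the reward, and updates the counters. The function name `run_ucb` and the usage table are hypothetical.

```python
import math


def run_ucb(feedback):
    """Run UCB over a 0/1 reward table (rows = users/rounds, columns = ads).

    Returns (ads_selected, total_reward).
    """
    N, d = len(feedback), len(feedback[0])
    numbers_of_selections = [0] * d  # N_i(n): times ad i was shown
    sums_of_rewards = [0] * d        # total reward collected by ad i
    ads_selected = []
    total_reward = 0
    for n in range(N):
        ad, max_upper_bound = 0, -math.inf
        for i in range(d):
            if numbers_of_selections[i] > 0:
                average_reward = sums_of_rewards[i] / numbers_of_selections[i]
                # n + 1 rounds have started because n is 0-based
                delta_i = math.sqrt(1.5 * math.log(n + 1) / numbers_of_selections[i])
                upper_bound = average_reward + delta_i
            else:
                upper_bound = math.inf  # untried ad: select it first
            if upper_bound > max_upper_bound:
                max_upper_bound, ad = upper_bound, i
        reward = feedback[n][ad]  # only this one entry is observed
        ads_selected.append(ad)
        numbers_of_selections[ad] += 1
        sums_of_rewards[ad] += reward
        total_reward += reward
    return ads_selected, total_reward


# Usage on a hypothetical table: ad 2 always pays 1, ads 0 and 1 never do.
table = [[0, 0, 1] for _ in range(200)]
selected, total = run_ucb(table)
print(selected[:3])  # [0, 1, 2]: the first three rounds try each ad once
print(selected.count(2), total)
```

Because an untried ad scores `math.inf`, the first d rounds show every ad once. After that, the weaker ads are revisited only occasionally, as their delta_i slowly grows with ln(n).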