Hands-On Deep Learning for Recommender Systems: Wide & Deep

This post is about combining recommender systems with deep learning. In a recommender system, memorization is what delivers accuracy, while generalization is what delivers novelty; Wide & Deep combines the two.

1. Memorization and Generalization

     The idea is borrowed from how humans learn. The brain can memorize everyday observations (sparrows can fly, pigeons can fly) and then generalize that knowledge to things it has never seen before (animals with wings can fly).

     But generalized rules are not always accurate and sometimes fail (can every animal with wings really fly?). That is where memorization comes back in: memorized exceptions can correct the generalized rules (a penguin has wings, but it cannot fly).

     That is the origin, and the meaning, of Memorization and Generalization.

2. The Wide & Deep Model

     Concretely, the Wide part is a generalized linear model and the Deep part is a deep neural network (DNN). The wide linear model is responsible for memorization; the deep network is responsible for generalization. The two are trained jointly, and the final prediction is a weighted sum of their outputs.
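     In the original paper, the joint prediction takes the form

     P(Y = 1 | x) = σ( w_wide^T [x, φ(x)] + w_deep^T a^(l_f) + b )

     where φ(x) are the cross-product transformations fed to the wide part, a^(l_f) is the activation of the last hidden layer of the deep network, σ is the sigmoid, and the weights of both parts are optimized jointly with a logistic loss.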

Wide part: optimized with FTRL (follow-the-regularized-leader, with L1 regularization)

Deep part: optimized with AdaGrad
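These happen to be the defaults of tf.estimator.DNNLinearCombinedClassifier in TensorFlow 1.x, but they can also be set explicitly. A minimal sketch follows; the learning rates are illustrative and demo_col is a stand-in feature column (the real columns are built later in this post):

import tensorflow as tf

# Stand-in column just to make the sketch self-contained.
demo_col = tf.feature_column.numeric_column('ps_ind_01')

# FTRL (with L1 regularization) for the wide/linear part,
# AdaGrad for the deep/DNN part -- mirroring the paper's setup.
wd_demo = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=[demo_col],
    linear_optimizer=tf.train.FtrlOptimizer(
        learning_rate=0.05, l1_regularization_strength=0.001),
    dnn_feature_columns=[demo_col],
    dnn_optimizer=tf.train.AdagradOptimizer(learning_rate=0.05),
    dnn_hidden_units=[100, 50])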

The full code can be obtained by following the WeChat official account 有酒有风 and replying "wd"; the key pieces are walked through below.

import pandas as pd
import tensorflow as tf

# Column names of the input CSV files; 'target' is the label.
_CSV_COLUMNS = ['target', 'ps_ind_01', 'ps_ind_02_cat', 'ps_ind_03',
                'ps_ind_04_cat', 'ps_ind_05_cat', 'ps_ind_06_bin', 'ps_ind_07_bin',
                'ps_ind_16_bin', 'ps_ind_17_bin', 'ps_ind_18_bin', 'ps_reg_01',
                'ps_car_03_cat', 'ps_car_04_cat', 'ps_car_05_cat', 'ps_car_06_cat',
                'ps_calc_10', 'ps_calc_11', 'ps_calc_12', 'ps_calc_13', 'ps_calc_14']

def input_fn(data_file, num_epochs, shuffle):
    """Reads a CSV file into a DataFrame and returns an estimator input function."""
    df_data = pd.read_csv(
        tf.gfile.Open(data_file),
        names=_CSV_COLUMNS,
        skipinitialspace=True,
        engine="python",
        skiprows=1)          # skip the header row, since names= is given explicitly
    labels = df_data["target"]
    df_data = df_data.drop("target", axis=1)

    # pandas_input_fn itself returns a callable, which is what
    # Estimator.train/evaluate/predict expect as their input_fn.
    return tf.estimator.inputs.pandas_input_fn(
        x=df_data,
        y=labels,
        batch_size=100,
        num_epochs=num_epochs,
        shuffle=shuffle,
        num_threads=5)
# Use tf.feature_column to describe the features: continuous (numeric) columns first, then categorical ones.
# In this example the numeric columns feed the deep part, while the categorical columns feed both the wide and the deep part.
ps_ind_01 = tf.feature_column.numeric_column('ps_ind_01')
ps_ind_03 = tf.feature_column.numeric_column('ps_ind_03')
ps_reg_01 = tf.feature_column.numeric_column('ps_reg_01')
ps_calc_10 = tf.feature_column.numeric_column('ps_calc_10')
ps_calc_11 = tf.feature_column.numeric_column('ps_calc_11')
ps_calc_12 = tf.feature_column.numeric_column('ps_calc_12')
ps_calc_13 = tf.feature_column.numeric_column('ps_calc_13')
ps_calc_14 = tf.feature_column.numeric_column('ps_calc_14')

# Categorical (sparse) features
ps_ind_06_bin = tf.feature_column.categorical_column_with_identity(key='ps_ind_06_bin', num_buckets=2)
ps_ind_07_bin = tf.feature_column.categorical_column_with_identity(key='ps_ind_07_bin', num_buckets=2)
ps_ind_16_bin = tf.feature_column.categorical_column_with_identity(key='ps_ind_16_bin', num_buckets=2)
ps_ind_17_bin = tf.feature_column.categorical_column_with_identity(key='ps_ind_17_bin', num_buckets=2)
ps_ind_18_bin = tf.feature_column.categorical_column_with_identity(key='ps_ind_18_bin', num_buckets=2)

ps_ind_02_cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key='ps_ind_02_cat', vocabulary_list=[2, 1, 4, 3, -1])
ps_ind_04_cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key='ps_ind_04_cat', vocabulary_list=[1, 0, -1])
ps_ind_05_cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key='ps_ind_05_cat', vocabulary_list=[0, 1, 4, 3, 6, 5, -1, 2])
ps_car_03_cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key='ps_car_03_cat', vocabulary_list=[-1, 0, 1])
ps_car_04_cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key='ps_car_04_cat', vocabulary_list=[0, 1, 8, 9, 2, 6, 3, 7, 4, 5])
ps_car_05_cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key='ps_car_05_cat', vocabulary_list=[1, -1, 0])
ps_car_06_cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key='ps_car_06_cat', vocabulary_list=[4, 11, 14, 13, 6, 15, 3, 0, 1, 10, 12, 9, 17, 7, 8, 5, 2, 16])

 

# Cross features (feature crosses of the categorical columns) for the wide part;
# each crossed value is hashed into one of hash_bucket_size buckets.
crossed_columns = [
    tf.feature_column.crossed_column(
        ['ps_ind_02_cat', 'ps_car_03_cat'], hash_bucket_size=100),
    tf.feature_column.crossed_column(
        ['ps_ind_02_cat', 'ps_car_04_cat', 'ps_car_05_cat'], hash_bucket_size=100),
    tf.feature_column.crossed_column(
        ['ps_car_03_cat', 'ps_car_06_cat'], hash_bucket_size=100)
]

# During training the linear model accepts every kind of feature column, while the DNN part
# only accepts dense columns; other column types must first be wrapped in an
# indicator_column or an embedding_column.
raw_input_col = [ ps_ind_06_bin ,
      ps_ind_07_bin ,
      ps_ind_16_bin ,
      ps_ind_17_bin ,
      ps_ind_18_bin ,
      ps_ind_02_cat ,
      ps_ind_04_cat ,
      ps_ind_05_cat ,
      ps_car_03_cat ,
      ps_car_04_cat ,
      ps_car_05_cat ,
      ps_car_06_cat
    ]

deep_columns = [
    ps_ind_01 ,
    ps_ind_03 ,
    ps_reg_01 ,
    ps_calc_10 ,
    ps_calc_11 ,
    ps_calc_12 ,
    ps_calc_13 ,
    ps_calc_14 ,
    tf.feature_column.indicator_column(ps_ind_06_bin),
    tf.feature_column.indicator_column(ps_ind_07_bin),
    tf.feature_column.indicator_column(ps_ind_16_bin),
    tf.feature_column.indicator_column(ps_ind_17_bin),
    tf.feature_column.indicator_column(ps_ind_18_bin),
    tf.feature_column.indicator_column(ps_ind_02_cat),
    tf.feature_column.indicator_column(ps_ind_04_cat),
    tf.feature_column.indicator_column(ps_ind_05_cat),
    tf.feature_column.indicator_column(ps_car_03_cat),
    tf.feature_column.indicator_column(ps_car_04_cat),
    tf.feature_column.indicator_column(ps_car_05_cat),
    tf.feature_column.embedding_column(ps_car_06_cat, dimension=5)  # embedding_column shown here for illustration; prefer it when the vocabulary is large
]
# The Wide & Deep model: the linear part gets the raw categorical columns plus the crosses,
# the DNN part gets the dense columns defined above.
model_dir = './DeepLearning/wide_deep'
model = tf.estimator.DNNLinearCombinedClassifier(
    model_dir=model_dir,
    linear_feature_columns=raw_input_col + crossed_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])

train_epochs = 6
epochs_per_eval = 2
train_file = r'F:\MachineLearning\CTR\train_s.csv'
test_file = r'F:\MachineLearning\CTR\tests.csv'

# Alternate between training and evaluation; shuffle only the training data.
for _ in range(train_epochs // epochs_per_eval):
    model.train(input_fn=input_fn(train_file, epochs_per_eval, True))
    results = model.evaluate(input_fn=input_fn(test_file, 1, False))
    for key in sorted(results):
        print("%s: %s" % (key, results[key]))
