basic data exploration for JData
user data
- user data row number: 103616
- user_id distinct number: 103616
- user age distribution:
index | user_age | user_cnt |
---|---|---|
0 | 0 | 12803 |
1 | 1 | 6 |
2 | 2 | 7999 |
3 | 3 | 46525 |
4 | 4 | 30828 |
5 | 5 | 3407 |
6 | 6 | 2048 |
- user sex distribution:
index | sex | user_cnt |
---|---|---|
0 | 0 | 45547 |
1 | 1 | 7585 |
2 | 2 | 50484 |
- user_lv_cd distribution:
index | user_lv_cd | user_cnt |
---|---|---|
0 | 1 | 2328 |
1 | 2 | 7519 |
2 | 3 | 21689 |
3 | 4 | 32205 |
4 | 5 | 39875 |
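
The distribution tables above are per-value counts of distinct users. A minimal sketch of how such a table might be produced with the Python standard library follows; the `value_distribution` helper and the toy rows are hypothetical illustrations, not the original exploration script.

```python
from collections import Counter


def value_distribution(rows, key, id_key="user_id"):
    """Return sorted (value, distinct_id_count) pairs for `key`.

    Hypothetical helper: `rows` is an iterable of dicts, one per record.
    """
    seen = set()
    counts = Counter()
    for row in rows:
        pair = (row[key], row[id_key])
        if pair not in seen:  # count each id only once per value
            seen.add(pair)
            counts[row[key]] += 1
    return sorted(counts.items())


# Toy rows standing in for the real user table (made-up values).
toy_users = [
    {"user_id": 1, "sex": 0},
    {"user_id": 2, "sex": 2},
    {"user_id": 3, "sex": 2},
    {"user_id": 4, "sex": 1},
    {"user_id": 5, "sex": 0},
]

sex_distribution = value_distribution(toy_users, "sex")

# Print in the same "index | value | count |" layout used in these notes.
for index, (sex, user_cnt) in enumerate(sex_distribution):
    print(f"{index} | {sex} | {user_cnt} |")
```

Counting distinct ids rather than rows keeps the result stable if a table ever contains duplicate records.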
product data
- product data row number: 24187
- sku_id distinct number: 24187
- distinct attr1 values: 4
index | attr1 | product_cnt |
---|---|---|
0 | -1 | 1701 |
1 | 1 | 4760 |
2 | 2 | 3582 |
3 | 3 | 14144 |
- distinct attr2 values: 3
index | attr2 | product_cnt |
---|---|---|
0 | -1 | 4050 |
1 | 1 | 13513 |
2 | 2 | 6624 |
- distinct attr3 values: 3
index | attr3 | product_cnt |
---|---|---|
0 | -1 | 3815 |
1 | 1 | 8394 |
2 | 2 | 11978 |
- distinct cate values: 1
index | cate | product_cnt |
---|---|---|
0 | 8 | 24187 |
- distinct brand values: 102
comment data
- comment data row number: 558552
- sku_id distinct number: 46546
- has_bad_comment distribution:
index | has_bad_comment | comment_cnt |
---|---|---|
0 | 0 | 292978 |
1 | 1 | 265574 |
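
One quick sanity check on the user, product, and comment figures above is that each distribution sums to the row number of its table, which would mean no value is missing from the listing. The sketch below copies the reported counts; the `check_totals` helper is a hypothetical name introduced only for illustration.

```python
# Counts copied from the tables above: (row number, per-value counts).
reported = {
    "user_age": (103616, [12803, 6, 7999, 46525, 30828, 3407, 2048]),
    "sex": (103616, [45547, 7585, 50484]),
    "user_lv_cd": (103616, [2328, 7519, 21689, 32205, 39875]),
    "attr1": (24187, [1701, 4760, 3582, 14144]),
    "attr2": (24187, [4050, 13513, 6624]),
    "attr3": (24187, [3815, 8394, 11978]),
    "cate": (24187, [24187]),
    "has_bad_comment": (558552, [292978, 265574]),
}


def check_totals(tables):
    """Map each table name to True when its counts sum to the row number."""
    return {
        name: sum(counts) == total
        for name, (total, counts) in tables.items()
    }


totals_ok = check_totals(reported)
for name, ok in totals_ok.items():
    print(f"{name}: {'ok' if ok else 'mismatch'}")
```

With the figures listed here, every check passes.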
action data
- action data row number: 79597328
- action data user_id number: 103565
- action data sku_id number: 31485
- action data sku_id in product number: 4284
- any sku_id with more than one cate or brand: No
- action data cate=8 distinct sku_id number: 4284
- action data cate=8 sku_id not in product number: 0
- 20160401 last 5 day cate=8 buy distinct user_id number: 811
- 20160401 last 5 day cate=8 buy distinct
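
The sku_id overlap figures in this section (sku_id in product, cate=8 sku_id not in product) are set comparisons between the action log and the product table. A minimal sketch under assumed record shapes follows; the `sku_overlap` helper, the result keys, and the toy data are hypothetical.

```python
def sku_overlap(action_rows, product_skus, cate=8):
    """Summarize how action sku_ids relate to the product table.

    Hypothetical helper: `action_rows` is an iterable of dicts with
    "sku_id" and "cate" keys; `product_skus` is a set of sku_ids.
    """
    action_skus = {row["sku_id"] for row in action_rows}
    cate_skus = {row["sku_id"] for row in action_rows if row["cate"] == cate}
    return {
        "action_sku_cnt": len(action_skus),
        "action_sku_in_product": len(action_skus & product_skus),
        "cate_sku_cnt": len(cate_skus),
        "cate_sku_not_in_product": len(cate_skus - product_skus),
    }


# Toy action log and product table (made-up values).
toy_actions = [
    {"user_id": 1, "sku_id": 10, "cate": 8},
    {"user_id": 1, "sku_id": 10, "cate": 8},  # repeated action, same sku
    {"user_id": 2, "sku_id": 11, "cate": 8},
    {"user_id": 3, "sku_id": 20, "cate": 4},
]
toy_product_skus = {10, 11, 12}

overlap = sku_overlap(toy_actions, toy_product_skus)
for name, value in overlap.items():
    print(f"{name}: {value}")
```

Building sets first means repeated actions on the same sku_id do not inflate the counts.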
number of training samples
without sliding-window sample extraction
index | label | user_id |
---|---|---|
0 | 0 | 102156 |
1 | 1 | 1460 |
- offline test samples:
index | label | user_id |
---|---|---|
0 | 0 | 102404 |
1 | 1 | 1212 |
- online train samples:
index | label | user_id |
---|---|---|
0 | 0 | 102404 |
1 | 1 | 1212 |
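
The label tables above show a strong class imbalance: positives make up roughly 1.4% of users in the first table and roughly 1.2% in the other two. The sketch below computes the positive rate from the reported counts; the dictionary keys and the `positive_rate` helper are hypothetical names chosen for illustration.

```python
# Label counts copied from the tables above (label -> number of users).
label_counts = {
    "no_sliding_window": {0: 102156, 1: 1460},
    "offline_test": {0: 102404, 1: 1212},
    "online_train": {0: 102404, 1: 1212},
}


def positive_rate(counts):
    """Fraction of samples whose label is 1."""
    total = counts[0] + counts[1]
    return counts[1] / total


rates = {name: positive_rate(counts) for name, counts in label_counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2%} positive")
```

A rate this low usually matters when choosing evaluation metrics or deciding whether to resample negatives.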