As usual, we start with the training set, the test set, and the submission template.
Competition page:
https://www.kaggle.com/c/nlp-getting-started/overview/description
Initial data exploration and visualization
import numpy as np
import pandas as pd
from sklearn import feature_extraction, linear_model, model_selection, preprocessing
train = pd.read_csv('path/train.csv')  # 'path' is a placeholder for your local data directory
test = pd.read_csv('path/test.csv')
Taking the keyword column as an example, we look at how its different values relate to the prediction target.
# Split the training set by target and count how often each keyword appears in each class
target1 = train.keyword[train.target == 1].value_counts()
target0 = train.keyword[train.target == 0].value_counts()
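The two per-class counts are easier to compare side by side. Below is a minimal sketch of one way to do that. It assumes the data has `keyword` and `target` columns as used above. The `toy` frame, the `keyword_summary` helper, and the `positive_rate` column are hypothetical names introduced only for illustration, and `pd.crosstab` is swapped in for the two separate `value_counts()` calls.

```python
import pandas as pd

# Hypothetical toy data standing in for train.csv (only the keyword and target columns).
toy = pd.DataFrame({
    "keyword": ["fire", "fire", "fire", "flood", "flood", "crush", "crush", None],
    "target":  [1, 1, 0, 1, 0, 0, 0, 1],
})

def keyword_summary(df):
    """Count each keyword per class and compute the share of rows with target == 1."""
    # Rows without a keyword cannot be grouped, so drop them explicitly.
    df = df.dropna(subset=["keyword"])
    # One row per keyword, one column per target value.
    counts = pd.crosstab(df["keyword"], df["target"])
    # Make sure both classes are present even if one never occurs.
    counts = counts.reindex(columns=[0, 1], fill_value=0)
    counts.columns = ["target0", "target1"]
    counts["total"] = counts["target0"] + counts["target1"]
    counts["positive_rate"] = counts["target1"] / counts["total"]
    return counts.sort_values("positive_rate", ascending=False)

summary = keyword_summary(toy)
print(summary)
```

On the real data, the same helper could be called as `keyword_summary(train)`. Keywords with a `positive_rate` close to 0 or 1 are the ones most strongly associated with one class, although rates based on a small `total` should be read with caution.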