1 Corpus analysis: binary sentiment classification (good or bad)
import seaborn as sns  # statistical plots, used here to count samples per label
import pandas as pd  # loading the data
import matplotlib.pyplot as plt  # plotting
train_data = pd.read_csv(r"F:\Data\SST-2\train.tsv", sep="\t")  # read the training and validation TSV files
valid_data = pd.read_csv(r"F:\Data\SST-2\dev.tsv", sep="\t")
plt.style.use('fivethirtyeight')  # plot style
sns.countplot(x="label", data=train_data)  # bar chart of the number of samples per label (training set)
plt.title("train_data")
plt.show()
sns.countplot(x="label", data=valid_data)  # same count for the validation set
plt.title("valid_data")
plt.show()
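The bar charts can be cross-checked numerically. The sketch below parses a few made-up rows laid out like the TSV files (tab-separated, with the `sentence` and `label` columns the code above relies on), then uses `value_counts` to get the count and share of each label. The rows are invented for illustration; with the real data, `train_data["label"]` would take the place of `demo["label"]`.

```python
import io

import pandas as pd

# Made-up rows in the tab-separated layout the code above expects
sample_tsv = (
    "sentence\tlabel\n"
    "a gorgeous , witty film\t1\n"
    "dull and lifeless\t0\n"
    "one of the year 's best\t1\n"
    "a charming journey\t1\n"
)

# Same read_csv call as for train.tsv / dev.tsv, but from an in-memory buffer
demo = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

counts = demo["label"].value_counts().sort_index()                # samples per label
ratios = demo["label"].value_counts(normalize=True).sort_index()  # share of each label

print(counts.to_dict())  # {0: 1, 1: 3}
print(ratios.to_dict())  # {0: 0.25, 1: 0.75}
```

A ratio far from 0.5/0.5 is what the 1:1 remark below is about.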
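We generally aim for a 1:1 ratio between the two labels; for this dataset, some data augmentation could be applied to bring it closer to that.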
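The note only says augmentation is possible and does not name a method. One simple way to reach a 1:1 ratio is random oversampling: duplicating randomly chosen rows of the smaller class. This is a different technique from text augmentation proper (it adds no new sentences), and the function name `oversample_to_balance` is hypothetical. A minimal sketch, meant for the training split only so the dev set keeps its real distribution:

```python
import pandas as pd


def oversample_to_balance(df, label_col="label", seed=0):
    """Hypothetical helper: randomly duplicate rows of smaller classes
    until every class has as many rows as the largest one."""
    counts = df[label_col].value_counts()
    target = counts.max()
    parts = []
    for label, n in counts.items():
        group = df[df[label_col] == label]
        if n < target:
            # Draw the missing rows with replacement from the same class
            extra = group.sample(n=target - n, replace=True, random_state=seed)
            group = pd.concat([group, extra])
        parts.append(group)
    # Shuffle so the classes are interleaved, then renumber the index
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


demo = pd.DataFrame({"sentence": ["good", "great", "fine", "bad"], "label": [1, 1, 1, 0]})
balanced = oversample_to_balance(demo)
print(balanced["label"].value_counts().sort_index().to_dict())  # {0: 3, 1: 3}
```

Because the duplicates are exact copies, a model can overfit to them; genuine augmentation (e.g. paraphrasing) avoids that at the cost of more effort.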
Seaborn (sns) tutorial: https://blog.csdn.net/weixin_44322234/article/details/115129289?ops_request_misc=%257B%2522request%255Fid%2522%253A%252261742507-288E-40F1-A132-58E51DC2CFAC%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=61742507-288E-40F1-A132-58E51DC2CFAC&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~sobaiduend~default-2-115129289-null-null.142^v100^pc_search_result_base3&utm_term=python%20sns&spm=1018.2226.3001.4187
train_data["sentence_length"] = list(map(lambda x: len(x), train_data["sentence"]))  # length of each sentence in characters; on lambda usage see https://blog.csdn.net/PY0312/article/details/88956795
# map combined with lambda: list(map(lambda x: x ** 2, [1, 2, 3, 4, 5])) returns [1, 4, 9, 16, 25]
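A runnable version of the `map` + `lambda` pattern above, with a few made-up sentences standing in for `train_data["sentence"]`. It shows that `len(x)` counts characters rather than words, and that the pandas string method `.str.len()` gives the same character lengths without an explicit lambda. The word count via `len(x.split())` is only an illustration of an alternative length measure, not what the original code computes.

```python
import pandas as pd

# map applies the lambda to each element; in Python 3 it returns an iterator, hence list()
squares = list(map(lambda x: x ** 2, [1, 2, 3, 4, 5]))
print(squares)  # [1, 4, 9, 16, 25]

# Made-up sentences standing in for train_data["sentence"]
sentences = pd.Series(["it 's a charming journey", "dull", "not bad at all"])

via_map = list(map(lambda x: len(x), sentences))  # same pattern as sentence_length above
via_str = sentences.str.len().tolist()            # vectorised pandas equivalent
word_counts = list(map(lambda x: len(x.split()), sentences))  # illustrative word-based length

print(via_map)      # [24, 4, 14]
print(via_str)      # [24, 4, 14]
print(word_counts)  # [5, 1, 4]
```

`map(len, sentences)` would also work, since `len` is already a one-argument function.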
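sns.countplot(x="sentence_length", data=train_data)  # first plot: number of sentences at each length
plt.xticks([])  # hide the x tick labels, which would be crowded with one bar per distinct length
plt.show()
sns.displot(train_data["sentence_length"])  # second plot: overall distribution of sentence lengths
plt.yticks([])  # hide the y tick labels
plt.show()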
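The length column can also be summarised numerically. A common use of such a summary is picking a maximum sequence length for padding or truncation; the 90th-percentile cutoff below is an assumption for illustration, not something the original notes specify, and the lengths are made up in place of `train_data["sentence_length"]`.

```python
import pandas as pd

# Made-up character lengths standing in for train_data["sentence_length"]
lengths = pd.Series([12, 25, 40, 8, 60, 33, 19, 27, 45, 90], name="sentence_length")

summary = lengths.describe()  # count, mean, std, min, quartiles, max
print(summary)

# Hypothetical cutoff: the length at or below which 90% of sentences fall
p90 = lengths.quantile(0.9)      # linear interpolation between sorted values
coverage = (lengths <= p90).mean()  # fraction of sentences not truncated at that cutoff
print(p90, coverage)  # 63.0 0.9
```

`lengths.value_counts().sort_index()` gives the same per-length counts that the countplot draws.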