Reading a file:
import pandas as pd
f = pd.read_csv('filename', encoding='gb2312')
Selecting only certain columns from the loaded data:
data = f.loc[:, ["column_name_1", "column_name_2"]]
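As a possible alternative to loading everything and then slicing, read_csv can be told to parse only the wanted columns through its usecols parameter. The sketch below assumes a tiny in-memory CSV (the csv_text string and its column names are made up for illustration) in place of a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "a,b,c\n60ml,1,x\n250ml,2,y\n"

# usecols restricts parsing to the listed columns; column c is never loaded.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["a", "b"])
print(subset.columns.tolist())
```

For a real file, the io.StringIO(...) argument would be replaced by the file path, together with the encoding argument shown above.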
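Suppose data has the columns {"a", "b", "c"}.
If column a holds values such as {60ml, 250ml, 250ml, 60ml, 250ml, 250ml, 60ml, 60ml, 250ml, 250ml, 60ml}
and only the numbers {60, 250, 250, 60, 250, 250, 60, 60, 250, 250, 60} are wanted, this can be written directly as:
data["a"] = data["a"].str.extract(r'(\d+)', expand=False)
data["a"] = data["a"].astype(int)  # astype returns a new Series, so assign it back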
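The integer case can be tried end to end with a small made-up frame. This is only a sketch: the sample values and the messy series are assumptions added here, and the second half shows one way to cope with rows that contain no digits, where astype(int) would fail because extract returns a missing value for them.

```python
import pandas as pd

# Sample data mirroring the values described above.
data = pd.DataFrame({"a": ["60ml", "250ml", "250ml", "60ml"]})

# Capture the first run of digits in each cell, then convert to integers.
digits = data["a"].str.extract(r"(\d+)", expand=False)
data["a"] = digits.astype(int)
print(data["a"].tolist())

# Hypothetical messy column: some cells have no digits or are missing.
messy = pd.Series(["60ml", "unknown", None])
# to_numeric with errors="coerce" keeps unmatched rows as NaN instead of raising.
parsed = pd.to_numeric(messy.str.extract(r"(\d+)", expand=False), errors="coerce")
print(parsed.isna().sum())
```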
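If column a contains decimals, such as {8.86℃, 8.86℃, 8.86℃, 8.86℃, 8.86℃}, use the pattern below. If the values can also be negative, change the regex inside the capture group to -?\d+(?:\.\d+)? (note the escaped dot; an unescaped . would match any character).
data["a"] = data["a"].str.extract(r'(\d+(?:\.\d+)?)', expand=False)
data["a"] = data["a"].astype(float)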
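The signed variant of the pattern can be checked on a few made-up temperature readings. The temps frame and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical readings: a decimal, a negative decimal, and a plain integer.
temps = pd.DataFrame({"a": ["8.86℃", "-3.5℃", "12℃"]})

# Optional minus sign, digits, then an optional fractional part.
pattern = r"(-?\d+(?:\.\d+)?)"
temps["a"] = temps["a"].str.extract(pattern, expand=False).astype(float)
print(temps["a"].tolist())
```

One design point to keep in mind: with the leading -? the pattern also treats a hyphen directly in front of a number as a sign, so a cell such as "A-5" would be read as -5.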
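If a column holds percentages, such as {55%, 63%, 72%, 52%, 72%}:
data["a"] = data["a"].str.extract(r'(\d+(?:\.\d+)?)', expand=False)
data["a"] = data["a"].astype(float) * 0.01  # assign to the column itself, not to the astype call
This gives {0.55, 0.63, 0.72, 0.52, 0.72}, up to floating-point rounding.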
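The percentage conversion can be sketched the same way. The rates frame is made up for illustration, and dividing by 100 is swapped in here for multiplying by 0.01, since 0.01 has no exact binary representation and the division tends to give cleaner results.

```python
import pandas as pd

# Sample percentages mirroring the values described above.
rates = pd.DataFrame({"a": ["55%", "63%", "72%", "52%", "72%"]})

# Strip the percent sign by capturing only the numeric part.
number = rates["a"].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)

# Convert from percent to a fraction.
rates["a"] = number / 100
print(rates["a"].tolist())
```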
References:
http://www.voidcn.com/article/p-svajvhlh-btn.html
https://zhidao.baidu.com/question/141474539847665805.html