Notes recorded here so I don't forget.
Contents:
- Read a CSV
- Filter
- Sort
- Write a CSV (to fix garbled Chinese characters, pass encoding='utf_8_sig')
1. Read a CSV
import pandas as pd
file = pd.read_csv(r'D:\projects\PycharmProjects\final_wangwei\final_news_all.csv', usecols=['entity_id', 'post_title','publish_year','publish_month'])
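A small self-contained sketch of what usecols does. The CSV text below is made up for illustration (including the extra column), and it is read from memory instead of from the real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for final_news_all.csv.
csv_text = (
    "entity_id,post_title,publish_year,publish_month,extra\n"
    "1,Title A,2018,5,x\n"
    "2,Title B,2019,3,y\n"
)

# usecols keeps only the listed columns; "extra" is never loaded.
file = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["entity_id", "post_title", "publish_year", "publish_month"],
)
print(file.columns.tolist())
```

Loading only the needed columns saves memory when the file is large.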
2. Filter rows by column values
news=file[((file['publish_year']==2018) & (file['publish_month']>4))|((file['publish_year']==2019) & (file['publish_month']<5))]
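The same filter is easier to read when each condition gets its own name. This is a minimal sketch on made-up rows; the names in_2018 and in_2019 are only for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real news data.
file = pd.DataFrame({
    "entity_id": [1, 2, 3, 4, 5],
    "publish_year": [2018, 2018, 2019, 2019, 2017],
    "publish_month": [3, 9, 2, 8, 12],
})

# & and | bind tighter than == and >, so each comparison needs parentheses.
in_2018 = (file["publish_year"] == 2018) & (file["publish_month"] > 4)
in_2019 = (file["publish_year"] == 2019) & (file["publish_month"] < 5)

# Keep rows from May 2018 through April 2019.
news = file[in_2018 | in_2019]
print(news["entity_id"].tolist())
```

Using Python's and / or here instead of & / | raises an error, because a boolean Series has no single truth value.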
3. Sort by a column (ascending)
news=news.sort_values('publish_month',ascending=True)
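Because the filtered rows span two years, sorting by month alone interleaves 2018 and 2019. A possible variant, shown on made-up rows, sorts by year first and then by month:

```python
import pandas as pd

# Hypothetical sample rows covering two years.
news = pd.DataFrame({
    "entity_id": [10, 11, 12, 13],
    "publish_year": [2019, 2018, 2018, 2019],
    "publish_month": [2, 9, 5, 1],
})

# A list of column names sorts by the first column, then breaks ties with the next.
news = news.sort_values(["publish_year", "publish_month"], ascending=True)

# Renumber the rows so the index matches the new order.
news = news.reset_index(drop=True)
print(news)
```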
4. Extract a column from a pandas.core.frame.DataFrame and convert it to a list
news['entity_id'].values.tolist()
5. Count values in a column
news['publish_month'].value_counts()
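value_counts() orders its result by count, largest first. For a month-by-month view, sorting the result by its index may be more useful. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical month values.
months = pd.Series([5, 6, 5, 12, 6, 5], name="publish_month")

# Default order: most frequent value first.
counts = months.value_counts()

# Reorder by the month number instead of by frequency.
by_month = counts.sort_index()
print(by_month)
```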
6. Read txt files and concat
df_empty = pd.DataFrame(columns=['doc'])
data1=pd.read_csv('linshi/5079161.txt',names=["doc"])
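# data2, data3, data4 are read from other txt files the same way as data1
df=pd.concat([df_empty,data1,data2,data3,data4],axis=0) # stack vertically (row-wise)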
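With many files, the reads can go in a loop, and the empty starting frame is not needed. The sketch below uses in-memory text as a stand-in for the txt files; the contents are made up, and real code would pass file paths instead.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the txt files.
sources = [
    io.StringIO("first line\nsecond line\n"),
    io.StringIO("third line\n"),
]

# Read each source into a one-column frame named "doc".
frames = [pd.read_csv(src, names=["doc"]) for src in sources]

# axis=0 stacks the frames vertically; ignore_index=True renumbers rows from 0.
df = pd.concat(frames, axis=0, ignore_index=True)
print(df)
```

Without ignore_index=True, each frame keeps its own 0-based index, so the combined frame has duplicate index labels.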
7. Get the current time
import time
print(time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())))
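An equivalent using the standard datetime module, which some may find shorter. The variable names fmt and now_text are only for illustration.

```python
from datetime import datetime

# Same format string as with time.strftime.
fmt = "%Y-%m-%d %H:%M:%S"

# datetime.now() returns the current local time.
now_text = datetime.now().strftime(fmt)
print(now_text)
```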
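8. Create a DataFrame from lists (one column per list)
# DataFrame.from_items was removed in pandas 1.0; a dict of lists builds the same frame
sdp=pd.DataFrame({'months':months,'shoucangs':shoucangs,'dianzans':dianzans,'pingluns':pingluns})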
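The contents list mentions writing a CSV with encoding='utf_8_sig' to avoid garbled Chinese text. A minimal sketch of that step follows; the sample data and the output path are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data containing Chinese text.
df = pd.DataFrame({
    "post_title": ["新闻标题一", "新闻标题二"],
    "publish_month": [5, 6],
})

# Write into a temporary directory so the example leaves no stray files behind.
out_path = os.path.join(tempfile.mkdtemp(), "news_out.csv")

# utf_8_sig puts a UTF-8 byte order mark (BOM) at the start of the file,
# which lets Excel detect the encoding and show Chinese characters correctly.
df.to_csv(out_path, index=False, encoding="utf_8_sig")

# Reading with the same encoding strips the BOM again.
restored = pd.read_csv(out_path, encoding="utf_8_sig")
print(restored)
```

index=False keeps the row index out of the file, so the CSV reads back with the same columns.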