A roundup of pandas data-processing methods

Latest recommended article published on 2024-07-19 22:30:00

chuntingting

Tags: data analysis, pandas, python

Article link: https://blog.csdn.net/chuntingting/article/details/109266341

1. How to map the values "客户" (customer) and "坐席" (agent) in a table's role column to 1 and 0

data['role']=data['role'].apply(lambda x:1 if x =='客户' else 0)
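
The lambda above turns every value that is not "客户" into 0, including typos and missing values. A minimal alternative sketch, assuming only the two labels are expected: Series.map with an explicit dict, where any value absent from the dict becomes NaN and is therefore easy to spot. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the 'role' column matters here
data = pd.DataFrame({"role": ["客户", "坐席", "客户", "坐席"]})

# Explicit mapping: both codes are visible, unexpected values become NaN
role_codes = {"客户": 1, "坐席": 0}
data["role"] = data["role"].map(role_codes)
print(data["role"].tolist())  # [1, 0, 1, 0]
```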

2. Processing a table row by row, e.g. extracting the keywords from each sentence

Use apply with axis=1 to run a function on every row:

data['keywords'] = data.apply(lambda x: get_keyword(x['role'], x['content']), axis=1)  # get_keyword is the author's own helper (not shown); axis=1 passes one row at a time
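
The post does not show get_keyword, so here is a self-contained sketch with a hypothetical stand-in: it looks up words from a small, made-up KEYWORDS list and skips agent rows (role == 0). The point is the apply/axis=1 pattern, where each x is one row and x['role'], x['content'] are that row's values; the keyword logic itself is only an assumption.

```python
import pandas as pd

# Hypothetical keyword vocabulary; the original get_keyword is not shown in the post
KEYWORDS = ["refund", "delivery", "invoice"]

def get_keyword(role, content):
    """Hypothetical stand-in: comma-joined keywords found in content,
    or an empty string for agent rows (role == 0)."""
    if role == 0:
        return ""
    return ",".join(k for k in KEYWORDS if k in content)

data = pd.DataFrame({
    "role": [1, 0, 1],
    "content": ["I want a refund", "the delivery is late", "send the invoice and refund"],
})

# axis=1: the lambda receives one row (a Series) at a time
data["keywords"] = data.apply(lambda x: get_keyword(x["role"], x["content"]), axis=1)
print(data["keywords"].tolist())  # ['refund', '', 'refund,invoice']
```

Returning a string rather than a list keeps the result a plain one-column Series, which is easy to write to CSV later.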

3. Remove all punctuation from a piece of text

import re
data["label_txt"] = data["label_txt"].apply(lambda x: re.sub(r"[\s+\.\!\/_,$%^*(+\"\']+|[+——！，。？、~@#￥%……&*（）]", "", x))  # first class: whitespace and ASCII punctuation; second: full-width Chinese punctuation
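
A small check of the pattern, assuming pandas is available: the same character classes are applied with Series.str.replace (regex=True accepts a compiled pattern), which avoids a row-wise apply. The sample sentence is hypothetical and deliberately contains the letter "n", which the pattern must leave alone. Note that \s also removes spaces between English words.

```python
import re
import pandas as pd

# Whitespace + ASCII punctuation, or full-width Chinese punctuation
PUNCT = re.compile(r"[\s+\.\!\/_,$%^*(+\"\']+|[+——！，。？、~@#￥%……&*（）]")

data = pd.DataFrame({"label_txt": ["你好，请问 internet 能用吗？", "No problem!"]})

# Vectorised replacement over the whole column
data["label_txt"] = data["label_txt"].str.replace(PUNCT, "", regex=True)
print(data["label_txt"].tolist())  # ['你好请问internet能用吗', 'Noproblem']
```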

4. Write DataFrame data to a CSV file with pandas

df[['asr_txt','call_id','re_prob']].to_csv(result_path,encoding='GBK',index=False)
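
A round-trip sketch, assuming a temporary file stands in for the post's result_path (its real value is not shown) and a hypothetical extra column that is not written. A file written as GBK must be read back with the same encoding; 'utf-8-sig' is a common alternative, since its byte-order mark lets Excel detect UTF-8.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    "asr_txt": ["你好", "再见"],
    "call_id": [101, 102],
    "re_prob": [0.93, 0.41],
    "unused": ["x", "y"],  # hypothetical column that is not exported
})

# Hypothetical output location
result_path = os.path.join(tempfile.mkdtemp(), "result.csv")

# Three columns only, GBK-encoded, without the row index
df[["asr_txt", "call_id", "re_prob"]].to_csv(result_path, encoding="GBK", index=False)

# Read back with the same encoding
back = pd.read_csv(result_path, encoding="GBK")
print(back.columns.tolist())  # ['asr_txt', 'call_id', 're_prob']
```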

5. Read a tab-separated txt file with pandas

df=pd.read_table(r'.\test_0927.txt',sep='\t',encoding='utf-8')
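
read_table already defaults to sep='\t', while read_csv defaults to a comma and needs the separator spelled out. This sketch uses an in-memory string (hypothetical contents standing in for test_0927.txt) to show that both calls give the same frame.

```python
import io
import pandas as pd

# Hypothetical file contents
text = "id\tcontent\n1\thello world\n2\tgood bye\n"

df1 = pd.read_table(io.StringIO(text))          # tab is the default separator
df2 = pd.read_csv(io.StringIO(text), sep="\t")  # comma is the default, so set it
print(df1.equals(df2))  # True
```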

6. List everything in a directory

import os
files = os.listdir(filepath)  # names (not full paths) of files and subfolders; not recursive
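
Because os.listdir also returns subdirectory names and bare names rather than paths, a small helper is handy when only files are wanted. A sketch, using a hypothetical list_files name and a throwaway directory:

```python
import os
import tempfile

def list_files(filepath):
    """Full paths of the regular files directly inside filepath (not recursive)."""
    return sorted(
        os.path.join(filepath, name)
        for name in os.listdir(filepath)
        if os.path.isfile(os.path.join(filepath, name))
    )

# Hypothetical directory: two files and one subdirectory
root = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(root, name), "w") as f:
        f.write("demo")
os.mkdir(os.path.join(root, "sub"))

print(len(os.listdir(root)))                            # 3, 'sub' included
print([os.path.basename(p) for p in list_files(root)])  # ['a.txt', 'b.txt']
```

For a recursive listing, os.walk is the standard-library option.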

7. Build a DataFrame with zip

df=pd.DataFrame(zip(label_list,content_list),columns=['label','content'])
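
zip pairs the i-th label with the i-th content, so each pair becomes one row. The sketch below (hypothetical lists) compares it with building the frame from a dict of columns.

```python
import pandas as pd

label_list = [1, 0, 1]                  # hypothetical labels
content_list = ["good", "bad", "fine"]  # hypothetical texts

# One row per (label, content) pair
df = pd.DataFrame(list(zip(label_list, content_list)), columns=["label", "content"])

# Equivalent: one dict entry per column
df2 = pd.DataFrame({"label": label_list, "content": content_list})
print(df.equals(df2))  # True
```

One difference worth knowing: zip silently stops at the shorter list, whereas the dict form raises a ValueError when the lists have different lengths.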

8. Reset the index and discard the old one

data.reset_index(drop=True,inplace=True)
# drop=True: discard the old index instead of adding it as a column
# inplace=True: modify data itself instead of returning a new DataFrame
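
Resetting is typically needed after filtering, which keeps the original row labels. A sketch with hypothetical scores, also showing what happens without drop=True:

```python
import pandas as pd

data = pd.DataFrame({"score": [5, 1, 7, 3]})
data = data[data["score"] > 2]  # boolean filtering keeps the old labels
print(data.index.tolist())      # [0, 2, 3]

data.reset_index(drop=True, inplace=True)
print(data.index.tolist())      # [0, 1, 2]

# Without drop=True the old labels are kept as a new column named 'index'
kept = pd.DataFrame({"score": [5, 7]}, index=[0, 2]).reset_index()
print(kept.columns.tolist())    # ['index', 'score']
```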

9. Check whether a word appears in a sentence

if re.search(re.escape(word), sentence):  # re.escape matches word literally; `word in sentence` also works
    result.append(word)
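
re.search treats its first argument as a regular expression, so words containing characters such as "+" or "(" would be interpreted as regex syntax. A sketch of the full loop with a hypothetical find_words helper, escaping each word first:

```python
import re

def find_words(words, sentence):
    """Return the words that occur in sentence, in the order given."""
    result = []
    for word in words:
        # re.escape makes '+', '(' and similar characters match literally
        if re.search(re.escape(word), sentence):
            result.append(word)
    return result

print(find_words(["C++", "pandas", "java"], "I use pandas and C++"))  # ['C++', 'pandas']
```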

10. Join words with a comma

ky=','.join([word.strip() for word in keywords])
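
str.join accepts any iterable of strings, so a generator works as well as a list; strip() removes stray spaces and newlines around each word. Hypothetical input:

```python
keywords = [" 退款 ", "物流", "发票\n"]  # hypothetical words with stray whitespace

ky = ",".join(word.strip() for word in keywords)
print(ky)  # 退款,物流,发票
```

Swap "," for "，" if a full-width Chinese comma is wanted as the separator.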

11. Count how often each word occurs in a list of words

from collections import Counter
import pandas as pd

def hotword(content, date):
    a = Counter(content)  # word -> number of occurrences
    # One row per distinct word, with its count
    df = pd.DataFrame(list(a.items()), columns=['word', 'count'])
    df['date'] = date  # the same date on every row
    return df
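
Two related shortcuts, sketched on hypothetical tokens: Counter.most_common returns the top entries sorted by count, and pandas' value_counts produces the same counts as a Series that converts easily into a word/count table.

```python
from collections import Counter
import pandas as pd

content = ["退款", "物流", "退款", "发票", "退款"]  # hypothetical segmented words

# Counter: highest counts first
print(Counter(content).most_common(1))  # [('退款', 3)]

# pandas: value_counts is also sorted by count, descending
counts = pd.Series(content).value_counts()
word_counts = counts.rename_axis("word").reset_index(name="count")
print(word_counts.iloc[0].tolist())  # ['退款', 3]
```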

12. Generate a word cloud

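import stylecloud
from IPython.display import Image  # displays the PNG inside a Jupyter notebook

# wordscloud: list of segmented words; simhei.ttf is a font that can render Chinese
stylecloud.gen_stylecloud(text=' '.join(wordscloud),
                          max_words=500,
                          collocations=False,  # count single words only, not two-word collocations
                          font_path=r'./data/simhei.ttf',
                          icon_name='fas fa-thumbs-up',  # Font Awesome icon used as the cloud's shape
                          size=612,
                          output_name='豆瓣正向评分词云图.png')  # "Douban positive-rating word cloud"
Image(filename='豆瓣正向评分词云图.png')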

13. Convert a column to integer type

df['count']=df['count'].astype('int64')
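
astype('int64') raises a ValueError as soon as one value cannot be parsed. A sketch, assuming a hypothetical column with one bad entry: pd.to_numeric with errors="coerce" turns such values into NaN, which must be filled before casting because int64 cannot hold NaN.

```python
import pandas as pd

df = pd.DataFrame({"count": ["3", "12", "n/a"]})  # hypothetical: one non-numeric value

# Unparseable values become NaN instead of raising
df["count"] = pd.to_numeric(df["count"], errors="coerce")
print(df["count"].tolist())  # [3.0, 12.0, nan]

# Fill the gap, then cast
df["count"] = df["count"].fillna(0).astype("int64")
print(df["count"].tolist())  # [3, 12, 0]
```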

14. Aggregate and sort a DataFrame

qushi = df.groupby('word')['count'].sum().reset_index().sort_values(by='count', ascending=False)  # total count per word, largest first
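
When several words share the same count, their order after sorting by count alone is not guaranteed. A sketch on hypothetical totals: sort_values takes a list of columns and a matching list of directions, so ties can be broken by the word itself (by Unicode code point, not pinyin).

```python
import pandas as pd

# Hypothetical aggregated counts
qushi = pd.DataFrame({"word": ["物流", "退款", "发票", "客服"], "count": [4, 5, 4, 1]})

# Count descending, then word ascending for ties
ranked = qushi.sort_values(by=["count", "word"], ascending=[False, True]).reset_index(drop=True)
print(ranked["word"].tolist())  # ['退款', '发票', '物流', '客服']
```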

15. The difference between *args and **kwargs in Python:

*args: collects extra positional arguments into a tuple
**kwargs: collects extra keyword arguments into a dict
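
A minimal demonstration with a hypothetical show function; the same * and ** symbols also unpack a tuple and a dict back into arguments at the call site.

```python
def show(*args, **kwargs):
    # Positional extras arrive as a tuple, keyword extras as a dict
    return args, kwargs

args, kwargs = show(1, 2, sep=",", end="!")
print(type(args).__name__, args)      # tuple (1, 2)
print(type(kwargs).__name__, kwargs)  # dict {'sep': ',', 'end': '!'}

# Unpacking at the call site
values = (3, 4)
options = {"sep": "-"}
print(*values, **options)  # 3-4
```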

16. Convert a row of an Excel sheet into a dict

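import pandas as pd
data = pd.read_excel('./data.xlsx')  # reading .xlsx requires the openpyxl package
data.iloc[0, :].to_dict()  # first row as {column name: value}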
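
The same calls work on any DataFrame, so this sketch uses an in-memory frame with hypothetical columns in place of data.xlsx. to_dict(orient="records") converts every row at once into a list of dicts.

```python
import pandas as pd

# In-memory stand-in for data.xlsx (hypothetical columns)
data = pd.DataFrame({"name": ["张三", "李四"], "age": [30, 25]})

# One row -> {column: value}
first = data.iloc[0].to_dict()
print(first)

# All rows -> list of dicts, one per row
rows = data.to_dict(orient="records")
print(rows)
```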
