pandas
NoOne-csdn
Forever young, forever moved to tears
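pandas: replacing np.nan with None
Background: when pandas merges data, empty values are sometimes set to np.nan. np.nan is a float by default, which can make the next processing step awkward, so np.nan needs to be replaced with None. Tried df.replace(np.nan, None, inplace=True), which had no effect. Solution: df=df.where(df.notnull(), None) solves it perfectly... Original · 2020-05-27 11:57:55 · 6065 views · 5 comments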
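A minimal sketch of the replacement described in this entry, on made-up data. The cast to object is an addition that is not in the original post: on some pandas versions a float column coerces None back to NaN, so the sketch converts the frame to object dtype before calling where().

```python
import numpy as np
import pandas as pd

# Hypothetical data: a merge has left NaN in a numeric column.
df = pd.DataFrame({"user_id": [1, 2, 3], "score": [90.0, np.nan, 75.0]})

# Object columns can hold None; where() keeps values where the mask is True
# and writes None everywhere else.
cleaned = df.astype(object).where(df.notna(), None)
print(cleaned)
```

The result has object dtype throughout, which is usually acceptable when the next step is a database insert or JSON serialization.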
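pandas: assigning values after filtering
Background: a direct chained assignment raises a warning: df.user_id[df['user_id']==31120]=-1 gives "A value is trying to be set on a copy of a slice from a DataFrame". Solution: select the rows with user_id=31120 and change their user_id to -1: df.loc[df['user_id']==31120,'user_id']=-1 Original · 2020-05-20 16:30:43 · 533 views · 0 comments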
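The .loc form from this entry can be tried on a small frame. The data below is made up for illustration; only the filter value 31120 and the replacement -1 come from the post.

```python
import pandas as pd

# Hypothetical data with two rows for the user being rewritten.
df = pd.DataFrame({"user_id": [31120, 42, 31120], "score": [1, 2, 3]})

# Row selector and column label go into a single .loc call, so the
# assignment writes to df itself rather than to a temporary copy.
mask = df["user_id"] == 31120
df.loc[mask, "user_id"] = -1
print(df)
```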
pandas: group by, then show the distinct elements and their count
unique, nunique
aggs={'day_time':['nunique','unique']}
gp=df.groupby(['user_id','subject_id']).agg(aggs)
print(gp)
day_time nunique
Original · 2020-05-11 15:22:01 · 1037 views · 0 comments
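A small runnable version of the aggregation in this entry. The column names follow the post; the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical activity log: one row per user, subject and day.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "subject_id": [10, 10, 10, 20],
    "day_time": ["2020-05-01", "2020-05-01", "2020-05-02", "2020-05-03"],
})

# nunique counts the distinct days per group, unique lists them.
aggs = {"day_time": ["nunique", "unique"]}
gp = df.groupby(["user_id", "subject_id"]).agg(aggs)
print(gp)
```

The result has two-level column labels, so a single column is addressed with a tuple such as gp[("day_time", "nunique")].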
pandas: merging strings after groupby
Source df, target df. Solution: df.groupby('user_id')['practice_name'].apply(list).reset_index() Original · 2020-03-11 15:00:46 · 3127 views · 0 comments
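A sketch of the groupby-to-list step from this entry on made-up rows. The final join into one comma-separated string is an extra step that the post does not show.

```python
import pandas as pd

# Hypothetical practice records.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "practice_name": ["algebra", "geometry", "grammar"],
})

# Collect each user's practice names into a list, one row per user.
out = df.groupby("user_id")["practice_name"].apply(list).reset_index()

# Optional extra step: turn each list into one comma-separated string.
out["joined"] = out["practice_name"].apply(",".join)
print(out)
```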
Deep learning notes (small related tips on pandas, Spark, Keras, TF)
One-hot encoding of labels: I found that the to_categorical function from `from keras.utils.np_utils import to_categorical` does the same job as pandas.get_dummies(). Both one-hot encode the target. to_categorical(y, num_classes=None, dtype='float32') def get_dummies... Original · 2019-10-16 16:05:41 · 369 views · 0 comments
pandas: creating a DataFrame
class pandas_tool():
    def __init__(self):
        self.dutikuengine = create_engine(
            "mysql+pymysql://*******m:3306/tiku?charset=utf8")
        pass

    # create from a dict
    def get_by_dict(self):
        ...
Original · 2019-02-20 20:35:45 · 133 views · 0 comments
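pandas: merging strings that share the same ID
Background: checking questions for duplicates requires merging their choices. The target question IDs and their choices are already available. Goal: merge the choices that share a question_id. Approach: data=df.groupby(by='question_id')['choice'].sum(); print(data). This raised an error. Cause: choice contains null values. The final change was: df=pd.read_csv('choices.csv') df=d... Original · 2019-02-19 15:07:51 · 2118 views · 2 comments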
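The post's own fix is cut off, so the following is only one possible way to handle the null values, not necessarily what the author did: fill the missing choices with empty strings before concatenating. The data is hypothetical, and "".join is used here in place of .sum().

```python
import numpy as np
import pandas as pd

# Hypothetical choices table with one missing choice.
df = pd.DataFrame({
    "question_id": [1, 1, 2, 2],
    "choice": ["A.x", "B.y", np.nan, "A.z"],
})

# Replace nulls with empty strings so every value in a group is a str.
df["choice"] = df["choice"].fillna("")

# Concatenate the choices of each question in their original order.
merged = df.groupby("question_id")["choice"].agg("".join)
print(merged)
```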
pandas: group by and concatenate strings
Source: Target: Method: choiceres=choicedf.pivot_table(values='choice',index='question_id',aggfunc=lambda x:x.str.cat()) Original · 2018-12-26 11:55:09 · 1750 views · 3 comments
pandas: finding rows that contain a specific string
res=res[res['choice'].str.contains("<img")] Original · 2018-12-27 10:31:48 · 34730 views · 8 comments
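A short sketch of the same filter on made-up data. The na=False and regex=False arguments are additions that are not in the post: the first treats missing values as "no match" so the boolean mask contains no NaN, and the second matches the text literally.

```python
import pandas as pd

# Hypothetical choices, one of which is missing.
res = pd.DataFrame({"choice": ['<img src="a.png">', "plain text", None]})

# Keep only rows whose choice contains the literal substring "<img".
with_img = res[res["choice"].str.contains("<img", na=False, regex=False)]
print(with_img)
```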
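SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame (unresolved)
Update: I vaguely recall that calling reset_index on the earlier df fixed it. The exact cause still needs to be investigated. Original · 2018-12-01 08:30:47 · 706 views · 0 comments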
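pandas: inserting processed data into MySQL in one go (about 200,000 rows), zip function
The MySQL table structure is as follows: The data after processing with pandas looks like this: Method 1: with self.tikuengine1.connect() as conn: result.to_sql('t_exam', con=conn, if_exists='replace', chunksize=3000) This is fairly fast, but every run either appends or replaces, and a primary key cannot be set (to be verified). Method... Original · 2018-12-04 11:08:59 · 1633 views · 0 comments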
pandas groupby: applying multiple functions
gp=df.groupby(['user_id','item_id']).sum().reset_index()
print(gp.head())
gp = df.groupby(['user_id', 'item_id']).agg(['sum','count']).reset_index()
print(gp.head())
Original · 2019-02-27 11:17:07 · 3889 views · 0 comments
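A runnable sketch of the multi-function aggregation on made-up data. The column flattening at the end is an optional extra that the post does not include; it is shown because agg with a list of functions produces two-level column labels.

```python
import pandas as pd

# Hypothetical scores per user and item.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "item_id": [10, 10, 20],
    "score": [3, 4, 5],
})

# Apply sum and count to every remaining column of each group.
gp = df.groupby(["user_id", "item_id"]).agg(["sum", "count"]).reset_index()

# Optional: flatten ('score', 'sum') to 'score_sum'; the key columns have an
# empty second level, which filter(None, ...) drops.
gp.columns = ["_".join(filter(None, col)) for col in gp.columns]
print(gp.head())
```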
pandas: converting timestamps to datetimes and keeping the Beijing-time date (to_datetime)
   user_id  create_time
0    38441   1541001602
1    38442   1541001664
2    38443   1541001744
3    38444   1541001926
4    38445   1541002012
5    38446   1541002413
6    38447   15410...
Original · 2019-03-14 12:19:51 · 8927 views · 0 comments
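The post's code is not visible in this listing, so the following is only a sketch of one common way to do the conversion, using the first two rows of the table. It assumes the timestamps are Unix seconds in UTC and that the "Asia/Shanghai" time zone is available on the system.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [38441, 38442],
    "create_time": [1541001602, 1541001664],
})

# Parse the integers as seconds since the epoch in UTC, then convert to
# Beijing time (UTC+8) before taking the calendar date.
beijing = pd.to_datetime(df["create_time"], unit="s", utc=True).dt.tz_convert("Asia/Shanghai")
df["create_date"] = beijing.dt.date
print(df)
```

Without the time zone conversion the first row would fall on 2018-10-31 (UTC) instead of 2018-11-01 (Beijing time).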
pandas: taking the Nth row after grouping
Goal: convert the user_answer values for each question_id into A, B, C, D. Solution: dfa=df.groupby('question_id').nth(0).reset_index(); dfa['flag']='A'; dfb=df.groupby('question_id').nth(1).reset_index(); dfb['flag']='B'; dfc=df.groupby('qu... Original · 2019-05-27 10:12:37 · 3691 views · 1 comment
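As an alternative to calling nth() once per letter, the position of each row within its group can be computed with cumcount() and mapped to a letter. This is a different technique from the one in the post, sketched on made-up data.

```python
import pandas as pd

# Hypothetical answers: three options for question 1, two for question 2.
df = pd.DataFrame({
    "question_id": [1, 1, 1, 2, 2],
    "user_answer": [501, 502, 503, 601, 602],
})

# cumcount() numbers the rows inside each group 0, 1, 2, ...;
# the dict maps those positions to A, B, C, D.
letters = dict(enumerate("ABCD"))
df["flag"] = df.groupby("question_id").cumcount().map(letters)
print(df)
```

Rows beyond the fourth in a group would get NaN with this mapping, so the letter string needs extending if questions can have more options.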
pandas pivot_table and changing the MultiIndex
Background: original df: df1 = df.pivot_table(index=["user_id"], columns=["question_id"], values=["score","duration"]).fillna(0); print(df1). After pivoting: Target pivot layout: Solution: c1=df['question_id'].drop_duplicates() c2=['score',... Original · 2019-06-11 17:31:24 · 5154 views · 3 comments
pandas Series: keeping two decimal places
Before conversion:
   question_id  sum  count      mean
0            1   15     24  0.625000
1            2    1      1  1.000000
2            3    6     16  0.375000
3            4    3      5  ...
Original · 2019-07-04 16:21:00 · 23633 views · 2 comments
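The conversion step itself is cut off in this listing. One way to keep two decimals is Series.round(2), sketched here on made-up numbers; this is an assumption about the approach, not the post's confirmed code.

```python
import pandas as pd

# Hypothetical per-question totals.
df = pd.DataFrame({
    "question_id": [1, 2, 3],
    "sum": [2, 1, 1],
    "count": [3, 1, 3],
})

# round(2) keeps the column numeric; formatting to strings is only needed
# when trailing zeros must be displayed.
df["mean"] = (df["sum"] / df["count"]).round(2)
print(df)
```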
pandas: splitting one column into several with series.str.split(',', expand=True); pyspark: splitting one column into several
Source data:
   question_id                       id
0        17576              70391,70394
1        17576  70391,70392,70393,70394
2        17576              70391,70392
3          404...
Original · 2019-07-10 12:07:41 · 17566 views · 4 comments
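A sketch of the pandas half of this entry, using the three complete rows of the table. The id_0, id_1, ... column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "question_id": [17576, 17576, 17576],
    "id": ["70391,70394", "70391,70392,70393,70394", "70391,70392"],
})

# expand=True returns one column per piece; shorter rows are padded with
# missing values.
parts = df["id"].str.split(",", expand=True)
parts.columns = [f"id_{i}" for i in range(parts.shape[1])]

out = pd.concat([df[["question_id"]], parts], axis=1)
print(out)
```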
Daily notes: converting a txt file to a csv file
0719: converting a txt file to csv. Method 1: a=list(csv.reader(open('a.txt', 'r'), delimiter='\t')); df=pd.DataFrame(a); df.to_csv("res.csv"). Method 2: with open('a.txt', 'r') as infile, open('t.csv', 'w') as outfile: ... Original · 2019-07-19 10:42:20 · 349 views · 0 comments
pandas tips (continuously updated)
Checking whether there is NULL (missing) data. Demo:
print(train_data.dtypes)
Id int64
MSSubClass int64
MSZoning object
LotFrontage float64
LotArea int64
…
MoSold int64
Yr...
Original · 2019-10-11 11:14:19 · 184 views · 0 comments
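pandas merge in practice (important)
Real-world background: merge the two dfs shown in the first figure into the data format shown in the second figure. Solution: import pandas as pd; data = {'item': list("111122223333"), "subject": list("123412341234"), "rate": list("qazwsxedcrfv" ... Original · 2018-11-21 17:03:19 · 266 views · 0 comments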
UserWarning: findfont: Font family ['sans-serif'] not found
macOS: this error appears when plotting with pandas. Original · 2018-10-29 15:50:04 · 7538 views · 2 comments
pandas assign: adding new columns or overwriting existing ones
Original · 2018-09-26 14:31:36 · 9092 views · 0 comments
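This entry has no summary in the listing, so here is a minimal sketch of what the title describes, on made-up data: DataFrame.assign returns a new frame with a column added and an existing one overwritten, and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# b is a new column; a already exists, so it is overwritten in the result.
# Both right-hand sides are computed from the original df.
out = df.assign(b=df["a"] * 2, a=df["a"] + 1)
print(out)
print(df)  # unchanged
```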
pandas: converting string type to datetime type, object to datetime64[ns]
import pandas as pd
from matplotlib import pyplot as plt
from datetime import datetime
filename='sitka_weather_2014.csv'
# AKST
df=pd.read_csv(filename)
print(df.dtypes)
df['AKST'] = pd.to_date...
Original · 2018-09-26 12:17:26 · 20687 views · 0 comments
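A self-contained sketch of the string-to-datetime conversion, using an in-memory frame instead of the CSV file. The rows and the temperature column are made up; only the AKST column name comes from the post.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the weather file.
df = pd.DataFrame({
    "AKST": ["2014-01-01", "2014-01-02"],
    "max_temp_f": [46, 44],
})

# Parse the date strings into a datetime column.
df["AKST"] = pd.to_datetime(df["AKST"])
print(df.dtypes)

# Datetime columns expose the .dt accessor.
print(df["AKST"].dt.day.tolist())
```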
pandas time series
import pandas as pd
import numpy as np
s=pd.Series(pd.date_range('20131201 00:00:00',periods=4))
# print(s.dtypes) #datetime64[ns]
# print(s)
# print(s.dt.hour)
# print(s.dt.second)
# print(s.dt.da...
Reposted · 2018-09-20 20:12:57 · 199 views · 0 comments
pandas: forced type conversion with df.astype
import pandas as pd
from matplotlib import pyplot as plt
from datetime import datetime
filename='sitka_weather_2014.csv'
df=pd.read_csv(filename)
print(df.dtypes)
df[' Min Humidity']=df[' Min ...
Original · 2018-09-26 11:46:01 · 36416 views · 2 comments
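The conversion line is truncated in the listing, so the following only illustrates astype in general on made-up data. The column name with its leading space follows the post; the target dtypes are assumptions.

```python
import pandas as pd

# Hypothetical humidity readings that were read in as strings.
df = pd.DataFrame({" Min Humidity": ["41", "38", "52"]})
print(df.dtypes)

# Convert a single column.
df[" Min Humidity"] = df[" Min Humidity"].astype("int64")

# A dict converts several columns at once and returns a new frame.
as_float = df.astype({" Min Humidity": "float64"})
print(df.dtypes)
print(as_float.dtypes)
```

astype raises a ValueError if a value cannot be converted, for example an empty string in an integer column.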
pandas Series and DataFrame: renaming row and column index labels
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
'''The rename() method allows you to relabel an axis based on some mapping (a dict or Series) or an arbitrary function.'''
s=p...
Translated · 2018-09-20 17:27:15 · 11902 views · 0 comments
pandas Series and DataFrame: the align operation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
'''The align() method is the fastest way to simultaneously align two objects. It supports a join argument (related to joining ...
Translated · 2018-09-20 16:44:15 · 862 views · 0 comments
pandas: reindex, altering labels, and reindex_like
Program source first:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
'''Reindex and altering labels'''
'''reindex() is the fundamental data alignment method in pandas. It is used to implem...
Translated · 2018-09-20 16:23:30 · 298 views · 0 comments
fig, ax = plt.subplots()
"""fig, ax = plt.subplots() creates a Figure object and an Axes object. It is equivalent to: fig=plt.figure(); ax=fig.add_subplot(111)""" '''def subplots(nrows=1, ncols=1, sharex=False, sharey=False, squeeze=True, subplot... Original · 2018-10-29 17:24:15 · 517 views · 0 comments
pandas DataFrame: adding rows and columns
import numpy as np
import pandas as pd
df=pd.DataFrame(np.random.randn(3,4),columns=list("ABCD"),index=list("xyz"))
# print(df)
res1=df.apply(lambda x:x.sum())
# print(res1)
# print(type(res1)) #<...
Original · 2018-10-31 20:28:18 · 6433 views · 0 comments
pandas agg
import numpy as np
import pandas as pd
'''See the official docs'''
# 1. Basic operations
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1., 2., 3.], 'C': ['foo', 'bar', 'baz'], ...
Translated · 2018-10-31 21:03:17 · 1974 views · 0 comments
pandas: keeping rows whose value is in a list with isin
# keep rows whose value is in the given list
df = df[df['subject_1'].isin([1, 2, 13, 18, 25])]
Original · 2018-11-14 14:49:53 · 24338 views · 2 comments
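The isin filter can be tried on a small made-up frame. The negated form with ~ is an extra that the post does not show.

```python
import pandas as pd

# Hypothetical subject ids.
df = pd.DataFrame({"subject_1": [1, 3, 13, 25, 7]})
wanted = [1, 2, 13, 18, 25]

# isin returns a boolean mask; ~ inverts it.
kept = df[df["subject_1"].isin(wanted)]
dropped = df[~df["subject_1"].isin(wanted)]
print(kept)
print(dropped)
```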
pandas: given several index values, getting the data at those indexes
''' Unnamed: 0 duration user_id subject_1 0 907296360955 957658 55481543891 1935835234382 16803739 177721554272 ... Original · 2018-11-14 14:20:14 · 3402 views · 0 comments
pandas: converting column indexes, pivoting, keeping two decimal places, reordering columns, changing types, saving files, database reads and writes
import time
from datetime import datetime
from sqlalchemy import create_engine, Column, Integer, DateTime, DECIMAL
import pandas as pd
answerengine=create_engine('mysql+pymysql://***')
questionengine...
Original · 2018-11-14 11:22:33 · 4660 views · 0 comments
pandas: taking the top 5 columns by update time in a pivot_table (unresolved)
Original · 2018-11-15 15:37:44 · 663 views · 0 comments
pandas: filtering out duplicate data
import pandas as pd
data={'demo':[1,1,2,2,1,2,2,3,4,5,6,98,4,2,4,5,2,5,6,7]}
df=pd.DataFrame(data)
a=df.drop_duplicates(subset=['demo'],keep='first')
print(a)
Original · 2018-11-15 14:38:16 · 7350 views · 0 comments
pandas applymap, apply, map: worked examples
applymap acts on every element; apply acts on a row or a column; map acts on the elements of a Series.
import pandas as pd
import numpy as np
'''applymap: how applymap differs'''
# eg1
def func_tuple(x):
    return x * x
series = pd.Series([1, 2, 3, 4])...
Original · 2018-11-02 11:58:29 · 912 views · 0 comments
pandas dataframe pivot_table
A fairly representative example, from the official docs: df=pd.DataFrame({ "value":np.random.randn(36)},index=pd.date_range('2011-01-01',freq='M',periods=36)); print(df). The structure is as follows: dff=pd.pivot_table(df,index=df.index.month,columns=d... Original · 2018-11-01 14:49:25 · 5398 views · 0 comments
pandas DataFrame: selecting data
import numpy as np
import pandas as pd
from conf.conf import engineconnect
sql = """select id,action,uid,time from `*****`"""
df = pd.read_sql(sql, engineconnect().connect())
df = df.astype({"uid...
Original · 2018-10-30 17:20:19 · 2307 views · 1 comment
pandas: merging data
import pymysql
import pandas as pd
host=''
port=4000
user=''
password='111111'
db=''
db= pymysql.connect(host=host, port=port, user=user, ...
Original · 2018-09-17 15:52:36 · 152 views · 0 comments