pandas Basics

zouyutu5296

Category: pandas

Article link: https://blog.csdn.net/zouyutu5296/article/details/78654879

Reading data

ufo = pd.read_table('', sep='|')  # the file path was left blank in these notes
type(ufo)                         # pandas.core.frame.DataFrame
ufo.City                          # same Series as ufo['City']

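If a column name consists of two words (contains a space), dot notation does not work, so it is best to avoid dot notation altogether.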
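The difference between dot and bracket access can be sketched as follows. The column names and values here are made up for illustration and are not taken from the original data file.

```python
import pandas as pd

# Hypothetical data: one column name contains a space.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro"],
    "Shape Reported": ["TRIANGLE", "OTHER"],
})

city_dot = ufo.City              # attribute access works for simple names
city_bracket = ufo["City"]       # bracket access always works
shape = ufo["Shape Reported"]    # a name with a space needs brackets
```

Bracket access also avoids clashes with existing DataFrame attributes and methods, which is one more reason to prefer it.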

movies.shape
movies.dtypes
movies.describe()

DataFrame and Series methods perform actions and are called with (); attributes describe the object and are accessed without ().

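Rename: the columns argument of rename takes a dictionary that maps old names to new names

ufo.rename(columns={'   ': '  ', '  ': '  '})  # returns a new DataFrame unless inplace=True is passed
ufo = pd.read_csv(...., names=[...], header=0)  # names is a list of new column names; header=0 discards the old header row
ufo.columns = ufo.columns.str.replace(' ', '_')  # replace spaces in every column name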
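A minimal sketch of the two renaming approaches, using hypothetical column names since the notes leave them blank:

```python
import pandas as pd

# Hypothetical column names chosen for illustration.
ufo = pd.DataFrame({"Colors Reported": ["RED"], "Shape Reported": ["OVAL"]})

# rename returns a new DataFrame; the dict maps old names to new names.
renamed = ufo.rename(columns={"Colors Reported": "Colors",
                              "Shape Reported": "Shape"})

# The .str accessor on the column Index rewrites every name at once.
underscored = ufo.copy()
underscored.columns = underscored.columns.str.replace(" ", "_")
```

The dictionary form is convenient for renaming a few columns, while the .str form suits a rule that applies to all of them.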

Remove:

ufo.drop('Colors', axis=1, inplace=True)  # inplace=True modifies ufo itself instead of returning a new DataFrame
ufo.drop(['Colors', 'City'], axis=1, inplace=True)  # pass a list to drop several columns
ufo.drop([0, 1], axis=0, inplace=True)  # remove the first two rows; row names are called labels
df.drop(df.columns[[1, 2]], axis=1)  # drop columns by position
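The drop variants above can be tried on a small frame. The data below is hypothetical, and the calls omit inplace=True so that each result can be inspected separately.

```python
import pandas as pd

# Hypothetical data for illustration.
ufo = pd.DataFrame({
    "City": ["A", "B", "C"],
    "Colors": ["RED", None, "BLUE"],
    "State": ["NY", "NJ", "CO"],
})

no_colors = ufo.drop("Colors", axis=1)               # drop one column by name
no_first_two = ufo.drop([0, 1], axis=0)              # drop rows by label
by_position = ufo.drop(ufo.columns[[1, 2]], axis=1)  # drop columns by position
```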

sort:

movies.title.sort_values(ascending=False)  # Series method; does not change the underlying data
movies.sort_values('title')  # DataFrame method; does not change the underlying data
movies.sort_values(['title', 'rank'])  # sort by title first, then by rank within equal titles
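As a small illustration with made-up rows, sorting by several columns orders by the first key and breaks ties with the second, and the original frame stays untouched:

```python
import pandas as pd

# Hypothetical data for illustration.
movies = pd.DataFrame({"title": ["B", "A", "A"], "rank": [1, 3, 2]})

by_title_desc = movies.title.sort_values(ascending=False)
by_title_then_rank = movies.sort_values(["title", "rank"])
```

Note that the rank column has to be read with movies["rank"], because movies.rank refers to the DataFrame.rank method.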

Groupby rows

filter

booleans = []
for length in movies.duration:
    if length > 100:
        booleans.append(True)
    else:
        booleans.append(False)

This creates a list with the same length as the DataFrame.

is_long = pd.Series(booleans)  # build a Series from the list
movies[is_long]  # only the rows with duration > 100

is_long = movies.duration >= 200  # comparing a Series with a number gives a boolean Series
movies[is_long]  # returns a DataFrame; append a column name such as ['dafsaf'] to get a Series
movies[(movies.duration >= 200) & (movies.genre == 'D')]  # combine conditions with &, each in parentheses
movies.genre.isin(['Crime', 'Drama', 'adsf'])  # True where genre is one of the listed values
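The filtering patterns above, sketched on a tiny hypothetical movies table (titles, durations and genres are invented):

```python
import pandas as pd

# Hypothetical data for illustration.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "duration": [90, 210, 250],
    "genre": ["Comedy", "Drama", "Crime"],
})

is_long = movies.duration >= 200                    # boolean Series
long_movies = movies[is_long]                       # DataFrame of matching rows
long_titles = movies.loc[is_long, "title"]          # Series: one column only
long_dramas = movies[(movies.duration >= 200) & (movies.genre == "Drama")]
crime_or_drama = movies[movies.genre.isin(["Crime", "Drama"])]
```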

Loc/iloc:
df.loc[0, :]  # row with label 0, all columns
df.loc[0:2, :]
df.loc[:, 'City':'State']
df.loc[:, ['City', 'State']]
df[df.City == 'asf']
df.loc[df.City == 'dsaf', :]
df.loc[df.City == 'a', 'v']  # same result as df.loc[df.City == 'a'].v, but the former is safer
df.iloc[:, 1:4]  # the first position is the row, the second is the column
df[['a', 'b']]  # same as df.loc[:, ['a', 'b']]
df.ix['a', 0]  # row label and column position; .ix was removed in pandas 1.0, use .loc or .iloc instead
df.ix[1, 'a']  # row position and column name
Position slice 1:4: 1 is included and 4 is excluded
Label slice 'a':'b': both a and b are included
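The inclusive/exclusive rule can be checked directly. This sketch uses a hypothetical four-row frame with a default integer index:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "City": ["a", "b", "c", "d"],
    "Shape": ["x", "y", "z", "w"],
    "State": ["NY", "NJ", "CO", "KS"],
    "Time": ["1", "2", "3", "4"],
})

# Label-based: both ends of each slice are included.
by_label = df.loc[0:2, "City":"State"]

# Position-based: the end of each slice is excluded.
by_position = df.iloc[0:2, 0:2]
```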

Inplace:

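df = df.set_index('Time')  # without inplace=True the result must be assigned back
ufo.fillna(method='bfill').tail()  # returns a new DataFrame and leaves ufo unchanged; newer pandas versions spell this ufo.bfill()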

sorted(drinks.continent.unique())
drinks['continent'] = drinks.continent.astype('category')

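quality_type = pd.CategoricalDtype(categories=['good', 'very good', 'excellent'], ordered=True)  # defines the order good < very good < excellent
df['quality'] = df.quality.astype(quality_type)  # astype('category', categories=..., ordered=True) no longer works in current pandas
df.sort_values('quality')  # sorts by the defined order, not alphabetically
df.loc[df.quality > 'good', :]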
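A minimal sketch of an ordered categorical column, assuming a recent pandas version where the order is declared through pd.CategoricalDtype. The ID values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "quality": ["good", "excellent", "very good", "good"],
})

# Declare the logical order: good < very good < excellent.
quality_type = pd.CategoricalDtype(
    categories=["good", "very good", "excellent"], ordered=True
)
df["quality"] = df.quality.astype(quality_type)

sorted_df = df.sort_values("quality")             # logical, not alphabetical
better_than_good = df.loc[df.quality > "good", :]  # comparisons use the order
```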

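ufo.sample(n=3, random_state=42)  # 3 random rows, reproducible through random_state
train = ufo.sample(frac=0.73, random_state=99)  # a random 73% of the rows
test = ufo.loc[~ufo.index.isin(train.index), :]  # the remaining rows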
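The sampling lines amount to a simple train/test split. A sketch on a hypothetical 100-row frame (the names train_set and test_set are chosen here for illustration):

```python
import pandas as pd

# Hypothetical data: 100 rows with a default integer index.
ufo = pd.DataFrame({"value": range(100)})

train_set = ufo.sample(frac=0.73, random_state=99)
# Rows whose index labels were not drawn into the training sample.
test_set = ufo.loc[~ufo.index.isin(train_set.index), :]
```

This relies on the index labels being unique; with duplicate labels the isin test would exclude more rows than intended.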

df['sexnum'] = df.sex.map({'female': 0, 'male': 1})  # new column: 0 where sex is 'female', 1 where it is 'male'
pd.get_dummies(train.Sex, prefix='Sex').iloc[:, 1:]  # drops the first dummy column and keeps a column named 'Sex_male'

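embarked_dummies = pd.get_dummies(train.Embarked, prefix='Embarked').iloc[:, 1:]
pd.concat([train, embarked_dummies], axis=1)  # append the dummy columns to the original frame
pd.get_dummies(train, columns=['a', 'b'], drop_first=True)  # encode several columns at once and drop the first level of each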
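A short sketch of dummy encoding on hypothetical Titanic-style data:

```python
import pandas as pd

# Hypothetical data for illustration.
train = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Embarked": ["S", "C", "Q"],
})

# One column per level, then drop the first level (Sex_female).
sex_dummies = pd.get_dummies(train.Sex, prefix="Sex").iloc[:, 1:]
with_dummies = pd.concat([train, sex_dummies], axis=1)

# Same idea for several columns in one call.
all_dummies = pd.get_dummies(train, columns=["Sex", "Embarked"], drop_first=True)
```

Dropping the first level keeps k-1 columns for k categories, which is enough to recover the original value.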

Time :
ufo.Time.str.slice(-5, -3).astype(int)  # slices '4/18/2003 12:00' to '12', then converts to int
ufo['Time'] = pd.to_datetime(ufo.Time)
After the conversion, ufo.dtypes shows that Time is a datetime64 column.
API Reference

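ts = pd.to_datetime('1/1/1999')
ufo.loc[ufo.Time >= ts, :]  # rows on or after the timestamp
(ufo.Time.max() - ufo.Time.min()).days  # number of days between the first and last record
ufo['Year'] = ufo.Time.dt.year
ufo.Year.value_counts().sort_index().plot()  # records per year, in chronological order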
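The datetime steps can be sketched end to end without plotting. The three timestamps below are hypothetical but follow the '4/18/2003 12:00' layout mentioned in the notes.

```python
import pandas as pd

# Hypothetical data for illustration.
ufo = pd.DataFrame({
    "Time": ["4/18/2003 12:00", "6/1/1930 22:00", "12/31/2000 23:00"],
})

# String slicing works before conversion: the hour sits 5..3 chars from the end.
hour = ufo.Time.str.slice(-5, -3).astype(int)

ufo["Time"] = pd.to_datetime(ufo.Time)
ts = pd.to_datetime("1/1/1999")
recent = ufo.loc[ufo.Time >= ts, :]
span_days = (ufo.Time.max() - ufo.Time.min()).days
ufo["Year"] = ufo.Time.dt.year
per_year = ufo.Year.value_counts().sort_index()
```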

import datetime as dt

df = pd.read_csv('328.csv', header=None)
date = pd.to_datetime('20' + df[292].astype(str), format='%Y%m%d')  # prefix the century, then parse
df[292] = date.dt.strftime('%Y-%m-%d')
df = df.set_index(292)
df.index.name = 'Date'
df = df.drop([0, 1, 2, 3], axis=1)
start_time = dt.time(hour=0, minute=0, second=0).isoformat(timespec='minutes')  # '00:00'
end_time = dt.time(hour=23, minute=59, second=59).isoformat(timespec='minutes')  # '23:59'
rng = pd.date_range(start=start_time, end=end_time, freq='5min')  # 288 five-minute slots
rng = rng.strftime('%H:%M')
df = df.T  # one row per time slot, one column per date
df.index = rng

SettingWithCopyWarning

movies.content_rating.value_counts()
movies.loc[movies.content_rating == 'not rated', 'content_rating'] = np.nan  # a single .loc assignment avoids the warning
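A sketch of the safe assignment pattern on hypothetical data. Chained indexing such as movies[mask]['content_rating'] = np.nan may write to a temporary copy, which is what the warning is about; a single .loc call with both the row mask and the column writes to the original frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "content_rating": ["R", "not rated", "PG"],
})

# One .loc call selects rows and column together, then assigns in place.
mask = movies.content_rating == "not rated"
movies.loc[mask, "content_rating"] = np.nan
```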

Overview
df.sample(frac=0.1)  # a random 10% of the rows gives a quick overview

Plot

```
p = df.plot(figsize=(20, 5))
p.set_title('hello', fontsize=40)
p.set_xlabel('x axis')
df4.plot.hist(alpha=0.5)

plt.hist(sever.Time, 25)
```

apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
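The three functions can be compared side by side on hypothetical data. One assumption to note: recent pandas versions (2.1 and later) provide DataFrame.map as the element-wise method and deprecate applymap, so the sketch picks whichever is available.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
sex = pd.Series(["female", "male"])

# apply: the function receives a whole column (axis=0) or row (axis=1).
col_range = df.apply(lambda col: col.max() - col.min())
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Element-wise on a DataFrame: DataFrame.map if present, else applymap.
elementwise = getattr(df, "map", None) or df.applymap
squared = elementwise(lambda x: x * x)

# map: element-wise on a Series, here with a lookup dictionary.
sexnum = sex.map({"female": 0, "male": 1})
```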

Search
searchfor = ['宛东', '电容', '电抗']
df = df[df['设备名称'].str.contains('|'.join(searchfor))]  # keep rows whose 设备名称 (device name) contains any keyword
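A sketch of the keyword search on hypothetical device names. Joining the keywords with | builds a regular expression that matches any of them.

```python
import pandas as pd

# Hypothetical device names for illustration.
df = pd.DataFrame({"设备名称": ["宛东变压器", "1号电容器", "主变", "电抗器"]})

searchfor = ["宛东", "电容", "电抗"]
pattern = "|".join(searchfor)   # "宛东|电容|电抗"
matches = df[df["设备名称"].str.contains(pattern)]
```

If the column can contain missing values, passing na=False to str.contains keeps the mask strictly boolean.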
