pandas Basics

Reading data

import pandas as pd
ufo = pd.read_table('', sep='|')
type(ufo)
ufo.City  # equivalent to ufo['City']

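If a column name consists of two words (it contains a space), dot notation does not work, so it is best to avoid dot notation where possible.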
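
A minimal sketch of this limitation, using a small made-up DataFrame (the column names and values are assumptions chosen for illustration, not taken from the original data):

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro"],
    "Colors Reported": ["RED", "GREEN"],
})

city_dot = ufo.City              # dot notation works for a one-word name
city_bracket = ufo["City"]       # bracket notation returns the same Series
colors = ufo["Colors Reported"]  # a name containing a space needs brackets
```

Bracket notation works for every column name, which is why it is the safer default.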

movies.shape
movies.dtypes
movies.describe()

DataFrame and Series methods (actions) are called with ().
Attributes (descriptions) are accessed without ().

Rename: the columns argument of rename takes a dictionary that maps old names to new names

ufo.rename(columns={'   ': '  ', '  ': '  '})
ufo = pd.read_csv( ...., names=[], header=0)  # names is a list of new column names; header=0 replaces the existing header row
ufo.columns = ufo.columns.str.replace(' ', '_')
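
As a rough illustration of the two renaming approaches above, here is a sketch on a made-up DataFrame (the column names are assumptions):

```python
import pandas as pd

# Hypothetical columns, chosen only to show the renaming calls.
ufo = pd.DataFrame({"Colors Reported": ["RED"], "Shape Reported": ["DISK"]})

# rename returns a new DataFrame; only the listed columns change.
renamed = ufo.rename(columns={"Colors Reported": "Colors"})

# Replace every space in every column name with an underscore.
ufo.columns = ufo.columns.str.replace(" ", "_")
```

Here rename touches only the columns listed in the dictionary, while the str.replace assignment rewrites all column names at once.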

Remove:

ufo.drop('Colors', axis=1, inplace=True)  # inplace=True modifies ufo itself instead of returning a new DataFrame
ufo.drop(['Colors', 'City'], axis=1, inplace=True)  # pass a list to drop several columns
ufo.drop([0, 1], axis=0, inplace=True)  # drop the first two rows; row names are called labels
df.drop(df.columns[[1, 2]], axis=1)  # drop columns by position
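
The drop variants above can be tried on a small made-up DataFrame (all names and values here are assumptions). This sketch leaves out inplace=True so that each call returns a new object and the original stays intact:

```python
import pandas as pd

# Hypothetical data for illustration.
ufo = pd.DataFrame({
    "City": ["a", "b", "c"],
    "Colors": ["r", "g", "b"],
    "State": ["NY", "NJ", "CO"],
})

no_colors = ufo.drop("Colors", axis=1)               # drop one column by name
no_first_rows = ufo.drop([0, 1], axis=0)             # drop rows by label
by_position = ufo.drop(ufo.columns[[1, 2]], axis=1)  # drop columns by position
```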

Sort:

movies.title.sort_values(ascending=False)  # Series method; does not modify the data
movies.sort_values('title')  # does not modify the data
movies.sort_values(['title', 'rank'])  # sort by the title column first, then by the rank column
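
A small sketch of sorting by several columns, on made-up data (titles and ranks are assumptions). Note that a column called rank has to be read with brackets, because movies.rank is already a DataFrame method:

```python
import pandas as pd

# Hypothetical data for illustration.
movies = pd.DataFrame({"title": ["B", "A", "A"], "rank": [1, 3, 2]})

# Sort by title first; ties are then ordered by rank.
by_title_rank = movies.sort_values(["title", "rank"])

# Sorting a single Series in descending order.
titles_desc = movies.title.sort_values(ascending=False)
```

Neither call changes movies itself; both return sorted copies.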

Groupby rows

filter


booleans = []
for length in movies.duration:
    if length > 100:
        booleans.append(True)
    else:
        booleans.append(False)

This creates a list with the same length as the DataFrame.

is_long = pd.Series(booleans)  # build a Series from the list
movies[is_long]  # only the rows with duration > 100

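is_long = movies.duration >= 200
movies[is_long]
# Comparing a Series with a number gives a boolean Series; indexing with it returns a DataFrame.
To get a Series instead, append a column selection such as ['genre'].
movies[(movies.duration >= 200) & (movies.genre == 'D')]
movies.genre.isin(['Crime', 'Drama', 'adsf'])  # boolean Series; use it inside movies[...] to filter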
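
The filtering patterns above can be sketched on a small made-up DataFrame (titles, durations and genres are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data for illustration.
movies = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4"],
    "duration": [90, 210, 200, 240],
    "genre": ["Drama", "Drama", "Crime", "Action"],
})

is_long = movies.duration >= 200      # boolean Series, one value per row
long_movies = movies[is_long]         # DataFrame of the matching rows

# Combine conditions with & (and) or | (or); each needs its own parentheses.
long_dramas = movies[(movies.duration >= 200) & (movies.genre == "Drama")]

# isin tests membership in a list of values.
crime_or_drama = movies[movies.genre.isin(["Crime", "Drama"])]

# Selecting one column of the filtered rows gives a Series.
long_titles = movies.loc[is_long, "title"]
```

The vectorized comparison produces the same boolean values as the explicit loop, without building the list by hand.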

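loc/iloc:
df.loc[0, :]  # row with label 0, all columns
df.loc[0:2, :]  # rows with labels 0 through 2 (inclusive)
df.loc[:, 'City':'State']  # all columns from City through State (inclusive)
df.loc[:, ['City', 'State']]
df[df.City == 'asf']
df.loc[df.City == 'dsaf', :]
df.loc[df.City == 'a', 'v']  # same result as df.loc[df.City == 'a'].v, but the former is safer because it is a single indexing step
df.iloc[:, 1:4]  # the first position selects rows, the second selects columns
df[['a', 'b']]  # same as df.loc[:, ['a', 'b']]
df.ix['a', 0]  # row label and column position; .ix was deprecated and then removed in pandas 1.0, so use loc or iloc instead
df.ix[1, 'a']  # row position and column name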

Position slices (iloc): in 1:4, 1 is included and 4 is excluded.
Label slices (loc): in 'a':'b', both a and b are included.
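
The difference between the two slicing rules can be checked on a small made-up DataFrame (column names and values are assumptions):

```python
import pandas as pd

# Hypothetical data with the default integer row labels 0..3.
df = pd.DataFrame({
    "City": ["a", "b", "c", "d"],
    "Colors": ["r", "g", "b", "y"],
    "Shape": ["disk", "oval", "light", "disk"],
    "State": ["NY", "NJ", "CO", "KS"],
})

# loc slices by label and includes both endpoints.
by_label = df.loc[0:2, "City":"State"]

# iloc slices by position and excludes the stop position.
by_position = df.iloc[0:2, 1:4]
```

With the same 0:2 row slice, loc returns three rows and iloc returns two.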

Inplace:

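df = df.set_index('Time')  # reassigning the result has the same effect as inplace=True
ufo.bfill().tail()  # returns a new object, so ufo is unchanged; older code writes this as ufo.fillna(method='bfill')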

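sorted(drinks.continent.unique())
drinks['continent'] = drinks.continent.astype('category')

from pandas.api.types import CategoricalDtype
quality_type = CategoricalDtype(categories=['good', 'very good', 'excellent'], ordered=True)
df['quality'] = df.quality.astype(quality_type)  # ordered categories: good < very good < excellent
df.sort_values('quality')
df.loc[df.quality > 'good', :]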
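
A sketch of an ordered category on made-up data (the ID values and quality labels are assumptions). It uses CategoricalDtype, which is how current pandas versions expect the category order to be declared:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data for illustration.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "quality": ["good", "excellent", "very good", "good"],
})

# Declare the logical order: good < very good < excellent.
quality_type = CategoricalDtype(
    categories=["good", "very good", "excellent"], ordered=True
)
df["quality"] = df.quality.astype(quality_type)

sorted_df = df.sort_values("quality")     # sorts by logical order, not alphabetically
better = df.loc[df.quality > "good", :]   # comparisons follow the declared order
```

Without the ordered dtype, sorting would be alphabetical and the > comparison would not reflect the intended ranking.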

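ufo.sample(n=3, random_state=42)  # 3 random rows; random_state makes the result reproducible
train = ufo.sample(frac=0.73, random_state=99)  # a random 73% of the rows
test = ufo.loc[~ufo.index.isin(train.index), :]  # the remaining rows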
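
The sampling calls above amount to a simple train/test split. A minimal sketch on made-up data (the 75% fraction and the value column are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: 100 rows with the default integer index.
ufo = pd.DataFrame({"value": range(100)})

# Randomly pick 75% of the rows for training.
train = ufo.sample(frac=0.75, random_state=99)

# Everything whose index label is not in train goes to the test set.
test = ufo.loc[~ufo.index.isin(train.index), :]
```

This relies on the index labels being unique; with duplicate labels the isin test would not separate the rows cleanly.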

df['sexnum'] = df.sex.map({'female': 0, 'male': 1})  # create a new column: sexnum is 0 where sex is 'female' and 1 where it is 'male'
pd.get_dummies(train.Sex, prefix='Sex').iloc[:, 1:]

This creates a single column named 'Sex_male'.

embarked_dummies = pd.get_dummies(train.Embarked, prefix='Embarked').iloc[:, 1:]
pd.concat([train, embarked_dummies], axis=1)
pd.get_dummies(train, columns=['a', 'b'], drop_first=True)
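
A sketch of the encoding steps above on a tiny made-up DataFrame (the values and the Sex_num column name are assumptions):

```python
import pandas as pd

# Hypothetical data for illustration.
train = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# Map each category to a number through a dictionary.
train["Sex_num"] = train.Sex.map({"female": 0, "male": 1})

# One dummy column per category; dropping the first leaves only Sex_male.
sex_dummies = pd.get_dummies(train.Sex, prefix="Sex").iloc[:, 1:]

# Encode several columns at once and drop the first level of each.
encoded = pd.get_dummies(train, columns=["Sex", "Embarked"], drop_first=True)
```

Dropping the first dummy column loses no information, because the dropped level is implied when all the remaining dummies are 0.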

Time:
ufo.Time.str.slice(-5, -3).astype(int)  # slices the hour out of a string such as '4/18/2003 12:00', giving 12
ufo.Time = pd.to_datetime(ufo.Time)

Check ufo.dtypes to confirm that the Time column is now datetime64.
API Reference

ts = pd.to_datetime('1/1/1999')
ufo.loc[ufo.Time >= ts, :]
(ufo.Time.max() - ufo.Time.min()).days
ufo['Year'] = ufo.Time.dt.year
ufo.Year.value_counts().sort_index().plot()
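
The datetime steps above can be sketched on a few made-up timestamps (the values and the explicit format string are assumptions for illustration):

```python
import pandas as pd

# Hypothetical timestamps stored as strings.
ufo = pd.DataFrame({
    "Time": ["6/1/1930 22:00", "4/18/2003 12:00", "12/31/2000 23:59"],
})

# String slicing works, but only while the layout never changes.
hour_from_string = ufo.Time.str.slice(-5, -3).astype(int)

# Converting to datetime64 is more robust (format assumed to be month/day/year).
ufo["Time"] = pd.to_datetime(ufo.Time, format="%m/%d/%Y %H:%M")

ts = pd.to_datetime("1/1/1999")
recent = ufo.loc[ufo.Time >= ts, :]                      # compare against a timestamp
span_days = (ufo.Time.max() - ufo.Time.min()).days       # a Timedelta exposes .days
ufo["Year"] = ufo.Time.dt.year                           # .dt gives datetime parts
```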

import datetime as dt

df = pd.read_csv('328.csv', header=None)
# Column 292 appears to hold dates without the century; prefix '20' and parse
date = pd.to_datetime('20' + df[292].astype(str), format='%Y%m%d')
df[292] = date.dt.strftime('%Y-%m-%d')
df = df.set_index(292)
df.index.name = 'Date'
df = df.drop([0, 1, 2, 3], axis=1)
# Build 5-minute time-of-day labels from 00:00 to 23:59
start_time = dt.time(hour=0, minute=0, second=0).isoformat(timespec='minutes')
end_time = dt.time(hour=23, minute=59, second=59).isoformat(timespec='minutes')
rng = pd.date_range(start=start_time, end=end_time, freq='5min')
rng = rng.strftime('%H:%M')
# Transpose so that each row is a time of day and each column is a date
df = df.T
df.index = rng

SettingWithCopyWarning

movies.content_rating.value_counts()
movies.loc[movies.content_rating == 'not rated', 'content_rating'] = np.nan  # assign through a single .loc call to avoid the warning; requires import numpy as np
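
A minimal sketch of the safe assignment pattern, on made-up data (the titles and ratings are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
movies = pd.DataFrame({
    "title": ["a", "b", "c"],
    "content_rating": ["R", "NOT RATED", "PG"],
})

# One .loc call selects the rows and the column together, so the assignment
# reaches the original DataFrame rather than a temporary copy.
movies.loc[movies.content_rating == "NOT RATED", "content_rating"] = np.nan

missing = int(movies.content_rating.isna().sum())
```

Chained indexing such as movies[mask].content_rating = value is what typically triggers the warning, because pandas cannot tell whether the intermediate result is a view or a copy.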

Overview
df.sample(frac=0.1)  # inspect a random 10% of the rows

Plot

import matplotlib.pyplot as plt

p = df.plot(figsize=(20, 5))
p.set_title('hello', fontsize=40)
p.set_xlabel('x axis')
df4.plot.hist(alpha=0.5)

plt.hist(sever.Time, 25)  # histogram with 25 bins

apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
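
The three methods can be compared side by side in a short sketch on made-up numbers. One caveat: DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so the sketch picks whichever of the two is available:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply: the function receives a whole column (axis=0) or a whole row (axis=1).
col_sums = df.apply(lambda col: col.sum(), axis=0)
row_sums = df.apply(lambda row: row.sum(), axis=1)

# Series.map: the function receives one element of the Series at a time.
doubled_a = df["a"].map(lambda x: x * 2)

# Element-wise on a DataFrame: applymap in older pandas, map from 2.1 onwards.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)
```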

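Search
searchfor = ['宛东', '电容', '电抗']  # substrings to look for: a place name, 'capacitor', 'reactor'
df = df[df['设备名称'].str.contains('|'.join(searchfor))]  # '设备名称' is the device-name column; '|' joins the terms into a regex alternation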
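
The same search pattern, sketched on made-up English data (the device_name column and its values are assumptions standing in for the original column):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "device_name": ["north capacitor 1", "main transformer", "south reactor 2"],
})

searchfor = ["capacitor", "reactor"]

# Joining with "|" builds the regular expression "capacitor|reactor",
# which matches a row if any one of the terms occurs in it.
matches = df[df["device_name"].str.contains("|".join(searchfor))]
```

Because the joined string is treated as a regular expression, search terms that contain special characters such as . or ( would need escaping, for example with re.escape.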
