最近在网易云听了听pandas versus excel这个课,感觉Timothy老师讲的蛮好的,对所讲解的知识点进行一下总结,已备之后翻阅。
1.创建一个excel文件
- 主要使用to_excel(‘filename’)
import os
import pandas as pd
DIR = os.path.dirname(__file__)
filename = os.path.join(DIR,'output.xlsx')
df = pd.DataFrame({'id':[1,2,3],'name':['liu','jiang','kobe']})
df = df.set_index('id')
df.to_excel(filename)
2.读取一个excel
- 使用read_excel(‘filename’,header=),None时是没有头,1是以第二行为头,以此类推
- shape:返回行列数目
- columns:访问和设置列名
- head(),tail():前后几行
- set_index(‘ID’, inplace=True) 第一个是以某列为索引,设置inplace为True避免使用people = people.set_index()
- List item
import pandas as pd
people = pd.read_excel('People.xlsx')#header=None, 1
print(people.shape)
people.columns = ['ID', 'Type', 'Title', 'FirstName', 'MiddleName', 'LastName']
people.set_index('ID',inplace=True)
people.to_excel('People.xlsx')
print(people.columns)
print(people.head())
print('-----------'*10)
print(people.tail())
3.行、列、单元格
3.1.序列Series:与字典d = {‘x’:100,‘y’:200,‘z’:300}很像,两种方式创建
import pandas as pd
d = {'x':100,'y':200,'z':300}
s1 = pd.Series(d)
l1 = [100,200,300]
l2 = ['x','y','z']
s2 = pd.Series(l1,index=l2)
3.2.Series与DataFrame创建excel
s1 = pd.Series([1,2,3],index=[1,2,3],name='A')
s2 = pd.Series([10,20,30],index=[1,2,3],name='B')
s3 = pd.Series([100,200,300],index=[2,3,4],name='C')#索引并集,填充nan
df = pd.DataFrame({s1.name:s1,s2.name:s2,s3.name:s3})
# df = pd.DataFrame([s1,s2,s3])#反转了
df.to_excel('filename')
3.3.数据填充
- skiprows:跳过几行
- usecols:使用哪几列
- index_col:设置索引
- dtype:转换数据类型,当空值时默认是float64型,要转换下才可操作
- .at[] :定位到索引值所在行
import pandas as pd
from datetime import date,timedelta
def add_month(d,md):
yd = md//12
m = d.month + md%12
if m != 12:
yd += m//12
m = m % 12
return date(d.year+yd,m,d.day)
books = pd.read_excel('Books.xlsx',skiprows=3,usecols='C:F',index_col=None,dtype={'ID':str,'InStore':str,'Date':str})
start = date(2018,1,1)
for i in books.index:
# books.at[i,'ID'] = i + 1 #根据DataFrame直接修改
books['ID'].at[i]= i +1 #变成Series修改
books['InStore'].at[i]= 'Yes' if i%2==0 else 'No'
# books['Date'].at[i] = start + timedelta(days=i)
books['Date'].at[i] = add_month(start,i)
print(books)
3.4.计算填充
- 将列进行运算填充到另一列
import pandas as pd
books = pd.read_excel('books.xlsx',index_col="ID")
books['Price'] = books['ListPrice']*books['Discount']
# books['Price'] = books['Price'].apply(lambda x:x+2) #列自运算
books.to_excel('booksop.xlsx')
print(books)
3.5数据排序
- sort_valuse:默认ascendin=False是从大到小
import pandas as pd
s = pd.read_excel('List.xlsx',index_col='ID')
s.sort_values(by=['Worthy','Price'],inplace=True,ascending=[True,False])
print(s)
3.6数据筛选
def age_range(a):
return 18<=a<30
def level_a(g):
return 85<=g<+100
import pandas as pd
s = pd.read_excel('Students.xlsx',index_col='ID')
s = s.loc[s['Age'].apply(age_range)].loc[s['Score'].apply(level_a)]
# s = s.loc[s['Age'].apply(lambda a:18<=a<30)]\
# .loc[s['Score'].apply(lambda g:85<=g<+100)]
print(s)