1. Creating a DataFrame
1.1. Creating from an existing dataset
1.1.1. Reading from an Excel file
import numpy as np
import pandas as pd
df1 = pd.read_excel('./data/messi club data.xls', header = 0) # import messi's club data
df2 = pd.read_excel('./data/messi national team data.xls', header = 0) # import messi's national team data
display(df1.head())
display(df2.head())
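If the first row holds data rather than column names, pass header=None instead, so that pandas keeps that row as data and labels the columns 0, 1, 2, ...: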
df1 = pd.read_excel('./data/messi club data.xls', header = None) # import messi's club data
df2 = pd.read_excel('./data/messi national team data.xls', header = None) # import messi's national team data
display(df1.head())
display(df2.head())
1.1.2. Reading from a CSV file
import pandas as pd
df = pd.read_csv('./data/taiwan_dataset.csv', header=0)
df.head()
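To see what the header argument changes without needing a file on disk, the sketch below parses a small in-memory CSV string through io.StringIO. The sample text and its column names are made up for illustration; pd.read_excel accepts the same header argument.

```python
import io
import pandas as pd

# Hypothetical sample data, not taken from the files used above.
csv_text = "season,goals\n2010,47\n2011,73\n"

# header=0: the first row becomes the column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)
print(with_header.columns.tolist())  # ['season', 'goals']
print(len(with_header))              # 2

# header=None: the first row is kept as data; columns are labeled 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.columns.tolist())    # [0, 1]
print(len(no_header))                # 3
```

With header=None the former header row sits in the same columns as the numbers, so those columns are read as strings rather than integers.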
1.2. Creating a DataFrame yourself
1.2.1. Creating from a dictionary
Creating from fixed data:
data = {
"age": [19, 17, 22, 27],
"sex": ['male', 'female', 'male', 'male']}
df = pd.DataFrame(data)
df
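The dictionary keys become the column labels and each list becomes one column, so all lists must have the same length. A minimal sketch that checks this, reusing the data above; the row labels passed as index are hypothetical.

```python
import pandas as pd

data = {
    "age": [19, 17, 22, 27],
    "sex": ["male", "female", "male", "male"],
}
# Optional row labels; these names are invented for illustration.
df = pd.DataFrame(data, index=["p1", "p2", "p3", "p4"])
print(df.shape)             # (4, 2)
print(df.columns.tolist())  # ['age', 'sex']

# Lists of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"age": [19, 17], "sex": ["male"]})
    length_error = False
except ValueError as err:
    length_error = True
    print("ValueError:", err)
```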
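Note that when every column holds only a single scalar value, pandas cannot infer the number of rows. Either wrap each value in [ ] or pass index=[0]; otherwise pd.DataFrame raises a ValueError.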
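# Each laliga_* value is a single number (scalar), hence index=[0] below.
data = {
"西甲总获胜场次": laliga_total_win_num,  # La Liga matches won, total
"西甲主场获胜场次": laliga_home_win_num,  # La Liga matches won, home
"西甲客场获胜场次": laliga_away_win_num,  # La Liga matches won, away
"西甲总进球场次": laliga_total_havegoals_num,  # La Liga matches with a goal, total
"西甲主场进球场次": laliga_home_havegoals_num,  # La Liga matches with a goal, home
"西甲客场进球场次": laliga_away_havegoals_num,  # La Liga matches with a goal, away
"西甲主场进球数": laliga_home_goals,  # La Liga goals scored, home
"西甲客场进球数": laliga_away_goals}  # La Liga goals scored, away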
laliga_moredata = pd.DataFrame(data, index=[0])
laliga_moredata
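The scalar case can be reproduced in isolation. The sketch below uses two made-up numbers in place of the laliga_* variables (the keys and values are hypothetical) and shows the error together with both fixes.

```python
import pandas as pd

# Hypothetical scalar values standing in for the statistics above.
stats = {"total_wins": 30, "home_wins": 17}

# All-scalar values without an index raise a ValueError.
try:
    pd.DataFrame(stats)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Fix 1: wrap each value in a list.
df_a = pd.DataFrame({key: [value] for key, value in stats.items()})
# Fix 2: keep the scalars and pass an explicit index.
df_b = pd.DataFrame(stats, index=[0])
print(df_a.shape, df_b.shape)  # (1, 2) (1, 2)
```

Both fixes give a one-row table; passing index=[0] is the shorter option when the dictionary is already built from scalars.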
Creating from randomly generated data:
import numpy as np
data = {
"a": np.random.rand(3),
"b": np.random.rand(3),
"c": np.random.rand(3)}
df = pd.DataFrame(data)
df
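np.random.rand draws new values on every run, so the table above differs each time it is executed. If reproducible output is wanted, one option is a seeded generator from np.random.default_rng; this is a sketch of that alternative, and the seed 0 is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen for illustration
df = pd.DataFrame({name: rng.random(3) for name in ["a", "b", "c"]})
print(df.shape)  # (3, 3)

# Re-creating the generator with the same seed reproduces the same values.
rng_again = np.random.default_rng(0)
df_again = pd.DataFrame({name: rng_again.random(3) for name in ["a", "b", "c"]})
print(df.equals(df_again))  # True
```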
1.2.2. Creating from an array
Creating from fixed data:
data = np.array([[1,2,3,4],
[5,6,7,8],
[9,10,11,12],
[13,14,15,16]])
df = pd.DataFrame(data)
df
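When built from a plain array, both the rows and the columns are labeled 0, 1, 2, ... by default. The sketch below passes explicit labels through the columns and index arguments of pd.DataFrame; the label names themselves are hypothetical.

```python
import numpy as np
import pandas as pd

data = np.arange(1, 17).reshape(4, 4)  # same values as the array above

# Default labels: integers starting at 0 for rows and columns.
df_default = pd.DataFrame(data)
print(df_default.columns.tolist())  # [0, 1, 2, 3]

# Explicit labels; these names are invented for illustration.
df_named = pd.DataFrame(
    data,
    columns=["w", "x", "y", "z"],
    index=["r1", "r2", "r3", "r4"],
)
print(df_named.loc["r2", "y"])  # 7
```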
Creating from randomly generated data:
data = np.random