Gitmodel|数据分析 Task1

最新推荐文章于 2023-03-24 11:32:28 发布

wilima84

最新推荐文章于 2023-03-24 11:32:28 发布

阅读量935

点赞数 1

文章标签：数据分析 pandas python

本文链接：https://blog.csdn.net/weixin_46000743/article/details/125773303

版权

Pandas 动手学入门

本课程基于DataWhale开源教程《Joyful-Pandas》与开源项目 ajcr/100-pandas-puzzles其中译版开发. 在数据分析的实践中，Pandas是一个非常重要的库，它集成了非常多的统计分析功能，《Joyful-Pandas》是一份非常详尽，并且适合新手的Pandas中文教程，同时它也是Pandas官方认证的中文教程！但笔者学习过程中，始终觉得编程学习“纸上得来终觉浅”，对于刚接触一个新的库的“躬行”笔者认为最好的方式就是对照着官方文档以及QuickStart吧例子自己尝试一遍，并且在实践中学会如何查文档，这是一个开发者必备的技能.因此本教程就是基于这样的思想，给出对应的操作以及对应的文档地址，学习者需要自行查阅文档，完成下面的实践.

餐前准备

在这部分中，我们主要学习如何导入Pandas以及查看版本信息，难度是比较简单的.

请你阅读Joyful-Pandas第二章，完成下方练习.
http://joyfulpandas.datawhale.club/Content/ch2.html

1. 以 pd 别名导入 pandas 库

import pandas as pd

2. 打印出pandas 库的版本信息

print('pandas库的版本信息是：{}'.format(pd.__version__))

pandas库的版本信息是：1.2.4

3. 打印 pandas 依赖包及其版本信息

pd.show_versions(True)

{'system': {'commit': '2cb96529396d93b46abab7bbc73a208e708c642e', 'python': '3.8.8.final.0', 'python-bits': 64, 'OS': 'Windows', 'OS-release': '10', 'Version': '10.0.19041', 'machine': 'AMD64', 'processor': 'Intel64 Family 6 Model 165 Stepping 5, GenuineIntel', 'byteorder': 'little', 'LC_ALL': None, 'LANG': None, 'LOCALE': {'language-code': 'Chinese (Simplified)_China', 'encoding': '936'}}, 'dependencies': {'pandas': '1.2.4', 'numpy': '1.22.3', 'pytz': '2021.1', 'dateutil': '2.8.1', 'pip': '21.0.1', 'setuptools': '52.0.0.post20210125', 'Cython': '0.29.23', 'pytest': '6.2.3', 'hypothesis': None, 'sphinx': '4.0.1', 'blosc': None, 'feather': None, 'xlsxwriter': '1.3.8', 'lxml.etree': '4.6.3', 'html5lib': '1.1', 'pymysql': None, 'psycopg2': None, 'jinja2': '2.11.3', 'IPython': '7.22.0', 'pandas_datareader': None, 'bs4': '4.9.3', 'bottleneck': '1.3.2', 'fsspec': '0.9.0', 'fastparquet': None, 'gcsfs': None, 'matplotlib': '3.3.4', 'numexpr': '2.7.3', 'odfpy': None, 'openpyxl': '3.0.7', 'pandas_gbq': None, 'pyarrow': None, 'pyxlsb': None, 's3fs': None, 'scipy': '1.6.2', 'sqlalchemy': '1.4.7', 'tables': '3.6.1', 'tabulate': None, 'xarray': None, 'xlrd': '2.0.1', 'xlwt': '1.3.0', 'numba': '0.53.1'}}

备齐食材

在这部分中，主要介绍了Pandas一个重要的数据结构DataFrame

请你阅读Joyful-Pandas第二章，完成下方练习.
http://joyfulpandas.datawhale.club/Content/ch2.html

4. 使用数据 data 和行索引 labels 创建一个 DataFrame df

import numpy as np

有下面这样的一个数据字典 data 以及列表格式的标签数据 labels:

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)
df

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
c	snake	0.5	2	no
d	dog	NaN	3	yes
e	dog	5.0	2	no
f	cat	2.0	3	no
g	snake	4.5	1	no
h	cat	NaN	1	yes
i	dog	7.0	2	no
j	dog	3.0	1	no

5. 显示该 DataFrame 及其数据相关的基本信息（提示：DataFrame 直接调用的方法）

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   animal    10 non-null     object 
 1   age       8 non-null      float64
 2   visits    10 non-null     int64  
 3   priority  10 non-null     object 
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes

df.describe()

	age	visits
count	8.000000	10.000000
mean	3.437500	1.900000
std	2.007797	0.875595
min	0.500000	1.000000
25%	2.375000	1.000000
50%	3.000000	2.000000
75%	4.625000	2.750000
max	7.000000	3.000000

6. 返回 DataFrame df 的前4行数据

df.head(4)

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
c	snake	0.5	2	no
d	dog	NaN	3	yes

7. 从 DataFrame df 选择标签为 animal 和 age 的列

df[['animal','age']]

	animal	age
a	cat	2.5
b	cat	3.0
c	snake	0.5
d	dog	NaN
e	dog	5.0
f	cat	2.0
g	snake	4.5
h	cat	NaN
i	dog	7.0
j	dog	3.0

8. 在 [3, 4, 8] 行中，列为 ['animal', 'age'] 的数据

df.iloc[[3,4,8]][['animal','age']]

	animal	age
d	dog	NaN
e	dog	5.0
i	dog	7.0

df.loc[df.index[[3,4,8]],['animal','age']]

	animal	age
d	dog	NaN
e	dog	5.0
i	dog	7.0

9. 选择列visits大于 1 的行

df[df['visits']>1]

	animal	age	visits	priority
b	cat	3.0	3	yes
c	snake	0.5	2	no
d	dog	NaN	3	yes
e	dog	5.0	2	no
f	cat	2.0	3	no
i	dog	7.0	2	no

df.loc[df['visits']>1]

	animal	age	visits	priority
b	cat	3.0	3	yes
c	snake	0.5	2	no
d	dog	NaN	3	yes
e	dog	5.0	2	no
f	cat	2.0	3	no
i	dog	7.0	2	no

10. 选择 age 为缺失值的行

df[df['age'].isnull()]

	animal	age	visits	priority
d	dog	NaN	3	yes
h	cat	NaN	1	yes

df[df['age'].isna()]

	animal	age	visits	priority
d	dog	NaN	3	yes
h	cat	NaN	1	yes

11. 选择 animal 是cat且age 小于 3 的行

df[(df['animal']=='cat') & (df['age']<3)]

	animal	age	visits	priority
a	cat	2.5	1	yes
f	cat	2.0	3	no

df.query('animal=="cat" and age<3')

	animal	age	visits	priority
a	cat	2.5	1	yes
f	cat	2.0	3	no

12. 选择 age 在 2 到 4 之间的数据（包含边界值）

df[df['age'].between(2,4)]

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
f	cat	2.0	3	no
j	dog	3.0	1	no

df.query('2<=age<=4')

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
f	cat	2.0	3	no
j	dog	3.0	1	no

13. 将 ‘f’ 行的 age 改为 1.5

df.loc['f','age']=1.5
df

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
c	snake	0.5	2	no
d	dog	NaN	3	yes
e	dog	5.0	2	no
f	cat	1.5	3	no
g	snake	4.5	1	no
h	cat	NaN	1	yes
i	dog	7.0	2	no
j	dog	3.0	1	no

14. 对 visits 列的数据求和

df['visits'].sum()

15. 计算每种 animal age 的平均值

df.groupby('animal')['age'].mean()

animal
cat      2.333333
dog      5.000000
snake    2.500000
Name: age, dtype: float64

16. 新增一行数据 k，数据自定义，然后再删除新追加的 k 行

df.loc['k'] = [6.0,'snake',2,'no']
df

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
c	snake	0.5	2	no
d	dog	NaN	3	yes
e	dog	5.0	2	no
f	cat	1.5	3	no
g	snake	4.5	1	no
h	cat	NaN	1	yes
i	dog	7.0	2	no
j	dog	3.0	1	no
k	6.0	snake	2	no

df.drop('k',inplace=True)

17. 统计每种 animal 的个数

df['animal'].value_counts()

cat      4
dog      4
snake    2
Name: animal, dtype: int64

df.groupby('animal')['animal'].count()

animal
cat      4
dog      4
snake    2
Name: animal, dtype: int64

18. 先根据 age 降序排列，再根据 visits 升序排列（结果 i 列在前面，d 列在最后面）

df.sort_values(by=['age','visits'], ascending=[False, True], inplace=True)
df

	animal	age	visits	priority
i	dog	7.0	2	no
e	dog	5.0	2	no
g	snake	4.5	1	no
j	dog	3.0	1	no
b	cat	3.0	3	yes
a	cat	2.5	1	yes
f	cat	1.5	3	no
c	snake	0.5	2	no
h	cat	NaN	1	yes
d	dog	NaN	3	yes

19. 将 priority 列的 yes 和 no 用 True 和 False 替换

df['priority'] = df['priority'].map({'yes':True,'no':False})
df

	animal	age	visits	priority
i	dog	7.0	2	False
e	dog	5.0	2	False
g	snake	4.5	1	False
j	dog	3.0	1	False
b	cat	3.0	3	True
a	cat	2.5	1	True
f	cat	1.5	3	False
c	snake	0.5	2	False
h	cat	NaN	1	True
d	dog	NaN	3	True

20. 将 animal 列的 snake 用 python 替换

df['animal'] = df['animal'].replace('snake','python')

21. 对于每种 animal 和 visit，求出平均年龄。换句话说，每一行都是动物，每一列都是访问次数，其值是平均年龄（提示：使用数据透视表）

df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')

visits	1	2	3
animal
cat	2.5	NaN	2.5
dog	3.0	6.0	NaN
python	4.5	0.5	NaN

22. 读取data文件夹下的boston.csv文件，并尝试利用上面的方法自行分析，得出一些简单的结论.

df2 = pd.read_csv('./data/boston.csv')
df2

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT	MEDV
0	0.00632	18.0	2.31	0	0.538	6.575	65.2	4.0900	1	296.0	15.3	396.90	4.98	24.0
1	0.02731	0.0	7.07	0	0.469	6.421	78.9	4.9671	2	242.0	17.8	396.90	9.14	21.6
2	0.02729	0.0	7.07	0	0.469	7.185	61.1	4.9671	2	242.0	17.8	392.83	4.03	34.7
3	0.03237	0.0	2.18	0	0.458	6.998	45.8	6.0622	3	222.0	18.7	394.63	2.94	33.4
4	0.06905	0.0	2.18	0	0.458	7.147	54.2	6.0622	3	222.0	18.7	396.90	5.33	36.2
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
501	0.06263	0.0	11.93	0	0.573	6.593	69.1	2.4786	1	273.0	21.0	391.99	9.67	22.4
502	0.04527	0.0	11.93	0	0.573	6.120	76.7	2.2875	1	273.0	21.0	396.90	9.08	20.6
503	0.06076	0.0	11.93	0	0.573	6.976	91.0	2.1675	1	273.0	21.0	396.90	5.64	23.9
504	0.10959	0.0	11.93	0	0.573	6.794	89.3	2.3889	1	273.0	21.0	393.45	6.48	22.0
505	0.04741	0.0	11.93	0	0.573	6.030	80.8	2.5050	1	273.0	21.0	396.90	7.88	11.9

506 rows × 14 columns

df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   CRIM     506 non-null    float64
 1   ZN       506 non-null    float64
 2   INDUS    506 non-null    float64
 3   CHAS     506 non-null    int64  
 4   NOX      506 non-null    float64
 5   RM       506 non-null    float64
 6   AGE      506 non-null    float64
 7   DIS      506 non-null    float64
 8   RAD      506 non-null    int64  
 9   TAX      506 non-null    float64
 10  PTRATIO  506 non-null    float64
 11  B        506 non-null    float64
 12  LSTAT    506 non-null    float64
 13  MEDV     506 non-null    float64
dtypes: float64(12), int64(2)
memory usage: 55.5 KB

df2.describe()

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT	MEDV
count	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000
mean	3.613524	11.363636	11.136779	0.069170	0.554695	6.284634	68.574901	3.795043	9.549407	408.237154	18.455534	356.674032	12.653063	22.532806
std	8.601545	23.322453	6.860353	0.253994	0.115878	0.702617	28.148861	2.105710	8.707259	168.537116	2.164946	91.294864	7.141062	9.197104
min	0.006320	0.000000	0.460000	0.000000	0.385000	3.561000	2.900000	1.129600	1.000000	187.000000	12.600000	0.320000	1.730000	5.000000
25%	0.082045	0.000000	5.190000	0.000000	0.449000	5.885500	45.025000	2.100175	4.000000	279.000000	17.400000	375.377500	6.950000	17.025000
50%	0.256510	0.000000	9.690000	0.000000	0.538000	6.208500	77.500000	3.207450	5.000000	330.000000	19.050000	391.440000	11.360000	21.200000
75%	3.677083	12.500000	18.100000	0.000000	0.624000	6.623500	94.075000	5.188425	24.000000	666.000000	20.200000	396.225000	16.955000	25.000000
max	88.976200	100.000000	27.740000	1.000000	0.871000	8.780000	100.000000	12.126500	24.000000	711.000000	22.000000	396.900000	37.970000	50.000000

从df2.info()可以看出14列数据种不存在缺失值，从de2.describe()中可以看出数据最小值为0，也就是数据不存在负值，另外诸如ZH列、TAX列以及B列数据分布范围较广且方差较大，可能存在异常值，需要通过统计分析进一步进行处理。

至此，相信你已经可以通过阅读文档，掌握了pandas中DataFrame的一些基本操作，下面我们来讲点更进一步的操作，它是上述这些操作的一些巧妙结合

食材搭配

在这个部分中，我们主要介绍对于DataFrame中的数据的一些操作：去重、数据计算、取值······

请你阅读Joyful-Pandas第二章，完成下方练习. http://joyfulpandas.datawhale.club/Content/ch2.html

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
df

	A
0	1
1	2
2	2
3	3
4	4
5	5
6	5
7	5
8	6
9	7
10	7

23. 请用pandas操作输出上面的df中A列出现的元素的唯一值（即：出现过的所有元素的集合）

df['A'].unique()

array([1, 2, 3, 4, 5, 6, 7], dtype=int64)

24. 输出df中的唯一值及其对应出现的频数

df['A'].value_counts()

5    3
2    2
7    2
1    1
3    1
4    1
6    1
Name: A, dtype: int64

25. 将df进行数据降重

df.drop_duplicates('A')

	A
0	1
1	2
3	3
4	4
5	5
8	6
9	7

26. 给定一组随机数据

df = pd.DataFrame(np.random.random(size=(5, 3))) 
df

	0	1	2
0	0.580241	0.632042	0.354775
1	0.122504	0.784829	0.920090
2	0.231789	0.597295	0.149788
3	0.568011	0.135924	0.011110
4	0.353484	0.835216	0.300393

使每个元素减去所在行的平均值？

df.sub(df.mean(1),axis=0)

	0	1	2
0	0.057888	0.109689	-0.167577
1	-0.486637	0.175688	0.310949
2	-0.094502	0.271004	-0.176502
3	0.329663	-0.102424	-0.227239
4	-0.142881	0.338852	-0.195971

27. 返回下列df数字总和最小那列的标签

df = pd.DataFrame(np.random.random(size=(5, 10)), columns=list('abcdefghij'))
df

	a	b	c	d	e	f	g	h	i	j
0	0.681472	0.375433	0.157866	0.844610	0.515149	0.843195	0.482867	0.019393	0.146855	0.819014
1	0.718139	0.829071	0.240966	0.995575	0.388320	0.178618	0.379783	0.418137	0.549539	0.939299
2	0.299162	0.336460	0.445597	0.935320	0.964780	0.669580	0.150017	0.983547	0.028738	0.398523
3	0.155546	0.860910	0.719335	0.989113	0.032604	0.180492	0.867443	0.021331	0.559367	0.466884
4	0.181180	0.062531	0.813016	0.982059	0.113649	0.412184	0.857099	0.564338	0.420419	0.322087

df.sum(axis=0).sort_values().index[0]

'i'

df.sum(axis=0).idxmin()

'i'

28. 计算一个 DataFrame 有多少不重复的行？

df = pd.DataFrame(np.random.randint(0, 2, size=(10, 3)))
df

	0	1	2
0	1	0	0
1	1	0	1
2	1	0	1
3	0	0	1
4	0	1	1
5	1	1	1
6	0	1	0
7	1	1	1
8	0	1	1
9	1	1	1

df.drop_duplicates().shape[0]

29. DataFrame 数据如下，A 和 B 都是 0-100 之间（包括边界值）的数值，对 A 进行分段分组（i.e. (0, 10], (10, 20], …），求每组内 B 的和。输出应该和下述一致：

A
(0, 10]      635
(10, 20]     360
(20, 30]     315
(30, 40]     306
(40, 50]     750
(50, 60]     284
(60, 70]     424
(70, 80]     526
(80, 90]     835
(90, 100]    852

df = pd.DataFrame(np.random.RandomState(8765).randint(1, 101, 
                  size=(100, 2)), 
                  columns = ["A", "B"])
df

	A	B
0	46	29
1	75	22
2	49	63
3	33	43
4	71	75
...	...	...
95	60	87
96	57	40
97	86	19
98	50	56
99	97	94

100 rows × 2 columns

df.groupby(pd.cut(df['A'], bins=range(0, 110, 10), right=True))['B'].sum()

A
(0, 10]      635
(10, 20]     360
(20, 30]     315
(30, 40]     306
(40, 50]     750
(50, 60]     284
(60, 70]     424
(70, 80]     526
(80, 90]     835
(90, 100]    852
Name: B, dtype: int32

用pandas进行数据清洗

下面的df是我们用到的数据集

df = pd.DataFrame({'From_To': ['LoNDon_paris', 'MAdrid_miLAN', 'londON_StockhOlm', 
                               'Budapest_PaRis', 'Brussels_londOn'],
              'FlightNumber': [10045, np.nan, 10065, np.nan, 10085],
              'RecentDelays': [[23, 47], [], [24, 43, 87], [13], [67, 32]],
                   'Airline': ['KLM(!)', '<Air France> (12)', '(British Airways. )', 
                               '12. Air France', '"Swiss Air"']})
df

	From_To	FlightNumber	RecentDelays	Airline
0	LoNDon_paris	10045.0	[23, 47]	KLM(!)
1	MAdrid_miLAN	NaN	[]	<Air France> (12)
2	londON_StockhOlm	10065.0	[24, 43, 87]	(British Airways. )
3	Budapest_PaRis	NaN	[13]	12. Air France
4	Brussels_londOn	10085.0	[67, 32]	"Swiss Air"

30. FlightNumber列中的某些值缺失（它们是NaN）。这些数字是有规律的，即每行增加 10，因此NaN需要放置 10055 和 10075。修改df以填充这些缺失的数字并使该列成为整数列（而不是浮点列）

df['FlightNumber'] = df['FlightNumber'].interpolate().astype(int)  # 插值函数为linear
df

	From_To	FlightNumber	RecentDelays	Airline
0	LoNDon_paris	10045	[23, 47]	KLM(!)
1	MAdrid_miLAN	10055	[]	<Air France> (12)
2	londON_StockhOlm	10065	[24, 43, 87]	(British Airways. )
3	Budapest_PaRis	10075	[13]	12. Air France
4	Brussels_londOn	10085	[67, 32]	"Swiss Air"

31. From_To列作为两个单独的列会更好！拆分下划线分隔符_前后的每个字符串. 将其拆分成两列，存放在一个名为“temp”的临时 DataFrame，将列名 ‘From’ 和 ‘To’ 分配给这个临时DataFrame.

temp = df['From_To'].str.split('_',expand=True)
temp.columns = ['From','To']
temp

	From	To
0	LoNDon	paris
1	MAdrid	miLAN
2	londON	StockhOlm
3	Budapest	PaRis
4	Brussels	londOn

32. 注意城市名称的大小写是混合在一起的。标准化字符串，以便只有第一个字母是大写的（例如“londON”应该变成“London”。）

temp['From'] = temp['From'].str.capitalize()
temp['To'] = temp['To'].str.capitalize()
temp

	From	To
0	London	Paris
1	Madrid	Milan
2	London	Stockholm
3	Budapest	Paris
4	Brussels	London

33. 将From_To列从df中删去，将temp处理好的数据合并到df中

df = df.drop('From_To', axis=1)
df2 = pd.concat([df, temp], axis=1)
df2

	FlightNumber	RecentDelays	Airline	From	To
0	10045	[23, 47]	KLM(!)	London	Paris
1	10055	[]	<Air France> (12)	Madrid	Milan
2	10065	[24, 43, 87]	(British Airways. )	London	Stockholm
3	10075	[13]	12. Air France	Budapest	Paris
4	10085	[67, 32]	"Swiss Air"	Brussels	London

**34.**在AirLine列中，您可以看到航空公司名称周围出现了一些额外的符号。只提取航空公司名称。例如’(British Airways. )‘应该变成’British Airways’.

df2['Airline'] = df2['Airline'].str.extract('([a-zA-Z\s]+)', expand=False).str.strip()

df2

	FlightNumber	RecentDelays	Airline	From	To
0	10045	[23, 47]	KLM	London	Paris
1	10055	[]	Air France	Madrid	Milan
2	10065	[24, 43, 87]	British Airways	London	Stockholm
3	10075	[13]	Air France	Budapest	Paris
4	10085	[67, 32]	Swiss Air	Brussels	London

35. 在 RecentDelays 列中，值已作为列表输入到 DataFrame 中。我们希望每个第一个值在它自己的列中，每个第二个值在它自己的列中，依此类推。如果没有第 N 个值，则该值应为 NaN。

将 Series 列表展开为名为的 DataFrame delays，重命名列delay_1，delay_2等等，并将不需要的 RecentDelays 列替换df为delays

df3=df2['RecentDelays'].apply(pd.Series)
df3

	0	1	2
0	23.0	47.0	NaN
1	NaN	NaN	NaN
2	24.0	43.0	87.0
3	13.0	NaN	NaN
4	67.0	32.0	NaN

df3.columns = ['delay_{}'.format(n) for n in range(1, len(df3.columns)+1)]  
df2 = df2.drop('RecentDelays', axis=1).join(df3) 
df2

	FlightNumber	Airline	From	To	delay_1	delay_2	delay_3
0	10045	KLM	London	Paris	23.0	47.0	NaN
1	10055	Air France	Madrid	Milan	NaN	NaN	NaN
2	10065	British Airways	London	Stockholm	24.0	43.0	87.0
3	10075	Air France	Budapest	Paris	13.0	NaN	NaN
4	10085	Swiss Air	Brussels	London	67.0	32.0	NaN

43, 87]
British Airways
London
Stockholm

3
10075
[13]
Air France
Budapest
Paris

4
10085
[67, 32]
Swiss Air
Brussels
London

将 Series 列表展开为名为的 DataFrame delays，重命名列delay_1，delay_2等等，并将不需要的 RecentDelays 列替换df为delays

df3=df2['RecentDelays'].apply(pd.Series)
df3

	0	1	2
0	23.0	47.0	NaN
1	NaN	NaN	NaN
2	24.0	43.0	87.0
3	13.0	NaN	NaN
4	67.0	32.0	NaN

df3.columns = ['delay_{}'.format(n) for n in range(1, len(df3.columns)+1)]  
df2 = df2.drop('RecentDelays', axis=1).join(df3) 
df2

	FlightNumber	Airline	From	To	delay_1	delay_2	delay_3
0	10045	KLM	London	Paris	23.0	47.0	NaN
1	10055	Air France	Madrid	Milan	NaN	NaN	NaN
2	10065	British Airways	London	Stockholm	24.0	43.0	87.0
3	10075	Air France	Budapest	Paris	13.0	NaN	NaN
4	10085	Swiss Air	Brussels	London	67.0	32.0	NaN