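- Pandas is a powerful Python toolkit for data analysis, built on top of NumPy.
- Main features of Pandas:
- Provides the core data structures DataFrame and Series
- Integrated time series functionality
- A rich set of mathematical operations
- Flexible handling of missing data
- Installation: pip install pandas
- Import convention: import pandas as pd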
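1 Series - One-Dimensional Data Object
- A Series is an object similar to a one-dimensional array, consisting of a one-dimensional array of data and an associated array of data labels (the index).
- Creation: pd.Series([1,2,3])
- Getting the value array and the index array: the values and index attributes
- A Series behaves like a cross between a list (array) and a dictionary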
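1.1 Series - Usage Features
- A Series supports array features (positions):
- Create a Series from an array: Series(array)
- Arithmetic with a scalar: sr*2
- Arithmetic between two Series of equal length: sr1+sr2
- Indexing: sr[0], sr[[1,2,4]]
- Slicing: sr[0:2]
- Universal functions: np.abs(sr)
- Boolean indexing: sr[sr>5]
- A Series supports dictionary features (labels):
- Create a Series from a dictionary: Series(dict)
- The in operator: 'a' in sr
- Key indexing: sr['a'], sr[['a','b','d']]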
import pandas as pd
pd.Series([2,3,4,5])
0 2
1 3
2 4
3 5
dtype: int64
pd.Series([2,3,4,5], index=['a','b','c','d'])
a 2
b 3
c 4
d 5
dtype: int64
import numpy as np
pd.Series(np.arange(5))
0 0
1 1
2 2
3 3
4 4
dtype: int32
pd.Series({'a':1,
'b':2,
'c':3,
'd':4})
a 1
b 2
c 3
d 4
dtype: int64
sr = pd.Series([2,3,4,5], index=['a','b','c','d'])
sr
a 2
b 3
c 4
d 5
dtype: int64
sr[0]
2
sr['a']
2
sr+2
a 4
b 5
c 6
d 7
dtype: int64
sr+sr
a 4
b 6
c 8
d 10
dtype: int64
sr[0:2]
a 2
b 3
dtype: int64
np.sum(sr)
14
sr[sr>3]
c 4
d 5
dtype: int64
sr = pd.Series({'a':1,'b':2})
sr
a 1
b 2
dtype: int64
'a' in sr
True
sr.index
Index(['a', 'b'], dtype='object')
sr.values
array([1, 2], dtype=int64)
sr = pd.Series([2,3,4,5], index=['a','b','c','d'])
sr
a 2
b 3
c 4
d 5
dtype: int64
sr[['a','c']]
a 2
c 4
dtype: int64
sr['a':'c']
a 2
b 3
c 4
dtype: int64
1.2 Series - Integer Indexes
sr = pd.Series(np.arange(20))
sr
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
dtype: int32
sr2 = sr[10:].copy()
sr2
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
dtype: int32
sr2[10]
10
sr2.loc[10]
10
sr2.iloc[9]
19
sr2.iloc[0:3]
10 10
11 11
12 12
dtype: int32
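The session above can be read as a contrast between label-based and position-based access when the index itself is made of integers. The following is a minimal sketch of that contrast, not a definitive reference; the variable names are chosen here for illustration, and `sr.iloc[10:]` is used in place of `sr[10:]` to make the positional intent explicit.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(20))
sr2 = sr.iloc[10:].copy()        # labels 10..19, positions 0..9

by_label = sr2.loc[10]           # label 10 holds the value 10
by_position = sr2.iloc[0]        # position 0 holds the same value
last = sr2.iloc[-1]              # last position holds the value 19

# loc slices include the end label; iloc slices exclude the end position.
label_slice = sr2.loc[10:12]     # labels 10, 11, 12
position_slice = sr2.iloc[0:2]   # positions 0 and 1

# Label 0 does not exist in sr2, so a label lookup raises KeyError.
try:
    sr2.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True
```

Using loc and iloc explicitly avoids any ambiguity about whether an integer is meant as a label or as a position.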
1.3 Series - Data Alignment
sr1 = pd.Series([12,23,34],index=['c','a','d'])
sr2 = pd.Series([11,20,10],index=['d','c','a'])
sr1+sr2
a 33
c 32
d 45
dtype: int64
When operating on two Series objects, Pandas first aligns them by index (label) and then performs the operation.
sr1 = pd.Series([12,23,34],index=['c','a','d'])
sr2 = pd.Series([11,20,10,21],index=['d','c','a','b'])
sr1+sr2
a 33.0
b NaN
c 32.0
d 45.0
dtype: float64
sr1 = pd.Series([12,23,34],index=['c','a','d'])
sr2 = pd.Series([11,20,10],index=['b','c','a'])
sr1+sr2
a 33.0
b NaN
c 32.0
d NaN
dtype: float64
sr1 = pd.Series([12,23,34],index=['c','a','d'])
sr2 = pd.Series([11,20,10],index=['b','c','a'])
sr1.add(sr2, fill_value=0)
a 33.0
b 11.0
c 32.0
d 34.0
dtype: float64
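The add method with fill_value has siblings for the other arithmetic operators. As a small sketch under the same data as above (the result names are illustrative only), sub and mul accept the same fill_value argument, which stands in for a label that is missing on one side:

```python
import pandas as pd

sr1 = pd.Series([12, 23, 34], index=['c', 'a', 'd'])
sr2 = pd.Series([11, 20, 10], index=['b', 'c', 'a'])

# A label missing from one Series is treated as 0 before subtracting.
diff = sr1.sub(sr2, fill_value=0)

# For multiplication, 1 is the neutral stand-in value.
prod = sr1.mul(sr2, fill_value=1)
```

The fill value only applies when a label is absent from one operand; the result index is still the union of both indexes.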
1.4 Series - Missing Data
sr = sr1+sr2
sr.isnull()
a False
b True
c False
d True
dtype: bool
sr[sr.notnull()]
a 33.0
c 32.0
dtype: float64
sr[~sr.isnull()]
a 33.0
c 32.0
dtype: float64
sr.dropna()
a 33.0
c 32.0
dtype: float64
sr.fillna(0)
a 33.0
b 0.0
c 32.0
d 0.0
dtype: float64
sr
a 33.0
b NaN
c 32.0
d NaN
dtype: float64
sr.fillna(sr.mean())
a 33.0
b 32.5
c 32.0
d 32.5
dtype: float64
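2 DataFrame - Two-Dimensional Data Object
- A DataFrame is a tabular data structure containing an ordered set of columns. A DataFrame can be viewed as a dictionary of Series that share a single row index.
- Creation:
- pd.DataFrame({'one':[1,3,5,7],'two':[2,4,6,8]})
- Reading and writing csv files:
- pd.read_csv('filename.csv')
- df.to_csv('filename.csv')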
d1 = pd.DataFrame({'one':[1,3,5,7],'two':[2,4,6,8]})
d1
d1 = pd.DataFrame({'one':[1,3,5,7],'two':[2,4,6,8]},index=['a','b','c','d'])
d1
d1 = pd.DataFrame({'one':pd.Series([1,3,5],index=['a','b','c']),'two':pd.Series([2,4,6,8],index=['a','b','c','d'])})
d1
| | one | two |
|---|---|---|
| a | 1.0 | 2 |
| b | 3.0 | 4 |
| c | 5.0 | 6 |
| d | NaN | 8 |
d1.dtypes
one float64
two int64
dtype: object
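Notes:
- 1. When a DataFrame is built from Series, the Series are aligned by label;
- 2. Because NaN is a floating-point value, the whole "one" column is automatically converted to float.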
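The second note can be checked directly by comparing dtypes before and after alignment. The sketch below also shows pandas' nullable `Int64` dtype as one possible way to keep integers next to a missing value; that alternative is an addition for illustration and is not part of the original notes.

```python
import pandas as pd

one = pd.Series([1, 3, 5], index=['a', 'b', 'c'])
two = pd.Series([2, 4, 6, 8], index=['a', 'b', 'c', 'd'])
d1 = pd.DataFrame({'one': one, 'two': two})

dtype_before = str(one.dtype)         # integer dtype of the standalone Series
dtype_after = str(d1['one'].dtype)    # float dtype once NaN appears at label 'd'

# Optional alternative (an assumption of this sketch, not from the notes):
# the nullable Int64 dtype stores integers together with a missing marker.
nullable = d1['one'].astype('Int64')
```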
d2 = pd.read_csv('test.csv')
d2
d1.to_csv('test2.csv')
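2.1 DataFrame - Common Attributes
- index: get the row index
- T: transpose
- columns: get the column index
- values: get the value array
- describe(): get quick summary statistics (this is a method)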
d1 = pd.DataFrame({'one':pd.Series([1,3,5],index=['a','b','c']),'two':pd.Series([2,4,6,8],index=['a','b','c','d'])})
d1
| | one | two |
|---|---|---|
| a | 1.0 | 2 |
| b | 3.0 | 4 |
| c | 5.0 | 6 |
| d | NaN | 8 |
d1.values
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.],
[nan, 8.]])
d1.T
| | a | b | c | d |
|---|---|---|---|---|
| one | 1.0 | 3.0 | 5.0 | NaN |
| two | 2.0 | 4.0 | 6.0 | 8.0 |
d1.dtypes
one float64
two int64
dtype: object
d1.describe()
| | one | two |
|---|---|---|
| count | 3.0 | 4.000000 |
| mean | 3.0 | 5.000000 |
| std | 2.0 | 2.581989 |
| min | 1.0 | 2.000000 |
| 25% | 2.0 | 3.500000 |
| 50% | 3.0 | 5.000000 |
| 75% | 4.0 | 6.500000 |
| max | 5.0 | 8.000000 |
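2.2 DataFrame - Indexing and Slicing
- A DataFrame is a two-dimensional data type, so it has both a row index and a column index.
- A DataFrame can likewise be indexed and sliced by label or by position
- The loc and iloc attributes
- Usage: separate with a comma, row index first, column index second
- The row and column parts can each be a plain index, a slice, a boolean index, or a fancy index, in any combination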
d1 = pd.DataFrame({'one':pd.Series([1,3,5],index=['a','b','c']),'two':pd.Series([2,4,6,8],index=['a','b','c','d'])})
d1
| | one | two |
|---|---|---|
| a | 1.0 | 2 |
| b | 3.0 | 4 |
| c | 5.0 | 6 |
| d | NaN | 8 |
d1['one']['a']
1.0
d1.loc['a','one']
1.0
d1.loc['a',:]
one 1.0
two 2.0
Name: a, dtype: float64
d1.loc['d','one']
nan
d1.loc[d1.one.isnull(),'one']
d NaN
Name: one, dtype: float64
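The session above uses loc only. As a complementary sketch on the same d1 (the result names are made up for this example), iloc takes positions instead of labels, and the row and column parts can mix slices, boolean masks, and lists:

```python
import pandas as pd

d1 = pd.DataFrame({
    'one': pd.Series([1, 3, 5], index=['a', 'b', 'c']),
    'two': pd.Series([2, 4, 6, 8], index=['a', 'b', 'c', 'd']),
})

first_cell = d1.iloc[0, 1]                 # row position 0, column position 1
top_rows = d1.iloc[0:2, :]                 # positions 0 and 1 (end excluded)
label_slice = d1.loc['a':'c', 'two']       # labels a to c (end included)
picked = d1.loc[d1['two'] > 4, ['one']]    # boolean rows plus a list of columns
```

Note the asymmetry: a loc slice includes its end label, while an iloc slice excludes its end position.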
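2.3 DataFrame - Data Alignment and Missing Values
- DataFrame objects are also aligned during arithmetic, with the row index and the column index aligned separately.
- DataFrame methods for handling missing data:
- dropna(axis=0,how='any')
- fillna()
- isnull()
- notnull()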
d2 = pd.DataFrame({'two':[1,2,3,4],'one':[4,5,6,7]},index=['c','d','b','a'])
d2
d1+d2
| | one | two |
|---|---|---|
| a | 8.0 | 6 |
| b | 9.0 | 7 |
| c | 9.0 | 7 |
| d | NaN | 10 |
d1.fillna(0)
| | one | two |
|---|---|---|
| a | 1.0 | 2 |
| b | 3.0 | 4 |
| c | 5.0 | 6 |
| d | 0.0 | 8 |
d1.dropna()
import numpy as np
d1.loc['d','two'] = np.nan
d1.loc['c','two'] = np.nan
d1
| | one | two |
|---|---|---|
| a | 1.0 | 2.0 |
| b | 3.0 | 4.0 |
| c | 5.0 | NaN |
| d | NaN | NaN |
d1.dropna(how='all')
| | one | two |
|---|---|---|
| a | 1.0 | 2.0 |
| b | 3.0 | 4.0 |
| c | 5.0 | NaN |
d2 = d1.dropna(how='all')
d2.dropna(axis=1)
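Several calls in this section are shown without their output. The following sketch rebuilds the final state of d1 from this section and records what the default dropna, the column-wise dropna, and a mean-based fillna might return; the result names are illustrative only.

```python
import numpy as np
import pandas as pd

# d1 as it stands at the end of this section
d1 = pd.DataFrame(
    {'one': [1.0, 3.0, 5.0, np.nan], 'two': [2.0, 4.0, np.nan, np.nan]},
    index=['a', 'b', 'c', 'd'],
)

missing_per_column = d1.isnull().sum()   # number of NaN values in each column
complete_rows = d1.dropna()              # how='any': drop rows with any NaN
complete_cols = d1.dropna(axis=1)        # drop columns with any NaN
filled = d1.fillna(d1.mean())            # fill each column with its own mean
```

Because both columns contain at least one NaN, dropping along axis=1 leaves a frame with four rows and no columns.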
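2.4 pandas - Other Common Methods
- mean(axis=0,skipna=True): mean of each column (axis=0) or each row (axis=1); NaN values are skipped by default
- sum(axis=1): sum of each row (axis=1) or each column (axis=0)
- sort_index(axis,…,ascending=True): sort by the row (or column) index
- sort_values(by,axis,ascending): sort by the values of a column (or row)
- NumPy universal functions also work on pandas objects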
d1
| | one | two |
|---|---|---|
| a | 1.0 | 2.0 |
| b | 3.0 | 4.0 |
| c | 5.0 | NaN |
| d | NaN | NaN |
d1.mean()
one 3.0
two 3.0
dtype: float64
d1.mean(axis=1)
a 1.5
b 3.5
c 5.0
d NaN
dtype: float64
d1.mean(axis='columns')
a 1.5
b 3.5
c 5.0
d NaN
dtype: float64
d1.sort_values(by='two')
| | one | two |
|---|---|---|
| a | 1.0 | 2.0 |
| b | 3.0 | 4.0 |
| c | 5.0 | NaN |
| d | NaN | NaN |
d1.sort_values(by='two', ascending=False)
| | one | two |
|---|---|---|
| b | 3.0 | 4.0 |
| a | 1.0 | 2.0 |
| c | 5.0 | NaN |
| d | NaN | NaN |
d1.sort_values(by='a', axis=1,ascending=False)
| | two | one |
|---|---|---|
| a | 2.0 | 1.0 |
| b | 4.0 | 3.0 |
| c | NaN | 5.0 |
| d | NaN | NaN |
d1.sort_index()
| | one | two |
|---|---|---|
| a | 1.0 | 2.0 |
| b | 3.0 | 4.0 |
| c | 5.0 | NaN |
| d | NaN | NaN |
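The list at the top of this section also mentions sum, the skipna option, sorting along the column axis, and NumPy universal functions, none of which appear in the session. A minimal sketch on the same d1 follows; the result names are chosen for this example.

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame(
    {'one': [1.0, 3.0, 5.0, np.nan], 'two': [2.0, 4.0, np.nan, np.nan]},
    index=['a', 'b', 'c', 'd'],
)

col_sum = d1.sum()                       # per column, NaN skipped
row_sum = d1.sum(axis=1)                 # per row, NaN skipped
strict_mean = d1.mean(skipna=False)      # NaN propagates when skipna=False

rows_desc = d1.sort_index(ascending=False)          # row labels d, c, b, a
cols_desc = d1.sort_index(axis=1, ascending=False)  # column labels two, one

roots = np.sqrt(d1)                      # a ufunc applied element-wise
```

Row d contains only NaN, so its sum under the default settings is 0.0, while its mean is NaN as shown earlier.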
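3 pandas - Time Objects
3.1 pandas - Working with Time Objects
- Time series types
- Timestamp: a specific moment
- Fixed period: e.g. December 2020
- Time interval: start time to end time
- Time objects in the Python standard library: datetime
- Flexible parsing of time objects: dateutil
- Handling time objects in bulk: pandas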
import datetime
datetime.datetime.strptime('2020-01-01','%Y-%m-%d')
datetime.datetime(2020, 1, 1, 0, 0)
import dateutil.parser
dateutil.parser.parse('2020-01-01')
datetime.datetime(2020, 1, 1, 0, 0)
dateutil.parser.parse('02/03/2020')
datetime.datetime(2020, 2, 3, 0, 0)
dateutil.parser.parse('20200203')
datetime.datetime(2020, 2, 3, 0, 0)
pd.to_datetime(['2001-01-01','2002/01/01'])
DatetimeIndex(['2001-01-01', '2002-01-01'], dtype='datetime64[ns]', freq=None)
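The three time series types listed at the start of this section map loosely onto pandas objects. The sketch below pairs them with `pd.Timestamp`, `pd.Period`, and `pd.Interval`; this pairing is an interpretation made for illustration, and the notes themselves do not name these classes.

```python
import pandas as pd

# Timestamp: a specific moment
ts = pd.Timestamp('2020-12-01 08:30')

# Fixed period: the whole of December 2020
period = pd.Period('2020-12', freq='M')
period_start = period.start_time

# Time interval: bounded by a start time and an end time
interval = pd.Interval(
    pd.Timestamp('2020-12-01'),
    pd.Timestamp('2020-12-31'),
    closed='both',
)
inside = ts in interval          # membership test against the interval
span = interval.length           # a Timedelta between the two bounds
```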
3.2 pandas - Generating Time Objects
pd.date_range?
pd.date_range('2010-01-01','2010-05-01')
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
'2010-01-09', '2010-01-10',
...
'2010-04-22', '2010-04-23', '2010-04-24', '2010-04-25',
'2010-04-26', '2010-04-27', '2010-04-28', '2010-04-29',
'2010-04-30', '2010-05-01'],
dtype='datetime64[ns]', length=121, freq='D')
pd.date_range('2010-01-01',periods=10)
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
'2010-01-09', '2010-01-10'],
dtype='datetime64[ns]', freq='D')
pd.date_range('2010-01-01',periods=10,freq='w')
DatetimeIndex(['2010-01-03', '2010-01-10', '2010-01-17', '2010-01-24',
'2010-01-31', '2010-02-07', '2010-02-14', '2010-02-21',
'2010-02-28', '2010-03-07'],
dtype='datetime64[ns]', freq='W-SUN')
pd.date_range('2010-01-01',periods=10,freq='w-MON')
DatetimeIndex(['2010-01-04', '2010-01-11', '2010-01-18', '2010-01-25',
'2010-02-01', '2010-02-08', '2010-02-15', '2010-02-22',
'2010-03-01', '2010-03-08'],
dtype='datetime64[ns]', freq='W-MON')
pd.date_range('2010-01-01',periods=10,freq='B')
DatetimeIndex(['2010-01-01', '2010-01-04', '2010-01-05', '2010-01-06',
'2010-01-07', '2010-01-08', '2010-01-11', '2010-01-12',
'2010-01-13', '2010-01-14'],
dtype='datetime64[ns]', freq='B')
dt = pd.date_range('2010-01-01',periods=10,freq='B')
dt[0]
Timestamp('2010-01-01 00:00:00', freq='B')
dt[0].to_pydatetime()
datetime.datetime(2010, 1, 1, 0, 0)
pd.date_range('2010-01-01',periods=10,freq='1h20min')
DatetimeIndex(['2010-01-01 00:00:00', '2010-01-01 01:20:00',
'2010-01-01 02:40:00', '2010-01-01 04:00:00',
'2010-01-01 05:20:00', '2010-01-01 06:40:00',
'2010-01-01 08:00:00', '2010-01-01 09:20:00',
'2010-01-01 10:40:00', '2010-01-01 12:00:00'],
dtype='datetime64[ns]', freq='80T')
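4 pandas - Time Series
- A time series is a Series or DataFrame indexed by time objects.
- When datetime objects are used as an index, they are stored in a DatetimeIndex object.
- Special time series features:
- Slice by passing a "year" or "month"
- Slice by passing a date range
- Rich function support: resample(), truncate(), …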
pd.date_range('2010-01-01','2010-05-01')
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
'2010-01-09', '2010-01-10',
...
'2010-04-22', '2010-04-23', '2010-04-24', '2010-04-25',
'2010-04-26', '2010-04-27', '2010-04-28', '2010-04-29',
'2010-04-30', '2010-05-01'],
dtype='datetime64[ns]', length=121, freq='D')
sr = pd.Series(np.arange(1000), index=pd.date_range('2020-01-01', periods=1000))
sr
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
...
2022-09-22 995
2022-09-23 996
2022-09-24 997
2022-09-25 998
2022-09-26 999
Freq: D, Length: 1000, dtype: int32
sr.index
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10',
...
'2022-09-17', '2022-09-18', '2022-09-19', '2022-09-20',
'2022-09-21', '2022-09-22', '2022-09-23', '2022-09-24',
'2022-09-25', '2022-09-26'],
dtype='datetime64[ns]', length=1000, freq='D')
sr['2020-03']
2020-03-01 60
2020-03-02 61
2020-03-03 62
2020-03-04 63
2020-03-05 64
2020-03-06 65
2020-03-07 66
2020-03-08 67
2020-03-09 68
2020-03-10 69
2020-03-11 70
2020-03-12 71
2020-03-13 72
2020-03-14 73
2020-03-15 74
2020-03-16 75
2020-03-17 76
2020-03-18 77
2020-03-19 78
2020-03-20 79
2020-03-21 80
2020-03-22 81
2020-03-23 82
2020-03-24 83
2020-03-25 84
2020-03-26 85
2020-03-27 86
2020-03-28 87
2020-03-29 88
2020-03-30 89
2020-03-31 90
Freq: D, dtype: int32
sr['2020']
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
...
2020-12-27 361
2020-12-28 362
2020-12-29 363
2020-12-30 364
2020-12-31 365
Freq: D, Length: 366, dtype: int32
sr['2020-05':'2020-10']
2020-05-01 121
2020-05-02 122
2020-05-03 123
2020-05-04 124
2020-05-05 125
...
2020-10-27 300
2020-10-28 301
2020-10-29 302
2020-10-30 303
2020-10-31 304
Freq: D, Length: 184, dtype: int32
sr['2020-05-01':'2020-10-31']
2020-05-01 121
2020-05-02 122
2020-05-03 123
2020-05-04 124
2020-05-05 125
...
2020-10-27 300
2020-10-28 301
2020-10-29 302
2020-10-30 303
2020-10-31 304
Freq: D, Length: 184, dtype: int32
sr.resample('W').sum()
2020-01-05 10
2020-01-12 56
2020-01-19 105
2020-01-26 154
2020-02-02 203
...
2022-09-04 6818
2022-09-11 6867
2022-09-18 6916
2022-09-25 6965
2022-10-02 999
Freq: W-SUN, Length: 144, dtype: int32
sr.resample('M').mean()
2020-01-31 15.0
2020-02-29 45.0
2020-03-31 75.0
2020-04-30 105.5
2020-05-31 136.0
2020-06-30 166.5
2020-07-31 197.0
2020-08-31 228.0
2020-09-30 258.5
2020-10-31 289.0
2020-11-30 319.5
2020-12-31 350.0
2021-01-31 381.0
2021-02-28 410.5
2021-03-31 440.0
2021-04-30 470.5
2021-05-31 501.0
2021-06-30 531.5
2021-07-31 562.0
2021-08-31 593.0
2021-09-30 623.5
2021-10-31 654.0
2021-11-30 684.5
2021-12-31 715.0
2022-01-31 746.0
2022-02-28 775.5
2022-03-31 805.0
2022-04-30 835.5
2022-05-31 866.0
2022-06-30 896.5
2022-07-31 927.0
2022-08-31 958.0
2022-09-30 986.5
Freq: M, dtype: float64
sr.truncate(before='2020-05-01',after='2020-10-01')
2020-05-01 121
2020-05-02 122
2020-05-03 123
2020-05-04 124
2020-05-05 125
...
2020-09-27 270
2020-09-28 271
2020-09-29 272
2020-09-30 273
2020-10-01 274
Freq: D, Length: 154, dtype: int32
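The same features carry over to a DataFrame with a DatetimeIndex. The sketch below is one possible illustration, with a hypothetical column named `value`; it uses .loc for the partial-string row selection and compares truncate with an equivalent label slice.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2020-01-01', periods=1000)
df = pd.DataFrame({'value': np.arange(1000)}, index=idx)

march = df.loc['2020-03']                 # every row in March 2020
spring = df.loc['2020-03':'2020-05']      # March through May, end included
weekly_max = df.resample('W').max()       # weeks ending on Sunday

# truncate keeps both endpoints, like a label slice on the index
cut = df.truncate(before='2020-05-01', after='2020-10-01')
same = cut.equals(df.loc['2020-05-01':'2020-10-01'])
```

Frequency aliases have changed between pandas releases, so an alias such as 'M' used above may trigger a deprecation warning on newer versions.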
5 pandas - File Handling
5.1 pandas - Reading Files
pd.read_csv('399300.csv')
| | 日期 | 股票代码 | 名称 | 收盘价 | 最高价 | 最低价 | 开盘价 | 前收盘 | 涨跌额 | 涨跌幅 | 成交量 | 成交金额 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021/1/29 | '399300 | 沪深300 | 5351.9646 | 5430.2015 | 5288.0955 | 5413.9684 | 5377.1427 | -25.1781 | -0.4682 | 18217878400 | 390,287,690,019.00 |
| 1 | 2021/1/28 | '399300 | 沪深300 | 5377.1427 | 5462.2352 | 5360.3766 | 5450.3695 | 5528.0034 | -150.8607 | -2.729 | 17048558500 | 376,166,523,178.00 |
| 2 | 2021/1/27 | '399300 | 沪深300 | 5528.0034 | 5534.9928 | 5449.6385 | 5505.7708 | 5512.9678 | 15.0356 | 0.2727 | 16019084100 | 376,892,605,839.00 |
| 3 | 2021/1/26 | '399300 | 沪深300 | 5512.9678 | 5600.9017 | 5505.9962 | 5600.9017 | 5625.9232 | -112.9554 | -2.0078 | 17190459000 | 415,008,069,865.00 |
| 4 | 2021/1/25 | '399300 | 沪深300 | 5625.9232 | 5655.4795 | 5543.2663 | 5564.1237 | 5569.776 | 56.1472 | 1.0081 | 19704701900 | 508,166,980,802.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4625 | 2002/1/10 | '399300 | 沪深300 | 1281.2600 | 1281.2600 | 1281.2600 | 1281.2600 | 1272.65 | 8.61 | 0.6765 | 0 | - |
| 4626 | 2002/1/9 | '399300 | 沪深300 | 1272.6500 | 1272.6500 | 1272.6500 | 1272.6500 | 1292.71 | -20.06 | -1.5518 | 0 | - |
| 4627 | 2002/1/8 | '399300 | 沪深300 | 1292.7100 | 1292.7100 | 1292.7100 | 1292.7100 | 1302.08 | -9.37 | -0.7196 | 0 | - |
| 4628 | 2002/1/7 | '399300 | 沪深300 | 1302.0800 | 1302.0800 | 1302.0800 | 1302.0800 | 1316.46 | -14.38 | -1.0923 | 0 | - |
| 4629 | 2002/1/4 | '399300 | 沪深300 | 1316.4600 | 1316.4600 | 1316.4600 | 1316.4600 | None | None | None | 0 | - |

4630 rows × 12 columns
pd.read_csv('399300.csv', index_col='日期')
| 日期 | 股票代码 | 名称 | 收盘价 | 最高价 | 最低价 | 开盘价 | 前收盘 | 涨跌额 | 涨跌幅 | 成交量 | 成交金额 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021/1/29 | '399300 | 沪深300 | 5351.9646 | 5430.2015 | 5288.0955 | 5413.9684 | 5377.1427 | -25.1781 | -0.4682 | 18217878400 | 390,287,690,019.00 |
| 2021/1/28 | '399300 | 沪深300 | 5377.1427 | 5462.2352 | 5360.3766 | 5450.3695 | 5528.0034 | -150.8607 | -2.729 | 17048558500 | 376,166,523,178.00 |
| 2021/1/27 | '399300 | 沪深300 | 5528.0034 | 5534.9928 | 5449.6385 | 5505.7708 | 5512.9678 | 15.0356 | 0.2727 | 16019084100 | 376,892,605,839.00 |
| 2021/1/26 | '399300 | 沪深300 | 5512.9678 | 5600.9017 | 5505.9962 | 5600.9017 | 5625.9232 | -112.9554 | -2.0078 | 17190459000 | 415,008,069,865.00 |
| 2021/1/25 | '399300 | 沪深300 | 5625.9232 | 5655.4795 | 5543.2663 | 5564.1237 | 5569.776 | 56.1472 | 1.0081 | 19704701900 | 508,166,980,802.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2002/1/10 | '399300 | 沪深300 | 1281.2600 | 1281.2600 | 1281.2600 | 1281.2600 | 1272.65 | 8.61 | 0.6765 | 0 | - |
| 2002/1/9 | '399300 | 沪深300 | 1272.6500 | 1272.6500 | 1272.6500 | 1272.6500 | 1292.71 | -20.06 | -1.5518 | 0 | - |
| 2002/1/8 | '399300 | 沪深300 | 1292.7100 | 1292.7100 | 1292.7100 | 1292.7100 | 1302.08 | -9.37 | -0.7196 | 0 | - |
| 2002/1/7 | '399300 | 沪深300 | 1302.0800 | 1302.0800 | 1302.0800 | 1302.0800 | 1316.46 | -14.38 | -1.0923 | 0 | - |
| 2002/1/4 | '399300 | 沪深300 | 1316.4600 | 1316.4600 | 1316.4600 | 1316.4600 | None | None | None | 0 | - |

4630 rows × 11 columns
df = _
df.index
Index(['2021/1/29', '2021/1/28', '2021/1/27', '2021/1/26', '2021/1/25',
'2021/1/22', '2021/1/21', '2021/1/20', '2021/1/19', '2021/1/18',
...
'2002/1/17', '2002/1/16', '2002/1/15', '2002/1/14', '2002/1/11',
'2002/1/10', '2002/1/9', '2002/1/8', '2002/1/7', '2002/1/4'],
dtype='object', name='日期', length=4630)
df = pd.read_csv('399300.csv',index_col='日期',parse_dates=['日期'])
df
| 日期 | 股票代码 | 名称 | 收盘价 | 最高价 | 最低价 | 开盘价 | 前收盘 | 涨跌额 | 涨跌幅 | 成交量 | 成交金额 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-01-29 | '399300 | 沪深300 | 5351.9646 | 5430.2015 | 5288.0955 | 5413.9684 | 5377.1427 | -25.1781 | -0.4682 | 18217878400 | 390,287,690,019.00 |
| 2021-01-28 | '399300 | 沪深300 | 5377.1427 | 5462.2352 | 5360.3766 | 5450.3695 | 5528.0034 | -150.8607 | -2.729 | 17048558500 | 376,166,523,178.00 |
| 2021-01-27 | '399300 | 沪深300 | 5528.0034 | 5534.9928 | 5449.6385 | 5505.7708 | 5512.9678 | 15.0356 | 0.2727 | 16019084100 | 376,892,605,839.00 |
| 2021-01-26 | '399300 | 沪深300 | 5512.9678 | 5600.9017 | 5505.9962 | 5600.9017 | 5625.9232 | -112.9554 | -2.0078 | 17190459000 | 415,008,069,865.00 |
| 2021-01-25 | '399300 | 沪深300 | 5625.9232 | 5655.4795 | 5543.2663 | 5564.1237 | 5569.776 | 56.1472 | 1.0081 | 19704701900 | 508,166,980,802.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2002-01-10 | '399300 | 沪深300 | 1281.2600 | 1281.2600 | 1281.2600 | 1281.2600 | 1272.65 | 8.61 | 0.6765 | 0 | - |
| 2002-01-09 | '399300 | 沪深300 | 1272.6500 | 1272.6500 | 1272.6500 | 1272.6500 | 1292.71 | -20.06 | -1.5518 | 0 | - |
| 2002-01-08 | '399300 | 沪深300 | 1292.7100 | 1292.7100 | 1292.7100 | 1292.7100 | 1302.08 | -9.37 | -0.7196 | 0 | - |
| 2002-01-07 | '399300 | 沪深300 | 1302.0800 | 1302.0800 | 1302.0800 | 1302.0800 | 1316.46 | -14.38 | -1.0923 | 0 | - |
| 2002-01-04 | '399300 | 沪深300 | 1316.4600 | 1316.4600 | 1316.4600 | 1316.4600 | None | None | None | 0 | - |

4630 rows × 11 columns
df.index
DatetimeIndex(['2021-01-29', '2021-01-28', '2021-01-27', '2021-01-26',
'2021-01-25', '2021-01-22', '2021-01-21', '2021-01-20',
'2021-01-19', '2021-01-18',
...
'2002-01-17', '2002-01-16', '2002-01-15', '2002-01-14',
'2002-01-11', '2002-01-10', '2002-01-09', '2002-01-08',
'2002-01-07', '2002-01-04'],
dtype='datetime64[ns]', name='日期', length=4630, freq=None)
df = pd.read_csv('399300-2.csv',header=None)
df
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021/1/29 | '399300 | 沪深300 | 5351.9646 | 5430.2015 | 5288.0955 | 5413.9684 | 5377.1427 | -25.1781 | -0.4682 | 18217878400 | 390,287,690,019.00 |
| 1 | 2021/1/28 | '399300 | 沪深300 | 5377.1427 | 5462.2352 | 5360.3766 | 5450.3695 | 5528.0034 | -150.8607 | -2.729 | 17048558500 | 376,166,523,178.00 |
| 2 | 2021/1/27 | '399300 | 沪深300 | 5528.0034 | 5534.9928 | 5449.6385 | 5505.7708 | 5512.9678 | 15.0356 | 0.2727 | 16019084100 | 376,892,605,839.00 |
| 3 | 2021/1/26 | '399300 | 沪深300 | 5512.9678 | 5600.9017 | 5505.9962 | 5600.9017 | 5625.9232 | -112.9554 | -2.0078 | 17190459000 | 415,008,069,865.00 |
| 4 | 2021/1/25 | '399300 | 沪深300 | 5625.9232 | 5655.4795 | 5543.2663 | 5564.1237 | 5569.776 | 56.1472 | 1.0081 | 19704701900 | 508,166,980,802.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4625 | 2002/1/10 | '399300 | 沪深300 | 1281.2600 | 1281.2600 | 1281.2600 | 1281.2600 | 1272.65 | 8.61 | 0.6765 | 0 | - |
| 4626 | 2002/1/9 | '399300 | 沪深300 | 1272.6500 | 1272.6500 | 1272.6500 | 1272.6500 | 1292.71 | -20.06 | -1.5518 | 0 | - |
| 4627 | 2002/1/8 | '399300 | 沪深300 | 1292.7100 | 1292.7100 | 1292.7100 | 1292.7100 | 1302.08 | -9.37 | -0.7196 | 0 | - |
| 4628 | 2002/1/7 | '399300 | 沪深300 | 1302.0800 | 1302.0800 | 1302.0800 | 1302.0800 | 1316.46 | -14.38 | -1.0923 | 0 | - |
| 4629 | 2002/1/4 | '399300 | 沪深300 | 1316.4600 | 1316.4600 | 1316.4600 | 1316.4600 | None | None | None | 0 | - |

4630 rows × 12 columns
df = pd.read_csv('399300-2.csv',header=None, names=['股票代码', '名称', '收盘价', '最高价', '最低价',
'开盘价', '前收盘', '涨跌额', '涨跌幅', '成交量','成交金额 '])
df
| | 股票代码 | 名称 | 收盘价 | 最高价 | 最低价 | 开盘价 | 前收盘 | 涨跌额 | 涨跌幅 | 成交量 | 成交金额 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021/1/29 | '399300 | 沪深300 | 5351.9646 | 5430.2015 | 5288.0955 | 5413.9684 | 5377.1427 | -25.1781 | -0.4682 | 18217878400 | 390,287,690,019.00 |
| 2021/1/28 | '399300 | 沪深300 | 5377.1427 | 5462.2352 | 5360.3766 | 5450.3695 | 5528.0034 | -150.8607 | -2.729 | 17048558500 | 376,166,523,178.00 |
| 2021/1/27 | '399300 | 沪深300 | 5528.0034 | 5534.9928 | 5449.6385 | 5505.7708 | 5512.9678 | 15.0356 | 0.2727 | 16019084100 | 376,892,605,839.00 |
| 2021/1/26 | '399300 | 沪深300 | 5512.9678 | 5600.9017 | 5505.9962 | 5600.9017 | 5625.9232 | -112.9554 | -2.0078 | 17190459000 | 415,008,069,865.00 |
| 2021/1/25 | '399300 | 沪深300 | 5625.9232 | 5655.4795 | 5543.2663 | 5564.1237 | 5569.776 | 56.1472 | 1.0081 | 19704701900 | 508,166,980,802.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2002/1/10 | '399300 | 沪深300 | 1281.2600 | 1281.2600 | 1281.2600 | 1281.2600 | 1272.65 | 8.61 | 0.6765 | 0 | - |
| 2002/1/9 | '399300 | 沪深300 | 1272.6500 | 1272.6500 | 1272.6500 | 1272.6500 | 1292.71 | -20.06 | -1.5518 | 0 | - |
| 2002/1/8 | '399300 | 沪深300 | 1292.7100 | 1292.7100 | 1292.7100 | 1292.7100 | 1302.08 | -9.37 | -0.7196 | 0 | - |
| 2002/1/7 | '399300 | 沪深300 | 1302.0800 | 1302.0800 | 1302.0800 | 1302.0800 | 1316.46 | -14.38 | -1.0923 | 0 | - |
| 2002/1/4 | '399300 | 沪深300 | 1316.4600 | 1316.4600 | 1316.4600 | 1316.4600 | None | None | None | 0 | - |

4630 rows × 11 columns
df = pd.read_csv('399300.csv',index_col='日期',parse_dates=['日期'], skiprows=[1,2,3])
df
| 日期 | 股票代码 | 名称 | 收盘价 | 最高价 | 最低价 | 开盘价 | 前收盘 | 涨跌额 | 涨跌幅 | 成交量 | 成交金额 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-01-26 | '399300 | 沪深300 | 5512.9678 | 5600.9017 | 5505.9962 | 5600.9017 | 5625.9232 | -112.9554 | -2.0078 | 17190459000 | 415,008,069,865.00 |
| 2021-01-25 | '399300 | 沪深300 | 5625.9232 | 5655.4795 | 5543.2663 | 5564.1237 | 5569.776 | 56.1472 | 1.0081 | 19704701900 | 508,166,980,802.00 |
| 2021-01-22 | '399300 | 沪深300 | 5569.7760 | 5573.6594 | 5513.8769 | 5562.3790 | 5564.9693 | 4.8067 | 0.0864 | 19930002000 | 456,622,193,436.00 |
| 2021-01-21 | '399300 | 沪深300 | 5564.9693 | 5593.1058 | 5490.5626 | 5492.9587 | 5476.4336 | 88.5357 | 1.6167 | 20995019700 | 453,183,684,479.00 |
| 2021-01-20 | '399300 | 沪深300 | 5476.4336 | 5496.0493 | 5426.5357 | 5439.9111 | 5437.5234 | 38.9102 | 0.7156 | 17091326000 | 373,770,384,496.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2002-01-10 | '399300 | 沪深300 | 1281.2600 | 1281.2600 | 1281.2600 | 1281.2600 | 1272.65 | 8.61 | 0.6765 | 0 | - |
| 2002-01-09 | '399300 | 沪深300 | 1272.6500 | 1272.6500 | 1272.6500 | 1272.6500 | 1292.71 | -20.06 | -1.5518 | 0 | - |
| 2002-01-08 | '399300 | 沪深300 | 1292.7100 | 1292.7100 | 1292.7100 | 1292.7100 | 1302.08 | -9.37 | -0.7196 | 0 | - |
| 2002-01-07 | '399300 | 沪深300 | 1302.0800 | 1302.0800 | 1302.0800 | 1302.0800 | 1316.46 | -14.38 | -1.0923 | 0 | - |
| 2002-01-04 | '399300 | 沪深300 | 1316.4600 | 1316.4600 | 1316.4600 | 1316.4600 | None | None | None | 0 | - |

4627 rows × 11 columns
df = pd.read_csv('399300.csv',index_col='日期',parse_dates=['日期'], na_values=['None','NA','nan'])
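The last call is shown without output, so the effect of na_values is not visible. The sketch below reproduces the idea on a small in-memory CSV; the column names `date`, `close`, `change`, and `amount` and all values are invented for illustration, standing in for the real 399300.csv, which is not available here.

```python
import io
import pandas as pd

# Hypothetical sample data: 'None' and '-' mark missing values.
csv_text = (
    "date,close,change,amount\n"
    "2021-01-29,5351.9646,-25.1781,390.2\n"
    "2021-01-28,5377.1427,-150.8607,376.1\n"
    "2002-01-04,1316.46,None,-\n"
)

# Without na_values, the '-' marker keeps the amount column non-numeric.
raw = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=['date'])

# With na_values, both markers are read as NaN and the columns become numeric.
clean = pd.read_csv(
    io.StringIO(csv_text),
    index_col='date',
    parse_dates=['date'],
    na_values=['None', '-'],
)
raw_amount_is_numeric = pd.api.types.is_numeric_dtype(raw['amount'])
clean_amount_is_numeric = pd.api.types.is_numeric_dtype(clean['amount'])
```

Which strings pandas treats as missing by default varies between versions, so listing the markers explicitly in na_values is the more predictable choice.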
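5.2 pandas - Writing Files
- Write to a csv file: the to_csv function
- Main parameters of the write function
- sep: the field separator
- na_rep: the string written for missing values, an empty string by default
- header=False: do not write the column-name row
- index=False: do not write the index column
- columns: the columns to write, passed as a list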
df.iloc[1,1] = np.nan
df.to_csv('test.csv', header=False, index=False, na_rep='None',encoding='ANSI')
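To see what each parameter does without writing a file, to_csv can be called with no path, in which case it returns the CSV text as a string. The following is a small sketch with an invented two-row frame; the column names and values are placeholders for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'close': [5351.96, np.nan], 'volume': [100, 200]},
    index=['r1', 'r2'],
)

# Custom separator and an explicit marker for missing values
full_text = df.to_csv(sep=';', na_rep='None')
full_lines = full_text.splitlines()

# Only one column, without the header row or the index column
bare_text = df.to_csv(header=False, index=False, na_rep='None', columns=['close'])
bare_lines = bare_text.splitlines()
```

Here full_lines begins with a header whose first field is empty, because the index has no name.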