The pandas library (1): reading CSV files and printing their contents

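Sample file:

A CSV file containing one day of Nanjing metro card-swipe data

It contains the following fields: date (the date), off_time (exit time), card_id (transit card number), card_type (card type), device_num (device number), off_station (exit station number), on_time (entry time), on_station (entry station number)

Part of the file's contents: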

date        off_time  card_id           card_type  device_num  off_station  on_time         on_station
2016-04-22  08:25:36  0000990771990514  102        22023703    0000037      20160422075032
2016-04-22  12:32:43  0000990772079197  101        22011011    0000010      20160422122620
2016-04-22  21:11:31  0000990772083185  101        22011315    0000013      20160422210546
2016-04-22  18:05:24  0000990772083509  101        22023701    0000037      20160422175633
2016-04-22  17:03:04  0000990772083471  101        22040210    0000002      20160422163715
2016-04-22  08:07:11  0000997169251538  007        22035506    0000055      20160422071314
2016-04-22  10:46:41  0000997169251538  007        22011216    0000012      20160422094757
2016-04-22  18:04:25  0000997169148602  007        22023802    0000038      20160422172720
2016-04-22  12:38:31  0000997169148602  007        22010814    0000008      20160422115952
2016-04-22  09:36:15  0000997169237105  007        22022121    0000021      20160422091548
2016-04-22  08:11:36  0000997169396171  007        22079406    0000094      20160422075540
2016-04-22  19:28:03  0000997169396171  007        22079006    0000090      20160422190423
2016-04-22  08:34:06  0000990172167480  101        29070903    0000109      20160422080705

(The on_station values are not shown in this excerpt.)

Sample code:

(1) Reading the file

import pandas as pd

subway_info = pd.read_csv("D:MET20160422.csv")
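With the default settings, read_csv parses card_id as an integer, so the leading zeros seen in the file excerpt are dropped. A minimal sketch of keeping them by passing dtype follows; the in-memory sample_csv is a hand-made, comma-separated stand-in for the real file, and the on_station values in it are taken from the printed output further down, so both are assumptions rather than the actual file contents.

```python
import io
import pandas as pd

# Hand-made stand-in for the first two rows of the file (an assumption,
# not the real D: drive file).
sample_csv = io.StringIO(
    "date,off_time,card_id,card_type,device_num,off_station,on_time,on_station\n"
    "2016-04-22,08:25:36,0000990771990514,102,22023703,0000037,20160422075032,25\n"
    "2016-04-22,12:32:43,0000990772079197,101,22011011,0000010,20160422122620,8\n"
)

# dtype={"card_id": str} keeps card_id as text, so its leading zeros survive.
sample = pd.read_csv(sample_csv, dtype={"card_id": str})

print(sample["card_id"].tolist())      # card numbers as strings
print(sample["off_station"].tolist())  # still parsed as integers
```

Columns not listed in dtype are still inferred as usual, which is why off_station stays numeric here.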

(2) Printing the data types and the data size

Code:

print(subway_info.dtypes)   # print the data type of each field

Output:

 date           object
 off_time       object
 card_id         int64
 card_type       int64
 device_num      int64
 off_station     int64
 on_time         int64
 on_station      int64
 dtype: object
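The dtypes output shows on_time stored as int64, that is, as a number such as 20160422075032 rather than as a timestamp. If real timestamps are needed, one possible approach (a sketch, not part of the original walkthrough) is to convert the values to strings and parse them with an explicit format. The two values below are copied from the file excerpt; the names on_time and parsed are illustrative.

```python
import pandas as pd

# Two on_time values copied from the file excerpt.
on_time = pd.Series([20160422075032, 20160422122620], name="on_time")

# Convert the integers to strings, then parse them as YYYYmmddHHMMSS.
parsed = pd.to_datetime(on_time.astype(str), format="%Y%m%d%H%M%S")

print(parsed)
print(parsed.dt.hour.tolist())   # hour of entry for each record
```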

Code:

print(subway_info.shape)   # print the data size

Output:

(1230010, 8)               # 1230010 records and 8 fields
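Because shape is an ordinary (rows, columns) tuple, it can be unpacked directly. The small frame below is a hypothetical stand-in for subway_info, used only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical three-row stand-in for subway_info.
demo = pd.DataFrame({"on_station": [25, 8, 12], "off_station": [37, 10, 13]})

n_rows, n_cols = demo.shape   # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(len(demo))              # len() gives the row count only
```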

(3) Printing records with all fields

Code:

print(subway_info.head(3))   # print the first three records

Output:

          date  off_time       card_id  ...  off_station         on_time  on_station
 0  2016-04-22  08:25:36  990771990514  ...           37  20160422075032          25
 1  2016-04-22  12:32:43  990772079197  ...           10  20160422122620           8
 2  2016-04-22  21:11:31  990772083185  ...           13  20160422210546          12

Code:

print(subway_info.tail(3))   # print the last three records

Output:

                date  off_time  ...         on_time  on_station
 1230007  2016-04-22  10:44:09  ...  20160422101028         108
 1230008  2016-04-22  12:35:02  ...  20160422121822          43
 1230009  2016-04-22  20:05:27  ...  20160422192914          10

Code:

print(subway_info.loc[2])   # print the third record (index label 2)

Output:

 date               2016-04-22
 off_time             21:11:31
 card_id          990772083185
 card_type                 101
 device_num           22011315
 off_station                13
 on_time        20160422210546
 on_station                 12
 Name: 2, dtype: object
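Note that loc selects by index label, not by position. With the default index produced by read_csv the two coincide, but after filtering they can differ. The sketch below uses a hypothetical four-row frame to show the difference between loc and iloc.

```python
import pandas as pd

# Hypothetical stand-in for subway_info with the default 0..3 index.
demo = pd.DataFrame({"on_station": [25, 8, 12, 35], "off_station": [37, 10, 13, 37]})

# Keep rows with on_station above 10; the surviving labels are 0, 2 and 3.
subset = demo[demo["on_station"] > 10]

print(subset.loc[2, "on_station"])    # row whose label is 2
print(subset["on_station"].iloc[2])   # row at position 2 (the third remaining row)
```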

(4) Printing specific fields

Code:

ndb_col = subway_info["on_station"]
print(ndb_col)                             # print every value of the on_station field

Output:

 0           25
 1            8
 2           12
 3           35
 4           44
 5           11
 6           55
 7            9
 8           37
 9           17
 10          90
 11          95
 12         100
 13         109
 14          52
 15          12
 16          42
 17          72
 18         100
 19          22
 20          26
 21         110
 22          97
 23          10
 24          24
 25          17
 26          12
 27          53
 28          10
 29          41
           ...
 1229980     98
 1229981     24
 1229982    101
 1229983    101
 1229984     99
 1229985    113
 1229986    103
 1229987     92
 1229988    104
 1229989    113
 1229990     16
 1229991     97
 1229992     44
 1229993     26
 1229994     99
 1229995    102
 1229996    113
 1229997     42
 1229998     44
 1229999    106
 1230000    103
 1230001      6
 1230002     21
 1230003     16
 1230004     75
 1230005      9
 1230006     31
 1230007    108
 1230008     43
 1230009     10
 Name: on_station, Length: 1230010, dtype: int64

Code:

columns = ["on_station", "off_station"]
ndb_col = subway_info[columns]
print(ndb_col)                                 # print every value of the on_station and off_station fields

Output:

           on_station  off_station
 0                25           37
 1                 8           10
 2                12           13
 3                35           37
 4                44            2
 5                11           55
 6                55           12
 7                 9           38
 8                37            8
 9                17           21
 10               90           94
 11               95           90
 12              100          109
 13              109          102
 14               52           55
 15               12           70
 16               42           12
 17               72          100
 18              100           75
 19               22            9
 20               26          110
 21              110           97
 22               97           26
 23               10           12
 24               24           17
 25               17           14
 26               12           53
 27               53           12
 28               10           41
 29               41           10
 ...             ...          ...
 1229980          98          111
 1229981          24          111
 1229982         101          111
 1229983         101          107
 1229984          99          107
 1229985         113          108
 1229986         103          108
 1229987          92          108
 1229988         104          108
 1229989         113          108
 1229990          16          108
 1229991          97          109
 1229992          44          109
 1229993          26          109
 1229994          99          109
 1229995         102          109
 1229996         113          109
 1229997          42          110
 1229998          44          110
 1229999         106          111
 1230000         103          110
 1230001           6          111
 1230002          21          107
 1230003          16          108
 1230004          75          108
 1230005           9          108
 1230006          31          108
 1230007         108          108
 1230008          43          109
 1230009          10          109
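The two outputs in this step differ in type: indexing with a single string returns a Series, while indexing with a list of names returns a DataFrame, even when the list has only one element. A short sketch on a hypothetical stand-in frame:

```python
import pandas as pd

# Hypothetical stand-in for subway_info.
demo = pd.DataFrame({"on_station": [25, 8, 12], "off_station": [37, 10, 13]})

one_col = demo["on_station"]                     # string key -> Series
two_cols = demo[["on_station", "off_station"]]   # list key -> DataFrame
still_frame = demo[["on_station"]]               # one-element list -> DataFrame

print(type(one_col).__name__)
print(type(two_cols).__name__)
print(type(still_frame).__name__)
```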

(5) Printing the field names

Code:

col_names = subway_info.columns.tolist()
print(col_names)                               # print all field names

Output:

['date', 'off_time', 'card_id', 'card_type', 'device_num', 'off_station', 'on_time', 'on_station']

(6) Selecting particular fields and printing them

Code:

col_names = subway_info.columns.tolist()
station_columns = []
for i in col_names:
    if i.endswith("station"):                # keep the fields whose names end with "station"
        station_columns.append(i)
station_df = subway_info[station_columns]
print(station_df.head(3))                    # print the first three records of those fields

Output:

      off_station  on_station
 0           37          25
 1           10           8
 2           13          12
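The loop in step (6) can also be written more compactly. The sketch below shows two possible alternatives on a hypothetical one-row stand-in frame: a list comprehension, and DataFrame.filter with a regular expression. Neither is presented as the preferred way; both should select the same columns as the loop.

```python
import pandas as pd

# Hypothetical one-row stand-in for subway_info with a subset of its fields.
demo = pd.DataFrame({
    "date": ["2016-04-22"],
    "off_station": [37],
    "on_time": [20160422075032],
    "on_station": [25],
})

# Alternative 1: a list comprehension instead of the explicit loop.
station_columns = [name for name in demo.columns if name.endswith("station")]
by_list = demo[station_columns]

# Alternative 2: filter() keeps the columns whose names match the regex.
by_regex = demo.filter(regex="station$")

print(by_list.columns.tolist())
print(by_regex.columns.tolist())
```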