The pandas library (1): reading a CSV file and printing its contents

沛然ypr

Category: Data analysis

Article link: https://blog.csdn.net/weixin_43824414/article/details/105370820


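Sample file:

A CSV file containing one day of Nanjing metro card-swipe records

It contains the following fields: date, off_time (exit time), card_id (transit card number), card_type (card type), device_num (device number), off_station (exit station number), on_time (entry time), on_station (entry station number)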

Part of the file is shown below (the on_station values are missing from this excerpt):

date	off_time	card_id	card_type	device_num	off_station	on_time	on_station
2016-04-22	08:25:36	0000990771990514	102	22023703	0000037	20160422075032
2016-04-22	12:32:43	0000990772079197	101	22011011	0000010	20160422122620
2016-04-22	21:11:31	0000990772083185	101	22011315	0000013	20160422210546
2016-04-22	18:05:24	0000990772083509	101	22023701	0000037	20160422175633
2016-04-22	17:03:04	0000990772083471	101	22040210	0000002	20160422163715
2016-04-22	08:07:11	0000997169251538	007	22035506	0000055	20160422071314
2016-04-22	10:46:41	0000997169251538	007	22011216	0000012	20160422094757
2016-04-22	18:04:25	0000997169148602	007	22023802	0000038	20160422172720
2016-04-22	12:38:31	0000997169148602	007	22010814	0000008	20160422115952
2016-04-22	09:36:15	0000997169237105	007	22022121	0000021	20160422091548
2016-04-22	08:11:36	0000997169396171	007	22079406	0000094	20160422075540
2016-04-22	19:28:03	0000997169396171	007	22079006	0000090	20160422190423
2016-04-22	08:34:06	0000990172167480	101	29070903	0000109	20160422080705

Sample code:

(1) Read the file

import pandas as pd

subway_info = pd.read_csv(r"D:\MET20160422.csv")   # raw string keeps the backslash literal
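
One thing to watch when reading this file: card_id, card_type and the station numbers are written with leading zeros, and the default type inference turns them into integers, so the zeros are dropped. A minimal sketch of keeping them with the `dtype` parameter of `read_csv` follows. The inline two-row sample is a hypothetical stand-in for the real file, and the comma separator and the zero-padded on_station values in it are assumptions.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the real file; the separator and
# the formatting of the on_station values are assumptions.
csv_text = """date,off_time,card_id,card_type,device_num,off_station,on_time,on_station
2016-04-22,08:25:36,0000990771990514,102,22023703,0000037,20160422075032,0000025
2016-04-22,08:07:11,0000997169251538,007,22035506,0000055,20160422071314,0000011
"""

# Default parsing infers integer columns, which drops the leading zeros.
as_numbers = pd.read_csv(io.StringIO(csv_text))

# Reading the identifier columns as strings keeps them exactly as written.
id_columns = ["card_id", "card_type", "off_station", "on_station"]
as_text = pd.read_csv(io.StringIO(csv_text), dtype={name: str for name in id_columns})

print(as_numbers.loc[1, "card_type"])  # parsed as a number, zeros lost
print(as_text.loc[1, "card_type"])     # kept as the original text
```

Whether the identifiers should stay as text depends on the analysis; for grouping or joining on card numbers, strings are usually the safer choice.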

(2) Print the data types and the size of the data

Code:

print(subway_info.dtypes)   # print the data type of each field

Output:

 date           object
 off_time       object
 card_id         int64
 card_type       int64
 device_num      int64
 off_station     int64
 on_time         int64
 on_station      int64
 dtype: object

Code:

print(subway_info.shape)   # print the size of the data

Output:

(1230010, 8)               # 1230010 records, 8 fields
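
The dtypes output shows on_time stored as int64. Judging from the values printed in this article, it appears to follow a yyyymmddHHMMSS layout (that layout is an assumption, not something the file documents). Under that assumption, a minimal sketch of converting it to a real datetime column with `pd.to_datetime`, using a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical three-row stand-in for the on_time column; the values are the
# ones printed by head(3) in this article.
sample = pd.DataFrame({"on_time": [20160422075032, 20160422122620, 20160422210546]})

# Convert the integers to strings, then parse them with an explicit format.
# The yyyymmddHHMMSS layout is an assumption based on the printed values.
sample["on_time"] = pd.to_datetime(sample["on_time"].astype(str), format="%Y%m%d%H%M%S")

print(sample.dtypes)                        # on_time is now a datetime column
print(sample["on_time"].dt.hour.tolist())   # entry hour of each record
```

Once the column is a datetime, the `.dt` accessor gives direct access to the hour, minute and so on, which is handy for counting entries per hour.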

(3) Print records with all fields

Code:

print(subway_info.head(3))   # print the first three records

Output:

          date  off_time       card_id  ...  off_station         on_time  on_station
 0  2016-04-22  08:25:36  990771990514  ...           37  20160422075032          25
 1  2016-04-22  12:32:43  990772079197  ...           10  20160422122620           8
 2  2016-04-22  21:11:31  990772083185  ...           13  20160422210546          12

Code:

print(subway_info.tail(3))   # print the last three records

Output:

                date  off_time  ...         on_time  on_station
 1230007  2016-04-22  10:44:09  ...  20160422101028         108
 1230008  2016-04-22  12:35:02  ...  20160422121822          43
 1230009  2016-04-22  20:05:27  ...  20160422192914          10
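
The `...` in the head and tail output means pandas hid some columns to fit the console width. If all eight fields should be visible, the display options can be changed. Below is a minimal sketch using `pd.option_context`, with a hypothetical one-row frame that reuses the sample file's column names; the width of 200 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical one-row frame with the same eight column names as the sample file.
sample = pd.DataFrame({
    "date": ["2016-04-22"],
    "off_time": ["08:25:36"],
    "card_id": [990771990514],
    "card_type": [102],
    "device_num": [22023703],
    "off_station": [37],
    "on_time": [20160422075032],
    "on_station": [25],
})

# Lift the column limit and widen the output line, only inside this block.
with pd.option_context("display.max_columns", None, "display.width", 200):
    full_text = repr(sample)

print(full_text)
```

`pd.set_option("display.max_columns", None)` makes the same change for the whole session instead of a single block.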

Code:

print(subway_info.loc[2])   # print the third record (index label 2)

Output:

 date               2016-04-22
 off_time             21:11:31
 card_id          990772083185
 card_type                 101
 device_num           22011315
 off_station                13
 on_time        20160422210546
 on_station                 12
 Name: 2, dtype: object
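
Note that `loc` selects by index label, not by position. Here the two coincide because `read_csv` assigns the default labels 0, 1, 2, and so on. A minimal sketch of the difference, using a hypothetical frame whose labels are deliberately out of order:

```python
import pandas as pd

# Hypothetical frame whose index labels do not match the row positions.
sample = pd.DataFrame(
    {"on_station": [25, 8, 12], "off_station": [37, 10, 13]},
    index=[2, 0, 1],
)

by_label = sample.loc[2]      # the row labelled 2, which is the first row here
by_position = sample.iloc[2]  # the third row by position, labelled 1 here

print(by_label["on_station"], by_position["on_station"])
```

After filtering or sorting a DataFrame the labels no longer match the positions, so `iloc` is the one to use when "the third row" is really what is meant.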

(4) Print selected fields

Code:

ndb_col = subway_info["on_station"]
print(ndb_col)                             # print all values of the on_station field

Output:

 0           25
 1            8
 2           12
 3           35
 4           44
 5           11
 6           55
 7            9
 8           37
 9           17
 10          90
 11          95
 12         100
 13         109
 14          52
 15          12
 16          42
 17          72
 18         100
 19          22
 20          26
 21         110
 22          97
 23          10
 24          24
 25          17
 26          12
 27          53
 28          10
 29          41
           ...
 1229980     98
 1229981     24
 1229982    101
 1229983    101
 1229984     99
 1229985    113
 1229986    103
 1229987     92
 1229988    104
 1229989    113
 1229990     16
 1229991     97
 1229992     44
 1229993     26
 1229994     99
 1229995    102
 1229996    113
 1229997     42
 1229998     44
 1229999    106
 1230000    103
 1230001      6
 1230002     21
 1230003     16
 1230004     75
 1230005      9
 1230006     31
 1230007    108
 1230008     43
 1230009     10
 Name: on_station, Length: 1230010, dtype: int64

Code:

columns = ["on_station", "off_station"]
ndb_col = subway_info[columns]
print(ndb_col)                                 # print all values of the on_station and off_station fields

Output:

          on_station  off_station
 0                25           37
 1                 8           10
 2                12           13
 3                35           37
 4                44            2
 5                11           55
 6                55           12
 7                 9           38
 8                37            8
 9                17           21
 10               90           94
 11               95           90
 12              100          109
 13              109          102
 14               52           55
 15               12           70
 16               42           12
 17               72          100
 18              100           75
 19               22            9
 20               26          110
 21              110           97
 22               97           26
 23               10           12
 24               24           17
 25               17           14
 26               12           53
 27               53           12
 28               10           41
 29               41           10
 ...             ...          ...
 1229980          98          111
 1229981          24          111
 1229982         101          111
 1229983         101          107
 1229984          99          107
 1229985         113          108
 1229986         103          108
 1229987          92          108
 1229988         104          108
 1229989         113          108
 1229990          16          108
 1229991          97          109
 1229992          44          109
 1229993          26          109
 1229994          99          109
 1229995         102          109
 1229996         113          109
 1229997          42          110
 1229998          44          110
 1229999         106          111
 1230000         103          110
 1230001           6          111
 1230002          21          107
 1230003          16          108
 1230004          75          108
 1230005           9          108
 1230006          31          108
 1230007         108          108
 1230008          43          109
 1230009          10          109
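
The two selections above return different types: a single column name gives a Series, while a list of names gives a DataFrame, even when the list has only one entry. A minimal sketch with a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical three-row frame with two of the sample file's columns.
sample = pd.DataFrame({"on_station": [25, 8, 12], "off_station": [37, 10, 13]})

one_column = sample["on_station"]           # single name: a Series
one_column_frame = sample[["on_station"]]   # list with one name: a DataFrame
two_columns = sample[["on_station", "off_station"]]

print(type(one_column).__name__, one_column.shape)
print(type(one_column_frame).__name__, one_column_frame.shape)
print(type(two_columns).__name__, two_columns.shape)
```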

(5) Print the field names

Code:

col_names = subway_info.columns.tolist()
print(col_names)                               # print all field names

Output:

['date', 'off_time', 'card_id', 'card_type', 'device_num', 'off_station', 'on_time', 'on_station']

(6) Filter fields by name and print them

Code:

col_names = subway_info.columns.tolist()
station_columns = []
for i in col_names:
    if i.endswith("station"):                # keep only field names ending in "station"
        station_columns.append(i)
station_df = subway_info[station_columns]
print(station_df.head(3))                    # print the first three records of those fields

Output:

    off_station  on_station
 0           37          25
 1           10           8
 2           13          12
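
The loop above can also be written more compactly. A minimal sketch of two alternatives, a list comprehension and `DataFrame.filter` with a regular expression, on a hypothetical one-row frame that reuses the sample file's column names:

```python
import pandas as pd

# Hypothetical one-row frame with the same eight column names as the sample file.
sample = pd.DataFrame({
    "date": ["2016-04-22"],
    "off_time": ["08:25:36"],
    "card_id": [990771990514],
    "card_type": [102],
    "device_num": [22023703],
    "off_station": [37],
    "on_time": [20160422075032],
    "on_station": [25],
})

# Alternative 1: a list comprehension over the column names.
station_columns = [name for name in sample.columns if name.endswith("station")]
by_comprehension = sample[station_columns]

# Alternative 2: DataFrame.filter keeps the columns whose name matches the regex.
by_filter = sample.filter(regex="station$")

print(by_comprehension.columns.tolist())
print(by_filter.columns.tolist())
```

Both keep the columns in their original order, so the result matches the loop version.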
