28-Python Pandas读取文件

Python Pandas读取文件

当使用 Pandas 做数据分析的时,需要读取事先准备好的数据集,这是做数据分析的第一步。Panda 提供了多种读取数据的方法:

  • read_csv() 用于读取文本文件
  • read_json() 用于读取 json 文件
  • read_sql_query() 读取 sql 语句的,

本节将对上述方法做详细介绍。

CSV文件读取

CSV 又称逗号分隔值文件,是一种简单的文件格式,以特定的结构来排列表格数据。 CSV 文件能够以纯文本形式存储表格数据,比如电子表格、数据库文件,并具有数据交换的通用格式。CSV 文件会在 Excel 文件中被打开,其行和列都定义了标准的数据格式。

将 CSV 中的数据转换为 DataFrame 对象是非常便捷的。和一般文件读写不一样,它不需要你做打开文件、读取文件、关闭文件等操作。相反,您只需要一行代码就可以完成上述所有步骤,并将数据存储在 DataFrame 中。

下面进行实例演示,首先您需要创建一组数据,并将其保存为 CSV 格式,数据如下:

EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
7369,SMITH,CLERK,7902.0,1980-12-17,800,,20
7499,ALLEN,SALESMAN,7698.0,1981-02-20,1600,300.0,30
7521,WARD,SALESMAN,7698.0,1981-02-22,1250,500.0,30
7566,JONES,MANAGER,7839.0,1981-04-02,2975,,20
7654,MARTIN,SALESMAN,7698.0,1981-09-28,1250,1400.0,30
7698,BLAKE,MANAGER,7839.0,1981-05-01,2850,,30
7782,CLARK,MANAGER,7839.0,1981-06-09,2450,,10
7788,SCOTT,ANALYST,7566.0,1987-04-19,3000,,20
7839,KING,PRESIDENT,,1981-11-17,5000,,10
7844,TURNER,SALESMAN,7698.0,1981-09-08,1500,0.0,30
7876,ADAMS,CLERK,7788.0,1987-05-23,1100,,20
7900,JAMES,CLERK,7698.0,1981-12-03,950,,30
7902,FORD,ANALYST,7566.0,1981-12-03,3000,,20
7934,MILLER,CLERK,7782.0,1982-01-23,1300,,10

注意:将上述数据保存到.txt的文本文件中,然后将文件的扩展名后缀修改为 csv,即可完成 csv 文件的创建。

接下来,我们使用下列代码读写数据:

import pandas as pd
import matplotlib.pyplot as plt
#仅仅一行代码就完成了数据读取,但是注意文件路径不要写错
df  = pd.read_csv("C:\\Users\\qwy\\Desktop\\data\\empdata.csv")
print(df)

输出结果:

    EMPNO   ENAME        JOB     MGR    HIREDATE   SAL    COMM  DEPTNO
0    7369   SMITH      CLERK  7902.0  1980-12-17   800     NaN      20
1    7499   ALLEN   SALESMAN  7698.0  1981-02-20  1600   300.0      30
2    7521    WARD   SALESMAN  7698.0  1981-02-22  1250   500.0      30
3    7566   JONES    MANAGER  7839.0  1981-04-02  2975     NaN      20
4    7654  MARTIN   SALESMAN  7698.0  1981-09-28  1250  1400.0      30
5    7698   BLAKE    MANAGER  7839.0  1981-05-01  2850     NaN      30
6    7782   CLARK    MANAGER  7839.0  1981-06-09  2450     NaN      10
7    7788   SCOTT    ANALYST  7566.0  1987-04-19  3000     NaN      20
8    7839    KING  PRESIDENT     NaN  1981-11-17  5000     NaN      10
9    7844  TURNER   SALESMAN  7698.0  1981-09-08  1500     0.0      30
10   7876   ADAMS      CLERK  7788.0  1987-05-23  1100     NaN      20
11   7900   JAMES      CLERK  7698.0  1981-12-03   950     NaN      30
12   7902    FORD    ANALYST  7566.0  1981-12-03  3000     NaN      20
13   7934  MILLER      CLERK  7782.0  1982-01-23  1300     NaN      10

json读取文件

emp.json:

{"EMPNO":{"0":7369,"1":7499,"2":7521,"3":7566,"4":7654,"5":7698,"6":7782,"7":7788,"8":7839,"9":7844,"10":7876,"11":7900,"12":7902,"13":7934},"ENAME":{"0":"SMITH","1":"ALLEN","2":"WARD","3":"JONES","4":"MARTIN","5":"BLAKE","6":"CLARK","7":"SCOTT","8":"KING","9":"TURNER","10":"ADAMS","11":"JAMES","12":"FORD","13":"MILLER"},"JOB":{"0":"CLERK","1":"SALESMAN","2":"SALESMAN","3":"MANAGER","4":"SALESMAN","5":"MANAGER","6":"MANAGER","7":"ANALYST","8":"PRESIDENT","9":"SALESMAN","10":"CLERK","11":"CLERK","12":"ANALYST","13":"CLERK"},"MGR":{"0":7902.0,"1":7698.0,"2":7698.0,"3":7839.0,"4":7698.0,"5":7839.0,"6":7839.0,"7":7566.0,"8":null,"9":7698.0,"10":7788.0,"11":7698.0,"12":7566.0,"13":7782.0},"HIREDATE":{"0":"1980-12-17","1":"1981-02-20","2":"1981-02-22","3":"1981-04-02","4":"1981-09-28","5":"1981-05-01","6":"1981-06-09","7":"1987-04-19","8":"1981-11-17","9":"1981-09-08","10":"1987-05-23","11":"1981-12-03","12":"1981-12-03","13":"1982-01-23"},"SAL":{"0":800,"1":1600,"2":1250,"3":2975,"4":1250,"5":2850,"6":2450,"7":3000,"8":5000,"9":1500,"10":1100,"11":950,"12":3000,"13":1300},"COMM":{"0":null,"1":300.0,"2":500.0,"3":null,"4":1400.0,"5":null,"6":null,"7":null,"8":null,"9":0.0,"10":null,"11":null,"12":null,"13":null},"DEPTNO":{"0":20,"1":30,"2":30,"3":20,"4":30,"5":30,"6":10,"7":20,"8":10,"9":30,"10":20,"11":30,"12":20,"13":10}}

您可以通过下列方法来读取一个 json 文件,如下所示:

import pandas as pd
df  = pd.read_json("C:\\Users\\qwy\\Desktop\data\\empdata.json")
print(df)

输出结果:

    EMPNO   ENAME        JOB     MGR    HIREDATE   SAL    COMM  DEPTNO
0    7369   SMITH      CLERK  7902.0  1980-12-17   800     NaN      20
1    7499   ALLEN   SALESMAN  7698.0  1981-02-20  1600   300.0      30
2    7521    WARD   SALESMAN  7698.0  1981-02-22  1250   500.0      30
3    7566   JONES    MANAGER  7839.0  1981-04-02  2975     NaN      20
4    7654  MARTIN   SALESMAN  7698.0  1981-09-28  1250  1400.0      30
5    7698   BLAKE    MANAGER  7839.0  1981-05-01  2850     NaN      30
6    7782   CLARK    MANAGER  7839.0  1981-06-09  2450     NaN      10
7    7788   SCOTT    ANALYST  7566.0  1987-04-19  3000     NaN      20
8    7839    KING  PRESIDENT     NaN  1981-11-17  5000     NaN      10
9    7844  TURNER   SALESMAN  7698.0  1981-09-08  1500     0.0      30
10   7876   ADAMS      CLERK  7788.0  1987-05-23  1100     NaN      20
11   7900   JAMES      CLERK  7698.0  1981-12-03   950     NaN      30
12   7902    FORD    ANALYST  7566.0  1981-12-03  3000     NaN      20
13   7934  MILLER      CLERK  7782.0  1982-01-23  1300     NaN      10

SQL数据库读取

如果想要从 SQL 数据库读取数据,首先您应该使用 Python 和数据库建立连接,然后将查询语句传递给 read_sql_query() 方法,下面做简单地演示:

1) 安装pymysql模块
pip install pymysql
2) 建立数据连接
from sqlalchemy import create_engine
# 创建一个MySQL连接器,用户名为root,密码为1234# 地址为127.0.0.1,数据库名称为testdb,编码为UTF-8
engine = create_engine('mysql+pymysql://root:1234@127.0.0.1:3306/testdb?charset=utf8')
print(engine)
3) 数据库读取数据

数据库中的表结构:

在这里插入图片描述

在 SQLite 数据库中创建一张信息表,您可以随意添加一些信息,最后使用下列方法读取数据即可:

from sqlalchemy import create_engine
import pandas as pd
# 创建一个MySQL连接器,用户名为root,密码为1234# 地址为127.0.0.1,数据库名称为testdb,编码为UTF-8
engine = create_engine('mysql+pymysql://root:admin1@127.0.0.1:3306/mydb?charset=utf8')

df = pd.read_sql('emp',con=engine)
print(df)

运行结果:

    EMPNO   ENAME        JOB     MGR   HIREDATE   SAL    COMM  DEPTNO
0    7369   SMITH      CLERK  7902.0 1980-12-17   800     NaN      20
1    7499   ALLEN   SALESMAN  7698.0 1981-02-20  1600   300.0      30
2    7521    WARD   SALESMAN  7698.0 1981-02-22  1250   500.0      30
3    7566   JONES    MANAGER  7839.0 1981-04-02  2975     NaN      20
4    7654  MARTIN   SALESMAN  7698.0 1981-09-28  1250  1400.0      30
5    7698   BLAKE    MANAGER  7839.0 1981-05-01  2850     NaN      30
6    7782   CLARK    MANAGER  7839.0 1981-06-09  2450     NaN      10
7    7788   SCOTT    ANALYST  7566.0 1987-04-19  3000     NaN      20
8    7839    KING  PRESIDENT     NaN 1981-11-17  5000     NaN      10
9    7844  TURNER   SALESMAN  7698.0 1981-09-08  1500     0.0      30
10   7876   ADAMS      CLERK  7788.0 1987-05-23  1100     NaN      20
11   7900   JAMES      CLERK  7698.0 1981-12-03   950     NaN      30
12   7902    FORD    ANALYST  7566.0 1981-12-03  3000     NaN      20
13   7934  MILLER      CLERK  7782.0 1982-01-23  1300     NaN      10
  • 4
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值