pandas 4: Reading and writing multiple file formats: pickle, CSV, Excel, JSON, HTML, SQL...

import pandas as pd
scientists = pd.read_csv('./data/scientists.csv')
names = scientists['Name']
print(scientists)
print(names)
                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician
0       Rosaline Franklin
1          William Gosset
2    Florence Nightingale
3             Marie Curie
4           Rachel Carson
5               John Snow
6             Alan Turing
7            Johann Gauss
Name: Name, dtype: object

1. Reading and writing pickle files

import os
path_name = './output/scientist_name_Series.pickle'
if not os.path.exists(path_name):
    names.to_pickle(path_name)

path_all = './output/scientist_DataFrame.pickle'
if not os.path.exists(path_all):
    scientists.to_pickle(path_all)

# Read the pickle files back
scientists_names_frme_pickle = pd.read_pickle(path_name)
print(scientists_names_frme_pickle)
0       Rosaline Franklin
1          William Gosset
2    Florence Nightingale
3             Marie Curie
4           Rachel Carson
5               John Snow
6             Alan Turing
7            Johann Gauss
Name: Name, dtype: object
scientists_frme_pickle = pd.read_pickle(path_all)
print(scientists_frme_pickle)
                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician
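
The main reason to prefer pickle over a text format is that it keeps dtypes and the index exactly as they were. The following is a minimal sketch of that round trip; the two-row DataFrame and the temporary file name are stand-ins made up for illustration, not the scientists.csv data used above. Keep in mind that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row sample standing in for scientists.csv
df = pd.DataFrame({
    "Name": ["Marie Curie", "Alan Turing"],
    "Born": pd.to_datetime(["1867-11-07", "1912-06-23"]),
    "Age": [66, 41],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.pickle")  # illustrative file name
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Values and dtypes (including the datetime column) survive the round trip
same_values = restored.equals(df)
same_dtypes = bool((restored.dtypes == df.dtypes).all())
print(same_values, same_dtypes)
```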

2. Reading and writing CSV files

2.1 Reading and writing a Series as CSV

import pandas as pd
scientists = pd.read_csv('./data/scientists.csv')
names = scientists['Name']
print(scientists)
print(names)
                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician
0       Rosaline Franklin
1          William Gosset
2    Florence Nightingale
3             Marie Curie
4           Rachel Carson
5               John Snow
6             Alan Turing
7            Johann Gauss
Name: Name, dtype: object
import os
path_name = './output/scientist_name_Series.CSV'
if not os.path.exists(path_name):
    names.to_csv(path_name)
# The saved index comes back as an 'Unnamed: 0' column because index_col is not set
scientists_name_frme_CSV = pd.read_csv(path_name)
print(scientists_name_frme_CSV)
   Unnamed: 0                  Name
0           0     Rosaline Franklin
1           1        William Gosset
2           2  Florence Nightingale
3           3           Marie Curie
4           4         Rachel Carson
5           5             John Snow
6           6           Alan Turing
7           7          Johann Gauss
path_name = './output/scientist_name_Series1.CSV'
if not os.path.exists(path_name):
    names.to_csv(path_name,sep='*')
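# The file was written with sep='*' but is read with the default sep=',',
# so each line is parsed as a single field and everything lands in one column
scientists_name_frme_CSV = pd.read_csv(path_name)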
print(scientists_name_frme_CSV)
                    *Name
0     0*Rosaline Franklin
1        1*William Gosset
2  2*Florence Nightingale
3           3*Marie Curie
4         4*Rachel Carson
5             5*John Snow
6           6*Alan Turing
7          7*Johann Gauss
path_name = './output/scientist_name_Series2.CSV'
if not os.path.exists(path_name):
    names.to_csv(path_name,index=False)
scientists_name_frme_CSV = pd.read_csv(path_name)
print(scientists_name_frme_CSV)
                   Name
0     Rosaline Franklin
1        William Gosset
2  Florence Nightingale
3           Marie Curie
4         Rachel Carson
5             John Snow
6           Alan Turing
7          Johann Gauss
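
The second output above (a single `*Name` column) comes from reading a `*`-separated file with the default comma separator, and the first one shows the saved index returning as `Unnamed: 0`. A small sketch of how both could be avoided, assuming the reader passes the same `sep` that was used for writing plus `index_col=0`; the two-name Series and the in-memory buffer are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical two-element Series standing in for scientists['Name']
names = pd.Series(["Marie Curie", "Alan Turing"], name="Name")

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
names.to_csv(buffer, sep="*")
buffer.seek(0)

# Use the same separator for reading, and take column 0 as the index
restored = pd.read_csv(buffer, sep="*", index_col=0)
print(restored)
```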

2.2 Reading and writing a DataFrame as CSV

path_name = './output/scientist_DataFrame.CSV'
if not os.path.exists(path_name):
    scientists.to_csv(path_name)
scientists_frme_CSV = pd.read_csv(path_name)
print(scientists_frme_CSV)
   Unnamed: 0                  Name        Born        Died  Age  \
0           0     Rosaline Franklin  1920-07-25  1958-04-16   37   
1           1        William Gosset  1876-06-13  1937-10-16   61   
2           2  Florence Nightingale  1820-05-12  1910-08-13   90   
3           3           Marie Curie  1867-11-07  1934-07-04   66   
4           4         Rachel Carson  1907-05-27  1964-04-14   56   
5           5             John Snow  1813-03-15  1858-06-16   45   
6           6           Alan Turing  1912-06-23  1954-06-07   41   
7           7          Johann Gauss  1777-04-30  1855-02-23   77   

           Occupation  
0             Chemist  
1        Statistician  
2               Nurse  
3             Chemist  
4           Biologist  
5           Physician  
6  Computer Scientist  
7       Mathematician  
path_name = './output/scientist_DataFrame2.CSV'
if not os.path.exists(path_name):
    scientists.to_csv(path_name,sep='*')
# Read with the default sep=',' although the file uses '*': all fields collapse into one column
scientists_frme_CSV = pd.read_csv(path_name)
print(scientists_frme_CSV)
                      *Name*Born*Died*Age*Occupation
0  0*Rosaline Franklin*1920-07-25*1958-04-16*37*C...
1  1*William Gosset*1876-06-13*1937-10-16*61*Stat...
2  2*Florence Nightingale*1820-05-12*1910-08-13*9...
3     3*Marie Curie*1867-11-07*1934-07-04*66*Chemist
4  4*Rachel Carson*1907-05-27*1964-04-14*56*Biolo...
5     5*John Snow*1813-03-15*1858-06-16*45*Physician
6  6*Alan Turing*1912-06-23*1954-06-07*41*Compute...
7  7*Johann Gauss*1777-04-30*1855-02-23*77*Mathem...
path_name = './output/scientist_DataFrame3.CSV'
if not os.path.exists(path_name):
    scientists.to_csv(path_name,index=False)
scientists_frme_CSV = pd.read_csv(path_name)
print(scientists_frme_CSV)
                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician
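
CSV stores everything as text, so the Born and Died columns come back as plain strings after a round trip. One way to recover them as dates is the `parse_dates` argument of `read_csv`. A rough sketch, using an inline two-row CSV string as a stand-in for the real file:

```python
import io

import pandas as pd

# Hypothetical inline CSV with the same date columns as scientists.csv
csv_text = """Name,Born,Died,Age
Marie Curie,1867-11-07,1934-07-04,66
Alan Turing,1912-06-23,1954-06-07,41
"""

# Without parse_dates the date columns are read as strings
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates they become datetime columns
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Born", "Died"])

# Datetime columns support arithmetic, e.g. lifespan in days
lifespan_days = (parsed["Died"] - parsed["Born"]).dt.days
print(plain.dtypes)
print(parsed.dtypes)
print(lifespan_days)
```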

3. Reading and writing Excel files

3.1 Reading and writing a Series as Excel

import pandas as pd
scientists = pd.read_csv('./data/scientists.csv')
names = scientists['Name']
print(scientists)
#print(names)
                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician
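# Note: writing .xls relies on the xlwt engine, which pandas 2.0 removed;
# on newer pandas versions use an .xlsx path instead
names.to_excel('./output/scientist_name_series.xls')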
names_df = names.to_frame()
names_df.to_excel('./output/scientist_name_df.xls')
scientists.to_excel('./output/scientist_df1.xls',index=False)
scientists.to_excel('./output/scientist_df2.xls',sheet_name='scientist',index=False)
scientists_name_frme_excel= pd.read_excel('./output/scientist_name_series.xls')
print(scientists_name_frme_excel)
   Unnamed: 0                  Name
0           0     Rosaline Franklin
1           1        William Gosset
2           2  Florence Nightingale
3           3           Marie Curie
4           4         Rachel Carson
5           5             John Snow
6           6           Alan Turing
7           7          Johann Gauss
scientists_name_df_frme_excel= pd.read_excel('./output/scientist_name_df.xls')
print(scientists_name_df_frme_excel)
   Unnamed: 0                  Name
0           0     Rosaline Franklin
1           1        William Gosset
2           2  Florence Nightingale
3           3           Marie Curie
4           4         Rachel Carson
5           5             John Snow
6           6           Alan Turing
7           7          Johann Gauss
scientists_frme_excel= pd.read_excel('./output/scientist_df1.xls')
print(scientists_frme_excel)
                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician
scientists_frme_excel= pd.read_excel('./output/scientist_df2.xls')
print(scientists_frme_excel)
                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician
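
On recent pandas versions the same idea can be tried with `.xlsx` files. The sketch below assumes the optional `openpyxl` package is available and simply skips the work if it is not; the sample DataFrame, file name, and the second sheet name `names` are invented for illustration. It writes two sheets through `pd.ExcelWriter` and reads them back, once by sheet name and once with `sheet_name=None`, which returns a dict of all sheets.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Hypothetical two-row sample standing in for scientists.csv
df = pd.DataFrame({"Name": ["Marie Curie", "Alan Turing"], "Age": [66, 41]})

restored = None
sheet_names = None

# openpyxl is an optional dependency; skip the demo if it is missing
if importlib.util.find_spec("openpyxl") is not None:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "scientists.xlsx")  # illustrative file name

        # Write two sheets into one workbook
        with pd.ExcelWriter(path, engine="openpyxl") as writer:
            df.to_excel(writer, sheet_name="scientist", index=False)
            df[["Name"]].to_excel(writer, sheet_name="names", index=False)

        # Read one sheet by name
        restored = pd.read_excel(path, sheet_name="scientist", engine="openpyxl")

        # sheet_name=None loads every sheet into a dict keyed by sheet name
        all_sheets = pd.read_excel(path, sheet_name=None, engine="openpyxl")
        sheet_names = list(all_sheets)

print(restored)
print(sheet_names)
```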

3.2 Reading and writing a DataFrame as Excel

import xlrd
#work book, sheet
data = xlrd.open_workbook('./output/scientist_df2.xls')
sheet = data.sheet_by_name('scientist')
#sheet = data.sheet_by_index(0)
print(sheet.row_values(1))
print(sheet.col_values(1))
['Rosaline Franklin', '1920-07-25', '1958-04-16', 37.0, 'Chemist']
['Born', '1920-07-25', '1876-06-13', '1820-05-12', '1867-11-07', '1907-05-27', '1813-03-15', '1912-06-23', '1777-04-30']
print('rows =',sheet.nrows)
print('columns =',sheet.ncols)
print(sheet)
rows = 9
columns = 5
<xlrd.sheet.Sheet object at 0x7f339c292650>
print(sheet.cell(0,0).value)
print(sheet.cell(2,3).value)
Name
61.0
print(data.sheet_names())
['scientist']
print(sheet.name)
print(sheet.row_values(1))
print(sheet.col_values(1))
scientist
['Rosaline Franklin', '1920-07-25', '1958-04-16', 37.0, 'Chemist']
['Born', '1920-07-25', '1876-06-13', '1820-05-12', '1867-11-07', '1907-05-27', '1813-03-15', '1912-06-23', '1777-04-30']

4. Other formats: JSON, HTML, SQL

import pandas as pd
scientists = pd.read_csv('./data/scientists.csv')
#scientists.to_clipboard()
#print(scientists.to_dict())
# With a path argument, to_html and to_json write the file and return None
print(scientists.to_html('./t.html'))
print(scientists.to_json('./t.json'))
None
None
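
The two `None` lines appear because `to_html` and `to_json` return nothing when given a path. Called without a path, they return the text instead, which makes an in-memory round trip easy to try. A small sketch, assuming a made-up two-row DataFrame and `orient="records"` (one JSON object per row):

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for scientists.csv
df = pd.DataFrame({"Name": ["Marie Curie", "Alan Turing"], "Age": [66, 41]})

# Without a path, to_json returns the JSON text
json_text = df.to_json(orient="records")

# read_json accepts a file-like object, so wrap the string in StringIO
restored = pd.read_json(io.StringIO(json_text), orient="records")

# Without a path, to_html returns the HTML table markup as a string
html_text = df.to_html(index=False)

print(json_text)
print(restored)
```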
import sqlite3
import sqlalchemy
engine = sqlalchemy.create_engine('sqlite:///my_db.sqlite')
scientists.to_sql('scientists',engine)
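
Unlike the file examples, the `to_sql` call above has no existence check, so running it a second time against the same database raises an error because the table already exists. A minimal sketch of a rerunnable write plus a read-back with `read_sql`, assuming an in-memory SQLite database through the standard `sqlite3` module rather than the SQLAlchemy engine used above; the sample rows and the `Age > 50` filter are made up for illustration:

```python
import sqlite3

import pandas as pd

# Hypothetical two-row sample standing in for scientists.csv
df = pd.DataFrame({"Name": ["Marie Curie", "Alan Turing"], "Age": [66, 41]})

# In-memory database, so nothing is left on disk
conn = sqlite3.connect(":memory:")
try:
    # if_exists='replace' drops and recreates the table, so reruns do not fail
    df.to_sql("scientists", conn, index=False, if_exists="replace")
    df.to_sql("scientists", conn, index=False, if_exists="replace")

    # Parameterized query; sqlite3 uses '?' placeholders
    restored = pd.read_sql(
        "SELECT Name, Age FROM scientists WHERE Age > ?",
        conn,
        params=(50,),
    )
finally:
    conn.close()

print(restored)
```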

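Pandas provides functions for reading data from many different file formats. Common ones include CSV, Excel, and SQL databases:

- CSV files: use the `read_csv` function.
- Excel files: use the `read_excel` function.
- SQL databases: use the `read_sql` function.

`read_csv` reads a delimited text file (comma-separated by default) and converts it to a DataFrame. A commonly used subset of its parameters:

```python
df = pd.read_csv(filepath_or_buffer, sep=',', header='infer', names=None, index_col=None, dtype=None, encoding=None)
```

The parameters mean:

- `filepath_or_buffer`: path of the CSV file to read, or a file object.
- `sep`: field separator; the default is a comma `,`.
- `header`: which row to use as column names. The default `'infer'` behaves like `0` (the first row) when `names` is not given. Set it to `None` if the file has no header row.
- `names`: custom column names, useful when the file has no header row.
- `index_col`: which column to use as the index; if omitted, an automatically generated integer index is used.
- `dtype`: data type for the columns, for example `int`, `float`, or `str`, or a dict mapping column names to types.
- `encoding`: file encoding; UTF-8 is used by default.

`read_excel` reads an Excel file and converts it to a DataFrame. A commonly used subset of its parameters:

```python
df = pd.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, dtype=None, engine=None)
```

The parameters mean:

- `io`: path, URL, or file-like object of the Excel file to read.
- `sheet_name`: name or index of the worksheet to read; the default is the first worksheet.
- `header`: which row to use as column names; the default is `0`, the first row. Set it to `None` if the sheet has no header row.
- `names`: custom column names, useful when the sheet has no header row.
- `index_col`: which column to use as the index; if omitted, an automatically generated integer index is used.
- `dtype`: data type for the columns, for example `int`, `float`, or `str`.
- `engine`: parsing engine to use, for example `openpyxl` (for .xlsx) or `xlrd` (for .xls).

`read_sql` reads data from a SQL database and converts it to a DataFrame. Its signature is:

```python
df = pd.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None)
```

The parameters mean:

- `sql`: the SQL query to execute, or a table name.
- `con`: the database connection: a SQLAlchemy engine or connection, a database URI string, or a `sqlite3` connection.
- `index_col`: which column to use as the index; if omitted, an automatically generated integer index is used.
- `coerce_float`: whether to try converting non-string, non-numeric values (such as `decimal.Decimal`) to floating point; the default is `True`.
- `params`: query parameters, given as a list, tuple, or dict.
- `parse_dates`: which columns to parse as datetime.
- `columns`: list of column names to select; only used when reading a whole table rather than running a query.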
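
To make the `read_csv` parameters above concrete, here is a small sketch that reads a headerless file. The inline data and the column names `id`, `name`, and `age` are made up for illustration; `header=None` tells pandas there is no header row, `names` supplies the labels, `index_col` picks the index, and `dtype` forces one column to float.

```python
import io

import pandas as pd

# Hypothetical headerless CSV content
raw = "1,Marie Curie,66\n2,Alan Turing,41\n"

df = pd.read_csv(
    io.StringIO(raw),
    header=None,                    # the data has no header row
    names=["id", "name", "age"],    # so column names are supplied here
    index_col="id",                 # use the id column as the index
    dtype={"age": "float64"},       # read age as float instead of int
)
print(df)
print(df.dtypes)
```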