Reading and Writing Files in Python (Incomplete)

Milanien

File types that Python can open, using the standard library or third-party packages, include: txt, xlsx, csv, zip, json, xml, html, images, hdf, pdf, docx, mp3, mp4

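1.txt files

f = open(filepath, 'r'), or with open(filepath, 'r') as f, or codecs.open(), or io.open(). In Python 3, io.open() is an alias for the built-in open().

lines = f.read() returns the whole file as one string; for line in f iterates over the file line by line; f.readline() returns a single line; f.readlines() returns a list of all lines.

Common parameters

1) filepath, the path of the file to open

2) mode, one of 'r' (read), 'w' (write, truncating the file), 'a' (append), 'wb' and 'rb' (binary write and read)

3) encoding, commonly utf-8

Writing: f.write() writes a string

f.writelines() takes a list of strings and writes them in order; it does not add line breaks, so each string must end with its own '\n'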
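The read and write calls listed above can be tried together in a small sketch. The file name demo.txt and the temporary directory are assumptions made only for this illustration; any writable path would do.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this demo.
tmp_dir = tempfile.mkdtemp()
filepath = os.path.join(tmp_dir, "demo.txt")

# 'w' creates or truncates the file; write() takes a single string.
with open(filepath, "w", encoding="utf-8") as f:
    f.write("first line\n")
    # writelines() does not append newlines, so each item carries its own.
    f.writelines(["second line\n", "third line\n"])

# 'a' appends to the end instead of truncating.
with open(filepath, "a", encoding="utf-8") as f:
    f.write("fourth line\n")

# read() returns the whole file as one string.
with open(filepath, "r", encoding="utf-8") as f:
    whole_text = f.read()

# Iterating over the file object yields one line at a time.
with open(filepath, "r", encoding="utf-8") as f:
    stripped_lines = [line.rstrip("\n") for line in f]

# readline() returns one line; readlines() returns the rest as a list.
with open(filepath, "r", encoding="utf-8") as f:
    first_line = f.readline()
    remaining_lines = f.readlines()

print(stripped_lines)
```

Using with open(...) closes the file automatically, even if an exception is raised inside the block.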

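f.seek(offset, whence)

(1) whence=0 (the default): position the file pointer offset bytes from the beginning of the file
(2) whence=1: move the file pointer offset bytes relative to its current position
(3) whence=2: position the file pointer relative to the end of the file; offset is usually negative, to move backward

In text mode, seeking is generally limited to whence=0 and seek(0, 2), so open the file in binary mode to use whence=1 or whence=2 with a nonzero offset.

f.flush() pushes buffered writes out to the file without closing it

f.tell() returns the current position of the file pointer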
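A minimal sketch of seek(), tell() and flush() follows. It opens the file in binary mode so that all three whence values work; the file name demo.bin and the ten-byte payload are assumptions for the demo.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this demo.
tmp_dir = tempfile.mkdtemp()
filepath = os.path.join(tmp_dir, "demo.bin")

with open(filepath, "wb") as f:
    f.write(b"0123456789")
    # flush() hands the buffered bytes to the OS while the file stays open.
    f.flush()
    size_after_flush = os.path.getsize(filepath)

with open(filepath, "rb") as f:
    f.seek(3, 0)               # 3 bytes from the start of the file
    from_start = f.read(2)     # b"34"
    pos_after_read = f.tell()  # 5

    f.seek(2, 1)               # 2 bytes forward from the current position
    from_current = f.read(1)   # b"7"

    f.seek(-3, 2)              # 3 bytes back from the end of the file
    from_end = f.read()        # b"789"

print(from_start, pos_after_read, from_current, from_end)
```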

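2.xlsx files

import pandas as pd
pf = pd.read_excel('train.xlsx', sheet_name='xx')

Common read_excel parameters

1) io, the file path

2) sheet_name (spelled sheetname in old pandas versions), defaults to 0, i.e. the first sheet; None returns all sheets as a dict of DataFrames

3) header, by default the first row is used as the column names

4) skiprows, the number of rows to skip at the start

5) skipfooter (formerly skip_footer), the number of rows to skip counting from the end

6) index_col, the column to use as the row index

7) names, the column names to use

Common to_excel parameters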

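3.csv files

import pandas as pd
pf = pd.read_csv('train.csv')

Common read_csv parameters

1) filepath_or_buffer, which can be a file handle, a StringIO object, a file path string, or a URL

2) sep, the delimiter, e.g. ','

3) header, the number of the row to use as the column names; header=0 uses the first row as the column names, while header=None treats every row as data and assigns integer column labels automatically

4) names, a list to use as the column names; duplicates in this list are not allowed

5) dtype, the data type of the columns

6) nrows, the number of rows to read

7) chunksize, the number of rows per chunk when reading in chunks

8) index_col, the column to use as the row index; several columns can be given to form a hierarchical index. By default no column is used and a numeric index starting from 0 is added.

9) parse_dates=True, tries to parse the index as dates; a list of column names parses those columns instead.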
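To see several of these parameters in action without needing a file on disk, here is a small sketch that reads from a StringIO buffer. The column names and values are made up for the illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path such as 'train.csv'.
csv_text = (
    "date,city,temp\n"
    "2024-01-01,Pune,21.5\n"
    "2024-01-02,Delhi,14.0\n"
    "2024-01-03,Pune,22.0\n"
)

# header=0 takes the column names from the first row, index_col picks the row
# index, and parse_dates=True parses that index as dates.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    header=0,
    index_col="date",
    parse_dates=True,
    dtype={"temp": "float64"},
)

# nrows limits how many data rows are read.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

# chunksize returns an iterator of DataFrames with at most that many rows each.
chunk_sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]

print(df)
print(chunk_sizes)
```

Reading in chunks is mainly useful when the file is too large to hold in memory at once.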

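import pandas as pd
pf.to_csv('train.csv')  # to_csv is a DataFrame method, so it is called on the DataFrame pf

Common to_csv parameters

1) path_or_buf, the output file path or file object

2) sep, the delimiter

3) columns, the subset of columns to write

4) encoding, the encoding of the output file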
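The sketch below writes a DataFrame with to_csv using sep and columns. It writes to an in-memory buffer so it leaves no file behind; the DataFrame contents are invented for the example. The encoding parameter matters when a file path is passed, so it is omitted here.

```python
import io

import pandas as pd

# Hypothetical data for the demo.
df = pd.DataFrame(
    {"name": ["a", "b"], "score": [1, 2], "note": ["x", "y"]}
)

buf = io.StringIO()
# columns selects which columns are written; index=False drops the row index.
df.to_csv(buf, sep=";", columns=["name", "score"], index=False)
csv_text = buf.getvalue()

print(csv_text)
```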

4.zip files

import zipfile
archive = zipfile.ZipFile('T.zip', 'r')
data = archive.read('train.csv')  # returns the member's raw bytes, not a DataFrame
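As a self-contained sketch of the same idea, the example below builds a small archive in memory and reads a member back. The member name train.csv matches the snippet above, while its contents are made up.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("train.csv", "id,label\n1,cat\n2,dog\n")

with zipfile.ZipFile(buffer, "r") as archive:
    names = archive.namelist()        # member names stored in the archive
    raw = archive.read("train.csv")   # raw bytes of one member
    text = raw.decode("utf-8")        # decode before treating it as text

print(names)
print(text)
```

To get a DataFrame instead of bytes, the object returned by archive.open('train.csv') can be passed to pd.read_csv.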

5.json

import pandas as pd
df = pd.read_json('train.json')

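Common parameters

1) path_or_buf, the file path or file-like object to read

2) orient, the layout of the JSON string, e.g. 'records', 'columns', 'index' or 'split'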

https://blog.csdn.net/qq_24499417/article/details/81428594
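To illustrate how orient changes the JSON layout that read_json expects, here is a small sketch with two made-up JSON strings describing the same table. The strings are wrapped in StringIO because recent pandas versions prefer a file-like object over a literal JSON string.

```python
import io

import pandas as pd

# orient='records': a list of row objects.
records_json = '[{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}]'
df_records = pd.read_json(io.StringIO(records_json), orient="records")

# orient='columns': one object per column, keyed by row label.
columns_json = '{"id": {"0": 1, "1": 2}, "label": {"0": "cat", "1": "dog"}}'
df_columns = pd.read_json(io.StringIO(columns_json), orient="columns")

print(df_records)
print(df_columns)
```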

6. xml

import xml.etree.ElementTree as ET
tree = ET.parse('/home/sunilray/Desktop/2 sigma/train.xml')
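Once the XML is parsed, the elements still have to be walked to get the data out. A minimal sketch, assuming a made-up document with row elements, parsed from a string with ET.fromstring rather than from a file:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML content; ET.parse(path).getroot() gives the same kind of
# root element when reading from a file.
xml_text = (
    "<rows>"
    "<row id='1'><label>cat</label></row>"
    "<row id='2'><label>dog</label></row>"
    "</rows>"
)

root = ET.fromstring(xml_text)

# iter() walks matching elements; get() reads an attribute and findtext()
# reads the text of a child element.
records = [
    {"id": row.get("id"), "label": row.findtext("label")}
    for row in root.iter("row")
]

print(records)
```

Attribute values come back as strings, so numeric fields need an explicit conversion such as int(row.get("id")).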

7.html

Use the BeautifulSoup library to read HTML files

import urllib.request  # Python 3; Python 2 used urllib2

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"

page = urllib.request.urlopen(wiki)

from bs4 import BeautifulSoup

# Parse the HTML in the 'page' variable and store it in Beautiful Soup format
soup = BeautifulSoup(page, 'html.parser')

https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/

8.images

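from scipy import datasets  # scipy.misc.face moved here in SciPy 1.10
import matplotlib.pyplot as plt
f = datasets.face()  # sample image as a NumPy array; fetching it requires the pooch package
plt.imsave('face.png', f)  # scipy.misc.imsave was removed in SciPy 1.2
plt.imshow(f)
plt.show()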

https://www.analyticsvidhya.com/blog/2014/12/image-processing-python-basics/

9.hdf

import pandas as pd
df = pd.read_hdf('train.h5')

10.pdf

Install the pdfminer library from its source directory:

python setup.py install

pdf2txt.py train.pdf  # test reading the PDF

11.docx

Install the docx2txt library:

pip install docx2txt

Read a docx file:

import docx2txt
text = docx2txt.process("file.docx")

12.mp3

http://pymedia.org/tut/index.html

13.mp4

http://zulko.github.io/moviepy/

from moviepy.editor import VideoFileClip
clip = VideoFileClip('<video_file>.mp4')

References

https://blog.csdn.net/hellocsz/article/details/79623142

This post is good; it has many examples and is easy to follow.

https://blog.csdn.net/sinat_35562946/article/details/81058221

This one is also good.

https://www.cnblogs.com/hackpig/p/8215786.html

Covers txt files.

https://blog.csdn.net/u010801439/article/details/80033341

https://www.jianshu.com/p/03e3cfd5519e

https://www.analyticsvidhya.com/blog/2017/03/read-commonly-used-formats-using-python/
