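File types
A file is an abstraction and collection of data
- A file is a sequence of data stored in secondary storage
- A file is one form of data storage
- Files are presented in two forms: text files and binary files
Text files and binary files
- "Text file" and "binary file" only describe how a file is presented
- In essence, every file is stored in binary form
- In form, every file is presented in one of these two ways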
# Text files
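- A file made up of characters in one specific encoding, such as UTF-8
- Because of the encoding, it can be viewed as storing one long string
- Examples: *.txt, *.py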
# Binary files
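- Made up directly of bits 0 and 1, with no uniform character encoding
- The 0s and 1s usually follow an organizing structure, i.e., a file format
- Examples: *.png files, *.avi files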
#f.txt (file contents; in English: "China is a great country!")
中国是个伟大的国家!
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
tf = open("f.txt", "rt")  # "rt": read as text, decoded with the platform's default encoding
print(tf.readline())
tf.close()
====================== RESTART: C:\Python3.7.0\test.py ======================
中国是个伟大的国家!
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
tf = open("f.txt", "rb")  # "rb": read the raw bytes, no decoding
print(tf.readline())
tf.close()
====================== RESTART: C:\Python3.7.0\test.py ======================
b'\xd6\xd0\xb9\xfa\xca\xc7\xb8\xf6\xce\xb0\xb4\xf3\xb5\xc4\xb9\xfa\xbc\xd2\xa3\xa1'
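The byte values above are what GBK (the default encoding on Chinese-language Windows) produces for this sentence, which suggests f.txt was saved in GBK; that is an inference from the bytes, not something stated in the notes. A minimal sketch, assuming GBK, that writes those same bytes to a temporary file and reads them back in both modes. Passing encoding= explicitly, instead of relying on the platform default, makes text mode behave the same on every system:

```python
import os
import tempfile

# Bytes printed by the "rb" example above (assumed to be GBK-encoded text)
raw = b'\xd6\xd0\xb9\xfa\xca\xc7\xb8\xf6\xce\xb0\xb4\xf3\xb5\xc4\xb9\xfa\xbc\xd2\xa3\xa1'

path = os.path.join(tempfile.mkdtemp(), "f.txt")
with open(path, "wb") as f:  # binary mode writes the bytes unchanged
    f.write(raw)

with open(path, "rb") as f:  # binary mode returns a bytes object
    as_bytes = f.read()

with open(path, "rt", encoding="gbk") as f:  # text mode decodes with the given encoding
    as_text = f.read()

print(type(as_bytes).__name__, len(as_bytes))  # bytes 20
print(type(as_text).__name__, len(as_text))    # str 10
```

Each of the 10 Chinese characters occupies 2 bytes in GBK, which is why the binary read sees 20 items and the text read sees 10.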
File operations
Opening and closing files
a = open(<file name>, <open mode>)
a.close()
Reading a file
a.read(size)
a.readline(size)
a.readlines(hint)
Writing a file
a.write(s)
a.writelines(lines)
a.seek(offset)
File paths
Absolute path
"D:/Python/f.txt"
"D:\\Python\\f.txt"
Relative path (resolved against the program's current working directory)
./src/f.txt
../f.txt
f.txt
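On Windows, forward slashes and escaped backslashes name the same file, and a raw string avoids doubling the backslashes. A small sketch using the standard pathlib and os modules; the paths are just the examples above and nothing is actually opened:

```python
import os
from pathlib import PureWindowsPath

# Three spellings of the same absolute Windows path
p1 = PureWindowsPath("D:/Python/f.txt")
p2 = PureWindowsPath("D:\\Python\\f.txt")
p3 = PureWindowsPath(r"D:\Python\f.txt")  # raw string: backslashes are not escapes
print(p1 == p2 == p3)  # True
print(str(p1))         # D:\Python\f.txt

# A relative path is interpreted relative to the current working directory
rel = os.path.join(".", "src", "f.txt")
print(os.path.abspath(rel) == os.path.join(os.getcwd(), "src", "f.txt"))  # True
```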
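Open modes
'r'  read-only mode, the default; raises FileNotFoundError if the file does not exist
'w'  overwrite mode; creates the file if it does not exist, otherwise completely overwrites it
'x'  exclusive-create mode; creates the file if it does not exist, raises FileExistsError if it does
'a'  append mode; creates the file if it does not exist, otherwise appends to the end of the file
'b'  binary mode
't'  text mode, the default
'+'  combined with r/w/x/a, adds both reading and writing on top of the original behavior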
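A self-contained sketch that exercises 'r', 'x', 'a', and 'w' against a file in a fresh temporary directory (the file name demo.txt is made up for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# 'r' on a missing file raises FileNotFoundError
try:
    open(path, "r")
except FileNotFoundError:
    print("r: FileNotFoundError")

# 'x' creates a new file
with open(path, "x") as f:
    f.write("first\n")

# 'x' on an existing file raises FileExistsError
try:
    open(path, "x")
except FileExistsError:
    print("x: FileExistsError")

# 'a' appends to the end of the existing content
with open(path, "a") as f:
    f.write("second\n")
with open(path) as f:  # default mode is 'rt'
    after_append = f.read()

# 'w' truncates the file and starts over
with open(path, "w") as f:
    f.write("third\n")
with open(path) as f:
    after_write = f.read()

print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_write))   # 'third\n'
```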
Reading files
<f>.read(size=-1)  reads the entire contents; if size is given, reads the first size characters (bytes in binary mode)
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
tf = open("f.txt","rt")
print(tf.read(2))
tf.close()
====================== RESTART: C:\Python3.7.0\test.py ======================
中国
<f>.readline(size=-1)  reads one line; if size is given, reads at most the first size characters of that line
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
tf = open("f.txt","rt")
print(tf.readline())
tf.close()
====================== RESTART: C:\Python3.7.0\test.py ======================
中国是个伟大的国家!
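<f>.readlines(hint=-1)  reads all lines of the file and returns them as a list, one line per element; if hint is given, stops reading once the total size of the lines read so far exceeds hint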
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
tf = open("f.txt","rt")
print(tf.readlines())
tf.close()
====================== RESTART: C:\Python3.7.0\test.py ======================
['中国是个伟大的国家!']
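A self-contained sketch of the three read methods on a small temporary file (the file name and ASCII contents are made up). It also shows that reads continue from the current position, and that hint in readlines() is a size threshold rather than an exact line count:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("abc\ndef\nghi\n")

with open(path, encoding="utf-8") as f:
    first_two = f.read(2)        # 'ab': the first 2 characters
    rest_of_line = f.readline()  # 'c\n': continues from the current position

with open(path, encoding="utf-8") as f:
    partial_line = f.readline(2)  # 'ab': at most 2 characters of the first line

with open(path, encoding="utf-8") as f:
    all_lines = f.readlines()  # ['abc\n', 'def\n', 'ghi\n']

with open(path, encoding="utf-8") as f:
    hinted = f.readlines(1)  # ['abc\n']: the first line alone already exceeds hint=1

print(first_two, repr(rest_of_line), partial_line, all_lines, hinted)
```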
Traversing the whole file, method 1
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
fname = input("please input open filename:")
fo = open(fname,"r")
txt = fo.read()  # process the whole text txt here
fo.close()
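This program reads the whole file into memory at once, which uses a lot of memory when the file is very large.
Traversing the whole file, method 2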
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
fname = input("please input open filename:")
fo = open(fname,"r")
txt = fo.read(2)
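while txt != "":
    # process the 2-character chunk txt here
    txt = fo.read(2)
fo.close()
This program reads a fixed number of characters at a time and processes them step by step.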
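The same read-a-chunk-until-empty-string loop can be written with the two-argument form of iter() and a with block that closes the file automatically. A sketch; the helper name read_in_chunks and the sample file are hypothetical:

```python
import os
import tempfile

def read_in_chunks(path, size=2, encoding="utf-8"):
    """Yield the file's text size characters at a time (hypothetical helper)."""
    with open(path, encoding=encoding) as fo:
        # iter(callable, sentinel) keeps calling fo.read(size) until it returns ""
        for chunk in iter(lambda: fo.read(size), ""):
            yield chunk

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello python")

chunks = list(read_in_chunks(path, 5))
print(chunks)  # ['hello', ' pyth', 'on']
```

Only one chunk is held in memory at a time, so this scales to files larger than RAM, just like the while loop above.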
Line-by-line traversal, method 1
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
fname = input("please input open filename:")
fo = open(fname,"r")
for line in fo.readlines():
    print(line)
fo.close()
Reads the whole file at once, then processes it line by line.
Line-by-line traversal, method 2
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
fname = input("please input open filename:")
fo = open(fname,"r")
for line in fo:
    print(line)
fo.close()
Reads one line at a time and processes each line as it is read.
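Each line produced by iterating over a file keeps its trailing "\n", so print(line) in the loops above prints an extra blank line after every line. A sketch that strips the newline and uses with so the file is closed even if processing fails; the helper name read_stripped_lines is hypothetical:

```python
import os
import tempfile

def read_stripped_lines(path, encoding="utf-8"):
    """Return the file's lines without their trailing newline (hypothetical helper)."""
    result = []
    with open(path, encoding=encoding) as fo:
        for line in fo:  # reads one line at a time, not the whole file
            result.append(line.rstrip("\n"))
    return result

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

for line in read_stripped_lines(path):
    print(line)
```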
Writing files
<f>.write(s)  writes a string (text mode) or a bytes object (binary mode) to the file
f.write("hello python")
<f>.writelines(lines)  writes a list whose elements are all strings to the file; no separators or newlines are added between them
ls = ["中国","法国","美国"]
f.writelines(ls)
中国法国美国
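<f>.seek(offset, whence=0)  moves the current file position; whence selects the reference point:
0 = start of the file, 1 = current position, 2 = end of the file
f.seek(0)  # go back to the start of the file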
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
fo = open("output.txt","w+")
ls = ["中国","法国","美国"]
fo.writelines(ls)
for line in fo:
    print(line)
fo.close()
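The data was written, yet nothing is printed: after writing, the file position sits at the end of the written data, so the for loop has nothing left to read.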
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
fo = open("output.txt","w+")
ls = ["中国","法国","美国"]
fo.writelines(ls)
fo.seek(0)
for line in fo:
    print(line)
fo.close()
====================== RESTART: C:\Python3.7.0\test.py ======================
中国法国美国
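A sketch of the same fix that also calls tell(), which reports the current position. It uses a made-up ASCII list so the position is easy to check. In text mode, Python only supports seek() to 0, to the end, or to a value previously returned by tell():

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")
with open(path, "w+", encoding="utf-8") as fo:
    fo.writelines(["China", "France", "USA"])  # no separators are added
    end_pos = fo.tell()  # position is now at the end of the written data
    nothing = fo.read()  # reading from here returns ""
    fo.seek(0)           # move back to the start of the file
    content = fo.read()

print(end_pos, repr(nothing), content)  # 14 '' ChinaFranceUSA
```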
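Automatic trajectory drawing
Each line of data: distance to move, turn direction (0 = turn left, 1 = turn right), turn angle, R, G, B
#Data interface file
#data.txt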
300,0,144,1,0,0
300,0,144,0,1,0
300,0,144,0,0,1
300,0,144,1,1,0
300,0,108,0,1,1
184,0,72,1,0,1
184,0,72,0,0,0
184,0,72,0,0,0
184,0,72,0,0,0
184,1,72,1,0,1
184,1,72,0,0,0
184,1,72,0,0,0
184,1,72,0,0,0
184,1,72,0,0,0
184,1,720,0,0,0
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import turtle as t
t.title('自动轨迹绘制')
t.setup(800,600,0,0)
t.pencolor("red")
t.pensize(5)
#Read the data
datals = []
f = open("data.txt")
for line in f:
    line = line.replace("\n", "")
    datals.append(list(map(eval, line.split(","))))
f.close()
#Draw automatically
for i in range(len(datals)):
    t.pencolor(datals[i][3], datals[i][4], datals[i][5])  # R, G, B in the 0-1 range
    t.fd(datals[i][0])
    if datals[i][1]:
        t.right(datals[i][2])
    else:
        t.left(datals[i][2])
t.done()  # keep the window open until the user closes it
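The turtle drawing needs a display, but the parsing step can be checked on its own. A sketch that converts each line of data.txt into numbers with int() instead of eval(). int() only accepts numeric text, so a malformed line raises an error instead of being executed as code. The helper name parse_track and the sample lines are hypothetical:

```python
def parse_track(lines):
    """Turn lines like '300,0,144,1,0,0' into lists of ints (hypothetical helper).

    Fields: distance, turn direction (0 = left, 1 = right), turn angle, R, G, B.
    """
    datals = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing newline at the end of the file
            continue
        datals.append([int(x) for x in line.split(",")])
    return datals

sample = ["300,0,144,1,0,0\n", "184,1,72,0,0,0\n", "\n"]
for distance, turn_right, angle, r, g, b in parse_track(sample):
    direction = "right" if turn_right else "left"
    print(f"forward {distance}, turn {direction} {angle}, color ({r}, {g}, {b})")
```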
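One-dimensional data
If the data items are ordered: use the list type
ls = [1,2,3]
- The list type can represent ordered one-dimensional data
- A for loop can traverse the data and process each item
If the data items are unordered: use the set type
st = {1,2,3}
- The set type can represent unordered one-dimensional data
- A for loop can traverse the data and process each item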
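Storing one-dimensional data
Storage method 1: space-separated
- Items are separated by one or more spaces, with no line breaks
- Drawback: the data itself cannot contain spaces
Storage method 2: comma-separated
- Items are separated by ASCII (half-width) commas, with no line breaks
- Drawback: the data itself cannot contain ASCII commas
Storage method 3: other delimiters
- Items are separated by another symbol or combination of symbols; special symbols are recommended
- Drawback: the delimiter has to be chosen to fit the data, so the approach is less general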
Processing one-dimensional data
#Read space-separated one-dimensional data
中国 美国 日本
f = open(fname)
txt = f.read()
ls = txt.split()
f.close()
#Read $-separated one-dimensional data
中国$美国$日本
f = open(fname)
txt = f.read()
ls = txt.split("$")
f.close()
#Write one-dimensional data to a file, space-separated
ls = ['中国','美国','日本']
f = open(fname,'w')
f.write(' '.join(ls))
f.close()
#Write one-dimensional data to a file, $-separated
ls = ['中国','美国','日本']
f = open(fname,'w')
f.write('$'.join(ls))
f.close()
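A round-trip sketch of the read and write snippets above, wrapped in with blocks. The helper names write_1d and read_1d and the sample items are hypothetical. The last step demonstrates the drawback of space separation listed earlier:

```python
import os
import tempfile

def write_1d(path, items, sep=" "):
    """Write a list of strings on one line, joined by sep (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(sep.join(items))

def read_1d(path, sep=None):
    """Read the line back into a list; sep=None splits on runs of whitespace."""
    with open(path, encoding="utf-8") as f:
        return f.read().split(sep)

path = os.path.join(tempfile.mkdtemp(), "data1d.txt")
ls = ["China", "USA", "Japan"]

write_1d(path, ls, "$")
dollar_result = read_1d(path, "$")
print(dollar_result)  # ['China', 'USA', 'Japan']

write_1d(path, ls)  # space-separated
space_result = read_1d(path)
print(space_result)  # ['China', 'USA', 'Japan']

# Drawback of space separation: an item that contains a space is split apart
write_1d(path, ["New York", "Tokyo"])
broken_result = read_1d(path)
print(broken_result)  # ['New', 'York', 'Tokyo']
```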
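Two-dimensional data
Use the list type
[[123,123,123],[123,123,123]]
- Use two nested for loops to visit every element
- Each element of the outer list can correspond to a row or to a column
CSV format
CSV: Comma-Separated Values
- An internationally common storage format for one- and two-dimensional data, usually with the *.csv extension
- Each line holds one row of one-dimensional data, comma-separated, with no blank lines
- Excel can read and write it, and most editors can produce it
- CSV is a general-purpose format for converting data between programs
- If an element is missing, its comma must still be kept
- The header row of two-dimensional data can be stored with the data or stored separately
- Commas are ASCII half-width commas, with no extra spaces between a comma and the data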
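Keeping the comma for a missing element is what preserves column positions when the line is split again. A tiny sketch with made-up values:

```python
row = ["Beijing", "", "2018"]  # the middle value is missing
line = ",".join(row)           # 'Beijing,,2018': the comma is kept
print(line)
print(line.split(","))         # ['Beijing', '', '2018']: the empty field keeps its position
```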
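Storing two-dimensional data
- Data can be stored by row or by column; the program decides
- Usual indexing convention: ls[row][column], row first, then column
Reading data from a CSV file
fo = open(fname)
ls = []
for line in fo:
    line = line.replace("\n", "")
    ls.append(line.split(","))
fo.close()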
Writing data to a CSV file
ls = [[], [], []]  # two-dimensional list
f = open(fname, 'w')
for item in ls:
    f.write(','.join(item) + '\n')
f.close()
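A round-trip sketch combining the CSV read and write snippets above, with a made-up table and hypothetical helper names write_csv and read_csv. It follows the plain split/join approach of these notes:

```python
import os
import tempfile

def write_csv(path, rows):
    """Write a 2D list of strings as CSV lines (plain join, as in the notes)."""
    with open(path, "w", encoding="utf-8") as f:
        for item in rows:
            f.write(",".join(item) + "\n")

def read_csv(path):
    """Read CSV lines back into a 2D list of strings."""
    ls = []
    with open(path, encoding="utf-8") as fo:
        for line in fo:
            line = line.replace("\n", "")
            ls.append(line.split(","))
    return ls

path = os.path.join(tempfile.mkdtemp(), "table.csv")
table = [["name", "score"], ["Alice", "90"], ["Bob", "85"]]
write_csv(path, table)
data = read_csv(path)
print(data[1][1])  # 90  (ls[row][column]: row 1, column 1)
```

Plain split(",") breaks on fields that themselves contain commas or quotes; the standard library's csv module handles quoting if such data can occur.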
Processing two-dimensional data element by element
Use two nested loops
ls = [[], [], []]  # two-dimensional list
for row in ls:
    for item in row:
        print(item)
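The wordcloud library
wordcloud is an excellent third-party library for displaying word clouds
- A word cloud uses words as its basic unit and presents a text in a more intuitive and artistic way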
C:\Python3.7.0>pip install wordcloud
Collecting wordcloud
Downloading https://files.pythonhosted.org/packages/23/4e/1254d26ce5d36facdcbb5820e7e434328aed68e99938c75c9d4e2fee5efb/wordcloud-1.5.0-cp37-cp37m-win_amd64.whl (153kB)
100% |████████████████████████████████| 163kB 215kB/s
Collecting pillow (from wordcloud)
Downloading https://files.pythonhosted.org/packages/55/ea/305f61258278790706e69f01c53e107b0830ea5a4a69aa1f2c11fe605ed3/Pillow-5.3.0-cp37-cp37m-win_amd64.whl (1.6MB)
100% |████████████████████████████████| 1.6MB 612kB/s
Collecting numpy>=1.6.1 (from wordcloud)
Downloading https://files.pythonhosted.org/packages/96/d6/53a59338c613e0c3ec7e3052bbf068a5457a005a5f7ad4ae005167c3597e/numpy-1.15.2-cp37-none-win_amd64.whl (13.5MB)
100% |████████████████████████████████| 13.5MB 925kB/s
Installing collected packages: pillow, numpy, wordcloud
Successfully installed numpy-1.15.2 pillow-5.3.0 wordcloud-1.5.0
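The wordcloud library treats a word cloud as a WordCloud object
- wordcloud.WordCloud() represents the word cloud for one text
- The cloud is drawn according to parameters such as how often each word appears in the text
- The shape, size, and colors of the word cloud can all be configured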
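Word size in the cloud is driven by word frequency. A minimal sketch of that counting step using only the standard library; it illustrates the idea and is not wordcloud's internal code. The helper name top_words is hypothetical, and its stopwords and max_words arguments only mimic the WordCloud parameters of the same purpose:

```python
from collections import Counter

def top_words(text, stopwords=(), max_words=200):
    """Count whitespace-separated words, drop stopwords, keep the most frequent.

    Hypothetical helper: it illustrates the idea, not wordcloud's implementation.
    """
    words = [w for w in text.split() if w not in stopwords]
    return Counter(words).most_common(max_words)

print(top_words("python and python and wordcloud", stopwords={"and"}, max_words=2))
# [('python', 2), ('wordcloud', 1)]
```

WordCloud also provides a generate_from_frequencies() method that accepts a dict mapping words to frequencies, so counts computed this way could be passed to it as dict(...).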
w = wordcloud.WordCloud()
- Everything is based on the WordCloud object
- Configure parameters, load text, write the output file
w.generate(txt)  loads the text txt into the WordCloud object w
w.generate("Python and WordCloud")
w.to_file(filename)  writes the word cloud to an image file in *.png or *.jpg format
w.to_file("outfile.png")
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import wordcloud
c = wordcloud.WordCloud()
c.generate("wordcloud by Python")
c.to_file("pywordcloud.png")
====================== RESTART: C:\Python3.7.0\test.py ======================
Traceback (most recent call last):
File "C:\Python3.7.0\test.py", line 6, in <module>
c = wordcloud.WordCloud()
File "D:\Python\Python37\lib\site-packages\wordcloud\wordcloud.py", line 300, in __init__
import matplotlib
ModuleNotFoundError: No module named 'matplotlib'
C:\Python3.7.0>pip install matplotlib
Collecting matplotlib
Downloading https://files.pythonhosted.org/packages/7e/ce/a4b83c538b48841d4d060569cc0e6afeab5caa95993613f15294ce1b1380/matplotlib-3.0.0-cp37-cp37m-win_amd64.whl (8.9MB)
100% |████████████████████████████████| 8.9MB 2.0MB/s
Collecting kiwisolver>=1.0.1 (from matplotlib)
Downloading https://files.pythonhosted.org/packages/7c/be/7ae355b45699460e369ebf88d86058fca26827933974cc3f6b6b7800a324/kiwisolver-1.0.1-cp37-none-win_amd64.whl (57kB)
100% |████████████████████████████████| 61kB 1.6MB/s
Requirement already satisfied: numpy>=1.10.0 in d:\python\python37\lib\site-packages (from matplotlib) (1.15.2)
Collecting cycler>=0.10 (from matplotlib)
Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from matplotlib)
Downloading https://files.pythonhosted.org/packages/2b/4a/f06b45ab9690d4c37641ec776f7ad691974f4cf6943a73267475b05cbfca/pyparsing-2.2.2-py2.py3-none-any.whl (57kB)
100% |████████████████████████████████| 61kB 2.2MB/s
Collecting python-dateutil>=2.1 (from matplotlib)
Downloading https://files.pythonhosted.org/packages/cf/f5/af2b09c957ace60dcfac112b669c45c8c97e32f94aa8b56da4c6d1682825/python_dateutil-2.7.3-py2.py3-none-any.whl (211kB)
100% |████████████████████████████████| 215kB 5.4MB/s
Requirement already satisfied: setuptools in d:\python\python37\lib\site-packages (from kiwisolver>=1.0.1->matplotlib) (39.0.1)
Collecting six (from cycler>=0.10->matplotlib)
Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Installing collected packages: kiwisolver, six, cycler, pyparsing, python-dateutil, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.0.1 matplotlib-3.0.0 pyparsing-2.2.2 python-dateutil-2.7.3 six-1.11.0
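Configuring object parameters
w = wordcloud.WordCloud(<parameters>)
width  width of the generated image, 400 pixels by default
w = wordcloud.WordCloud(width = 600)
height  height of the generated image, 200 pixels by default
w = wordcloud.WordCloud(height = 400)
min_font_size  smallest font size used in the word cloud, 4 by default
w = wordcloud.WordCloud(min_font_size = 10)
max_font_size  largest font size used in the word cloud, adjusted automatically from the image height by default
w = wordcloud.WordCloud(max_font_size = 10)
font_step  step between font sizes in the word cloud, 1 by default
w = wordcloud.WordCloud(font_step = 2)
font_path  path to the font file, None by default
w = wordcloud.WordCloud(font_path = "msyh.ttc")
max_words  maximum number of words shown in the word cloud, 200 by default
w = wordcloud.WordCloud(max_words = 20)
stopwords  set of words to exclude, i.e., words that are not shown
w = wordcloud.WordCloud(stopwords = {"Python"})
mask  shape of the word cloud, a rectangle by default; the mask image is loaded with an imread() function (scipy.misc.imread is deprecated since SciPy 1.0.0 and removed in 1.2.0; imageio.imread is the suggested replacement)
from scipy.misc import imread
mk = imread("*.png")
w = wordcloud.WordCloud(mask = mk)
background_color  background color of the word cloud image, black by default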
w = wordcloud.WordCloud(background_color = "white")
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import wordcloud
txt = "life is short,you need python"
w = wordcloud.WordCloud(background_color = "white")
w.generate(txt)
w.to_file("pywordcloud.png")
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import wordcloud
import jieba
txt = "在中国,民办高等教育近年来蓬勃发展,\
但目前主要以职业技术教育为主,还未曾在前沿科\
学研究和高技术领域的高层次人才培养方面进行尝试。"
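# msyh.ttc (Microsoft YaHei) is a font that can display Chinese characters
w = wordcloud.WordCloud(width = 1000,\
font_path = "msyh.ttc",\
height = 700)
# jieba.lcut() segments the Chinese text into a list of words; joining them with spaces gives WordCloud separate words
w.generate(" ".join(jieba.lcut(txt)))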
w.to_file("pywordcloud.png")
Word clouds of government work reports
python123.io/resources/pye/新时代中国特色社会主义.txt
python123.io/resources/pye/关于实施乡村振兴战略的意见.txt
#新时代中国特色社会主义 (Socialism with Chinese Characteristics for a New Era)
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import wordcloud
import jieba
f = open("新时代中国特色社会主义.txt",encoding = "utf-8")
t = f.read()
f.close()
ls = jieba.lcut(t)
txt = " ".join(ls)
w = wordcloud.WordCloud(font_path = "msyh.ttc",\
width = 1000,height = 700,\
background_color = "white")
w.generate(txt)
w.to_file("grwordcloud.png")
#关于实施乡村振兴战略的意见 (Opinions on Implementing the Rural Revitalization Strategy)
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import wordcloud
import jieba
f = open("关于实施乡村振兴战略的意见.txt",encoding = "utf-8")
t = f.read()
f.close()
ls = jieba.lcut(t)
txt = " ".join(ls)
w = wordcloud.WordCloud(font_path = "msyh.ttc",\
width = 1000,height = 700,\
background_color = "white")
w.generate(txt)
w.to_file("grwordcloud.png")
Limiting the number of words
Add max_words = 15 to the WordCloud parameters.
Custom shapes
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import wordcloud
import jieba
from scipy.misc import imread
mask = imread("grwordcloud1.png")
f = open("关于实施乡村振兴战略的意见.txt",encoding = "utf-8")
t = f.read()
f.close()
ls = jieba.lcut(t)
txt = " ".join(ls)
w = wordcloud.WordCloud(font_path = "msyh.ttc",\
width = 1000,height = 700,\
background_color = "white",\
mask = mask)
w.generate(txt)
w.to_file("grwordcloud.png")
====================== RESTART: C:\Python3.7.0\test.py ======================
Traceback (most recent call last):
File "C:\Python3.7.0\test.py", line 6, in <module>
from scipy.misc import imread
ModuleNotFoundError: No module named 'scipy'
C:\Python3.7.0>pip install numpy
Requirement already satisfied: numpy in d:\python\python37\lib\site-packages (1.15.2)
C:\Python3.7.0>pip install scipy
Collecting scipy
Downloading https://files.pythonhosted.org/packages/c4/f3/752fd6778a9d07fddb2b02dac5895287e594d2d0d156a2a422c710f6a851/scipy-1.1.0-cp37-none-win_amd64.whl (30.9MB)
100% |████████████████████████████████| 30.9MB 670kB/s
Requirement already satisfied: numpy>=1.8.2 in d:\python\python37\lib\site-packages (from scipy) (1.15.2)
Installing collected packages: scipy
Successfully installed scipy-1.1.0
Reference: https://blog.csdn.net/cylj102908/article/details/62229610
====================== RESTART: C:\Python3.7.0\test.py ======================
Warning (from warnings module):
File "C:\Python3.7.0\test.py", line 8
mask = imread("grwordcloud1.png")
DeprecationWarning: `imread` is deprecated!
`imread` is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
Use ``imageio.imread`` instead.
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ADMINI~1.000\AppData\Local\Temp\jieba.cache
Loading model cost 0.830 seconds.
Prefix dict has been built succesfully.
Reference: https://blog.csdn.net/zlrai5895/article/details/79517150