Reading files with open
.read()
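The built-in open function reads files without importing any package. The 'r' mode stands for read. If the file is in the same folder as the notebook, the file name alone is enough; the full path is not needed. file.read() returns the entire contents of the file.
file = open('data.txt', 'r')
print(file.read())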
,A,B,C,D
0,foo,one,small,1
1,foo,one,large,2
2,foo,one,large,8
3,foo,two,small,3
4,foo,two,small,3
5,bar,one,large,4
6,bar,one,small,5
7,bar,two,small,6
8,bar,two,large,7
file = open('data.txt', 'r')
print(file.read(5))  # read only the first 5 characters
,A,B,
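The cells above never close the file after reading. A minimal sketch of the same two reads inside a with block, which closes the file automatically. The file name demo_data.txt and its contents are assumptions, created here only so the example runs on its own.

```python
# Create a small sample file (hypothetical name and contents, for illustration only).
sample = ",A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n"
with open('demo_data.txt', 'w') as f:
    f.write(sample)

# The with block closes the file automatically, even if an error occurs inside it.
with open('demo_data.txt', 'r') as f:
    first_five = f.read(5)   # the first 5 characters
    rest = f.read()          # continues from where the previous read stopped

print(first_five)
print(f.closed)   # True: the with block has closed the file
```

Note that the second read() continues after the first five characters, because the file object remembers its position.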
.readlines()
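Replacing .read with .readlines gives a different result: the whole file is returned as a list in which each item is one line of the original file, and \n marks the line break. Everything is read, including the line endings. Indexing the list retrieves a specific line, which .read cannot do directly.
file = open('data.txt', 'r')
print(file.readlines())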
[',A,B,C,D\n', '0,foo,one,small,1\n', '1,foo,one,large,2\n', '2,foo,one,large,8\n', '3,foo,two,small,3\n', '4,foo,two,small,3\n', '5,bar,one,large,4\n', '6,bar,one,small,5\n', '7,bar,two,small,6\n', '8,bar,two,large,7\n']
file = open('data.txt', 'r')
print(file.readlines()[5])  # index 5 is the sixth line of the file
4,foo,two,small,3
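Each item returned by readlines still ends with \n, which is why printing a single item leaves an extra blank line. A short sketch of removing it with rstrip; the file demo_lines.txt is a made-up sample, not one of the notebook's files.

```python
# Hypothetical sample file so the example runs on its own.
with open('demo_lines.txt', 'w') as f:
    f.write(",A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n")

with open('demo_lines.txt', 'r') as f:
    lines = f.readlines()   # each item still ends with '\n'

# Drop the trailing line break from every line.
clean = [line.rstrip('\n') for line in lines]

print(lines[1])   # the second line, followed by an extra blank line
print(clean[1])   # the second line only
```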
Processing the contents while reading (using a for loop)
file = open('data.txt', 'r')
i = 1
for line in file:
    print('read line', i)
    i = i + 1
    print(line)
read line 1
,A,B,C,D
read line 2
0,foo,one,small,1
read line 3
1,foo,one,large,2
read line 4
2,foo,one,large,8
read line 5
3,foo,two,small,3
read line 6
4,foo,two,small,3
read line 7
5,bar,one,large,4
read line 8
6,bar,one,small,5
read line 9
7,bar,two,small,6
read line 10
8,bar,two,large,7
file = open('data.csv', 'r')
i = 1
for line in file:
    print('read line', i)
    i = i + 1
    print(line)
read line 1
,A,B,C,D
read line 2
0,foo,one,small,1
read line 3
1,foo,one,large,2
read line 4
2,foo,one,large,8
read line 5
3,foo,two,small,3
read line 6
4,foo,two,small,3
read line 7
5,bar,one,large,4
read line 8
6,bar,one,small,5
read line 9
7,bar,two,small,6
read line 10
8,bar,two,large,7
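The loop above only prints each line. As one possible way to actually work with the contents, the sketch below splits every line at the commas and lets enumerate do the counting. The file demo_loop.txt and its rows are assumptions made for illustration.

```python
# Hypothetical sample file so the example runs on its own.
with open('demo_loop.txt', 'w') as f:
    f.write(",A,B,C,D\n0,foo,one,small,1\n1,bar,one,large,2\n")

rows = []
with open('demo_loop.txt', 'r') as f:
    # enumerate replaces the manual counter i = i + 1
    for i, line in enumerate(f, start=1):
        fields = line.rstrip('\n').split(',')   # split the line at each comma
        print('read line', i, fields)
        rows.append(fields)
```

After the loop, rows holds one list of strings per line, so rows[0] is the header and rows[1] is the first data row.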
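Writing to a file
The 'w' mode stands for write. Always call file.close() to signal that writing is finished; until the file is closed, the text may not be fully written to disk. \n inserts a line break.
file = open('hello.txt', 'w')
file.write('Hello World!')
file.close()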
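The cell above writes a single string without a line break. A small sketch of writing several lines with \n and reading them back; the file name demo_hello.txt and the second line of text are made up for this example.

```python
lines = ['Hello World!', 'second line']

# 'w' creates the file, or overwrites it if it already exists.
with open('demo_hello.txt', 'w') as f:
    for text in lines:
        f.write(text + '\n')   # write() does not add a line break by itself

# Read the file back to check what was written.
with open('demo_hello.txt', 'r') as f:
    content = f.read()

print(content)
```

Using with here means close() is called automatically when the block ends, so it cannot be forgotten.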
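Reading files with pandas
For txt files, open is the most convenient option; for large amounts of data, pandas is the usual choice. pandas restores the tabular layout of the data stored in a csv file. The result has an extra column, Unnamed: 0, because pandas adds its own index while the table already contains one. To get rid of it, pass index_col=0.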
import pandas as pd
df = pd.read_csv('data.csv')
df
   Unnamed: 0    A    B      C  D
0           0  foo  one  small  1
1           1  foo  one  large  2
2           2  foo  one  large  8
3           3  foo  two  small  3
4           4  foo  two  small  3
5           5  bar  one  large  4
6           6  bar  one  small  5
7           7  bar  two  small  6
8           8  bar  two  large  7
import pandas as pd
df = pd.read_csv('data.csv', index_col=0)
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
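To see the effect of index_col=0 without needing data.csv on disk, the sketch below reads the same kind of text from an in-memory buffer. The shortened csv_text is an assumption modeled on the data shown above.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file whose first column is an unnamed index
# (assumed contents, shortened from the data shown above).
csv_text = ",A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n2,bar,two,small,6\n"

df_default = pd.read_csv(io.StringIO(csv_text))
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))   # includes 'Unnamed: 0'
print(list(df_indexed.columns))   # only the data columns
```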
import pandas as pd
df = pd.read_excel('data.xlsx')
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
import pandas as pd
# read_table splits on tabs by default, so each line is read as a single column
df = pd.read_table('data.txt')
df
            ,A,B,C,D
0  0,foo,one,small,1
1  1,foo,one,large,2
2  2,foo,one,large,8
3  3,foo,two,small,3
4  4,foo,two,small,3
5  5,bar,one,large,4
6  6,bar,one,small,5
7  7,bar,two,small,6
8  8,bar,two,large,7
import pandas as pd
# without sep=',' the single column becomes the index, leaving no data columns
df = pd.read_table('data.txt', index_col=0)
df
,A,B,C,D
0,foo,one,small,1
1,foo,one,large,2
2,foo,one,large,8
3,foo,two,small,3
4,foo,two,small,3
5,bar,one,large,4
6,bar,one,small,5
7,bar,two,small,6
8,bar,two,large,7
import pandas as pd
# sep=',' tells read_table to split the columns at commas
df = pd.read_table('data.txt', sep=',', index_col=0)
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
# save the DataFrame to Excel, csv, and txt files
df.to_excel('dat.xlsx')
df.to_csv('dat.csv')
df.to_csv('dat.txt')
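By default to_csv also writes the index as the first column, which is where an Unnamed: 0 column comes from when the file is read back without index_col=0. A minimal sketch of the difference, using a made-up two-row DataFrame and in-memory buffers instead of real files:

```python
import io
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'], 'D': [1, 4]})   # made-up sample data

with_index = io.StringIO()
df.to_csv(with_index)                   # index=True is the default

without_index = io.StringIO()
df.to_csv(without_index, index=False)   # leave the index out of the file

print(with_index.getvalue())
print(without_index.getvalue())
```

Passing index=False when saving is an alternative to passing index_col=0 when loading.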
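Reading a txt file with a complex format
skiprows specifies which rows to skip (given here as a list of row numbers, counted from 0). header=None means the column names are not stored in the data, so pandas numbers the columns from 0. names supplies your own column names. nrows=5 reads only the first five data rows.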
df = pd.read_table('data1.txt', sep=',')
df
                               # real data
#num1        num2  num3  num4      message
# good data  NaN   NaN   NaN           NaN
# csv file   NaN   NaN   NaN           NaN
1            2     3     4           hello
5            6     7     8           world
9            10    11    12          hello
1            2     3     4           hello
5            6     7     8            good
9            10    11    12           fine
df = pd.read_table('data1.txt', sep=',', skiprows=[0, 1, 2, 3], header=None,
                   names=['n1', 'n2', 'n3', 'n4', 'message'],
                   index_col=['message'], nrows=5)
df
         n1  n2  n3  n4
message
hello     1   2   3   4
world     5   6   7   8
hello     9  10  11  12
hello     1   2   3   4
good      5   6   7   8
df = pd.read_table('data1.txt', sep=',', skiprows=[0, 2, 3],
                   index_col=['message'], nrows=3)
df
         #num1  num2  num3  num4
message
hello        1     2     3     4
world        5     6     7     8
hello        9    10    11    12
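The sketch below shows how skiprows, header, names, and nrows combine, without needing data1.txt on disk. The text is only an assumed reconstruction of that file's layout, and pd.read_csv is used in place of pd.read_table with sep=',' because both split on commas in this case.

```python
import io
import pandas as pd

# Assumed layout of a file like data1.txt: comment lines mixed with a header row.
text = (
    "# real data\n"
    "#num1,num2,num3,num4,message\n"
    "# good data\n"
    "# csv file\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,hello\n"
)

df = pd.read_csv(
    io.StringIO(text),
    skiprows=[0, 1, 2, 3],   # skip the four lines that are not data
    header=None,             # the remaining lines contain no header row
    names=['n1', 'n2', 'n3', 'n4', 'message'],   # supply the column names
    nrows=2,                 # read only the first two data rows
)
print(df)
```

Only two rows are loaded even though three data rows follow the skipped lines, which shows that nrows limits how many rows are read.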
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html
This page documents pandas.read_table. When you come across a function you do not know how to use, look it up in its documentation.