Reading CSV in Python: reading and writing CSV files with Python

Latest recommended article published 2024-02-25 16:22:09

weixin_37988176


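1. When doing data analysis in Python, the data usually arrives in several formats, and I prefer to store and analyse it as CSV. If the data is already CSV, I can read it directly with the methods built into pandas. Sometimes, however, data from various sources is awkward to process directly with pandas, so I wrote general-purpose functions for reading and writing CSV files.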

# 1. Import the required packages
import os
import re
import csv

# 1. Read a CSV file
def read_csv(filename, header=False):
    res = []
    with open(filename, newline="") as f:
        f_csv = csv.reader(f)
        if header:  # by default the header row is read as data; skip it when header=True
            next(f_csv)
        for row in f_csv:
            res.append(row)
    return res

# 2. Write a CSV file
def write_csv(data, filename):
    with open(filename, "w", newline="") as f:
        f_csv = csv.writer(f)
        # write one row at a time
        for item in data:
            f_csv.writerow(item)
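A quick round trip is an easy way to check that the two helpers behave as described. The sketch below restates them compactly so that it runs on its own in Python 3; the sample rows and the temporary file name `demo.csv` are made up for illustration.

```python
import csv
import os
import tempfile

def read_csv(filename, header=False):
    """Return all rows as lists of strings; skip the first row when header=True."""
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        if header:
            next(reader)
        return list(reader)

def write_csv(data, filename):
    """Write an iterable of rows to filename, one row per line."""
    with open(filename, "w", newline="") as f:
        csv.writer(f).writerows(data)

# Hypothetical sample data: a header row followed by two records.
rows = [["id", "name"], ["1", "qwe"], ["2", "aad"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    write_csv(rows, path)
    all_rows = read_csv(path)                # header row kept as data
    body_rows = read_csv(path, header=True)  # header row skipped

print(all_rows)
print(body_rows)
```

Note that the csv module always hands values back as strings, so numeric columns need an explicit conversion after reading.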

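2. Sometimes the file is a txt file, or an export from Hive or another database. In that case the following function can read the data for analysis.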

# 3. Read a delimited text file
def read_text(filename, columns, delimiter):
    # columns: expected number of columns
    # delimiter: field separator
    res = []
    with open(filename, "r") as f:
        for line in f:
            line = re.sub(r"[\r\n]", "", line)  # remove line-break characters
            fields = line.split(delimiter)
            if len(fields) != columns:
                continue  # skip rows with an unexpected number of columns
            res.append(fields)
    return res
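To see the column-count filter at work, here is a small self-contained sketch. It restates the function for Python 3 and feeds it a made-up tab-separated export in which one line is malformed; the file name `export.txt` and the sample values are assumptions for illustration only.

```python
import os
import re
import tempfile

def read_text(filename, columns, delimiter):
    """Return only the rows that split into exactly `columns` fields."""
    res = []
    with open(filename, "r") as f:
        for line in f:
            line = re.sub(r"[\r\n]", "", line)  # drop line-break characters
            fields = line.split(delimiter)
            if len(fields) == columns:
                res.append(fields)
    return res

# Hypothetical tab-separated export; the second line has only 2 fields.
sample = "40920\t3\tqwe\nbroken\tline\n14488\t2\taad\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.txt")
    with open(path, "w") as f:
        f.write(sample)
    parsed = read_text(path, columns=3, delimiter="\t")

print(parsed)
```

Rows that do not match the expected width are dropped silently, so it can be worth counting them when the source is unreliable.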

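3. NumPy can also read such files directly: loadtxt(fname, dtype=float, delimiter=None, skiprows=0, usecols=None, unpack=False). fname: the file name; dtype: the data type, which may also be str; delimiter: the field separator; skiprows: the number of leading lines to skip; usecols: the column or columns to read, for example (0, 3) reads the first and fourth columns.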

Example: npload.txt

40920 8.326976 0.953952 3 qwe
14488 7.153469 1.673904 2 aad
26052 1.441871 0.805124 1 zc
75136 13.147394 0.428964 1 wed

import numpy as np

filename = "E:/PythonProject/CommonFunction/input/npload.txt"
res = np.loadtxt(filename, dtype=str, delimiter=" ", skiprows=1, usecols=(0, 3, 4), unpack=False)
print(type(res))
print(res)

x, y, z = np.loadtxt(filename, dtype=str, delimiter=" ", skiprows=1, usecols=(0, 3, 4), unpack=True)
print(x)  # first column
print(y)  # fourth column
print(z)  # fifth column

Result:

[['14488' '2' 'aad']
 ['26052' '1' 'zc']
 ['75136' '1' 'wed']]
['14488' '26052' '75136']
['2' '1' '1']
['aad' 'zc' 'wed']
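The same calls can be tried without a file on disk, because np.loadtxt also accepts a file-like object. The sketch below holds the four rows of the npload.txt example in an in-memory string (an assumption made so the snippet is reproducible) and contrasts a numeric read with the string read used in this section.

```python
import io
import numpy as np

# The four rows of the npload.txt example, kept in memory instead of on disk.
text = (
    "40920 8.326976 0.953952 3 qwe\n"
    "14488 7.153469 1.673904 2 aad\n"
    "26052 1.441871 0.805124 1 zc\n"
    "75136 13.147394 0.428964 1 wed\n"
)

# Numeric columns only: no rows skipped, columns 1-3 parsed as floats.
nums = np.loadtxt(io.StringIO(text), delimiter=" ", usecols=(1, 2, 3))
print(nums.shape)

# String read as in this section: first row skipped, columns 0, 3 and 4 kept.
labels = np.loadtxt(io.StringIO(text), dtype=str, delimiter=" ",
                    skiprows=1, usecols=(0, 3, 4))
print(labels)
```

Because skiprows=1 drops the first data row, the string read returns three rows while the numeric read returns four.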

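4. pandas is also a powerful data-analysis tool. It reads CSV and Excel files directly and can save a pandas DataFrame straight to CSV or Excel. For example, the data above can be stored as CSV with the write_csv() function and then read directly with pandas.

pd.read_csv(filename, header=None, index_col=0, usecols=(1,2,3), skiprows=0): the parameters mean essentially the same as those of np.loadtxt(). read_excel(io, sheet_name=0, header=0, index_col=None, usecols=None, dtype=None, skiprows=None): the commonly used parameters have the same meaning. The corresponding methods for saving are to_csv() and to_excel().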

import pandas as pd

filename = "E:/PythonProject/CommonFunction/input/npload.csv"
df = pd.read_csv(filename, header=None, index_col=0, usecols=(1, 2, 3), skiprows=0)
print(df.head())

Result:

                  2  3
8.326976   0.953952  3
7.153469   1.673904  2
1.441871   0.805124  1
13.147394  0.428964  1
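For the saving direction, a minimal sketch of a to_csv()/read_csv() round trip follows. It writes to an in-memory buffer rather than a file, and the column names `id`, `score` and `tag` are invented for illustration; they do not appear in the original data.

```python
import io
import pandas as pd

# Hypothetical frame built from two of the sample rows, with made-up column names.
df = pd.DataFrame({
    "id": [40920, 14488],
    "score": [8.326976, 7.153469],
    "tag": ["qwe", "aad"],
})

buf = io.StringIO()
df.to_csv(buf, index=False)  # index=False: do not write the row index as a column
buf.seek(0)

# Read back only two of the columns and use "id" as the index.
df2 = pd.read_csv(buf, usecols=["id", "tag"], index_col="id")
print(df2)
```

Selecting columns by name, as here, avoids having to reason about column positions when usecols and index_col are combined.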

5. Handy tricks: the pandas pivot method and NumPy's permutation

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
print(df)

# foo becomes the index, the values of bar become the columns, and baz fills the cells.
# Note: (foo, bar) pairs must not repeat.
print(df.pivot(index='foo', columns='bar', values='baz'))

Result:

  bar  baz  foo zoo
0   A    1  one   x
1   B    2  one   y
2   C    3  one   z
3   A    4  two   q
4   B    5  two   w
5   C    6  two   t

bar  A  B  C
foo
one  1  2  3
two  4  5  6
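The uniqueness requirement on (foo, bar) is easy to trip over with real data. The sketch below uses a made-up frame with one repeated pair to show pivot failing, and then pivot_table, which is one way to handle duplicates by aggregating them. The choice of sum as the aggregation is an assumption; the right one depends on the data.

```python
import pandas as pd

# Hypothetical data with a repeated (foo, bar) pair: ('one', 'A') appears twice.
dup = pd.DataFrame({
    "foo": ["one", "one", "two"],
    "bar": ["A", "A", "B"],
    "baz": [1, 2, 3],
})

try:
    dup.pivot(index="foo", columns="bar", values="baz")
    pivot_failed = False
except ValueError:
    pivot_failed = True  # pivot cannot reshape when index/column pairs repeat
print(pivot_failed)

# pivot_table resolves the duplicates by aggregating them, here with a sum.
table = dup.pivot_table(index="foo", columns="bar", values="baz", aggfunc="sum")
print(table)
```

Combinations that never occur in the data, such as ('one', 'B') here, come back as NaN in the pivoted table.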

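np.random.seed(0)
print(np.random.permutation(10))  # randomly permute 0-9; useful for drawing a random subset of a dataset

x = list(range(10))
np.random.shuffle(x)  # shuffle in place; the argument must be a mutable sequence
print(x)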

[2 8 4 9 1 6 7 3 0 5]

[3, 5, 1, 2, 9, 8, 0, 6, 7, 4]
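As a sketch of how a permutation can be used to pick a random subset, the snippet below shuffles row indices and cuts them into a test part and a training part. The dataset, the seed and the 20% hold-out ratio are all assumptions chosen for illustration.

```python
import numpy as np

seed = 0           # assumed seed, only to make the run repeatable
n_samples = 10
test_ratio = 0.2   # assumed hold-out fraction

data = np.arange(100, 100 + n_samples)  # hypothetical dataset of 10 values

np.random.seed(seed)
order = np.random.permutation(n_samples)  # shuffled row indices 0..9
n_test = int(n_samples * test_ratio)

# The first n_test shuffled indices form the test set, the rest the training set.
test_idx, train_idx = order[:n_test], order[n_test:]
test_set, train_set = data[test_idx], data[train_idx]

print(len(train_set), len(test_set))
```

Indexing with a permutation leaves the original array untouched, whereas shuffle reorders its argument in place.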

Note: you can of course also split a dataset with sklearn's train_test_split function.

from sklearn.model_selection import train_test_split

train_set,test_set = train_test_split(SampleData, test_size=0.2, random_state=42)

If the dataset has labels, the features and labels can be split together:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
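A small runnable sketch can make the labelled split concrete. The arrays X and y below are invented so that each label can be recovered from its feature row, which makes it easy to confirm that rows and labels stay paired after the split; test_size=0.2 is likewise just an assumed value.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # hypothetical feature matrix, row i is [2*i, 2*i + 1]
y = np.arange(10)                 # hypothetical labels; y[i] belongs to X[i]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)

# Rows and labels stay aligned: the first feature of each row equals 2 * its label.
aligned = bool((X_train[:, 0] == 2 * y_train).all() and (X_test[:, 0] == 2 * y_test).all())
print(aligned)
```

Fixing random_state makes the split repeatable between runs, which helps when comparing models on the same hold-out set.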
