Data Analysis 4: Processing CSV Files

强仔fight

Category: Data Analysis

Article link: https://blog.csdn.net/qq_35076836/article/details/88125454

1. Reading and writing CSV files

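f1 = open(r'D:\hhh.csv', 'r')
f2 = open(r'D:\111.csv', 'w')
header = f1.readline()            # read the first line (the header row) as a string
header = header.strip()           # strip spaces, tabs, and the newline from both ends
header_list = header.split(',')   # split the string on commas into a list
print(header_list)                # the header is printed here but not written to the output file
for row in f1:
    row = row.strip()
    row_list = row.split(',')
    print(row_list)
    # map applies str to every element of row_list so that each one is a string;
    # join then inserts a comma between the values, turning the list back into one string
    f2.write(','.join(map(str, row_list)) + '\n')
f1.close()
f2.close()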
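Splitting on commas only works while no field contains a comma itself. The sketch below uses a made-up two-column sample (the data is an assumption, not taken from the files above) to contrast the manual split with `csv.reader`, which is what the following sections use.

```python
import csv
import io

# Hypothetical sample: the cost field is quoted because it contains a comma.
sample = 'Supplier Name,Cost\nSupplier X,"$1,000.00"\n'

# Manual approach: splitting on commas also splits inside the quoted field.
naive_rows = [line.strip().split(',') for line in sample.splitlines()]

# csv.reader understands quoting, so the cost stays in a single field.
parsed_rows = list(csv.reader(io.StringIO(sample)))

print(naive_rows[1])    # ['Supplier X', '"$1', '000.00"']
print(parsed_rows[1])   # ['Supplier X', '$1,000.00']
```

For files whose values never contain commas, both approaches give the same rows.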

2. Filtering rows
① Rows whose values meet a condition
Plain Python version:

import csv
f1 = open('url1', 'r', newline='')
f2 = open('url2', 'w', newline='')   # newline='' avoids blank lines between rows on Windows
file_input = csv.reader(f1)
file_output = csv.writer(f2)
header = next(file_input)         # the built-in next() reads the first row of the input file into the list header
file_output.writerow(header)      # write the header row to the output file
for row_list in file_input:
    supplier = str(row_list[0]).strip()                    # supplier name of this row
    cost = str(row_list[3]).strip('$').replace(',', '')    # cost of this row, without '$' and thousands separators
    if supplier == 'Supplier Z' or float(cost) > 600.0:    # test the two values against the conditions
        file_output.writerow(row_list)                     # write rows that meet a condition to the output file
f1.close()
f2.close()
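The same filter can be tried without touching any files. The following is a minimal sketch that assumes the column layout implied above (supplier name in column 0, cost in column 3); the sample rows, the "Part Number" column, and the helper names `parse_cost` and `keep_row` are made up for illustration.

```python
import csv
import io

def parse_cost(text):
    """Convert a string such as '$1,250.00' to a float."""
    return float(text.strip().strip('$').replace(',', ''))

def keep_row(row_list):
    """Return True if the supplier is 'Supplier Z' or the cost exceeds 600."""
    supplier = row_list[0].strip()
    return supplier == 'Supplier Z' or parse_cost(row_list[3]) > 600.0

# Hypothetical sample data held in memory instead of a file.
sample = (
    'Supplier Name,Invoice Number,Part Number,Cost,Purchase Date\n'
    'Supplier X,001-1001,2341,$500.00,1/20/14\n'
    'Supplier Y,50-9501,7009,"$1,250.00",1/30/14\n'
    'Supplier Z,920-4803,3321,$75.00,2/3/14\n'
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)                          # skip the header row
kept = [row for row in reader if keep_row(row)]

for row in kept:
    print(row[0], row[3])
```

Supplier Y is kept because of its cost and Supplier Z because of its name, while Supplier X meets neither condition.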

pandas version:

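import pandas as pd
# f1 and f2 are the input and output files (paths or open file objects)
data_frame = pd.read_csv(f1)
# remove '$' and thousands separators before converting to float
data_frame['Cost'] = data_frame['Cost'].str.strip('$').str.replace(',', '').astype(float)
# loc selects rows and columns together: the row condition goes before the comma, the column selection after it
# note: str.contains('Z') matches any supplier name containing 'Z', which is looser than == 'Supplier Z'
data_frame_meet_condition = data_frame.loc[(data_frame['Supplier Name'].str.contains('Z')) | (data_frame['Cost'] > 600.0), :]
data_frame_meet_condition.to_csv(f2, index=False)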
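To make the two halves of `loc` concrete, here is a small sketch on an in-memory table. The sample rows are invented; only the column names `Supplier Name`, `Invoice Number`, and `Cost` come from the code above.

```python
import io
import pandas as pd

# Hypothetical sample data read from a string instead of a file.
sample = io.StringIO(
    'Supplier Name,Invoice Number,Cost\n'
    'Supplier X,001-1001,$500.00\n'
    'Supplier Y,50-9501,"$1,250.00"\n'
    'Supplier Z,920-4803,$75.00\n'
)
df = pd.read_csv(sample)

# Clean the cost strings, then convert them to numbers.
df['Cost'] = (
    df['Cost'].str.strip('$').str.replace(',', '', regex=False).astype(float)
)

# Row condition: supplier name contains 'Z' OR cost is above 600.
condition = df['Supplier Name'].str.contains('Z') | (df['Cost'] > 600.0)

all_columns = df.loc[condition, :]                          # ':' keeps every column
two_columns = df.loc[condition, ['Supplier Name', 'Cost']]  # keep only two columns

print(two_columns)
```

The part before the comma decides which rows survive; the part after it decides which columns are returned.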

② Rows whose values belong to a set
Plain Python version:

import csv
f1 = open('url1', 'r', newline='')
f2 = open('url2', 'w', newline='')
file_input = csv.reader(f1)
file_output = csv.writer(f2)
dates = ['1/20/14', '1/21/14']    # list holding the two dates of interest
header = next(file_input)         # pass the reader object itself, not the string 'file_input'
file_output.writerow(header)
for row_list in file_input:
    date = row_list[4]                        # purchase date of this row
    if date in dates:                         # test whether the date is one of the dates of interest
        file_output.writerow(row_list)        # write matching rows to the output file
f1.close()
f2.close()
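Looking up the date by position (`row_list[4]`) breaks if the column order changes. As an alternative sketch, `csv.DictReader` and `csv.DictWriter` address fields by header name; the sample rows below are invented, and a set is used for the dates because only membership matters.

```python
import csv
import io

# Hypothetical sample data; only the 'Purchase Date' column name comes from the text.
sample = io.StringIO(
    'Supplier Name,Cost,Purchase Date\n'
    'Supplier X,$500.00,1/20/14\n'
    'Supplier Y,$750.00,1/30/14\n'
    'Supplier Z,$75.00,1/21/14\n'
)
dates = {'1/20/14', '1/21/14'}    # a set works as well as a list for "in" tests

output = io.StringIO()
reader = csv.DictReader(sample)   # each row becomes a dict keyed by header name
writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator='\n')
writer.writeheader()
for row in reader:
    if row['Purchase Date'] in dates:
        writer.writerow(row)

result = output.getvalue()
print(result)
```

With real files, `sample` and `output` would be replaced by file objects opened with `newline=''`.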

pandas version:

import pandas as pd
dates = ['1/20/14', '1/21/14']
data_frame = pd.read_csv(f1)
# isin gives a concise membership test
data_frame_value_in_set = data_frame.loc[data_frame['Purchase Date'].isin(dates), :]
data_frame_value_in_set.to_csv(f2, index=False)

③ Rows whose values match a pattern / regular expression
Plain Python version:

import csv
import re
f1 = open('url1', 'r', newline='')
f2 = open('url2', 'w', newline='')
file_input = csv.reader(f1)
file_output = csv.writer(f2)
# re.compile builds a regular expression object named pattern;
# re.I makes the match case-insensitive
pattern = re.compile(r'(?P<my_pattern>^001-.*)', re.I)
header = next(file_input)
file_output.writerow(header)
for row_list in file_input:
    number = row_list[1]              # invoice number of this row
    if pattern.search(number):        # test whether the invoice number matches the pattern
        file_output.writerow(row_list)
f1.close()
f2.close()
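The pattern can be checked on a few strings before running it over a file. In this sketch the invoice numbers are invented for illustration; the regular expression is the one used above.

```python
import re

pattern = re.compile(r'(?P<my_pattern>^001-.*)', re.I)

# Hypothetical invoice numbers.
invoice_numbers = ['001-1001', '50-9501', '920-4803', 'A001-77']

# '^' anchors the match at the start of the string, so 'A001-77' is rejected
# even though it contains '001-'.
matches = [n for n in invoice_numbers if pattern.search(n)]
print(matches)    # ['001-1001']

# The named group gives access to the matched text.
first = pattern.search('001-1001')
print(first.group('my_pattern'))    # 001-1001
```

Because the pattern contains no letters, the `re.I` flag has no visible effect here; it would matter for patterns such as `^abc-`.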

pandas version:

import pandas as pd
data_frame = pd.read_csv(f1)
# str.startswith selects the rows whose invoice number begins with "001-"
data_frame_value_matches_pattern = data_frame.loc[data_frame['Invoice Number'].str.startswith("001-"), :]
data_frame_value_matches_pattern.to_csv(f2, index=False)
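`str.startswith` covers a fixed prefix only. For a genuine regular expression, `str.contains` with `regex=True` is one option; the sketch below, on invented sample rows, shows that both select the same rows for the `001-` prefix.

```python
import io
import pandas as pd

# Hypothetical sample data read from a string instead of a file.
df = pd.read_csv(io.StringIO(
    'Supplier Name,Invoice Number\n'
    'Supplier X,001-1001\n'
    'Supplier Y,50-9501\n'
    'Supplier Z,001-2002\n'
))

# Fixed-prefix test.
by_prefix = df.loc[df['Invoice Number'].str.startswith('001-'), :]

# Equivalent regular-expression test; '^' anchors the match at the start.
by_regex = df.loc[df['Invoice Number'].str.contains(r'^001-', regex=True), :]

print(by_prefix)
print(by_prefix.equals(by_regex))
```

The regular-expression form becomes useful once the rule is more than a literal prefix, for example a fixed number of digits after the dash.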
