1. Basic Python
vi 5csv_reader_value_matches_pattern.py
#!/usr/bin/env python3
# encoding=utf-8
import csv
import re
import sys
input_file = sys.argv[1]
output_file = sys.argv[2]
# Keep only the rows whose invoice number starts with "001-".
pattern = re.compile(r'(?P<my_pattern_group>^001-.*)', re.I)
# Under Python 3 the csv module needs text mode with newline='' (not 'rb'/'wb').
with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        header = next(filereader)  # next() reads the header row
        filewriter.writerow(header)  # write the header row to the output file
        for row_list in filereader:
            invoice_number = row_list[1]  # value in the second column
            if pattern.search(invoice_number):  # does the second column start with "001-"?
                filewriter.writerow(row_list)  # write the whole row to the output file
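# The actual pattern being searched for here is: ^001-.*
# re.I tells the regular expression to match case-insensitively (it has no visible effect here, because the pattern contains only digits and a hyphen).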
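To make the behaviour of the pattern and of re.I concrete, here is a minimal sketch that runs the same regular expression against a few invoice numbers. The sample strings and the second pattern (letter_pattern) are made up for illustration and do not come from supplier_data.csv.

```python
import re

# Same pattern as in the script above.
pattern = re.compile(r'(?P<my_pattern_group>^001-.*)', re.I)

# Hypothetical invoice numbers, made up for illustration.
samples = ['001-1001', '50-9501', 'A001-1001', '001-']

# ^ anchors the match at the start of the string, so 'A001-1001' is rejected.
matches = [s for s in samples if pattern.search(s)]
print(matches)

# The named group holds the whole matched text.
group_text = pattern.search('001-1001').group('my_pattern_group')
print(group_text)

# re.I only matters when the pattern contains letters (hypothetical pattern).
letter_pattern = re.compile(r'^abc-', re.I)
upper_case_matches = letter_pattern.search('ABC-1001') is not None
print(upper_case_matches)
```

Because the pattern is anchored with ^, pattern.match() would select the same rows here as pattern.search().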
# Result
[root@mysql51 python_scripts]# python 5csv_reader_value_matches_pattern.py supplier_data.csv 7output_csv.csv
[root@mysql51 python_scripts]#
[root@mysql51 python_scripts]#
[root@mysql51 python_scripts]# more 7output_csv.csv
Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
Supplier x,001-1001,2341,$500 ,1/20/2014
Supplier x,001-1001,2341,$501 ,1/20/2014
Supplier x,001-1001,5467,$750 ,1/20/2014
Supplier x,001-1001,5467,$750 ,1/20/2014
2. pandas
vi pandas_value_matches_pattern.py
#!/usr/bin/env python3
import pandas as pd
import sys
input_file=sys.argv[1]
output_file=sys.argv[2]
data_frame=pd.read_csv(input_file)
data_frame_value_matches_pattern=data_frame.loc[data_frame['Invoice Number'].str.startswith("001-"),:]
data_frame_value_matches_pattern.to_csv(output_file,index=False)
# Result
python C:\Users\4201.HJSC\PycharmProjects\pythonProject\pandas_value_matches_pattern.py \
C:\Users\4201.HJSC\Desktop\Python_exercise\supplier_data.csv \
C:\Users\4201.HJSC\Desktop\Python_exercise\7output_csv.csv
more 7output_csv.csv
Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
Supplier x,001-1001,2341,$500 ,1/20/2014
Supplier x,001-1001,2341,$501 ,1/20/2014
Supplier x,001-1001,5467,$750 ,1/20/2014
Supplier x,001-1001,5467,$750 ,1/20/2014
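3. Summary
The pandas str.startswith method selects the rows whose 'Invoice Number' column begins with 001-. It does the same job as the ^001-.* regular expression in the basic Python version.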
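The selection can also be tried without any files. The sketch below is only an illustration: the small DataFrame is made-up data, and the na=False argument is an addition that the original script does not use. It makes a missing invoice number count as "no match" instead of leaving a missing entry in the mask. The str.contains line shows a regex-based alternative that mirrors the basic Python version.

```python
import pandas as pd

# Hypothetical in-memory data, made up for illustration.
data_frame = pd.DataFrame({
    'Supplier Name': ['Supplier X', 'Supplier Y', 'Supplier Z'],
    'Invoice Number': ['001-1001', '50-9501', None],
})

# Boolean mask: True where the invoice number starts with "001-".
# na=False treats a missing invoice number as "no match".
mask = data_frame['Invoice Number'].str.startswith('001-', na=False)
selected = data_frame.loc[mask, :]
print(selected)

# Regex-based alternative, closer to the csv/re version of the script.
regex_mask = data_frame['Invoice Number'].str.contains(r'^001-', regex=True, na=False)
print(regex_mask.tolist())
```

For a fixed prefix, str.startswith is the simpler choice. str.contains with a regular expression is useful when the condition is more than a literal prefix.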