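Winter Break CSV Data-Processing Task
A simple data-processing task
In the blink of an eye, winter break is already half over. Has everyone finished the tasks from the first three weeks? As the saying goes, "the knowledge you skipped over the break will catch up with you at the group assessment." At the last group meeting, many of you said that everything made sense while reading the book, but that you got anxious as soon as you had to write code yourselves. To resolve the contradiction between everyone's ever-growing need for knowledge and their stalled learning progress, and also to enrich your fulfilling (read: boring) free time, we are rolling out a special New Year gift pack: a simple data-processing task.
Schedule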
Begin: 2021-2-10 (Wed.) CST
End: 2021-2-21 (Sun.) CST
Task description
The data set for this task has 101227 lines in total. A sample:
18 Jogging 102271561469000 -13.53 16.89 -6.4
18 Jogging 102271641608000 -5.75 16.89 -0.46
18 Jogging 102271681617000 -2.18 16.32 11.07
18 Jogging 3.36
18 Downstairs 103260201636000 -4.44 7.06 1.95
18 Downstairs 103260241614000 -3.87 7.55 3.3
18 Downstairs 103260321693000 -4.06 8.08 4.79
18 Downstairs 103260365577000 -6.32 8.66 4.94
18 Downstairs 103260403083000 -5.37 11.22 3.06
18 Downstairs 103260443305000 -5.79 9.92 2.53
6 Walking 0 0 0 3.214402
Step 1
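Delete every row in the data set that carries abnormal information.
For example, row 4 of the sample above has only 3 elements while the other rows have 6, so row 4 is abnormal and must be deleted. Likewise, the third element of the last sample row (6 Walking 0 0 0 3.214402) is clearly wrong, so that row is abnormal too and must be deleted.
The data set may contain other kinds of anomalies as well.
Once all rows have been processed, write them to the file test1, using a comma as the separator between the elements of each row.
The file test1 has 100471 lines. A sample: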
6,Walking,23445542281000,-0.72,9.62,0.14982383
6,Walking,23445592299000,-4.02,11.03,3.445948
6,Walking,23470662276000,0.95,14.71,3.636633
...
Step 2
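Count how many rows each movement has in test1 and print the counts to the screen. Then round each count down to a multiple of 100, write that many rows to the file test2, and discard the surplus rows. For example, if Jogging occurs 3021 times, print Movement: Jogging Amount: 3021 to the screen and write the first 3000 Jogging rows to test2.
The file test2 has 100200 lines.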
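The rounding rule in Step 2 can be sketched as follows. This is only an illustration under stated assumptions: the helper name `keep_rounded` and its `step` parameter are hypothetical, rows are assumed to be lists already split on commas, and "first" is taken to mean the order in which rows appear in test1.

```python
from collections import Counter

def keep_rounded(rows, step=100):
    """Keep, per movement, the first (count rounded down to a multiple of step) rows."""
    # Column 1 holds the movement name, e.g. 'Jogging'.
    counts = Counter(row[1] for row in rows)
    # 3021 - 3021 % 100 == 3000: round each count down to a multiple of step.
    limits = {name: amount - amount % step for name, amount in counts.items()}
    written = Counter()
    kept = []
    for row in rows:
        name = row[1]
        if written[name] < limits[name]:
            written[name] += 1
            kept.append(row)
    return counts, kept

# Tiny made-up input: 5 Jogging rows and 3 Walking rows, rounded with step=2.
rows = [['18', 'Jogging', str(i + 1), '0', '0', '0'] for i in range(5)]
rows += [['6', 'Walking', str(i + 1), '0', '0', '0'] for i in range(3)]
counts, kept = keep_rounded(rows, step=2)
for name, amount in counts.items():
    print('Movement: {} Amount: {}'.format(name, amount))
```

With `step=2` the sketch keeps 4 of the 5 Jogging rows and 2 of the 3 Walking rows; with the default `step=100` it follows the rule from the task text.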
Step 3
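Read the data in test2, take the last 3 columns of each row, and write them to the file test3, using a space as the separator.
The file test3 has 100200 lines. A sample: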
-0.72 9.62 0.14982383
-4.02 11.03 3.445948
0.95 14.71 3.636633
...
Step 4
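Read the data in test3 and treat each line as one group. Elements within a group are separated by spaces, groups are separated by commas, and every 20 groups form one line. Write the result to the file finally.
The file finally has 5010 lines. A sample: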
-0.72 9.62 0.14982383,-4.02 11.03 3.445948,0.95 14.71 3.636633,-3.57 5.75 -5.407278,-5.28 8.85 -9.615966,-1.14 15.02 -3.8681788,7.86 11.22 -1.879608,6.28 4.9 -2.3018389,0.95 7.06 -3.445948,-1.61 9.7 0.23154591,6.44 12.18 -0.7627395,5.83 12.07 -0.53119355,7.21 12.41 0.3405087,6.17 12.53 -6.701211,-1.08 17.54 -6.701211,-1.69 16.78 3.214402,-2.3 8.12 -3.486809,-2.91 0 -4.7535014,-2.91 0 -4.7535014,-4.44 1.84 -2.8330324
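The grouping in Step 4 can be sketched as below. The function name `group_lines` is hypothetical, and dropping an incomplete final batch is an assumption: the task text does not say what to do with leftovers, and with 100200 lines in test3 (a multiple of 20) the case does not arise.

```python
def group_lines(lines, size=20):
    """Join every `size` lines with commas; an incomplete final batch is dropped."""
    # Each non-empty line of test3 is one group such as '-0.72 9.62 0.14982383'.
    groups = [line.strip() for line in lines if line.strip()]
    # Number of groups that fit into complete batches.
    full = len(groups) - len(groups) % size
    return [','.join(groups[i:i + size]) for i in range(0, full, size)]

sample = ['-0.72 9.62 0.14982383', '-4.02 11.03 3.445948', '0.95 14.71 3.636633']
print(group_lines(sample, size=2))
# prints ['-0.72 9.62 0.14982383,-4.02 11.03 3.445948']
```

Calling it with the default `size=20` on the 100200 lines of test3 would yield 5010 output lines, which matches the line count stated for the file finally.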
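Deliverables
- 4 *.py files: test1.py, test2.py, test3.py, finally.py
- 4 files generated by running the Python scripts: test1, test2, test3, finally
Note: if you have any questions about this task, ask the senior students of the 2019 cohort promptly.
Happy Spring Festival, everyone!
My solution is as follows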
test1.py
import csv
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        # The raw data is space-separated; test1 must be comma-separated.
        filereader = csv.reader(csv_in_file, delimiter=' ')
        filewriter = csv.writer(csv_out_file, delimiter=',')
        for row_list in filereader:
            length = len(row_list)
            # Keep only rows with 6 elements and a non-zero timestamp.
            if length == 6 and int(row_list[2]) != 0:
                filewriter.writerow(row_list)

This removes the rows with abnormal data.
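The filter above calls int(row_list[2]) directly, which raises ValueError if the third field is not an integer. Since the task warns that other anomalies may exist, a stricter check might look like the sketch below. The function name `is_valid_row` is hypothetical, and the rules (integer user id, alphabetic movement name, positive integer timestamp, three numeric readings) are assumptions inferred from the sample rows. They are not guaranteed to reproduce the expected 100471 lines.

```python
def is_valid_row(fields):
    """Return True if fields look like: user, movement, timestamp, x, y, z."""
    if len(fields) != 6:
        return False
    user, movement, timestamp, x, y, z = fields
    try:
        int(user)
        ts = int(timestamp)
        # The three acceleration readings must parse as numbers.
        float(x)
        float(y)
        float(z)
    except ValueError:
        return False
    # A timestamp of 0 marks a broken record in the sample data.
    return ts > 0 and movement.isalpha()

print(is_valid_row('18 Jogging 102271561469000 -13.53 16.89 -6.4'.split()))
print(is_valid_row('18 Jogging 3.36'.split()))
print(is_valid_row('6 Walking 0 0 0 3.214402'.split()))
```

In test1.py, the condition in the `if` statement could then be replaced by a call to this function.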
test2.py
import csv
import sys

# The six movement labels that are counted.
MOVEMENTS = ['Walking', 'Jogging', 'Upstairs', 'Downstairs', 'Standing', 'Sitting']

input_file = sys.argv[1]
output_file = sys.argv[2]

# First pass: count the rows of each movement.
counts = dict.fromkeys(MOVEMENTS, 0)
with open(input_file, 'r', newline='') as csv_in_file:
    for row_list in csv.reader(csv_in_file):
        if row_list[1] in counts:
            counts[row_list[1]] += 1

# Print each count on one line, in the format the task asks for.
for movement in MOVEMENTS:
    print('Movement: {} Amount: {}'.format(movement, counts[movement]))

# Round each count down to a multiple of 100: the number of rows to keep.
limits = {movement: amount - amount % 100 for movement, amount in counts.items()}

# Second pass: copy rows until the limit for their movement is reached.
written = dict.fromkeys(MOVEMENTS, 0)
with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filewriter = csv.writer(csv_out_file)
        for row_list in csv.reader(csv_in_file):
            movement = row_list[1]
            if movement in limits and written[movement] < limits[movement]:
                written[movement] += 1
                filewriter.writerow(row_list)
test3.py
import csv
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        # Read comma-separated rows and write space-separated rows.
        filereader = csv.reader(csv_in_file, delimiter=',')
        filewriter = csv.writer(csv_out_file, delimiter=' ')
        for row_list in filereader:
            # Columns 3 to 5 are the last three elements of each row.
            filewriter.writerow(row_list[3:6])
test4.py
import csv
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        # test3 contains no commas, so the default comma-delimited reader
        # returns each line as a one-element list holding the whole group.
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        new_list = []
        for row_list in filereader:
            new_list.append(row_list[0])
            # Every 20 groups form one output line, joined by commas.
            if len(new_list) == 20:
                filewriter.writerow(new_list)
                new_list = []
        # A trailing batch of fewer than 20 groups is not written;
        # test3 has 100200 lines, which is a multiple of 20.
A common pitfall is converting between lists and strings.
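As a small illustration of those conversions, here is a sketch built on one sample line from test1. It relies only on the built-in `str.split`, `str.join`, and `float`; the variable names are made up for the example.

```python
line = '6,Walking,23445542281000,-0.72,9.62,0.14982383\n'

# String -> list: strip the newline, then split on the separator.
fields = line.strip().split(',')

# List -> string: join with the new separator (all elements must be str).
last_three = ' '.join(fields[3:6])

# str() on a list gives its repr, not a joined line.
as_repr = str(fields[3:5])

# Convert to numbers only when arithmetic is needed.
values = [float(v) for v in fields[3:6]]

print(last_three)   # prints -0.72 9.62 0.14982383
print(as_repr)      # prints ['-0.72', '9.62']
```

Writing `str(some_list)` straight to a file is the usual source of stray brackets and quotes in the output, which is why `join` or `csv.writer` is used in the scripts above.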