数据处理任务(新年礼物)
此次任务总处理101227行数据,样例如下:
18 Jogging 102271561469000 -13.53 16.89 -6.4
18 Jogging 102271641608000 -5.75 16.89 -0.46
18 Jogging 102271681617000 -2.18 16.32 11.07
18 Jogging 3.36
18 Downstairs 103260201636000 -4.44 7.06 1.95
18 Downstairs 103260241614000 -3.87 7.55 3.3
18 Downstairs 103260321693000 -4.06 8.08 4.79
18 Downstairs 103260365577000 -6.32 8.66 4.94
18 Downstairs 103260403083000 -5.37 11.22 3.06
18 Downstairs 103260443305000 -5.79 9.92 2.53
6 Walking 0 0 0 3.214402
Step 1
将数据集中所有信息异常的行删除。
比如上面的样例中第4行数据只有3个元素,而其他行都有6个元素,所以第4行是信息异常的行,将其删除。再如第12行数据的第3个元素明显也是有问题的,所以它也是信息异常的行,将其删除。
数据集中可能还会存在一些其他异常。
将全部信息处理之后,每行的元素以逗号为分隔符,写入文件test1
。
文件test1
共100471行,样例如下:
6,Walking,23445542281000,-0.72,9.62,0.14982383
6,Walking,23445592299000,-4.02,11.03,3.445948
6,Walking,23470662276000,0.95,14.71,3.636633
...
先将数据文件保存在桌面,然后将其转换为csv文件
import csv
input_file = "C:/Users/Peng Runhao/Desktop/OriginalData.txt"
output_file = "C:/Users/Peng Runhao/Desktop/OriginalData.csv"
csv_row = []
with open(input_file,'r',newline='') as txt_in_file:
with open(output_file,'w',newline='') as csv_out_file:
filewriter = csv.writer(csv_out_file,delimiter=',')
for row in txt_in_file:
row_list = row.strip().split()
filewriter.writerow(row_list)
观察数据可知,每行数据正常情况下为6列,当列数<6时为异常数据,且DE列数据等于0时为正常,C列数据等于0时为异常,故若B列数据为0时为异常数据
import csv
input_file = "C:/Users/Peng Runhao/Desktop/OriginalData.csv"
output_file = "C:/Users/Peng Runhao/Desktop/test1.csv"
with open(input_file,'r',newline='') as csv_in_file:
with open(output_file,'w',newline='') as csv_out_file:
filereader = csv.reader(csv_in_file)
filewriter = csv.writer(csv_out_file)
for row in filereader:
if len(row) == 6 and int(row[2]) != 0:
filewriter.writerow(row)
Step2
统计文件test1
的数据中所有动作的数目并打印到屏幕,然后将动作数目对100取整后写入test2
文件,多余的信息行抛弃。
比如统计出Jogging
的数量为3021
次,则在屏幕上打印Movement: Jogging Amount: 3021
,然后将前3000行信息写入test2
文件。
文件test2
共100200行。
import csv
input_file = "C:/Users/Peng Runhao/Desktop/test1.csv"
output_file = "C:/Users/Peng Runhao/Desktop/test2.csv"
poslist = ['Walking','Jogging','Upstairs','Downstairs','Standing','Sitting']
list_0 = []
list_1 = []
list_2 = []
list_3 = []
list_4 = []
list_5 = []
Walking = Jogging = Upstairs = Downstairs = Standing = Sitting = 0
with open(input_file,'r',newline='') as csv_in_file:
with open(output_file,'w',newline='') as csv_out_file:
filereader = csv.reader(csv_in_file)
filewriter = csv.writer(csv_out_file)
for row_list in filereader:
if str(row_list[1]) == 'Walking':
list_0.append(row_list)
Walking += 1
elif str(row_list[1]) == 'Jogging':
Jogging += 1
list_1.append(row_list)
elif str(row_list[1]) == 'Upstairs':
Upstairs += 1
list_2.append(row_list)
elif str(row_list[1]) == 'Downstairs':
Downstairs += 1
list_3.append(row_list)
elif str(row_list[1]) == 'Standing':
Standing += 1
list_4.append(row_list)
else:
Sitting += 1
list_5.append(row_list)
print("Movement: {0:s} Amount: {1:d}".format(poslist[0], Walking))
print("Movement: {0:s} Amount: {1:d}".format(poslist[1], Jogging))
print("Movement: {0:s} Amount: {1:d}".format(poslist[2], Upstairs))
print("Movement: {0:s} Amount: {1:d}".format(poslist[3], Downstairs))
print("Movement: {0:s} Amount: {1:d}".format(poslist[4], Standing))
print("Movement: {0:s} Amount: {1:d}".format(poslist[5], Sitting))
list_0_changed_maxvalue = (int(len(list_0)) // 100) * 100
list_1_changed_maxvalue = (int(len(list_1)) // 100) * 100
list_2_changed_maxvalue = (int(len(list_2)) // 100) * 100
list_3_changed_maxvalue = (int(len(list_3)) // 100) * 100
list_4_changed_maxvalue = (int(len(list_4)) // 100) * 100
list_5_changed_maxvalue = (int(len(list_5)) // 100) * 100
for index_value in range(list_0_changed_maxvalue):
filewriter.writerow(list_0[index_value])
for index_value in range(list_1_changed_maxvalue):
filewriter.writerow(list_1[index_value])
for index_value in range(list_2_changed_maxvalue):
filewriter.writerow(list_2[index_value])
for index_value in range(list_3_changed_maxvalue):
filewriter.writerow(list_3[index_value])
for index_value in range(list_4_changed_maxvalue):
filewriter.writerow(list_4[index_value])
for index_value in range(list_5_changed_maxvalue):
filewriter.writerow(list_5[index_value])
Step 3
读取文件test2
的数据,取每行的后3列元素,以空格为分隔符写入文件test3
。
文件test3
共100200行,样例如下:
-0.72 9.62 0.14982383
-4.02 11.03 3.445948
0.95 14.71 3.636633
...
import csv
input_file = "C:/Users/Peng Runhao/Desktop/test2.csv"
output_file = "C:/Users/Peng Runhao/Desktop/test3.csv"
with open(input_file,'r',newline='') as csv_in_file:
with open(output_file,'w',newline='') as csv_out_file:
filereader = csv.reader(csv_in_file)
filewriter = csv.writer(csv_out_file, delimiter=' ')
for row_list in filereader:
row_changed_list = []
for index_value in range(3,6):
row_changed_list.append(row_list[index_value])
filewriter.writerow(row_changed_list)
Step 4
读取文件test3
的数据,每行数据为一组,每组组内的元素以空格为分隔符,组与组之间的数据以逗号为分隔符,每20组元素为一行,写入文件finally
。
文件finally
共5010行,样例如下:
-0.72 9.62 0.14982383,-4.02 11.03 3.445948,0.95 14.71 3.636633,-3.57 5.75 -5.407278,-5.28 8.85 -9.615966,-1.14 15.02 -3.8681788,7.86 11.22 -1.879608,6.28 4.9 -2.3018389,0.95 7.06 -3.445948,-1.61 9.7 0.23154591,6.44 12.18 -0.7627395,5.83 12.07 -0.53119355,7.21 12.41 0.3405087,6.17 12.53 -6.701211,-1.08 17.54 -6.701211,-1.69 16.78 3.214402,-2.3 8.12 -3.486809,-2.91 0 -4.7535014,-2.91 0 -4.7535014,-4.44 1.84 -2.8330324
```python
import csv
input_file = "C:/Users/Peng Runhao/Desktop/test3.csv"
output_file = "C:/Users/Peng Runhao/Desktop/finally.csv"
num = 0
s = ' '
list = []
rows_20 = []
with open(input_file,'r',newline='') as csv_in_file:
with open(output_file,'w',newline='') as csv_out_file:
filereader = csv.reader(csv_in_file)
filewriter = csv.writer(csv_out_file)
for line in filereader:
num += 1
rows_20.append(s.join(line))
if num == 20:
filewriter.writerow(rows_20)
num = 0
rows_20 = []
学如逆水行舟,不日进,则日退。 加油!