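Project Training, Week 3 Work Log (1)

Project topic: Design and Implementation of a Diet and Health Management System

My task: data cleaning

Specific work items: data cleaning; handling erroneous data; normalizing image sizes; consolidating image storage locations; coordinating with the teammate responsible for the database; deleting and handling mismatched (redundant or missing) images; and so on.

This blog post records the work done.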
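The post does not show the code used for the mismatched-image item, so the following is only a minimal sketch of one way to detect redundant and missing images. It assumes each image file is named after its record id and ends in `.jpg`; the function name `compare_images_to_records` and both assumptions are hypothetical. It only reports the differences and deletes nothing.

```python
import tempfile
from pathlib import Path


def compare_images_to_records(image_dir, expected_ids, suffix=".jpg"):
    """Return (redundant_files, missing_ids) for one image folder.

    redundant_files: image paths whose name matches no expected id.
    missing_ids: expected ids (as strings) that have no image file.
    """
    expected = {str(i) for i in expected_ids}
    on_disk = {p.stem: p for p in Path(image_dir).glob(f"*{suffix}")}
    redundant = sorted(p for stem, p in on_disk.items() if stem not in expected)
    missing = sorted(expected - set(on_disk))
    return redundant, missing


# Toy demonstration in a temporary folder (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("101.jpg", "102.jpg", "999.jpg"):
        (Path(tmp) / name).touch()
    redundant, missing = compare_images_to_records(tmp, [101, 102, 103])
    print([p.name for p in redundant])  # ['999.jpg']
    print(missing)                      # ['103']
```

Reviewing the two lists before deleting anything makes it easier to check the result with whoever maintains the database records.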


 

# Find 200 high-quality users (users with exactly 10 interactions)
from collections import Counter

import numpy as np

# Column 0 of the interaction matrix holds the user index
intact = np.load("MeishiChina_interaction_data_train_afterdelete_new.npy")
print(intact)
intactlie = intact[:, 0]
print(intactlie)

# Count how many interactions each user has
# (Counter keeps users in order of first appearance)
mycount = Counter(intactlie)
print(mycount)

# Keep the users that have exactly 10 interactions
goodman = [user for user, n in mycount.items() if n == 10]
print(goodman)
np.save("goodman.npy", goodman)

# Take the first 200 of them
new_gm = goodman[:200]
print(new_gm)
np.save("new_gm", new_gm)
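To make the selection rule concrete, here is a small sketch of the same idea on toy data: count how often each user id appears, keep the ids with exactly the target count, and cut the list off at a limit. The function name `select_users` and the toy ids are made up for illustration; the real script works on column 0 of the `.npy` interaction matrix.

```python
from collections import Counter


def select_users(user_ids, target_count=10, limit=200):
    """Return up to `limit` user ids that occur exactly `target_count` times.

    Ids come back in order of first appearance, because Counter
    preserves insertion order.
    """
    counts = Counter(user_ids)
    selected = [uid for uid, n in counts.items() if n == target_count]
    return selected[:limit]


# Toy data: user 7 appears 3 times, user 2 twice, user 5 three times.
toy_ids = [7, 2, 7, 5, 5, 2, 7, 5]
print(select_users(toy_ids, target_count=3))  # [7, 5]
```

Slicing with `[:limit]` simply returns fewer ids when fewer than `limit` users qualify, instead of raising an `IndexError`.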


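Using their indices, look up their original indices from 2
import numpy as np
import pandas as pd

# Assumption: the selected user indices come from new_gm.npy (saved in the
# previous step), as in the commented-out version further down
goodman = np.load("new_gm.npy")
userindex = pd.read_csv('100用户+食谱+url.csv')
print(goodman)
print(userindex)
print(len(userindex))

# Indices may be stored as floats, so cast them before using them as positions
intgm = list(map(int, goodman))

# Map position 0..199 to the row of `userindex` at the stored index
interafter = {}
for i in range(200):
    j = intgm[i]
    # Assumption: row j of the CSV corresponds to user index j
    interafter[i] = userindex.iloc[j]
print(interafter)
# text = {}
# text = userindex[449]
# print(text)
# A dict is saved as a pickled object array; load it with allow_pickle=True
np.save("interafter", interafter)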
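The lookup step can be sketched on toy data as follows. This is only an illustration, assuming the lookup table is a plain list in which position `k` holds the original id of the item that was re-numbered to `k`; the name `map_to_original` and the sample ids are hypothetical.

```python
def map_to_original(new_indices, original_ids):
    """Translate re-numbered indices back to original ids.

    Returns a dict {position in new_indices: original id}.
    """
    mapping = {}
    for position, new_index in enumerate(new_indices):
        k = int(new_index)  # indices loaded from .npy files may be floats
        if not 0 <= k < len(original_ids):
            raise IndexError(f"index {k} is outside the lookup table")
        mapping[position] = original_ids[k]
    return mapping


# Toy lookup table: re-numbered index 0 -> original id 1001, and so on.
lookup = [1001, 1002, 1005, 1009]
print(map_to_original([2.0, 0.0, 3.0], lookup))
# {0: 1005, 1: 1001, 2: 1009}
```

The explicit bounds check makes a mismatch between the index file and the lookup table fail loudly, rather than silently picking an element from the end of the list through a negative index.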
# Find the changed indices
# import numpy as np
# goodman = np.load("new_gm.npy")
# recipeindex = np.load("recipe_index_list_new.npy")
# # print(goodman)
# # print(recipeindex)
# intgm = list(map(int,goodman))
# interact=np.load("MeishiChina_interaction_data_train_afterdelete_new.npy")
# print(interact)
# interact2 = {}
# interact3 = {}
# u = 0
# for i in list(range(0,200)):
#     for j in list(range(len(interact))):
#         if intgm[i] == interact[j,0]:
#             interact2[u] = interact[j, 0]
#             interact3[u] = interact[j, 1]
#             u = u+1
# print(interact2)
# np.save(r"interact2",interact2)
# np.set_printoptions(suppress=True)
# print(interact2)
# print(interact3)
# print(interact2.values())
# print(interact3.values())
# values = list(interact3.values())
# values1 = list(interact2.values())
# print(values)
# print(values1)
# rindexafter = np.load("recipe_index_list_new.npy")
# print(rindexafter)
# print(len(rindexafter))
# values = list(map(int,values))
# test = {}
# for i in list(range(0,2000)):
#     j = values[i]
#     test[i] = rindexafter[j]
# print(test)
# values2 = list(test.values())
# print(values2)
# import pandas as pd
# data1 = pd.DataFrame(values)
# data1.to_csv('values.csv')
# data3 = pd.DataFrame(values1)
# data3.to_csv('values1.csv')
# data2 = pd.DataFrame(values2)
# data2.to_csv('values2.csv')
#
# import pandas as pd
# item = pd.read_csv('5.csv')
# id = pd.read_csv('values2++.csv')
# for i in id['recipeID'].values:
#     print(str(item['recipeID'].values.item(i)) + "," + str(item['url'].values.item(i)))
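The commented-out nested loop above compares every selected user against every interaction row. A possible alternative, shown here only as a sketch on toy data, is to pass over the interaction rows once and bucket the recipe indices per selected user. The function name `interactions_for` and the sample rows are made up; rows are assumed to carry the user index in column 0 and the recipe index in column 1, as in the original loop.

```python
def interactions_for(selected_users, interactions):
    """Collect (user, recipe) pairs whose user is in `selected_users`.

    Pairs are grouped by user in the order of `selected_users`, and within
    one user they keep the order of the interaction rows, which matches
    the order produced by the nested loop.
    """
    by_user = {int(u): [] for u in selected_users}
    for row in interactions:
        bucket = by_user.get(int(row[0]))
        if bucket is not None:
            bucket.append(int(row[1]))
    return [(u, r) for u, recipes in by_user.items() for r in recipes]


# Toy interaction rows: (user index, recipe index)
rows = [(3, 10), (1, 11), (3, 12), (2, 13), (1, 14)]
print(interactions_for([1, 3], rows))
# [(1, 11), (1, 14), (3, 10), (3, 12)]
```

With 200 selected users this replaces 200 passes over the interaction matrix with a single one.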

# import pandas as pd
# item = pd.read_csv('4.csv')
# id = pd.read_csv('values1+.csv')
# for i in id['userID'].values:
#     print(str(item['userID'].values.item(i)) + "," + str(item['url'].values.item(i)))
import shutil
import pandas as pd

# frame = pd.read_csv('terminal.csv', engine='python',encoding='utf-8-sig')
# data = frame.drop_duplicates(subset='url', keep='first', inplace=False)
# data.to_csv('terminal2.csv', encoding='utf8', index='url')
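The commented-out `drop_duplicates(subset='url', keep='first')` call keeps the first row for each distinct url. The same rule can be sketched without pandas on a list of dictionaries; the function name `drop_duplicate_urls` and the toy records below are hypothetical.

```python
def drop_duplicate_urls(rows, key="url"):
    """Keep the first row for each distinct value of `key`."""
    seen = set()
    unique_rows = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique_rows.append(row)
    return unique_rows


# Toy records; the field values are made up for illustration.
records = [
    {"recipeID": 1, "url": "a.jpg"},
    {"recipeID": 2, "url": "b.jpg"},
    {"recipeID": 3, "url": "a.jpg"},  # same url as the first record
]
print(drop_duplicate_urls(records))
# [{'recipeID': 1, 'url': 'a.jpg'}, {'recipeID': 2, 'url': 'b.jpg'}]
```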

The code above, in order, finds the 200 high-quality users, looks up their original indices, and finds the changed indices.
