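Problem: in data analysis we often need to turn values into attributes or features. For example, an exam answer log stores one answer per row (user_id, q_id, answer, ...), and we want each question ID in q_id to become a column of its own, with one row per user.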
I wrote a piece of Python code to do this conversion:
# encoding=utf-8
import pandas as pd

# Column names of the raw answer log
cname = ["user_id", "mobile", "exam_id", "org_id", "q_id", "question", "a_id",
         "answer", "a_option", "answer_time", "answer_timestamp", "fraud_score", "a_sort", "page_num"]
# Question IDs that become the feature columns
quesid = ["A001", "A002", "A003", "A004", "A005", "A006",
          "A007", "A008", "A009", "A010"]

baddf = pd.read_csv("D:/gooddata.csv", header=0, names=cname)
uniqueuser = baddf["user_id"].unique()

# Global list: one inner list (one output row) per user
finallist = []
for row in uniqueuser:
    totallist = []
    # All answer records of the current user
    tempdf = baddf[baddf["user_id"] == row]
    print(tempdf)
    newdf = tempdf.loc[:, ['q_id', 'answer']]
    timedf = tempdf.loc[:, ['q_id', 'answer_time']]
    # Local list: the current user's row
    answerlist = []
    ####################### user ID
    answerlist.append(row)
    ####################### each question's answer becomes an attribute
    for qrow in quesid:
        avalue = newdf[newdf["q_id"] == qrow]
        if list(avalue['answer']):
            # If the question was answered more than once, keep the last answer
            answerlist.append(list(avalue['answer']).pop())
        else:
            answerlist.append("not answered")
    totallist.extend(answerlist)
    finallist.append(totallist)

for i in finallist:
    print(i)

# Assemble into a DataFrame: user_id followed by one column per question
badfinal = pd.DataFrame(finallist, columns=["user_id"] + quesid)
badfinal.to_csv("D:/badfinal.csv", index=False, sep=',', encoding='utf-8')
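The loop above filters the whole DataFrame once for every user. As an alternative sketch (not the author's method), pandas can do the same reshaping with `drop_duplicates` and `pivot`. It assumes only the `user_id`, `q_id` and `answer` columns matter; the helper name `answers_to_features` and its `missing` parameter are hypothetical.

```python
import pandas as pd


def answers_to_features(df, question_ids, missing="not answered"):
    """Pivot one-row-per-answer records into one row per user_id.

    Assumes df has user_id, q_id and answer columns. If a user answered
    the same question more than once, the row that comes last in df wins,
    like list(...).pop() in the loop version.
    """
    # pivot() needs unique (user_id, q_id) pairs, so keep only the last answer
    last = df.drop_duplicates(subset=["user_id", "q_id"], keep="last")
    wide = last.pivot(index="user_id", columns="q_id", values="answer")
    # pivot() sorts the index: restore first-appearance user order and
    # force the fixed question order; unseen questions become NaN columns
    wide = wide.reindex(index=df["user_id"].unique(), columns=question_ids)
    # Questions a user did not answer get the placeholder value
    wide = wide.fillna(missing)
    wide.index.name = "user_id"
    wide.columns.name = None
    return wide.reset_index()
```

Called as `answers_to_features(baddf, quesid)`, it should give the same values as `badfinal` in the script. The `keep="last"` choice relies on the rows being in file order, just as `pop()` does in the loop.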
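The effect: the original table, with one row per answer (user_id, q_id, answer, ...), is converted into a table with one row per user_id and one column for each of the questions A001 to A010.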
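A toy illustration of the before/after shapes, using made-up data (the user IDs `u1`, `u2` and the answers are hypothetical) and pandas `pivot` as a stand-in for the loop:

```python
import pandas as pd

# Hypothetical "before" table: one row per (user, question) answer
long_df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "q_id": ["A001", "A002", "A001"],
    "answer": ["A", "C", "B"],
})
print(long_df)

# "After": one row per user, one column per question;
# u2 never answered A002, so that cell gets a placeholder
wide_df = (long_df.pivot(index="user_id", columns="q_id", values="answer")
           .fillna("not answered")
           .reset_index())
wide_df.columns.name = None
print(wide_df)
```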
If you have any questions and would like to discuss them with me, join QQ group 636866908 (Python & Big Data), or QQ group 456726635 (R Language & Big Data Analysis).