Python: converting Chinese characters to binary; converting a dictionary to binary in Python

I have a dictionary whose keys are customer IDs and whose values are the IDs of the movies each customer has watched. Even if a customer has watched the same movie many times, I want it counted only once.

I need to convert this dictionary into binary data.

Each row should be a customer ID and each column a movie ID, with 1 if the customer has watched the movie and 0 otherwise.

d = {'121212121': [111, 222, 333, 333, 444, 444], '212121212': [222, 555, 555, 666], '212123322': [555, 666, 666, 666, 777]}

Desired output:

customer ID  111  222  333  444  555  666  777
121212121      1    1    1    1    0    0    0
212121212      0    1    0    0    1    1    0
121323231      0    0    0    0    1    1    1

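I have tried using CountVectorizer.

Code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# `cust` is assumed to be a DataFrame with 'customer_id' and 'movies_list' columns
cv = CountVectorizer()
movies = cv.fit_transform(cust['movies_list'])
cols = cv.vocabulary_  # dict mapping each term to its column index
movies_ = pd.DataFrame(movies.toarray(), columns=cols, index=cust['customer_id'])
movies_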

Output:

customer ID  111  222  333  444  555  666  777
212121212      1    1    2    2    0    0    0
121212121      0    1    0    0    2    1    0
121323231      0    0    0    0    1    3    1

The customer IDs didn't match the rows, and instead of 1/0 I got a count of how many times each customer watched each movie.

Solution

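It looks like you can just use clip(upper=1) to cap positive values at 1. (Older pandas versions also had clip_upper, which was deprecated and then removed in pandas 1.0.)

movies_.clip(upper=1)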

             111  222  333  444  555  666  777
121212121      1    1    1    1    0    0    0
212121212      0    1    0    0    1    1    0
212123322      0    0    0    0    1    1    1
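
As a minimal sketch of the clipping step on its own, assuming a count table shaped like the CountVectorizer output in the question (the name `counts` and the hand-typed values are illustrative, not the asker's real data):

```python
import pandas as pd

# Hypothetical count table: rows are customer IDs, columns are movie IDs,
# and each cell is how many times that customer watched that movie.
counts = pd.DataFrame(
    [[1, 1, 2, 2, 0, 0, 0],
     [0, 1, 0, 0, 2, 1, 0],
     [0, 0, 0, 0, 1, 3, 1]],
    index=["121212121", "212121212", "212123322"],
    columns=["111", "222", "333", "444", "555", "666", "777"],
)

# Cap every value at 1: any positive count becomes 1, zeros stay 0.
binary = counts.clip(upper=1)
print(binary)
```

If the counts come from scikit-learn, passing binary=True to CountVectorizer should give 0/1 values directly, which would make the clipping step unnecessary.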

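Here's an alternative solution starting with d. You can use pd.get_dummies, followed by a per-customer sum and clip(upper=1). (The original sum(level=1) and clip_upper calls no longer exist in current pandas releases.)

import pandas as pd

# One column per customer; shorter lists are padded with NaN
df = pd.concat(
    [pd.Series(v, name=k).astype(str) for k, v in d.items()],  # `d` is your dict
    axis=1,
)

# stack() gives one row per (position, customer) pair; one-hot encode the
# movie IDs, sum per customer (index level 1), then cap the counts at 1
pd.get_dummies(df.stack()).groupby(level=1).sum().clip(upper=1)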

             111  222  333  444  555  666  777
121212121      1    1    1    1    0    0    0
212121212      0    1    0    0    1    1    0
212123322      0    0    0    0    1    1    1
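
For comparison, the same 0/1 table can be sketched without pandas, which makes the deduplication step explicit. This is only an illustration: the helper name `to_binary_rows` is made up here, and `d` is assumed to map each customer ID to a list of movie IDs.

```python
def to_binary_rows(d):
    """Return (sorted movie IDs, {customer ID: list of 0/1 flags})."""
    # Converting each list to a set collapses repeated viewings into one.
    watched = {customer: set(movies) for customer, movies in d.items()}
    # Every movie seen by any customer becomes a column, in sorted order.
    columns = sorted(set().union(*watched.values()))
    rows = {
        customer: [1 if movie in seen else 0 for movie in columns]
        for customer, seen in watched.items()
    }
    return columns, rows


# Assumed input shape: customer ID -> list of movie IDs (repeats allowed).
d = {
    "121212121": [111, 222, 333, 333, 444, 444],
    "212121212": [222, 555, 555, 666],
    "212123322": [555, 666, 666, 666, 777],
}

columns, rows = to_binary_rows(d)
print("customer ID", *columns)
for customer, flags in rows.items():
    print(customer, *flags)
```

Because sets are used for membership, a movie watched several times contributes a single 1, which is the behaviour the question asks for.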
