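I am learning to build a user-user collaborative recommendation system in which I read implicit data from a MySQL database using Python (MySQL Connector). With the purchase data, I am trying to create a user*item rating matrix by pivoting the rows (700,000 rows) into columns with pandas. Running the pivot on the entire DataFrame raises the following error:

"ValueError: Unstacked DataFrame is too big, causing int32 overflow"

import mysql.connector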
import pandas as pd
import numpy as np
from mysql.connector import errorcode
def readData():
    mySQLConnection = None  # stays None if connect() itself fails
    try:
        mySQLConnection = mysql.connector.connect(host='localhost',
                                                  database='testdb',
                                                  user='user',
                                                  password='pwd')
        cursor = mySQLConnection.cursor(prepared=True)
        sql_select_query = """"""  # Removed the select query
        cursor.execute(sql_select_query)
        record = cursor.fetchall()
        return record
    except mysql.connector.Error as error:
        print("Failed to get record from database: {}".format(error))
    finally:
        # closing database connection.
        if mySQLConnection is not None and mySQLConnection.is_connected():
            cursor.close()
            mySQLConnection.close()
            print("connection is closed")

data = readData()
df = pd.DataFrame(data,columns=['user_id','product_id','purchase_count'])
data_pivot = pd.pivot_table(df, index='user_id', columns='product_id', values='purchase_count')
#print(data_pivot.to_string())

Python version: 3.6
OS: Windows 7
RAM: 16 GB
pandas version: 0.24.2
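
For context, the error most likely arises because the pivot goes through a dense users x products grid, and the number of cells in that grid exceeds what an int32 can hold. One possible way around it, shown here only as a minimal sketch, is to keep the matrix sparse so that only observed (user, product) pairs are stored. The sketch assumes scipy is installed (it is not part of the setup listed above), `build_sparse_user_item` is a hypothetical helper name, and the sample rows are made up to stand in for the output of `readData()`. Note that duplicate (user, product) pairs are summed here, whereas `pivot_table` averages them by default.

```python
import pandas as pd
from scipy.sparse import csr_matrix


def build_sparse_user_item(df):
    """Build a sparse user x item matrix from long-format purchase rows.

    Hypothetical helper: expects the columns user_id, product_id and
    purchase_count, as in the DataFrame built from readData().
    """
    # Categorical codes map each distinct id to a consecutive integer position.
    users = df['user_id'].astype('category')
    items = df['product_id'].astype('category')
    # Only observed pairs are stored; duplicate pairs are summed on construction.
    matrix = csr_matrix(
        (df['purchase_count'].astype('float64').values,
         (users.cat.codes.values, items.cat.codes.values)),
        shape=(len(users.cat.categories), len(items.cat.categories)),
    )
    return matrix, users.cat.categories, items.cat.categories


# Tiny made-up sample standing in for the rows returned by readData().
sample = pd.DataFrame(
    [(1, 'a', 2), (1, 'b', 1), (2, 'a', 5), (3, 'c', 4)],
    columns=['user_id', 'product_id', 'purchase_count'],
)
matrix, user_index, item_index = build_sparse_user_item(sample)
```

Row `i` of `matrix` corresponds to `user_index[i]` and column `j` to `item_index[j]`, so the original ids can be recovered after any similarity computation on the sparse matrix.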