Python for BG/NBD
In the fourth part of this series, we implement the model in Python. The input data is the open Online Retail Dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Data cleaning
data = pd.read_excel('Online Retail.xlsx')
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
InvoiceNo 541909 non-null object
StockCode 541909 non-null object
Description 540455 non-null object
Quantity 541909 non-null int64
InvoiceDate 541909 non-null datetime64[ns]
UnitPrice 541909 non-null float64
CustomerID 406829 non-null float64
Country 541909 non-null object
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 33.1+ MB
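The non-null counts above show that Description and CustomerID have gaps. As a minimal sketch, the same check can be made explicit with `isna().sum()`. The `toy` frame below is a made-up stand-in that borrows three column names from the real data, so the snippet runs without the Excel file; the two subtractions use only the figures printed by `info()`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data (values are invented for illustration only)
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367", "536368"],
    "Description": ["WHITE METAL LANTERN", None, "HAND WARMER", "DOORMAT"],
    "CustomerID": [17850.0, np.nan, np.nan, 13047.0],
})

# Count missing values per column
missing_counts = toy.isna().sum()
print(missing_counts)

# The same quantity for the real data, derived from the info() output:
# total rows minus the non-null count of each column
total_rows = 541909
missing_customer_id = total_rows - 406829   # 135080 rows without a CustomerID
missing_description = total_rows - 540455   # 1454 rows without a Description
```

On the real frame, `data.isna().sum()` should report the same two figures.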
data.InvoiceDate.min()
Timestamp('2010-12-01 08:26:00')
last_day = data.InvoiceDate.max()
last_day
Timestamp('2011-12-09 12:50:00')
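The two timestamps above bound the observation window. A small sketch of how its length could be computed follows, using the printed values directly so it runs without the dataset. The name `first_day` is introduced here for illustration, and treating this span as the observation period for the model is an assumption rather than something the text states.

```python
import pandas as pd

# First and last invoice timestamps, copied from the outputs above
first_day = pd.Timestamp("2010-12-01 08:26:00")
last_day = pd.Timestamp("2011-12-09 12:50:00")

# Subtracting two Timestamps yields a Timedelta
observation_span = last_day - first_day
print(observation_span.days)
```

On the real frame, the equivalent would be `data.InvoiceDate.max() - data.InvoiceDate.min()`.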
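# Drop every row with a missing value in any column (mostly rows without a CustomerID)
data.dropna(inplace=True)
# Make sure CustomerID is numeric (it is already float64 here, so this changes nothing)
data[["CustomerID"]] = data[["CustomerID"]].apply(pd.to_numeric)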
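To make the effect of the cleaning step concrete, here is a hedged sketch on a made-up `toy` frame. It contrasts the plain `dropna()` used above with a `subset=` variant that only requires CustomerID to be present, and then casts the IDs to integers. The `subset=` variant and the integer cast are alternatives shown for comparison, not what the original code does.

```python
import numpy as np
import pandas as pd

# Invented rows: one complete, one without Description, one without CustomerID
toy = pd.DataFrame({
    "Description": ["LANTERN", None, "HAND WARMER"],
    "CustomerID": [17850.0, 13047.0, np.nan],
})

# dropna() with no arguments removes a row if ANY column is missing
cleaned_any = toy.dropna()

# Restricting the check to CustomerID keeps the row whose Description is missing
cleaned_subset = toy.dropna(subset=["CustomerID"])

# With no NaN left in the column, the float IDs can be cast to integers
cleaned_subset = cleaned_subset.astype({"CustomerID": "int64"})

print(len(cleaned_any), len(cleaned_subset))
```

Which variant is preferable depends on whether rows lacking only a Description should still count as purchases; the original drops them.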
data.set_index('CustomerID',