1 Data Processing
1.1 Data Preparation
import pandas as pd
from random import choice  # used for missing-value imputation
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler  # feature scaling
from sklearn.cluster import KMeans  # clustering
from sklearn.manifold import TSNE  # t-SNE dimensionality reduction
from sklearn.metrics import silhouette_score
from collections import Counter  # word-frequency counting
from wordcloud import WordCloud  # word-cloud rendering
import snownlp  # sentiment analysis
import statsmodels.api as sm  # regression and time-series models
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese characters in plots
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly with this font
import jieba
# mlxtend is used for association-rule mining
from mlxtend.frequent_patterns import apriori, association_rules
# Read the data with pandas
neo9 = pd.read_excel("IQOONeo9.xlsx")
ace3 = pd.read_excel("一加Ace3.xlsx")
redmik70 = pd.read_excel("RedmiK70.xlsx")
1.2 Data Cleaning
# Add a "评论长度" (review length) column for data cleaning
neo9["评论长度"] = [len(_) for _ in neo9["评价内容"]]
ace3["评论长度"] = [len(_) for _ in ace3["评价内容"]]
redmik70["评论长度"] = [len(_) for _ in redmik70["评价内容"]]
# Count the reviews shorter than 10 characters, which would be candidates for removal
print("neo9短评论数量为:"+str(len(neo9[neo9["评论长度"]<10])))
print("ace3短评论数量为:"+str(len(ace3[ace3["评论长度"]<10])))
print("redmik70短评论数量为:"+str(len(redmik70[redmik70["评论长度"]<10])))
# All three counts are 0, so no rows are removed
neo9.head()
neo9短评论数量为:0
ace3短评论数量为:0
redmik70短评论数量为:0
| | 是否会员 | 评价内容 | 颜色 | 存储空间 | 点赞数 | 评论数 | 日期 | 地区 | 评论长度 |
---|---|---|---|---|---|---|---|---|---|
0 | PLUS会员 | iQOO Neo9 设计时尚大气,金属质感边框搭配深邃的黑色背板,尽显高端品质。这款手机搭载... | 格斗黑 | 16GB+256GB | 105 | 18.0 | 2024-09-16 | 河南 | 217 |
1 | PLUS会员 | 高颜值,高品质,一看就很上档次,非常喜欢!vivoiQOONeo9性能卓越,运行流畅,拍照效... | 格斗黑 | 16GB+256GB | 55 | 2.0 | 2024-09-08 | 安徽 | 77 |
2 | PLUS会员 | 手机太棒了,两千出头的价位搭配骁龙8gen2的处理器,玩游戏的体验很好,游戏不卡顿的同时手机... | 格斗黑 | 12GB+256GB | 38 | 2.0 | 2024-09-15 | 广东 | 82 |
3 | PLUS会员 | 高颜值,高品质,一分钱一分货,材质外观和质量一看就很上档次,非常喜欢!vivoiQOONeo... | 格斗黑 | 12GB+256GB | 22 | 1.0 | 2024-09-07 | 安徽 | 93 |
4 | PLUS会员 | 外观不错,大小也很适合我\n外形外观:好\n屏幕音效:很舒服\n拍照效果:很不错\n运行速度... | 航海蓝 | 12GB+256GB | 5 | 0.0 | 2024-10-23 | 浙江 | 76 |
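The length filter in this section can also be written with pandas string methods. The following is a minimal sketch on a small made-up frame (the three sample reviews are illustrative and not taken from the dataset; only the column names come from it). `.str.len()` returns NaN for a missing review instead of raising, so such rows fail the comparison as well.

```python
import pandas as pd

# Hypothetical sample; only the column names come from the dataset.
reviews = pd.DataFrame(
    {"评价内容": ["很好", "手机运行流畅,拍照效果也不错,值得购买", None]}
)

# Vectorized length; a missing review yields NaN rather than a TypeError.
reviews["评论长度"] = reviews["评价内容"].str.len()

# Count short reviews and keep those with at least 10 characters.
# Comparisons against NaN evaluate to False, so missing reviews are not kept.
short_count = int((reviews["评论长度"] < 10).sum())
kept = reviews[reviews["评论长度"] >= 10]
print(short_count, len(kept))
```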
1.3 Missing-Value Imputation
# Show the number of missing values per column
print('{:*^60}'.format('neo9的数据缺失情况如下'))
print(neo9.isnull().sum())
print('{:*^60}'.format('ace3的数据缺失情况如下'))
print(ace3.isnull().sum())
print('{:*^60}'.format('redmik70的数据缺失情况如下'))
print(redmik70.isnull().sum())
***********************neo9的数据缺失情况如下************************
是否会员 201
评价内容 0
颜色 0
存储空间 0
点赞数 0
评论数 29
日期 29
地区 29
评论长度 0
dtype: int64
***********************ace3的数据缺失情况如下************************
是否会员 273
评价内容 0
颜色 0
存储空间 0
日期 0
地区 0
点赞数 0
评论数 0
评论长度 0
dtype: int64
*********************redmik70的数据缺失情况如下**********************
是否会员 131
评价内容 0
颜色 60
存储空间 60
点赞数 0
评论数 0
日期 64
地区 64
评论长度 0
dtype: int64
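The output shows that neo9 has missing values in 评论数, 日期, and 地区 (29 each), and that redmik70 has missing values in 颜色 and 存储空间 (60 each) and in 日期 and 地区 (64 each). ace3 has no missing values in these columns. All three datasets also have missing values in 是否会员 (201, 273, and 131 respectively); these are handled in section 1.4, where every value other than "PLUS会员" is encoded as 0.
颜色, 存储空间, 日期, and 地区 are treated as categorical attributes and are filled with a value drawn at random from the non-missing values of the same column.
评论数 is a continuous attribute and is filled with the column mean.
Note: missing values are only filled from data of the same brand.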
# Fill function for categorical attributes
def fillna_fenlei(data):
    # Return one value drawn at random from the distinct non-missing values of the column.
    # Note: this single value is then used for every missing cell of that column.
    return choice(list(data.dropna().unique()))
# neo9: fill the categorical columns with a random observed value and 评论数 with its mean
neo9["日期"] = neo9["日期"].fillna(fillna_fenlei(neo9["日期"]))
neo9["地区"] = neo9["地区"].fillna(fillna_fenlei(neo9["地区"]))
neo9["评论数"] = neo9["评论数"].fillna(neo9["评论数"].mean())
print('{:*^60}'.format('已完成对日期、地区、评论数填充,现neo9的数据缺失情况如下'))
print(neo9.isnull().sum())
# redmik70: same treatment, plus 存储空间 and 颜色
redmik70["日期"] = redmik70["日期"].fillna(fillna_fenlei(redmik70["日期"]))
redmik70["地区"] = redmik70["地区"].fillna(fillna_fenlei(redmik70["地区"]))
redmik70["评论数"] = redmik70["评论数"].fillna(redmik70["评论数"].mean())
redmik70["存储空间"] = redmik70["存储空间"].fillna(fillna_fenlei(redmik70["存储空间"]))
redmik70["颜色"] = redmik70["颜色"].fillna(fillna_fenlei(redmik70["颜色"]))
print('{:*^60}'.format('已完成对日期、地区、评论数、存储空间和颜色填充,现redmik70的数据缺失情况如下'))
print(redmik70.isnull().sum())
***************已完成对日期、地区、评论数填充,现neo9的数据缺失情况如下***************
是否会员 201
评价内容 0
颜色 0
存储空间 0
点赞数 0
评论数 0
日期 0
地区 0
评论长度 0
dtype: int64
*********已完成对日期、地区、评论数、存储空间和颜色填充,现redmik70的数据缺失情况如下*********
是否会员 131
评价内容 0
颜色 0
存储空间 0
点赞数 0
评论数 0
日期 0
地区 0
评论长度 0
dtype: int64
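The fill function in this section draws one value per column, so every missing cell of a column receives the same value. If an independent draw per missing cell is preferred, one possible sketch is shown below. The helper name `fill_categorical_per_row`, the demo frame, and the seed are assumptions made for illustration; the rule that values are only drawn from the same brand follows the note above.

```python
import numpy as np
import pandas as pd

def fill_categorical_per_row(series, rng):
    """Fill each missing entry with its own random draw from the observed values."""
    observed = series.dropna().to_numpy()
    filled = series.copy()
    mask = filled.isna().to_numpy()
    # Draws follow the observed frequencies because duplicates are kept.
    filled.loc[mask] = rng.choice(observed, size=int(mask.sum()))
    return filled

rng = np.random.default_rng(0)  # fixed seed so the demo is repeatable

# Hypothetical mini-dataset with two brands.
demo = pd.DataFrame({
    "品牌": ["neo9"] * 4 + ["redmik70"] * 4,
    "地区": ["河南", None, "广东", None, "浙江", "安徽", None, None],
    "评论数": [18.0, 2.0, None, 0.0, 1.0, None, 3.0, 0.0],
})

# Impute brand by brand so that no value leaks across brands.
for brand, idx in demo.groupby("品牌").groups.items():
    demo.loc[idx, "地区"] = fill_categorical_per_row(demo.loc[idx, "地区"], rng)
    brand_mean = demo.loc[idx, "评论数"].mean()
    demo.loc[idx, "评论数"] = demo.loc[idx, "评论数"].fillna(brand_mean)

print(demo)
```

Drawing from the observed values with their duplicates keeps the filled column close to the original distribution, whereas drawing from `unique()` gives rare and common categories the same chance.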
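1.4 Data Merging
# Label each dataset with its brand
neo9["品牌"]="neo9"
ace3["品牌"]="ace3"
redmik70["品牌"]="redmik70"
# Concatenate row-wise; each frame keeps its own 0-999 index
data= pd.concat([neo9,ace3,redmik70])
# Encode "是否会员": 1 for "PLUS会员", 0 otherwise (missing values become 0)
data["是否会员"] = [1 if(i=="PLUS会员") else 0 for i in data["是否会员"]]
# Split "存储空间" (e.g. "16GB+256GB") into RAM and ROM by dropping the two-character unit.
# Note: the unit itself is ignored, so a size given in TB would be read as its bare number.
data["RAM"] = [int(_.split("+")[0][:-2]) for _ in data["存储空间"]]
data["ROM"] = [int(_.split("+")[1][:-2]) for _ in data["存储空间"]]
data.head()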
| | 是否会员 | 评价内容 | 颜色 | 存储空间 | 点赞数 | 评论数 | 日期 | 地区 | 评论长度 | 品牌 | RAM | ROM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | iQOO Neo9 设计时尚大气,金属质感边框搭配深邃的黑色背板,尽显高端品质。这款手机搭载... | 格斗黑 | 16GB+256GB | 105 | 18.0 | 2024-09-16 | 河南 | 217 | neo9 | 16 | 256 |
1 | 1 | 高颜值,高品质,一看就很上档次,非常喜欢!vivoiQOONeo9性能卓越,运行流畅,拍照效... | 格斗黑 | 16GB+256GB | 55 | 2.0 | 2024-09-08 | 安徽 | 77 | neo9 | 16 | 256 |
2 | 1 | 手机太棒了,两千出头的价位搭配骁龙8gen2的处理器,玩游戏的体验很好,游戏不卡顿的同时手机... | 格斗黑 | 12GB+256GB | 38 | 2.0 | 2024-09-15 | 广东 | 82 | neo9 | 12 | 256 |
3 | 1 | 高颜值,高品质,一分钱一分货,材质外观和质量一看就很上档次,非常喜欢!vivoiQOONeo... | 格斗黑 | 12GB+256GB | 22 | 1.0 | 2024-09-07 | 安徽 | 93 | neo9 | 12 | 256 |
4 | 1 | 外观不错,大小也很适合我\n外形外观:好\n屏幕音效:很舒服\n拍照效果:很不错\n运行速度... | 航海蓝 | 12GB+256GB | 5 | 0.0 | 2024-10-23 | 浙江 | 76 | neo9 | 12 | 256 |
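The RAM and ROM columns are obtained by stripping the last two characters of each part of 存储空间, which discards the unit. A unit-aware variant could look like the sketch below. The function name `parse_storage`, the example string "16GB+1TB", and the convention 1 TB = 1024 GB are assumptions for illustration and are not taken from the notebook.

```python
import re

# Assumed conversion table; 1 TB is taken as 1024 GB here.
_UNIT_TO_GB = {"GB": 1, "TB": 1024}

def parse_storage(spec):
    """Split a string such as '16GB+256GB' into (ram_gb, rom_gb)."""
    parts = spec.split("+")
    if len(parts) != 2:
        raise ValueError(f"expected 'RAM+ROM', got {spec!r}")
    sizes = []
    for part in parts:
        match = re.fullmatch(r"\s*(\d+)\s*(GB|TB)\s*", part, flags=re.IGNORECASE)
        if match is None:
            raise ValueError(f"cannot parse storage size {part!r}")
        # Convert every size to gigabytes so that the values are comparable.
        sizes.append(int(match.group(1)) * _UNIT_TO_GB[match.group(2).upper()])
    return tuple(sizes)

print(parse_storage("16GB+256GB"))
print(parse_storage("16GB+1TB"))
```

Raising on an unparseable value makes unexpected formats visible instead of silently producing a wrong number.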
1.5 Word Segmentation
# Load the stop-word list
with open('停用词.txt', 'r', encoding='utf-8') as f:
    stopwords = set(line.strip() for line in f)
# Stop-word filter: takes the token list of one review and returns the tokens that are not stop words
def stopwords_process(data):
    data_filtered = []
    for i in data:
        if i not in stopwords:
            data_filtered.append(i)
    return data_filtered
# Add a "分词" column holding the filtered jieba tokens of each "评价内容"
data["分词"] = [stopwords_process(list(jieba.cut(x))) for x in list(data['评价内容'])]
data.head()
| | 是否会员 | 评价内容 | 颜色 | 存储空间 | 点赞数 | 评论数 | 日期 | 地区 | 评论长度 | 品牌 | RAM | ROM | 分词 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | iQOO Neo9 设计时尚大气,金属质感边框搭配深邃的黑色背板,尽显高端品质。这款手机搭载... | 格斗黑 | 16GB+256GB | 105 | 18.0 | 2024-09-16 | 河南 | 217 | neo9 | 16 | 256 | [iQOO, , Neo9, , 设计, 时尚, 大气, 金属, 质感, 边框, 搭配,... |
1 | 1 | 高颜值,高品质,一看就很上档次,非常喜欢!vivoiQOONeo9性能卓越,运行流畅,拍照效... | 格斗黑 | 16GB+256GB | 55 | 2.0 | 2024-09-08 | 安徽 | 77 | neo9 | 16 | 256 | [高颜值, 高品质, 一看, 上档次, 喜欢, vivoiQOONeo9, 性能, 卓越, ... |
2 | 1 | 手机太棒了,两千出头的价位搭配骁龙8gen2的处理器,玩游戏的体验很好,游戏不卡顿的同时手机... | 格斗黑 | 12GB+256GB | 38 | 2.0 | 2024-09-15 | 广东 | 82 | neo9 | 12 | 256 | [手机, 太棒了, 两千, 出头, 价位, 搭配, 骁龙, 8gen2, 处理器, 玩游戏,... |
3 | 1 | 高颜值,高品质,一分钱一分货,材质外观和质量一看就很上档次,非常喜欢!vivoiQOONeo... | 格斗黑 | 12GB+256GB | 22 | 1.0 | 2024-09-07 | 安徽 | 93 | neo9 | 12 | 256 | [高颜值, 高品质, 一分钱, 一分货, 材质, 外观, 质量, 一看, 上档次, 喜欢, ... |
4 | 1 | 外观不错,大小也很适合我\n外形外观:好\n屏幕音效:很舒服\n拍照效果:很不错\n运行速度... | 航海蓝 | 12GB+256GB | 5 | 0.0 | 2024-10-23 | 浙江 | 76 | neo9 | 12 | 256 | [外观, 不错, 大小, 适合, \n, 外形, 外观, \n, 屏幕, 音效, 舒服, \... |
data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 3000 entries, 0 to 999
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 是否会员 3000 non-null int64
1 评价内容 3000 non-null object
2 颜色 3000 non-null object
3 存储空间 3000 non-null object
4 点赞数 3000 non-null int64
5 评论数 3000 non-null float64
6 日期 3000 non-null object
7 地区 3000 non-null object
8 评论长度 3000 non-null int64
9 品牌 3000 non-null object
10 RAM 3000 non-null int64
11 ROM 3000 non-null int64
12 分词 3000 non-null object
dtypes: float64(1), int64(5), object(7)
memory usage: 328.1+ KB
data.describe()
| | 是否会员 | 点赞数 | 评论数 | 评论长度 | RAM | ROM |
---|---|---|---|---|---|---|
count | 3000.000000 | 3000.000000 | 3000.000000 | 3000.000000 | 3000.000000 | 3000.000000 |
mean | 0.798333 | 0.888333 | 0.508779 | 94.605333 | 13.337333 | 307.234000 |
std | 0.401311 | 6.663082 | 2.133196 | 57.980113 | 1.887342 | 122.173347 |
min | 0.000000 | 0.000000 | 0.000000 | 10.000000 | 12.000000 | 1.000000 |
25% | 1.000000 | 0.000000 | 0.000000 | 63.000000 | 12.000000 | 256.000000 |
50% | 1.000000 | 0.000000 | 0.000000 | 77.000000 | 12.000000 | 256.000000 |
75% | 1.000000 | 0.000000 | 1.000000 | 104.000000 | 16.000000 | 256.000000 |
max | 1.000000 | 249.000000 | 106.000000 | 533.000000 | 16.000000 | 512.000000 |
2 Phone Sales Analysis
2.1 Sales by Region
# Draw the distribution plot
# dpi sets the resolution, figsize the canvas size
plt.figure(dpi=300,figsize=(24,8))
# fontsize sets the tick-label size
plt.xticks(fontsize=10)
plt.title("地区分布图")
sns.countplot(x=data['地区'], hue=data['品牌'])
<Axes: title={'center': '地区分布图'}, xlabel='地区', ylabel='count'>
# Region counts overall and per brand
data_area = data.groupby(["地区"])["地区"].count()
data_area_neo9 = data[data["品牌"]=="neo9"].groupby(["地区"])["地区"].count()
data_area_ace = data[data["品牌"]=="ace3"].groupby(["地区"])["地区"].count()
data_area_redmik70 = data[data["品牌"]=="redmik70"].groupby(["地区"])["地区"].count()
# Set the resolution, width, and height of the figure
plt.figure(dpi=100,figsize=(18,18))
# Panel 1: overall region distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_area,labels=data_area.index,autopct='%3.1f%%')
# Panel 2: neo9 region distribution
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_area_neo9,labels=data_area_neo9.index,autopct='%3.1f%%')
# Panel 3: ace3 region distribution
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_area_ace,labels=data_area_ace.index,autopct='%3.1f%%')
# Panel 4: redmik70 region distribution
plt.subplot(2,2,4)
plt.title("redmik70分布")
plt.pie(data_area_redmik70,labels=data_area_redmik70.index,autopct='%3.1f%%')
plt.show()
2.2 Phone Color Analysis
# Color counts overall and per brand
data_color = data.groupby(["颜色"])["颜色"].count()
data_color_neo9 = data[data["品牌"]=="neo9"].groupby(["颜色"])["颜色"].count()
data_color_ace = data[data["品牌"]=="ace3"].groupby(["颜色"])["颜色"].count()
data_color_redmik70 = data[data["品牌"]=="redmik70"].groupby(["颜色"])["颜色"].count()
# Set the resolution, width, and height of the figure
plt.figure(dpi=100,figsize=(18,18))
# Panel 1: overall color distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_color,labels=data_color.index,autopct='%3.1f%%')
# Panel 2: neo9 color distribution
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_color_neo9,labels=data_color_neo9.index,autopct='%3.1f%%')
# Panel 3: ace3 color distribution
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_color_ace,labels=data_color_ace.index,autopct='%3.1f%%')
# Panel 4: redmik70 color distribution
plt.subplot(2,2,4)
plt.title("redmik70分布")
plt.pie(data_color_redmik70,labels=data_color_redmik70.index,autopct='%3.1f%%')
plt.show()
2.3 Storage Configuration Analysis
# Storage-configuration counts overall and per brand
data_rom = data.groupby(["存储空间"])["存储空间"].count()
data_rom_neo9 = data[data["品牌"]=="neo9"].groupby(["存储空间"])["存储空间"].count()
data_rom_ace = data[data["品牌"]=="ace3"].groupby(["存储空间"])["存储空间"].count()
data_rom_redmik70 = data[data["品牌"]=="redmik70"].groupby(["存储空间"])["存储空间"].count()
# Set the resolution, width, and height of the figure
plt.figure(dpi=100,figsize=(18,18))
# Panel 1: overall storage-configuration distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_rom,labels=data_rom.index,autopct='%3.1f%%')
# Panel 2: neo9 storage-configuration distribution
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_rom_neo9,labels=data_rom_neo9.index,autopct='%3.1f%%')
# Panel 3: ace3 storage-configuration distribution
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_rom_ace,labels=data_rom_ace.index,autopct='%3.1f%%')
# Panel 4: redmik70 storage-configuration distribution
plt.subplot(2,2,4)
plt.title("redmik70分布")
plt.pie(data_rom_redmik70,labels=data_rom_redmik70.index,autopct='%3.1f%%')
plt.show()
# Draw the distribution plot
# dpi sets the resolution, figsize the canvas size
plt.figure(dpi=300,figsize=(24,8))
# fontsize sets the tick-label size
plt.xticks(fontsize=10)
plt.title("存储空间分布图")
sns.countplot(x=data['存储空间'], hue=data['品牌'])
plt.show()
2.4 Correlation Analysis
1. First, compute the pairwise correlations between the variables
# Correlation analysis
print('{:*^60}'.format('相关性分析'))
print(data[["点赞数","评论数","评论长度","RAM","ROM"]].corr().round(2).T)  # print the correlation matrix
sns.heatmap(data[["点赞数","评论数","评论长度","RAM","ROM"]].corr().round(2),cmap="Reds",annot=True)
plt.show()
***************************相关性分析****************************
点赞数 评论数 评论长度 RAM ROM
点赞数 1.00 0.80 0.01 0.04 0.01
评论数 0.80 1.00 -0.01 0.05 0.03
评论长度 0.01 -0.01 1.00 0.01 -0.02
RAM 0.04 0.05 0.01 1.00 0.58
ROM 0.01 0.03 -0.02 0.58 1.00
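For reference, each entry of the matrix above is a Pearson correlation coefficient (the default of `DataFrame.corr`). The sketch below spells out the formula on the 点赞数 and 评论数 values of the five neo9 rows shown by `data.head()`; with only five rows the result is purely illustrative and is not expected to match the 0.80 computed on the full dataset. The helper name `pearson_r` is an assumption.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre both variables
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# 点赞数 and 评论数 of the first five rows printed by data.head()
likes = [105, 55, 38, 22, 5]
replies = [18, 2, 2, 1, 0]

r = pearson_r(likes, replies)
print(round(r, 4))
```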
2. K-means clustering
# Work on a copy so that the assignments below do not touch a slice of data
data_onehot_part = data[["是否会员","颜色","RAM","ROM","点赞数","评论数","评论长度","地区","品牌"]].copy()
scaler = MinMaxScaler()
# Min-max scale the numeric columns to the range [0, 1]
data_onehot_part[["RAM","ROM","点赞数","评论数","评论长度"]] = scaler.fit_transform(data_onehot_part[["RAM","ROM","点赞数","评论数","评论长度"]])
# One-hot encode the categorical columns (颜色, 地区, 品牌)
data_onehot_part = pd.get_dummies(data_onehot_part)
data_onehot_part.head()
| | 是否会员 | RAM | ROM | 点赞数 | 评论数 | 评论长度 | 颜色_墨羽 | 颜色_星曜白 | 颜色_星辰黑 | 颜色_晴雪 | ... | 地区_西藏 | 地区_贵州 | 地区_辽宁 | 地区_重庆 | 地区_陕西 | 地区_青海 | 地区_黑龙江 | 品牌_ace3 | 品牌_neo9 | 品牌_redmik70 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1.0 | 0.499022 | 0.421687 | 0.169811 | 0.395793 | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
1 | 1 | 1.0 | 0.499022 | 0.220884 | 0.018868 | 0.128107 | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
2 | 1 | 0.0 | 0.499022 | 0.152610 | 0.018868 | 0.137667 | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
3 | 1 | 0.0 | 0.499022 | 0.088353 | 0.009434 | 0.158700 | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
4 | 1 | 0.0 | 0.499022 | 0.020080 | 0.000000 | 0.126195 | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
5 rows × 53 columns
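The scaled values in the table above follow the min-max formula (x - min) / (max - min). As a worked check, the sketch below recomputes row 0 from the raw values shown by `data.head()` and the column minima and maxima shown by `data.describe()`. The helper name `min_max_scale` is an assumption; `MinMaxScaler` applies the same formula column by column.

```python
def min_max_scale(value, lo, hi):
    """Map value from the range [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

# Raw values of row 0 (from data.head()) with min and max (from data.describe()).
scaled_row0 = {
    "RAM": min_max_scale(16, 12, 16),
    "ROM": min_max_scale(256, 1, 512),
    "点赞数": min_max_scale(105, 0, 249),
    "评论数": min_max_scale(18, 0, 106),
    "评论长度": min_max_scale(217, 10, 533),
}

for name, value in scaled_row0.items():
    print(name, round(value, 6))
```

The ROM value of about 0.499 for a 256 GB phone is a consequence of the column minimum being 1 rather than 256.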
def kmeans_process(data):
    # Select the best KMeans model by mean silhouette coefficient
    score_list = list()  # mean silhouette coefficient for each K
    silhouette_int = -1  # best mean silhouette coefficient found so far
    for n_clusters in range(2, 8):  # try K = 2 to 7
        model_kmeans = KMeans(n_clusters=n_clusters, n_init=10)  # build the model; n_init is set explicitly
        labels_tmp = model_kmeans.fit_predict(data)  # fit the model and get the cluster labels
        silhouette_tmp = silhouette_score(data, labels_tmp)  # mean silhouette coefficient for this K
        if silhouette_tmp > silhouette_int:  # a higher score is better
            best_k = n_clusters  # keep the best K
            silhouette_int = silhouette_tmp  # keep the best score
            best_kmeans = model_kmeans  # keep the fitted model
            cluster_labels_k = labels_tmp  # keep the cluster labels
        score_list.append([n_clusters, silhouette_tmp])  # record K and its score
    print('{:*^60}'.format('K值对应的轮廓系数:'))
    print(np.array(score_list))  # print the score for every K
    print('最优的K值是:{0} \n对应的轮廓系数是:{1}'.format(best_k, silhouette_int))
    return cluster_labels_k
cluster_labels_k = kmeans_process(data_onehot_part)
*************************K值对应的轮廓系数:*************************
[[2. 0.17536367]
[3. 0.23756583]
[4. 0.22252055]
[5. 0.19257643]
[6. 0.16658535]
[7. 0.19443532]]
最优的K值是:3
对应的轮廓系数是:0.2375658338076598
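The selection above relies on the mean silhouette coefficient. For each point, s = (b - a) / max(a, b), where a is the mean distance to the other points of its own cluster and b is the smallest mean distance to the points of any other cluster. The following NumPy sketch computes it on a tiny made-up dataset. The function name `mean_silhouette` and the sample points are assumptions; the notebook itself uses `sklearn.metrics.silhouette_score`.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == k].mean()
                for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two well-separated one-dimensional clusters (illustrative data).
points = [[0.0], [1.0], [10.0], [11.0]]
score = mean_silhouette(points, [0, 0, 1, 1])
print(round(score, 4))
```

Values close to 1 indicate compact, well-separated clusters. The best score reported above (about 0.24) therefore points to clusters that overlap considerably.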
3. Association rule mining
# Select the usable variables and binarize them (on a copy of the selected columns)
data_onehot_all = data[["是否会员","颜色","存储空间","点赞数","评论数","地区","品牌"]].copy()
# 点赞数 and 评论数 become 1 if the review has any likes or replies, otherwise 0
data_onehot_all["点赞数"] = [1 if(_>0) else 0 for _ in data_onehot_all["点赞数"]]
data_onehot_all["评论数"] = [1 if(_>0) else 0 for _ in data_onehot_all["评论数"]]
data_onehot_all = pd.get_dummies(data_onehot_all)
data_onehot_all.head()
| | 是否会员 | 点赞数 | 评论数 | 颜色_墨羽 | 颜色_星曜白 | 颜色_星辰黑 | 颜色_晴雪 | 颜色_月海蓝 | 颜色_格斗黑 | 颜色_浅茄紫 | ... | 地区_西藏 | 地区_贵州 | 地区_辽宁 | 地区_重庆 | 地区_陕西 | 地区_青海 | 地区_黑龙江 | 品牌_ace3 | 品牌_neo9 | 品牌_redmik70 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | True | False |
1 | 1 | 1 | 1 | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | True | False |
2 | 1 | 1 | 1 | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | True | False |
3 | 1 | 1 | 1 | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | True | False |
4 | 1 | 1 | 0 | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
5 rows × 55 columns
# Mine the frequent itemsets (mlxtend expects boolean columns, hence astype(bool))
frequent_itemsets = apriori(data_onehot_all.astype(bool), min_support=0.2, use_colnames=True)
# Generate the association rules with lift >= 1
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
# Keep only the rules whose lift is strictly greater than 1
high_lift_rules = rules[rules['lift'] > 1]
print(high_lift_rules)
| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
---|---|---|---|---|---|---|---|---|---|---|
0 | (存储空间_12GB+256GB) | (是否会员) | 0.662333 | 0.798333 | 0.545000 | 0.822849 | 1.030708 | 0.016237 | 1.138385 | 0.088232 |
1 | (是否会员) | (存储空间_12GB+256GB) | 0.798333 | 0.662333 | 0.545000 | 0.682672 | 1.030708 | 0.016237 | 1.064094 | 0.147734 |
2 | (品牌_neo9) | (是否会员) | 0.333333 | 0.798333 | 0.266333 | 0.799000 | 1.000835 | 0.000222 | 1.003317 | 0.001252 |
3 | (是否会员) | (品牌_neo9) | 0.798333 | 0.333333 | 0.266333 | 0.333612 | 1.000835 | 0.000222 | 1.000418 | 0.004137 |
4 | (品牌_redmik70) | (是否会员) | 0.333333 | 0.798333 | 0.289667 | 0.869000 | 1.088518 | 0.023556 | 1.539440 | 0.121979 |
5 | (是否会员) | (品牌_redmik70) | 0.798333 | 0.333333 | 0.289667 | 0.362839 | 1.088518 | 0.023556 | 1.046308 | 0.403237 |
6 | (品牌_neo9) | (评论数) | 0.333333 | 0.430000 | 0.208000 | 0.624000 | 1.451163 | 0.064667 | 1.515957 | 0.466346 |
7 | (评论数) | (品牌_neo9) | 0.430000 | 0.333333 | 0.208000 | 0.483721 | 1.451163 | 0.064667 | 1.291291 | 0.545434 |
8 | (存储空间_12GB+256GB) | (品牌_redmik70) | 0.662333 | 0.333333 | 0.298333 | 0.450428 | 1.351283 | 0.077556 | 1.213065 | 0.769880 |
9 | (品牌_redmik70) | (存储空间_12GB+256GB) | 0.333333 | 0.662333 | 0.298333 | 0.895000 | 1.351283 | 0.077556 | 3.215873 | 0.389944 |
10 | (存储空间_12GB+256GB, 品牌_redmik70) | (是否会员) | 0.298333 | 0.798333 | 0.263667 | 0.883799 | 1.107055 | 0.025497 | 1.735497 | 0.137818 |
11 | (存储空间_12GB+256GB, 是否会员) | (品牌_redmik70) | 0.545000 | 0.333333 | 0.263667 | 0.483792 | 1.451376 | 0.082000 | 1.291469 | 0.683514 |
12 | (品牌_redmik70, 是否会员) | (存储空间_12GB+256GB) | 0.289667 | 0.662333 | 0.263667 | 0.910242 | 1.374295 | 0.071811 | 3.761953 | 0.383418 |
13 | (存储空间_12GB+256GB) | (品牌_redmik70, 是否会员) | 0.662333 | 0.289667 | 0.263667 | 0.398088 | 1.374295 | 0.071811 | 1.180127 | 0.806578 |
14 | (品牌_redmik70) | (存储空间_12GB+256GB, 是否会员) | 0.333333 | 0.545000 | 0.263667 | 0.791000 | 1.451376 | 0.082000 | 2.177033 | 0.466498 |
15 | (是否会员) | (存储空间_12GB+256GB, 品牌_redmik70) | 0.798333 | 0.298333 | 0.263667 | 0.330271 | 1.107055 | 0.025497 | 1.047688 | 0.479516 |
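The rule metrics reported above all derive from three supports: that of the antecedent, that of the consequent, and that of both together. As a worked check, the sketch below recomputes confidence, lift, leverage, and conviction for rule 9, (品牌_redmik70) -> (存储空间_12GB+256GB), from the supports printed in the output. The helper name `rule_metrics` is an assumption; the formulas are the standard definitions.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Standard association-rule metrics for a rule A -> C."""
    confidence = support_ac / support_a            # P(C | A)
    lift = confidence / support_c                  # > 1 means A and C co-occur more than by chance
    leverage = support_ac - support_a * support_c  # observed minus expected joint support
    # Conviction is infinite when the rule never fails (confidence == 1).
    conviction = (1 - support_c) / (1 - confidence) if confidence < 1 else float("inf")
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}

# Supports of rule 9 as printed in the output.
m = rule_metrics(support_a=0.333333, support_c=0.662333, support_ac=0.298333)
for name, value in m.items():
    print(name, round(value, 4))
```

Read this way, rule 9 says that about 89.5% of the sampled redmik70 reviews are for the 12GB+256GB configuration, which is roughly 1.35 times its share in the whole sample.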
3 Consumer Review Analysis
3.1 Word Frequency Analysis
# Word-frequency statistics for one brand, or for all brands when group_name == "all"
def counter_words(data,group_name):
    if(group_name=="all"):
        data = data["分词"]
    else:
        data = data[data["品牌"]==group_name]["分词"]
    # Concatenate the token lists of all reviews
    sentences_sum = []
    for _ in data:
        sentences_sum.extend(_)
    word_freq = Counter(sentences_sum)
    # Drop whitespace and placeholder tokens from the ranking
    for token in (" ", "\n", "__"):
        word_freq.pop(token, None)
    # Return the 20 most frequent words and all tokens joined into one space-separated string
    return pd.DataFrame(word_freq.most_common(20))," ".join(sentences_sum)
cipin_redmik70,words_redmik70 = counter_words(data,"redmik70")
cipin_ace3,words_ace3 = counter_words(data,"ace3")
cipin_neo9,words_neo9 = counter_words(data,"neo9")
# Plot the word frequencies, one panel per brand
plt.figure(dpi=300,figsize=(15,15))
plt.subplot(3,1,1)
plt.bar(cipin_ace3[0], cipin_ace3[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('ace3词频')
plt.subplot(3,1,2)
plt.bar(cipin_neo9[0], cipin_neo9[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('neo9词频')
plt.subplot(3,1,3)
plt.bar(cipin_redmik70[0], cipin_redmik70[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('redmik70词频')
plt.show()
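The counting step of this section can be illustrated in isolation. The sketch below flattens per-review token lists, counts them with `Counter`, and removes unwanted tokens before ranking. The helper name `top_words` and the two sample token lists are assumptions for illustration.

```python
from collections import Counter
from itertools import chain

def top_words(token_lists, n=20, ignore=(" ", "\n", "__")):
    """Return the n most frequent tokens across all reviews, skipping ignored tokens."""
    counts = Counter(chain.from_iterable(token_lists))  # flatten and count in one pass
    for token in ignore:
        counts.pop(token, None)  # remove the token entirely instead of zeroing it
    return counts.most_common(n)

# Hypothetical token lists standing in for the 分词 column.
demo_tokens = [["外观", "不错", "\n", "屏幕"], ["外观", " ", "流畅"]]
top = top_words(demo_tokens, n=2)
print(top)
```

Flattening with `chain.from_iterable` avoids building a new list on every iteration, which matters once there are thousands of reviews.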
3.2 Review Word Clouds
plt.figure(dpi=200,figsize=(15,10))
plt.subplot(3,1,1)
wc_neo9 = WordCloud(
background_color='white',
width=1500,
height=1000,
font_path=r"C:\Windows\Fonts\simhei.ttf",  # raw string, so the backslashes are not treated as escapes
)
wc_neo9.generate_from_text(words_neo9)  # build the word cloud
plt.title("neo9词云展示")
plt.imshow(wc_neo9)
plt.subplot(3,1,2)
wc_ace3 = WordCloud(
background_color='white',
width=1500,
height=1000,
font_path=r"C:\Windows\Fonts\simhei.ttf",
)
wc_ace3.generate_from_text(words_ace3)  # build the word cloud
plt.title("ace3词云展示")
plt.imshow(wc_ace3)
plt.subplot(3,1,3)
wc_redmik70 = WordCloud(
background_color='white',
width=1500,
height=1000,
font_path=r"C:\Windows\Fonts\simhei.ttf",
)
wc_redmik70.generate_from_text(words_redmik70)  # build the word cloud
plt.title("redmik70词云展示")
plt.imshow(wc_redmik70)
plt.show()
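4 Consumer Review Sentiment Analysis
4.1 Consumer Review Sentiment Scoring
# SnowNLP returns a score between 0 and 1; values near 1 indicate positive sentiment
data["情感得分"] = [snownlp.SnowNLP(_).sentiments for _ in data["评价内容"]]
print(data["情感得分"])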
0 1.000000
1 1.000000
2 0.999982
3 1.000000
4 1.000000
...
995 0.999891
996 0.997800
997 1.000000
998 0.963768
999 1.000000
Name: 情感得分, Length: 3000, dtype: float64
# Summary statistics of the sentiment score per brand, printed in the order ace3, redmik70, neo9
print(data[data["品牌"]=="ace3"]["情感得分"].describe())
print(data[data["品牌"]=="redmik70"]["情感得分"].describe())
print(data[data["品牌"]=="neo9"]["情感得分"].describe())
# Histogram of the sentiment score for each brand
plt.figure(dpi=100,figsize=(8,5))
plt.subplot(1,3,1)
plt.hist(data[data["品牌"]=="ace3"]["情感得分"])
plt.title("ace3情感得分")
plt.subplot(1,3,2)
plt.hist(data[data["品牌"]=="redmik70"]["情感得分"])
plt.title("redmik70情感得分")
plt.subplot(1,3,3)
plt.hist(data[data["品牌"]=="neo9"]["情感得分"])
plt.title("neo9情感得分")
plt.show()
count 1000.000000
mean 0.958957
std 0.159753
min 0.000039
25% 0.999313
50% 0.999996
75% 1.000000
max 1.000000
Name: 情感得分, dtype: float64
count 1000.000000
mean 0.976659
std 0.115732
min 0.005078
25% 0.999846
50% 0.999999
75% 1.000000
max 1.000000
Name: 情感得分, dtype: float64
count 1000.000000
mean 0.964201
std 0.150796
min 0.006325
25% 0.999737
50% 0.999998
75% 1.000000
max 1.000000
Name: 情感得分, dtype: float64
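4.2 Factors Influencing Review Sentiment
An ordinary least squares model is fitted with the sentiment score of each review as the dependent variable and 是否会员, RAM, ROM, 点赞数, 评论数, and 评论长度 (the last five min-max scaled) as the independent variables. Brand, color, and region are not included in this fit, and the model has no intercept term, so the R² it reports is the uncentered one.
data_onehot_part[["是否会员","RAM","ROM","点赞数","评论数","评论长度"]]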
| | 是否会员 | RAM | ROM | 点赞数 | 评论数 | 评论长度 |
---|---|---|---|---|---|---|
0 | 1 | 1.0 | 0.499022 | 0.421687 | 0.169811 | 0.395793 |
1 | 1 | 1.0 | 0.499022 | 0.220884 | 0.018868 | 0.128107 |
2 | 1 | 0.0 | 0.499022 | 0.152610 | 0.018868 | 0.137667 |
3 | 1 | 0.0 | 0.499022 | 0.088353 | 0.009434 | 0.158700 |
4 | 1 | 0.0 | 0.499022 | 0.020080 | 0.000000 | 0.126195 |
... | ... | ... | ... | ... | ... | ... |
995 | 1 | 0.0 | 0.499022 | 0.000000 | 0.000000 | 0.137667 |
996 | 1 | 0.0 | 0.499022 | 0.000000 | 0.000000 | 0.122371 |
997 | 1 | 0.0 | 0.499022 | 0.000000 | 0.000000 | 0.166348 |
998 | 1 | 0.0 | 0.499022 | 0.000000 | 0.009434 | 0.116635 |
999 | 1 | 0.0 | 0.499022 | 0.000000 | 0.000000 | 0.208413 |
3000 rows × 6 columns
model = sm.OLS(data["情感得分"].astype(float), data_onehot_part[["是否会员","RAM","ROM","点赞数","评论数","评论长度"]].astype(float))  # build the model (no constant term is added)
result = model.fit()  # fit the model
result.summary()  # show the model summary
Dep. Variable: | 情感得分 | R-squared (uncentered): | 0.919 |
---|---|---|---|
Model: | OLS | Adj. R-squared (uncentered): | 0.919 |
Method: | Least Squares | F-statistic: | 5662. |
Date: | Sun, 12 Jan 2025 | Prob (F-statistic): | 0.00 |
Time: | 15:50:27 | Log-Likelihood: | -417.45 |
No. Observations: | 3000 | AIC: | 846.9 |
Df Residuals: | 2994 | BIC: | 882.9 |
Df Model: | 6 | | |
Covariance Type: | nonrobust | | |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
是否会员 | 0.3663 | 0.011 | 34.490 | 0.000 | 0.345 | 0.387 |
RAM | -0.1436 | 0.013 | -11.130 | 0.000 | -0.169 | -0.118 |
ROM | 0.8360 | 0.019 | 44.439 | 0.000 | 0.799 | 0.873 |
点赞数 | -0.1597 | 0.318 | -0.503 | 0.615 | -0.783 | 0.463 |
评论数 | 0.8796 | 0.422 | 2.083 | 0.037 | 0.052 | 1.707 |
评论长度 | 0.9640 | 0.042 | 22.818 | 0.000 | 0.881 | 1.047 |
Omnibus: | 213.267 | Durbin-Watson: | 1.747 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 802.558 |
Skew: | -0.266 | Prob(JB): | 5.33e-175 |
Kurtosis: | 5.477 | Cond. No. | 112. |
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
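Note [1] of the summary matters when reading the R² of 0.919. Without a constant, R² is measured against zero rather than against the mean of the dependent variable, so it can be high even when the regressors explain little of the variation. The NumPy sketch below illustrates this on synthetic data in which y is unrelated to x. All values, names, and the seed are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.5, 1.5, size=n)         # regressor with a non-zero mean
y = 0.95 + rng.normal(0.0, 0.1, size=n)   # scores clustered near 0.95, unrelated to x

# Fit without an intercept: R² is computed against sum(y²) (uncentered).
X0 = x[:, None]
beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
ssr0 = np.sum((y - X0 @ beta0) ** 2)
r2_uncentered = 1 - ssr0 / np.sum(y ** 2)

# Fit with an intercept: R² is computed against sum((y - mean(y))²) (centered).
X1 = np.column_stack([np.ones(n), x])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
ssr1 = np.sum((y - X1 @ beta1) ** 2)
r2_centered = 1 - ssr1 / np.sum((y - y.mean()) ** 2)

print(round(r2_uncentered, 3), round(r2_centered, 3))
```

In statsmodels an intercept can be included with `sm.add_constant` on the regressor matrix, after which the summary reports the centered R².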
5 Phone Sales Forecasting
5.1 Sales Overview
# Index the data by date and count the reviews per day for each brand
data_sale = data.set_index("日期")
data_sale_ace3 = data_sale[data_sale["品牌"]=="ace3"].groupby(["日期"]).count()
data_sale_neo9 = data_sale[data_sale["品牌"]=="neo9"].groupby(["日期"]).count()
data_sale_redmik70 = data_sale[data_sale["品牌"]=="redmik70"].groupby(["日期"]).count()
def show_sale(data,title):
    fig,ax = plt.subplots(figsize=(14,7),dpi=200)
    # Every column holds the same per-day count, so plot the first one only
    ax.plot(data.iloc[:,0],label="Count")
    plt.legend(fontsize=20)
    plt.title(title,fontsize=20)
    tick_spacing = 10  # change tick_spacing to adjust the density of the x-axis ticks
    ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing ))
    plt.xticks(rotation=30)
    plt.show()
show_sale(data_sale_ace3,"ace3抽样从2024-04-17到2024-11-1的销量时序图")
show_sale(data_sale_neo9,"neo9抽样从2024-05-02到2024-11-1的销量时序图")
show_sale(data_sale_redmik70,"redmik70抽样从2024-05-23到2024-11-1的销量时序图")
5.2 Sales Forecasting
# Keep only the daily count as a one-column frame for each of the three phones
data_sale_redmik70_all = pd.DataFrame(data_sale_redmik70.iloc[:,0])
data_sale_redmik70_all.columns = ["Count"]
data_sale_ace3_all = pd.DataFrame(data_sale_ace3.iloc[:,0])
data_sale_ace3_all.columns = ["Count"]
data_sale_neo9_all = pd.DataFrame(data_sale_neo9.iloc[:,0])
data_sale_neo9_all.columns = ["Count"]
# Validation function: fit SARIMA on the whole series, then compare a dynamic
# prediction over [time_begin, time_end] with the observed counts
def timeseries_valid(data_sou,time_begin,time_end,title):
    data = data_sou.copy()
    results = sm.tsa.statespace.SARIMAX(data.Count, order=(1,1,1),seasonal_order=(1,1,1,12)).fit()
    data['SARIMA'] = results.predict(start=time_begin,end=time_end,dynamic=True)
    fig,ax = plt.subplots(figsize=(14,7))
    ax.plot(data['Count'], label='all')
    ax.plot(data['SARIMA'], label='SARIMA')
    plt.legend(fontsize=20)
    plt.title(title,fontsize=20)
    tick_spacing = 10  # change tick_spacing to adjust the density of the x-axis ticks
    ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing ))
    plt.xticks(rotation=30)
    plt.show()
timeseries_valid(data_sale_redmik70_all,"2024-10-12","2024-11-01","redmik70时序预测验证图")
timeseries_valid(data_sale_ace3_all,"2024-10-08","2024-11-01","ace3时序预测验证图")
timeseries_valid(data_sale_neo9_all,"2024-10-08","2024-11-01","neo9时序预测验证图")
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
self._init_dates(dates, freq)
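The validation plots above compare the two curves only by eye. As a rough complement, the gap can be summarised with MAE and RMSE. The sketch below assumes that `data['SARIMA']` holds the predictions aligned with `data['Count']` and is NaN outside the predicted window; the helper name `forecast_errors` and the toy numbers are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def forecast_errors(actual, predicted):
    """Return the number of compared points, MAE and RMSE.

    Only positions where both series have a value are compared.
    """
    pair = pd.concat([actual, predicted], axis=1, keys=["actual", "predicted"]).dropna()
    err = pair["actual"] - pair["predicted"]
    return {
        "n": len(pair),
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
    }

# Toy values standing in for data['Count'] and data['SARIMA'] (made up for illustration)
idx = ["2024-10-29", "2024-10-30", "2024-10-31", "2024-11-01"]
count = pd.Series([10, 12, 9, 11], index=idx)
sarima = pd.Series([np.nan, 11.0, 10.0, 13.0], index=idx)  # NaN before the window starts

scores = forecast_errors(count, sarima)
print(scores)
```

Inside `timeseries_valid`, the same helper could be called as `forecast_errors(data['Count'], data['SARIMA'])` just before plotting, so that each of the three models gets a comparable number.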
pre_index = ["2024-12-01","2025-01-01","2025-02-01","2025-03-01","2025-04-01","2025-05-01","2025-06-01","2025-07-01","2025-08-01","2025-09-01","2025-10-01","2025-11-01"]
#pre_index = [pd.to_datetime(i) for i in pre_index]
# Define the time-series forecasting function
def timeseries_predict(data_sou, time_begin, time_end, title):
    data = data_sou.copy()
    # Fit a SARIMA(1,1,1)x(1,1,1,12) model to the Count series
    results = sm.tsa.statespace.SARIMAX(data.Count, order=(1,1,1), seasonal_order=(1,1,1,12)).fit()
    # In-sample dynamic prediction between time_begin and time_end
    data['SARIMA'] = results.predict(start=time_begin, end=time_end, dynamic=True)
    # Forecast the next 4 steps after the last observation, with confidence bounds
    forecast = results.get_forecast(steps=4)
    forecast_ci = forecast.conf_int()
    print(data.index)
    ax = data['Count'].plot(label='Observed', figsize=(18, 10))
    forecast.predicted_mean.plot(ax=ax, label='Forecast', alpha=0.7)
    # Shade the forecast confidence interval
    ax.fill_between(forecast_ci.index,
                    forecast_ci.iloc[:, 0],
                    forecast_ci.iloc[:, 1], color='k', alpha=0.2)
    ax.set_xlabel('Date')
    ax.set_ylabel('Sales')
    ax.set_title(title, fontsize=20)
    plt.xticks(rotation=30)
    plt.legend(fontsize=20)
    plt.show()
timeseries_predict(data_sale_redmik70_all, "2024-10-12", "2024-11-01", "redmik70 time-series forecast")
timeseries_predict(data_sale_ace3_all, "2024-10-08", "2024-11-01", "ace3 time-series forecast")
timeseries_predict(data_sale_neo9_all, "2024-10-08", "2024-11-01", "neo9 time-series forecast")
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
return get_prediction_index(
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
return get_prediction_index(
Index(['2024-05-23', '2024-05-28', '2024-07-05', '2024-07-17', '2024-07-29',
'2024-08-10', '2024-08-17', '2024-08-20', '2024-08-22', '2024-08-25',
'2024-08-28', '2024-08-29', '2024-08-31', '2024-09-01', '2024-09-02',
'2024-09-03', '2024-09-04', '2024-09-05', '2024-09-22', '2024-09-23',
'2024-09-24', '2024-09-25', '2024-10-02', '2024-10-03', '2024-10-04',
'2024-10-05', '2024-10-06', '2024-10-07', '2024-10-08', '2024-10-09',
'2024-10-10', '2024-10-11', '2024-10-12', '2024-10-13', '2024-10-14',
'2024-10-15', '2024-10-16', '2024-10-17', '2024-10-18', '2024-10-19',
'2024-10-20', '2024-10-21', '2024-10-22', '2024-10-23', '2024-10-24',
'2024-10-25', '2024-10-26', '2024-10-27', '2024-10-28', '2024-10-29',
'2024-10-30', '2024-10-31', '2024-11-01'],
dtype='object', name='日期')
Index(['2024-04-17', '2024-05-07', '2024-05-09', '2024-05-10', '2024-05-12',
'2024-05-21', '2024-05-22', '2024-05-23', '2024-05-24', '2024-05-25',
...
'2024-10-23', '2024-10-24', '2024-10-25', '2024-10-26', '2024-10-27',
'2024-10-28', '2024-10-29', '2024-10-30', '2024-10-31', '2024-11-01'],
dtype='object', name='日期', length=148)
Index(['2024-05-02', '2024-05-03', '2024-05-04', '2024-05-05', '2024-05-06',
'2024-05-07', '2024-05-08', '2024-05-09', '2024-05-10', '2024-05-12',
...
'2024-10-23', '2024-10-24', '2024-10-25', '2024-10-26', '2024-10-27',
'2024-10-28', '2024-10-29', '2024-10-30', '2024-10-31', '2024-11-01'],
dtype='object', name='日期', length=176)
[Figure: ace3 time-series forecast plot]
[Figure: neo9 time-series forecast plot]
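The `ValueWarning` and `FutureWarning` messages in the output above come from the index: the printed `Index(..., dtype='object')` holds date strings with gaps, so statsmodels cannot infer a frequency and labels the forecast with integers instead of dates. One possible way to prepare the series, sketched below with pandas only, is to parse the strings and reindex to a gap-free daily `DatetimeIndex`. The helper name `to_daily` and the toy frame are hypothetical, and filling missing days with 0 is an assumption (that a day absent from the data had no records); interpolation may suit other data better.

```python
import pandas as pd

def to_daily(data_sou, value_col="Count"):
    """Return value_col as a Series on a gap-free daily DatetimeIndex."""
    s = data_sou[value_col].copy()
    s.index = pd.to_datetime(s.index)  # '2024-10-28' strings -> Timestamps
    # Assumption: a date missing from the index means a count of zero that day
    return s.sort_index().asfreq("D", fill_value=0)

# Toy frame shaped like data_sale_*_all (values made up for illustration)
toy = pd.DataFrame(
    {"Count": [3, 5, 2]},
    index=pd.Index(["2024-10-28", "2024-10-30", "2024-11-01"], name="日期"),
)
daily = to_daily(toy)

# Dates for a 4-step forecast starting the day after the last observation
future_dates = pd.date_range(daily.index[-1] + pd.Timedelta(days=1), periods=4, freq="D")

print(daily)
print(future_dates)
```

Passing `to_daily(data_sou)` to `SARIMAX` in place of `data.Count` should let `get_forecast` return date-labelled results, so the forecast and its confidence band line up with the observed curve on the same time axis.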