Phone Review Analysis on JD.com

Article link: https://blog.csdn.net/weixin_43360569/article/details/145104678

1 Data Processing

1.1 Data Preparation

import pandas as pd
from random import choice  # for missing-value imputation
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler  # feature scaling
from sklearn.cluster import KMeans  # clustering model
from sklearn.manifold import TSNE  # t-SNE dimensionality reduction
from sklearn.metrics import silhouette_score
from collections import Counter  # word-frequency counting
from wordcloud import WordCloud  # word-cloud rendering
import snownlp  # sentiment analysis
import statsmodels.api as sm  # regression and time-series models
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
plt.rcParams['font.sans-serif'] = ['SimHei']   # use a font that contains Chinese glyphs
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly with that font
import jieba
# mlxtend is used for association rule mining
from mlxtend.frequent_patterns import apriori, association_rules

# Read the data with pandas
neo9 = pd.read_excel("IQOONeo9.xlsx")
ace3 = pd.read_excel("一加Ace3.xlsx")
redmik70 = pd.read_excel("RedmiK70.xlsx")

1.2 Data Cleaning

# Add a "评论长度" (review length) column for data cleaning
neo9["评论长度"] = [len(_) for _ in neo9["评价内容"]]
ace3["评论长度"] = [len(_) for _ in ace3["评价内容"]]
redmik70["评论长度"] = [len(_) for _ in redmik70["评价内容"]]

# Count the reviews shorter than 10 characters, which are candidates for removal
print("neo9短评论数量为:"+str(len(neo9[neo9["评论长度"]<10])))

print("ace3短评论数量为:"+str(len(ace3[ace3["评论长度"]<10])))

print("redmik70短评论数量为:"+str(len(redmik70[redmik70["评论长度"]<10])))
# All three counts are 0, so no reviews are removed
neo9.head()

neo9短评论数量为:0
ace3短评论数量为:0
redmik70短评论数量为:0

	是否会员	评价内容	颜色	存储空间	点赞数	评论数	日期	地区	评论长度
0	PLUS会员	iQOO Neo9 设计时尚大气，金属质感边框搭配深邃的黑色背板，尽显高端品质。这款手机搭载...	格斗黑	16GB+256GB	105	18.0	2024-09-16	河南	217
1	PLUS会员	高颜值,高品质,一看就很上档次，非常喜欢！vivoiQOONeo9性能卓越，运行流畅，拍照效...	格斗黑	16GB+256GB	55	2.0	2024-09-08	安徽	77
2	PLUS会员	手机太棒了，两千出头的价位搭配骁龙8gen2的处理器，玩游戏的体验很好，游戏不卡顿的同时手机...	格斗黑	12GB+256GB	38	2.0	2024-09-15	广东	82
3	PLUS会员	高颜值，高品质，一分钱一分货，材质外观和质量一看就很上档次，非常喜欢！vivoiQOONeo...	格斗黑	12GB+256GB	22	1.0	2024-09-07	安徽	93
4	PLUS会员	外观不错，大小也很适合我\n外形外观：好\n屏幕音效：很舒服\n拍照效果：很不错\n运行速度...	航海蓝	12GB+256GB	5	0.0	2024-10-23	浙江	76
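
The check above only counts short reviews, because none were found. A minimal sketch of what the removal step could look like, assuming a DataFrame with the same 评价内容 column; the helper name `drop_short_reviews` and the toy data are illustrative and not part of the original notebook:

```python
import pandas as pd

def drop_short_reviews(df, min_len=10, text_col="评价内容"):
    """Return a copy of df without reviews shorter than min_len characters."""
    # .str.len() yields NaN for missing text; fillna(0) treats those rows as empty
    lengths = df[text_col].str.len().fillna(0).astype(int)
    out = df.loc[lengths >= min_len].copy()
    out["评论长度"] = lengths.loc[out.index]
    return out

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({"评价内容": ["很好", "手机运行流畅，拍照效果也很不错，值得购买", None]})
cleaned = drop_short_reviews(toy)
print(cleaned)
```

Unlike `len(_)` in a list comprehension, `.str.len()` does not raise on missing text, which matters if the review column ever contains empty cells.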

1.3 Missing-Value Imputation

# Show the number of missing values per column

print('{:*^60}'.format('neo9的数据缺失情况如下'))
print(neo9.isnull().sum())
print('{:*^60}'.format('ace3的数据缺失情况如下'))
print(ace3.isnull().sum())
print('{:*^60}'.format('redmik70的数据缺失情况如下'))
print(redmik70.isnull().sum())

***********************neo9的数据缺失情况如下************************
是否会员    201
评价内容      0
颜色        0
存储空间      0
点赞数       0
评论数      29
日期       29
地区       29
评论长度      0
dtype: int64
***********************ace3的数据缺失情况如下************************
是否会员    273
评价内容      0
颜色        0
存储空间      0
日期        0
地区        0
点赞数       0
评论数       0
评论长度      0
dtype: int64
*********************redmik70的数据缺失情况如下**********************
是否会员    131
评价内容      0
颜色       60
存储空间     60
点赞数       0
评论数       0
日期       64
地区       64
评论长度      0
dtype: int64

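The output shows that neo9 has missing values in 评论数 (comment count), 日期 (date), and 地区 (region), while redmik70 has missing values in 颜色 (color), 存储空间 (storage), 日期, and 地区. Apart from 是否会员 (membership), ace3 has no missing values. The missing 是否会员 entries are not filled here, because section 1.4 encodes every value other than "PLUS会员" as 0.
颜色, 存储空间, 日期, and 地区 are treated as categorical attributes and are filled with a value chosen at random from the distinct non-missing values of the same column. The fill function draws one value per column, so every missing row in a column receives the same value.
评论数 is a continuous attribute and is filled with the column mean.
Note: each phone is filled using only its own data.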

# Fill function for categorical attributes
def fillna_fenlei(data):
    # Return one value drawn at random from the column's distinct non-missing values
    return choice(list(data.dropna().unique()))

# neo9: random fill for the categorical columns, mean fill for the continuous column
neo9["日期"] = neo9["日期"].fillna(fillna_fenlei(neo9["日期"]))

neo9["地区"] = neo9["地区"].fillna(fillna_fenlei(neo9["地区"]))

neo9["评论数"] = neo9["评论数"].fillna(neo9["评论数"].mean())

print('{:*^60}'.format('已完成对日期、地区、评论数填充,现neo9的数据缺失情况如下'))
print(neo9.isnull().sum())

# redmik70: same approach (its 评论数 column has no missing values, so the mean fill changes nothing)

redmik70["日期"] = redmik70["日期"].fillna(fillna_fenlei(redmik70["日期"]))

redmik70["地区"] = redmik70["地区"].fillna(fillna_fenlei(redmik70["地区"]))

redmik70["评论数"] = redmik70["评论数"].fillna(redmik70["评论数"].mean())

redmik70["存储空间"] = redmik70["存储空间"].fillna(fillna_fenlei(redmik70["存储空间"]))

redmik70["颜色"] = redmik70["颜色"].fillna(fillna_fenlei(redmik70["颜色"]))

print('{:*^60}'.format('已完成对日期、地区、评论数、存储空间和颜色填充,现redmik70的数据缺失情况如下'))
print(redmik70.isnull().sum())

***************已完成对日期、地区、评论数填充,现neo9的数据缺失情况如下***************
是否会员    201
评价内容      0
颜色        0
存储空间      0
点赞数       0
评论数       0
日期        0
地区        0
评论长度      0
dtype: int64
*********已完成对日期、地区、评论数、存储空间和颜色填充,现redmik70的数据缺失情况如下*********
是否会员    131
评价内容      0
颜色        0
存储空间      0
点赞数       0
评论数       0
日期        0
地区        0
评论长度      0
dtype: int64
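
The fill strategy described in this section (random choice for categorical columns, mean for continuous ones) can also be applied row by row. The sketch below is one possible variant, not the notebook's method: it draws an independent value for each missing row and samples from all observed values, so frequent categories are chosen more often. The helper name `fill_categorical_random`, the seed, and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def fill_categorical_random(series, rng):
    """Fill each missing entry with an independent draw from the observed values."""
    observed = series.dropna().to_numpy()
    filled = series.copy()
    mask = filled.isna()
    # Sampling from all observed values (not only the distinct ones)
    # roughly preserves the original category frequencies
    filled.loc[mask] = rng.choice(observed, size=int(mask.sum()))
    return filled

rng = np.random.default_rng(0)  # fixed seed so the fill is reproducible
toy = pd.DataFrame({
    "地区": ["广东", None, "浙江", "广东", None],
    "评论数": [2.0, None, 0.0, 1.0, 3.0],
})
toy["地区"] = fill_categorical_random(toy["地区"], rng)
toy["评论数"] = toy["评论数"].fillna(toy["评论数"].mean())
print(toy)
```

Applying this per phone, as the section requires, would mean calling the helper on each of the three DataFrames before they are merged.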

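1.4 Merging the Data

# Label each data set with its brand
neo9["品牌"]="neo9"
ace3["品牌"]="ace3"
redmik70["品牌"]="redmik70"

# Concatenate the three data sets row-wise
data= pd.concat([neo9,ace3,redmik70])

# Encode 是否会员 as 1 for "PLUS会员" and 0 otherwise (missing values become 0)
data["是否会员"] = [1 if(i=="PLUS会员") else 0 for i in data["是否会员"]]

# Split a spec such as "16GB+256GB" into RAM and ROM by stripping the two-character unit.
# A value such as "1TB" would be parsed as 1; the ROM minimum of 1 in data.describe() suggests this occurs.
data["RAM"] = [int(_.split("+")[0][:-2]) for _ in data["存储空间"]]

data["ROM"] = [int(_.split("+")[1][:-2]) for _ in data["存储空间"]]

data.head()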

	是否会员	评价内容	颜色	存储空间	点赞数	评论数	日期	地区	评论长度	品牌	RAM	ROM
0	1	iQOO Neo9 设计时尚大气，金属质感边框搭配深邃的黑色背板，尽显高端品质。这款手机搭载...	格斗黑	16GB+256GB	105	18.0	2024-09-16	河南	217	neo9	16	256
1	1	高颜值,高品质,一看就很上档次，非常喜欢！vivoiQOONeo9性能卓越，运行流畅，拍照效...	格斗黑	16GB+256GB	55	2.0	2024-09-08	安徽	77	neo9	16	256
2	1	手机太棒了，两千出头的价位搭配骁龙8gen2的处理器，玩游戏的体验很好，游戏不卡顿的同时手机...	格斗黑	12GB+256GB	38	2.0	2024-09-15	广东	82	neo9	12	256
3	1	高颜值，高品质，一分钱一分货，材质外观和质量一看就很上档次，非常喜欢！vivoiQOONeo...	格斗黑	12GB+256GB	22	1.0	2024-09-07	安徽	93	neo9	12	256
4	1	外观不错，大小也很适合我\n外形外观：好\n屏幕音效：很舒服\n拍照效果：很不错\n运行速度...	航海蓝	12GB+256GB	5	0.0	2024-10-23	浙江	76	neo9	12	256
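
Stripping the last two characters of each part discards the unit, so GB and TB values end up on different scales. A minimal sketch of a unit-aware parser is given below. It assumes specs of the form "16GB+256GB" or "16GB+1TB" (the TB form is an assumption, not confirmed by the data shown here) and treats 1 TB as 1024 GB; the function name `parse_storage` is hypothetical.

```python
import re

_UNIT_TO_GB = {"GB": 1, "TB": 1024}  # assumed convention: 1 TB = 1024 GB

def parse_storage(spec):
    """Parse a spec such as '16GB+256GB' or '16GB+1TB' into (ram_gb, rom_gb)."""
    parts = re.findall(r"(\d+)\s*(GB|TB)", spec.upper())
    if len(parts) != 2:
        raise ValueError(f"unexpected storage spec: {spec!r}")
    ram, rom = (int(num) * _UNIT_TO_GB[unit] for num, unit in parts)
    return ram, rom

print(parse_storage("16GB+256GB"))
print(parse_storage("16GB+1TB"))
```

Raising on unexpected input, rather than silently returning a number, makes malformed specs visible before they reach the correlation and regression steps.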

1.5 Tokenization

# Load the stop-word list (a set makes the membership test fast)
with open('停用词.txt', 'r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

# Stop-word filter: takes the token list of one review and returns the tokens that are not stop words
def stopwords_process(data):
    return [i for i in data if i not in stopwords]

# Add a "分词" column holding the filtered token list of each review
data["分词"] = [stopwords_process(list(jieba.cut(x))) for x in list(data['评价内容'])]

data.head()

	是否会员	评价内容	颜色	存储空间	点赞数	评论数	日期	地区	评论长度	品牌	RAM	ROM	分词
0	1	iQOO Neo9 设计时尚大气，金属质感边框搭配深邃的黑色背板，尽显高端品质。这款手机搭载...	格斗黑	16GB+256GB	105	18.0	2024-09-16	河南	217	neo9	16	256	[iQOO, , Neo9, , 设计, 时尚, 大气, 金属, 质感, 边框, 搭配,...
1	1	高颜值,高品质,一看就很上档次，非常喜欢！vivoiQOONeo9性能卓越，运行流畅，拍照效...	格斗黑	16GB+256GB	55	2.0	2024-09-08	安徽	77	neo9	16	256	[高颜值, 高品质, 一看, 上档次, 喜欢, vivoiQOONeo9, 性能, 卓越, ...
2	1	手机太棒了，两千出头的价位搭配骁龙8gen2的处理器，玩游戏的体验很好，游戏不卡顿的同时手机...	格斗黑	12GB+256GB	38	2.0	2024-09-15	广东	82	neo9	12	256	[手机, 太棒了, 两千, 出头, 价位, 搭配, 骁龙, 8gen2, 处理器, 玩游戏,...
3	1	高颜值，高品质，一分钱一分货，材质外观和质量一看就很上档次，非常喜欢！vivoiQOONeo...	格斗黑	12GB+256GB	22	1.0	2024-09-07	安徽	93	neo9	12	256	[高颜值, 高品质, 一分钱, 一分货, 材质, 外观, 质量, 一看, 上档次, 喜欢, ...
4	1	外观不错，大小也很适合我\n外形外观：好\n屏幕音效：很舒服\n拍照效果：很不错\n运行速度...	航海蓝	12GB+256GB	5	0.0	2024-10-23	浙江	76	neo9	12	256	[外观, 不错, 大小, 适合, \n, 外形, 外观, \n, 屏幕, 音效, 舒服, \...

data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3000 entries, 0 to 999
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   是否会员    3000 non-null   int64  
 1   评价内容    3000 non-null   object 
 2   颜色      3000 non-null   object 
 3   存储空间    3000 non-null   object 
 4   点赞数     3000 non-null   int64  
 5   评论数     3000 non-null   float64
 6   日期      3000 non-null   object 
 7   地区      3000 non-null   object 
 8   评论长度    3000 non-null   int64  
 9   品牌      3000 non-null   object 
 10  RAM     3000 non-null   int64  
 11  ROM     3000 non-null   int64  
 12  分词      3000 non-null   object 
dtypes: float64(1), int64(5), object(7)
memory usage: 328.1+ KB

data.describe()

	是否会员	点赞数	评论数	评论长度	RAM	ROM
count	3000.000000	3000.000000	3000.000000	3000.000000	3000.000000	3000.000000
mean	0.798333	0.888333	0.508779	94.605333	13.337333	307.234000
std	0.401311	6.663082	2.133196	57.980113	1.887342	122.173347
min	0.000000	0.000000	0.000000	10.000000	12.000000	1.000000
25%	1.000000	0.000000	0.000000	63.000000	12.000000	256.000000
50%	1.000000	0.000000	0.000000	77.000000	12.000000	256.000000
75%	1.000000	0.000000	1.000000	104.000000	16.000000	256.000000
max	1.000000	249.000000	106.000000	533.000000	16.000000	512.000000

2 Phone Sales Analysis

2.1 Sales by Region

# Count plot of reviews per region, grouped by brand
# dpi sets the resolution, figsize the canvas size
plt.figure(dpi=300,figsize=(24,8))
# fontsize sets the tick-label size
plt.xticks(fontsize=10)
plt.title("地区分布图")
sns.countplot(x=data['地区'], hue=data['品牌'])
plt.show()

[Figure: 地区分布图 (region distribution), review counts per region grouped by brand]

# Review counts per region, overall and for each brand
data_area = data.groupby(["地区"])["地区"].count()
data_area_neo9 = data[data["品牌"]=="neo9"].groupby(["地区"])["地区"].count()
data_area_ace = data[data["品牌"]=="ace3"].groupby(["地区"])["地区"].count()
data_area_redmik70 = data[data["品牌"]=="redmik70"].groupby(["地区"])["地区"].count()

# Set the resolution, width, and height of the whole figure
plt.figure(dpi=100,figsize=(18,18))

# Panel 1: overall region distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_area,labels=data_area.index,autopct='%3.1f%%')

# Panel 2: region distribution for neo9
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_area_neo9,labels=data_area_neo9.index,autopct='%3.1f%%')

# Panel 3: region distribution for ace3
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_area_ace,labels=data_area_ace.index,autopct='%3.1f%%')

# Panel 4: region distribution for redmik70
plt.subplot(2,2,4)
plt.title("redmik70分布")
plt.pie(data_area_redmik70,labels=data_area_redmik70.index,autopct='%3.1f%%')
plt.show()

[Figure: pie charts of the region distribution, with panels for overall, neo9, ace3, and redmik70]
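
Pie charts with many regions are hard to compare across brands. As a complementary sketch, the same shares can be tabulated with `pd.crosstab`. The toy data below is hypothetical; in the notebook the inputs would be `data["地区"]` and `data["品牌"]`.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "品牌": ["neo9", "neo9", "ace3", "ace3", "ace3", "redmik70"],
    "地区": ["广东", "河南", "广东", "广东", "浙江", "浙江"],
})
# Rows: region, columns: brand, values: share of that brand's reviews
share = pd.crosstab(toy["地区"], toy["品牌"], normalize="columns")
print(share.round(2))
```

With `normalize="columns"` each brand's column sums to 1, so the table holds the same percentages as the per-brand pie charts in a form that can be sorted or exported.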

2.2 Phone Color Analysis

# Review counts per color, overall and for each brand
data_color = data.groupby(["颜色"])["颜色"].count()
data_color_neo9 = data[data["品牌"]=="neo9"].groupby(["颜色"])["颜色"].count()
data_color_ace = data[data["品牌"]=="ace3"].groupby(["颜色"])["颜色"].count()
data_color_redmik70 = data[data["品牌"]=="redmik70"].groupby(["颜色"])["颜色"].count()

# Set the resolution, width, and height of the whole figure
plt.figure(dpi=100,figsize=(18,18))

# Panel 1: overall color distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_color,labels=data_color.index,autopct='%3.1f%%')

# Panel 2: color distribution for neo9
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_color_neo9,labels=data_color_neo9.index,autopct='%3.1f%%')

# Panel 3: color distribution for ace3
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_color_ace,labels=data_color_ace.index,autopct='%3.1f%%')

# Panel 4: color distribution for redmik70
plt.subplot(2,2,4)
plt.title("redmik70分布")
plt.pie(data_color_redmik70,labels=data_color_redmik70.index,autopct='%3.1f%%')
plt.show()

[Figure: pie charts of the color distribution, with panels for overall, neo9, ace3, and redmik70]

2.3 Storage Configuration Analysis

# Review counts per storage configuration, overall and for each brand
data_rom = data.groupby(["存储空间"])["存储空间"].count()
data_rom_neo9 = data[data["品牌"]=="neo9"].groupby(["存储空间"])["存储空间"].count()
data_rom_ace = data[data["品牌"]=="ace3"].groupby(["存储空间"])["存储空间"].count()
data_rom_redmik70 = data[data["品牌"]=="redmik70"].groupby(["存储空间"])["存储空间"].count()

# Set the resolution, width, and height of the whole figure
plt.figure(dpi=100,figsize=(18,18))

# Panel 1: overall storage distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_rom,labels=data_rom.index,autopct='%3.1f%%')

# Panel 2: storage distribution for neo9
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_rom_neo9,labels=data_rom_neo9.index,autopct='%3.1f%%')

# Panel 3: storage distribution for ace3
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_rom_ace,labels=data_rom_ace.index,autopct='%3.1f%%')

# Panel 4: storage distribution for redmik70
plt.subplot(2,2,4)
plt.title("redmik70分布")
plt.pie(data_rom_redmik70,labels=data_rom_redmik70.index,autopct='%3.1f%%')
plt.show()

[Figure: pie charts of the storage configuration distribution, with panels for overall, neo9, ace3, and redmik70]

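# Count plot of reviews per storage configuration, grouped by brand
# dpi sets the resolution, figsize the canvas size
plt.figure(dpi=300,figsize=(24,8))
# fontsize sets the tick-label size
plt.xticks(fontsize=10)
plt.title("存储空间分布图")
sns.countplot(x=data['存储空间'], hue=data['品牌'])
plt.show()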

[Figure: 存储空间分布图 (storage configuration distribution), review counts per configuration grouped by brand]

2.4 Correlation Analysis

1. Compute the pairwise correlations between the variables

# Correlation analysis
print('{:*^60}'.format('相关性分析'))
print(data[["点赞数","评论数","评论长度","RAM","ROM"]].corr().round(2).T)  # print the Pearson correlation matrix
sns.heatmap(data[["点赞数","评论数","评论长度","RAM","ROM"]].corr().round(2),cmap="Reds",annot=True)
plt.show()

***************************相关性分析****************************
       点赞数   评论数  评论长度   RAM   ROM
点赞数   1.00  0.80  0.01  0.04  0.01
评论数   0.80  1.00 -0.01  0.05  0.03
评论长度  0.01 -0.01  1.00  0.01 -0.02
RAM   0.04  0.05  0.01  1.00  0.58
ROM   0.01  0.03 -0.02  0.58  1.00

[Figure: heatmap of the correlation matrix for 点赞数, 评论数, 评论长度, RAM, and ROM]
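
`DataFrame.corr()` computes the Pearson correlation by default, which a few extreme values can dominate; the summary statistics show a maximum of 249 likes against a median of 0. As a hedged cross-check, a rank-based (Spearman) correlation can be obtained by ranking the columns first. The toy values below are hypothetical and only mimic that skew.

```python
import pandas as pd

# Hypothetical skewed counts, for illustration only
toy = pd.DataFrame({
    "点赞数": [0, 0, 1, 2, 5, 249],
    "评论数": [0, 1, 0, 2, 3, 106],
})
pearson = toy.corr()            # linear correlation, sensitive to outliers
spearman = toy.rank().corr()    # Pearson on ranks, i.e. Spearman rank correlation
print(pearson.round(2))
print(spearman.round(2))
```

If the two coefficients differ markedly on the real data, the 0.80 correlation between likes and comments may owe much to a small number of highly engaged reviews.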

2. K-means clustering

# .copy() creates an independent DataFrame, so the scaled values are not written to a slice of data
data_onehot_part = data[["是否会员","颜色","RAM","ROM","点赞数","评论数","评论长度","地区","品牌"]].copy()
scaler = MinMaxScaler()
# Min-max scale the numeric columns to [0, 1]
data_onehot_part[["RAM","ROM","点赞数","评论数","评论长度"]] = scaler.fit_transform(data_onehot_part[["RAM","ROM","点赞数","评论数","评论长度"]])
# One-hot encode the categorical columns (颜色, 地区, 品牌)
data_onehot_part = pd.get_dummies(data_onehot_part)
data_onehot_part.head()

	是否会员	RAM	ROM	点赞数	评论数	评论长度	颜色_墨羽	颜色_星曜白	颜色_星辰黑	颜色_晴雪	...	地区_西藏	地区_贵州	地区_辽宁	地区_重庆	地区_陕西	地区_青海	地区_黑龙江	品牌_ace3	品牌_neo9	品牌_redmik70
0	1	1.0	0.499022	0.421687	0.169811	0.395793	False	False	False	False	...	False	False	False	False	False	False	False	False	True	False
1	1	1.0	0.499022	0.220884	0.018868	0.128107	False	False	False	False	...	False	False	False	False	False	False	False	False	True	False
2	1	0.0	0.499022	0.152610	0.018868	0.137667	False	False	False	False	...	False	False	False	False	False	False	False	False	True	False
3	1	0.0	0.499022	0.088353	0.009434	0.158700	False	False	False	False	...	False	False	False	False	False	False	False	False	True	False
4	1	0.0	0.499022	0.020080	0.000000	0.126195	False	False	False	False	...	False	False	False	False	False	False	False	False	True	False

5 rows × 53 columns

def kmeans_process(data):
    # Select the best KMeans model by mean silhouette coefficient
    score_list = list()  # mean silhouette coefficient for each K
    silhouette_int = -1  # best mean silhouette coefficient found so far
    for n_clusters in range(2, 8):  # try K = 2, ..., 7
        model_kmeans = KMeans(n_clusters=n_clusters, n_init=10)  # build the model; n_init is set explicitly to the default used in this run
        labels_tmp = model_kmeans.fit_predict(data)  # fit the model and get the cluster labels
        silhouette_tmp = silhouette_score(data, labels_tmp)  # mean silhouette coefficient for this K
        if silhouette_tmp > silhouette_int:  # if this K scores higher than the best so far
            best_k = n_clusters  # store the best K
            silhouette_int = silhouette_tmp  # store the best score
            best_kmeans = model_kmeans  # store the fitted model
            cluster_labels_k = labels_tmp  # store the cluster labels
        score_list.append([n_clusters, silhouette_tmp])  # record K and its score
    print('{:*^60}'.format('K值对应的轮廓系数:'))
    print(np.array(score_list))  # print the score for every K
    print('最优的K值是:{0} \n对应的轮廓系数是:{1}'.format(best_k, silhouette_int))

    return cluster_labels_k

cluster_labels_k = kmeans_process(data_onehot_part)


*************************K值对应的轮廓系数:*************************
[[2.         0.17536367]
 [3.         0.23756583]
 [4.         0.22252055]
 [5.         0.19257643]
 [6.         0.16658535]
 [7.         0.19443532]]
最优的K值是:3 
对应的轮廓系数是:0.2375658338076598
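
The selection criterion used above is the mean silhouette coefficient. For a point i, with a_i the mean distance to the other points of its own cluster and b_i the smallest mean distance to the points of any other cluster, s(i) = (b_i - a_i) / max(a_i, b_i). The following is a small NumPy sketch of that definition on toy data, meant to illustrate the score rather than to replace `silhouette_score`; the function name `silhouette_mean` is hypothetical and at least two clusters are assumed.

```python
import numpy as np

def silhouette_mean(X, labels):
    """Mean silhouette coefficient: s(i) = (b_i - a_i) / max(a_i, b_i)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a_i: mean distance to the other points of the same cluster
        a = dist[i, same].sum() / (n_same - 1)
        # b_i: smallest mean distance to the points of any other cluster
        b = min(dist[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
print(silhouette_mean(X, [0, 0, 1, 1]))  # well-separated clusters: high score
print(silhouette_mean(X, [0, 1, 0, 1]))  # clusters that mix the two groups: negative score
```

Scores near 1 indicate compact, well-separated clusters, so the best value reported above (about 0.24 at K=3) points to only weakly separated groups.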

3. Association rule mining

# Select the usable variables and binarize them
# .copy() creates an independent DataFrame, so the assignments below do not write to a slice of data
data_onehot_all = data[["是否会员","颜色","存储空间","点赞数","评论数","地区","品牌"]].copy()
# 1 if the review has at least one like (or comment), otherwise 0
data_onehot_all["点赞数"] = [1 if(_>0) else 0 for _ in data_onehot_all["点赞数"]]
data_onehot_all["评论数"] = [1 if(_>0) else 0 for _ in data_onehot_all["评论数"]]
data_onehot_all = pd.get_dummies(data_onehot_all)
data_onehot_all.head()

	是否会员	点赞数	评论数	颜色_墨羽	颜色_星曜白	颜色_星辰黑	颜色_晴雪	颜色_月海蓝	颜色_格斗黑	颜色_浅茄紫	...	地区_西藏	地区_贵州	地区_辽宁	地区_重庆	地区_陕西	地区_青海	地区_黑龙江	品牌_ace3	品牌_neo9	品牌_redmik70
0	1	1	1	False	False	False	False	False	True	False	...	False	False	False	False	False	False	False	False	True	False
1	1	1	1	False	False	False	False	False	True	False	...	False	False	False	False	False	False	False	False	True	False
2	1	1	1	False	False	False	False	False	True	False	...	False	False	False	False	False	False	False	False	True	False
3	1	1	1	False	False	False	False	False	True	False	...	False	False	False	False	False	False	False	False	True	False
4	1	1	0	False	False	False	False	False	False	False	...	False	False	False	False	False	False	False	False	True	False

5 rows × 55 columns

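# Mine frequent itemsets; mlxtend expects boolean columns, so the 0/1 columns are cast to bool
frequent_itemsets = apriori(data_onehot_all.astype(bool), min_support=0.2, use_colnames=True)
# Generate association rules with lift >= 1
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
# Keep only the rules with lift strictly greater than 1
high_lift_rules = rules[rules['lift'] > 1]
print(high_lift_rules)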

                       antecedents                     consequents  \
0                (存储空间_12GB+256GB)                          (是否会员)   
1                           (是否会员)               (存储空间_12GB+256GB)   
2                        (品牌_neo9)                          (是否会员)   
3                           (是否会员)                       (品牌_neo9)   
4                    (品牌_redmik70)                          (是否会员)   
5                           (是否会员)                   (品牌_redmik70)   
6                        (品牌_neo9)                           (评论数)   
7                            (评论数)                       (品牌_neo9)   
8                (存储空间_12GB+256GB)                   (品牌_redmik70)   
9                    (品牌_redmik70)               (存储空间_12GB+256GB)   
10  (存储空间_12GB+256GB, 品牌_redmik70)                          (是否会员)   
11         (存储空间_12GB+256GB, 是否会员)                   (品牌_redmik70)   
12             (品牌_redmik70, 是否会员)               (存储空间_12GB+256GB)   
13               (存储空间_12GB+256GB)             (品牌_redmik70, 是否会员)   
14                   (品牌_redmik70)         (存储空间_12GB+256GB, 是否会员)   
15                          (是否会员)  (存储空间_12GB+256GB, 品牌_redmik70)   

    antecedent support  consequent support   support  confidence      lift  \
0             0.662333            0.798333  0.545000    0.822849  1.030708   
1             0.798333            0.662333  0.545000    0.682672  1.030708   
2             0.333333            0.798333  0.266333    0.799000  1.000835   
3             0.798333            0.333333  0.266333    0.333612  1.000835   
4             0.333333            0.798333  0.289667    0.869000  1.088518   
5             0.798333            0.333333  0.289667    0.362839  1.088518   
6             0.333333            0.430000  0.208000    0.624000  1.451163   
7             0.430000            0.333333  0.208000    0.483721  1.451163   
8             0.662333            0.333333  0.298333    0.450428  1.351283   
9             0.333333            0.662333  0.298333    0.895000  1.351283   
10            0.298333            0.798333  0.263667    0.883799  1.107055   
11            0.545000            0.333333  0.263667    0.483792  1.451376   
12            0.289667            0.662333  0.263667    0.910242  1.374295   
13            0.662333            0.289667  0.263667    0.398088  1.374295   
14            0.333333            0.545000  0.263667    0.791000  1.451376   
15            0.798333            0.298333  0.263667    0.330271  1.107055   

    leverage  conviction  zhangs_metric  
0   0.016237    1.138385       0.088232  
1   0.016237    1.064094       0.147734  
2   0.000222    1.003317       0.001252  
3   0.000222    1.000418       0.004137  
4   0.023556    1.539440       0.121979  
5   0.023556    1.046308       0.403237  
6   0.064667    1.515957       0.466346  
7   0.064667    1.291291       0.545434  
8   0.077556    1.213065       0.769880  
9   0.077556    3.215873       0.389944  
10  0.025497    1.735497       0.137818  
11  0.082000    1.291469       0.683514  
12  0.071811    3.761953       0.383418  
13  0.071811    1.180127       0.806578  
14  0.082000    2.177033       0.466498  
15  0.025497    1.047688       0.479516  


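
The metrics in the rule table follow directly from itemset counts. As a worked check, the sketch below recomputes rule 0 (存储空间_12GB+256GB -> 是否会员) from counts back-calculated from the printed supports and the 3000 reviews. The counts 1987, 2395, and 1635 are therefore derived values, and the function name `rule_metrics` is hypothetical.

```python
def rule_metrics(n_total, n_a, n_b, n_ab):
    """Support, confidence, lift, leverage, and conviction for the rule A -> B."""
    sup_a, sup_b, sup_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    confidence = sup_ab / sup_a
    return {
        "support": sup_ab,
        "confidence": confidence,
        "lift": confidence / sup_b,
        "leverage": sup_ab - sup_a * sup_b,
        # conviction is undefined when confidence equals 1 (division by zero)
        "conviction": (1 - sup_b) / (1 - confidence),
    }

# Counts implied by rule 0: support values multiplied by 3000 reviews
m = rule_metrics(n_total=3000, n_a=1987, n_b=2395, n_ab=1635)
print({k: round(v, 6) for k, v in m.items()})
```

A lift of about 1.03 means that buyers of the 12GB+256GB configuration are only about 3% more likely to be PLUS members than reviewers overall, so most of the rules involving 是否会员 reflect its high base rate (about 80%) rather than a strong association.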

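3 Consumer Review Analysis

3.1 Word Frequency Analysis

# Word-frequency statistics, for all reviews ("all") or for a single brand
def counter_words(data,group_name):
    if(group_name=="all"):
        data = data["分词"]
    else:
        data = data[data["品牌"]==group_name]["分词"]

    # Flatten the per-review token lists into one list
    sentences_sum = []
    for _ in data:
        sentences_sum.extend(_)
    word_freq = Counter(sentences_sum)

    # Exclude whitespace and placeholder tokens from the ranking
    for token in (" ", "\n", "__"):
        word_freq.pop(token, None)

    # Return the 20 most frequent words and a space-separated string for the word cloud
    return pd.DataFrame(word_freq.most_common(20))," ".join(sentences_sum)


cipin_redmik70,words_redmik70 = counter_words(data,"redmik70")
cipin_ace3,words_ace3 = counter_words(data,"ace3")
cipin_neo9,words_neo9 = counter_words(data,"neo9")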

# Plot the 20 most frequent words for each brand
plt.figure(dpi=300,figsize=(15,15))

plt.subplot(3,1,1)
plt.bar(cipin_ace3[0], cipin_ace3[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('ace3词频')

plt.subplot(3,1,2)
plt.bar(cipin_neo9[0], cipin_neo9[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('neo9词频')

plt.subplot(3,1,3)
plt.bar(cipin_redmik70[0], cipin_redmik70[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('redmik70词频')
plt.show()

[Figure: bar charts of the 20 most frequent words for ace3, neo9, and redmik70]
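
Raw counts reward words that are repeated within a single review. A complementary view, sketched below under the assumption that each review is a list of tokens as in the 分词 column, is the document frequency: the share of reviews that mention a word at least once. The function name `document_frequency` and the toy tokens are illustrative.

```python
from collections import Counter

def document_frequency(token_lists, ignore=(" ", "\n", "__")):
    """Share of reviews in which each token appears at least once."""
    counts = Counter()
    for tokens in token_lists:
        # set() counts each token once per review
        counts.update(set(tokens) - set(ignore))
    n_docs = len(token_lists)
    return {tok: c / n_docs for tok, c in counts.most_common()}

# Hypothetical tokenized reviews, for illustration only
toy_tokens = [["手机", "手机", "流畅", " "], ["手机", "拍照", "\n"], ["流畅"]]
doc_freq = document_frequency(toy_tokens)
print(doc_freq)
```

In the notebook this could be called as `document_frequency(list(data[data["品牌"]=="neo9"]["分词"]))` to compare how widely a feature is mentioned across the three phones.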

3.2 Word Clouds

plt.figure(dpi=200,figsize=(15,10))

plt.subplot(3,1,1)
wc_neo9 = WordCloud(
    background_color='white',
    width=1500,
    height=1000,
    font_path=r"C:\Windows\Fonts\simhei.ttf",  # raw string, so the backslashes are not treated as escapes
)
wc_neo9.generate_from_text(words_neo9)  # generate the word cloud
plt.title("neo9词云展示")
plt.imshow(wc_neo9)

plt.subplot(3,1,2)
wc_ace3 = WordCloud(
    background_color='white',
    width=1500,
    height=1000,
    font_path=r"C:\Windows\Fonts\simhei.ttf",
)
wc_ace3.generate_from_text(words_ace3)  # generate the word cloud
plt.title("ace3词云展示")
plt.imshow(wc_ace3)

plt.subplot(3,1,3)
wc_redmik70 = WordCloud(
    background_color='white',
    width=1500,
    height=1000,
    font_path=r"C:\Windows\Fonts\simhei.ttf",
)
wc_redmik70.generate_from_text(words_redmik70)  # generate the word cloud
plt.title("redmik70词云展示")
plt.imshow(wc_redmik70)

plt.show()

[Figure: word clouds for neo9, ace3, and redmik70]

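4 Consumer Sentiment Analysis

4.1 Sentiment Scoring of Reviews

# SnowNLP's sentiments value lies in [0, 1]; values near 1 indicate positive sentiment
data["情感得分"] = [snownlp.SnowNLP(_).sentiments for _ in data["评价内容"]]
print(data["情感得分"])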

0      1.000000
1      1.000000
2      0.999982
3      1.000000
4      1.000000
         ...   
995    0.999891
996    0.997800
997    1.000000
998    0.963768
999    1.000000
Name: 情感得分, Length: 3000, dtype: float64

# Summary statistics of the sentiment score for each brand
print(data[data["品牌"]=="ace3"]["情感得分"].describe())

print(data[data["品牌"]=="redmik70"]["情感得分"].describe())

print(data[data["品牌"]=="neo9"]["情感得分"].describe())

# Histogram of the sentiment score for each brand
plt.figure(dpi=100,figsize=(8,5))

plt.subplot(1,3,1)
plt.hist(data[data["品牌"]=="ace3"]["情感得分"])
plt.title("ace3情感得分")

plt.subplot(1,3,2)
plt.hist(data[data["品牌"]=="redmik70"]["情感得分"])
plt.title("redmik70情感得分")

plt.subplot(1,3,3)
plt.hist(data[data["品牌"]=="neo9"]["情感得分"])
plt.title("neo9情感得分")
plt.show()

count    1000.000000
mean        0.958957
std         0.159753
min         0.000039
25%         0.999313
50%         0.999996
75%         1.000000
max         1.000000
Name: 情感得分, dtype: float64
count    1000.000000
mean        0.976659
std         0.115732
min         0.005078
25%         0.999846
50%         0.999999
75%         1.000000
max         1.000000
Name: 情感得分, dtype: float64
count    1000.000000
mean        0.964201
std         0.150796
min         0.006325
25%         0.999737
50%         0.999998
75%         1.000000
max         1.000000
Name: 情感得分, dtype: float64


[Figure: histograms of the sentiment score for ace3, redmik70, and neo9]
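
Because the scores cluster near 1 (the medians above are all about 1.0), the mean says little about the differences between phones. A hedged alternative summary is the share of reviews below a cut-off. The sketch uses 0.5 as the threshold, which is a conventional choice and an assumption here, and hypothetical scores in place of `data["情感得分"]`.

```python
import pandas as pd

# Hypothetical scores, for illustration only; the notebook would use data[["品牌", "情感得分"]]
toy = pd.DataFrame({
    "品牌": ["ace3", "ace3", "ace3", "neo9", "neo9", "redmik70"],
    "情感得分": [0.99, 0.30, 0.97, 0.10, 0.95, 0.999],
})
# Assumed cut-off: scores below 0.5 are treated as negative
toy["negative"] = toy["情感得分"] < 0.5
summary = toy.groupby("品牌")["negative"].agg(["sum", "mean"])
summary.columns = ["negative_reviews", "negative_share"]
print(summary)
```

The rows flagged as negative are also a natural starting point for reading individual complaints.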

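4.2 Factors Influencing Review Sentiment

The sentiment score is the dependent variable of an OLS regression. The independent variables are membership status, RAM, ROM, like count, comment count, and review length, taken from the min-max scaled data_onehot_part built in section 2.4. Brand, region, and color are not included in this model.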

data_onehot_part[["是否会员","RAM","ROM","点赞数","评论数","评论长度"]]

	是否会员	RAM	ROM	点赞数	评论数	评论长度
0	1	1.0	0.499022	0.421687	0.169811	0.395793
1	1	1.0	0.499022	0.220884	0.018868	0.128107
2	1	0.0	0.499022	0.152610	0.018868	0.137667
3	1	0.0	0.499022	0.088353	0.009434	0.158700
4	1	0.0	0.499022	0.020080	0.000000	0.126195
...	...	...	...	...	...	...
995	1	0.0	0.499022	0.000000	0.000000	0.137667
996	1	0.0	0.499022	0.000000	0.000000	0.122371
997	1	0.0	0.499022	0.000000	0.000000	0.166348
998	1	0.0	0.499022	0.000000	0.009434	0.116635
999	1	0.0	0.499022	0.000000	0.000000	0.208413

3000 rows × 6 columns

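# Build the OLS model; no constant is added, so the model has no intercept and the reported R² is uncentered
model = sm.OLS(data["情感得分"].astype(float), data_onehot_part[["是否会员","RAM","ROM","点赞数","评论数","评论长度"]].astype(float))
result = model.fit()  # fit the model
result.summary()  # show the regression summary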

OLS Regression Results
Dep. Variable:	情感得分	R-squared (uncentered):	0.919
Model:	OLS	Adj. R-squared (uncentered):	0.919
Method:	Least Squares	F-statistic:	5662.
Date:	Sun, 12 Jan 2025	Prob (F-statistic):	0.00
Time:	15:50:27	Log-Likelihood:	-417.45
No. Observations:	3000	AIC:	846.9
Df Residuals:	2994	BIC:	882.9
Df Model:	6
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
是否会员	0.3663	0.011	34.490	0.000	0.345	0.387
RAM	-0.1436	0.013	-11.130	0.000	-0.169	-0.118
ROM	0.8360	0.019	44.439	0.000	0.799	0.873
点赞数	-0.1597	0.318	-0.503	0.615	-0.783	0.463
评论数	0.8796	0.422	2.083	0.037	0.052	1.707
评论长度	0.9640	0.042	22.818	0.000	0.881	1.047

Omnibus:	213.267	Durbin-Watson:	1.747
Prob(Omnibus):	0.000	Jarque-Bera (JB):	802.558
Skew:	-0.266	Prob(JB):	5.33e-175
Kurtosis:	5.477	Cond. No.	112.

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
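
Note [1] in the summary matters for interpretation: without a constant, R² is measured against zero rather than against the mean of the response, so a response whose values all lie near 1 gives a high R² even when the predictors explain nothing. The NumPy sketch below illustrates this with synthetic data; it is not a re-analysis of the review data, and `sm.add_constant` would be the usual way to add an intercept in statsmodels.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
x = (rng.random(n) < 0.8).astype(float)    # a 0/1 predictor, unrelated to y by construction
y = 0.96 + 0.05 * rng.standard_normal(n)   # response with a large mean and a small variance

def r_squared(X, y, centered):
    """Least-squares fit of y on X; R² relative to the mean (centered) or to zero."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    total = y - y.mean() if centered else y
    return 1 - (resid @ resid) / (total @ total)

# No intercept, uncentered R² (the setting of the summary above)
r2_no_const = r_squared(x[:, None], y, centered=False)
# With an intercept column, centered R²
r2_const = r_squared(np.column_stack([np.ones(n), x]), y, centered=True)
print(round(r2_no_const, 3), round(r2_const, 3))
```

The first value is large and the second is close to zero although the predictor carries no information, which suggests refitting the sentiment model with an intercept before reading much into the 0.919.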

5 Phone Sales Forecasting

5.1 Sales Over Time

# Use the date as the index, then count the rows (reviews) per date for each brand
data_sale = data.set_index("日期")
data_sale_ace3 = data_sale[data_sale["品牌"]=="ace3"].groupby(["日期"]).count()
data_sale_neo9 = data_sale[data_sale["品牌"]=="neo9"].groupby(["日期"]).count()
data_sale_redmik70 = data_sale[data_sale["品牌"]=="redmik70"].groupby(["日期"]).count()

def show_sale(data,title):
    fig,ax = plt.subplots(figsize=(14,7),dpi=200)
    # All columns are fully populated, so each holds the same per-date count; plot the first one
    ax.plot(data.iloc[:,0],label="Count")
    plt.legend(fontsize=20)
    plt.title(title,fontsize=20)
    tick_spacing = 10        # change tick_spacing to adjust the density of the x-axis ticks
    ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing ))
    plt.xticks(rotation=30)
    plt.show()

show_sale(data_sale_ace3,"ace3抽样从2024-04-17到2024-11-1的销量时序图")
show_sale(data_sale_neo9,"neo9抽样从2024-05-02到2024-11-1的销量时序图")
show_sale(data_sale_redmik70,"redmik70抽样从2024-05-23到2024-11-1的销量时序图")

[Figures: daily sales time series (review counts per date) for ace3, neo9, and redmik70]
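
The per-date counts above are indexed by date strings and contain only the dates that have at least one review. A minimal sketch of turning such dates into a regular daily series with a `DatetimeIndex`, which time-series models generally expect, is shown below. The toy dates are hypothetical; in the notebook the input would be the 日期 column of one brand.

```python
import pandas as pd

# Hypothetical review dates for one brand, for illustration only
dates = pd.Series(["2024-10-01", "2024-10-01", "2024-10-03", "2024-10-06"])
daily = pd.to_datetime(dates).value_counts().sort_index()
# Reindex to a regular daily frequency; days without reviews get a count of 0
daily = daily.asfreq("D", fill_value=0)
daily.name = "Count"
print(daily)
```

With an explicit daily frequency, gaps become zeros instead of being skipped, and date-based arguments such as `start="2024-10-12"` refer to calendar days rather than to row positions.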

5.2 Sales Forecasting

# Extract the per-date count series for each of the three phones
data_sale_redmik70_all = pd.DataFrame(data_sale_redmik70.iloc[:,0])
data_sale_redmik70_all.columns = ["Count"]

data_sale_ace3_all = pd.DataFrame(data_sale_ace3.iloc[:,0])
data_sale_ace3_all.columns = ["Count"]


data_sale_neo9_all = pd.DataFrame(data_sale_neo9.iloc[:,0])
data_sale_neo9_all.columns = ["Count"]

# Validation: fit a SARIMA(1,1,1)(1,1,1,12) model on the full series and plot its
# dynamic prediction between time_begin and time_end against the observed counts
def timeseries_valid(data_sou,time_begin,time_end,title):
    data = data_sou.copy()
    results = sm.tsa.statespace.SARIMAX(data.Count, order=(1,1,1),seasonal_order=(1,1,1,12)).fit()
    data['SARIMA'] = results.predict(start=time_begin,end=time_end,dynamic=True)

    fig,ax = plt.subplots(figsize=(14,7))
    ax.plot(data['Count'], label='all')
    ax.plot(data['SARIMA'], label='SARIMA')

    plt.legend(fontsize=20)
    plt.title(title,fontsize=20)
    tick_spacing = 10        # change tick_spacing to adjust the density of the x-axis ticks
    ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing ))
    plt.xticks(rotation=30)
    plt.show()

timeseries_valid(data_sale_redmik70_all,"2024-10-12","2024-11-01","redmik70时序预测验证图")

timeseries_valid(data_sale_ace3_all,"2024-10-08","2024-11-01","ace3时序预测验证图")

timeseries_valid(data_sale_neo9_all,"2024-10-08","2024-11-01","neo9时序预测验证图")

d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
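The ValueWarning above appears because the index holds date strings with gaps, so statsmodels cannot infer a frequency. One possible way to give the model a usable index is to convert the labels to timestamps and insert the missing calendar days. This is a sketch under assumptions: the helper name to_daily_series is hypothetical, the toy values are made up, and filling missing days with 0 is a modeling choice that may not suit this data.

```python
import pandas as pd

def to_daily_series(frame, column="Count"):
    """Return `column` as a float Series on a gap-free daily DatetimeIndex."""
    series = frame[column].astype(float).copy()
    series.index = pd.to_datetime(series.index)   # strings -> timestamps
    series = series.sort_index()
    # asfreq("D") inserts the missing calendar days; fill_value=0.0 assumes
    # that a day with no record means a count of zero.
    return series.asfreq("D", fill_value=0.0)

# Toy frame shaped like the per-phone tables in this section (values made up).
demo = pd.DataFrame(
    {"Count": [3, 5, 2]},
    index=pd.Index(["2024-10-01", "2024-10-02", "2024-10-05"], name="日期"),
)
daily = to_daily_series(demo)
print(daily)
```

The resulting series carries a daily frequency, so passing it to SARIMAX in place of data.Count should avoid the frequency warning. With daily rows, a seasonal period of 7 may be a more natural starting point than 12, though that would need to be checked against the data.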

# Candidate monthly labels for the forecast axis (not used by the function below)
pre_index = ["2024-12-01","2025-01-01","2025-02-01","2025-03-01","2025-04-01","2025-05-01","2025-06-01","2025-07-01","2025-08-01","2025-09-01","2025-10-01","2025-11-01"]
#pre_index = [pd.to_datetime(i) for i in pre_index]

# Forecast helper: fit SARIMA on the full series and plot a short out-of-sample forecast with its confidence band
def timeseries_predict(data_sou,time_begin,time_end,title):
    data = data_sou.copy()
    results = sm.tsa.statespace.SARIMAX(data.Count, order=(1,1,1),seasonal_order=(1,1,1,12)).fit()
    # In-sample dynamic prediction carried over from the validation step; it is not plotted here
    data['SARIMA'] = results.predict(start=time_begin,end=time_end,dynamic=True)

    # Forecast 4 steps past the last observation (steps counts rows of the index, not months)
    forecast = results.get_forecast(steps=4)
    forecast_ci = forecast.conf_int()
    print(data.index)
    ax = data['Count'].plot(label='Observed',figsize=(18, 10))
    forecast.predicted_mean.plot(ax=ax, label='Forecast', alpha=0.7)
    #forecast_ci.index=["2024-12-01","2025-01-01","2025-02-01","2025-03-01","2025-04-01","2025-05-01","2025-06-01","2025-07-01","2025-08-01"]
    # Shade the interval between the lower and upper confidence bounds
    ax.fill_between(forecast_ci.index,
                    forecast_ci.iloc[:, 0],
                    forecast_ci.iloc[:, 1], color='k', alpha=0.2)
    
    ax.set_xlabel('Date')
    ax.set_ylabel('Sales')
    ax.set_title(title,fontsize=20)
    #tick_spacing = 1        # change tick_spacing to adjust the density of x-axis ticks
    #ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing ))
    plt.xticks(rotation=30)
    plt.legend(fontsize=20)
    plt.show()
    


timeseries_predict(data_sale_redmik70_all,"2024-10-12","2024-11-01","RedmiK70 time-series forecast")

timeseries_predict(data_sale_ace3_all,"2024-10-08","2024-11-01","Ace3 time-series forecast")

timeseries_predict(data_sale_neo9_all,"2024-10-08","2024-11-01","Neo9 time-series forecast")

d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  return get_prediction_index(
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
  return get_prediction_index(


Index(['2024-05-23', '2024-05-28', '2024-07-05', '2024-07-17', '2024-07-29',
       '2024-08-10', '2024-08-17', '2024-08-20', '2024-08-22', '2024-08-25',
       '2024-08-28', '2024-08-29', '2024-08-31', '2024-09-01', '2024-09-02',
       '2024-09-03', '2024-09-04', '2024-09-05', '2024-09-22', '2024-09-23',
       '2024-09-24', '2024-09-25', '2024-10-02', '2024-10-03', '2024-10-04',
       '2024-10-05', '2024-10-06', '2024-10-07', '2024-10-08', '2024-10-09',
       '2024-10-10', '2024-10-11', '2024-10-12', '2024-10-13', '2024-10-14',
       '2024-10-15', '2024-10-16', '2024-10-17', '2024-10-18', '2024-10-19',
       '2024-10-20', '2024-10-21', '2024-10-22', '2024-10-23', '2024-10-24',
       '2024-10-25', '2024-10-26', '2024-10-27', '2024-10-28', '2024-10-29',
       '2024-10-30', '2024-10-31', '2024-11-01'],
      dtype='object', name='日期')

Index(['2024-04-17', '2024-05-07', '2024-05-09', '2024-05-10', '2024-05-12',
       '2024-05-21', '2024-05-22', '2024-05-23', '2024-05-24', '2024-05-25',
       ...
       '2024-10-23', '2024-10-24', '2024-10-25', '2024-10-26', '2024-10-27',
       '2024-10-28', '2024-10-29', '2024-10-30', '2024-10-31', '2024-11-01'],
      dtype='object', name='日期', length=148)

Index(['2024-05-02', '2024-05-03', '2024-05-04', '2024-05-05', '2024-05-06',
       '2024-05-07', '2024-05-08', '2024-05-09', '2024-05-10', '2024-05-12',
       ...
       '2024-10-23', '2024-10-24', '2024-10-25', '2024-10-26', '2024-10-27',
       '2024-10-28', '2024-10-29', '2024-10-30', '2024-10-31', '2024-11-01'],
      dtype='object', name='日期', length=176)
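The printed indexes have dtype='object', and the warnings state that the forecast is returned with an integer index, so the forecast points carry no dates on the plot. A minimal sketch of relabeling them follows. It assumes each forecast step is one calendar day after the last observation, which is only an approximation here because the observed dates have gaps. The helper name label_forecast_with_dates and the forecast values are made up for illustration.

```python
import pandas as pd

def label_forecast_with_dates(last_observed, values, freq="D"):
    """Attach the dates that follow `last_observed` to a list of forecast values."""
    # Generate one extra period and drop the first so labels start after the last observation.
    index = pd.date_range(start=pd.Timestamp(last_observed),
                          periods=len(values) + 1, freq=freq)[1:]
    return pd.Series(list(values), index=index, name="Forecast")

# "2024-11-01" is the last label in the printed indexes; the four values are
# placeholders standing in for forecast.predicted_mean.
labeled = label_forecast_with_dates("2024-11-01", [4.0, 5.0, 6.0, 7.0])
print(labeled)
```

The same index could be assigned to the confidence-interval frame before calling fill_between, so that the band and the forecast line share one date axis.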
