Analysis of Phone Reviews on the JD.com Platform

1 Data Processing

1.1 Data Preparation

import pandas as pd
from random import choice  # used for random imputation of categorical values
import numpy as np
from sklearn.preprocessing import StandardScaler,MinMaxScaler  # feature scaling
from sklearn.cluster import KMeans  # K-means clustering
from sklearn.manifold import TSNE  # t-SNE dimensionality reduction (not a time-series tool)
from sklearn.metrics import silhouette_score
from collections import Counter  # word-frequency counting
from wordcloud import WordCloud  # word-cloud plotting
import snownlp  # sentiment analysis
import statsmodels.api as sm  # regression and time-series modelling
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
plt.rcParams['font.sans-serif'] = ['SimHei']   # display Chinese characters correctly
plt.rcParams['axes.unicode_minus'] = False    # display minus signs correctly
import jieba
# mlxtend for association-rule mining
from mlxtend.frequent_patterns import apriori, association_rules
# read the three review datasets with pandas
neo9 = pd.read_excel("IQOONeo9.xlsx")
ace3 = pd.read_excel("一加Ace3.xlsx")
redmik70 = pd.read_excel("RedmiK70.xlsx")

1.2 Data Cleaning

# add a 评论长度 column (review length in characters) for cleaning
neo9["评论长度"] = [len(_) for _ in neo9["评价内容"]]
ace3["评论长度"] = [len(_) for _ in ace3["评价内容"]]
redmik70["评论长度"] = [len(_) for _ in redmik70["评价内容"]]

# count reviews shorter than 10 characters (candidates for removal)
print("neo9短评论数量为:"+str(len(neo9[neo9["评论长度"]<10])))

print("ace3短评论数量为:"+str(len(ace3[ace3["评论长度"]<10])))

print("redmik70短评论数量为:"+str(len(redmik70[redmik70["评论长度"]<10])))
# all three counts are 0, so nothing needs to be dropped (a removal sketch follows the preview below)
neo9.head()
neo9短评论数量为:0
ace3短评论数量为:0
redmik70短评论数量为:0
是否会员评价内容颜色存储空间点赞数评论数日期地区评论长度
0PLUS会员iQOO Neo9 设计时尚大气,金属质感边框搭配深邃的黑色背板,尽显高端品质。这款手机搭载...格斗黑16GB+256GB10518.02024-09-16河南217
1PLUS会员高颜值,高品质,一看就很上档次,非常喜欢!vivoiQOONeo9性能卓越,运行流畅,拍照效...格斗黑16GB+256GB552.02024-09-08安徽77
2PLUS会员手机太棒了,两千出头的价位搭配骁龙8gen2的处理器,玩游戏的体验很好,游戏不卡顿的同时手机...格斗黑12GB+256GB382.02024-09-15广东82
3PLUS会员高颜值,高品质,一分钱一分货,材质外观和质量一看就很上档次,非常喜欢!vivoiQOONeo...格斗黑12GB+256GB221.02024-09-07安徽93
4PLUS会员外观不错,大小也很适合我\n外形外观:好\n屏幕音效:很舒服\n拍照效果:很不错\n运行速度...航海蓝12GB+256GB50.02024-10-23浙江76
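All three short-review counts are 0, so no rows are removed here. Had any been found, the filter would be a one-line boolean selection per dataset; a minimal sketch (not executed, since there is nothing to drop):

neo9 = neo9[neo9["评论长度"] >= 10].reset_index(drop=True)
ace3 = ace3[ace3["评论长度"] >= 10].reset_index(drop=True)
redmik70 = redmik70[redmik70["评论长度"] >= 10].reset_index(drop=True)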

1.3 Missing-Value Imputation

# show the missing-value counts per column

print('{:*^60}'.format('neo9的数据缺失情况如下'))
print(neo9.isnull().sum())
print('{:*^60}'.format('ace3的数据缺失情况如下'))
print(ace3.isnull().sum())
print('{:*^60}'.format('redmik70的数据缺失情况如下'))
print(redmik70.isnull().sum())
***********************neo9的数据缺失情况如下************************
是否会员    201
评价内容      0
颜色        0
存储空间      0
点赞数       0
评论数      29
日期       29
地区       29
评论长度      0
dtype: int64
***********************ace3的数据缺失情况如下************************
是否会员    273
评价内容      0
颜色        0
存储空间      0
日期        0
地区        0
点赞数       0
评论数       0
评论长度      0
dtype: int64
*********************redmik70的数据缺失情况如下**********************
是否会员    131
评价内容      0
颜色       60
存储空间     60
点赞数       0
评论数       0
日期       64
地区       64
评论长度      0
dtype: int64

From the output, neo9 has missing values in 评论数, 日期 and 地区, and redmik70 has missing values in 颜色, 存储空间, 日期 and 地区. All three datasets also have missing 是否会员 values, which are not imputed here; they are handled later when that column is converted to 0/1. Apart from that, ace3 has no missing values.
颜色, 存储空间, 日期 and 地区 are categorical attributes and can be filled with a value randomly drawn from the column's non-null values; 评论数 is numeric and is filled with the column mean.
Note: imputation must stay within each brand's own data (a per-brand groupby version is sketched right below).
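If the three datasets were merged before imputation, the same per-brand logic could be expressed with a groupby so that each brand is filled only from its own rows. A minimal sketch under that assumption (the helper name fill_within_brand is ours):

def fill_within_brand(df):
    # fill one brand's missing values: a random non-null value for categorical
    # columns, the column mean for the numeric 评论数 column
    out = df.copy()
    for col in ["颜色", "存储空间", "日期", "地区"]:
        pool = out[col].dropna().unique()
        if len(pool) > 0 and out[col].isna().any():
            out[col] = out[col].fillna(choice(list(pool)))
    out["评论数"] = out["评论数"].fillna(out["评论数"].mean())
    return out

# merged = pd.concat([neo9, ace3, redmik70]) with a 品牌 column already added
# merged = merged.groupby("品牌", group_keys=False).apply(fill_within_brand)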

# helper: fill a categorical column with a random non-null value
def fillna_fenlei(data):
    # return a random value drawn from the column's non-null unique values
    return choice(list(data.dropna().unique()))

neo9["日期"].fillna(fillna_fenlei(neo9["日期"]),inplace=True)

neo9["地区"].fillna(fillna_fenlei(neo9["地区"]),inplace=True)

neo9["评论数"].fillna(neo9["评论数"].mean(),inplace=True)

print('{:*^60}'.format('已完成对日期、地区、评论数填充,现neo9的数据缺失情况如下'))
print(neo9.isnull().sum())

# fill redmik70 the same way: random non-null values for the categorical columns, the column mean for 评论数

redmik70["日期"].fillna(fillna_fenlei(redmik70["日期"]),inplace=True)

redmik70["地区"].fillna(fillna_fenlei(redmik70["地区"]),inplace=True)

redmik70["评论数"].fillna(redmik70["评论数"].mean(),inplace=True)

redmik70["存储空间"].fillna(fillna_fenlei(redmik70["存储空间"]),inplace=True)

redmik70["颜色"].fillna(fillna_fenlei(redmik70["颜色"]),inplace=True)

print('{:*^60}'.format('已完成对日期、地区、评论数、存储空间和颜色填充,现redmik70的数据缺失情况如下'))
print(redmik70.isnull().sum())
***************已完成对日期、地区、评论数填充,现neo9的数据缺失情况如下***************
是否会员    201
评价内容      0
颜色        0
存储空间      0
点赞数       0
评论数       0
日期        0
地区        0
评论长度      0
dtype: int64
*********已完成对日期、地区、评论数、存储空间和颜色填充,现redmik70的数据缺失情况如下*********
是否会员    131
评价内容      0
颜色        0
存储空间      0
点赞数       0
评论数       0
日期        0
地区        0
评论长度      0
dtype: int64

1.4 Data Merging

# tag each dataset with its brand
neo9["品牌"]="neo9"
ace3["品牌"]="ace3"
redmik70["品牌"]="redmik70"

# concatenate the three datasets row-wise
data= pd.concat([neo9,ace3,redmik70])

# convert the 是否会员 column to 0/1 (missing values become 0)
data["是否会员"] = [1 if(i=="PLUS会员") else 0 for i in data["是否会员"]]

# parse RAM and ROM (in GB) out of 存储空间 strings such as "16GB+256GB";
# note that a TB-denominated ROM such as "1TB" would parse as 1 here; a more defensive parser is sketched after the preview below
data["RAM"] = [int(_.split("+")[0][:-2]) for _ in data["存储空间"]]

data["ROM"] = [int(_.split("+")[1][:-2]) for _ in data["存储空间"]]

data.head()
是否会员评价内容颜色存储空间点赞数评论数日期地区评论长度品牌RAMROM
01iQOO Neo9 设计时尚大气,金属质感边框搭配深邃的黑色背板,尽显高端品质。这款手机搭载...格斗黑16GB+256GB10518.02024-09-16河南217neo916256
11高颜值,高品质,一看就很上档次,非常喜欢!vivoiQOONeo9性能卓越,运行流畅,拍照效...格斗黑16GB+256GB552.02024-09-08安徽77neo916256
21手机太棒了,两千出头的价位搭配骁龙8gen2的处理器,玩游戏的体验很好,游戏不卡顿的同时手机...格斗黑12GB+256GB382.02024-09-15广东82neo912256
31高颜值,高品质,一分钱一分货,材质外观和质量一看就很上档次,非常喜欢!vivoiQOONeo...格斗黑12GB+256GB221.02024-09-07安徽93neo912256
41外观不错,大小也很适合我\n外形外观:好\n屏幕音效:很舒服\n拍照效果:很不错\n运行速度...航海蓝12GB+256GB50.02024-10-23浙江76neo912256
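The describe() output further below shows a minimum ROM of 1, which suggests at least one 存储空间 value uses a TB unit (for example "12GB+1TB") and is truncated to 1 by the [:-2] slice above. A slightly more defensive parser would convert TB to GB; a minimal sketch (the helper name and the TB example are assumptions on our part):

def parse_capacity(token):
    # convert a capacity token such as "256GB" or "1TB" to an integer number of GB
    token = token.strip().upper()
    if token.endswith("TB"):
        return int(token[:-2]) * 1024
    if token.endswith("GB"):
        return int(token[:-2])
    raise ValueError("unrecognised capacity: " + token)

# data["RAM"] = [parse_capacity(s.split("+")[0]) for s in data["存储空间"]]
# data["ROM"] = [parse_capacity(s.split("+")[1]) for s in data["存储空间"]]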

1.5 Word Segmentation

# load the stop-word list
stopwords = [line.strip() for line in open('停用词.txt', 'r', encoding='utf-8').readlines()]
# helper: take one review's jieba token list and drop stop words, returning the filtered list
# (whitespace tokens are kept at this stage; an alternative that drops them is sketched after the preview below)
def stopwords_process(data):
    data_filtered = []
    for i in data:
        if(i not in stopwords):
            data_filtered.append(i)
    return data_filtered

# add a 分词 column holding the stop-word-filtered jieba tokens of each 评价内容
data["分词"] = [stopwords_process(list(jieba.cut(x))) for x in list(data['评价内容'])]

data.head()
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\12700\AppData\Local\Temp\jieba.cache
Loading model cost 0.872 seconds.
Prefix dict has been built successfully.
是否会员评价内容颜色存储空间点赞数评论数日期地区评论长度品牌RAMROM分词
01iQOO Neo9 设计时尚大气,金属质感边框搭配深邃的黑色背板,尽显高端品质。这款手机搭载...格斗黑16GB+256GB10518.02024-09-16河南217neo916256[iQOO, , Neo9, , 设计, 时尚, 大气, 金属, 质感, 边框, 搭配,...
11高颜值,高品质,一看就很上档次,非常喜欢!vivoiQOONeo9性能卓越,运行流畅,拍照效...格斗黑16GB+256GB552.02024-09-08安徽77neo916256[高颜值, 高品质, 一看, 上档次, 喜欢, vivoiQOONeo9, 性能, 卓越, ...
21手机太棒了,两千出头的价位搭配骁龙8gen2的处理器,玩游戏的体验很好,游戏不卡顿的同时手机...格斗黑12GB+256GB382.02024-09-15广东82neo912256[手机, 太棒了, 两千, 出头, 价位, 搭配, 骁龙, 8gen2, 处理器, 玩游戏,...
31高颜值,高品质,一分钱一分货,材质外观和质量一看就很上档次,非常喜欢!vivoiQOONeo...格斗黑12GB+256GB221.02024-09-07安徽93neo912256[高颜值, 高品质, 一分钱, 一分货, 材质, 外观, 质量, 一看, 上档次, 喜欢, ...
41外观不错,大小也很适合我\n外形外观:好\n屏幕音效:很舒服\n拍照效果:很不错\n运行速度...航海蓝12GB+256GB50.02024-10-23浙江76neo912256[外观, 不错, 大小, 适合, \n, 外形, 外观, \n, 屏幕, 音效, 舒服, \...
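The 分词 column above still contains whitespace tokens such as " " and "\n", which Section 3.1 later has to zero out by hand in the word-frequency counter. They could instead be dropped during segmentation; a minimal sketch of an alternative filter:

def stopwords_process(tokens):
    # drop stop words plus pure-whitespace and underscore-only tokens in one pass
    return [t for t in tokens
            if t not in stopwords and t.strip() and t.strip("_")]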
data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 3000 entries, 0 to 999
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   是否会员    3000 non-null   int64  
 1   评价内容    3000 non-null   object 
 2   颜色      3000 non-null   object 
 3   存储空间    3000 non-null   object 
 4   点赞数     3000 non-null   int64  
 5   评论数     3000 non-null   float64
 6   日期      3000 non-null   object 
 7   地区      3000 non-null   object 
 8   评论长度    3000 non-null   int64  
 9   品牌      3000 non-null   object 
 10  RAM     3000 non-null   int64  
 11  ROM     3000 non-null   int64  
 12  分词      3000 non-null   object 
dtypes: float64(1), int64(5), object(7)
memory usage: 328.1+ KB
data.describe()
             是否会员         点赞数         评论数        评论长度          RAM          ROM
count  3000.000000  3000.000000  3000.000000  3000.000000  3000.000000  3000.000000
mean      0.798333     0.888333     0.508779    94.605333    13.337333   307.234000
std       0.401311     6.663082     2.133196    57.980113     1.887342   122.173347
min       0.000000     0.000000     0.000000    10.000000    12.000000     1.000000
25%       1.000000     0.000000     0.000000    63.000000    12.000000   256.000000
50%       1.000000     0.000000     0.000000    77.000000    12.000000   256.000000
75%       1.000000     0.000000     1.000000   104.000000    16.000000   256.000000
max       1.000000   249.000000   106.000000   533.000000    16.000000   512.000000

2 Sales Analysis

2.1 Analysis by Region

# bar chart of review counts by region, split by brand
# dpi sets the resolution, figsize the canvas size
plt.figure(dpi=300,figsize=(24,8))
# fontsize sets the tick-label size
plt.xticks(fontsize=10)
plt.title("地区分布图")
sns.countplot(x=data['地区'], hue=data['品牌'])
<Axes: title={'center': '地区分布图'}, xlabel='地区', ylabel='count'>

[Figure: review counts by region and brand]

# region share of reviews, overall and per brand
data_area = data.groupby(["地区"])["地区"].count()
data_area_neo9 = data[data["品牌"]=="neo9"].groupby(["地区"])["地区"].count()
data_area_ace = data[data["品牌"]=="ace3"].groupby(["地区"])["地区"].count()
data_area_redmik70 = data[data["品牌"]=="redmik70"].groupby(["地区"])["地区"].count()

# overall figure resolution and size
plt.figure(dpi=100,figsize=(18,18))
plt.xticks(fontsize=15)

# panel 1: overall region distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_area,labels=data_area.index,autopct='%3.1f%%')

# panel 2: neo9
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_area_neo9,labels=data_area_neo9.index,autopct='%3.1f%%')

# panel 3: ace3
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_area_ace,labels=data_area_ace.index,autopct='%3.1f%%')

# panel 4: redmik70
plt.subplot(2,2,4)
plt.title("redmi70分布")
plt.pie(data_area_redmik70,labels=data_area_redmik70.index,autopct='%3.1f%%')
plt.show()

C:\Users\12700\AppData\Local\Temp\ipykernel_36980\36189799.py:12: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
  plt.subplot(2,2,1)

[Figure: region-share pie charts for all reviews, neo9, ace3 and redmik70]
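Sections 2.2 and 2.3 repeat this four-panel pie layout for colour and storage. The duplication could be factored into a small helper; a minimal sketch (the function name pie_grid is ours):

def pie_grid(counts_by_panel):
    # draw a 2x2 grid of pie charts from {panel title: value-count Series}
    plt.figure(dpi=100, figsize=(18, 18))
    for i, (name, counts) in enumerate(counts_by_panel.items(), start=1):
        plt.subplot(2, 2, i)
        plt.title(name)
        plt.pie(counts, labels=counts.index, autopct='%3.1f%%')
    plt.show()

# pie_grid({"总体分布": data_area, "neo9分布": data_area_neo9,
#           "ace3分布": data_area_ace, "redmik70分布": data_area_redmik70})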

2.2 Analysis by Colour

# colour share of reviews, overall and per brand
data_color = data.groupby(["颜色"])["颜色"].count()
data_color_neo9 = data[data["品牌"]=="neo9"].groupby(["颜色"])["颜色"].count()
data_color_ace = data[data["品牌"]=="ace3"].groupby(["颜色"])["颜色"].count()
data_color_redmik70 = data[data["品牌"]=="redmik70"].groupby(["颜色"])["颜色"].count()

# overall figure resolution and size
plt.figure(dpi=100,figsize=(18,18))
plt.xticks(fontsize=15)

# panel 1: overall colour distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_color,labels=data_color.index,autopct='%3.1f%%')

# panel 2: neo9
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_color_neo9,labels=data_color_neo9.index,autopct='%3.1f%%')

# panel 3: ace3
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_color_ace,labels=data_color_ace.index,autopct='%3.1f%%')

# panel 4: redmik70
plt.subplot(2,2,4)
plt.title("redmi70分布")
plt.pie(data_color_redmik70,labels=data_color_redmik70.index,autopct='%3.1f%%')
plt.show()
C:\Users\12700\AppData\Local\Temp\ipykernel_36980\1634177627.py:12: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
  plt.subplot(2,2,1)

[Figure: colour-share pie charts for all reviews, neo9, ace3 and redmik70]

2.3 Analysis by Storage Configuration

# storage-configuration share of reviews, overall and per brand
data_rom = data.groupby(["存储空间"])["存储空间"].count()
data_rom_neo9 = data[data["品牌"]=="neo9"].groupby(["存储空间"])["存储空间"].count()
data_rom_ace = data[data["品牌"]=="ace3"].groupby(["存储空间"])["存储空间"].count()
data_rom_redmik70 = data[data["品牌"]=="redmik70"].groupby(["存储空间"])["存储空间"].count()

# overall figure resolution and size
plt.figure(dpi=100,figsize=(18,18))
plt.xticks(fontsize=15)

# panel 1: overall storage distribution
plt.subplot(2,2,1)
plt.title("总体分布")
plt.pie(data_rom,labels=data_rom.index,autopct='%3.1f%%')

# panel 2: neo9
plt.subplot(2,2,2)
plt.title("neo9分布")
plt.pie(data_rom_neo9,labels=data_rom_neo9.index,autopct='%3.1f%%')

# panel 3: ace3
plt.subplot(2,2,3)
plt.title("ace3分布")
plt.pie(data_rom_ace,labels=data_rom_ace.index,autopct='%3.1f%%')

# panel 4: redmik70
plt.subplot(2,2,4)
plt.title("redmi70分布")
plt.pie(data_rom_redmik70,labels=data_rom_redmik70.index,autopct='%3.1f%%')
plt.show()
C:\Users\12700\AppData\Local\Temp\ipykernel_36980\2901513871.py:12: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
  plt.subplot(2,2,1)

[Figure: storage-configuration share pie charts for all reviews, neo9, ace3 and redmik70]

# bar chart of review counts by storage configuration, split by brand
# dpi sets the resolution, figsize the canvas size
plt.figure(dpi=300,figsize=(24,8))
# fontsize sets the tick-label size
plt.xticks(fontsize=10)
plt.title("存储空间分布图")
sns.countplot(x=data['存储空间'], hue=data['品牌'])
plt.show()

[Figure: review counts by storage configuration and brand]

2.4 Correlation, Clustering and Association Analysis

1. Pairwise correlations between the numeric variables

# correlation analysis of the numeric columns
print('{:*^60}'.format('相关性分析'))
print(data[["点赞数","评论数","评论长度","RAM","ROM"]].corr().round(2).T)  # print the (symmetric) correlation matrix
sns.heatmap(data[["点赞数","评论数","评论长度","RAM","ROM"]].corr().round(2),cmap="Reds",annot=True)
plt.show()
***************************相关性分析****************************
       点赞数   评论数  评论长度   RAM   ROM
点赞数   1.00  0.80  0.01  0.04  0.01
评论数   0.80  1.00 -0.01  0.05  0.03
评论长度  0.01 -0.01  1.00  0.01 -0.02
RAM   0.04  0.05  0.01  1.00  0.58
ROM   0.01  0.03 -0.02  0.58  1.00

[Figure: correlation heatmap of 点赞数, 评论数, 评论长度, RAM and ROM]

2. K-means clustering

data_onehot_part = data[["是否会员","颜色","RAM","ROM","点赞数","评论数","评论长度","地区","品牌"]].copy()  # .copy() avoids the SettingWithCopyWarning
scaler = MinMaxScaler()
# min-max scale the numeric columns so they are comparable for clustering
data_onehot_part[["RAM","ROM","点赞数","评论数","评论长度"]] = scaler.fit_transform(data_onehot_part[["RAM","ROM","点赞数","评论数","评论长度"]])
data_onehot_part = pd.get_dummies(data_onehot_part)
data_onehot_part.head()
是否会员RAMROM点赞数评论数评论长度颜色_墨羽颜色_星曜白颜色_星辰黑颜色_晴雪...地区_西藏地区_贵州地区_辽宁地区_重庆地区_陕西地区_青海地区_黑龙江品牌_ace3品牌_neo9品牌_redmik70
011.00.4990220.4216870.1698110.395793FalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse
111.00.4990220.2208840.0188680.128107FalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse
210.00.4990220.1526100.0188680.137667FalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse
310.00.4990220.0883530.0094340.158700FalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse
410.00.4990220.0200800.0000000.126195FalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse

5 rows × 53 columns

def kmeans_process(data):
    # pick the best K by the average silhouette score
    score_list = list()  # silhouette score for each K
    silhouette_int = -1  # running best average silhouette score
    for n_clusters in range(2, 8):  # try K from 2 to 7
        model_kmeans = KMeans(n_clusters=n_clusters, n_init=10)  # n_init set explicitly to silence the FutureWarning
        labels_tmp = model_kmeans.fit_predict(data)  # fit and assign cluster labels
        silhouette_tmp = silhouette_score(data, labels_tmp)  # average silhouette score for this K
        if silhouette_tmp > silhouette_int:  # keep the best-scoring K so far
            best_k = n_clusters  # best K
            silhouette_int = silhouette_tmp  # best silhouette score
            best_kmeans = model_kmeans  # best fitted model
            cluster_labels_k = labels_tmp  # cluster labels for the best K
        score_list.append([n_clusters, silhouette_tmp])  # record every K and its score
    print('{:*^60}'.format('K值对应的轮廓系数:'))
    print(np.array(score_list))  # scores for every K
    print('最优的K值是:{0} \n对应的轮廓系数是:{1}'.format(best_k, silhouette_int))

    return cluster_labels_k

cluster_labels_k = kmeans_process(data_onehot_part)


*************************K值对应的轮廓系数:*************************
[[2.         0.17536367]
 [3.         0.23756583]
 [4.         0.22252055]
 [5.         0.19257643]
 [6.         0.16658535]
 [7.         0.19443532]]
最优的K值是:3 
对应的轮廓系数是:0.2375658338076598
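The cluster labels returned by kmeans_process are not used further in this write-up. To interpret the three clusters one could attach the labels to the merged data and summarise each group; a minimal sketch:

data_clustered = data.copy()
data_clustered["聚类标签"] = cluster_labels_k

# numeric profile and brand mix of each cluster
print(data_clustered.groupby("聚类标签")[["点赞数", "评论数", "评论长度", "RAM", "ROM"]].mean().round(2))
print(data_clustered.groupby("聚类标签")["品牌"].value_counts(normalize=True).round(2))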

3. Association-rule mining

# select the usable variables and binarise the like/comment counts for association mining
data_onehot_all = data[["是否会员","颜色","存储空间","点赞数","评论数","地区","品牌"]].copy()  # .copy() avoids the SettingWithCopyWarning
data_onehot_all["点赞数"] = [1 if(_>0) else 0 for _ in data_onehot_all["点赞数"]]
data_onehot_all["评论数"] = [1 if(_>0) else 0 for _ in data_onehot_all["评论数"]]
data_onehot_all = pd.get_dummies(data_onehot_all)
data_onehot_all.head()
是否会员点赞数评论数颜色_墨羽颜色_星曜白颜色_星辰黑颜色_晴雪颜色_月海蓝颜色_格斗黑颜色_浅茄紫...地区_西藏地区_贵州地区_辽宁地区_重庆地区_陕西地区_青海地区_黑龙江品牌_ace3品牌_neo9品牌_redmik70
0111FalseFalseFalseFalseFalseTrueFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse
1111FalseFalseFalseFalseFalseTrueFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse
2111FalseFalseFalseFalseFalseTrueFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse
3111FalseFalseFalseFalseFalseTrueFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse
4110FalseFalseFalseFalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseTrueFalse

5 rows × 55 columns

# mine frequent itemsets with Apriori
frequent_itemsets = apriori(data_onehot_all, min_support=0.2, use_colnames=True)
# generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
# keep only rules with lift strictly greater than 1
high_lift_rules = rules[rules['lift'] > 1]
print(high_lift_rules)
                       antecedents                     consequents  \
0                (存储空间_12GB+256GB)                          (是否会员)   
1                           (是否会员)               (存储空间_12GB+256GB)   
2                        (品牌_neo9)                          (是否会员)   
3                           (是否会员)                       (品牌_neo9)   
4                    (品牌_redmik70)                          (是否会员)   
5                           (是否会员)                   (品牌_redmik70)   
6                        (品牌_neo9)                           (评论数)   
7                            (评论数)                       (品牌_neo9)   
8                (存储空间_12GB+256GB)                   (品牌_redmik70)   
9                    (品牌_redmik70)               (存储空间_12GB+256GB)   
10  (存储空间_12GB+256GB, 品牌_redmik70)                          (是否会员)   
11         (存储空间_12GB+256GB, 是否会员)                   (品牌_redmik70)   
12             (品牌_redmik70, 是否会员)               (存储空间_12GB+256GB)   
13               (存储空间_12GB+256GB)             (品牌_redmik70, 是否会员)   
14                   (品牌_redmik70)         (存储空间_12GB+256GB, 是否会员)   
15                          (是否会员)  (存储空间_12GB+256GB, 品牌_redmik70)   

    antecedent support  consequent support   support  confidence      lift  \
0             0.662333            0.798333  0.545000    0.822849  1.030708   
1             0.798333            0.662333  0.545000    0.682672  1.030708   
2             0.333333            0.798333  0.266333    0.799000  1.000835   
3             0.798333            0.333333  0.266333    0.333612  1.000835   
4             0.333333            0.798333  0.289667    0.869000  1.088518   
5             0.798333            0.333333  0.289667    0.362839  1.088518   
6             0.333333            0.430000  0.208000    0.624000  1.451163   
7             0.430000            0.333333  0.208000    0.483721  1.451163   
8             0.662333            0.333333  0.298333    0.450428  1.351283   
9             0.333333            0.662333  0.298333    0.895000  1.351283   
10            0.298333            0.798333  0.263667    0.883799  1.107055   
11            0.545000            0.333333  0.263667    0.483792  1.451376   
12            0.289667            0.662333  0.263667    0.910242  1.374295   
13            0.662333            0.289667  0.263667    0.398088  1.374295   
14            0.333333            0.545000  0.263667    0.791000  1.451376   
15            0.798333            0.298333  0.263667    0.330271  1.107055   

    leverage  conviction  zhangs_metric  
0   0.016237    1.138385       0.088232  
1   0.016237    1.064094       0.147734  
2   0.000222    1.003317       0.001252  
3   0.000222    1.000418       0.004137  
4   0.023556    1.539440       0.121979  
5   0.023556    1.046308       0.403237  
6   0.064667    1.515957       0.466346  
7   0.064667    1.291291       0.545434  
8   0.077556    1.213065       0.769880  
9   0.077556    3.215873       0.389944  
10  0.025497    1.735497       0.137818  
11  0.082000    1.291469       0.683514  
12  0.071811    3.761953       0.383418  
13  0.071811    1.180127       0.806578  
14  0.082000    2.177033       0.466498  
15  0.025497    1.047688       0.479516  


d:\anaconda\lib\site-packages\mlxtend\frequent_patterns\fpcommon.py:109: DeprecationWarning: DataFrames with non-bool types result in worse computationalperformance and their support might be discontinued in the future.Please use a DataFrame with bool type
  warnings.warn(
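The rules table is easier to scan when sorted by lift and confidence and trimmed to the key columns; a minimal sketch:

top_rules = (high_lift_rules
             .sort_values(["lift", "confidence"], ascending=False)
             [["antecedents", "consequents", "support", "confidence", "lift"]])
print(top_rules.head(10))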

3 Review Content Analysis

3.1 Word-Frequency Analysis

# helper: per-brand word-frequency statistics; returns the top-20 counts and the concatenated token string used later for the word cloud
def counter_words(data,group_name):
    if(group_name=="all"):
        data = data["分词"]
    else:
        data = data[data["品牌"]==group_name]["分词"]

    sentences_sum = []
    for _ in data:
        sentences_sum = sentences_sum +_
    word_freq = Counter(sentences_sum)

    word_freq[" "]=0
    word_freq["\n"]=0
    word_freq["__"]=0

    return pd.DataFrame(word_freq.most_common(20)),str(sentences_sum)


cipin_redmik70,words_redmik70 = counter_words(data,"redmik70")
cipin_ace3,words_ace3 = counter_words(data,"ace3")
cipin_neo9,words_neo9 = counter_words(data,"neo9")

# plot the top-20 word frequencies for each brand
plt.figure(dpi=300,figsize=(15,15))
plt.xticks(fontsize=10)

plt.subplot(3,1,1)
plt.bar(cipin_ace3[0], cipin_ace3[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('ace3词频')

plt.subplot(3,1,2)
plt.bar(cipin_neo9[0], cipin_neo9[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('neo9词频')

plt.subplot(3,1,3)
plt.bar(cipin_redmik70[0], cipin_redmik70[1])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('redmik70词频')
plt.show()
C:\Users\12700\AppData\Local\Temp\ipykernel_36980\426847262.py:5: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
  plt.subplot(3,1,1)

[Figure: top-20 word frequencies for ace3, neo9 and redmik70]

3.2 Word Clouds

plt.figure(dpi=200,figsize=(15,10))
plt.xticks(fontsize=10)

plt.subplot(3,1,1)
wc_neo9 = WordCloud(
    background_color='white',
    width=1500,
    height=1000,
    font_path=r"C:\Windows\Fonts\simhei.ttf",  # raw string avoids the invalid-escape warning
)
wc_neo9.generate_from_text(words_neo9)  # build the word cloud from the concatenated tokens
plt.title("neo9词云展示")
plt.imshow(wc_neo9)

plt.subplot(3,1,2)
wc_ace3 = WordCloud(
    background_color='white',
    width=1500,
    height=1000,
    font_path=r"C:\Windows\Fonts\simhei.ttf",  # raw string avoids the invalid-escape warning
)
wc_ace3.generate_from_text(words_ace3)  # build the word cloud from the concatenated tokens
plt.title("ace3词云展示")
plt.imshow(wc_ace3)

plt.subplot(3,1,3)
wc_redmik70 = WordCloud(
    background_color='white',
    width=1500,
    height=1000,
    font_path=r"C:\Windows\Fonts\simhei.ttf",  # raw string avoids the invalid-escape warning
)
wc_redmik70.generate_from_text(words_redmik70)  # build the word cloud from the concatenated tokens
plt.title("redmik70词云展示")
plt.imshow(wc_redmik70)

plt.show()
C:\Users\12700\AppData\Local\Temp\ipykernel_36980\786007750.py:4: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
  plt.subplot(3,1,1)

[Figure: word clouds for neo9, ace3 and redmik70]

4 Sentiment Analysis of Reviews

4.1 Sentiment Scores per Review

data["情感得分"] = [snownlp.SnowNLP(_).sentiments for _ in data["评价内容"]]
print(data["情感得分"])
0      1.000000
1      1.000000
2      0.999982
3      1.000000
4      1.000000
         ...   
995    0.999891
996    0.997800
997    1.000000
998    0.963768
999    1.000000
Name: 情感得分, Length: 3000, dtype: float64
print(data[data["品牌"]=="ace3"]["情感得分"].describe())

print(data[data["品牌"]=="redmik70"]["情感得分"].describe())

print(data[data["品牌"]=="neo9"]["情感得分"].describe())

plt.figure(dpi=100,figsize=(8,5))
plt.xticks(fontsize=10)

plt.subplot(1,3,1)
plt.hist(data[data["品牌"]=="ace3"]["情感得分"])
plt.title("ace3情感得分")

plt.subplot(1,3,2)
plt.hist(data[data["品牌"]=="redmik70"]["情感得分"])
plt.title("redmik70情感得分")

plt.subplot(1,3,3)
plt.hist(data[data["品牌"]=="neo9"]["情感得分"])
plt.title("neo9情感得分")
count    1000.000000
mean        0.958957
std         0.159753
min         0.000039
25%         0.999313
50%         0.999996
75%         1.000000
max         1.000000
Name: 情感得分, dtype: float64
count    1000.000000
mean        0.976659
std         0.115732
min         0.005078
25%         0.999846
50%         0.999999
75%         1.000000
max         1.000000
Name: 情感得分, dtype: float64
count    1000.000000
mean        0.964201
std         0.150796
min         0.006325
25%         0.999737
50%         0.999998
75%         1.000000
max         1.000000
Name: 情感得分, dtype: float64


C:\Users\12700\AppData\Local\Temp\ipykernel_36980\694268859.py:10: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
  plt.subplot(1,3,1)





Text(0.5, 1.0, 'neo9情感得分')

[Figure: sentiment-score histograms for ace3, redmik70 and neo9]
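SnowNLP returns the probability that a review is positive, and the histograms show the scores cluster near 1 for all three phones. To compare brands on a discrete scale, one could threshold the score; 0.5 is the conventional cut-off, though with such skewed scores a stricter value may discriminate better. A minimal sketch:

# label each review as positive or negative at the 0.5 threshold
data["情感倾向"] = np.where(data["情感得分"] >= 0.5, "正面", "负面")
print(data.groupby("品牌")["情感倾向"].value_counts(normalize=True).round(3))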

4.2 Factors Influencing Review Sentiment

Fit a regression with the review sentiment score as the dependent variable. The explanatory variables actually used below are the membership flag, RAM, ROM, likes, comment count and review length, taken (already min-max scaled) from the table built in Section 2.4.

data_onehot_part[["是否会员","RAM","ROM","点赞数","评论数","评论长度"]]
是否会员RAMROM点赞数评论数评论长度
011.00.4990220.4216870.1698110.395793
111.00.4990220.2208840.0188680.128107
210.00.4990220.1526100.0188680.137667
310.00.4990220.0883530.0094340.158700
410.00.4990220.0200800.0000000.126195
.....................
99510.00.4990220.0000000.0000000.137667
99610.00.4990220.0000000.0000000.122371
99710.00.4990220.0000000.0000000.166348
99810.00.4990220.0000000.0094340.116635
99910.00.4990220.0000000.0000000.208413

3000 rows × 6 columns

model = sm.OLS(data["情感得分"].astype(float), data_onehot_part[["是否会员","RAM","ROM","点赞数","评论数","评论长度"]].astype(float))  # build the OLS model (no constant term)
result = model.fit()  # fit the model
result.summary()  # model summary
                            OLS Regression Results
==============================================================================
Dep. Variable:                 情感得分   R-squared (uncentered):          0.919
Model:                          OLS   Adj. R-squared (uncentered):     0.919
Method:               Least Squares   F-statistic:                     5662.
Date:              Sun, 12 Jan 2025   Prob (F-statistic):               0.00
Time:                      15:50:27   Log-Likelihood:                -417.45
No. Observations:              3000   AIC:                             846.9
Df Residuals:                  2994   BIC:                             882.9
Df Model:                         6
Covariance Type:          nonrobust
==============================================================================
              coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
是否会员        0.3663      0.011     34.490      0.000       0.345       0.387
RAM          -0.1436      0.013    -11.130      0.000      -0.169      -0.118
ROM           0.8360      0.019     44.439      0.000       0.799       0.873
点赞数        -0.1597      0.318     -0.503      0.615      -0.783       0.463
评论数         0.8796      0.422      2.083      0.037       0.052       1.707
评论长度        0.9640      0.042     22.818      0.000       0.881       1.047
==============================================================================
Omnibus:            213.267   Durbin-Watson:          1.747
Prob(Omnibus):        0.000   Jarque-Bera (JB):     802.558
Skew:                -0.266   Prob(JB):           5.33e-175
Kurtosis:             5.477   Cond. No.                112.
==============================================================================


Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
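Note [1] of the summary points out that the regression has no intercept, so the reported R-squared is uncentered and not comparable with a standard fit. Adding a constant term with sm.add_constant is usually preferable; a minimal sketch:

X = sm.add_constant(data_onehot_part[["是否会员","RAM","ROM","点赞数","评论数","评论长度"]].astype(float))
model_with_const = sm.OLS(data["情感得分"].astype(float), X)
print(model_with_const.fit().summary())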

5 Phone Sales Forecasting

5.1 Sales Overview (daily review counts as a proxy)

# use 日期 as the index and count reviews per day for each brand (a proxy for daily sales)
data_sale = data.set_index("日期")
data_sale_ace3 = data_sale[data_sale["品牌"]=="ace3"].groupby(["日期"]).count()
data_sale_neo9 = data_sale[data_sale["品牌"]=="neo9"].groupby(["日期"]).count()
data_sale_redmik70 = data_sale[data_sale["品牌"]=="redmik70"].groupby(["日期"]).count()
def show_sale(data,title):
    # plot one brand's daily review counts
    fig,ax = plt.subplots(figsize=(14,7),dpi=200)
    ax.plot(data)
    plt.title(title,fontsize=20)
    tick_spacing = 10        # tick_spacing controls the x-axis tick density
    ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
    plt.xticks(rotation=30)
    plt.show()

show_sale(data_sale_ace3,"ace3抽样从2024-04-17到2024-11-1的销量时序图")
show_sale(data_sale_neo9,"neo9抽样从2024-05-02到2024-11-1的销量时序图")
show_sale(data_sale_redmik70,"redmik70抽样从2024-05-23到2024-11-1的销量时序图")


[Figure: ace3 daily review counts, 2024-04-17 to 2024-11-01]

[Figure: neo9 daily review counts, 2024-05-02 to 2024-11-01]

[Figure: redmik70 daily review counts, 2024-05-23 to 2024-11-01]

5.2 Sales Forecasting with SARIMA

# keep one daily-count column per brand and rename it to Count (validation below uses a dynamic in-sample prediction rather than an explicit train/test split)
data_sale_redmik70_all = pd.DataFrame(data_sale_redmik70.iloc[:,0])
data_sale_redmik70_all.columns = ["Count"]

data_sale_ace3_all = pd.DataFrame(data_sale_ace3.iloc[:,0])
data_sale_ace3_all.columns = ["Count"]


data_sale_neo9_all = pd.DataFrame(data_sale_neo9.iloc[:,0])
data_sale_neo9_all.columns = ["Count"]
# helper: fit a SARIMA(1,1,1)x(1,1,1,12) model and plot its dynamic in-sample prediction against the observed counts
def timeseries_valid(data_sou,time_begin,time_end,title):
    data = data_sou.copy()
    results = sm.tsa.statespace.SARIMAX(data.Count, order=(1,1,1),seasonal_order=(1,1,1,12)).fit()
    data['SARIMA'] = results.predict(start=time_begin,end=time_end,dynamic=True)

    fig,ax = plt.subplots(figsize=(14,7))
    ax.plot(data['Count'], label='all')
    ax.plot(data['SARIMA'], label='SARIMA')

    plt.legend(fontsize=20)  
    plt.title(title,fontsize=20)
    tick_spacing = 10        # tick_spacing controls the x-axis tick density
    ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing ))
    plt.xticks(rotation=30)
    plt.show()

timeseries_valid(data_sale_redmik70_all,"2024-10-12","2024-11-01","redmik70时序预测验证图")

timeseries_valid(data_sale_ace3_all,"2024-10-08","2024-11-01","ace3时序预测验证图")

timeseries_valid(data_sale_neo9_all,"2024-10-08","2024-11-01","neo9时序预测验证图")
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)

[Figure: redmik70 SARIMA in-sample validation]

d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)

[Figure: ace3 SARIMA in-sample validation]

d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)

[Figure: neo9 SARIMA in-sample validation]
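The ValueWarning above appears because the 日期 index consists of plain strings with no frequency, so statsmodels falls back to integer positions when forecasting (visible in the next cell's output). Converting each series to a daily DatetimeIndex, with days that have no reviews filled as 0, would silence the warning and give the forecast a proper date axis; a minimal sketch (the helper name to_daily is ours):

def to_daily(counts):
    # string dates -> DatetimeIndex with an explicit daily frequency
    out = counts.copy()
    out.index = pd.to_datetime(out.index)
    return out.asfreq("D", fill_value=0)  # 0 reviews on days without comments

# e.g. data_sale_neo9_all = to_daily(data_sale_neo9_all) before fitting SARIMAX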

pre_index = ["2024-12-01","2025-01-01","2025-02-01","2025-03-01","2025-04-01","2025-05-01","2025-06-01","2025-07-01","2025-08-01","2025-09-01","2025-10-01","2025-11-01"]
#pre_index = [pd.to_datetime(i) for i in pre_index]
# helper: fit SARIMA, plot observed counts together with the out-of-sample forecast and its confidence band
def timeseries_predict(data_sou,time_begin,time_end,title):
    data = data_sou.copy()
    results = sm.tsa.statespace.SARIMAX(data.Count, order=(1,1,1),seasonal_order=(1,1,1,12)).fit()
    data['SARIMA'] = results.predict(start=time_begin,end=time_end,dynamic=True)

    # forecast the next 4 periods beyond the observed sample
    forecast = results.get_forecast(steps=4)
    forecast_ci = forecast.conf_int()
    print(data.index)
    ax = data['Count'].plot(label='Observed',figsize=(18, 10))
    forecast.predicted_mean.plot(ax=ax, label='Forecast', alpha=0.7)
    #forecast_ci.index=["2024-12-01","2025-01-01","2025-02-01","2025-03-01","2025-04-01","2025-05-01","2025-06-01","2025-07-01","2025-08-01"]
    ax.fill_between(forecast_ci.index,
                    forecast_ci.iloc[:, 0],
                    forecast_ci.iloc[:, 1], color='k', alpha=0.2)
    
    ax.set_xlabel('Date')
    ax.set_ylabel('Sales')
    ax.set_title(title,fontsize=20)
    #tick_spacing = 1        #通过修改tick_spacing的值可以修改x轴的密度
    #ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing ))
    plt.xticks(rotation=30)
    plt.legend(fontsize=20)
    plt.show()
    


timeseries_predict(data_sale_redmik70_all,"2024-10-12","2024-11-01","redmik70时序预测图")

timeseries_predict(data_sale_ace3_all,"2024-10-08","2024-11-01","ace3时序预测图")

timeseries_predict(data_sale_neo9_all,"2024-10-08","2024-11-01","neo9时序预测图")
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  return get_prediction_index(
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
  return get_prediction_index(


Index(['2024-05-23', '2024-05-28', '2024-07-05', '2024-07-17', '2024-07-29',
       '2024-08-10', '2024-08-17', '2024-08-20', '2024-08-22', '2024-08-25',
       '2024-08-28', '2024-08-29', '2024-08-31', '2024-09-01', '2024-09-02',
       '2024-09-03', '2024-09-04', '2024-09-05', '2024-09-22', '2024-09-23',
       '2024-09-24', '2024-09-25', '2024-10-02', '2024-10-03', '2024-10-04',
       '2024-10-05', '2024-10-06', '2024-10-07', '2024-10-08', '2024-10-09',
       '2024-10-10', '2024-10-11', '2024-10-12', '2024-10-13', '2024-10-14',
       '2024-10-15', '2024-10-16', '2024-10-17', '2024-10-18', '2024-10-19',
       '2024-10-20', '2024-10-21', '2024-10-22', '2024-10-23', '2024-10-24',
       '2024-10-25', '2024-10-26', '2024-10-27', '2024-10-28', '2024-10-29',
       '2024-10-30', '2024-10-31', '2024-11-01'],
      dtype='object', name='日期')

[Figure: redmik70 SARIMA forecast with confidence interval]

d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  return get_prediction_index(
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
  return get_prediction_index(


Index(['2024-04-17', '2024-05-07', '2024-05-09', '2024-05-10', '2024-05-12',
       '2024-05-21', '2024-05-22', '2024-05-23', '2024-05-24', '2024-05-25',
       ...
       '2024-10-23', '2024-10-24', '2024-10-25', '2024-10-26', '2024-10-27',
       '2024-10-28', '2024-10-29', '2024-10-30', '2024-10-31', '2024-11-01'],
      dtype='object', name='日期', length=148)

[Figure: ace3 SARIMA forecast with confidence interval]

d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  return get_prediction_index(
d:\anaconda\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
  return get_prediction_index(


Index(['2024-05-02', '2024-05-03', '2024-05-04', '2024-05-05', '2024-05-06',
       '2024-05-07', '2024-05-08', '2024-05-09', '2024-05-10', '2024-05-12',
       ...
       '2024-10-23', '2024-10-24', '2024-10-25', '2024-10-26', '2024-10-27',
       '2024-10-28', '2024-10-29', '2024-10-30', '2024-10-31', '2024-11-01'],
      dtype='object', name='日期', length=176)

[Figure: neo9 SARIMA forecast with confidence interval]

