Analyzing WeChat Chat History (2): Analyzing the Chat History with a Single Contact

艾与代码

Last modified on 2025-03-26 11:04:17

First published on 2021-02-13 18:05:16

Article link: https://blog.csdn.net/kid_14_12/article/details/113802192

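The previous post covered how to obtain WeChat chat records. This one covers analyzing the chat history with a single contact.

Filtering a Specific Conversation

Assume we already have a chat-record file named message.csv. We use Python to select the messages exchanged with one specific contact and save them to chat.csv: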

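import pandas as pd
chat = pd.read_csv('../message.csv', sep=',')
myG = 'wxid_xxxxxxxxx'  # WeChat ID of the contact to analyze
chat = chat[chat['talker'] == myG]  # keep only the rows for this conversation
chat.to_csv('../chat.csv', sep=',', index=False)  # index=False avoids writing an extra unnamed index column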

The WeChat ID used in the code above can be found by looking up a message whose content you recognize and reading its talker value.
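The lookup described above can be sketched as follows. This is only an illustration: it assumes the content and talker columns used elsewhere in this post, and the function name find_talker_candidates is made up here.

```python
import pandas as pd

def find_talker_candidates(messages, phrase):
    """Count, per talker, the messages whose content contains `phrase`."""
    # astype(str) guards against missing (NaN) content values.
    content = messages['content'].astype(str)
    hits = messages[content.str.contains(phrase, regex=False)]
    # value_counts() sorts by count, so the most likely talker comes first.
    return hits['talker'].value_counts()
```

For example, find_talker_candidates(pd.read_csv('../message.csv'), 'a phrase you remember sending') lists the talker values that contain that phrase, most frequent first.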

The Analysis

First, import the required packages:

import pandas as pd
import time
import seaborn as sns
import numpy as np
from matplotlib.font_manager import FontProperties  # needed to display Chinese text in figures
import matplotlib.pyplot as plt
from tqdm import tqdm
import re, string
np.set_printoptions(linewidth=800, suppress=False)  # print wide arrays on a single line

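Next, load chat.csv and extract the useful columns. The content column holds the main message body. The type column gives the kind of message: voice, text, image, sticker, shared link, and so on. The createTime column is the time the message was sent, as a timestamp in milliseconds. The isSend column indicates whether you sent the message: 1 if you did, 0 otherwise. There are other useful fields as well, to be covered later.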

chat = pd.read_csv('chat.csv', sep=',')
myG = 'wxid_xxxxxxx'
lens = len(chat)
# Extract each column as a plain list; this is much faster than slicing the frame row by row.
msg_content = chat['content'].tolist()
msg_type = chat['type'].tolist()
msg_time = chat['createTime'].tolist()
# isSend: 1 = sent by you, 0 = received; any other value (e.g. NaN) becomes -1.
msg_isSend = [int(v) if v in (0, 1) else -1 for v in chat['isSend']]

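Here is a summary of the type values identified so far; more will be added later:

ID	Type
1	Text message, including small emoji
3	Image message. Camera photos and album photos are stored differently: a message sent from the album keeps an MMAsset, similar to PAAset
34	Voice message
42	Contact card, for both official accounts and ordinary contacts
47	Large emoji (sticker)
48	Location message
49	Share message
10000	System message
419430449	WeChat transfer
-1879048186	Location sharing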
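The table above can be turned into a lookup for a quick breakdown of message kinds. This is a minimal sketch: the English labels and the names MSG_TYPE_NAMES and count_types are made up for illustration, and any code not in the table is reported as unknown.

```python
from collections import Counter

# Type codes from the table above; the English labels are illustrative.
MSG_TYPE_NAMES = {
    1: 'text', 3: 'image', 34: 'voice', 42: 'contact card',
    47: 'sticker', 48: 'location', 49: 'share', 10000: 'system',
    419430449: 'transfer', -1879048186: 'location sharing',
}

def count_types(msg_types):
    """Count messages per type label; unlisted codes are kept as 'unknown (code)'."""
    counts = Counter()
    for t in msg_types:
        code = int(t)
        counts[MSG_TYPE_NAMES.get(code, f'unknown ({code})')] += 1
    return counts
```

Calling count_types(msg_type).most_common() on the list built earlier shows which kinds of message dominate the conversation and surfaces codes that the table does not cover yet.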

Message Counts per Side

First, count how many messages each side sent, along with the total number of characters each side sent as text.

# msg_data[side] = [message count, total text characters]; side 0 = received, 1 = sent
msg_data = np.array([[0, 0], [0, 0]])
for i in tqdm(range(lens)):
    if msg_isSend[i] not in [0, 1]:
        continue  # skip messages with an unknown sender flag (-1 would index the wrong row)
    msg_data[msg_isSend[i]][0] += 1
    if msg_type[i] == 1 and isinstance(msg_content[i], str):
        msg_data[msg_isSend[i]][1] += len(msg_content[i])

print(msg_data)
print(msg_data.sum(0))
labels = ['接收到', '发送出']  # "received", "sent"
sizes = msg_data[:, 0]
myfont = FontProperties(fname=r'../kaiti.TTF', size=22)  # font able to render the Chinese labels
p = plt.pie(sizes, labels=labels, autopct='%1.1f%%', colors=['magenta', 'lightskyblue'],
            shadow=True, startangle=90)
for label in p[1]:  # p[1] holds the label Text objects
    label.set_fontproperties(myfont)
plt.axis('equal')
plt.show()

The result looks like this:
[Figure: pie chart of the received and sent message shares]
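The same counts can also be computed with pandas group operations instead of an explicit loop. This is a sketch under the assumption that the frame has the isSend, type, and content columns described earlier; the function name count_by_side and the output column names are made up for illustration.

```python
import pandas as pd

def count_by_side(chat):
    """Per side (0 = received, 1 = sent): message count and total text characters."""
    # Drop rows whose isSend flag is missing or not 0/1.
    valid = chat[chat['isSend'].isin([0, 1])].copy()
    valid['isSend'] = valid['isSend'].astype(int)
    # Only text messages (type 1) with string content contribute characters.
    valid['chars'] = [
        len(c) if t == 1 and isinstance(c, str) else 0
        for c, t in zip(valid['content'], valid['type'])
    ]
    stats = valid.groupby('isSend').agg(
        messages=('chars', 'size'),
        text_chars=('chars', 'sum'),
    )
    # Make sure both sides appear even if one of them has no messages.
    return stats.reindex([0, 1], fill_value=0)
```

The returned frame has one row per side, so count_by_side(chat)['messages'] can be passed to plt.pie in place of sizes.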

Chat Frequency over Time

First, define a few time-conversion helpers:

# Example struct_time: tm_year=2016, tm_mon=11, tm_mday=27, tm_hour=10, tm_min=26, tm_sec=5, tm_wday=6, tm_yday=332, tm_isdst=0
# Each helper takes a timestamp in milliseconds and converts it in the machine's local time zone.
def to_hour(t):
    struct_time = time.localtime(t//1000)  # convert the timestamp to a struct_time tuple
#     hour = round((struct_time[3] + struct_time[4] / 60), 2)
    hour = struct_time[3]  # hour of day, 0-23
    return hour
def to_day(t):
    struct_time = time.localtime(t//1000)  # convert the timestamp to a struct_time tuple
    day = struct_time[2]  # day of month, 1-31
    return day
def to_mon(t):
    struct_time = time.localtime(t//1000)  # convert the timestamp to a struct_time tuple
    mon = struct_time[1]  # month, 1-12
    return mon
def to_wday(t):
    struct_time = time.localtime(t//1000)  # convert the timestamp to a struct_time tuple
    wday = struct_time[6]  # day of week, 0 = Monday ... 6 = Sunday
    return wday
def to_formatday(t):
    struct_time = time.localtime(t//1000)  # convert the timestamp to a struct_time tuple
    fday = time.strftime("%Y-%m-%d", struct_time)
    return fday
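For comparison, the five helpers can be folded into one function built on datetime. This is a sketch, not the approach used in this post: the name split_timestamp and the optional tz argument are assumptions added here so that the result does not depend on the machine's time zone.

```python
from datetime import datetime, timedelta, timezone

def split_timestamp(ms, tz=None):
    """Return (month, day, weekday, hour, 'YYYY-MM-DD') for a millisecond timestamp.

    With tz=None the local time zone is used, matching time.localtime.
    weekday follows the tm_wday convention: 0 = Monday ... 6 = Sunday.
    """
    dt = datetime.fromtimestamp(int(ms) // 1000, tz=tz)
    return dt.month, dt.day, dt.weekday(), dt.hour, dt.strftime('%Y-%m-%d')

# The example struct_time in the comment above, expressed in UTC+8:
CST = timezone(timedelta(hours=8))
print(split_timestamp(1480213565000, CST))  # (11, 27, 6, 10, '2016-11-27')
```

Passing an explicit time zone is useful when the analysis runs on a machine whose local zone differs from the one the chats took place in.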

Then compute the chat frequency for each weekday and hour:

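hour_set = [to_hour(i) for i in msg_time]
day_set = [to_day(i) for i in msg_time]
mon_set = [to_mon(i) for i in msg_time]
wday_set = [to_wday(i) for i in msg_time]
# print(hour_set)
# week_hour[side, weekday, hour]; side 0 = received, 1 = sent; weekday 0 = Monday
week_hour = np.zeros([2, 7, 24], dtype=int)  # np.int was removed in NumPy 1.24
for s, w, h in tqdm(zip(msg_isSend, wday_set, hour_set)):
    if s not in [0, 1]:
        continue  # skip messages with an unknown sender flag
    week_hour[s, w, h] += 1

print(week_hour[0].T)      # received: rows = hours, columns = weekdays
print(week_hour[1].T)      # sent: rows = hours, columns = weekdays
print(week_hour[0].sum(0)) # received: totals per hour
print(week_hour[1].sum(0)) # sent: totals per hour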

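The output is shown below. It prints a matrix for the received side and then one for the sent side (rows are hours, columns are weekdays, Monday first), followed by the per-hour totals for each side: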

[[1379 1354 1863  847 1301 1032 1320]
 [ 492  627 1028  330  507  422  884]
 [ 149  427  289    4  270   70  353]
...........
 [1083  844  630  698  806  434  799]
 [1061  506  491  565  659  235  727]
 [1424 1616  923 1226  630  624 1083]]
[[1659 1865 1864  995 1342 1150 1547]
 [ 947  852 1087  226  727  393 1028]
 [ 893  330  862  243  709  210  358]
 ........
 [1261  597  511  705  678  312  807]
 [1678 1484  948 1450  915  747 1311]]
[9096 4290 1562  841  227  203  380 1048 3069 7001 8221 8934 7393 6787 6220 5626 6816 7287 7691 6741 6395 5294 4244 7526]
[10422  5260  1926   954   324   219   509  1453  3605  8588  9240  9492  8470  7216  6930  6130  7206  7860  8566  7456  7106  6162  4871  8533]

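Save the data above to a CSV file and import it into an online charting site. I recommend 图说 (not an ad), which is quite good. With a little configuration you get a chart like the one below.
[Figure: chart of message frequency by weekday and hour]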
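The CSV export step is not shown in this post. One possible sketch follows, assuming a 7 x 24 (weekday by hour) count matrix such as week_hour[0]; the function name, the English weekday headers, and the output file name in the usage note are made up for illustration.

```python
import csv
import io

WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

def hour_weekday_csv(matrix):
    """Render a 7 x 24 (weekday x hour) count matrix as CSV text, one row per hour."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['hour'] + WEEKDAYS)
    for hour in range(24):
        # int() turns NumPy integers into plain ints before writing.
        writer.writerow([hour] + [int(matrix[day][hour]) for day in range(7)])
    return buf.getvalue()
```

For example, writing hour_weekday_csv(week_hour[0]) to a file such as recv_week_hour.csv gives a table with an hour column and one column per weekday, ready to import into a charting tool.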

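Building a Word Cloud

Python can certainly produce a decent word cloud, but I am lazy. There are plenty of online word-cloud sites, though they charge a fee. I used 微词云, and because my data set was so large I even paid 69 yuan 🙃🙂! The exported result is quite good, though.
First, export all the chat text with Python; here I export the two sides separately. Then paste the text into 微词云, add a little configuration, and you are done.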

fday_set = [to_formatday(i) for i in msg_time]
txtlist = [[], []]  # txtlist[0] = received text, txtlist[1] = sent text
alllist = []
reflist = []
for s, c, t, f in tqdm(zip(msg_isSend, msg_content, msg_type, fday_set)):
    if s not in [0, 1]:
        continue  # skip messages with an unknown sender flag
    if t == 1:  # text messages only
        txtlist[s].append(c)
        alllist.append(c)
#         if f == '2020-07-10': # export the messages of one specific day
#             print(f)
#             reflist.append(c)

print(txtlist[0][:10])
print(txtlist[1][:10])
print(alllist[:10])
# Write UTF-8 explicitly so Chinese text is saved correctly on every platform.
with open('recv.txt', 'w', encoding='utf-8') as f:
    for t in txtlist[0]:
        f.write(f'{t}\n')
with open('send.txt', 'w', encoding='utf-8') as f:
    for t in txtlist[1]:
        f.write(f'{t}\n')
with open('all.txt', 'w', encoding='utf-8') as f:
    for t in alllist:
        f.write(f'{t}\n')
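Before pasting the exported text into a word-cloud tool, it can help to separate the small emoji from the words. The sketch below assumes that small emoji appear in the text as short bracketed names such as [微笑]; that format, the pattern, and the function names are assumptions made for illustration and may not match every WeChat version.

```python
import re
from collections import Counter

# A short bracketed token with no whitespace or nested brackets, e.g. [微笑].
EMOJI_TOKEN = re.compile(r'\[[^\[\]\s]{1,8}\]')

def split_emoji(text):
    """Return (text with emoji tokens removed, list of emoji tokens found)."""
    tokens = EMOJI_TOKEN.findall(text)
    cleaned = EMOJI_TOKEN.sub('', text).strip()
    return cleaned, tokens

def emoji_counts(lines):
    """Count how often each emoji token occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(split_emoji(str(line))[1])
    return counts
```

Running emoji_counts(txtlist[1]).most_common(10) would list the emoji you send most often, and writing the cleaned text instead of the raw text keeps the bracketed names out of the word cloud.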