Your WhatsApp Chats Can Tell a Lot About You

Lately, I was looking for some small and exciting visualisation projects to explore the data visualisation field. Then I came across a WhatsApp feature that exports your chats to a text file, which is quite handy and easy to work with. WhatsApp claims that nearly 65 billion WhatsApp messages were sent per day, or 29 million per minute, in 2020.

I started using WhatsApp frequently after getting into college in 2016, so I thought of collecting and visualising my last four years of chats. I obtained around 50 text files from my WhatsApp, including personal conversations with friends and family members, and some group chats.

If you aren't interested in the code, you can just enjoy the images. For everyone else, I have uploaded the entire code to my GitHub repository.

I used Google Colaboratory for this project. If you are using it or some other platform, change the file paths accordingly.

Update: I have added the sentiment analysis and topic modelling code for WhatsApp chats to my GitHub.

Loading the Messages

The messages in the text file follow the format {Date}, {Time} - {Author}: {Message}

09/12/17, 10:20 pm - Sheet: Bhai sun na…

The plain text files have to be parsed into a structured form before they can be stored in a pandas dataframe. I used this function to get the dataframe.

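    import re
    import pandas as pd

    def read(file):
        # Read one exported chat; the path assumes Google Drive mounted in Colab
        with open('/content/drive/My Drive/Whatsapp/{}'.format(file), 'r') as f:
            # Capture date and time, am/pm marker, author, and message per line
            m = re.findall(r'(\d+/\d+/\d+, \d+:\d+) (\w+) - (.*?): (.*)', f.read())
        h = pd.DataFrame(m, columns=['date', 'am-pm', 'name', 'msg'])
        # Rejoin the timestamp with its am/pm marker before parsing it
        h['date'] = pd.to_datetime(h['date'] + ' ' + h['am-pm'], format="%d/%m/%y, %I:%M %p")
        h['msg_len'] = h['msg'].str.len()
        h['date1'] = h['date'].apply(lambda x: x.date())
        return h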
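
To make the pattern easier to follow, here is a minimal standard-library sketch that applies the same four capture groups to the sample line shown earlier. The helper name `parse_line` is hypothetical, and the sketch assumes the separator in the export is a plain hyphen and that the date is day/month/two-digit year; exports from other locales may look different.

```python
import re
from datetime import datetime

# Groups: date and time, am/pm marker, author, message text.
LINE_PATTERN = re.compile(r"(\d+/\d+/\d+, \d+:\d+) (\w+) - (.*?): (.*)")

def parse_line(line):
    """Return a dict for one exported line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None  # e.g. the continuation of a multi-line message
    stamp, am_pm, name, msg = match.groups()
    # %I is the 12-hour clock and %p the am/pm marker
    when = datetime.strptime(f"{stamp} {am_pm}", "%d/%m/%y, %I:%M %p")
    return {"date": when, "am-pm": am_pm, "name": name, "msg": msg}

# The sample line from the text, with the trailing ellipsis left out.
row = parse_line("09/12/17, 10:20 pm - Sheet: Bhai sun na")
print(row["date"], row["name"], row["msg"])
```

Lines that do not start with a timestamp return None here, which is one simple way to notice multi-line messages that a line-by-line pattern would otherwise skip silently.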

I saved all my conversations in a data folder so I could list, load, and merge them into one dataframe.

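    import os

    # List every exported chat file in the Drive folder
    files = os.listdir('/content/drive/My Drive/Whatsapp')
    lst = []
    for file in files:
        # Parse each file into its own dataframe
        history = read(file)
        lst.append(history)
    # Merge all chats into one dataframe
    history = pd.concat(lst).reset_index()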

Now our dataframe is ready and looks something like this.

[Image: Screenshot of the dataframe]

Some Statistics

How many messages have I sent in the last four years? How many different people have I talked to within the past four years?

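    # Messages I sent (history_clean: the merged dataframe after cleaning, a step not shown here)
    history_clean[history_clean['name']=='sachin']['msg'].count()
    # Number of distinct names across all chats
    history_clean['name'].nunique()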

In my case, I sent over 58k messages and talked to over 350 different people. I also checked how my messages split between AM and PM: 43,820 PM and 14,185 AM.
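
The AM/PM split is a simple tally. The sketch below shows the idea on a few hypothetical rows using only the standard library; with the dataframe itself, something like `value_counts()` on the `am-pm` column of my own messages should give the same numbers, assuming that column holds the marker captured while loading.

```python
from collections import Counter

# Hypothetical (name, am-pm) pairs standing in for two dataframe columns.
rows = [
    ("sachin", "pm"),
    ("sachin", "am"),
    ("Sheet", "pm"),
    ("sachin", "pm"),
]

my_name = "sachin"
# Keep only the markers of messages I sent
sent = [am_pm for name, am_pm in rows if name == my_name]
# Tally AM versus PM, normalising the case of the marker
am_pm_counts = Counter(marker.upper() for marker in sent)
# Distinct names across all rows (this includes my own name)
distinct_people = len({name for name, _ in rows})

print(len(sent), dict(am_pm_counts), distinct_people)
```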

Data Exploration

[Image: Photo by Andrew Neel on Unsplash]

This is the most exciting part of this article: data exploration. Let's dig out all the fascinating stories that this data is trying to tell us.

    import matplotlib.pyplot as plt

    # Create a subset of the dataframe with only the messages I've sent
    msg_sachin = history_clean[history_clean['name'] == 'sachin']
    plt.figure(figsize=(25, 8))
    # Count my messages per calendar day and plot the daily totals
    msg_sachin.groupby(['date1']).count()['msg'].plot()

This piece of code plots the number of messages I sent per day over the years.

[Image: Messages sent over the years (generated by code)]

This plot is impressive because it makes it easy to spot when I was home on vacation and when I was at college. The effect of the coronavirus on my texting pattern is also easy to see (I guess everyone is going through the same situation). Apart from this, the peaks over a few months (May to July and December) can be explained by my college's summer vacations and winter breaks.

I also find it funny to see the length of my messages over the years.

    plt.figure(figsize=(25,8))
    # Average message length for each timestamp
    history.groupby(['date'])['msg_len'].mean().plot()

[Image: Length of messages over the years (generated by code)]

One message stood out as an outlier in this plot, so I looked it up and got this:

array([\’Google drive movies https://drive.google.com/drive/u/ 0/mobile/ folders/0B6FjKMQKynZILTlwZHl4ajUwcFU Programming language collection on google drive — https://drive.google.com/drive/folders/0ByWO0aO1eI_ MN1BEd3VNRUZENkU Books for reading — https://drive. google.com/drive /folders/0B09qtt10aqV1SGxRVXBWYmNIS2M Books (novels) — https://drive.google.com/drive/folders/0B1v9Iy1jH3FXdlND [Udemy] the complete digital marketing course https://drive.google.com/drive/ folders/0Bx2Vez2N3qd7S GxkejRhQmdKQlk Books for reading…]
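
One way to locate such an outlier is to take the row with the largest message length. This is a minimal sketch on hypothetical rows, not necessarily the lookup used for the output above; on the dataframe, `idxmax()` on the `msg_len` column would point at the same row.

```python
# Hypothetical (name, msg) rows; in the article these come from the dataframe.
rows = [
    ("sachin", "ok"),
    ("Sheet", "A very long forwarded message with several links in it"),
    ("sachin", "see you at 5"),
]

# The outlier is simply the row whose message has the most characters.
longest_name, longest_msg = max(rows, key=lambda row: len(row[1]))
print(longest_name, len(longest_msg))
```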

After this, I plotted my message frequency by month.

[Image: Month-wise message frequency (generated by code)]

I was amazed to see this distribution. It suggests that I chat a lot in the summer 😂.
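
The month-wise counts behind a plot like this can be sketched as follows. The timestamps are hypothetical stand-ins for the `date` column, and the sketch pools all years together, which is an assumption about how the plot was built. With pandas, `msg_sachin['date'].dt.month.value_counts().sort_index()` should produce the equivalent series.

```python
import calendar
from collections import Counter
from datetime import datetime

# Hypothetical timestamps standing in for the 'date' column.
timestamps = [
    datetime(2019, 5, 3, 21, 15),
    datetime(2019, 5, 20, 9, 40),
    datetime(2019, 12, 25, 23, 5),
    datetime(2020, 5, 1, 14, 0),
]

# Count messages per calendar month, pooling all years together.
per_month = Counter(ts.month for ts in timestamps)

# Print every month in order, including months with no messages.
for month in range(1, 13):
    print(calendar.month_abbr[month], per_month.get(month, 0))
```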

[Image: Hour-wise message frequency (generated by code)]

The next plot is fascinating because it reveals your sleeping hours. It shows message frequency for each hour of the day (0 means midnight). It suggests that I am not a night owl and prefer to go to sleep after 11 pm or midnight.

Also, the number of messages increases from morning to noon. That runs contrary to the idea that these are the hours when one should be working or studying 😜.

[Image: Day-wise message frequency (generated by code)]

Next, I plotted the message frequency for each day of the week. Well, it looks like it doesn't matter whether it's Sunday or Monday: I chat almost the same amount every day.
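
The hour-of-day and day-of-week tallies follow the same pattern as each other: reduce each timestamp to the part you care about, then count. A minimal sketch with hypothetical timestamps is below; on the dataframe, the `.dt.hour` and `.dt.day_name()` accessors of the `date` column would play the same role.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical timestamps standing in for the 'date' column.
timestamps = [
    datetime(2019, 5, 3, 21, 15),   # a Friday
    datetime(2019, 5, 5, 0, 30),    # a Sunday, just after midnight
    datetime(2019, 5, 6, 21, 45),   # a Monday
]

# Hour 0 means midnight, matching the hour-wise plot.
per_hour = Counter(ts.hour for ts in timestamps)
# weekday() returns 0 for Monday through 6 for Sunday.
per_weekday = Counter(DAYS[ts.weekday()] for ts in timestamps)

print(sorted(per_hour.items()))
print([(day, per_weekday.get(day, 0)) for day in DAYS])
```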

Now comes the most interesting plot of this article.

Let's Plot Emojis

The complete emoji code is on my GitHub. I used the Python emoji library for this plot. You can install it with this line:

    pip install emoji --upgrade

[Image: Emoji used by me in the last 4 years (generated by code)]
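
The counting step behind such a plot can be sketched without any third-party package. Note the swap: instead of the emoji library used for the article, this sketch relies on a rough standard-library heuristic that treats every character in the Unicode category "So" (Symbol, other) as an emoji. That heuristic also picks up non-emoji symbols such as © and ignores modifiers like skin tones, so treat it as an approximation; the function name `count_symbols` and the sample messages are hypothetical.

```python
import unicodedata
from collections import Counter

def count_symbols(messages):
    """Count characters whose Unicode category is 'So' across all messages."""
    counts = Counter()
    for msg in messages:
        for ch in msg:
            if unicodedata.category(ch) == "So":
                counts[ch] += 1
    return counts

# Hypothetical messages standing in for the 'msg' column.
messages = ["Bhai sun na 💣", "😂😂 ok", "no emoji here", "💣💣"]

# The two most frequent symbols with their counts.
top = count_symbols(messages).most_common(2)
print(top)
```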

Well, let's first talk about my second most frequently used emoji, 💣. I generally use it while quoting dialogue from the movie “Gangs of Wasseypur”, or sometimes for no reason at all. But I never realised that I had become so obsessed with this emoji 💣.

Also, I was glad to see that my life is evidently full of laughter, surprises and bombs!

As an extension of this article, I have added sentiment analysis and topic modelling for WhatsApp chats to my GitHub. Those who are into NLP can check it out.

Conclusion

It looks like this analysis, while answering some questions, has opened up many new ones that could be explored further.

Did you find any of these insights useful? Or do you have suggestions about some valuable insights that I missed? Feel free to add your comments below.

You can also add me on LinkedIn.

Translated from: https://towardsdatascience.com/your-whatsapp-chats-can-tell-a-lot-about-you-3a7db37789b3
