by Evaristo Caraballo
The emoji developers use most, based on my analysis of 3.5GB of chat logs
Emoji have drastically changed the way we communicate in social media.
There are numerous studies suggesting differences in the way people use emoji on different social media platforms. For example, the lists of the top emoji in Instagram, Twitter, or Facebook have some similarities but also very distinctive patterns. Those differences get larger when moving down the list.
The possibility that the social platform dynamics might affect the use of emoji made me curious about how people might use them in a social platform to learn to code.
In this article, I look at how new developers use emoji, specifically in freeCodeCamp’s main Gitter chat room.
There are at least two ways to render emoji in Gitter:
- Using aliases (like those listed in existing online cheat sheets).
- Using the UTF-8 form, either by typing the emoji directly from your keyboard or by copying and pasting the character from online resources.
The two methods render differently in a message: aliases are displayed as Gitter's own images, while UTF-8 characters are displayed according to your machine's setup. The first method, using aliases, is the most popular and is the main subject of this discussion.
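As a rough illustration of what counting aliases involves, here is a minimal sketch that pulls alias names out of a message with a regular expression. The pattern and the function name `extract_aliases` are my own assumptions for illustration, not the code used in the original analysis.

```python
import re

# Assumed alias syntax: a colon, then lowercase letters, digits,
# "_", "+" or "-", then a closing colon (e.g. ":smile:" or ":+1:").
ALIAS_PATTERN = re.compile(r":([a-z0-9_+\-]+):")


def extract_aliases(message):
    """Return every alias name found in a message, in order of appearance."""
    return ALIAS_PATTERN.findall(message)


print(extract_aliases("Welcome :smile: well done :+1: :+1:"))
# ['smile', '+1', '+1']
```

A pattern like this also matches things that are not emoji, such as the middle of a timestamp like 10:30:45, so a real pipeline would probably check each match against a list of known aliases.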
To give you an idea of what I was after, I wanted to explore answers to questions like:
- Is there a distinctive pattern in the use of emoji?
- Which are the most popular emoji?
- How many people use emoji?
- How versed are users in the emoji vocabulary?
So let's get started and answer these questions.
Let's have some emoji-talk
After carrying out my analysis, I found that about 23% of engaged chatters were also emoji users. I define an engaged chatter as a person who has sent at least 10 messages. If we instead compare all emoji users, engaged or not, against all engaged chatters, that figure rises to 45%.
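The two percentages above can be sketched as follows. This is an illustrative reconstruction under my own assumptions: messages arrive as `(user, text)` pairs, an emoji user is anyone who sent at least one alias, and the function name `emoji_user_shares` is hypothetical.

```python
import re
from collections import Counter

# Assumed alias syntax, e.g. ":smile:" or ":+1:".
ALIAS_PATTERN = re.compile(r":[a-z0-9_+\-]+:")
ENGAGED_THRESHOLD = 10  # from the article: at least 10 messages


def emoji_user_shares(messages):
    """Return (engaged emoji users / engaged chatters,
               all emoji users / engaged chatters)."""
    message_counts = Counter()
    emoji_users = set()
    for user, text in messages:
        message_counts[user] += 1
        if ALIAS_PATTERN.search(text):
            emoji_users.add(user)
    engaged = {u for u, n in message_counts.items() if n >= ENGAGED_THRESHOLD}
    if not engaged:
        return 0.0, 0.0
    return (len(engaged & emoji_users) / len(engaged),
            len(emoji_users) / len(engaged))


# Toy data: alice and bob are engaged, carol is not.
sample = ([("alice", "hi")] * 9 + [("alice", "thanks :smile:")]
          + [("bob", "hello")] * 10
          + [("carol", "nice :+1:")] * 2)
print(emoji_user_shares(sample))  # (0.5, 1.0)
```

The second ratio can exceed the first, and even exceed 1, because its numerator includes emoji users who never reached the 10-message threshold.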
The number of emoji users might sound small compared to other platforms. However, it is important to note that:
- many users of the chat room were short-lived
- some users preferred a more conservative style of communication
- some users might not know the emoji aliases
In total, our emoji users rendered at least 753,000 emoji (600,000 when emoji were counted only once per message) with an average of 32 emoji for every 100 messages.
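To make the two ways of counting concrete, here is a small sketch. It assumes that "counted only once per message" means each distinct alias counts once per message, and that the per-100 rate uses the total count over all messages passed in. Both readings are my assumptions, as is the function name `emoji_totals`.

```python
import re

# Assumed alias syntax, e.g. ":smile:" or ":+1:".
ALIAS_PATTERN = re.compile(r":([a-z0-9_+\-]+):")


def emoji_totals(messages):
    """Return (total aliases, aliases counted once per message,
               aliases per 100 messages)."""
    total = 0
    once_per_message = 0
    for text in messages:
        aliases = ALIAS_PATTERN.findall(text)
        total += len(aliases)                 # every occurrence
        once_per_message += len(set(aliases))  # distinct aliases only
    rate = 100 * total / len(messages) if messages else 0.0
    return total, once_per_message, rate


sample = [":smile: :smile: hi", "hello", ":+1: :smile:", "bye"]
print(emoji_totals(sample))  # (4, 3, 100.0)
```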
All in all, our emoji users showed a collective literacy of about 800 aliases, about 25% of the full list of emoji in use. I sketched a beeswarm visualization in D3.js showing that many of them were introduced for the first time in the chat room between July 2015 and July 2016, with a growth rate of 10 to 20 new emoji per week.
Taken per individual, though, our emoji users managed a vocabulary of around 3 different emoji on average. The gap between the collective and individual figures comes from a few users championing the use of emoji, with one particular emoji master showing a literacy of around 500 different emoji.
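The contrast between collective and individual vocabulary can be illustrated with a short sketch. The input format (pairs of user and alias) and the function name `vocabulary_stats` are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean, median


def vocabulary_stats(usages):
    """usages: iterable of (user, alias) pairs.
    Return (collective vocabulary size, mean per user, median per user)."""
    per_user = defaultdict(set)
    for user, alias in usages:
        per_user[user].add(alias)
    sizes = [len(aliases) for aliases in per_user.values()]
    collective = set().union(*per_user.values()) if per_user else set()
    if not sizes:
        return 0, 0.0, 0.0
    return len(collective), mean(sizes), median(sizes)


# Toy data: one heavy user, two users who only know ":smile:".
sample = [("alice", a) for a in ("smile", "+1", "fire", "joy", "tada")]
sample += [("bob", "smile"), ("carol", "smile"), ("carol", "smile")]
collective, avg, mid = vocabulary_stats(sample)
print(collective, round(avg, 2), mid)  # 5 2.33 1
```

Even in this toy data, a single heavy user lifts the collective vocabulary well above what a typical user knows, which is why the median can be a useful companion to the average.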
“Atypical” emoji-ing in the chatroom?
To get a better idea of how people emoji-ed in the chatroom, I compared my findings against a report published by SwiftKey in 2015. The emoji list has been updated substantially since that report was released, but it still appears to be the best free reference available. I could not find the emoji categorization SwiftKey used, so I used the categories and subcategories given by unicode.org as an approximation instead.
I first evaluated the use of emoji at the category level, and the results were very much in line with the SwiftKey report. Most of the emoji posted in the freeCodeCamp chat room belonged to the “Smileys & People” category, which includes faces, gestures, person-roles, body parts and hearts.
Because comparisons based on high-level categories are usually too shallow, I tried another comparison that focused on the 25 most used emoji from 2015 to 2017, using their subcategories instead. Together, those 25 emoji accounted for around 15% of all the emoji posted during that period.
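A share like this one can be computed directly from alias counts. The sketch below uses toy numbers and a hypothetical helper name, `top_n_share`; it is not the original analysis code.

```python
from collections import Counter


def top_n_share(alias_counts, n=25):
    """Return the fraction of all emoji posts covered by the n most used aliases."""
    total = sum(alias_counts.values())
    if total == 0:
        return 0.0
    top = alias_counts.most_common(n)
    return sum(count for _, count in top) / total


# Toy counts, with n=2 instead of 25 to keep the example small.
counts = Counter({"smile": 5, "+1": 3, "fire": 1, "joy": 1})
print(top_n_share(counts, n=2))  # 0.8
```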
The list of emoji and subcategories suggests that our emoji users might still fit well into the typical pattern of emoji users. The extensive use in the chat room of icons within the “face-positive” subcategory matched the prominence of “happy faces” in the SwiftKey report.
The same goes for the “face-negative” subcategory, which is much like the “sad faces” in the SwiftKey report. Somewhat apart was the use of “:trollface:”, an icon available in GitHub that is usually associated with spam messages and sabotage, but that was also used as a joke in the freeCodeCamp chat room, probably in the same way as 💩 (“:poop:” or “:hankey:”), also listed in the all-time top 25.
However, it is in the extensive use of positive hand gestures, and of “body” icons in general, that this chat room might distinguish itself from other benchmarks.
The most used gesture icons in the freeCodeCamp chat room are positive, related to welcome, support, validation, and recognition of success, which are values commonly shared in the freeCodeCamp community.
Another difference is the lesser use of icons like ♥️ (“hearts”) or “kisses”, suggesting that “sharing affection” was not the main goal of this chat room. A membership that was about 70–80% male may have made such sharing even less likely. This demographic might also explain some male-related icons in the all-time top list, such as 🔫 (“:gun:”).
Even though we could spot some deviations from the general pattern, it is too soon to draw a definitive conclusion. In fact, it is likely that the most important deviations will be found in how people used the less popular emoji.
Furthermore, it might be that the most important differences are not in the numbers but in the meanings, that is, in how the group interprets the iconography according to its context. A classic example of a symbol whose meaning depends on context is the swastika. A well-known example among emoji is the eggplant. I wonder whether, from our all-time top 25 list, 🔥 (“:fire:”) might have a distinctive meaning for this group, as a way to express “commitment to a task”. In any case, this is more a topic for those interested in social media communication and emoji, like in this article.
And the winner is…
As a bonus, I sketched a D3.js visualization of the monthly top 5 emoji. Being on the list of the most counted ever doesn't mean that an emoji ever reached the monthly top 5, or vice versa. As in the Tour de France, a rider could sit consistently in sixth position for the whole competition without ever winning a day and still end up among the most counted overall. Similarly, a rider could win one day and then stay last for the rest of the race. This is why this list looks a bit different.
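The difference between an all-time ranking and a monthly ranking can be shown with a small sketch. The record format (pairs of month and alias) and the function name `monthly_top_appearances` are assumptions for illustration; the toy data uses a monthly top 1 instead of a top 5 to keep it short.

```python
from collections import Counter, defaultdict


def monthly_top_appearances(records, k=5):
    """records: iterable of (month, alias) pairs.
    Return a Counter of how many months each alias reached the monthly top k."""
    per_month = defaultdict(Counter)
    for month, alias in records:
        per_month[month][alias] += 1
    appearances = Counter()
    for counts in per_month.values():
        for alias, _ in counts.most_common(k):
            appearances[alias] += 1
    return appearances


# Toy data: "+1" is second every month, a different alias wins each month.
records = []
for month, winner in [("2016-01", "smile"), ("2016-02", "joy"), ("2016-03", "fire")]:
    records += [(month, winner)] * 3 + [(month, "+1")] * 2

all_time = Counter(alias for _, alias in records)
appearances = monthly_top_appearances(records, k=1)
print(all_time.most_common(1))  # [('+1', 6)]
print(appearances["+1"])        # 0
```

Here ":+1:" is the most counted alias overall but never wins a month, which is the "consistent sixth place" situation described above.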
So the winner of the monthly Top 5 is…
Frankly, I didn’t expect 😄 (“:smile:”) to be the most popular emoji. I thought it would be 😂 (“:joy:”), given that Apple recently revealed it as its most popular emoji of 2017.
The following 8 emoji also appeared in the freeCodeCamp casual chatroom. All about smiles :). Do you think you are an emoji fan? Guess their aliases! (Note: names and keywords can vary by platform…)
I used Python and the Gitter API to get the messages from the freeCodeCamp main chat room. Python libraries like multiprocessing and emoji were used to transform the data. Part of the transformations also required data available online, for which I wrote custom scrapers, also with Python libraries (requests, urllib, BeautifulSoup4). To analyze the data I used plain Python and some pandas. Exploratory visualizations were made with matplotlib, while the interactive ones were made in D3.js.
Versions of the code will be available on my GitHub repository together with a few final datasets. The raw datasets used for this project are now available on freeCodeCamp’s Kaggle account.
The motivation for this project is in line with the mission of freeCodeCamp’s Open Data Initiative. A big thanks to the people in the freeCodeCamp DataScience room, and especially to mstellaluna for her comments!
And remember, if you found the information in this article useful or you simply liked the content, don’t forget to leave some claps 👏 👏 before you leave! Thanks and Happy Coding!