Python and Deep Learning: Using torchMoji with Python and Deep Learning

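This article shows how to use Python and the torchMoji library for deep-learning sentiment analysis, specifically for converting text into emojis. The author walks through installation and setting up the conversion function, including how to avoid a crash loop when using torchMoji. It also provides a function, `deepmojify`, that takes text and returns the most probable emojis. Finally, it shows how to turn a list of sentences into a Pandas DataFrame that contains the emojis.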

Deep Learning

It has been difficult to find a tutorial on how to use the well-known DeepMoji model from Python. After several attempts in my notebook and several errors, I gave up and decided to use an alternative: torchMoji, a PyTorch implementation of DeepMoji (the original is built on Keras).

In fact, I have not been able to find a single tutorial on how to convert text into emojis. Lucky for you, here is one.

Installation

***The code, unfortunately, is not entirely mine; the source code is available at this link.

!pip3 install torch==1.0.1 -f https://download.pytorch.org/whl/cpu/stable 
!git clone https://github.com/huggingface/torchMoji
import os
os.chdir('torchMoji')
!pip3 install -e .
# If you restart the runtime when prompted, the notebook risks crashing in a loop.
# I did not restart, and it worked fine.

The installation will download around 600 MB of data needed to run the model. I have been using Google Colab, and I noticed that when the notebook asks you to restart the runtime to apply the changes, it begins crashing in a loop with no remedy. Learn from my mistake: do not restart the notebook. Leave it be, and it will work.
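If you would like to confirm, without restarting, that the freshly installed packages can be found from the current session, you can ask Python's import system directly. This is only a minimal sketch using the standard library; the helper name `is_importable` is made up for illustration and is not part of torchMoji.

```python
import importlib.util

def is_importable(package_name):
    """Return True if the current session can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None

# Check the two packages the rest of this tutorial relies on.
for name in ("torch", "torchmoji"):
    status = "found" if is_importable(name) else "NOT found"
    print(f"{name}: {status}")
```

If both lines report "found", the imports in the next section should work in the session you already have open.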

!python3 scripts/download_weights.py

This script should download the pretrained weights for the neural network. Type yes to confirm when asked.

Setting up the conversion function

With the following function, you can input text and get back the n most probable emojis (n is set with the `top_n` argument).

import numpy as np
import emoji, json
from torchmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH
from torchmoji.sentence_tokenizer import SentenceTokenizer
from torchmoji.model_def import torchmoji_emojis

# Emoji aliases, listed in the order of the model's output classes
EMOJIS = ":joy: :unamused: :weary: :sob: :heart_eyes: :pensive: :ok_hand: :blush: :heart: :smirk: :grin: :notes: :flushed: :100: :sleeping: :relieved: :relaxed: :raised_hands: :two_hearts: :expressionless: :sweat_smile: :pray: :confused: :kissing_heart: :heartbeat: :neutral_face: :information_desk_person: :disappointed: :see_no_evil: :tired_face: :v: :sunglasses: :rage: :thumbsup: :cry: :sleepy: :yum: :triumph: :hand: :mask: :clap: :eyes: :gun: :persevere: :smiling_imp: :sweat: :broken_heart: :yellow_heart: :musical_note: :speak_no_evil: :wink: :skull: :confounded: :smile: :stuck_out_tongue_winking_eye: :angry: :no_good: :muscle: :facepunch: :purple_heart: :sparkling_heart: :blue_heart: :grimacing: :sparkles:".split(' ')

# Load the pretrained model and its vocabulary
model = torchmoji_emojis(PRETRAINED_PATH)
with open(VOCAB_PATH, 'r') as f:
  vocabulary = json.load(f)

# Tokenize sentences to a fixed length of 30 tokens
st = SentenceTokenizer(vocabulary, 30)

def deepmojify(sentence, top_n=5):
  def top_elements(array, k):
    # Indices of the k largest values, ordered from most to least probable
    ind = np.argpartition(array, -k)[-k:]
    return ind[np.argsort(array[ind])][::-1]

  tokenized, _, _ = st.tokenize_sentences([sentence])
  prob = model(tokenized)[0]
  emoji_ids = top_elements(prob, top_n)
  emojis = map(lambda x: EMOJIS[x], emoji_ids)
  # Note: use_aliases was removed in emoji 2.0; on newer versions pass language='alias' instead
  return emoji.emojize(f"{sentence} {' '.join(emojis)}", use_aliases=True)
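To see what the inner `top_elements` helper is doing, here is a small sketch of the same idea on toy data: pick the positions of the k largest probabilities and look up the matching aliases. It uses plain Python sorting as a stand-in for the `np.argpartition` / `np.argsort` combination above, and the five aliases and probabilities are invented for illustration, not real model output.

```python
# Toy stand-ins: the real model returns one probability per entry in EMOJIS.
toy_aliases = [":joy:", ":rage:", ":angry:", ":triumph:", ":heart:"]
toy_probs = [0.05, 0.40, 0.30, 0.20, 0.05]

def top_indices(probs, k):
    """Return the indices of the k largest values, highest first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]

top3 = top_indices(toy_probs, 3)
print(top3)                             # [1, 2, 3]
print([toy_aliases[i] for i in top3])   # [':rage:', ':angry:', ':triumph:']
```

The NumPy version in `deepmojify` gives the same ordering for distinct values; `argpartition` just avoids sorting the whole array when only the top few entries are needed.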

Experimenting on text

text = ['I hate coding AI']

for _ in text:
  print(deepmojify(_, top_n = 3))

Output:

😤 (😡 😠 😤)

As you can see, I have told the program to run on this one line of text. Because the input is a list, you can add as many strings as you want.

Original Neural Network

If you do not know how to code and you just want to play, you can use DeepMoji’s website.

[Image: screenshot of www.DeepMoji.com]

The underlying model should be exactly the same. In fact, if I ask for 5 emojis rather than 3, this is the result from my code:

😡😠😤🔫😒 (😡 😠 😤 🔫 😒)

Input lists rather than a single sentence

***Now, this is my code

When performing sentiment analysis, I usually have a dataset of tweets or reviews stored in a pandas DataFrame. The following function turns a list of strings into a pandas DataFrame with a specified number of emojis for each string.

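import pandas as pd

def emoji_dataset(list1, n_emoji=3):
  # One row per sentence; the sentence itself goes in the first column
  emoji_list = [[x] for x in list1]
  for i in range(len(list1)):
    # Run the model once per sentence; the result ends with n_emoji emojis separated by spaces
    result = deepmojify(list1[i], top_n = n_emoji)
    for n_emo in range(1, n_emoji+1):
      # Indices -1, -3, -5, ... pick the emojis from the end of the string, skipping the spaces
      # (this assumes each emoji is a single character)
      emoji_list[i].append(result[2*-n_emo+1])
  emoji_list = pd.DataFrame(emoji_list)
  return emoji_list

list1 = ['Stay safe from the virus', 'Push until you break!', 'If it does not challenge you, it will not change you']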
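The index expression `2*-n_emo+1` in `emoji_dataset` can look cryptic, so here is a small sketch of what it selects. The string below is a made-up stand-in for a `deepmojify` result (the emojis are chosen for illustration, not produced by the model). It also shows a possible alternative based on `split`, which is my own suggestion rather than part of the original code.

```python
# Made-up stand-in for a deepmojify result: the sentence, then emojis separated by single spaces.
result = "Stay safe from the virus 😷 🙏 💜"
n_emoji = 3

# Character indexing as in emoji_dataset: -1, -3, -5 step over the separating spaces.
by_index = [result[2 * -n + 1] for n in range(1, n_emoji + 1)]
print(by_index)   # ['💜', '🙏', '😷']  (last emoji first)

# Alternative: split on spaces and keep the last n_emoji tokens, in their original order.
by_split = result.split(" ")[-n_emoji:]
print(by_split)   # ['😷', '🙏', '💜']
```

Note that character indexing returns the emojis from last to first, and it only works when every emoji is a single character. The `split` variant keeps the most probable emoji first and also copes with emojis made of more than one code point.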

I want to estimate the 5 most probable emojis for each string in this list: