LSTM Text Classification: Text Classification on Disaster Tweets with LSTM and Word Embedding

weixin_26750481

Original article: https://towardsdatascience.com/text-classification-on-disaster-tweets-with-lstm-and-word-embedding-df35f039c1db

This was my first Kaggle notebook, and I thought: why not write it up on Medium too?

The full code is on my GitHub.

In this post, I will elaborate on how to use fastText and GloVe as word embeddings in an LSTM model for text classification. I got interested in word embeddings while writing my paper on Natural Language Generation (NLG). There, using a pretrained embedding matrix as the weights of the embedding layer improved the model's performance. But since the task was NLG, the evaluation was subjective, and I had only used fastText. So in this article, I want to see how each setup (fastText, GloVe, and no pretrained embedding) affects the predictions. In the code on my GitHub, I also compare the results with a CNN. The dataset I use here comes from a Kaggle competition and consists of tweets labelled with whether each tweet uses disaster-related words to report a real disaster or merely uses them metaphorically. Honestly, when I first saw this dataset, I immediately thought of BERT and its ability to understand language far better than the approach I propose in this article (further reading on BERT).
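
As a rough sketch of the "embedding matrix as embedding-layer weights" idea (not the author's code), the snippet below parses pretrained vectors in the plain-text format used by GloVe `.txt` and fastText `.vec` files (a token followed by its numbers) and builds a matrix whose row *i* is the vector of the word with index *i*. It assumes a Keras-`Tokenizer`-style `word_index` whose indices run from 1 to N, keeps row 0 for padding, and leaves words with no pretrained vector as zeros. The function names and the toy vectors are hypothetical. The "without" setup corresponds to skipping this step, so the embedding layer starts from random weights and learns only from the tweets.

```python
def load_vectors(lines, dim):
    """Parse text-format embeddings: one token followed by `dim` floats per line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # fastText .vec files begin with a "<vocab size> <dim>" header line;
        # skipping every line that is not exactly token + dim numbers drops it.
        if len(parts) != dim + 1:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Return (matrix, hits): row i is the pretrained vector of the word with index i.

    Assumes word_index values are 1..len(word_index). Row 0 stays all-zero for
    padding, and so does any word with no pretrained vector (out-of-vocabulary).
    `hits` counts the words that were found.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    hits = 0
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = list(vec)
            hits += 1
    return matrix, hits


if __name__ == "__main__":
    # Hypothetical toy data; in practice `lines` would come from a real file, e.g.
    # open("glove.twitter.27B.100d.txt", encoding="utf-8") with dim=100.
    lines = ["3 3", "fire 0.1 0.2 0.3", "flood 0.4 0.5 0.6", "storm 0.7 0.8 0.9"]
    word_index = {"fire": 1, "flood": 2, "lol": 3}
    vectors = load_vectors(lines, dim=3)
    matrix, hits = build_embedding_matrix(word_index, vectors, dim=3)
    print(f"{hits}/{len(word_index)} words covered")  # 2/3 words covered
    for row in matrix:
        print(row)
```

The `hits` count is a quick way to compare how much of the tweet vocabulary fastText and GloVe each cover. One common way to use the matrix in Keras is to pass it as the initial weights of the embedding layer, e.g. `Embedding(..., embeddings_initializer=keras.initializers.Constant(np.array(matrix)))`, optionally with `trainable=False` to keep the pretrained vectors frozen.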

But anyway, in this article I will focus on fastText and GloVe.

Let’s go?

Data + Pre-Processing

The data consists of 7613 tweets (column Text), each labelled (column Target) with whether it talks about a real disaster or not: 3271 rows report a real disaster and 4342 rows do not. The data was shared as part of a Kaggle competition; if you want to learn more about it, you can read about it here.
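
Since the two classes are not equally sized, it is worth checking the balance before training. The plain-Python sketch below (not from the original notebook) computes each class's share, the accuracy of a trivial model that always predicts the majority class (4342 / 7613, about 0.570, so a useful model has to beat roughly 57% accuracy), and "balanced" class weights using the formula n_samples / (n_classes * count) that scikit-learn uses for `class_weight="balanced"`. It assumes labels are 1 for a real disaster and 0 otherwise and rebuilds the label list from the counts above; with the actual CSV you would pass the Target column instead. `summarize_labels` is a hypothetical helper.

```python
from collections import Counter


def summarize_labels(labels):
    """Return (total, shares, majority_baseline, class_weights) for a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    # Accuracy of a trivial model that always predicts the most frequent class.
    majority_baseline = max(counts.values()) / total
    # "Balanced" weights: rarer classes get proportionally larger weights.
    class_weights = {label: total / (len(counts) * n) for label, n in counts.items()}
    return total, shares, majority_baseline, class_weights


if __name__ == "__main__":
    # Rebuild the label column from the counts reported above
    # (1 = real disaster, 0 = not a real disaster).
    labels = [1] * 3271 + [0] * 4342
    total, shares, baseline, weights = summarize_labels(labels)
    print(total)               # 7613
    print(round(baseline, 3))  # 0.57
    print({k: round(v, 3) for k, v in weights.items()})
```

The weights give the minority class (real disasters) slightly more influence on the loss; whether that helps here is something to check on a validation split rather than assume.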
