Mining Text Data Chapter One: An Introduction to Data Mining

Introduction

Data mining aims to discover interesting patterns from data in a dynamic and scalable way. Information retrieval, by contrast, has traditionally focused on facilitating information access rather than on analyzing information to discover patterns, which is the primary goal of text mining.

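The most important characteristic of text data is that it is sparse and high-dimensional: because text is built from word strings, every distinct word in the vocabulary becomes its own dimension, while any single document uses only a small fraction of them. Furthermore, in most applications it would be desirable to represent text information semantically; however, natural language processing techniques are still not robust enough for this. As a result, text data is usually treated as a bag of words or as a string (sequence) of words.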
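To make the sparsity concrete, here is a minimal Python sketch of a bag-of-words representation. It assumes a naive lowercase/whitespace tokenizer and a made-up three-document corpus; the function names `tokenize` and `bag_of_words` are hypothetical and not taken from the book. Each document is stored as a dict mapping a dimension index to a word count, so only the non-zero entries take up space.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase + whitespace split. Real pipelines usually
    # also strip punctuation, drop stop words, and stem or lemmatize.
    return text.lower().split()


def bag_of_words(docs: list[str]) -> tuple[list[str], list[dict[int, int]]]:
    """Return (terms, vectors): terms[i] is the word for dimension i, and each
    vector is a sparse {dimension: count} dict holding only non-zero entries."""
    vocab: dict[str, int] = {}
    vectors: list[dict[int, int]] = []
    for doc in docs:
        vec: dict[int, int] = {}
        for term, count in Counter(tokenize(doc)).items():
            # A previously unseen word opens a new dimension
            index = vocab.setdefault(term, len(vocab))
            vec[index] = count
        vectors.append(vec)
    # Order terms by their dimension index
    terms = sorted(vocab, key=vocab.__getitem__)
    return terms, vectors


# Hypothetical toy corpus, made up for illustration
docs = [
    "data mining finds patterns in data",
    "text mining is data mining on text",
    "social media produces text",
]
terms, vectors = bag_of_words(docs)
nonzero = sum(len(v) for v in vectors)
density = nonzero / (len(docs) * len(terms))
print(f"{len(terms)} dimensions, {nonzero} non-zero entries, density {density:.2f}")
```

Even on this toy corpus only about 42% of the document-term entries are non-zero; on real collections with much larger vocabularies the fraction is typically far smaller, which is why sparse representations like this one are the norm for text.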

Recently, there has been rapid growth of text data in the context of different web-based applications, such as social media.

Algorithms for text mining

1, information extraction from text data

2, text summarization

3, unsupervised learning methods from text data: clustering and topic modeling

4, LSI and dimensionality reduction for text mining

5, supervised learning methods from text data: classification and transfer learning

6, transfer learning with text data: e.g., for cross-lingual mining of web sources

7, probabilistic techniques for text mining

8, mining text streams: e.g., Reuters and other news feeds

9, cross-lingual mining of text data

10, text mining in multimedia networks

11, text mining in social media

12, opinion mining from text data

13, text mining from biomedical data

Future Directions

1, Scalable and robust methods for natural language understanding: It is important to develop effective and robust information extraction and other natural language processing methods that can scale to multiple domains.

2, Domain adaptation and transfer learning

3, Contextual analysis of text data: Text data is generally associated with rich context information, such as authors, sources, and time, or with more complicated information networks.

4, Parallel text mining: In particular, how to parallelize all kinds of text mining algorithms, including both unsupervised and supervised learning methods, is a major future challenge.
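As a small illustration of why some text mining computations parallelize naturally, the Python sketch below computes document frequencies (the number of documents containing each term) in map-reduce style: each partition of the corpus is counted independently, and the partial counts are then summed. It rests on stated assumptions: the function names are hypothetical, tokenization is a naive whitespace split, and threads stand in for real workers. Algorithms with global dependencies, such as iterative clustering or topic model inference, are generally much harder to split this cleanly, which is part of what makes parallel text mining a challenge.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(docs: list[str]) -> Counter:
    """Map step: document frequency of each term within one partition."""
    df: Counter = Counter()
    for doc in docs:
        # set() counts a term at most once per document
        df.update(set(doc.lower().split()))
    return df


def merge(a: Counter, b: Counter) -> Counter:
    """Reduce step: partial counts combine by addition, which is associative
    and commutative, so partitions can be merged in any order."""
    return a + b


def parallel_document_frequency(docs: list[str], n_workers: int = 4) -> Counter:
    # Split the corpus into independent partitions (round-robin)
    partitions = [docs[i::n_workers] for i in range(n_workers)]
    # Threads keep the sketch portable; for CPU-bound work in CPython a
    # process pool or a cluster framework would normally be used instead.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge, partials, Counter())


# Hypothetical toy corpus, made up for illustration
docs = [
    "data mining finds patterns in data",
    "text mining is data mining on text",
    "social media produces text",
]
df = parallel_document_frequency(docs, n_workers=2)
print(df["mining"], df["text"], df["social"])
```

The key design point is that the merge step is associative and commutative, so the result does not depend on how the corpus is partitioned or in which order the partial results arrive.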

Reposted from: https://www.cnblogs.com/jmliunlp/p/3734135.html
