Reading Notes on "Mining Text Data": Chapter 1, An Introduction to Text Mining

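This is a thick English-language e-book on text mining. With a large English tome like this, it is easy to forget what you have read as you go.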


1. An Introduction to Text Mining

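1.1 Introduction
Three questions about text mining:
a. What are the main algorithmic models, and how do they differ from other kinds of data mining?
b. What tools and techniques are available? (Models are the abstract side; techniques are the concrete side.)
c. What are the key application areas?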

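Characteristics of text mining:
a. Text data is high-dimensional and sparse.
b. Text data can be analyzed at multiple levels: word, sentence, passage, and document collection.
   Semantic representations of text are useful, for example named entity recognition (NER).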
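To make point a concrete, here is a minimal sketch of a bag-of-words representation and its sparsity. The helper names (bag_of_words, sparsity), the whitespace tokenizer, and the three sample sentences are assumptions made for illustration; they are not taken from the book.

```python
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, rows); each row is a sparse {term_index: count} dict."""
    # Naive whitespace tokenization, assumed here only to keep the sketch short.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    rows = [dict(Counter(index[t] for t in toks)) for toks in tokenized]
    return vocab, rows

def sparsity(vocab, rows):
    """Fraction of zero cells in the implied dense document-term matrix."""
    total_cells = len(vocab) * len(rows)
    nonzero = sum(len(row) for row in rows)
    return 1.0 - nonzero / total_cells

# Made-up sample documents.
docs = [
    "text mining finds patterns in text",
    "clustering groups similar documents",
    "topic models describe documents as mixtures of topics",
]
vocab, rows = bag_of_words(docs)
print(len(vocab), "distinct terms;", f"sparsity = {sparsity(vocab, rows):.2f}")
```

Each document uses only a small part of the vocabulary, so most cells of the document-term matrix are zero. On a real corpus the vocabulary is far larger and the zero fraction is typically much higher, which is why sparse storage is the usual choice.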

1.2 Algorithms
This section introduces the topics covered by text mining and their algorithms.
a. Information Extraction from Text Data:
   Information Extraction is one of the key problems of text mining, which serves as a starting
   point for many text mining algorithms.
  
b. Text Summarization:
   Another common function needed in many text mining applications is to summarize the text documents.

c. Unsupervised Learning Methods from Text Data:
   The two main unsupervised learning methods commonly used in the context of text data are clustering and topic modeling.

d. LSI and Dimensionality Reduction for Text Mining:
   Representing the underlying data in a compressed format for indexing and retrieval.
   In this respect it is somewhat similar to Text Summarization.

e. Supervised Learning Methods for Text Data:
  
f. Transfer Learning with Text Data:
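   Where it is useful: For example, labeled English documents are copious and easy to find. On the other hand, it is much
   harder to obtain labeled Chinese documents. English resources such as entity databases are so openly available that there is indeed a great opportunity to transfer them to Chinese.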
  
g. Probabilistic Techniques for Text Mining:

h. Mining Text Streams:
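   Text data arrives as a continuous input, much like an audio stream, and must be processed continuously on-line; traditional off-line batch processing no longer applies.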

i. Cross-Lingual Mining of Text Data:

j. Text Mining in Multimedia Networks:

k. Text Mining in Social Media:

l. Opinion Mining from Text Data:
   This is the most common application.

m. Text Mining from Biomedical Data:
   This is an application in a specialized domain.
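For item d in the list above (LSI), a minimal sketch of the idea of a compressed representation, using a truncated SVD from NumPy. The term list, the toy count matrix, and the helper names lsi and cosine are assumptions for illustration only, and this is one common way to compute LSI rather than necessarily the formulation the book uses.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# The values are made up for illustration.
terms = ["ship", "boat", "ocean", "tree", "wood"]
A = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
], dtype=float)

def lsi(matrix, k):
    """Truncated SVD: keep the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

U_k, s_k, Vt_k = lsi(A, k=2)
doc_vectors = (np.diag(s_k) @ Vt_k).T  # one k-dimensional row per document

print(doc_vectors.shape)
print("doc0 vs doc1:", cosine(doc_vectors[0], doc_vectors[1]))
print("doc0 vs doc3:", cosine(doc_vectors[0], doc_vectors[3]))
```

Each document is now a 2-dimensional vector instead of a 5-dimensional term vector. Documents 0 and 1, which share sea-related terms, end up pointing in the same direction in the reduced space, while document 3, which shares no terms with them, stays orthogonal.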

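For item h in the list above (text streams), a minimal sketch of the on-line style of processing: documents are consumed one at a time and only a running summary is kept. The class name StreamingTermCounter, the generator document_stream, and the sample messages are hypothetical and stand in for a real feed.

```python
from collections import Counter

class StreamingTermCounter:
    """Keeps running term counts, updated one document at a time."""

    def __init__(self):
        self.counts = Counter()
        self.docs_seen = 0

    def update(self, doc):
        # Only the current document is tokenized here; earlier documents are not stored.
        self.counts.update(doc.lower().split())
        self.docs_seen += 1

    def top(self, n):
        """Return the n most frequent terms seen so far."""
        return self.counts.most_common(n)

def document_stream():
    """Stand-in for an unbounded source such as a news or message feed."""
    yield "storm warning issued for the coast"
    yield "storm damage reported along the coast"
    yield "markets calm after storm"

counter = StreamingTermCounter()
for doc in document_stream():
    counter.update(doc)
    print(counter.docs_seen, counter.top(1))
```

The summary is available after every document rather than only once a whole batch has been read. Note that this sketch lets the vocabulary counter grow without bound; a real stream miner would also need to cap or decay that state.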
1.3 Future directions
a. Scalable and robust methods for natural language understanding:
   Many current NLP methods are hard to scale to multiple domains, and supervised learning demands too much training data.
b. Domain adaptation and transfer learning:
   This also addresses the shortage of training data in supervised learning.
c. Contextual analysis of text data:
d. Parallel text mining:
