(01) Lecture 1: Introduction to NLP (Natural Language Processing) and Deep Learning

1. Introduction

1.1 How deep learning differs from traditional machine learning

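Most traditional machine learning is built around methods such as decision trees, logistic regression, naive Bayes, and support vector machines (SVMs). In essence, a human carefully examines a specific problem, designs the features that matter for that problem, and then hand-codes them. In traditional machine learning, then, it is people who work out how to describe the data and which features are important.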
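As a rough illustration of this workflow, the sketch below hand-designs two features for a toy sentiment task and fits a logistic regression to them with plain gradient descent. The word lists, the training sentences, and all function names are assumptions made up for this example; they do not come from the lecture.

```python
import math

# Hand-picked word lists: this is the "human designs the features" step.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def extract_features(text):
    """Map raw text to a hand-designed feature vector."""
    tokens = text.lower().split()
    return [
        1.0,                                        # bias term
        float(sum(t in POSITIVE for t in tokens)),  # count of positive words
        float(sum(t in NEGATIVE for t in tokens)),  # count of negative words
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, features):
    """Probability that the text is positive under the logistic model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train(examples, epochs=200, learning_rate=0.5):
    """Stochastic gradient descent on the logistic loss."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, label in examples:
            features = extract_features(text)
            error = predict_proba(weights, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights


# Tiny made-up training set: 1 = positive, 0 = negative.
TRAIN = [
    ("a great movie , i love it", 1),
    ("good acting and a great plot", 1),
    ("an awful script , i hate it", 0),
    ("bad pacing and awful sound", 0),
]

weights = train(TRAIN)
print(predict_proba(weights, extract_features("i love this great film")))
print(predict_proba(weights, extract_features("i hate this bad film")))
```

The model only learns weights; everything it can "see" about a sentence was decided in advance by whoever wrote `extract_features`.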
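Deep learning is a branch of representation learning. The idea of representation learning is that we give the computer the raw signal (visual or linguistic) and it automatically derives good intermediate representations for the task; in other words, the machine defines the features. Deep learning learns multiple layers of such representations automatically. Today, the term "deep learning" usually refers to neural networks.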

1.2 Deep learning-based NLP

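The core idea of deep learning-based NLP is to start from word meaning. Concretely, each word is represented by a vector of numbers. Words are placed in a high-dimensional vector space that serves as a semantic space, in which words with similar meanings form clusters. Directions in the space also carry information about components of meaning. Because humans are not good at reading high-dimensional spaces, we usually map them down to two or three dimensions; some information is lost in that mapping, so the resulting picture can be misleading.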
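The two ideas above, that similar words sit close together and that directions carry meaning, can be sketched with cosine similarity and simple vector arithmetic. The 4-dimensional vectors below are hand-made toy values chosen so the effect is easy to see; they are an assumption for illustration only, and real learned embeddings have far more dimensions.

```python
import math

# Toy word vectors (made up for this example, not learned from data).
VECTORS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(target, exclude=()):
    """Return the vocabulary word whose vector is most similar to target."""
    candidates = [w for w in VECTORS if w not in exclude]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


# Similar meanings are close; unrelated words are far apart.
print(cosine(VECTORS["king"], VECTORS["queen"]))
print(cosine(VECTORS["king"], VECTORS["apple"]))

# A direction in the space: moving from "man" to "woman", starting at "king".
analogy = [k - m + w for k, m, w in
           zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])]
print(nearest(analogy, exclude=("king", "man", "woman")))
```

In this toy setup each coordinate was given an obvious role by hand so that the arithmetic is readable. That is a convenience of the example, not a property one should expect from trained embeddings.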
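Dimensionality reduction relies on methods that try to preserve the main structure of the high-dimensional vector space, such as principal component analysis (PCA) and t-SNE (a widely used nonlinear dimensionality reduction method).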
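To make the information loss concrete, here is a minimal PCA sketch in pure Python that projects toy 4-dimensional vectors onto 2 dimensions and reports how much of the variance survives. It finds the principal components by power iteration with deflation, which is one simple way to do it and an assumption of this example; library implementations typically use an SVD instead. The vectors are made-up values, not real embeddings.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean so the data is centered at the origin."""
    n, dim = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def mat_vec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the largest eigenvalue and its eigenvector."""
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    vec = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
    eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


def pca(rows, k):
    """Project rows onto the top-k principal components.

    Returns the projected points and the fraction of variance they keep.
    """
    centered = center(rows)
    cov = covariance(centered)
    size = len(cov)
    total_variance = sum(cov[i][i] for i in range(size))
    components, kept_variance = [], 0.0
    for _ in range(k):
        eigenvalue, vec = leading_eigenpair(cov)
        components.append(vec)
        kept_variance += eigenvalue
        # Deflation: remove the direction just found before the next search.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j]
                for j in range(size)] for i in range(size)]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, kept_variance / total_variance


WORDS = ["king", "queen", "man", "woman", "apple"]
VECTORS = [
    [0.9, 0.8, 0.1, 0.0],
    [0.9, 0.1, 0.8, 0.0],
    [0.1, 0.9, 0.1, 0.0],
    [0.1, 0.1, 0.9, 0.0],
    [0.0, 0.1, 0.1, 0.9],
]

points_2d, kept = pca(VECTORS, k=2)
for word, (x, y) in zip(WORDS, points_2d):
    print(f"{word:>6}: ({x:+.3f}, {y:+.3f})")
print(f"variance kept by 2 of 4 dimensions: {kept:.1%}")
```

Whatever fraction is reported as kept, the remainder is structure that a 2-D plot cannot show, which is why such plots should be read with caution.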
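Meaning of the coordinate axes: no particular interpretation of the individual axes has been found so far, so they are treated as having no meaning of their own.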

1.3 Traditional approaches to NLP

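Traditional NLP approaches typically build on semantic theories based on the lambda calculus (a formalism of functions and function application, unrelated to differential and integral calculus). Each word is assigned a semantic function, and precise rules then specify how word meanings combine to yield a semantic representation.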
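A minimal sketch of that idea, using Python lambdas as stand-ins for lambda-calculus terms: each word denotes either an entity or a function, and the meaning of a sentence is obtained by function application. The tiny world model, the lexicon, and the fixed composition order are all assumptions invented for this example.

```python
# A toy "world": which facts are true (made up for illustration).
SLEEPERS = {"john"}
LOVES = {("john", "mary")}  # (lover, loved)

# Lexicon: each word is assigned a semantic value.
LEXICON = {
    "John": "john",                                      # an entity
    "Mary": "mary",                                      # an entity
    "sleeps": lambda x: x in SLEEPERS,                   # λx. sleeps(x)
    "loves": lambda y: lambda x: (x, y) in LOVES,        # λy. λx. loves(x, y)
}


def intransitive(subject, verb):
    """Meaning of 'SUBJECT VERB': apply the verb to the subject."""
    return LEXICON[verb](LEXICON[subject])


def transitive(subject, verb, obj):
    """Meaning of 'SUBJECT VERB OBJECT': verb takes the object, then the subject."""
    verb_phrase = LEXICON[verb](LEXICON[obj])  # λx. loves(x, obj)
    return verb_phrase(LEXICON[subject])


print(intransitive("John", "sleeps"))
print(transitive("John", "loves", "Mary"))
print(transitive("Mary", "loves", "John"))
```

Here meanings are exact symbolic functions evaluated against a fixed set of facts, in contrast to the vector representations described in section 1.2.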

1.4 Summary

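Every form of language is represented as a vector: sub-word units, words, sentences, and conversations are all represented as real-valued vectors.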
