
Original Week1-3 Language Diversity and Ethnologue

How many languages? Difficult to say… Estimate: 6k ~ 7k languages around the world, given by Ethnologue. Ethnologue is free and you can use it to find information about countries, about languag

2017-03-21 16:22:40 657 1

Original Week1-2 Human Language Versus Other 'Languages'

Question: Does every human being have a language? We use language to communicate ideas, to think, to show who we are. Language? Computer programming language, language of flowers, language of music. Important

2017-03-14 16:31:56 489

Original Week1-0 Introduction

The course is all about diversity. Syllabus: Module 1: Introduction: What makes language human? What do linguists do? What kinds of linguists are there? Module 2: Sounds: How do we pronounce sounds? How do

2016-08-31 00:06:03 570

Original Week7-5 Statistical POS tagging

POS tagging methods: rule-based; stochastic: HMM (generative), Maximum Entropy (discriminative); transformation-based. HMM tagging: $T = \arg\max_T P(T \mid W)$, where $P(T \mid W) = \frac{P(W \mid T)P(T)}{P(W)}$
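
The argmax formula in the preview can be sketched as a toy scorer; the candidate tag sequences and probabilities below are invented for illustration:

```python
# Toy illustration of HMM tagging as argmax_T P(W|T) * P(T).
# All probabilities here are invented for illustration.
candidates = {
    ("DT", "NN", "VBZ"): {"prior": 0.30, "likelihood": 0.020},  # P(T), P(W|T)
    ("DT", "VB", "NNS"): {"prior": 0.05, "likelihood": 0.001},
}

def best_tagging(candidates):
    # P(W) is the same for every T, so it can be dropped from the argmax.
    return max(candidates, key=lambda t: candidates[t]["prior"] * candidates[t]["likelihood"])

print(best_tagging(candidates))  # ('DT', 'NN', 'VBZ')
```

In a real tagger the maximization runs over exponentially many tag sequences, which is why the Viterbi algorithm is used instead of explicit enumeration.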

2016-01-04 22:43:04 254

Original Week7-4 HMM2

Observation likelihood: given multiple HMMs, which one is most likely to generate the observation sequence? Naive solution: try all possible state sequences. Forward algorithm: compute a forward trellis t

2016-01-04 22:02:12 318

Original Week7-3 HMM1

Markov model: a sequence of random variables that are not independent, e.g. a weather report, text. Properties: limited horizon: $P(X_{t+1} = s_k \mid X_1, ..., X_t) = P(X_{t+1} = s_k \mid X_t)$

2016-01-01 13:35:32 317

Original Week7-2 POS tagging

POS: Open class: nouns, non-modal verbs, adjectives, adverbs. Closed class: prepositions, modal verbs, conjunctions, particles, determiners, pronouns. Penn Treebank tag set: the label IN indicates all of

2015-12-31 16:57:40 324

Original Week7-1 Noisy channel model

The noisy channel model. Example: input: written English (X); encoder: garble the input (X -> Y); output: spoken English (Y). More examples: grammatical English to English with mistakes; English to bitmaps (cha

2015-12-22 00:08:18 721

Original Week6-7 Word Sense Disambiguation

Introduction: polysemy: words have multiple senses; homonymy; POS ambiguity. Word sense disambiguation task: given a word and its context. Use for machine translation, e.g., translate 'play' into Spanish: pl

2015-12-20 16:55:29 522

Original Week6-6 Language Modelling 3

Evaluation of LM: extrinsic; intrinsic; correlate the two for validation purposes. Intrinsic: perplexity. Does the model fit the data? A good model will give high probability to a real sentence. Perplexity
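
A minimal sketch of perplexity as described above (the exponential of the negative average log-probability), using made-up per-word probabilities:

```python
import math

def perplexity(probs):
    """Perplexity over a test sequence, given the per-word probabilities
    the model assigned: PP = exp(-(1/N) * sum(log p))."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

# A model that assigns each of 4 words probability 0.25 has perplexity 4,
# i.e. the model is as uncertain as a uniform choice among 4 outcomes.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ~4.0
```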

2015-12-20 16:09:52 698

Original Week6-5 Language Modelling 2

Smoothing: if the vocabulary size is $\mid V \mid = 1M$, there are too many parameters to estimate even for a unigram model, let alone bigram and trigram models; MLE assigns a value of 0 to unseen data. Smoothing (regular
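
One common scheme consistent with the idea above is add-one (Laplace) smoothing; a minimal sketch with a toy corpus and an assumed vocabulary size:

```python
from collections import Counter

def laplace_unigram(counts, vocab_size):
    """Add-one (Laplace) smoothed unigram estimator:
    P(w) = (count(w) + 1) / (N + |V|), so unseen words get nonzero mass."""
    total = sum(counts.values())
    return lambda w: (counts[w] + 1) / (total + vocab_size)

counts = Counter("the cat sat on the mat".split())
p = laplace_unigram(counts, vocab_size=10)  # vocab assumed larger than the corpus
print(p("the"))  # (2+1)/(6+10) = 0.1875
print(p("dog"))  # unseen word: (0+1)/(6+10) = 0.0625
```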

2015-12-19 18:33:42 235

Original Week6-3,4 Language Modelling 1

Probabilistic language model: assign a probability to a sentence, $P(S) = P(w_1, w_2, ..., w_n)$. Different from deterministic methods using a CFG. The sum of the probabilities of all poss
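
The chain-rule decomposition of P(S) can be sketched with a bigram approximation; the probability table below is invented for illustration:

```python
# Bigram approximation: P(S) = P(w1|<s>) * P(w2|w1) * ... * P(</s>|wn).
# The conditional probabilities below are made up.
bigram_p = {("<s>", "time"): 0.2, ("time", "flies"): 0.1, ("flies", "</s>"): 0.3}

def sentence_prob(words, bigram_p):
    prob = 1.0
    # Pair each word with its predecessor, padding with sentence markers.
    for w1, w2 in zip(["<s>"] + words, words + ["</s>"]):
        prob *= bigram_p[(w1, w2)]
    return prob

print(sentence_prob(["time", "flies"], bigram_p))  # 0.2 * 0.1 * 0.3 ~ 0.006
```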

2015-12-17 13:16:21 270

Original Week6-2 Bayesian theorem

Bayes' theorem: formula for joint probability: $P(A,B) = P(B \mid A)P(A) = P(A \mid B)P(B)$. Therefore $P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$. Example: W
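
A one-line sketch of the derivation above, with invented numbers:

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Invented example: P(B|A)=0.5, P(A)=0.25, P(B)=0.5 -> P(A|B)=0.25.
print(posterior(0.5, 0.25, 0.5))  # 0.25
```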

2015-12-16 11:57:02 93

Original Week6-1 Probabilities

Probabilistic reasoning: speech recognition: 'recognize speech' vs. 'wreck a nice beach'; machine translation: l'avocat general: 'the attorney general' vs. 'the general avocado'. Probabilities make it possible to

2015-12-15 10:55:15 236

Original Week5-8 Alternative Parsing Formalisms

Mildly context-sensitive grammars: tree substitution grammars (TSG): terminals generate entire tree fragments; TSG and CFG are formally equivalent. Tree adjoining grammar (TAG): C

2015-12-14 17:59:58 274

Original Week5-7 Dependency parsing

Dependency structure: 'blue': modifier, dependent, child, subordinate; 'house': head, governor, parent, regent. Phrase structure vs. dependency structure. Dependency grammar characteristics: lexical/syntactic depen

2015-12-13 16:52:03 392

Original Week5-6 Lexicalized parsing

One step up from context-free parsing. Limitations of PCFGs: the probabilities do not depend on the specific words; not possible to disambiguate sentences based on semantic information. Idea: lexicalized g

2015-12-12 15:15:47 286

Original Week5-5 Statistical parsing

PCFG: need for PCFG: 'Time flies like an arrow' has many parses, some more likely than others; need for a probabilistic ranking method. Definition: just like a CFG, a 4-tuple $(N, \Sigma, R, S)$: N: non-terminal

2015-12-10 01:22:54 546

Original Week5-4 PP attachment 3

Accuracy on test data: Alg2: 63%; Alg2a outperforms Alg2 (Rule 3): 70%. Summary: memorizing everything is not a good idea!! What additional sources can we use to improve the algorithm? Use a few more

2015-12-02 19:04:46 457

Original Week5-3 PP attachment 2

Algorithms for PP attachment: Alg0: a dumb one, label every instance with a default label, say 'low'. Alg1: random baseline: a random unsupervised baseline would have to label each instance in the test data with a

2015-11-30 17:03:30 456

Original Week5-2 PP attachment 1

PP attachment: high (verbal, attached to VP) vs. low (nominal, attached to NP). 'With the net' is attached to the word 'caught', and it has no association with 'butterfly'. We could formulate the PP attachment as

2015-11-30 16:51:30 412

Original Week5-1 Parsing Recap

Programming languages are designed to be unambiguous. Noun-noun compound: head of the compound: 'college junior' is a kind of junior; 'junior college' is a kind of college. Head first? 'Attorney general'. Adjective

2015-11-29 16:32:04 239

Original Week4-5 The Penn treebank

Description: background: early 90's, developed at the University of Pennsylvania; the most cited paper in NLP!!! Size: 40,000 training sentences, 2,400 test sentences. Genre: mostly Wall Street Journal news stories and s

2015-11-28 16:40:26 316

Original Week4-4 Earley Parser

Background: developed by Jay Earley in 1970; no need to convert the grammar to CNF; left to right. Complexity: faster than $O(n^3)$ in many cases. Earley parser: look for both full and partial constituents; when re

2015-11-27 18:01:53 553

Original Week4-3 Classic parsing methods

Parsing as search: there are 2 types of constraints on the parses: from the input sentence and from the grammar. Therefore there are 2 types of approaches to parsing: top-down and bottom-up. Shift-reduce parsing: a bottom

2015-11-23 15:23:20 538

Original Week4-2 Parsing

Parsing human language: rather different from computer languages: no types for words (variable, comment, …), no brackets around phrases; ambiguity: words, parses; implied information. Parsing means asso

2015-11-22 17:59:52 260

Original Week3-7 NLP task 3/3

Q&A: the Jeopardy game. Sentiment analysis: understand the sentiment in the text, whether it is positive, negative, or something more fine-grained. Machine translation: Google Translate, Moses, the noisy channel model

2015-11-19 14:08:35 460

Original Week3-6 NLP task 2/3

Information extraction: input: sentences or documents; understand the different entities and other things, and convert them to a form, such as a table, that is useful to people. Output: semantics; first-order logic; inferenc

2015-11-18 16:14:14 284

Original Week3-5 NLP task 1/3

NLP tasks: part-of-speech tagging: predict the part of speech for every word in the sentence. Parsing: phrase structure grammar, parse tree, Stanford parser, parsing result. Dependency parsing

2015-11-18 15:49:13 379

Original Week3-4 Dimensionality reduction

Problems with the simple vector approaches to similarity. Dimensionality reduction: looking for hidden similarities in data, based on matrix decomposition. Matrix decomposition: SVD. Example: assume that we have

2015-11-17 17:52:35 612

Original Week3-3 The vector space model

Document similarity: used in IR to determine which document (d1 or d2) is more similar to a given query q (the documents and queries are in the same space). The angle, or the cosine of the angle, is used as
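
The cosine measure described above can be sketched over toy term-count vectors:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy term-count vectors for a query and two documents over a 2-word vocabulary.
q  = [1, 0]
d1 = [3, 0]  # same direction as q: maximally similar
d2 = [0, 2]  # orthogonal to q: no shared terms
print(cosine(q, d1), cosine(q, d2))  # 1.0 0.0
```

Because cosine depends only on direction, d1 scores 1.0 despite having different term counts than q, which is the point of the measure.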

2015-11-17 17:01:33 258

Original Week3-2 Thesaurus-based Word Similarity Methods

Remember WordNet: given 2 words, we can calculate the number of links between these 2 words in the WordNet forest (tree). The greater the distance, the smaller the similarity. Path similarity: version
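
One version of path similarity, 1 / (1 + shortest-path length) as in WordNet's path_similarity, can be sketched on a toy hypernym graph; the graph below is invented:

```python
# Toy undirected hypernym graph, standing in for the WordNet "forest".
edges = {
    "dog": ["canine"], "canine": ["dog", "carnivore"],
    "carnivore": ["canine", "feline"], "feline": ["carnivore", "cat"],
    "cat": ["feline"],
}

def path_similarity(a, b):
    """Path similarity = 1 / (1 + number of links on the shortest path)."""
    dist, frontier, seen = 0, {a}, {a}
    while frontier:
        if b in frontier:
            return 1 / (1 + dist)
        # Breadth-first expansion by one link.
        frontier = {n for f in frontier for n in edges[f]} - seen
        seen |= frontier
        dist += 1
    return 0.0  # no path: the words are in different trees of the forest

print(path_similarity("dog", "cat"))  # 4 links apart -> 1/(1+4) = 0.2
```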

2015-11-17 00:51:04 256

Original Week4-1 Syntax

Syntax: language is more than a bag of words! Grammar rules apply to categories and groups of words, not individual words. Example: a sentence includes a subject and a predicate. Learn the new word a

2015-11-15 23:44:22 254

Original Week3-1 Semantic similarity: Synonymy and other Semantic Relations

Synonyms and paraphrases: synonym: different words (also word compounds) can have similar meanings; true synonyms are actually relatively rare. Polysemy: polysemy is the property of words to have multiple

2015-11-15 23:35:30 302

Original Week1-1 Human language and animal communication systems

About the course: all about language variety; similarities between languages; differences between animal communication systems and human language. Three dimensions: discrete infinity: although the alphabet

2015-11-08 22:08:15 827

Original Week2-7 Preprocessing

Convert the raw text to a format that is easier to process. Text preprocessing: types and tokens: a type is any sequence of characters that represents a specific word; a token is any occurrence of a type. So
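
The type/token distinction can be sketched in a few lines:

```python
text = "to be or not to be"
tokens = text.split()  # every occurrence counts as a token
types = set(tokens)    # distinct word forms only

print(len(tokens))  # 6 tokens
print(len(types))   # 4 types: 'to', 'be', 'or', 'not'
```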

2015-11-07 14:45:03 214

Original Week2-6 NACLO

Introduction

2015-11-07 14:24:24 242

Original Week2-5 Spelling similarity: edit distance

Spelling similarity: typos, variants in spelling. Edit operations: insertion, deletion, substitution, multiple edits. Levenshtein method: based on dynamic programming; insertions, deletions and substitutions usua
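
The Levenshtein dynamic program mentioned above can be sketched as the classic table, with unit cost for each edit operation:

```python
def levenshtein(s, t):
    """Edit distance via dynamic programming: dp[i][j] is the minimum number
    of insertions, deletions, and substitutions turning s[:i] into t[:j]."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution (or match)
    return dp[m][n]

print(levenshtein("kitten", "sitting"))  # 3
```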

2015-11-07 01:29:44 269

Original Week2-4 Morphological similarity: stemming

Whether 2 words are morphologically related. Stemming: reduce a word to its basic form, called the stem, after removing various suffixes and endings, and sometimes performing additional

2015-11-05 16:54:30 243

Original Week2-3 Text similarity: introduction

Text similarity: people can express the same concepts (or related concepts) in many different ways. A key component of NLP. Human judgement of similarity: people are asked to give the similarity of certa

2015-11-05 14:45:55 193
