Original: Week1-3 Language Diversity and Ethnologue
How many languages? Difficult to say… Estimate: 6k–7k languages around the world, given by Ethnologue. Ethnologue is free and you can use it to find information about countries, about languag…
2017-03-21 16:22:40 657 1
Original: Week1-2 Human Language Versus Other 'Languages'
Question: does every human being have a language? We use language to communicate ideas, to think, and to show who we are. 'Language'? Computer programming languages, the language of flowers, the language of music. Important…
2017-03-14 16:31:56 489
Original: Week1-0 Introduction
The course is all about diversity. Syllabus: Module 1, Introduction. What makes language human? What do linguists do? What kinds of linguists are there? Module 2, Sounds. How do we pronounce sounds? How do…
2016-08-31 00:06:03 570
Original: Week7-5 Statistical POS tagging
POS tagging methods: rule-based; stochastic, either HMM (generative) or Maximum Entropy (discriminative); transformation-based. HMM tagging: $T = \arg\max P(T \mid W)$, where $P(T \mid W) = \frac{P(W \mid T)P(T)}{P(W)}$…
2016-01-04 22:43:04 254
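The Bayes decision rule above (since $P(W)$ is constant, only $P(W \mid T)P(T)$ matters) can be sketched with toy numbers; all probabilities and tag sequences below are invented for illustration:

```python
# Toy illustration of the HMM tagging decision rule:
# T* = argmax_T P(W | T) * P(T)   (P(W) is constant across T, so it drops out).
# The candidate sequences and probabilities are made up for illustration.

def best_tag_sequence(candidates):
    """candidates: dict mapping tag sequence -> (P(W|T), P(T))."""
    return max(candidates, key=lambda t: candidates[t][0] * candidates[t][1])

# Two candidate taggings for the sentence "time flies":
candidates = {
    ("NN", "VBZ"): (0.002, 0.3),   # "time" as noun, "flies" as verb
    ("VB", "NNS"): (0.001, 0.1),   # "time" as verb, "flies" as noun
}

print(best_tag_sequence(candidates))  # -> ('NN', 'VBZ')
```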
Original: Week7-4 HMM 2
Observation likelihood: given multiple HMMs, which one is most likely to generate the observation sequence? Naive solution: try all possible state sequences. Forward algorithm: compute a forward trellis t…
2016-01-04 22:02:12 318
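The forward trellis mentioned above can be sketched in a few lines; the states, transition, and emission probabilities below are invented for illustration (a weather-style toy model), not from the course:

```python
# A minimal forward algorithm for an HMM (all model parameters are invented).
# Computes P(observations | model) by summing over all state sequences with a
# dynamic-programming trellis instead of brute-force enumeration.

def forward(obs, states, start_p, trans_p, emit_p):
    # trellis[s] = P(o_1..o_t, state_t = s)
    trellis = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        trellis = {
            s: sum(trellis[prev] * trans_p[prev][s] for prev in states) * emit_p[s][o]
            for s in states
        }
    return sum(trellis.values())

states = ("Hot", "Cold")
start_p = {"Hot": 0.8, "Cold": 0.2}
trans_p = {"Hot": {"Hot": 0.7, "Cold": 0.3}, "Cold": {"Hot": 0.4, "Cold": 0.6}}
emit_p = {"Hot": {"1": 0.2, "2": 0.4, "3": 0.4}, "Cold": {"1": 0.5, "2": 0.4, "3": 0.1}}

print(forward(("3", "1", "3"), states, start_p, trans_p, emit_p))
```

The trellis keeps one probability per state per time step, so the cost is linear in the sequence length rather than exponential as in the naive solution.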
Original: Week7-3 HMM 1
Markov model: a sequence of random variables that are not independent, e.g., a weather report, text. Properties: limited horizon, $P(X_{t+1} = s_k \mid X_1, ..., X_t) = P(X_{t+1} = s_k \mid X_t)$…
2016-01-01 13:35:32 317
Original: Week7-2 POS tagging
POS: open class (nouns, non-modal verbs, adjectives, adverbs); closed class (prepositions, modal verbs, conjunctions, particles, determiners, pronouns). Penn Treebank tag set: the label IN indicates all of…
2015-12-31 16:57:40 324
Original: Week7-1 Noisy channel model
The noisy channel model. Example: input: written English (X); encoder: garble the input (X → Y); output: spoken English (Y). More examples: grammatical English to English with mistakes; English to bitmaps (cha…
2015-12-22 00:08:18 721
Original: Week6-7 Word Sense Disambiguation
Introduction: polysemy (words have multiple senses), homonymy, POS ambiguity. Word sense disambiguation: the task, given a word and its context… Used for machine translation, e.g., translating 'play' into Spanish: pl…
2015-12-20 16:55:29 522
Original: Week6-6 Language Modelling 3
Evaluation of LMs: extrinsic, intrinsic; correlate the two for validation purposes. Intrinsic: perplexity. Does the model fit the data? A good model will give high probability to a real sentence. Perplexity…
2015-12-20 16:09:52 698
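The perplexity idea above can be sketched directly from its definition, perplexity $= P(w_1 \ldots w_N)^{-1/N}$; the per-word probabilities below are invented for illustration:

```python
# A minimal sketch of perplexity (per-word probabilities are invented).
# Perplexity = P(w_1..w_N)^(-1/N); lower is better, i.e. the model assigns
# higher probability to the real text.
import math

def perplexity(probs):
    """probs: the probability the model assigns to each word of a test text."""
    n = len(probs)
    log_sum = sum(math.log(p) for p in probs)  # log-space avoids underflow
    return math.exp(-log_sum / n)

# A model that assigns 1/4 to every word has perplexity 4:
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # -> 4.0
```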
Original: Week6-5 Language Modelling 2
Smoothing: if the vocabulary size is $|V| = 1M$, there are too many parameters to estimate even for a unigram model, let alone bigram and trigram models. MLE assigns a probability of 0 to unseen data. Smoothing (regular…
2015-12-19 18:33:42 235
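One common form of the smoothing idea above is add-one (Laplace) smoothing, sketched here for unigrams; the toy counts and vocabulary size are invented for illustration:

```python
# A minimal add-one (Laplace) smoothing sketch for unigrams.
# Toy counts and vocabulary size are invented for illustration.
from collections import Counter

def laplace_unigram(counts, vocab_size):
    total = sum(counts.values())
    def prob(word):
        # Unseen words get a small non-zero probability instead of MLE's zero.
        return (counts.get(word, 0) + 1) / (total + vocab_size)
    return prob

counts = Counter(["the", "the", "cat", "sat"])
p = laplace_unigram(counts, vocab_size=10)
print(p("the"))   # seen twice: (2+1)/(4+10)
print(p("dog"))   # unseen: (0+1)/(4+10), greater than zero
```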
Original: Week6-3,4 Language Modelling 1
Probabilistic language model: assign a probability to a sentence, $P(S) = P(w_1, w_2, ..., w_n)$. Different from deterministic methods using a CFG. The sum of the probabilities of all poss…
2015-12-17 13:16:21 270
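The sentence probability above is usually factored with the chain rule and a bigram approximation, $P(S) \approx \prod_i P(w_i \mid w_{i-1})$, which can be sketched on a toy corpus (the corpus and sentence are invented for illustration):

```python
# A minimal bigram language model: P(S) ~= product of P(w_i | w_{i-1}),
# estimated by MLE from a toy corpus (corpus and sentence are invented).
from collections import Counter

corpus = ["<s>", "i", "like", "nlp", "</s>", "<s>", "i", "like", "cats", "</s>"]
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def sentence_prob(words):
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= bigrams[(prev, w)] / unigrams[prev]  # MLE estimate of P(w | prev)
    return p

# "like nlp" occurs in 1 of the 2 sentences starting "<s> i like":
print(sentence_prob(["<s>", "i", "like", "nlp", "</s>"]))  # -> 0.5
```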
Original: Week6-2 Bayes' theorem
Bayes' theorem: formula for joint probability, $P(A,B) = P(B \mid A)P(A) = P(A \mid B)P(B)$. Therefore $P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$. Example: W…
2015-12-16 11:57:02 93
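The derivation above can be checked numerically; the prior and likelihoods below are invented for illustration:

```python
# Bayes' theorem checked numerically: P(A|B) = P(B|A) P(A) / P(B).
# All numbers are invented for illustration.
p_a = 0.01              # prior P(A)
p_b_given_a = 0.9       # likelihood P(B|A)
p_b_given_not_a = 0.05  # likelihood P(B|not A)

# P(B) by the law of total probability:
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # -> 0.1538
```

Note how a strong likelihood (0.9) is tempered by the small prior (0.01): the posterior stays well below 1.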
Original: Week6-1 Probabilities
Probabilistic reasoning: speech recognition ("recognize speech" vs. "wreck a nice beach"); machine translation ("l'avocat général": "the attorney general" vs. "the general avocado"). Probabilities make it possible to…
2015-12-15 10:55:15 236
Original: Week5-8 Alternative Parsing Formalisms
Mildly context-sensitive grammars. Tree substitution grammars (TSG): terminals generate entire tree fragments; TSG and CFG are formally equivalent. Tree adjoining grammar (TAG): C…
2015-12-14 17:59:58 274
Original: Week5-7 Dependency parsing
Dependency structure: "blue" is a modifier (dependent, child, subordinate); "house" is a head (governor, parent, regent). Phrase structure vs. dependency structure. Dependency grammar characteristics: lexical/syntactic depen…
2015-12-13 16:52:03 392
Original: Week5-6 Lexicalized parsing
One step up from context-free parsing. Limitations of PCFGs: the probabilities do not depend on the specific words, so it is not possible to disambiguate sentences based on semantic information. Idea: lexicalized g…
2015-12-12 15:15:47 286
Original: Week5-5 Statistical parsing
PCFG. Need for PCFGs: "Time flies like an arrow" has many parses, some more likely than others, so we need a probabilistic ranking method. Definition: just like a CFG, a 4-tuple $(N, \Sigma, R, S)$; N: non-terminal…
2015-12-10 01:22:54 546
Original: Week5-4 PP attachment 3
Accuracy on test data: Alg2, 63%; Alg2a (Rule 3) outperforms Alg2 at 70%. Summary: memorizing everything is not a good idea!! What additional sources can we use to improve the algorithm? Use a few more…
2015-12-02 19:04:46 457
Original: Week5-3 PP attachment 2
Algorithms for PP attachment. Alg0: a dumb one that labels every instance with a default label, say "low". Alg1: a random baseline; a random unsupervised baseline would have to label each instance in the test data with a…
2015-11-30 17:03:30 456
Original: Week5-2 PP attachment 1
PP attachment: high (verbal, attached to the VP) or low (nominal, attached to the NP). "With the net" is attached to the word "caught" and has no association with "butterfly". We could formulate PP attachment as…
2015-11-30 16:51:30 412
Original: Week5-1 Parsing Recap
Programming languages are designed to be unambiguous. Noun-noun compounds and the head of the compound: "college junior" is a kind of junior; "junior college" is a kind of college. Head first? "Attorney general". Adjective…
2015-11-29 16:32:04 239
Original: Week4-5 The Penn Treebank
Description. Background: early 90's, developed at the University of Pennsylvania; the most cited paper in NLP!!! Size: 40,000 training sentences, 2,400 test sentences. Genre: mostly Wall Street Journal news stories and s…
2015-11-28 16:40:26 316
Original: Week4-4 Earley Parser
Background: developed by Jay Earley in 1970; no need to convert the grammar to CNF; parses left to right. Complexity: faster than $O(n^3)$ in many cases. Earley parser: look for both full and partial constituents; when re…
2015-11-27 18:01:53 553
Original: Week4-3 Classic parsing methods
Parsing as search. There are 2 types of constraints on the parses: from the input sentence and from the grammar. Therefore, 2 types of approaches to parsing: top-down and bottom-up. Shift-reduce parsing: a bottom…
2015-11-23 15:23:20 538
Original: Week4-2 Parsing
Parsing human language is rather different from parsing computer languages: no types for words (variable, comment, …), no brackets around phrases, ambiguity (in words and parses), implied information. Parsing means asso…
2015-11-22 17:59:52 260
Original: Week3-7 NLP task 3/3
Q&A: the Jeopardy! game. Sentiment analysis: understand the sentiment in a text, whether it is positive or negative, or has more polarity classes. Machine translation: Google Translate, Moses, the noisy channel model…
2015-11-19 14:08:35 460
Original: Week3-6 NLP task 2/3
Information extraction. Input: sentences or documents; understand the different entities and other content, and convert it into a form, like a table, that is useful to people. Output: semantics, first-order logic, inferenc…
2015-11-18 16:14:14 284
Original: Week3-5 NLP task 1/3
NLP tasks. Part-of-speech tagging: predict the part of speech for every word in the sentence. Parsing: phrase structure grammar, parse tree, the Stanford parser, parsing result. Dependency parsing…
2015-11-18 15:49:13 379
Original: Week3-4 Dimensionality reduction
Problems with the simple vector approaches to similarity. Dimensionality reduction: looking for hidden similarities in data, based on matrix decomposition. Matrix decomposition: SVD. Example: assume that we have…
2015-11-17 17:52:35 612
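The matrix-decomposition idea behind SVD can be glimpsed with power iteration, which finds the dominant singular value of a matrix; this pure-Python sketch on an invented toy matrix is for illustration only, and a real system would use a linear algebra library:

```python
# A minimal power-iteration sketch of the idea behind SVD: find the dominant
# singular value of a small matrix. Toy matrix; pure Python for illustration.
import math

def top_singular_value(A, iters=200):
    n = len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        # v <- A^T (A v), then normalize; v converges to the top right
        # singular vector of A.
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(len(A))]
        v = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(len(A))]
    return math.sqrt(sum(x * x for x in Av))  # ||A v|| = top singular value

# For a diagonal matrix, the singular values are just the diagonal entries:
print(round(top_singular_value([[3.0, 0.0], [0.0, 1.0]]), 6))  # -> 3.0
```

Truncated SVD for dimensionality reduction keeps only the few largest singular values and their vectors, discarding the rest as noise.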
Original: Week3-3 The vector space model
Document similarity: used in IR to determine which document (d1 or d2) is more similar to a given query q (the documents and queries are in the same space). The angle, or the cosine of the angle, is used as…
2015-11-17 17:01:33 258
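The cosine measure above is a one-liner over term-count vectors; the tiny vocabulary and counts below are invented for illustration:

```python
# A minimal cosine-similarity sketch for the vector space model
# (term-count vectors over a tiny invented vocabulary).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

query = [1, 1, 0]   # counts over a 3-word vocabulary
d1 = [2, 2, 0]      # same direction as the query
d2 = [0, 0, 3]      # shares no terms with the query

print(round(cosine(query, d1), 6))  # -> 1.0 (identical direction)
print(cosine(query, d2))            # -> 0.0 (orthogonal)
```

Because cosine depends only on the angle, d1 scores 1.0 even though its counts are twice the query's: document length does not matter, only direction.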
Original: Week3-2 Thesaurus-based Word Similarity Methods
Remember WordNet: given 2 words, we can count the number of links between them in the WordNet forest (trees). The greater the distance, the smaller the similarity. Path similarity. Version…
2015-11-17 00:51:04 256
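The path-similarity idea above (greater distance, smaller similarity) can be sketched on an invented toy is-a tree; real systems would use WordNet itself, and the 1/(1 + path length) scoring here is one common variant:

```python
# A minimal path-similarity sketch over a toy hypernym tree (the tree is
# invented; real systems use WordNet). Similarity = 1 / (1 + path length),
# so a greater distance gives a smaller similarity.
from collections import deque

edges = {  # undirected toy is-a links
    "animal": ["dog", "cat"],
    "dog": ["animal", "poodle"],
    "cat": ["animal"],
    "poodle": ["dog"],
}

def path_length(a, b):
    """Shortest number of links between a and b (breadth-first search)."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_similarity(a, b):
    return 1 / (1 + path_length(a, b))

print(path_similarity("dog", "cat"))     # distance 2, similarity 1/3
print(path_similarity("poodle", "cat"))  # distance 3, similarity 1/4
```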
Original: Week4-1 Syntax
Syntax: language is more than a bag of words! Grammar rules apply to categories and groups of words, not to individual words. Example: a sentence includes a subject and a predicate. Learn the new word a…
2015-11-15 23:44:22 254
Original: Week3-1 Semantic similarity: Synonymy and other Semantic Relations
Synonyms and paraphrases. Synonym: different words (also word compounds) can have similar meanings; true synonyms are actually relatively rare. Polysemy: the property of words to have multiple…
2015-11-15 23:35:30 302
Original: Week1-1 Human language and animal communication systems
About the course: all about language variety; similarities between languages. Differences between animal communication systems and human language, along three dimensions. Discrete infinity: although the alphabet…
2015-11-08 22:08:15 827
Original: Week2-7 Preprocessing
Convert raw text into a format that is easier to process. Text preprocessing. Types and tokens: a type is any sequence of characters that represents a specific word; a token is any occurrence of a type. So…
2015-11-07 14:45:03 214
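The type/token distinction above is easy to see in code; the sentence is invented and tokenization is a naive whitespace split for illustration:

```python
# Types vs. tokens: tokens are occurrences, types are distinct words.
# Naive whitespace tokenization of an invented sentence, for illustration.
text = "the cat sat on the mat"
tokens = text.split()
types = set(tokens)

print(len(tokens))  # -> 6 tokens
print(len(types))   # -> 5 types ("the" occurs twice)
```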
Original: Week2-5 Spelling similarity: edit distance
Spelling similarity: typos, variants in spelling. Edit operations: insertion, deletion, substitution, multiple edits. The Levenshtein method: based on dynamic programming; insertions, deletions and substitutions usua…
2015-11-07 01:29:44 269
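The dynamic-programming method above is the standard Levenshtein distance, sketched here with unit cost for each insertion, deletion, and substitution:

```python
# Standard dynamic-programming Levenshtein distance: insertions, deletions,
# and substitutions each cost 1.

def levenshtein(s, t):
    # prev[j] = edit distance between s[:i-1] and t[:j]; one row at a time.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # -> 3
```

The classic example: "kitten" becomes "sitting" via one substitution (k→s), one substitution (e→i), and one insertion (g), so the distance is 3.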
Original: Week2-4 Morphological similarity: stemming
Whether 2 words are morphologically related. Stemming reduces a word to its basic form, called the stem, by removing various suffixes and endings and sometimes performing additional…
2015-11-05 16:54:30 243
Original: Week2-3 Text similarity: introduction
Text similarity: people can express the same concepts (or related concepts) in many different ways; a key component of NLP. Human judgement of similarity: people are asked to rate the similarity of certa…
2015-11-05 14:45:55 193