NLP (Michigan)
zypandora
Week7-5 Statistical POS tagging
POS tagging methods: rule-based; stochastic, i.e. HMM (generative) and maximum entropy (discriminative); transformation-based. HMM tagging: \hat{T} = \arg\max_T P(T \mid W), where P(T \mid W) = \frac{P(W \mid T)P(T)}{P(W)}… (2016-01-04)
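Since P(W) is constant over tag sequences, decoding reduces to \hat{T} = \arg\max_T P(W \mid T)P(T), which a bigram HMM tagger solves with the Viterbi algorithm. A minimal sketch, assuming integer-coded words and hypothetical model matrices (not from the course notes):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely tag sequence under an HMM.
    pi: (S,) start probabilities; A: (S, S) tag-transition probabilities;
    B: (S, V) word-emission probabilities; obs: word indices into V."""
    T, S = len(obs), len(pi)
    delta = np.zeros((T, S))            # best score of any path ending in each state
    back = np.zeros((T, S), dtype=int)  # backpointers for path recovery
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j]: leave state i, enter j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```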
Week7-2 POS tagging
POS: open class (nouns, non-modal verbs, adjectives, adverbs); closed class (prepositions, modal verbs, conjunctions, particles, determiners, pronouns). Penn Treebank tag set: the label IN indicates all of… (2015-12-31)
Week5-1 Parsing Recap
Programming languages are designed to be unambiguous. Noun-noun compounds and the head of the compound: a college junior is a kind of junior; a junior college is a kind of college. Head first? Attorney general. Adjective… (2015-11-29)
Week6-2 Bayes' theorem
Bayes' theorem. Formula for joint probability: P(A,B) = P(B \mid A)P(A) = P(A \mid B)P(B). Therefore P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}. Example: … (2015-12-16)
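A quick numeric sanity check of the identity, with made-up probabilities:

```python
# Hypothetical values, chosen only to illustrate the identity.
p_a, p_b_given_a, p_b = 0.3, 0.5, 0.4

p_a_given_b = p_b_given_a * p_a / p_b   # Bayes' theorem
joint = p_b_given_a * p_a               # P(A, B) via one factorization

assert abs(joint - p_a_given_b * p_b) < 1e-12  # equals the other factorization
print(p_a_given_b)  # 0.375
```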
Week4-5 The Penn Treebank
Description. Background: early 90s, developed at the University of Pennsylvania; the most cited paper in NLP! Size: 40,000 training sentences, 2,400 test sentences. Genre: mostly Wall Street Journal news stories and… (2015-11-28)
Week6-1 Probabilities
Probabilistic reasoning: speech recognition ('recognize speech' vs. 'wreck a nice beach'); machine translation (l'avocat général: 'the attorney general' vs. 'the general avocado'). Probabilities make it possible to… (2015-12-15)
Week4-4 Earley Parser
Background: developed by Jay Earley in 1970; no need to convert the grammar to CNF; works left to right. Complexity: faster than O(n^3) in many cases. The Earley parser looks for both full and partial constituents… (2015-11-27)
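A compact Earley recognizer sketch over a toy grammar (grammar and sentence are hypothetical; epsilon rules are not handled), showing the predictor/scanner/completer loop over dotted rules:

```python
from collections import namedtuple

# Dotted rule: lhs -> rhs with a dot position, spanning from chart column `start`.
Item = namedtuple("Item", "lhs rhs dot start")

GRAMMAR = {  # toy CFG; keys are non-terminals, values are lists of right-hand sides
    "S": [("NP", "VP")], "NP": [("Det", "N")], "VP": [("V", "NP")],
    "Det": [("the",)], "N": [("dog",), ("cat",)], "V": [("saw",)],
}

def earley_recognize(words, grammar, root="S"):
    chart = [[] for _ in range(len(words) + 1)]
    def add(col, item):
        if item not in chart[col]:
            chart[col].append(item)
    add(0, Item("GAMMA", (root,), 0, 0))
    for i in range(len(words) + 1):
        for item in chart[i]:                       # list grows while we iterate
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:                  # predictor
                    for rhs in grammar[sym]:
                        add(i, Item(sym, rhs, 0, i))
                elif i < len(words) and words[i] == sym:  # scanner
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:                                   # completer
                for old in chart[item.start]:
                    if old.dot < len(old.rhs) and old.rhs[old.dot] == item.lhs:
                        add(i, old._replace(dot=old.dot + 1))
    return Item("GAMMA", (root,), 1, 0) in chart[-1]

print(earley_recognize("the dog saw the cat".split(), GRAMMAR))  # True
```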
Week5-8 Alternative Parsing Formalisms
Mildly context-sensitive grammars. Tree substitution grammars (TSG): terminals generate entire tree fragments; TSG and CFG are formally equivalent. Tree adjoining grammars (TAG): … (2015-12-14)
Week5-6 Lexicalized parsing
One step up from context-free parsing. Limitations of PCFGs: the probabilities do not depend on the specific words, so it is not possible to disambiguate sentences based on semantic information. Idea: lexicalized… (2015-12-12)
Week3-7 NLP tasks 3/3
Q&A: the Jeopardy! game. Sentiment analysis: understand the sentiment in a text, whether it is positive, negative, or of finer-grained polarity. Machine translation: Google Translate, Moses, the noisy channel model… (2015-11-19)
Week5-7 Dependency parsing
Dependency structure: in 'blue house', 'blue' is the modifier (dependent, child, subordinate) and 'house' is the head (governor, parent, regent). Phrase structure vs. dependency structure. Dependency grammar characteristics: lexical/syntactic dependencies… (2015-12-13)
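For a concrete view of head/dependent pairs, a pretrained dependency parser such as spaCy's can be used; a sketch assuming spaCy and its small English model are installed:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The blue house collapsed")
for tok in doc:
    print(f"{tok.text:10} --{tok.dep_}--> {tok.head.text}")
# "blue" comes out as a modifier (amod) whose head is "house",
# matching the excerpt's example.
```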
Week7-3 HMM 1
Markov model: a sequence of random variables that are not independent (weather reports, text). Properties: limited horizon, P(X_{t+1} = s_k \mid X_1, \ldots, X_t) = P(X_{t+1} = s_k \mid X_t)… (2016-01-01)
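A toy illustration of the limited-horizon property: sampling a two-state weather chain where the next state is drawn from a distribution conditioned only on the current state (states and probabilities are made up):

```python
import random

A = {"rain": {"rain": 0.6, "sun": 0.4},   # transition probabilities
     "sun":  {"rain": 0.2, "sun": 0.8}}

def sample_chain(n, state="sun"):
    seq = [state]
    for _ in range(n - 1):
        # the next state depends only on the current one (limited horizon)
        state = random.choices(list(A[state]), weights=A[state].values())[0]
        seq.append(state)
    return seq

print(sample_chain(10))
```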
Week7-1 Noisy channel model
The noisy channel model. Example: input, written English (X); encoder, garbles the input (X -> Y); output, spoken English (Y). More examples: grammatical English to English with mistakes; English to bitmaps… (2015-12-22)
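In the standard decoding rule, the most likely source given the observed output is \hat{X} = \arg\max_X P(X \mid Y) = \arg\max_X P(Y \mid X)\,P(X): a source model P(X) combined with a channel model P(Y \mid X). This is the same Bayesian decoding rule that reappears in the HMM tagging entry above.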
Week6-7 Word Sense Disambiguation
Introduction: polysemy (words have multiple senses), homonymy, POS ambiguity. The word sense disambiguation task: given a word and its context, pick the intended sense. Used for machine translation, e.g., translating 'play' into Spanish… (2015-12-20)
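One classic knowledge-based disambiguator (an illustration, not necessarily the method covered in the course) is simplified Lesk, which picks the WordNet sense whose dictionary gloss overlaps the context most. NLTK ships an implementation:

```python
from nltk.wsd import lesk  # assumes NLTK and its WordNet data are installed

context = "I went to the bank to deposit my money".split()
print(lesk(context, "bank"))  # e.g. Synset('savings_bank.n.02')
```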
Week6-6 Language Modelling 3
Evaluation of LMs: extrinsic; intrinsic; correlate the two for validation purposes. Intrinsic evaluation, perplexity: does the model fit the data? A good model gives high probability to real sentences. Perplexity… (2015-12-20)
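Perplexity is the inverse probability of the test set, normalized by length: PP(W) = P(w_1, \ldots, w_N)^{-1/N}. A minimal sketch with hypothetical per-token probabilities:

```python
import math

def perplexity(token_probs):
    """2 ** (average negative log2 probability per token)."""
    n = len(token_probs)
    return 2 ** (-sum(math.log2(p) for p in token_probs) / n)

# A model assigning every token probability 1/8 has perplexity 8.
print(perplexity([1 / 8] * 4))  # 8.0
```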
Week7-4 HMM 2
Observation likelihood: given multiple HMMs, which one is most likely to generate the observation sequence? Naive solution: try all possible state sequences. Forward algorithm: compute a forward trellis… (2016-01-04)
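A minimal forward-trellis sketch (model matrices hypothetical, as in the Viterbi sketch above): it sums over all state sequences in O(S^2 T) instead of enumerating them.

```python
import numpy as np

def forward(obs, pi, A, B):
    """P(obs | HMM): pi (S,) start probs, A (S,S) transitions, B (S,V) emissions."""
    alpha = pi * B[:, obs[0]]          # alpha[s] = P(obs so far, current state = s)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one step through the trellis
    return alpha.sum()
```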
Week6-3,4 Language Modelling 1
Probabilistic language model: assign a probability to a sentence, P(S) = P(w_1, w_2, \ldots, w_n); different from deterministic methods using CFGs. The sum of the probabilities of all possible… (2015-12-17)
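The chain rule P(S) = \prod_i P(w_i \mid w_1, \ldots, w_{i-1}) is typically approximated with an n-gram model. A maximum-likelihood bigram sketch over a hypothetical two-sentence corpus:

```python
from collections import Counter

def train_bigram_mle(corpus):
    """P(w2 | w1) = c(w1, w2) / c(w1), estimated from lists of token lists."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])          # every token that has a successor
        bigrams.update(zip(toks, toks[1:]))
    return lambda w1, w2: bigrams[w1, w2] / unigrams[w1]

p = train_bigram_mle([["the", "dog", "barks"], ["the", "cat", "sleeps"]])
print(p("<s>", "the"))  # 1.0: both sentences start with "the"
print(p("the", "dog"))  # 0.5
```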
Week6-5 Language Modelling 2
Smoothing: if the vocabulary size is |V| = 1M, there are too many parameters to estimate even for a unigram model, let alone bigram and trigram models, and MLE assigns a probability of 0 to unseen data. Smoothing (regularization)… (2015-12-19)
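A sketch of the simplest fix, add-k (Laplace) smoothing, with made-up counts:

```python
def add_k(count, context_count, vocab_size, k=1.0):
    """Steal probability mass from seen events so unseen ones get a non-zero share."""
    return (count + k) / (context_count + k * vocab_size)

# An unseen bigram after a context seen 500 times, vocabulary of 10,000 words:
print(add_k(0, 500, 10_000))  # ~9.5e-05 rather than the MLE's 0
```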
Week5-4 PP attachment 3
Accuracy on test data: Alg2, 63%; Alg2a outperforms Alg2 (Rule 3) at 70%. Summary: memorizing everything is not a good idea! What additional sources can we use to improve the algorithm? Use a few more… (2015-12-02)
Week5-3 PP attachment 2
Algorithms for PP attachment. Alg0, a dumb one: label every instance with a default label, say 'low'. Alg1, a random baseline: an unsupervised random baseline would have to label each instance in the test data with a… (2015-11-30)
Week5-2 PP attachment 1
PP attachment: high (verbal, attached to the VP) vs. low (nominal, attached to the NP). In 'caught the butterfly with the net', 'with the net' attaches to the verb 'caught' and has no association with 'butterfly'. We could formulate PP attachment as… (2015-11-30)
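The excerpt breaks off, but PP attachment is standardly cast as binary classification over (verb, noun1, preposition, noun2) tuples. A per-preposition majority-vote baseline sketch on toy data (an illustration only, not necessarily the course's Alg2):

```python
from collections import Counter, defaultdict

def train_prep_majority(examples):
    """examples: (verb, n1, prep, n2, label) tuples, label in {"high", "low"}."""
    votes = defaultdict(Counter)
    for _, _, prep, _, label in examples:
        votes[prep][label] += 1
    def predict(prep):
        # fall back to "low" for prepositions never seen in training
        return votes[prep].most_common(1)[0][0] if prep in votes else "low"
    return predict

predict = train_prep_majority([
    ("caught", "butterfly", "with", "net", "high"),   # instrumental "with"
    ("ate", "pizza", "with", "anchovies", "low"),
    ("opened", "door", "with", "key", "high"),
])
print(predict("with"))  # "high": the majority label for this preposition
```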
Week3-6 NLP tasks 2/3
Information extraction. Input: sentences or documents; understand the entities and other things mentioned, and convert them into a form, such as a table, that is useful to people. Output: semantics, first-order logic, inference… (2015-11-18)
Week1-6 Background
Linguistic knowledge. Example, constituents: 'Children eat pizza.' 'They eat pizza.' 'My cousin's neighbor's children eat pizza.' 'Eat pizza!' Collocations: strong beer, not powerful beer; big sister, not large sister… (2015-11-03)
Week3-5 NLP tasks 1/3
NLP tasks. Part-of-speech tagging: predict the part of speech of every word in the sentence. Parsing: phrase structure grammar, parse trees, the Stanford parser, parsing results. Dependency parsing… (2015-11-18)
Week1-1 Introduction
Introduction. The course is introductory, with more focus on linguistics. What is NLP? The study of the computational treatment of natural language (human language). Modern applications: search engines, Q&A… (2015-10-13)
Week2-5 Spelling similarity: edit distance
Spelling similarity: typos, variants in spelling. Edit operations: insertion, deletion, substitution, multiple edits. The Levenshtein method: based on dynamic programming; insertions, deletions, and substitutions usually… (2015-11-07)
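A standard dynamic-programming sketch with unit costs for all three operations:

```python
def levenshtein(a, b):
    """Minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```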
Week2-7 Preprocessing
Convert the raw text to a format that is easier to process. Text preprocessing, types and tokens: a type is any sequence of characters representing a specific word; a token is any occurrence of a type. So… (2015-11-07)
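A two-line illustration of the type/token distinction:

```python
tokens = "the cat sat on the mat".lower().split()  # naive whitespace tokenization
types = set(tokens)
print(len(tokens), len(types))  # 6 tokens, 5 types ("the" occurs twice)
```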
Week2-6 NACLO
Introduction… (2015-11-07)
Week2-4 Morphological similarity: stemming
Whether two words are morphologically related. Stemming reduces a word to its basic form, called the stem, by removing various suffixes and endings, and sometimes performing additional… (2015-11-05)
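For example, the Porter stemmer (available in NLTK) strips suffixes by rule; exact outputs can vary slightly by implementation:

```python
from nltk.stem import PorterStemmer  # assumes NLTK is installed

stemmer = PorterStemmer()
for word in ["running", "caresses", "ponies", "agreement"]:
    print(word, "->", stemmer.stem(word))
# running -> run, caresses -> caress, ponies -> poni,
# agreement -> agreement (the measure condition blocks removing "-ment")
```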
Week2-3 Text similarity: introduction
Text similarity: people can express the same concepts (or related concepts) in many different ways; a key component of NLP. Human judgements of similarity: people are asked to rate the similarity of certain… (2015-11-05)
Week2-2 Morphology
The mental lexicon: given the morphological form of an unknown word, we can infer its POS or other properties. Derivational morphology: undoable vs. unbelievable… (2015-11-04)
Week1-5 Why is NLP hard?
Examples: 'Time flies like an arrow' has multiple interpretations. More examples, syntax vs. semantics: '*Little a has Mary lamb' is syntactically wrong; 'Colorless green ideas sleep furiously' is syntactically right… (2015-10-13)
Week1-7 Linguistics
The IPA chart: consonants, vowels. Many languages are related to each other. Language change. Diversity of languages. Language universals. (2015-11-03)
Week1-2 Examples of text
Understanding a news story: current events, background events, speculation, properties, pronominal reference. Genres of text: websites, papers, medical records, literary texts; poetry is even harder! (2015-10-13)
Week1-4 Administration
Four major parts and three goals; available books; other courses; the alphabet soup; research in NLP. (2015-10-13)
Week3-3 The vector space model
Document similarity: used in IR to determine which document (d1 or d2) is more similar to a given query q; the documents and queries live in the same space. The angle, or the cosine of the angle, is used as… (2015-11-17)
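A minimal bag-of-words cosine sketch (toy strings, raw term counts rather than tf-idf weights):

```python
import math
from collections import Counter

def cosine(doc1, doc2):
    """Cosine of the angle between two term-count vectors."""
    v1, v2 = Counter(doc1.split()), Counter(doc2.split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm(v1) * norm(v2))

print(cosine("the cat sat", "the cat slept"))  # ~0.667
```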
Week4-3 Classic parsing methods
Parsing as search. There are two types of constraints on the parses: from the input sentence and from the grammar. Hence two types of approaches to parsing: top-down and bottom-up. Shift-reduce parsing: a bottom-up… (2015-11-23)
Week3-4 Dimensionality reduction
Problems with the simple vector approaches to similarity. Dimensionality reduction: looking for hidden similarities in the data, based on matrix decomposition. Matrix decomposition: SVD. Example: assume that we have… (2015-11-17)
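A truncated-SVD sketch on a made-up term-document matrix, keeping only the top k singular values:

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents.
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-k approximation of A
print(np.round(A_k, 2))                    # hidden structure, noise smoothed away
```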
Week3-2 Thesaurus-based Word Similarity Methods
Remember WordNet: given two words, we can count the number of links between them in the WordNet forest (trees); the greater the distance, the smaller the similarity. Path similarity, version… (2015-11-17)
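With NLTK's WordNet interface, path similarity is 1 / (1 + length of the shortest path) between synsets:

```python
from nltk.corpus import wordnet as wn  # assumes the WordNet data is downloaded

dog, cat = wn.synset("dog.n.01"), wn.synset("cat.n.01")
print(dog.path_similarity(cat))  # 0.2, i.e. a shortest path of length 4
```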
Week5-5 Statistical parsing
PCFG. The need for PCFGs: 'Time flies like an arrow' has many parses, some more likely than others; we need a probabilistic ranking method. Definition: just like a CFG, a 4-tuple (N, \Sigma, R, S); N: non-terminal… (2015-12-10)
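Under a PCFG, the probability of a parse tree is the product of the probabilities of the rules it uses, which yields the ranking the excerpt calls for. A sketch with hypothetical rule probabilities for one parse of the example sentence:

```python
from functools import reduce
from operator import mul

def tree_prob(rules_used, rule_probs):
    """P(tree) = product of P(rule) over the rules in the derivation."""
    return reduce(mul, (rule_probs[r] for r in rules_used), 1.0)

# Made-up probabilities; in a real PCFG all expansions of each LHS sum to 1.
rule_probs = {"S -> NP VP": 1.0, "NP -> time": 0.3, "VP -> V PP": 0.4,
              "V -> flies": 0.5, "PP -> P NP": 1.0, "P -> like": 0.6,
              "NP -> Det N": 0.5, "Det -> an": 0.2, "N -> arrow": 0.1}
print(tree_prob(list(rule_probs), rule_probs))  # ~0.00036
```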