Detecting Part of Speech--POS

Latest recommended article published 2024-08-17 11:16:53

HoiDev

Category: NLP

Article link: https://blog.csdn.net/qq_33938256/article/details/52763787

POS
The tagging process
Importance Of POS

POS

The context in which a word appears is an important factor in determining its part of speech (POS). For example, "book" is a noun in "read a book" but a verb in "book a flight".

The tagging process

Tagging is the process of assigning a description to a token or a portion of text. This description is called a tag. POS tagging is the process of assigning a POS tag to a token. Typical tags are noun, verb, and adjective.

Process

Tokenizing the text
Identifying the possible tags for each token
Resolving ambiguous tags
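
The three steps above can be sketched with a toy tagger. This is only a minimal illustration and not how OpenNLP or Stanford NLP work internally: the class name ToyTaggingProcess, the four-word lexicon, the default tag for unknown words, and the single disambiguation rule (prefer a noun after a determiner) are all assumptions invented for the example.

```java
import java.util.List;
import java.util.Map;

// Toy illustration of the three tagging steps. The lexicon and the
// disambiguation rule are hypothetical and exist only for this example.
final class ToyTaggingProcess {
    // Hypothetical lexicon: each word maps to the tags it may take.
    static final Map<String, List<String>> LEXICON = Map.of(
        "the", List.of("DT"),
        "dog", List.of("NN"),
        "can", List.of("MD", "NN"),
        "run", List.of("VB", "NN"));

    // Step 1: split the text into tokens (whitespace only, for brevity).
    static String[] tokenize(String text) {
        return text.trim().toLowerCase().split("\\s+");
    }

    // Step 2: look up the candidate tags; unknown words default to NN.
    static List<String> candidates(String token) {
        return LEXICON.getOrDefault(token, List.of("NN"));
    }

    // Step 3: resolve ambiguity with one context rule: after a determiner
    // prefer NN, otherwise take the first candidate in the lexicon.
    static String[] tag(String text) {
        String[] tokens = tokenize(text);
        String[] tags = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            List<String> options = candidates(tokens[i]);
            boolean afterDeterminer = i > 0 && "DT".equals(tags[i - 1]);
            tags[i] = (afterDeterminer && options.contains("NN")) ? "NN" : options.get(0);
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tag("The dog can run")));
        System.out.println(String.join(" ", tag("The can")));
    }
}
```

With this lexicon, "The dog can run" is tagged DT NN MD VB, while "The can" is tagged DT NN because the rule fires after the determiner.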

Methods

Rule-based: Rule-based taggers use a set of rules and a dictionary of words and their possible tags. The rules are applied when a word has multiple possible tags. Rules often use the previous and/or following words to select a tag.
Stochastic: Stochastic taggers are either based on a Markov model or are cue-based, using decision trees or maximum entropy. Markov models are finite state machines where each state has two probability distributions. The tagger's objective is to find the most probable sequence of tags for a sentence. Hidden Markov Models (HMMs) are also used. In these models, the states (here, the tags) are not directly observed; only the words they emit are visible.
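
To make the idea of finding the optimal sequence of tags concrete, here is a minimal sketch of the Viterbi algorithm over a two-tag HMM. Everything in it is an assumption made for illustration: the class name ToyHmmTagger, the two tags, the two-word vocabulary, and all of the probabilities are invented, and unknown words fall back to a uniform placeholder score. Real taggers estimate these values from annotated corpora and usually work with log probabilities.

```java
import java.util.Map;

// Toy HMM tagger: tags are hidden states, words are observations.
// All numbers below are made up for the example.
final class ToyHmmTagger {
    static final String[] TAGS = {"NN", "VB"};
    // P(first tag)
    static final double[] START = {0.8, 0.2};
    // TRANS[p][s] = P(tag s | previous tag p)
    static final double[][] TRANS = {{0.3, 0.7}, {0.6, 0.4}};
    // EMIT.get(word)[s] = P(word | tag s)
    static final Map<String, double[]> EMIT = Map.of(
        "fish", new double[] {0.8, 0.3},
        "swim", new double[] {0.2, 0.7});

    // Returns the most probable tag sequence for the given words.
    static String[] viterbi(String[] words) {
        int n = words.length;
        int k = TAGS.length;
        if (n == 0) {
            return new String[0];
        }
        double[][] best = new double[n][k]; // best path score ending in tag s at position t
        int[][] back = new int[n][k];       // previous tag on that best path
        for (int t = 0; t < n; t++) {
            // Unknown words get a uniform placeholder score.
            double[] emit = EMIT.getOrDefault(words[t], new double[] {0.5, 0.5});
            for (int s = 0; s < k; s++) {
                if (t == 0) {
                    best[t][s] = START[s] * emit[s];
                    continue;
                }
                for (int p = 0; p < k; p++) {
                    double score = best[t - 1][p] * TRANS[p][s] * emit[s];
                    if (score > best[t][s]) {
                        best[t][s] = score;
                        back[t][s] = p;
                    }
                }
            }
        }
        // Pick the best final tag, then follow the back-pointers.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (best[n - 1][s] > best[n - 1][last]) {
                last = s;
            }
        }
        String[] tags = new String[n];
        for (int t = n - 1; t >= 0; t--) {
            tags[t] = TAGS[last];
            last = back[t][last];
        }
        return tags;
    }
}
```

With these numbers, viterbi(new String[] {"swim", "fish"}) returns VB NN. Taken alone, the first word scores higher as NN (0.8 × 0.2 = 0.16 versus 0.2 × 0.7 = 0.14), but the sequence as a whole is more probable with VB first, which is the point of optimizing over the full sentence.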

Importance Of POS

Proper tagging of a sentence can enhance the quality of downstream processing tasks.

Determining the POS, phrases, clauses, and any relationship between them is called parsing.

POS tagging is used for many downstream processes such as question analysis and analyzing the sentiment of text.

Text indexing will frequently use POS data.

Speech processing can use tags to help decide how to pronounce words.
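
As one small illustration of a downstream use, an indexer might keep only the nouns of a tagged sentence as index terms. The sketch below assumes Penn Treebank style tags, where noun tags start with "NN"; the class name NounIndexTerms and the choice to lowercase terms are hypothetical and not taken from any particular indexing library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: selects noun tokens from parallel token/tag arrays.
final class NounIndexTerms {
    // Keeps tokens whose tag starts with "NN" (NN, NNS, NNP, NNPS in the
    // Penn Treebank tag set) and lowercases them for use as index terms.
    static List<String> nouns(String[] tokens, String[] tags) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NN")) {
                terms.add(tokens[i].toLowerCase());
            }
        }
        return terms;
    }
}
```

The two arrays have the same shape as the token array passed to a tagger and the tag array it returns, so the helper can be applied directly to a tagger's output.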

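// OpenNLP
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;

// Example pre-tokenized input (an assumption; the original does not show it).
String[] sentence = {"The", "dog", "can", "run", "."};

// getModelDir() is a helper from the surrounding program that returns the model directory.
try (InputStream modelIn = new FileInputStream(new File(getModelDir(), "en-pos-maxent.bin")))
{
    POSModel model = new POSModel(modelIn);
    POSTaggerME tagger = new POSTaggerME(model);

    // Best tag for each token, printed as token/tag pairs.
    String[] tags = tagger.tag(sentence);
    for (int i = 0; i < sentence.length; i++)
    {
        System.out.print(sentence[i] + "/" + tags[i] + " ");
    }
    System.out.println();

    // Alternative tag sequences, each with its per-token probabilities.
    Sequence[] topSequences = tagger.topKSequences(sentence);
    for (int i = 0; i < topSequences.length; i++)
    {
        System.out.println(topSequences[i]);
        double[] probabilities = topSequences[i].getProbs();
        System.out.println(Arrays.toString(probabilities));
    }
}
catch (IOException ex)
{
    ex.printStackTrace();
}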

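// Stanford NLP
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

// getModelDir() is a helper from the surrounding program that returns the model directory.
MaxentTagger tagger = new MaxentTagger(getModelDir() + "/wsj-0-18-bidirectional-distsim.tagger");

// Split the file into sentences, each a list of tokens.
List<List<HasWord>> sentences = MaxentTagger.tokenizeText(new BufferedReader(new FileReader("sentences.txt")));

// Tag each sentence and print the resulting list of TaggedWord objects.
for (List<HasWord> sentence : sentences)
{
    List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);
    System.out.println(taggedSentence);
}

// Same loop, printing each sentence as a plain string of tagged words.
// In newer Stanford releases this helper class appears to be named SentenceUtils.
for (List<HasWord> sentence : sentences)
{
    List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);
    System.out.println(Sentence.listToString(taggedSentence, false));
}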