Detecting Parts of Speech (POS)

POS

The context in which a word appears is an important factor in determining its part of speech. For example, "book" is a noun in "read the book" but a verb in "book a flight".

The tagging process

Tagging is the process of assigning a description, called a tag, to a token or a portion of text. POS tagging assigns a part-of-speech tag to each token. Typical tags include noun, verb, and adjective.

Process

  • Tokenizing the text
  • Identifying the possible tags for each token
  • Resolving ambiguous tags
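As a rough illustration of these three steps, here is a minimal sketch that tags a sentence using a tiny hand-written lexicon and two context rules. The class name `ToyTagger`, the lexicon entries, the Penn Treebank-style tag names, and the rules (prefer a verb tag after "to", prefer a noun tag after a determiner, treat unknown words as nouns) are all hypothetical choices made for this example, not the behavior of any real tagger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy tagger: tokenize -> look up possible tags -> resolve ambiguity.
class ToyTagger {
    // Step 2 data: every tag each known word can take (hypothetical lexicon)
    static final Map<String, List<String>> LEXICON = Map.of(
        "i", List.of("PRP"),
        "she", List.of("PRP"),
        "want", List.of("VBP"),
        "reads", List.of("VBZ"),
        "to", List.of("TO"),
        "book", List.of("NN", "VB"),   // ambiguous: noun or verb
        "a", List.of("DT"),
        "the", List.of("DT"),
        "flight", List.of("NN"));

    // Step 1: lower-case, replace runs of non-letters with a space, split on whitespace
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().replaceAll("[^a-z]+", " ").trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    static List<String> tag(String text) {
        List<String> tags = new ArrayList<>();
        for (String token : tokenize(text)) {
            // Step 2: candidate tags; unknown words are assumed to be nouns
            List<String> candidates = LEXICON.getOrDefault(token, List.of("NN"));
            String chosen = candidates.get(0);
            // Step 3: resolve ambiguity using the previous token's tag
            if (candidates.size() > 1 && !tags.isEmpty()) {
                String prevTag = tags.get(tags.size() - 1);
                for (String c : candidates) {
                    if ((prevTag.equals("TO") && c.startsWith("VB"))
                            || (prevTag.equals("DT") && c.startsWith("NN"))) {
                        chosen = c;
                        break;
                    }
                }
            }
            tags.add(chosen);
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(tag("I want to book a flight."));  // [PRP, VBP, TO, VB, DT, NN]
        System.out.println(tag("She reads the book."));       // [PRP, VBZ, DT, NN]
    }
}
```

The same word, "book", receives VB after "to" and NN after "the", which is the kind of context-dependent decision the third step has to make.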

Methods


  • Rule-based: Rule-based taggers use a set of rules and a dictionary of words and their possible tags. The rules are applied when a word has more than one possible tag, and they often look at the previous and/or following words to select a tag.
  • Stochastic: Stochastic taggers are either based on Markov models or are cue-based, using decision trees or maximum entropy models. A Markov model used for tagging is a finite state machine in which each state has two probability distributions: one over transitions to the next state (tag) and one over the words the state emits. The objective is to find the most probable sequence of tags for a sentence. Hidden Markov Models (HMMs) are also used; in these models the states themselves (the tags) are not observed directly, and only the words they emit are visible.
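A Markov-model tagger picks the tag sequence with the highest overall probability, which is usually computed with the Viterbi algorithm. The sketch below decodes short sentences with a toy HMM. The class name `HmmViterbi`, the three tags, the three-word vocabulary, and every probability are invented for illustration rather than estimated from a corpus, and the code assumes every input word is in the vocabulary.

```java
import java.util.Arrays;
import java.util.List;

// Viterbi decoding for a toy HMM tagger; all probabilities are made-up illustrative values.
class HmmViterbi {
    static final String[] TAGS = {"DT", "NN", "VB"};
    static final List<String> VOCAB = List.of("the", "book", "flight");

    // START[s] = P(first tag is s)
    static final double[] START = {0.5, 0.2, 0.3};
    // TRANS[p][s] = P(next tag s | current tag p)
    static final double[][] TRANS = {
        {0.0, 0.9, 0.1},   // from DT
        {0.1, 0.3, 0.6},   // from NN
        {0.6, 0.3, 0.1}};  // from VB
    // EMIT[s][w] = P(word w | tag s), over the tiny vocabulary above
    static final double[][] EMIT = {
        {1.0, 0.0, 0.0},   // DT
        {0.0, 0.4, 0.6},   // NN
        {0.0, 0.9, 0.1}};  // VB

    // Returns the most probable tag sequence; words must all be in VOCAB.
    static List<String> tag(List<String> words) {
        int n = words.size(), k = TAGS.length;
        double[][] score = new double[n][k]; // best log-probability of a path ending in tag s at position t
        int[][] back = new int[n][k];        // previous tag on that best path

        int w0 = VOCAB.indexOf(words.get(0));
        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(START[s]) + Math.log(EMIT[s][w0]);
        }
        for (int t = 1; t < n; t++) {
            int w = VOCAB.indexOf(words.get(t));
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int bestPrev = 0;
                for (int p = 0; p < k; p++) {
                    double cand = score[t - 1][p] + Math.log(TRANS[p][s]);
                    if (cand > best) { best = cand; bestPrev = p; }
                }
                score[t][s] = best + Math.log(EMIT[s][w]);
                back[t][s] = bestPrev;
            }
        }
        // Choose the best final tag, then follow back-pointers to recover the path
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][last]) last = s;
        }
        String[] path = new String[n];
        for (int t = n - 1; t >= 0; t--) {
            path[t] = TAGS[last];
            last = back[t][last];
        }
        return Arrays.asList(path);
    }

    public static void main(String[] args) {
        System.out.println(tag(List.of("book", "the", "flight"))); // [VB, DT, NN]
        System.out.println(tag(List.of("the", "book")));           // [DT, NN]
    }
}
```

Working in log space avoids numeric underflow on long sentences; a zero probability becomes negative infinity, which simply rules that path out.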

Importance of POS

Proper tagging of a sentence can enhance the quality of downstream processing tasks.

Determining the POS of each word, the phrases and clauses of a sentence, and the relationships between them is called parsing.

POS tagging is used in many downstream tasks, such as question analysis and sentiment analysis.

Text indexing frequently uses POS data.

Speech processing can use tags to help decide how to pronounce words; for example, "record" is stressed differently as a noun and as a verb.

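//OpenNLP
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;

// Example input: a sentence that has already been tokenized
String[] sentence = {"The", "cat", "sat", "on", "the", "mat", "."};

// getModelDir() is a helper (not shown) returning the directory that holds en-pos-maxent.bin
try (InputStream modelIn = new FileInputStream(new File(getModelDir(), "en-pos-maxent.bin")))
{
    POSModel model = new POSModel(modelIn);
    POSTaggerME tagger = new POSTaggerME(model);

    // Assign one tag per token and print word/TAG pairs
    String[] tags = tagger.tag(sentence);
    for (int i = 0; i < sentence.length; i++)
    {
        System.out.print(sentence[i] + "/" + tags[i] + " ");
    }
    System.out.println();

    // Print the highest-scoring candidate tag sequences and their per-token probabilities
    Sequence[] topSequences = tagger.topKSequences(sentence);
    for (Sequence topSequence : topSequences)
    {
        System.out.println(topSequence);
        double[] probabilities = topSequence.getProbs();
        System.out.println(Arrays.toString(probabilities));
    }
}
catch (IOException e)
{
    e.printStackTrace();
}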
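//Stanford NLP
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence; // renamed SentenceUtils in newer CoreNLP releases
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

// getModelDir() is the same helper used in the OpenNLP example
MaxentTagger tagger = new MaxentTagger(getModelDir() + "/wsj-0-18-bidirectional-distsim.tagger");

try (BufferedReader reader = new BufferedReader(new FileReader("sentences.txt")))
{
    // Split the file into sentences, each a list of tokens
    List<List<HasWord>> sentences = MaxentTagger.tokenizeText(reader);

    for (List<HasWord> sentence : sentences)
    {
        List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);

        // Print as a list, using TaggedWord's default word/TAG form
        System.out.println(taggedSentence);

        // Print as plain text: word/TAG pairs separated by spaces
        System.out.println(Sentence.listToString(taggedSentence, false));
    }
}
catch (IOException e)
{
    e.printStackTrace();
}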
