关于org.apache.lucene.queryParser.ParseException: Encountered "<EOF>" 解决方法

现象:

org.apache.lucene.queryParser.ParseException: Encountered "<EOF>" at line 1, column 0.
Was expecting one of:
    <NOT> ...
    "+" ...
    "-" ...
    "(" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    
at org.apache.lucene.queryParser.QueryParser.generateParseException(QueryParser.java:1226)
at org.apache.lucene.queryParser.QueryParser.jj_consume_token(QueryParser.java:1109)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:759)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:684)
at ch2.lucenedemo.process.Test.RunVsIndex(Test.java:142)
at ch2.lucenedemo.process.Test.main(Test.java:169)

方法一:

 如果出现了下列错误,那是因为用错了函数。把queryParser.Query改称queryParser.parse就通过了

方法二:

1、提问:

I am working on a classification problem to classify product reviews as positive, negative or neutral as per the training data using Lucene API.

I am using an ArrayList of Review objects - "reviewList" that stores the attributes for each review while crawling the web pages.

The review attributes which include "polarity" & "review content" are then indexed using the indexer. Thereafter, based on the indexes objects, I need to classify the remaining review objects. But while doing so, there is a review object for which the Query parser is encountering an EOF character in the "review content", and hence terminating.

The line causing error has been commented accordingly -

IndexReader reader = IndexReader.open(FSDirectory.open(new File("index")));
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_31);
    QueryParser parser = new QueryParser(Version.LUCENE_31, "Review", analyzer);

    int length = Crawler.reviewList.size();
    for (int i = 200; i < length; i++) {
        String true_class;
        double r_stars = Crawler.reviewList.get(i).getStars();

        if (r_stars < 2.0) {
            true_class = "-1";
        } else if (r_stars > 3.0) {
            true_class = "1";
        } else {
            true_class = "0";
        }

        String[] reviewTokens = Crawler.reviewList.get(i).getReview().split(" ");
        String parsedReview = "";

        int j;

        for (j = 0; j < reviewTokens.length; j++) {
            if (reviewTokens[j] != null) {
                if (!((reviewTokens[j].contains("-")) || (reviewTokens[j].contains("!")))) {
                    parsedReview += reviewTokens[j] + " ";
                }
            } else {
                break;
            }
        }

        Query query = parser.parse(parsedReview); // CAUSING ERROR!!

        TopScoreDocCollector results = TopScoreDocCollector.create(5, true);
        searcher.search(query, results);
        ScoreDoc[] hits = results.topDocs().scoreDocs;

I've parsed the text manually to remove the characters that are causing the error, apart from checking if the next string is null...but the error persists.

This is the error stack trace -

Exception in thread "main" org.apache.lucene.queryParser.ParseException: Cannot parse 'I made the choice ... be all "thumbs ': Lexical error at line 1, column 938.  Encountered: <EOF> after : "\"thumbs "
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216)
at Sentiment_Analysis.Classification.classify(Classification.java:58)
at Sentiment_Analysis.Main.main(Main.java:17)
Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 938.  Encountered: <EOF> after : "\"thumbs "
at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229)
at org.apache.lucene.queryParser.QueryParser.jj_scan_token(QueryParser.java:1709)
at org.apache.lucene.queryParser.QueryParser.jj_3R_2(QueryParser.java:1598)
at org.apache.lucene.queryParser.QueryParser.jj_3_1(QueryParser.java:1605)
at org.apache.lucene.queryParser.QueryParser.jj_2_1(QueryParser.java:1585)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1280)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1266)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1313)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1266)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
... 2 more
Java Result: 1

Please help me solve this problem...have been banging my head with this for hours now!

2、问答

You should escape the double quote and other special characters via

Query query = parser.parse(QueryParser.escape(parsedReview));

As the QueryParser.escape Javadoc suggested,

Returns a String where those characters that QueryParser expects to be escaped are escaped by a preceding '\'.

小结:使用 QueryParser的静态方法QueryParser.escape(string s),进行自动转义特殊字符后再进行关键字的查询

 

原文出处:

现象及方法一:

不设限, org.apache.lucene.queryParser.ParseException: Encountered "<EOF>" at line 1, column 0. https://blog.csdn.net/tengdazhang770960436/article/details/17881671

方法二:

https://stackoverflow.com/questions/10259907/lucene-exception-query-parser-encountered-eof-after-some-word

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值