How do I index a PDF file in ElasticSearch with Java code?

ywl470812087

Original link: https://cloud.tencent.com/developer/ask/149562

Here is my code:

    // Fragment from inside a method. It needs: java.io.*, java.util.Base64,
    // org.apache.commons.io.IOUtils, org.json.JSONObject, org.apache.http.HttpEntity,
    // org.apache.http.entity.ContentType, org.apache.http.nio.entity.NStringEntity.
    // try-with-resources closes the stream even if reading fails.
    try (InputStream inputStream = new FileInputStream(new File("mypdf.pdf"))) {
        byte[] fileBytes = IOUtils.toByteArray(inputStream);
        // Base64-encode the raw bytes exactly once; a second pass only inflates the payload.
        String strEncoded = Base64.getEncoder().encodeToString(fileBytes);

        JSONObject correspondenceNode = new JSONObject();
        correspondenceNode.put("data", strEncoded);

        String strJsonValues = correspondenceNode.toString();
        HttpEntity entity = new NStringEntity(strJsonValues, ContentType.APPLICATION_JSON);
        elasticrestClient.put("/2018/documents/1", entity);
    } catch (IOException e) {
        e.printStackTrace();
    }

Here is the decoding code:

    String responseBody = elasticrestClient.get("/2018/documents/1");
    // some code to fetch the hits
    JSONObject h = hitsArray.getJSONObject(0);
    JSONObject source = h.getJSONObject("_source");
    String encoded = source.getString("data");
    // Decode once, matching the single Base64 encoding applied when indexing.
    byte[] pdfBytes = Base64.getDecoder().decode(encoded);

    // try-with-resources closes the output file even if the write fails.
    try (FileOutputStream fos = new FileOutputStream("download.pdf")) {
        fos.write(pdfBytes);
    }

Answer (posted 2018-08-02)

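Extract the text and metadata from the PDF and index those, together with a URL that points to the binary file itself, instead of storing the file's bytes in the index.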

{
  "content": "Extracted text here",
  "meta": {
    // Meta data there
  },
  "url": "file://path/to/file"
}
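
The document shape above could be assembled roughly as follows. This is a minimal sketch that uses only the JDK and assumes the text and metadata have already been extracted from the PDF by a separate library such as Apache Tika or PDFBox (that step is not shown). The class name `PdfDocumentBuilder` and the methods `escapeJson` and `buildDocument` are hypothetical names chosen for illustration, not part of any Elasticsearch API.

```java
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class PdfDocumentBuilder {

    /** Escapes a string so it can be placed inside a JSON string literal. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters must be written as \\uXXXX.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /**
     * Builds the JSON body: extracted text, a flat metadata object,
     * and a URL pointing at the binary file (hypothetical helper).
     */
    static String buildDocument(String content, Map<String, String> meta, String url) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"content\":\"").append(escapeJson(content)).append("\",\"meta\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : meta.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escapeJson(e.getKey())).append("\":\"")
              .append(escapeJson(e.getValue())).append('"');
            first = false;
        }
        sb.append("},\"url\":\"").append(escapeJson(url)).append("\"}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder values; real ones would come from the extraction step.
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("title", "My PDF");
        meta.put("author", "Unknown");
        // Turn the local path into a file: URL for the "url" field.
        String url = Paths.get("mypdf.pdf").toAbsolutePath().toUri().toString();
        System.out.println(buildDocument("Extracted text here", meta, url));
    }
}
```

The resulting string can be sent as the request body in the same way as the JSON in the question. Because only text and a link are indexed, the document stays small and the `content` field is searchable as ordinary text.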
