Implementing multivalue support in hive2solr

weixin_33901641

Original link: http://blog.51cto.com/caiguangguang/1433770

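I previously introduced the hive2solr project on GitHub and Solr's multivalue feature.
In production, we compute data in Hive and then push it to Solr. If you need multivalued fields, however, the default hive2solr has a problem: even when the corresponding Hive field contains several words, it arrives in Solr as a single string. For example, given the following table data: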

id        test_s  test_ss
3       d       f d h

Here test_ss is a multivalued field. After import into Solr:

{
        "test_ss": [
          "f d h"  // recognized as a single element
        ],
        "test_s": "d",
        "id": "3",
        "_version_": 1472413953618346000
      }

Having Hive generate an array and insert it into Solr directly fails with an array-to-string conversion error:

select id,test_s,split(test_ss,' ') from t2;
FAILED: NoMatchingMethodException No matching method for class org.apache.hadoop.hive.ql.udf.UDFToString 
with (array<string>). Possible choices: _FUNC_(void)  _FUNC_(boolean)  _FUNC_(tinyint)  _FUNC_(smallint) 
 _FUNC_(int)  _FUNC_(bigint)  _FUNC_(float)  _FUNC_(double)  _FUNC_(string)  _FUNC_(timestamp)  _FUNC_(decimal)  _FUNC_(binary)

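Hive writes data to Solr mainly through the write method of SolrWriter, which ultimately calls the setField method of SolrInputDocument. As a workaround, this method can be modified as follows.
The original write method of SolrWriter: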

     @Override
     public void write(Writable w) throws IOException {
          MapWritable map = (MapWritable) w;
          SolrInputDocument doc = new SolrInputDocument();
          for (final Map.Entry<Writable, Writable> entry : map.entrySet()) {
               String key = entry.getKey().toString();
               doc.setField(key, entry.getValue().toString());  // calls SolrInputDocument.setField: the whole string becomes one value
          }
          table.save(doc);
     }
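To see why the original version yields a single element, and why the modified version switches from setField to addField, here is a simplified model. MiniSolrDoc is a hypothetical stand-in written only for illustration, not the SolrJ class; it mimics the documented semantics that SolrInputDocument.setField replaces a field's existing value while addField appends to it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for SolrInputDocument (NOT the SolrJ class):
// each field name maps to the list of values that would be sent to Solr.
class MiniSolrDoc {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Models setField: discard any existing values and store just this one.
    void setField(String name, Object value) {
        List<Object> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    // Models addField: append to the existing values, creating the field if needed.
    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<Object> getValues(String name) {
        return fields.getOrDefault(name, new ArrayList<>());
    }
}

class SetVsAddDemo {
    public static void main(String[] args) {
        // Original SolrWriter: the whole Hive string becomes one value.
        MiniSolrDoc original = new MiniSolrDoc();
        original.setField("test_ss", "f d h");
        System.out.println(original.getValues("test_ss")); // [f d h]

        // Calling setField once per token would keep only the last token.
        MiniSolrDoc overwritten = new MiniSolrDoc();
        for (String token : "f d h".split("\\s+")) {
            overwritten.setField("test_ss", token);
        }
        System.out.println(overwritten.getValues("test_ss")); // [h]

        // Modified SolrWriter: one addField call per token.
        MiniSolrDoc modified = new MiniSolrDoc();
        for (String token : "f d h".split("\\s+")) {
            modified.addField("test_ss", token);
        }
        System.out.println(modified.getValues("test_ss")); // [f, d, h]
    }
}
```

The middle case is the overwriting that the comment in the modified code guards against.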

Changed to:

    @Override
    public void write(Writable w) throws IOException {
            MapWritable map = (MapWritable) w;
            SolrInputDocument doc = new SolrInputDocument();
            for (final Map.Entry<Writable, Writable> entry : map.entrySet()) {
                    String key = entry.getKey().toString();
                    String value = entry.getValue().toString();
                    // Split the value coming from Hive on whitespace into an array
                    // (on the Hive side, just concat the values with spaces)
                    String[] sl = value.split("\\s+");
                    java.util.List<String> valuesl = java.util.Arrays.asList(sl);
                    log.info("add entry value lists:" + valuesl); // assumes SolrWriter has a logger field named log
                    for (String vl : valuesl) {
                            doc.addField(key, vl); // addField instead of setField, so values are appended rather than overwritten
                    }
            }
            table.save(doc);
    }
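The whitespace split above is applied to every field, so a single-valued field whose value contains a space (for example a test_s value of "d e") would receive several values, which Solr rejects for a field that is not declared multiValued. A leading space also produces an empty first token, because " f d h".split("\\s+") returns ["", "f", "d", "h"]. A minimal hardened sketch follows, assuming multivalued fields can be recognized by the _ss suffix used in this example (Solr's example schema declares *_ss as a multiValued string field); MultiValueSplitter and toFieldValues are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class MultiValueSplitter {
    // Assumption: only fields ending in this suffix are multivalued (like test_ss above).
    static final String MULTI_VALUE_SUFFIX = "_ss";

    // Returns the values to pass to doc.addField(key, v), one call per element.
    static List<String> toFieldValues(String key, String value) {
        List<String> result = new ArrayList<>();
        if (!key.endsWith(MULTI_VALUE_SUFFIX)) {
            result.add(value); // single-valued field: keep the string intact, spaces included
            return result;
        }
        // trim() removes the leading whitespace that would otherwise yield an empty first token
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return result; // nothing to index for an empty multivalued field
        }
        for (String token : trimmed.split("\\s+")) {
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toFieldValues("test_ss", " f d h ")); // [f, d, h]
        System.out.println(toFieldValues("test_s", "d e"));      // [d e]
        System.out.println(toFieldValues("test_ss", "   "));     // []
    }
}
```

Inside write, the inner loop would then become `for (String vl : MultiValueSplitter.toFieldValues(key, value)) { doc.addField(key, vl); }`.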

Import test result:

{
        "test_ss": [
          "f",
          "d",
          "h"
        ],
        "test_s": "d",
        "id": "3",
        "_version_": 1472422023801077800
      }
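Splitting on whitespace works here because f, d and h contain no spaces. If individual values can contain spaces, one option (an assumption, not part of the original setup) is to join them in Hive with a delimiter that never appears in the data, for example with concat_ws('|', ...), and split on that same delimiter in SolrWriter. The sketch below uses | as a hypothetical delimiter and made-up sample values; since String.split takes a regular expression, the delimiter has to be quoted with Pattern.quote.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DelimitedMultiValue {
    // Hypothetical delimiter; it must never occur inside a value.
    static final String DELIM = "|";

    // Mirrors what concat_ws('|', some_array) would produce on the Hive side.
    static String join(List<String> values) {
        return String.join(DELIM, values);
    }

    // Pattern.quote is required because '|' is a regex metacharacter.
    static List<String> split(String joined) {
        List<String> out = new ArrayList<>();
        for (String part : joined.split(Pattern.quote(DELIM))) {
            if (!part.isEmpty()) {
                out.add(part); // skip empty tokens, e.g. from "a||b"
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("new york", "beijing");
        String fromHive = join(tags);
        System.out.println(fromHive);                              // new york|beijing
        System.out.println(split(fromHive));                       // [new york, beijing]
        System.out.println(Arrays.asList(fromHive.split("\\s+"))); // [new, york|beijing]
    }
}
```

The last line shows what the whitespace-based split would do to the same input.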
