Painless numeric type conversion: how to convert binary data back into a float array in Elasticsearch / Painless...

I am trying to efficiently store and retrieve an array of floats in Elasticsearch 6.7.

Numeric doc values are sorted, so the original order of the array is lost and I can't use them directly.

At first I read the field's value from _source, but performance on large queries is poor.

I tried encoding the float array as binary and decoding it inside my script. Unfortunately I'm stuck converting a byte[4] array to a float in Painless.

In Java this would look like this:

Float.intBitsToFloat((vector_bytes[3] << 24) | ((vector_bytes[2] & 0xff) << 16) | ((vector_bytes[1] & 0xff) << 8) | (vector_bytes[0] & 0xff));
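For reference, here is that Java expression wrapped in a small runnable program that decodes a whole array. It is a sketch that assumes the bytes are little-endian IEEE 754 floats (which is how the sample documents below are encoded); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class DecodeFloats {
    // Hypothetical helper: decode little-endian IEEE 754 floats from a byte array.
    static float[] decode(byte[] bytes) {
        float[] out = new float[bytes.length / 4];
        for (int i = 0; i < out.length; i++) {
            int n = i * 4;
            // Bytes 0-2 are masked with 0xff so a negative byte's sign extension
            // cannot overwrite the higher bits. Byte 3 needs no mask: shifting it
            // left by 24 pushes its sign-extension bits out of the int.
            int intBits = (bytes[n + 3] << 24)
                    | ((bytes[n + 2] & 0xff) << 16)
                    | ((bytes[n + 1] & 0xff) << 8)
                    | (bytes[n] & 0xff);
            out[i] = Float.intBitsToFloat(intBits);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = Base64.getDecoder().decode("AACAP83MjD+amZk/");
        System.out.println(Arrays.toString(decode(bytes))); // [1.0, 1.1, 1.2]
    }
}
```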

But masking off the sign extension with & 0xff throws an "Illegal tree structure." error in Painless.

Any idea on how to do this?

Minimal example:

Setting up the index

# Minimal example binary array
# Create the index
PUT binary_array
{
  "mappings": {
    "_doc": {
      "properties": {
        "vector_bin": { "type": "binary", "doc_values": true },
        "vector": { "type": "float" }
      }
    }
  }
}

# Put two documents
PUT binary_array/_doc/1
{
  "vector": [1.0, 1.1, 1.2],
  "vector_bin": "AACAP83MjD+amZk/"
}

PUT binary_array/_doc/2
{
  "vector": [3.0, 2.1, 1.2],
  "vector_bin": "AABAQGZmBkCamZk/"
}
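Each vector_bin value holds the same floats as vector, packed as 4-byte little-endian IEEE 754 values and Base64-encoded. One way to produce such strings on the client side is sketched below in Java with java.nio.ByteBuffer; the class and method names are hypothetical, and this is not necessarily how the original strings were generated.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

public class EncodeFloats {
    // Hypothetical helper: pack floats as little-endian IEEE 754 and Base64-encode them.
    static String encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float f : vector) {
            buf.putFloat(f); // writes 4 bytes in little-endian order
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        System.out.println(encode(new float[] {1.0f, 1.1f, 1.2f})); // AACAP83MjD+amZk/
        System.out.println(encode(new float[] {3.0f, 2.1f, 1.2f})); // AABAQGZmBkCamZk/
    }
}
```

Little-endian order is what the decoding expression in the question expects, since it treats byte n as the least significant byte and byte n+3 as the most significant.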

Sample search that converts the binary field back into a float array:

GET binary_array/_search
{
  "script_fields": {
    "vector_parsed": {
      "script": {
        "source": """
          def vector_bytes = doc["vector_bin"].value.bytes;
          def vector = new float[vector_bytes.length/4];
          for (int i = 0; i < vector.length; ++i) {
            def n = i*4;
            // This would be the Java way, masking off the sign extension of bytes 0-2, but it raises an "Illegal tree structure." error in Painless
            //def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 0xff) << 16) | ((vector_bytes[n+1] & 0xff) << 8) | (vector_bytes[n] & 0xff);
            // This runs but gives incorrect results
            def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] ) << 16) | ((vector_bytes[n+1] ) << 8) | (vector_bytes[n] );
            vector[i] = Float.intBitsToFloat( intBits );
          }
          return vector;
        """
      }
    },
    "vector_src": {
      "script": """params._source["vector"]"""
    }
  }
}
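The incorrect results from the unmasked version come from sign extension: a byte is widened to an int before it is shifted or OR-ed, so any byte value of 0x80 or above becomes a negative int with its upper 24 bits set, and those bits overwrite the bytes above it. Painless is assumed here to follow Java's numeric promotion rules. The Java sketch below (hypothetical names) decodes the first sample document both ways:

```java
import java.util.Arrays;
import java.util.Base64;

public class SignExtensionDemo {
    // Unmasked variant, as in the script: each byte is sign-extended to int first.
    static float unmasked(byte[] b, int n) {
        return Float.intBitsToFloat((b[n + 3] << 24) | (b[n + 2] << 16) | (b[n + 1] << 8) | b[n]);
    }

    // Masked variant: & 0xff (the same value as 255) keeps only the low 8 bits of each byte.
    static float masked(byte[] b, int n) {
        return Float.intBitsToFloat((b[n + 3] << 24) | ((b[n + 2] & 0xff) << 16)
                | ((b[n + 1] & 0xff) << 8) | (b[n] & 0xff));
    }

    public static void main(String[] args) {
        byte[] b = Base64.getDecoder().decode("AACAP83MjD+amZk/");
        float[] wrong = new float[b.length / 4];
        float[] right = new float[b.length / 4];
        for (int i = 0; i < wrong.length; i++) {
            wrong[i] = unmasked(b, i * 4);
            right[i] = masked(b, i * 4);
        }
        // Byte 0xCD is -51 as a Java byte, i.e. 0xFFFFFFCD once widened to int.
        System.out.println((int) b[4]);             // -51
        System.out.println(Arrays.toString(wrong)); // [-Infinity, NaN, NaN]
        System.out.println(Arrays.toString(right)); // [1.0, 1.1, 1.2]
    }
}
```

For 1.1 (bytes CD CC 8C 3F), the sign-extended low byte 0xFFFFFFCD alone sets every bit above it, so the OR yields 0xFFFFFFCD, which is a NaN bit pattern.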

Solution

After some more investigation I realized that the bitwise AND does work in Painless, but the 0xff literal doesn't. Writing the mask as the decimal 255 avoids the error.

This solved my issue:

Float.intBitsToFloat( (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 255) << 16) | ((vector_bytes[n+1] & 255) << 8) | (vector_bytes[n] & 255) )
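Writing the mask as 255 instead of 0xff only changes the notation, since both denote the same int value, so the decoded floats are identical. As a sanity check outside Elasticsearch, the Java sketch below (hypothetical names) compares the & 255 expression against a little-endian java.nio.ByteBuffer decode for both sample documents. ByteBuffer is used here only for verification and is not assumed to be available in Painless.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

public class CrossCheck {
    // The expression from the fix, with the mask written as 255.
    static float[] decodeManual(byte[] b) {
        float[] out = new float[b.length / 4];
        for (int i = 0; i < out.length; i++) {
            int n = i * 4;
            out[i] = Float.intBitsToFloat((b[n + 3] << 24) | ((b[n + 2] & 255) << 16)
                    | ((b[n + 1] & 255) << 8) | (b[n] & 255));
        }
        return out;
    }

    // Reference decode: read the same bytes as little-endian floats via ByteBuffer.
    static float[] decodeBuffer(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[b.length / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat(i * 4); // absolute read at byte offset i*4
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"AACAP83MjD+amZk/", "AABAQGZmBkCamZk/"}) {
            byte[] b = Base64.getDecoder().decode(s);
            System.out.println(Arrays.equals(decodeManual(b), decodeBuffer(b))); // true
        }
    }
}
```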
