The simplest approach is to convert the Blob to characters on the database side:
select PROPOSAL_ID,TITLE,UTL_RAW.CAST_TO_VARCHAR2(CONTENT) as CONTENT,UTL_RAW.CAST_TO_VARCHAR2(ATTACHMENT) as ATTACHMENT from bcc_proposal
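However, this approach can easily cause memory problems in the database. So consider another approach: when importing into Solr, use a BlobTransformer to convert the Blob to a String. Read on:
In the previous article, "Compiling the Solr 6.1 source in IDEA", we got to the point where the Solr source can be browsed in IDEA. This article mainly covers how to support importing Blob fields in Solr by modifying the source.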
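To make the problem concrete: when binary column data reaches the index without being decoded, what gets stored is the Java object's default string form (something like [B@1f23c5) rather than the text. The sketch below illustrates that difference with a plain byte array. The class name is hypothetical, and it assumes the stored bytes are UTF-8 text.

```java
import java.nio.charset.StandardCharsets;

class ByteArrayToStringDemo {
  // Decodes raw bytes as UTF-8 text (assumption: the column stores UTF-8).
  static String decode(byte[] raw) {
    return new String(raw, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] raw = "proposal text".getBytes(StandardCharsets.UTF_8);
    // Default toString() of an array: type tag plus identity hash, not the content.
    System.out.println(raw.toString().startsWith("[B@")); // prints true
    // Explicit decoding recovers the text.
    System.out.println(decode(raw)); // prints proposal text
  }
}
```

The transformer in Step 1 does the same kind of decoding, only inside the import pipeline.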
Step 1: Add BlobTransformer.java
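package org.apache.solr.handler.dataimport;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Converts columns marked with blob="true" in the data config into UTF-8 strings. */
public class BlobTransformer extends Transformer {
  public static final String BLOB = "blob";

  @Override
  public Object transformRow(Map<String, Object> aRow, Context context) {
    for (Map<String, String> map : context.getAllEntityFields()) {
      // Skip fields that are not explicitly marked blob="true".
      if (!"true".equals(map.get(BLOB))) continue;
      String column = map.get(DataImporter.COLUMN);
      String srcCol = map.get(RegexTransformer.SRC_COL_NAME);
      if (srcCol == null) srcCol = column;
      Object o = aRow.get(srcCol);
      if (o instanceof List) {
        // Multi-valued column: convert each element.
        List<?> inputs = (List<?>) o;
        List<String> results = new ArrayList<String>();
        for (Object input : inputs) results.add(process(input));
        aRow.put(column, results);
      } else if (o instanceof Blob || o instanceof byte[]) {
        // byte[] is handled too, in case the value was already read into memory.
        aRow.put(column, process(o));
      }
    }
    return aRow;
  }

  // Converts a single value to a String; anything that is not a Blob or byte[] falls back to toString().
  private String process(Object value) {
    if (value == null) return null;
    if (value instanceof Blob) return readFromBlob((Blob) value);
    if (value instanceof byte[]) return new String((byte[]) value, StandardCharsets.UTF_8);
    return value.toString();
  }

  // Reads the whole Blob and decodes it as UTF-8 (assumes the Blob holds UTF-8 text).
  // Copying bytes instead of calling readLine() keeps the line breaks intact.
  private String readFromBlob(Blob blob) {
    try (InputStream is = blob.getBinaryStream()) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = is.read(buf)) != -1) out.write(buf, 0, n);
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException | SQLException e) {
      // As in the original version: report the error and index an empty value.
      e.printStackTrace();
      return "";
    }
  }
}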
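The decoding step can be checked outside Solr. The following is a minimal sketch, not part of the Solr source: the class name BlobDecodeDemo is hypothetical, javax.sql.rowset.serial.SerialBlob stands in for the Oracle BLOB, and the content is assumed to be UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import javax.sql.rowset.serial.SerialBlob;

class BlobDecodeDemo {
  // Copies all bytes from the stream and decodes them as UTF-8.
  static String decode(InputStream in) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws Exception {
    String text = "first line\nsecond line";
    // SerialBlob is an in-memory Blob, used here only as a stand-in for a database BLOB.
    Blob blob = new SerialBlob(text.getBytes(StandardCharsets.UTF_8));
    try (InputStream in = blob.getBinaryStream()) {
      System.out.println(text.equals(decode(in))); // prints true
    }
  }
}
```

Copying bytes rather than reading line by line means the line break in the sample text survives the round trip.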
Step 2:
Compile the source code. Run ant server in the solr-6.1.0\solr directory.
Step 3:
Configure the db_data_config.xml file
<dataConfig>
  <dataSource name="jdbcds" type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@//192.168.60.144:1521/NPCBJ" user="user" password="password" />
  <document>
    <entity dataSource="jdbcds" name="proposal"
            query="select * from BCC_PROPOSAL_INTELLIGENT_VIEW"
            deltaQuery="select PROPOSAL_ID from BCC_PROPOSAL_INTELLIGENT_VIEW where to_char(modify_time,'yyyy-mm-dd hh24:mi:ss') > '${dataimporter.last_index_time}'"
            deltaImportQuery="select * from BCC_PROPOSAL_INTELLIGENT_VIEW where PROPOSAL_ID='${dataimporter.delta.PROPOSAL_ID}'"
            convertType="true" transformer="BlobTransformer">
      <field column="PROPOSAL_ID" name="id" />
      <field column="TITLE" name="title" />
      <field column="CONTENT" name="content" blob="true" />
    </entity>
  </document>
</dataConfig>
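How the blob attribute in this config reaches the transformer can be sketched in isolation. The transformer iterates over context.getAllEntityFields(), where each field element arrives as a map of its attributes. The standalone sketch below mimics that with plain maps; the class and helper names are hypothetical, the key names "column" and "blob" are assumed to match DataImporter.COLUMN and the BLOB constant, and the selection is restricted to blob="true".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BlobFieldSelectionDemo {
  // Returns the columns whose field definition carries blob="true".
  static List<String> blobColumns(List<Map<String, String>> fields) {
    List<String> columns = new ArrayList<>();
    for (Map<String, String> field : fields) {
      if ("true".equals(field.get("blob"))) {
        columns.add(field.get("column"));
      }
    }
    return columns;
  }

  // Hypothetical helper: builds an attribute map from key/value pairs.
  static Map<String, String> field(String... kv) {
    Map<String, String> attrs = new HashMap<>();
    for (int i = 0; i + 1 < kv.length; i += 2) attrs.put(kv[i], kv[i + 1]);
    return attrs;
  }

  public static void main(String[] args) {
    // Mirrors the three field elements of the entity above.
    List<Map<String, String>> fields = Arrays.asList(
        field("column", "PROPOSAL_ID", "name", "id"),
        field("column", "TITLE", "name", "title"),
        field("column", "CONTENT", "name", "content", "blob", "true"));
    System.out.println(blobColumns(fields)); // prints [CONTENT]
  }
}
```

Only CONTENT is selected, so PROPOSAL_ID and TITLE pass through the transformer untouched.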
With that, Blob fields are supported. Getting this far was no small effort~
References:
Blob values in my table are added to the Solr document as object strings like B@1f23c5
How to index blob field in Apache Solr indexing