Example data-config.xml configuration:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/video"
              user="root"
              password="root"
              batchSize="-1"/>
  <document>
    <entity name="video" pk="v_id" query="SELECT * FROM y2_video"
            deltaImportQuery="select * from y2_video where v_id='${dataimporter.delta.v_id}'"
            deltaQuery="select v_id from y2_video where v_create_at>UNIX_TIMESTAMP('${dataimporter.last_index_time}')">
      <field column="v_id" name="v_id"/>
      <field column="v_title" name="v_title"/>
      <field column="v_thumb" name="v_thumb"/>
      <field column="v_url" name="v_url"/>
      <field column="v_tags" name="v_tags"/>
      <field column="v_create_at" name="v_create_at"/>
      <field column="v_last_index_time" name="v_last_index_time"/>
    </entity>
  </document>
</dataConfig>
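The batchSize="-1" setting is important. With the MySQL driver it makes the import stream rows from the result set instead of loading the whole result into memory first. Without it, a full import of a table with millions of rows runs out of memory.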
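entity name="video" pk="v_id"
The pk attribute is important too: leaving it out makes the import very slow, and delta imports need it to identify the changed rows.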
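deltaQuery=
This query should return only the table's primary key (v_id here). Each returned key is substituted into ${dataimporter.delta.v_id} in the deltaImportQuery above, which then loads the full row. Getting this pairing right is essential for delta imports.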
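
One thing to watch: the deltaQuery above filters on v_create_at, so it only picks up rows created since the last import, not rows that were edited later. Below is a rough sketch of how the same entity might look if the table also had a modification timestamp. The v_updated_at column is hypothetical (it is not in the config above) and is assumed to be a DATETIME column, so it is compared with ${dataimporter.last_index_time} directly, without UNIX_TIMESTAMP. Only two fields are listed to keep it short.

```xml
<!-- Sketch only: v_updated_at is a hypothetical DATETIME column, not part of the original table. -->
<entity name="video" pk="v_id"
        query="SELECT * FROM y2_video"
        deltaQuery="SELECT v_id FROM y2_video WHERE v_updated_at > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM y2_video WHERE v_id = '${dataimporter.delta.v_id}'">
  <!-- deltaQuery returns only the primary key; deltaImportQuery loads the full row for each key. -->
  <field column="v_id" name="v_id"/>
  <field column="v_title" name="v_title"/>
</entity>
```

The division of work stays the same as in the original config: deltaQuery finds the changed keys, and deltaImportQuery runs once per key to fetch the row.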