以下是我从自己写的PDF说明书上COPY下来的,格式很乱,具体下载附件里的工程文件《抽取工具配置说明.pdf》有问题QQ联系:51003459 所有的包我都删除了,因为太大上传不了,要的QQ传。 zExtractor数据抽取工具配置说明 1. 对每一套的抽取项目编写一个独立的XML文件放在classes/conf/文件夹下。例如抽取A库到B库的所有气象数据配置在一个a.xml里,把抽取B库到C库的N张电网数据表配置在另一个b.xml里,当多个xml配置文件里有相同的数据源(比如以上的B库)该工具只生成一个B库的连接池。 2. XML配置文件中的属性名对大小写敏感。 3. <property name="type" value="ORACLE"></property>要显式定义 因为对于不同的数据库来说:select * from table 与select * from (table), select * from (Select * from table) 与select * from (Select * from table) as temp是不一样的,要通过type来判断。 4. 以下为最简配置: easyDemo <?xml version="1.0" encoding="UTF-8"?><extract-case casename="案例"><!-- 源数据库 --><source-database><property name="type" value="SQLSERVER"></property><property name="driverClassName" value="com.microsoft.jdbc.sqlserver.SQLServerDriver"></property><property name="url" value="jdbc:microsoft:sqlserver://192.168.104.5:1433;databaseName=zy"></property><property name="username" value="sa"></property><property name="password" value="xx"></property><property name="minIdle" value="2"></property><property name="maxActive" value="50"></property><property name="maxWait" value="30000"></property></source-database><!-- 目标数据库 --><destination-database><property name="type" value="ORACLE"></property><property name="driverClassName" value="oracle.jdbc.driver.OracleDriver"></property><property name="url" value="jdbc:oracle:thin:@10.142.1.197:1521:sa"></property><property name="username" value="gdprs"></property><property name="password" value="xxxxx"></property></destination-database><task fromtable="text1" fromschema="dbo" totable="text2" taskinfo="两个表字段一样的最简配置" toschema="gdprs" cronexpress="”0" issamefieldname="true"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist></task></extract-case> 默认配置: 抽取器extractorClass ="com.epgis. extract.core. extractor. CommonExtractor" ; 优先级别priority = 10; 批量增加batchInsertSize = 50 一次性得到JDBC数据集RESULTSET的数量fetchSize = 10; 任务允许的最多X小时没有来数,如果超过X小时没来数则短信告知 maxNoneDataHours=null表示不用短信通知功能 5. 抽取工具通过配置的extractKey字段(可能是多个extractKey组成的复合主键)来唯一识别一条记录进行抽取,可以有两种配置: 1). <field-mapping><field name="id" sqltype="int" isextractkey="true"><tofield name="to_id" sqltype="varchar2" isextractkey="true"></tofield></field></field-mapping> 2). <extractkeylist><extractkey fromfield="id" tofield=" to_id "></extractkey></extractkeylist> 6. 源表与目标表的字段对应通过<field-mapping>来配置,在<field>的name写上源表的字段名,<tofield>写上目标表的字段名,最好显式写出sqlType。比如: <task fromtable="t_auto_rtsq" fromschema="gdprs" totable="text5" toschema="gdprs" repeatinterval="20000" startdelay="0" issamefieldname="false"><field-mapping><field name="id" sqltype="int" isextractkey="true"><tofield name="to_id" sqltype="varchar2" isextractkey="true"></tofield></field><field name="TIME" sqltype="varchar2"><tofield name="date" sqltype="varchar2"></tofield></field><field name="senid"><tofield name="sen_id"></tofield></field><field name="ifch"><tofield name="ifch"></tofield></field></field-mapping></task> 注:一般情况下可不配置field里的javaType、sqlType、 precision、scale当此字段为特殊字段时才要显式配上比如:sqlType=”blob”。 7. 如果源与目标表字段一样,可不配<field-mapping>,只要在<task>里配isSameFieldName="true"并配上extractKey: <task fromtable="t_auto_rtsq" totable="text5" repeatinterval="20000" startdelay="0" issamefieldname="true"><extractkeylist><extractkey fromfield="real_time" tofield="real_time"></extractkey> ........... </extractkeylist></task> 说明:(1) 如果isSameFieldName="true",再显式的配上以下字段映射xml时,则该字段(max_wind_speed)以以下XML显式配置的为准 <field-mapping><field name="max_wind_speed1" javatype="double" sqltype="varchar2" precision="10" scale="0"><tofield name="max_wind_speed2"></tofield></field></field-mapping> (2) 如果isSameFieldName="false",则只抽取在<field-mapping>中显式配置的字段。 8. 任务调度可用simpleTrigger与cronTrigger两种中的其中一种,两种都配上时以cronExpression为主。 1)repeatInterval="10000000" startDelay="0" 2) cronExpression="0/10 * * * * ?" 9. <task>里的fromTable 可以是一个表名也可以是一条SQL语句,这样可以灵活抽取所要数据(条件限制,合并某列来抽取等)比如: 1) 条件限制的例子: <task fromtable="select * from t_weather where real_time>'20090527010101'"></task> 2) 合并列的例子: <task fromtable="select id,id||filename as unionField from text" fromschema="gdprs" totable="text2" toschema="gdprs" taskinfo="测试合并列" repeatinterval="200000" startdelay="0" priority="10" issamefieldname="true" batchinsertsize="1000"><field-mapping><field name="unionField" isextractkey="false"><tofield name="filename" scale="0" isextractkey="false"></tofield></field><field name="id" javatype="int" sqltype="int" precision="10" scale="0" isextractkey="true"><tofield name="toid" isextractkey="true"></tofield></field></field-mapping></task> 3) 抽取A表的a字段与B表的b字段对应到C表(只有两个Ca,Cb字段时只要isSameField Name ="true" 就可): <task fromtable="select A.a as Ca , B.b as Cb from A,B" fromschema="test1" totable="test2" issamefieldname="true"></task><?xml version="1.0" encoding="UTF-8"?><configs><!-- 对抽取工具的全局变量定义 --><send-msg-phones><!-- 对数据断时进行短信告警 --><phone name="阿黎明">13850184475</phone><phone name="林雨场">13960866467</phone></send-msg-phones></configs> 10. 在<task>中配置alarmHours="24" 代表当超过24时都没有来数据时则发短信报知,短信号配置在config.xml中。 <?xml version="1.0" encoding="UTF-8"?><configs><!-- 对抽取工具的全局变量定义 --><send-msg-phones><!-- 对数据断时进行短信告警 --><phone name="张三">13945448484</phone><phone name="李四">13945448484</phone></send-msg-phones><msg-database><!-- 短信保存库 --><driverclass>oracle.jdbc.driver.OracleDriver</driverclass><url>jdbc:oracle:thin:@10.142.1.109:1521:gdprs</url><username>gdprs</username><password>xxxx</password></msg-database></configs> 11. 如果是抽取ORACLE中带有BLOB的字段(支持同时抽取多个BLOB字段),需要以下两个配置: 1) extractorClass="com.epgis.extract.core.extractor.CommonBlobExtractor" 2) 为BLOB字段的那个<filed>加上sqlType="blob" 示例如下: <task fromtable="t_weather_radar_picture" fromschema="gdprs" totable="text2" toschema="gdprs" repeatinterval="100000" startdelay="0" extractorclass="com.epgis.extract.core.extractor.CommonBlobExtractor" priority="10" issamefieldname="false"><field-mapping><field name="id" isprimarykey="true"><tofield name="toid" isprimarykey="true"></tofield></field><field name="picture" sqltype="blob"><tofield name="fileblob" sqltype="blob"></tofield></field><field name="picture" sqltype="blob"><tofield name="fileblob2" sqltype="blob"></tofield></field><field name="create_date"><tofield name="filename"></tofield></field></field-mapping><extractkeylist><extractkey fromfield="id" tofield="toid"></extractkey></extractkeylist></task> 12. 对字段可自定义拦截器进行转化,fieldInterceptor有两种配置方式: 1)在<field-mapping>字段里配 <field name="content" fieldinterceptor="com.epgis.extract.business. interceptor.TestFieldInterceptor"><tofield name="content2"></tofield></field> 2)当isSameFieldName="true"时,因为省略了配<field-mapping>,则可以用以下一种方式: <fieldinterceptorlist><fieldinterceptor fromfield="real_time" fromfieldinterceptor="com.String2DateFieldInterceptor" tofield="time" tofieldinterceptor="com.Date2StringFieldInterceptor"></fieldinterceptor> ........... </fieldinterceptorlist> 说明:一般情况下只要配源字段的拦截器,但如果这个字段同时又为extractKey的话,经过转换后到目标表的字段无法与源字段比较,所以要配一个目标表字段toField的拦截器让目标字段转回成和源字段一样格式的才可比较,比如:以上配了一个real_time的拦截器把2009-5-28转成了20090528抽取目标表,而real_time同时又是extractKey,这时抽取时就要配一个toField的拦截器把real_time转回成2009-5-28的格式才可以进行增量(比较大小)抽取。 例子:t_weather_cityweather_info里的real_time为字符串,而目标表text3里的time为Date类型,这时要加个拦截器把real_time转为date比如: public class String2DateFieldInterceptor implements IFieldInterceptor{ public Object convertField(Object field){ try { SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmss"); Date d = df.parse((String)field); java.sql.Timestamp sqlDate = new java.sql.Timestamp(d.getTime()); return sqlDate; } catch (ParseException e) { e.printStackTrace(); } return null; } 而这时extractKey又为real_time这个转换后的字段,所以要再配一个目标字段的拦截器(如果这个转换字段real_time不是extractKey就不要配) public class Date2StringFieldInterceptor implements IFieldInterceptor{ public Object convertField(Object field){ Date d = new Date(((java.sql.Timestamp)field).getTime()); SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss"); return sdf.format(d); } } 相应的XML配置: <task fromtable="cityweather" fromschema="gdprs" totable="text3" toschema="gdprs" repeatinterval="1000" startdelay="0" issamefieldname="true"><extractkeylist><!-- 以real_time为主键来抽取 --><extractkey fromfield="real_time" tofield="time"></extractkey></extractkeylist><fieldinterceptorlist><fieldinterceptor fromfield="real_time"> fromFieldInterceptor="com.epgis.extract.business.String2DateFieldInterce ptor " <!-- 把time字段的Date类型转为字符串2009080401010为了与源表比较,因为real_time为extractKey --> <tofield tofieldinterceptor="com.epgis.extract.business.Date2StringFieldInterceptor"></tofield></fieldinterceptor></fieldinterceptorlist><field-mapping><field name="real_time"><!-- real_time与time的对应因为字段名不同要显式声明 --><tofield name="time"></tofield></field></field-mapping></task> 13. 可自定义一个抽取器,要继承一个抽象类Extract实现execute方法,比如<task extractorclass="com.epgis.extract. core.extractor.TestExtractor"> public class TestExtractor extends Extractor { public void execute(){ // do execute logic } } 日志管理与任务明细查看 保存抽取的日志有三种方式.写入数据库,2.用extractDetail类保存实时日志,3. 用log4j写入文件里。 1.写入数据库(t_extract_log表)。 <?xml version="1.0" encoding="UTF-8"?><log-database><!-- 抽取日志保存库 --><driverclass>oracle.jdbc.driver.OracleDriver</driverclass><url>jdbc:oracle:thin:@10.142.1.109:1521:gdprs</url><username>gdprs</username><password>xxxxx</password></log-database> 2. 用extractDetail类保存实时日志 任务明细:http://10.142.4.191:8080/zExtractor/extractDetailShow.jsp 3.日志信息:http://10.142.4.191:8080/zExtractor/log/extractLog.html 类说明 MODEL层结构 /** * 抽象出所有抽取器的公有职能 * 提供一个抽取器的模版Template: * public abstract void execute(); * 当本系统默认的抽取器无法满足需求时,可继承此方法,自已实现execute()方法 * @Author :黄仕勇 * 2009-5-19 */ public abstract class Extractor implements IExtractor { /**具体的抽取实现*/ public abstract void execute(); } 附一:实际案例 <?xml version="1.0" encoding="UTF-8"?><extract-case casename="抽取城市天气预报数据案例"><!-- 源数据库 --><source-database><property name="type" value="SQLServer"></property><property name="driverClassName" value="com.microsoft.jdbc.sqlserver.SQLServerDriver"></property><property name="url" value="jdbc:microsoft:sqlserver://192.168.104.5:1433;databaseName=zyqx"></property><property name="username" value="sa"></property><property name="password" value="xxx"></property><property name="minIdle" value="2"></property><property name="maxActive" value="50"></property><property name="maxWait" value="30000"></property></source-database><!-- 目标数据库 --><destination-database><property name="type" value="ORACLE"></property><property name="driverClassName" value="oracle.jdbc.driver.OracleDriver"></property><property name="url" value="jdbc:oracle:thin:@10.142.13.178:1521:BIETL"></property><property name="username" value="ods"></property><property name="password" value="xxxx"></property></destination-database><task fromtable="select * from forecast where t_day>getdate()-365" totable="buf_qx_forecast" fromschema="dbo" repeatinterval="210000" startdelay="0" tastinfo="城市天气预报" priority="10" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist><field-mapping><field name="forecastTime"><tofield name="forecast_date"></tofield></field></field-mapping><fieldinterceptorlist><fieldinterceptor fromfield="forecastTime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor"></fieldinterceptor><fieldinterceptor fromfield="t_day" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor"></fieldinterceptor></fieldinterceptorlist></task><task fromtable="TYPHOON_1" fromschema="dbo" totable="BUF_TF_TYPHOON_1" cronexpression="0 0/11 * * * ?" taskinfo="台风1" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist></task><task fromtable="TYPHOON_2" fromschema="dbo" totable="BUF_TF_TYPHOON_2" repeatinterval="210000" startdelay="0" taskinfo="台风2" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist><!-- 时间格式的拦截器 --><fieldinterceptorlist><fieldinterceptor fromfield="wdatatime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor><fieldinterceptor fromfield="nodetime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor></fieldinterceptorlist></task><task fromtable="TYPHOON_3" fromschema="dbo" totable="BUF_TF_TYPHOON_3" taskinfo="台风3" repeatinterval="210000" startdelay="0" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist><!-- 时间格式的拦截器 --><fieldinterceptorlist><fieldinterceptor fromfield="forecasttime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor><fieldinterceptor fromfield="comingtime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor><fieldinterceptor fromfield="wdatatime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor></fieldinterceptorlist></task><task fromtable="WEAFORECAST_2" fromschema="dbo" totable="BUF_QX_WEAFORECAST_2" cronexpression="0 0/29 * * * ?" taskinfo="三天九地市与四十小时九地市" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist></task><task fromtable="WEAFORECAST_1" fromschema="dbo" totable="BUF_QX_WEAFORECAST_1" cronexpression="0 0/29 * * * ?" taskinfo="三天九地市与四十小时九地市" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist></task><task fromtable="WEAINFO" fromschema="dbo" totable="BUF_QX_WEAINFO" cronexpression="0 0/3 * * * ?" taskinfo=" 地市气象站实时信息 每时刻12分运行一次,42分钟时也运行一次" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist><fieldinterceptorlist><fieldinterceptor fromfield="wdatatime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor><fieldinterceptor fromfield="wdate" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor></fieldinterceptorlist></task></extract-case> 修订记录 日期 修订版本 修订内容 作者 090605 0.1 初稿拟制 黄仕勇 090615 0.2 初步完成编写</task></field-mapping></field-mapping></filed></task></task></field-mapping></task></field-mapping></tofield></field></field-mapping>
实现ORACLE与SQLSERVER数据库间的数据抽取与转换工具
最新推荐文章于 2023-06-07 15:15:48 发布