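Alibaba Cloud DataWorks Data Integration can efficiently write data from the data warehouse into Doris, but the configuration options currently exposed in its wizard (UI) mode do not include a setting for partial column updates. The page settings look like this: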

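According to Alibaba Cloud's Data Integration documentation, Data Integration writes to Doris via Stream Load. The official Doris documentation shows that partial column updates require a load parameter to be set, and the page above offers no place to add parameters. The wizard-mode configuration therefore has to be converted to script mode, where the required parameters can be added, as follows.
The parameters to add are: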
"loadProps": {
  "partial_columns": true,
  "strict_mode": false
}
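To make the role of these two settings more concrete, here is a minimal Python sketch. It assumes that Data Integration forwards each `loadProps` entry to Doris as a Stream Load request header, which is how the setting appears to behave but is not confirmed here. The helper name `to_stream_load_headers` is hypothetical and is not part of DataWorks or Doris.

```python
from typing import Any, Dict


def to_stream_load_headers(load_props: Dict[str, Any]) -> Dict[str, str]:
    """Render loadProps entries as HTTP header strings (hypothetical helper)."""
    headers: Dict[str, str] = {}
    for key, value in load_props.items():
        if isinstance(value, bool):
            # HTTP headers are plain strings, so JSON booleans become "true"/"false".
            headers[key] = "true" if value else "false"
        else:
            headers[key] = str(value)
    return headers


# The two settings from the snippet above.
load_props = {"partial_columns": True, "strict_mode": False}
headers = to_stream_load_headers(load_props)
print(headers)
```

Doris documents partial column updates for Unique Key tables with merge-on-write enabled, so the target table is assumed to be of that kind.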
The complete script is as follows:
{
  "transform": false,
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "odps",
      "parameter": {
        "partition": [
          "dt=${ldt},hh=${lhour},mi=${lmi}"
        ],
        "datasource": "odps_first",
        "envType": 1,
        "successOnNoPartition": false,
        "tunnelQuota": "default",
        "isSupportThreeModel": false,
        "column": [
          "login_id",
          "driver_id",
          "d_action_s_driving_0023",
          "d_action_s_driving_0022",
          "d_action_s_driving_0021"
        ],
        "tableComment": "",
        "enableWhere": false,
        "table": "your_table_name"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "doris",
      "parameter": {
        "loadProps": {
          "partial_columns": true,
          "strict_mode": false,
          "column_separator": "&*&"
        },
        "envType": 1,
        "datasource": "user_profile",
        "column": [
          "login_id",
          "driver_id",
          "d_action_s_driving_0023",
          "d_action_s_driving_0022",
          "d_action_s_driving_0021"
        ],
        "tableComment": "",
        "streamLoadFormat": "json",
        "batchSize": 10485760,
        "maxBatchRows": 200000,
        "table": "your_table_name"
      },
      "name": "Writer",
      "category": "writer"
    },
    {
      "copies": 1,
      "parameter": {
        "nodes": [],
        "edges": [],
        "groups": [],
        "version": "2.0"
      },
      "name": "Processor",
      "category": "processor"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "locale": "zh_CN",
    "speed": {
      "throttle": true,
      "concurrent": 3,
      "mbps": 1
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
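If several wizard-mode jobs need the same change, the edit can be applied to the exported script JSON programmatically. The following is only a sketch under stated assumptions: the job has the `steps` / `category` / `parameter` layout shown in the script above, and the function names `enable_partial_update` and `columns_aligned` are hypothetical helpers, not DataWorks APIs. It also checks that the reader and writer list the same number of columns, since the two lists in the script are matched position by position.

```python
import copy
import json
from typing import Any, Dict


def find_step(job: Dict[str, Any], category: str) -> Dict[str, Any]:
    """Return the first step with the given category ("reader" or "writer")."""
    for step in job.get("steps", []):
        if step.get("category") == category:
            return step
    raise ValueError(f"no step with category {category!r}")


def enable_partial_update(job: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the job whose Doris writer carries the partial-update loadProps."""
    patched = copy.deepcopy(job)  # leave the caller's job untouched
    writer = find_step(patched, "writer")
    if writer.get("stepType") != "doris":
        raise ValueError("writer step is not a Doris writer")
    props = writer["parameter"].setdefault("loadProps", {})
    props["partial_columns"] = True
    props["strict_mode"] = False
    return patched


def columns_aligned(job: Dict[str, Any]) -> bool:
    """Check that reader and writer list the same number of columns."""
    reader_cols = find_step(job, "reader")["parameter"]["column"]
    writer_cols = find_step(job, "writer")["parameter"]["column"]
    return len(reader_cols) == len(writer_cols)


# A trimmed-down job in the same shape as the full script (illustrative data only).
sample_job = {
    "steps": [
        {"stepType": "odps", "category": "reader",
         "parameter": {"column": ["login_id", "driver_id"]}},
        {"stepType": "doris", "category": "writer",
         "parameter": {"column": ["login_id", "driver_id"],
                       "streamLoadFormat": "json"}},
    ]
}

patched = enable_partial_update(sample_job)
print(json.dumps(find_step(patched, "writer")["parameter"]["loadProps"]))
```

Working on a deep copy keeps the original export available for comparison before the patched script is pasted back into script mode.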




