Loading a local file into Doris with Stream Load fails with the following error:
[root@bigdata1 ~]# curl --location-trusted -u root -H "lable:112" -T testData.csv http://192.168.10.111:8040/api/example_db/example_range_tbl/_stream_load
Enter host password for user 'root':
{
"TxnId": 4015,
"Label": "844c5a27-de49-4566-b0eb-443b28fde1c4",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "[INTERNAL_ERROR]too many filtered rows\n\n\t0# std::_Function_handler<void (doris::RuntimeState*, doris::Status*), doris::StreamLoadExecutor::execute_plan_fragment(std::shared_ptr<doris::StreamLoadContext>)::$_0>::_M_invoke(std::_Any_data const&, doris::RuntimeState*&&, doris::Status*&&) at /root/src/doris-2.0/be/src/common/status.h:354\n\t1# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360\n\t2# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701\n\t3# doris::ThreadPool::dispatch_thread() at /root/src/doris-2.0/be/src/util/threadpool.cpp:0\n\t4# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562\n\t5# start_thread\n\t6# __clone\n",
"NumberTotalRows": 2,
"NumberLoadedRows": 0,
"NumberFilteredRows": 2,
"NumberUnselectedRows": 0,
"LoadBytes": 167,
"LoadTimeMs": 157,
"BeginTxnTimeMs": 3,
"StreamLoadPutTimeMs": 15,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 134,
"CommitAndPublishTimeMs": 0,
"ErrorURL": "http://192.168.10.111:8040/api/_load_error_log?file=__shard_8/error_log_insert_stmt_6546dd7fb93bcb16-b24816581d198ae_6546dd7fb93bcb16_b24816581d198ae"
}
Open the ErrorURL link from the response.
It reports: Reason: column(date) values is null while columns is not nullable.
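The "check the status, then open ErrorURL" step can also be scripted. The following is a minimal sketch, not part of the original troubleshooting: the function name `summarize_stream_load` is hypothetical, the sample is a trimmed copy of the response shown earlier, and treating only "Success" as a successful load is a simplifying assumption.

```python
import json


def summarize_stream_load(response_text):
    """Return (ok, error_url) for a Stream Load JSON response body.

    Hypothetical helper: only "Success" is treated as ok here, which is
    a simplifying assumption for this sketch.
    """
    result = json.loads(response_text)
    ok = result.get("Status") == "Success"
    # .get() returns None when the response carries no ErrorURL field.
    return ok, result.get("ErrorURL")


# Trimmed copy of the failing response from this post.
sample = """{
  "Status": "Fail",
  "NumberTotalRows": 2,
  "NumberLoadedRows": 0,
  "NumberFilteredRows": 2,
  "ErrorURL": "http://192.168.10.111:8040/api/_load_error_log?file=__shard_8/error_log_insert_stmt_6546dd7fb93bcb16-b24816581d198ae_6546dd7fb93bcb16_b24816581d198ae"
}"""

ok, error_url = summarize_stream_load(sample)
if not ok:
    print("load failed, inspect:", error_url)
```

The URL printed by the sketch is the one to open in a browser or fetch with curl to see the per-row rejection reason.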
Check the table definition: show create table example_range_tbl;
CREATE TABLE `example_range_tbl` (
`user_id` largeint(40) NOT NULL COMMENT 'user ID',
`date` date NOT NULL COMMENT 'date the data was loaded',
`timestamp` datetime NOT NULL COMMENT 'timestamp of the data load',
`city` varchar(20) NULL COMMENT 'city where the user lives',
`age` smallint(6) NULL COMMENT 'user age',
`sex` tinyint(4) NULL COMMENT 'user gender',
`last_visit_date` datetime REPLACE NULL DEFAULT "1970-01-01 00:00:00" COMMENT 'last visit time of the user',
`cost` bigint(20) SUM NULL DEFAULT "0" COMMENT 'total spending of the user',
`max_dwell_time` int(11) MAX NULL DEFAULT "0" COMMENT 'maximum dwell time of the user',
`min_dwell_time` int(11) MIN NULL DEFAULT "99999" COMMENT 'minimum dwell time of the user'
) ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) COMMENT 'OLAP'
PARTITION BY RANGE(`date`)
(PARTITION p201701 VALUES [('0000-01-01'), ('2017-02-01')),
PARTITION p201702 VALUES [('2017-02-01'), ('2017-03-01')),
PARTITION p2 VALUES [('2017-04-01'), ('2018-01-02')))
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES ( "replication_allocation" = "tag.location.default: 3", "is_being_synced" = "false", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false", "enable_single_replica_compaction" = "false" );
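Interpretation: the date column does not allow NULL, but the loaded data ended up as NULL in that column, so the job failed.
The local file being loaded contains: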
111 '2017-04-01' '2017-04-01 12:00:00' '上海' 20 1 '2017-04-01 11:00:00' 100 20 15
112 '2017-04-01' '2017-04-01 11:00:00' '上海' 20 1 '2017-04-01 10:00:00' 80 15 10
Solution: remove every single quote that wraps a string or date value in the local file:
111 2017-04-01 2017-04-01 12:00:00 上海 20 1 2017-04-01 11:00:00 100 20 15
112 2017-04-01 2017-04-01 11:00:00 上海 20 1 2017-04-01 10:00:00 80 15 10
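For larger files the same cleanup can be automated. The following is a minimal sketch under stated assumptions: the function name `strip_single_quotes` is hypothetical, and the fields are assumed to be tab-separated (the datetime values contain spaces, so a space cannot be the column separator).

```python
def strip_single_quotes(line, sep="\t"):
    """Remove one pair of wrapping single quotes from each field of a line.

    Assumes `sep` (tab by default) separates the columns; quotes that
    appear inside a value are left untouched.
    """
    cleaned = []
    for field in line.rstrip("\n").split(sep):
        if len(field) >= 2 and field.startswith("'") and field.endswith("'"):
            field = field[1:-1]
        cleaned.append(field)
    return sep.join(cleaned)


# First row of the original file, written with explicit tab separators.
row = "111\t'2017-04-01'\t'2017-04-01 12:00:00'\t'上海'\t20\t1\t'2017-04-01 11:00:00'\t100\t20\t15"
print(strip_single_quotes(row))
```

Applying the function to every line of testData.csv and writing the result to a new file gives input in the unquoted form shown above.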
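Reason: Stream Load takes whatever lies between two separators as the value for the corresponding table column. A value that still carries its single quotes, such as '2017-04-01', is not in the date format the table expects, so it is converted to NULL. Because the date column is NOT NULL, both rows were filtered out (NumberFilteredRows: 2) and the load failed with "too many filtered rows".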
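The effect of the stray quotes can be illustrated with a quick pre-load check. The following is only an analogy that uses Python's own date parser, not Doris's conversion logic, and the function name `parses_as_date` is hypothetical.

```python
from datetime import datetime


def parses_as_date(value):
    """Return True if value parses as a year-month-day date.

    Uses Python's strptime as a stand-in; Doris's own parsing rules
    may differ in details.
    """
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


quoted_ok = parses_as_date("'2017-04-01'")  # quotes are part of the value
plain_ok = parses_as_date("2017-04-01")
print(quoted_ok, plain_ok)  # prints: False True
```

Running such a check on the date column before loading flags quoted values early, instead of discovering them through ErrorURL after the load has failed.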