ClickHouse: fixing CSV import errors caused by inconsistent row formats (rows whose column count or column types do not match the table)

 cat xab  | clickhouse-client -h 10.20.0.18 -u admin --password 1234567890   --ignore-error  --format_csv_delimiter "#" --query="insert into ai_datalake.t_tencent_statistic_all FORMAT CSV"
Code: 117, e.displayText() = DB::Exception: Expected end of line: (at row 125)

Row 124:
Column 0,   name: id,          type: UInt32,   parsed text: "989277"
Column 1,   name: category,    type: String,   parsed text: "qqmusic"
Column 2,   name: device_type, type: String,   parsed text: "X2"
Column 3,   name: device_id,   type: String,   parsed text: "A1EE527EA2"
Column 4,   name: user_id,     type: String,   parsed text: "20579517"
Column 5,   name: app_id,      type: String,   parsed text: "MB-SD0-0000"
Column 6,   name: query,       type: String,   parsed text: "放一首地铁等待"
Column 7,   name: create_time, type: DateTime, parsed text: "2021-06-19 21:23:31"
Column 8,   name: update_time, type: DateTime, parsed text: "2021-06-19 21:23:31"

Row 125:
Column 0,   name: id,          type: UInt32,   parsed text: "389278"
Column 1,   name: category,    type: String,   parsed text: "qqmusic"
Column 2,   name: device_type, type: String,   parsed text: "X2"
Column 3,   name: device_id,   type: String,   parsed text: "A1EE5EE05B"
Column 4,   name: user_id,     type: String,   parsed text: "249588"
Column 5,   name: app_id,      type: String,   parsed text: "NULL"
Column 6,   name: query,       type: String,   parsed text: "播放富田麻帆的FlyMetotheStar"
Column 7,   name: create_time, type: DateTime, parsed text: "10"
Column 8,   name: update_time, type: DateTime, parsed text: "2021-06-19 21:23:31"
ERROR: There is no line feed. "2" found instead.
 It's like your file has more columns than expected.
And if your file have right number of columns, maybe it have unquoted string value with comma.

, Stack trace (when copying this message, always include the lines below):

0. DB::skipEndOfLine(DB::ReadBuffer&) @ 0xf721332 in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
1. DB::CSVRowInputFormat::readRow(std::__1::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&) @ 0xf72038e in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
2. DB::IRowInputFormat::generate() @ 0xf6ff388 in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
3. DB::ISource::tryGenerate() @ 0xf68d8b5 in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
4. DB::ISource::work() @ 0xf68d4aa in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
5. DB::ParallelParsingInputFormat::InternalParser::getChunk() @ 0xf755cce in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
6. DB::ParallelParsingInputFormat::parserThreadFunction(std::__1::shared_ptr<DB::ThreadGroupStatus>, unsigned long) @ 0xf75532e in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
7. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x8530578 in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
8. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() @ 0x853252f in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
9. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x852dabf in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
10. void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()> >(void*) @ 0x85315e3 in /usr/lib/debug/.build-id/ee/ebfd4dbce267441a591c94ed84fd1931da0b54.debug
11. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
12. clone @ 0xfeb0d in /usr/lib64/libc-2.17.so
 (version 21.3.12.2 (official build))
Code: 117, e.displayText() = DB::Exception: Expected end of line: (at row 125)
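
In the log above, row 125 has create_time parsed as "10", which suggests that the row carries one more field than the table's nine columns. Such rows can be located before importing by counting the fields on each line. A minimal shell sketch, assuming "#" is the delimiter, fields are unquoted, and a small made-up sample file stands in for xab:

```shell
# Hypothetical sample data; the real input would be the file xab.
sample=$(mktemp)
printf '%s\n' \
  '1#qqmusic#X2#A1#100#MB-SD0-0000#q1#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  '2#qqmusic#X2#A2#200#NULL#q2#10#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  > "$sample"

expected_cols=9   # assumption: column count of the target table, taken from the log

# Print the line number and field count of every row whose field count differs.
bad_rows=$(awk -F'#' -v n="$expected_cols" 'NF != n { print NR ": " NF " fields" }' "$sample")
echo "$bad_rows"

rm -f "$sample"
```

Against the real file, replace "$sample" with xab. A field count check does not catch rows that have the right number of fields but values of the wrong type.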

Solution:

Specify the error-tolerance settings:

--input_format_allow_errors_num arg       Maximum number of rows that are allowed to fail parsing (a bad row is skipped and parsing continues with the next line).

Maximum absolute amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.


--input_format_allow_errors_ratio arg     Maximum share of rows that are allowed to fail parsing, given as a fraction between 0 and 1 (for example, 0.01 for 1%).

Maximum relative amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.
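
To choose a value for --input_format_allow_errors_ratio, it can help to estimate the share of malformed rows first. The sketch below counts only rows with the wrong number of fields, so the result is best read as a lower bound, since type errors also count as errors in ClickHouse. The sample data and the nine-column count are assumptions for illustration:

```shell
# Hypothetical sample: four rows, one of which has an extra field.
sample=$(mktemp)
printf '%s\n' \
  '1#qqmusic#X2#A1#100#MB-SD0-0000#q1#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  '2#qqmusic#X2#A2#200#NULL#q2#10#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  '3#qqmusic#X2#A3#300#MB-SD0-0000#q3#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  '4#qqmusic#X2#A4#400#MB-SD0-0000#q4#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  > "$sample"

# Fraction of rows whose field count is not 9, printed with four decimals.
ratio=$(awk -F'#' -v n=9 'NF != n { bad++ } END { if (NR > 0) printf "%.4f", bad / NR }' "$sample")
echo "malformed ratio: $ratio"

rm -f "$sample"
```

A value slightly above the printed ratio would be a reasonable starting point for the setting.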

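Rerun the import with up to 100 bad rows tolerated:

cat xab | clickhouse-client -h 10.20.2.18 -u admin --password 123.com   --ignore-error  --input_format_allow_errors_num 100 --format_csv_delimiter "#" --query="insert into ai_datalake.t_tencent_statistic_all FORMAT CSVWithNames"

Note: this command uses FORMAT CSVWithNames, which treats the first line of the input as a header row, whereas the failing command above used FORMAT CSV. If the file has no header line, keep FORMAT CSV so that the first data row is not consumed as column names.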
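
Rows skipped under these settings are not imported, so it may be worth separating them out beforehand for later inspection. A rough sketch, again assuming a "#" delimiter, nine expected fields, and unquoted values; the file names clean.csv and rejects.csv are hypothetical:

```shell
# Work in a scratch directory with a made-up three-row sample.
workdir=$(mktemp -d)
src="$workdir/sample.csv"
good="$workdir/clean.csv"
bad="$workdir/rejects.csv"

printf '%s\n' \
  '1#qqmusic#X2#A1#100#MB-SD0-0000#q1#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  '2#qqmusic#X2#A2#200#NULL#q2#10#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  '3#qqmusic#X2#A3#300#MB-SD0-0000#q3#2021-06-19 21:23:31#2021-06-19 21:23:31' \
  > "$src"

# Create both outputs up front so they exist even if one stays empty.
: > "$good"
: > "$bad"

# Rows with exactly 9 fields go to clean.csv, everything else to rejects.csv.
awk -F'#' -v n=9 -v good="$good" -v bad="$bad" '
  NF == n { print > good; next }
          { print > bad }
' "$src"

good_count=$(awk 'END { print NR }' "$good")
bad_count=$(awk 'END { print NR }' "$bad")
echo "clean rows: $good_count, rejected rows: $bad_count"

rm -rf "$workdir"
```

With the real data, the clean file can be piped to clickhouse-client in place of xab, and the rejects file reviewed or repaired separately.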
