clickhouse导入csv某些列

安装配置

清华镜像下载:https://mirrors.tuna.tsinghua.edu.cn/clickhouse/rpm/stable/x86_64/

事实上,只需要安装以下3个就可以了:

clickhouse-client-20.4.4.18-2.noarch.rpm                                                                                                                                                                               
clickhouse-server-20.4.4.18-2.noarch.rpm                                                                                                                                                                               
clickhouse-common-static-20.4.4.18-2.x86_64.rpm 
# rpm -ivh *.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-common-static-20.4.4.1################################# [ 33%]
   2:clickhouse-client-20.4.4.18-2    ################################# [ 67%]
   3:clickhouse-server-20.4.4.18-2    ################################# [100%]
Created symlink from /etc/systemd/system/multi-user.target.wants/clickhouse-server.service to /etc/systemd/system/clickhouse-server.service.
Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/

修改以下目录(涉及数据路径的最好都改了):

    <logger>
        <!-- Possible levels: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105 -->
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
    </logger>

    <!-- Path to data directory, with trailing slash. -->
    <path>/var/lib/clickhouse/</path>

    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>

然后启动服务:

# service clickhouse-server start
Start clickhouse-server service: Path to data directory in /etc/clickhouse-server/config.xml: /export/ch/
DONE

测试:

# clickhouse-client 
ClickHouse client version 20.4.4.18 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.4.4 revision 54434.

host :) select 1

SELECT 1

┌─1─┐
│ 1 │
└───┘

1 rows in set. Elapsed: 0.001 sec. 

导数据

假如要处理csv数据,只取某些列,可以使用cut命令:

# 从order.csv里读取,以逗号(-d,)为分隔符,取2、3、4、6列,输出到geo.csv里
cut -d, -f6,2,3,4 < order.csv > geo.csv

从csv里导数据

clickhouse-client --host=127.0.0.1 --query="insert into t_order_table01 format CSVWithNames" < geo.csv

然后就遇到错误:

# clickhouse-client --host=127.0.0.1 --query="insert into t_order_table01 format CSVWithNames" < geo.csv
Code: 27. DB::Exception: Cannot parse input: expected ',' before: '.31263,29.560795,2014-03-03 11:43:59,3905553754435746098\n106.312758,29.562841,2014-03-03 11:44:29,3905553756460440319\n106.31282,29.564806,2014-03-03 11:44:59,39': (at row 1)

Row 1:
Column 0,   name: geo_idx,  type: UInt64,   parsed text: "106"
ERROR: garbage after UInt64: ".31263,29."

这个错误是由于其读取csv文件是按列定义的顺序读取的,第一列是uint64,但我们csv文件里的顺序是乱的,所以类型不匹配,修改顺序:

clickhouse-client --host=127.0.0.1 --query="insert into t_order_table01(lon,lat,datetime,geo_idx) format CSVWithNames" < geo.csv
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值