安装配置
清华镜像下载:https://mirrors.tuna.tsinghua.edu.cn/clickhouse/rpm/stable/x86_64/
事实上,只需要安装以下3个就可以了:
clickhouse-client-20.4.4.18-2.noarch.rpm
clickhouse-server-20.4.4.18-2.noarch.rpm
clickhouse-common-static-20.4.4.18-2.x86_64.rpm
# rpm -ivh *.rpm
Preparing... ################################# [100%]
Updating / installing...
1:clickhouse-common-static-20.4.4.1################################# [ 33%]
2:clickhouse-client-20.4.4.18-2 ################################# [ 67%]
3:clickhouse-server-20.4.4.18-2 ################################# [100%]
Created symlink from /etc/systemd/system/multi-user.target.wants/clickhouse-server.service to /etc/systemd/system/clickhouse-server.service.
Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
修改以下目录(涉及数据路径的最好都改了):
<logger>
<!-- Possible levels: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105 -->
<level>trace</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
<!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
</logger>
<!-- Path to data directory, with trailing slash. -->
<path>/var/lib/clickhouse/</path>
<!-- Path to temporary data for processing hard queries. -->
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
然后启动服务:
# service clickhouse-server start
Start clickhouse-server service: Path to data directory in /etc/clickhouse-server/config.xml: /export/ch/
DONE
测试:
# clickhouse-client
ClickHouse client version 20.4.4.18 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.4.4 revision 54434.
host :) select 1
SELECT 1
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.001 sec.
导数据
假如要处理csv数据,只取某些列,可以使用cut命令:
# 从order.csv里读取,以逗号(-d,)为分隔符,取2、3、4、6列,输出到geo.csv里
cut -d, -f6,2,3,4 < order.csv > geo.csv
从csv里导数据
clickhouse-client --host=127.0.0.1 --query="insert into t_order_table01 format CSVWithNames" < geo.csv
然后就遇到错误:
# clickhouse-client --host=127.0.0.1 --query="insert into t_order_table01 format CSVWithNames" < geo.csv
Code: 27. DB::Exception: Cannot parse input: expected ',' before: '.31263,29.560795,2014-03-03 11:43:59,3905553754435746098\n106.312758,29.562841,2014-03-03 11:44:29,3905553756460440319\n106.31282,29.564806,2014-03-03 11:44:59,39': (at row 1)
Row 1:
Column 0, name: geo_idx, type: UInt64, parsed text: "106"
ERROR: garbage after UInt64: ".31263,29."
这个错误是由于其读取csv文件是按列定义的顺序读取的,第一列是uint64,但我们csv文件里的顺序是乱的,所以类型不匹配,修改顺序:
clickhouse-client --host=127.0.0.1 --query="insert into t_order_table01(lon,lat,datetime,geo_idx) format CSVWithNames" < geo.csv