clickhouse安装可以看相关的文档,这个照着做就行。
https://clickhouse.com/docs/zh/getting-started/install/
$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
/etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
sudo service clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you've set up a password.
一、常见问题
但容易碰到的几个问题:
1、登陆出现516问题
如果是进入默认用户,碰到以下的类似516问题
root@iZ9ni05fy7agndgpndc7gsZ:~# clickhouse-client
ClickHouse exception, code: 516, host: 192.168.0.108, port: 8888; Code: 516, e.displayText() = DB::Exception: default: Authentication failed: password is incorrect or there is no user with such name (version 20.9.2.20 (official build))
vi进入,并修改这个
/etc/clickhouse-server/users.d/default-password.xml
把下面yourpasswd设置成你需要,为空即不设明文密码。
<password>yourpasswd</password>
当然,建议后面改成非明文密码,但是只是玩玩或POC无所谓。
2、PID问题
我在WSL2下碰到。
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to talk to init daemon.
wsl系统中并不能使用systemd,可以试试:
songroom@DESKTOP-MEDPUTU:~$ sudo service clickhouse-server start
如果还不行,可以再加:
sudo chown -R clickhouse: '/var/run/clickhouse-server/'
songroom@DESKTOP-MEDPUTU:~$ clickhouse-client
ClickHouse client version 22.1.3.7 (official build).
Connecting to localhost:9000 as user default.
Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)
songroom@DESKTOP-MEDPUTU:~$ sudo service clickhouse-server start
[sudo] password for songroom:
chown -R clickhouse: '/var/run/clickhouse-server/'
Will run su -s /bin/sh 'clickhouse' -c '/usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon'
Waiting for server to start
Waiting for server to start
Server started
如果还不行,再重装WSL后解决吧。
3、210问题
Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)
我重装WSL后解决。
二、常见用法
1、uninstall clickhouse
sudo apt-get -y remove clickhouse*
sudo find / -name clickhouse* # 找出安装的其他文件,全部删除
2、设置密码
vi进入:
/etc/clickhouse-server/users.xml
3、远程访问
> 修改配置文件,将<listen_host>0.0.0.0</listen_host>注释去掉
> vim /etc/clickhouse-server/config.xml
4、示例数据集下载
POC需要比较标准的数据来验证数据性能。具体可以参考:
https://clickhouse.com/docs/en/getting-started/example-datasets/ontime/
比如,样本标例数据集ontime(这个是下载预处理好的分区数据,不用建表了):
curl -O https://datasets.clickhouse.com/ontime/partitions/ontime.tar
下载数据文件ontime.tar,16G,他包含了所有可以提供下载的数据,相当于就是个数据库格式的数据文件,
curl -O https://datasets.clickhouse.com/ontime/partitions/ontime.tar
解压缩,注意这里不用提前在/var/lib/clickhouse创建data/ontime数据库的文件夹,因为这个压缩文件解压的时候,会带着路径/data/ontime,
tar xvf ontime.tar -C /var/lib/clickhouse
重启ClickHouse Server,
sudo service clickhouse-server restart
如果在WSL下,请使用:
sudo service clickhouse-server start
可以进行常用的操作了。
看数据库名
ls /var/lib/clickhouse/data/
ls /var/lib/clickhouse/metadata/
ClickHouse自带了2个数据库:
default:默认数据库,未切换数据库时默认使用该数据。
system:系统数据库,一般不要操作该数据库。
SHOW DATABASES
查询:
查询总行数:
SELECT
sum(rows) AS `总行数`,
formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM system.parts
Q0.
SELECT avg(c1)
FROM
(
SELECT Year, Month, count(*) AS c1
FROM ontime
GROUP BY Year, Month
);
Q1. 查询从2000年到2008年每天的航班数
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;
Q2. 查询从2000年到2008年每周延误超过10分钟的航班数。
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;
Q3. 查询2000年到2008年每个机场延误超过10分钟以上的次数
SELECT Origin, count(*) AS c
FROM ontime
WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
GROUP BY Origin
ORDER BY c DESC
LIMIT 10;
Q4. 查询2007年各航空公司延误超过10分钟以上的次数
SELECT Carrier, count(*)
FROM ontime
WHERE DepDelay>10 AND Year=2007
GROUP BY Carrier
ORDER BY count(*) DESC;
Q5. 查询2007年各航空公司延误超过10分钟以上的百分比
SELECT Carrier, c, c2, c*100/c2 as c3
FROM
(
SELECT
Carrier,
count(*) AS c
FROM ontime
WHERE DepDelay>10
AND Year=2007
GROUP BY Carrier
)
JOIN
(
SELECT
Carrier,
count(*) AS c2
FROM ontime
WHERE Year=2007
GROUP BY Carrier
) USING Carrier
ORDER BY c3 DESC;
这样,就可以愉快地玩起clickhouse来了。
三、量化数据的实践
1、建库
CREATE DATABASE my_db
ENGINE = Ordinary
2、建表
建立根据代码来分区的表
CREATE TABLE my_db.stock_tb
(
code String,
datetime DateTime,
open Float32,
close Float32,
low Float32,
high Float32,
volume Float64,
money Float64,
factor Float32,
high_limit Float32,
low_limit Float32,
avg Float32,
pre_close Float32,
paused Float32,
open_interest Float64
)
ENGINE = MergeTree
PARTITION BY code
ORDER BY datetime
![在这里插入图片描述](https://img-blog.csdnimg.cn/d68f1ec6a2c544f186a0e8580b7bcb49.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAc29uZ3Jvb20=,size_20,color_FFFFFF,t_70,g_se,x_16)
查看动态表状态:
SELECT
table AS `表名`,
sum(rows) AS `总行数`,
formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM system.parts
WHERE table IN ('stock_tb')
GROUP BY table