clickhouse: WSL下常见问题、常见用法和A股数据实践

clickhouse安装可以看相关的文档,这个照着做就行。
https://clickhouse.com/docs/zh/getting-started/install/

$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754

echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update

sudo apt-get install -y clickhouse-server clickhouse-client

sudo service clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you've set up a password.

一、常见问题

但容易碰到的几个问题:

1、登陆出现516问题

如果是进入默认用户,碰到以下的类似516问题

root@iZ9ni05fy7agndgpndc7gsZ:~# clickhouse-client 
 ClickHouse exception, code: 516, host: 192.168.0.108, port: 8888; Code: 516, e.displayText() = DB::Exception: default: Authentication failed: password is incorrect or there is no user with such name (version 20.9.2.20 (official build))

vi进入,并修改这个

/etc/clickhouse-server/users.d/default-password.xml

把下面yourpasswd设置成你需要,为空即不设明文密码。

<password>yourpasswd</password>

当然,建议后面改成非明文密码,但是只是玩玩或POC无所谓。

2、PID问题
我在WSL2下碰到。

System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to talk to init daemon.

wsl系统中并不能使用systemd,可以试试:

songroom@DESKTOP-MEDPUTU:~$ sudo service clickhouse-server start

在这里插入图片描述如果还不行,可以再加:

sudo chown -R clickhouse: '/var/run/clickhouse-server/'
songroom@DESKTOP-MEDPUTU:~$ clickhouse-client
ClickHouse client version 22.1.3.7 (official build).
Connecting to localhost:9000 as user default.
Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)

songroom@DESKTOP-MEDPUTU:~$ sudo service clickhouse-server start
[sudo] password for songroom: 
 chown -R clickhouse: '/var/run/clickhouse-server/'
Will run su -s /bin/sh 'clickhouse' -c '/usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon'
Waiting for server to start
Waiting for server to start
Server started

如果还不行,再重装WSL后解决吧。

3、210问题

Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)

我重装WSL后解决。

二、常见用法

1、uninstall clickhouse

sudo apt-get -y remove clickhouse*

sudo find / -name clickhouse*   # 找出安装的其他文件,全部删除

2、设置密码
vi进入:

 /etc/clickhouse-server/users.xml

3、远程访问


> 修改配置文件,将<listen_host>0.0.0.0</listen_host>注释去掉
> vim /etc/clickhouse-server/config.xml

4、示例数据集下载

POC需要比较标准的数据来验证数据性能。具体可以参考:

https://clickhouse.com/docs/en/getting-started/example-datasets/ontime/

比如,样本标例数据集ontime(这个是下载预处理好的分区数据,不用建表了):

curl -O https://datasets.clickhouse.com/ontime/partitions/ontime.tar

下载数据文件ontime.tar,16G,他包含了所有可以提供下载的数据,相当于就是个数据库格式的数据文件,

curl -O https://datasets.clickhouse.com/ontime/partitions/ontime.tar

解压缩,注意这里不用提前在/var/lib/clickhouse创建data/ontime数据库的文件夹,因为这个压缩文件解压的时候,会带着路径/data/ontime,

tar xvf ontime.tar -C /var/lib/clickhouse

重启ClickHouse Server,

sudo service clickhouse-server restart

如果在WSL下,请使用:

sudo service clickhouse-server start

可以进行常用的操作了。

看数据库名

ls /var/lib/clickhouse/data/
ls /var/lib/clickhouse/metadata/

ClickHouse自带了2个数据库:

default:默认数据库,未切换数据库时默认使用该数据。

system:系统数据库,一般不要操作该数据库。
SHOW DATABASES

查询:
查询总行数:

SELECT
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM system.parts

Q0.

SELECT avg(c1)
FROM
(
    SELECT Year, Month, count(*) AS c1
    FROM ontime
    GROUP BY Year, Month
);

Q1. 查询从2000年到2008年每天的航班数

SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;

Q2. 查询从2000年到2008年每周延误超过10分钟的航班数。

SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;

Q3. 查询2000年到2008年每个机场延误超过10分钟以上的次数

SELECT Origin, count(*) AS c
FROM ontime
WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
GROUP BY Origin
ORDER BY c DESC
LIMIT 10;

Q4. 查询2007年各航空公司延误超过10分钟以上的次数

SELECT Carrier, count(*)
FROM ontime
WHERE DepDelay>10 AND Year=2007
GROUP BY Carrier
ORDER BY count(*) DESC;

Q5. 查询2007年各航空公司延误超过10分钟以上的百分比

SELECT Carrier, c, c2, c*100/c2 as c3
FROM
(
    SELECT
        Carrier,
        count(*) AS c
    FROM ontime
    WHERE DepDelay>10
        AND Year=2007
    GROUP BY Carrier
)
JOIN
(
    SELECT
        Carrier,
        count(*) AS c2
    FROM ontime
    WHERE Year=2007
    GROUP BY Carrier
) USING Carrier
ORDER BY c3 DESC;

这样,就可以愉快地玩起clickhouse来了。

三、量化数据的实践
1、建库

CREATE DATABASE my_db
ENGINE = Ordinary

2、建表
建立根据代码来分区的表

CREATE TABLE my_db.stock_tb
(
   code  String,
    datetime DateTime,
    open Float32,
    close Float32,
    low Float32,
    high Float32,
    volume Float64,
    money Float64,
    factor Float32,
    high_limit Float32,
    low_limit Float32,
    avg Float32,
    pre_close Float32,
    paused Float32,
    open_interest Float64
)
ENGINE = MergeTree
PARTITION BY code 
ORDER BY datetime

![在这里插入图片描述](https://img-blog.csdnimg.cn/d68f1ec6a2c544f186a0e8580b7bcb49.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAc29uZ3Jvb20=,size_20,color_FFFFFF,t_70,g_se,x_16)

查看动态表状态:

SELECT
    table AS `表名`,
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM system.parts
WHERE table IN ('stock_tb')
GROUP BY table

在这里插入图片描述

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值