clickhouse教程-01-02-03

资料:

分片是一个数据节点,副本是集群的架构,存储的数据是一样的。

特点,仅仅支持有限的delete和update功能:

修改和删除的操作是有限的,仅仅适合在大量的这样的操作下才会生效,并且是异步执行的。

---1---

clickhouse官网:https://clickhouse.tech/#quick-start

https://packagecloud.io/altinity/clickhouse

安装:按照这个可以安装成功。

https://blog.csdn.net/jorondo/article/details/105605415?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~all~first_rank_v2~rank_v28-1-105605415.nonecase&utm_term=%E7%A6%BB%E7%BA%BF%E5%AE%89%E8%A3%85clickhouse&spm=1000.2123.3001.4430

下载四个rpm包:

yum安装:

yum install *.rpm

启动clickhouse:

systemctl start clickhouse-server

查看状态:

systemctl status clickhouse-server

clickhouse交互:

clickhouse-client

我的clickhouse安装在了changgou这个服务器,并且安装包和数据也是在这个服务器上的。

安装包在这里,版本和书本上的一致。

目前131这个机器上也有了。

简单语句:

1. 安装文件清单
clickhouse-client-${version}.noarch.rpm
clickhouse-common-static-dbg-${version}.x86_64.rpm
clickhouse-common-static-${version}.x86_64.rpm
clickhouse-server-${version}.noarch.rpm

2. 安装命令  

命令1: rpm -ivh *.rpm

命令2: yum install *.rpm

3. 卸载
查看
rpm -qa | grep clickhouse
卸载命令:
rmp -e  rpm包名

4. 启动/关闭服务
systemctl start clickhouse-server
systemctl status clickhouse-server
systemctl restart clickhouse-server

5. 交互式命令
clickhouse-client

select 1

1. 测试数据
https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz
https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz

xz -d hits_v1.tsv.xz
xz -d visits_v1.tsv.xz

yum install xz 

2. 创建数据库
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"

3. 建表
hits_v1

clickhouse-client  -m //-m是指的是可以换行建表的

CREATE TABLE tutorial.hits_v1
(
    `WatchID` UInt64,
    `JavaEnable` UInt8,
    `Title` String,
    `GoodEvent` Int16,
    `EventTime` DateTime,
    `EventDate` Date,
    `CounterID` UInt32,
    `ClientIP` UInt32,
    `ClientIP6` FixedString(16),
    `RegionID` UInt32,
    `UserID` UInt64,
    `CounterClass` Int8,
    `OS` UInt8,
    `UserAgent` UInt8,
    `URL` String,
    `Referer` String,
    `URLDomain` String,
    `RefererDomain` String,
    `Refresh` UInt8,
    `IsRobot` UInt8,
    `RefererCategories` Array(UInt16),
    `URLCategories` Array(UInt16),
    `URLRegions` Array(UInt32),
    `RefererRegions` Array(UInt32),
    `ResolutionWidth` UInt16,
    `ResolutionHeight` UInt16,
    `ResolutionDepth` UInt8,
    `FlashMajor` UInt8,
    `FlashMinor` UInt8,
    `FlashMinor2` String,
    `NetMajor` UInt8,
    `NetMinor` UInt8,
    `UserAgentMajor` UInt16,
    `UserAgentMinor` FixedString(2),
    `CookieEnable` UInt8,
    `JavascriptEnable` UInt8,
    `IsMobile` UInt8,
    `MobilePhone` UInt8,
    `MobilePhoneModel` String,
    `Params` String,
    `IPNetworkID` UInt32,
    `TraficSourceID` Int8,
    `SearchEngineID` UInt16,
    `SearchPhrase` String,
    `AdvEngineID` UInt8,
    `IsArtifical` UInt8,
    `WindowClientWidth` UInt16,
    `WindowClientHeight` UInt16,
    `ClientTimeZone` Int16,
    `ClientEventTime` DateTime,
    `SilverlightVersion1` UInt8,
    `SilverlightVersion2` UInt8,
    `SilverlightVersion3` UInt32,
    `SilverlightVersion4` UInt16,
    `PageCharset` String,
    `CodeVersion` UInt32,
    `IsLink` UInt8,
    `IsDownload` UInt8,
    `IsNotBounce` UInt8,
    `FUniqID` UInt64,
    `HID` UInt32,
    `IsOldCounter` UInt8,
    `IsEvent` UInt8,
    `IsParameter` UInt8,
    `DontCountHits` UInt8,
    `WithHash` UInt8,
    `HitColor` FixedString(1),
    `UTCEventTime` DateTime,
    `Age` UInt8,
    `Sex` UInt8,
    `Income` UInt8,
    `Interests` UInt16,
    `Robotness` UInt8,
    `GeneralInterests` Array(UInt16),
    `RemoteIP` UInt32,
    `RemoteIP6` FixedString(16),
    `WindowName` Int32,
    `OpenerName` Int32,
    `HistoryLength` Int16,
    `BrowserLanguage` FixedString(2),
    `BrowserCountry` FixedString(2),
    `SocialNetwork` String,
    `SocialAction` String,
    `HTTPError` UInt16,
    `SendTiming` Int32,
    `DNSTiming` Int32,
    `ConnectTiming` Int32,
    `ResponseStartTiming` Int32,
    `ResponseEndTiming` Int32,
    `FetchTiming` Int32,
    `RedirectTiming` Int32,
    `DOMInteractiveTiming` Int32,
    `DOMContentLoadedTiming` Int32,
    `DOMCompleteTiming` Int32,
    `LoadEventStartTiming` Int32,
    `LoadEventEndTiming` Int32,
    `NSToDOMContentLoadedTiming` Int32,
    `FirstPaintTiming` Int32,
    `RedirectCount` Int8,
    `SocialSourceNetworkID` UInt8,
    `SocialSourcePage` String,
    `ParamPrice` Int64,
    `ParamOrderID` String,
    `ParamCurrency` FixedString(3),
    `ParamCurrencyID` UInt16,
    `GoalsReached` Array(UInt32),
    `OpenstatServiceName` String,
    `OpenstatCampaignID` String,
    `OpenstatAdID` String,
    `OpenstatSourceID` String,
    `UTMSource` String,
    `UTMMedium` String,
    `UTMCampaign` String,
    `UTMContent` String,
    `UTMTerm` String,
    `FromTag` String,
    `HasGCLID` UInt8,
    `RefererHash` UInt64,
    `URLHash` UInt64,
    `CLID` UInt32,
    `YCLID` UInt64,
    `ShareService` String,
    `ShareURL` String,
    `ShareTitle` String,
    `ParsedParams` Nested(
        Key1 String,
        Key2 String,
        Key3 String,
        Key4 String,
        Key5 String,
        ValueDouble Float64),
    `IslandID` FixedString(16),
    `RequestNum` UInt32,
    `RequestTry` UInt8
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192;




CREATE TABLE tutorial.visits_v1
(
    `CounterID` UInt32,
    `StartDate` Date,
    `Sign` Int8,
    `IsNew` UInt8,
    `VisitID` UInt64,
    `UserID` UInt64,
    `StartTime` DateTime,
    `Duration` UInt32,
    `UTCStartTime` DateTime,
    `PageViews` Int32,
    `Hits` Int32,
    `IsBounce` UInt8,
    `Referer` String,
    `StartURL` String,
    `RefererDomain` String,
    `StartURLDomain` String,
    `EndURL` String,
    `LinkURL` String,
    `IsDownload` UInt8,
    `TraficSourceID` Int8,
    `SearchEngineID` UInt16,
    `SearchPhrase` String,
    `AdvEngineID` UInt8,
    `PlaceID` Int32,
    `RefererCategories` Array(UInt16),
    `URLCategories` Array(UInt16),
    `URLRegions` Array(UInt32),
    `RefererRegions` Array(UInt32),
    `IsYandex` UInt8,
    `GoalReachesDepth` Int32,
    `GoalReachesURL` Int32,
    `GoalReachesAny` Int32,
    `SocialSourceNetworkID` UInt8,
    `SocialSourcePage` String,
    `MobilePhoneModel` String,
    `ClientEventTime` DateTime,
    `RegionID` UInt32,
    `ClientIP` UInt32,
    `ClientIP6` FixedString(16),
    `RemoteIP` UInt32,
    `RemoteIP6` FixedString(16),
    `IPNetworkID` UInt32,
    `SilverlightVersion3` UInt32,
    `CodeVersion` UInt32,
    `ResolutionWidth` UInt16,
    `ResolutionHeight` UInt16,
    `UserAgentMajor` UInt16,
    `UserAgentMinor` UInt16,
    `WindowClientWidth` UInt16,
    `WindowClientHeight` UInt16,
    `SilverlightVersion2` UInt8,
    `SilverlightVersion4` UInt16,
    `FlashVersion3` UInt16,
    `FlashVersion4` UInt16,
    `ClientTimeZone` Int16,
    `OS` UInt8,
    `UserAgent` UInt8,
    `ResolutionDepth` UInt8,
    `FlashMajor` UInt8,
    `FlashMinor` UInt8,
    `NetMajor` UInt8,
    `NetMinor` UInt8,
    `MobilePhone` UInt8,
    `SilverlightVersion1` UInt8,
    `Age` UInt8,
    `Sex` UInt8,
    `Income` UInt8,
    `JavaEnable` UInt8,
    `CookieEnable` UInt8,
    `JavascriptEnable` UInt8,
    `IsMobile` UInt8,
    `BrowserLanguage` UInt16,
    `BrowserCountry` UInt16,
    `Interests` UInt16,
    `Robotness` UInt8,
    `GeneralInterests` Array(UInt16),
    `Params` Array(String),
    `Goals` Nested(
        ID UInt32,
        Serial UInt32,
        EventTime DateTime,
        Price Int64,
        OrderID String,
        CurrencyID UInt32),
    `WatchIDs` Array(UInt64),
    `ParamSumPrice` Int64,
    `ParamCurrency` FixedString(3),
    `ParamCurrencyID` UInt16,
    `ClickLogID` UInt64,
    `ClickEventID` Int32,
    `ClickGoodEvent` Int32,
    `ClickEventTime` DateTime,
    `ClickPriorityID` Int32,
    `ClickPhraseID` Int32,
    `ClickPageID` Int32,
    `ClickPlaceID` Int32,
    `ClickTypeID` Int32,
    `ClickResourceID` Int32,
    `ClickCost` UInt32,
    `ClickClientIP` UInt32,
    `ClickDomainID` UInt32,
    `ClickURL` String,
    `ClickAttempt` UInt8,
    `ClickOrderID` UInt32,
    `ClickBannerID` UInt32,
    `ClickMarketCategoryID` UInt32,
    `ClickMarketPP` UInt32,
    `ClickMarketCategoryName` String,
    `ClickMarketPPName` String,
    `ClickAWAPSCampaignName` String,
    `ClickPageName` String,
    `ClickTargetType` UInt16,
    `ClickTargetPhraseID` UInt64,
    `ClickContextType` UInt8,
    `ClickSelectType` Int8,
    `ClickOptions` String,
    `ClickGroupBannerID` Int32,
    `OpenstatServiceName` String,
    `OpenstatCampaignID` String,
    `OpenstatAdID` String,
    `OpenstatSourceID` String,
    `UTMSource` String,
    `UTMMedium` String,
    `UTMCampaign` String,
    `UTMContent` String,
    `UTMTerm` String,
    `FromTag` String,
    `HasGCLID` UInt8,
    `FirstVisit` DateTime,
    `PredLastVisit` Date,
    `LastVisit` Date,
    `TotalVisits` UInt32,
    `TraficSource` Nested(
        ID Int8,
        SearchEngineID UInt16,
        AdvEngineID UInt8,
        PlaceID UInt16,
        SocialSourceNetworkID UInt8,
        Domain String,
        SearchPhrase String,
        SocialSourcePage String),
    `Attendance` FixedString(16),
    `CLID` UInt32,
    `YCLID` UInt64,
    `NormalizedRefererHash` UInt64,
    `SearchPhraseHash` UInt64,
    `RefererDomainHash` UInt64,
    `NormalizedStartURLHash` UInt64,
    `StartURLDomainHash` UInt64,
    `NormalizedEndURLHash` UInt64,
    `TopLevelDomain` UInt64,
    `URLScheme` UInt64,
    `OpenstatServiceNameHash` UInt64,
    `OpenstatCampaignIDHash` UInt64,
    `OpenstatAdIDHash` UInt64,
    `OpenstatSourceIDHash` UInt64,
    `UTMSourceHash` UInt64,
    `UTMMediumHash` UInt64,
    `UTMCampaignHash` UInt64,
    `UTMContentHash` UInt64,
    `UTMTermHash` UInt64,
    `FromHash` UInt64,
    `WebVisorEnabled` UInt8,
    `WebVisorActivity` UInt32,
    `ParsedParams` Nested(
        Key1 String,
        Key2 String,
        Key3 String,
        Key4 String,
        Key5 String,
        ValueDouble Float64),
    `Market` Nested(
        Type UInt8,
        GoalID UInt32,
        OrderID String,
        OrderPrice Int64,
        PP UInt32,
        DirectPlaceID UInt32,
        DirectOrderID UInt32,
        DirectBannerID UInt32,
        GoodID String,
        GoodName String,
        GoodQuantity Int32,
        GoodPrice Int64),
    `IslandID` FixedString(16)
)
ENGINE = CollapsingMergeTree(Sign)
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192


4. 数据导入

date
clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
date

clickhouse-client --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv

select * from hits_v1 limit 1; \G

5. 查询
SELECT
    StartURL AS URL,
    AVG(Duration) AS AvgDuration
FROM tutorial.visits_v1
WHERE StartDate BETWEEN '2014-03-23' AND '2014-03-30'
GROUP BY URL
ORDER BY AvgDuration DESC
LIMIT 10;


SELECT
    sum(Sign) AS visits,
    sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
    (100. * goal_visits) / visits AS goal_percent
FROM tutorial.visits_v1
WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru');



=========================================================================================
clickhouse-client
交互式模式:clickhouse-client
非交互式模式: 指定了--query参数或将数据发送至标准的stdin。

测试表创建:
create database if not exists test;
use test;
drop table test;
create table test(id UInt8, text String, created DateTime) ENGINE=TinyLog;


批模式数据导入:
echo -ne "1, 'some text', '2016-08-14 00:00:00'\n2, 'some more text', '2016-08-14 00:00:01'" | clickhouse-client --database=test --query="INSERT INTO test FORMAT CSV";

cat <<_EOF | clickhouse-client --database=test --query="INSERT INTO test FORMAT CSV";
3, 'some text', '2016-08-14 00:00:00'
4, 'some more text', '2016-08-14 00:00:01'
_EOF

数据导出
Clickhouse-client --query="select * from test.test" > file.tsv
Cat file.tsv

select * from test.test;\G

总结:
(1)、批模式下的默认数据格式为TabSeparated,可使用FORMAT指定格式。
(2)、使用--multiquery参数,可同时执行多个查询(除了INSERT),每个查询语句使用分号(;)分隔。
(3)、使用--multiline或-m参数,允许执行多行的查询。
(4)、在查询语句的分号之后指定\G或使用\G替换分号, 数据以垂直的格式展示。
(5)、历史的执行语句保存在操作系统文件:~/.clickhouse-client-history。
(6)、退出客户端的方式,按Ctrl + D或Ctrl + C,或执行命令:exit、quit、logout、q、Q、:q。


查询中指定参数:
clickhouse-client --param_myid=3 --database=test --query="select * from test where id>{myid:UInt8}"
clickhouse-client --database=test --query="select * from test where id>3"
查询语句指定参数,参数值通过客户端参数传递。
使用大括号占位要传递的参数值,格式为: {<name>:<data type>}
name:占位标识符。通过clickhouse-client参数指定,格式为--param_<name> = value 。
data type:指定参数值的数据类型,例如:UInt8、String等。

clickhouse-client --param_parName="[1, 2]"  -q "SELECT * FROM table WHERE a = {parName:Array(UInt16)}"

clickhouse-client --query="select * from test.test FORMAT TabSeparated" > file.tsv

clickhouse-client查找配置文件的顺序:
(1)、通过--config-file指定的配置文件。
(2)、./clickhouse-client.xml
(3)、~/.clickhouse-client/config.xml
(4)、/etc/clickhouse-client/config.xml

 {host}
 {port}
 {user}
 {database}
 {display_name}
 
 
 =========================================================================================================================
 MySQL接口操作ClickHouse
 1. 安装MySQL的客户端
 tar -xzf MySQL-client-5.6.38-1.el6.x86_64.tar.gz
 rpm -ivh MySQL-client-5.6.38-1.el6.x86_64.rpm
 
 2. 配置MySQL协议
  vim /etc/clickhouse-server/config.xml
  <mysql_port>9004</mysql_port>

命令行连接到ClickHouse:
mysql --protocol tcp -u default -P 9004


为了兼容所有MySQL客户端,建议使用SHA1加密用户密码:
PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD"  \
| sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'

w+YuQ0p3
8879bd5e440c57cf4be5f9188a4b46779c96a322


配置文件:
vim /etc/clickhouse-server/users.xml
<password_double_sha1_hex>8879bd5e440c57cf4be5f9188a4b46779c96a322</password_double_sha1_hex>

执行命令:
mysql --protocol tcp -u default -P 9004 --password=w+YuQ0p3



---------------------------
DBeaver可视化工具安装与使用


配置参数:
timeout
socket_timeout


外部聚合,如果用于聚合内存不足,10G
max_bytes_before_external_group_by=5G


1. 安装DBeaver
2. 下载DBeaver的JDBC驱动
3. 连接ClickHouse, 注意,配置CH的网络
/etc/clickhouse-server/config.xml
<listen_host>0.0.0.0</listen_host>
4. 设置参数
在URL模板中,设置ClickHouse的参数
jdbc:clickhouse://{host}:{port}[/{database}]?socket_timeout=600000&max_bytes_before_external_group_by=20000000

问题:clickhouse的批量数据导入。

---2-1-2-2-2-3-2-4---

TabSeparated模式:

数据格式,注意那个箭头:

语句:

导入数据,如何操作:

总结:

TabSeparated、TabSeparatedRaw、TabSeparatedWithNames和TabSeparatedWithNamesAndTypes

1. TabSeparated
数据按行写入,tab制表符分隔。使用严格Unix命令行。
注意:最后一行必须包含换行符。
默认格式。
简写:TSV
数据插入和数据查询时,均可使用。


CREATE TABLE tsv_demo(srcip String, destip String, time String) ENGINE = TinyLog;




clickhouse-client --query "INSERT INTO tsv_demo FORMAT TabSeparated/TSV" --max_insert_block_size=100000 < tsv_demo.tsv 

Code: 117. DB::Exception: 
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r.

这是由于windows使用\r\n表示回车换行, 在linux中使用\n表示换行。 因此,需要将\r给去掉。


转换为UNIX格式:
dos2unix tsv_demo.tsv

输出格式:
select * from tsv_demo FORMAT TSV;

2. TabSeparatedRaw

简称:TSVRaw
TabSeparatedRaw格式不会对行数据进行转义, 即不会将换行、制表符等转换为转义字符。
只能在数据查询的时候使用。


zhangsan	nanjing\tjiangsu	23	From nanjing
lisi	hangzhou\tzhejiang	32	\x41 amazing place
xiaoming	hefei\tanhui	25	notepad

建表语句:
create table escape_demo(name String, addr String, age UInt8, desc String) ENGINE=TinyLog;


select * from escape_demo FORMAT TSV;

zhangsan	nanjing\tjiangsu	23	From nanjing
lisi	hangzhou\tzhejiang	32	A amazing place
xiaoming	hefei\tanhui	25	notepad

3 rows in set. Elapsed: 0.005 sec.



SELECT *
FROM escape_demo
FORMAT TabSeparatedRaw

zhangsan	nanjing	jiangsu	23	From nanjing
lisi	hangzhou	zhejiang	32	A amazing place
xiaoming	hefei	anhui	25	notepad

3 rows in set. Elapsed: 0.005 sec

3. TabSeparatedWithNames
 在查询时,TabSeparatedWithNames格式的第一行显示列的名称
 在数据导入时,第一行完全被忽略,不会解析第一行为表头。
 简称:TSVWithNames
 在数据查询和数据导入均可使用。
 
 数据导入:
 clickhouse-client --query="INSERT INTO escape_demo FORMAT TSVWithNames" < escape_demo.tsv

数据查询:
select * from escape_demo FORMAT TSVWithNames;

4. TabSeparatedWithNamesAndTypes
在查询时,TabSeparatedWithNamesAndTypes格式在主数据的前面额外显示两行数据,第一行显示列的名称, 第二行显示列的数据类型。
在数据导入时,前面两行的数据完全被忽略。
简写:TSVWithNamesAndTypes
在数据导入和查询时均可使用。

---3-1---

暂时不看了

---3-2-3-7---

  • 0
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值