ClickHouse Tutorial, Chapter 1: Installing Single-Node ClickHouse

./clickhouse-client-22.2.2.1/install/doinst.sh


### 2.3 Starting the server



# Show the available commands
clickhouse --help

# Start the server
clickhouse start


Startup output:



chown --recursive clickhouse '/var/run/clickhouse-server/'
Will run su -s /bin/sh 'clickhouse' -c '/usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon'
Waiting for server to start
Waiting for server to start
Server started


Connect to `clickhouse`:



clickhouse-client


Connection output:



ClickHouse client version 22.2.2.1.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.2.2 revision 54455.
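
The `Waiting for server to start` lines in the startup output reflect a poll-until-ready loop. When scripting an installation, the same idea can be sketched in plain shell. The helper below is a hypothetical illustration, not part of ClickHouse: `wait_until` and its arguments are names chosen here, and the usage line in the comment assumes a local server on the default native port with `clickhouse-client` on the `PATH`.

```shell
# wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it succeeds or MAX_TRIES attempts have failed.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
    max_tries=$1
    delay=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Discard the probe's output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1
}

# Example (assumption: server running locally with default settings):
#   wait_until 10 1 clickhouse-client --query "SELECT 1" && echo "server is up"
```

Passing the probe as a command keeps the loop independent of how readiness is checked, so the same helper works with any other check that signals success through its exit status.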


### 2.4 ClickHouse directories



Executables directory

cd /usr/bin
ll | grep clickhouse

Configuration directory

cd /etc/clickhouse-server/

Log directory

cd /var/log/clickhouse-server/

Data directory

cd /var/lib/clickhouse/
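
To confirm in one step that all four directories exist after installation, a small check can be sketched as follows. `check_clickhouse_dirs` and its optional `ROOT` prefix are hypothetical names introduced here for illustration (the prefix only makes the function easy to try against a scratch directory); the paths are the ones listed in this section.

```shell
# check_clickhouse_dirs [ROOT]
# Prints "ok <dir>" or "missing <dir>" for each directory listed in section 2.4.
# ROOT is an optional path prefix (default: empty, meaning the real filesystem).
# Returns 0 only if every directory exists.
check_clickhouse_dirs() {
    root=${1:-}
    status=0
    for dir in /usr/bin /etc/clickhouse-server /var/log/clickhouse-server /var/lib/clickhouse; do
        if [ -d "$root$dir" ]; then
            echo "ok $dir"
        else
            echo "missing $dir"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_clickhouse_dirs || echo "installation looks incomplete"
```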


### 2.5 Allowing remote access


By default, `clickhouse` does not accept remote connections. To allow them, edit the configuration file:



cd /etc/clickhouse-server/

vim config.xml
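
The change needed in `config.xml` is small: the `<listen_host>::</listen_host>` entry has to be active rather than commented out. As a non-interactive alternative to editing in `vim`, the sketch below checks for and applies that change. Both function names are hypothetical, and the approach is a line-based heuristic that assumes the entry is commented out on a single line as `<!-- <listen_host>::</listen_host> -->`; it does not parse XML and will not detect an entry hidden inside a multi-line comment.

```shell
# has_wildcard_listen FILE
# Succeeds if FILE has a line that starts (after optional whitespace) with the
# wildcard listen_host element, i.e. the element is not commented on that line.
has_wildcard_listen() {
    grep -Eq '^[[:space:]]*<listen_host>::</listen_host>' "$1"
}

# uncomment_wildcard_listen FILE
# Prints FILE to stdout with a single-line commented wildcard entry uncommented.
# The input file itself is left unchanged.
uncomment_wildcard_listen() {
    sed -E 's|<!--[[:space:]]*(<listen_host>::</listen_host>)[[:space:]]*-->|\1|' "$1"
}

# Example (assumption: run as a user allowed to write the config directory):
#   cd /etc/clickhouse-server/
#   cp config.xml config.xml.bak
#   uncomment_wildcard_listen config.xml.bak > config.xml
#   has_wildcard_listen config.xml && echo "wildcard listen_host is active"
```

Writing the result to stdout instead of editing in place keeps the original file as a backup. The server has to be restarted before the new setting takes effect.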


Note: you only need to uncomment the single `<listen_host>::</listen_host>` entry.  
 The configuration after uncommenting:



<?xml version="1.0"?>
<logger>
    <!-- Possible levels:

      - none (turns off logging)
      - fatal
      - critical
      - error
      - warning
      - notice
      - information
      - debug
      - trace
      - test (not for production usage)
    -->
    <level>trace</level>
    <log>/var/log/clickhouse-server/clickhouse-server.log</log>
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>

    <size>1000M</size>
    <count>10</count>

    <!-- Per level overrides (legacy):

    For example to suppress logging of the ConfigReloader you can use:
    NOTE: levels.logger is reserved, see below.
    -->

    <!-- Per level overrides:

    For example to suppress logging of the RBAC for default user you can use:
    (But please note that the logger name maybe changed from version to version, even after minor upgrade)
    -->
</logger>

<!-- Add headers to response in options request. OPTIONS method is used in CORS preflight requests. -->
<!-- It is off by default. Next headers are obligate for CORS.-->
<!-- http_options_response>
    <header>
        <name>Access-Control-Allow-Origin</name>
        <value>*</value>
    </header>
    <header>
        <name>Access-Control-Allow-Headers</name>
        <value>origin, x-requested-with</value>
    </header>
    <header>
        <name>Access-Control-Allow-Methods</name>
        <value>POST, GET, OPTIONS</value>
    </header>
    <header>
        <name>Access-Control-Max-Age</name>
        <value>86400</value>
    </header>
</http_options_response -->

<!-- It is the name that will be shown in the clickhouse-client.

By default, anything with "production" will be highlighted in red in query prompt.
-->

<!-- Port for HTTP API. See also 'https_port' for secure connections.

This interface is also used by ODBC and JDBC drivers (DataGrip, Dbeaver, ...)
and by most of web interfaces (embedded UI, Grafana, Redash, ...).
-->
<http_port>8123</http_port>

<!-- Port for interaction by native protocol with:

  - clickhouse-client and other native ClickHouse tools (clickhouse-benchmark, clickhouse-copier);
  - clickhouse-server with other clickhouse-servers for distributed query processing;
  - ClickHouse drivers and applications supporting native protocol
    (this protocol is also informally called as "the TCP protocol");
    See also 'tcp_port_secure' for secure connections.
-->
<tcp_port>9000</tcp_port>

<!-- Compatibility with MySQL protocol.

ClickHouse will pretend to be MySQL for applications connecting to this port.
-->
<mysql_port>9004</mysql_port>

<!-- Compatibility with PostgreSQL protocol.

ClickHouse will pretend to be PostgreSQL for applications connecting to this port.
-->
<postgresql_port>9005</postgresql_port>

<!-- HTTP API with TLS (HTTPS).

You have to configure certificate to enable this interface.
See the openSSL section below.
-->

<!-- Native interface with TLS.

You have to configure certificate to enable this interface.
See the openSSL section below.
-->

<!-- Native interface wrapped with PROXYv1 protocol

PROXYv1 header sent for every connection.
ClickHouse will extract information about proxy-forwarded client address from the header.
-->

<!-- Port for communication between replicas. Used for data exchange.

It provides low-level data access between servers.
This port should not be accessible from untrusted networks.
See also 'interserver_http_credentials'.
Data transferred over connections to this port should not go through untrusted networks.
See also 'interserver_https_port'.
-->
<interserver_http_port>9009</interserver_http_port>

<!-- Port for communication between replicas with TLS.

You have to configure certificate to enable this interface.
See the openSSL section below.
See also 'interserver_http_credentials'.
-->

<!-- Hostname that is used by other replicas to request this server.

If not specified, then it is determined analogous to 'hostname -f' command.
This setting could be used to switch replication to another network interface
(the server may be connected to multiple networks via multiple addresses)
-->

<!--
<interserver_http_host>example.yandex.ru</interserver_http_host>
-->

<!-- You can specify credentials for authentication between replicas.

This is required when interserver_https_port is accessible from untrusted networks,
and also recommended to avoid SSRF attacks from possibly compromised services in your network.
-->

<!-- Listen specified address.

Use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere.
Notes:
If you open connections from wildcard address, make sure that at least one of the following measures applied:

  - server is protected by firewall and not accessible from untrusted networks;
  - all users are restricted to subset of network addresses (see users.xml);
  - all users have strong passwords, only secure (TLS) interfaces are accessible, or connections are only made via TLS interfaces.
  - users without password have readonly access.

See also: https://www.shodan.io/search?query=clickhouse
-->
<listen_host>::</listen_host>

<!-- Default values - try listen localhost on IPv4 and IPv6. -->
<!--
<listen_host>::1</listen_host>
<listen_host>127.0.0.1</listen_host>
-->

<!-- Don't exit if IPv6 or IPv4 networks are unavailable while trying to listen. -->
<!-- <listen_try>0</listen_try> -->

<!-- Allow multiple servers to listen on the same address:port. This is not recommended.
-->

<!-- <listen_backlog>4096</listen_backlog> -->

<max_connections>4096</max_connections>

<!-- For 'Connection: keep-alive' in HTTP 1.1 -->
<keep_alive_timeout>3</keep_alive_timeout>

<!-- gRPC protocol (see src/Server/grpc_protos/clickhouse_grpc.proto for the API) -->
<!-- <grpc_port>9100</grpc_port> -->
<grpc>
    <enable_ssl>false</enable_ssl>

    <!-- The following two files are used only if enable_ssl=1 -->
    <ssl_cert_file>/path/to/ssl_cert_file</ssl_cert_file>
    <ssl_key_file>/path/to/ssl_key_file</ssl_key_file>

    <!-- Whether server will request client for a certificate -->
    <ssl_require_client_auth>false</ssl_require_client_auth>

    <!-- The following file is used only if ssl_require_client_auth=1 -->
    <ssl_ca_cert_file>/path/to/ssl_ca_cert_file</ssl_ca_cert_file>

    <!-- Default transport compression type (can be overridden by client, see the transport_compression_type field in QueryInfo).
         Supported algorithms: none, deflate, gzip, stream_gzip -->
    <transport_compression_type>none</transport_compression_type>

    <!-- Default transport compression level. Supported levels: 0..3 -->
    <transport_compression_level>0</transport_compression_level>

    <!-- Send/receive message size limits in bytes. -1 means unlimited -->
    <max_send_message_size>-1</max_send_message_size>
    <max_receive_message_size>-1</max_receive_message_size>

    <!-- Enable if you want very detailed logs -->
    <verbose_logs>false</verbose_logs>
</grpc>

<!-- Used with https_port and tcp_port_secure. Full ssl options list: https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h#L71 -->
<openSSL>
    <server> <!-- Used for https server AND secure tcp port -->
        <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
        <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
        <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
        <!-- dhparams are optional. You can delete the <dhParamsFile> element.

        To generate dhparams, use the following command:
        openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096
        Only file format with BEGIN DH PARAMETERS is supported.
        -->
        <verificationMode>none</verificationMode>
        <loadDefaultCAFile>true</loadDefaultCAFile>
        <cacheSessions>true</cacheSessions>
        <disableProtocols>sslv2,sslv3</disableProtocols>
        <preferServerCiphers>true</preferServerCiphers>
    </server>

    <client> <!-- Used for connecting to https dictionary source and secured Zookeeper communication -->
        <loadDefaultCAFile>true</loadDefaultCAFile>
        <cacheSessions>true</cacheSessions>
        <disableProtocols>sslv2,sslv3</disableProtocols>
        <preferServerCiphers>true</preferServerCiphers>
        <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
        <invalidCertificateHandler>
            <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
            <name>RejectCertificateHandler</name>
        </invalidCertificateHandler>
    </client>
</openSSL>

<!-- Default root page on http[s] server. For example load UI from https://tabix.io/ when opening http://localhost:8123 -->
<!--
<http_server_default_response><![CDATA[

]]></http_server_default_response>
-->

<!-- Maximum number of concurrent queries. -->
<max_concurrent_queries>100</max_concurrent_queries>

<!-- Maximum memory usage (resident set size) for server process.

Zero value or unset means default. Default is "max_server_memory_usage_to_ram_ratio" of available physical RAM.
If the value is larger than "max_server_memory_usage_to_ram_ratio" of available physical RAM, it will be cut down.

The constraint is checked on query execution time.
If a query tries to allocate memory and the current memory usage plus allocation is greater
than specified threshold, exception will be thrown.

It is not practical to set this constraint to small values like just a few gigabytes,
because memory allocator will keep this amount of memory in caches and the server will deny service of queries.
-->
<max_server_memory_usage>0</max_server_memory_usage>

<!-- Maximum number of threads in the Global thread pool.

This will default to a maximum of 10000 threads if not specified.
This setting will be useful in scenarios where there are a large number
of distributed queries that are running concurrently but are idling most
of the time, in which case a higher number of threads might be required.
-->
<max_thread_pool_size>10000</max_thread_pool_size>

<!-- Number of workers to recycle connections in background (see also drain_timeout).

If the pool is full, connection will be drained synchronously. -->

<!-- On memory constrained environments you may have to set this to value larger than 1.
-->
<max_server_memory_usage_to_ram_ratio>0.9</max_server_memory_usage_to_ram_ratio>

<!-- Simple server-wide memory profiler. Collect a stack trace at every peak allocation step (in bytes).

Data will be stored in system.trace_log table with query_id = empty string.
Zero means disabled.
-->
<total_memory_profiler_step>4194304</total_memory_profiler_step>

<!-- Collect random allocations and deallocations and write them into system.trace_log with 'MemorySample' trace_type.

The probability is for every alloc/free regardless to the size of the allocation.
Note that sampling happens only when the amount of untracked memory exceeds the untracked memory limit,
which is 4 MiB by default but can be lowered if 'total_memory_profiler_step' is lowered.
You may want to set 'total_memory_profiler_step' to 1 for extra fine grained sampling.
-->
<total_memory_tracker_sample_probability>0</total_memory_tracker_sample_probability>

<!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve
     correct maximum value. -->

<!-- Size of cache of uncompressed blocks of data, used in tables of MergeTree family.

In bytes. Cache is single for server. Memory is allocated only on demand.
Cache is used when 'use_uncompressed_cache' user setting turned on (off by default).
Uncompressed cache is advantageous only for very short queries and in rare cases.

Note: uncompressed cache can be pointless for lz4, because memory bandwidth
is slower than multi-core decompression on some server configurations.
Enabling it can sometimes paradoxically make queries slower.
-->
<uncompressed_cache_size>8589934592</uncompressed_cache_size>

<!-- Approximate size of mark cache, used in tables of MergeTree family.

In bytes. Cache is single for server. Memory is allocated only on demand.
You should not lower this value.
-->
<mark_cache_size>5368709120</mark_cache_size>

<!-- If you enable the `min_bytes_to_use_mmap_io` setting,

the data in MergeTree tables can be read with mmap to avoid copying from kernel to userspace.
It makes sense only for large files and helps only if data reside in page cache.
To avoid frequent open/mmap/munmap/close calls (which are very expensive due to consequent page faults)
and to reuse mappings from several threads and queries,
the cache of mapped files is maintained. Its size is the number of mapped regions (usually equal to the number of mapped files).
The amount of data in mapped files can be monitored
in system.metrics, system.metric_log by the MMappedFiles, MMappedFileBytes metrics
and in system.asynchronous_metrics, system.asynchronous_metrics_log by the MMapCacheCells metric,
and also in system.events, system.processes, system.query_log, system.query_thread_log, system.query_views_log by the
CreatedReadBufferMMap, CreatedReadBufferMMapFailed, MMappedFileCacheHits, MMappedFileCacheMisses events.
Note that the amount of data in mapped files does not consume memory directly and is not accounted
in query or server memory usage - because this memory can be discarded similar to OS page cache.
The cache is dropped (the files are closed) automatically on removal of old parts in MergeTree,
also it can be dropped manually by the SYSTEM DROP MMAP CACHE query.
-->
<mmap_cache_size>1000</mmap_cache_size>

<!-- Cache size in bytes for compiled expressions.-->
<compiled_expression_cache_size>134217728</compiled_expression_cache_size>

<!-- Cache size in elements for compiled expressions.-->
<compiled_expression_cache_elements_size>10000</compiled_expression_cache_elements_size>

<!-- Path to data directory, with trailing slash. -->
<path>/var/lib/clickhouse/</path>

<!-- Path to temporary data for processing hard queries. -->
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>

<!-- Policy from the <storage_configuration> for the temporary files.

If not set <tmp_path> is used, otherwise <tmp_path> is ignored.

Notes:

  - move_factor is ignored
  - keep_free_space_bytes is ignored
  - max_data_part_size_bytes is ignored
  - you must have exactly one volume in that policy
-->

<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>

<ldap_servers>

</ldap_servers>

<!-- To enable Kerberos authentication support for HTTP requests (GSS-SPNEGO), for those users who are explicitly configured
to authenticate via Kerberos, define a single 'kerberos' section here.
Parameters:
    principal - canonical service principal name, that will be acquired and used when accepting security contexts.
        This parameter is optional, if omitted, the default principal will be used.
        This parameter cannot be specified together with 'realm' parameter.
    realm - a realm, that will be used to restrict authentication to only those requests whose initiator's realm matches it.
        This parameter is optional, if omitted, no additional filtering by realm will be applied.
        This parameter cannot be specified together with 'principal' parameter.
Example:
    <kerberos />
Example:
    <kerberos>
        <principal>HTTP/clickhouse.example.com@EXAMPLE.COM</principal>
    </kerberos>
Example:
    <kerberos>
        <realm>EXAMPLE.COM</realm>
    </kerberos>
-->

<!-- Sources to read users, roles, access rights, profiles of settings, quotas. -->
<user_directories>
    <users_xml>
        <!-- Path to configuration file with predefined users. -->
        <path>users.xml</path>
    </users_xml>
    <local_directory>
        <!-- Path to folder where users created by SQL commands are stored. -->
        <path>/var/lib/clickhouse/access/</path>
    </local_directory>

    <!-- To add an LDAP server as a remote user directory of users that are not defined locally, define a single 'ldap' section
    with the following parameters:
        server - one of LDAP server names defined in 'ldap_servers' config section above.
            This parameter is mandatory and cannot be empty.
        roles - section with a list of locally defined roles that will be assigned to each user retrieved from the LDAP server.
            If no roles are specified here or assigned during role mapping (below), user will not be able to perform any
            actions after authentication.
        role_mapping - section with LDAP search parameters and mapping rules.
            When a user authenticates, while still bound to LDAP, an LDAP search is performed using search_filter and the
            name of the logged in user. For each entry found during that search, the value of the specified attribute is
            extracted. For each attribute value that has the specified prefix, the prefix is removed, and the rest of the
            value becomes the name of a local role defined in ClickHouse, which is expected to be created beforehand by
            CREATE ROLE command.
            There can be multiple 'role_mapping' sections defined inside the same 'ldap' section. All of them will be
            applied.
            base_dn - template used to construct the base DN for the LDAP search.
                The resulting DN will be constructed by replacing all '{user_name}', '{bind_dn}', and '{user_dn}'
                substrings of the template with the actual user name, bind DN, and user DN during each LDAP search.
            scope - scope of the LDAP search.
                Accepted values are: 'base', 'one_level', 'children', 'subtree' (the default).
            search_filter - template used to construct the search filter for the LDAP search.
                The resulting filter will be constructed by replacing all '{user_name}', '{bind_dn}', '{user_dn}', and
                '{base_dn}' substrings of the template with the actual user name, bind DN, user DN, and base DN during
                each LDAP search.
                Note, that the special characters must be escaped properly in XML.
            attribute - attribute name whose values will be returned by the LDAP search. 'cn', by default.
            prefix - prefix, that will be expected to be in front of each string in the original list of strings returned by
                the LDAP search. Prefix will be removed from the original strings and resulting strings will be treated
                as local role names. Empty, by default.
    Example:
        <ldap>
            <server>my_ldap_server</server>
            <roles>
                <my_local_role1 />
                <my_local_role2 />
            </roles>
            <role_mapping>
                <base_dn>ou=groups,dc=example,dc=com</base_dn>
                <scope>subtree</scope>
                <search_filter>(&amp;(objectClass=groupOfNames)(member={bind_dn}))</search_filter>
                <attribute>cn</attribute>
                <prefix>clickhouse_</prefix>
            </role_mapping>
        </ldap>
    Example (typical Active Directory with role mapping that relies on the detected user DN):
        <ldap>
            <server>my_ad_server</server>
            <role_mapping>
                <base_dn>CN=Users,DC=example,DC=com</base_dn>
                <attribute>CN</attribute>
                <scope>subtree</scope>
                <search_filter>(&amp;(objectClass=group)(member={user_dn}))</search_filter>
                <prefix>clickhouse_</prefix>
            </role_mapping>
        </ldap>
    -->
</user_directories>

<!-- Default profile of settings. -->
<default_profile>default</default_profile>

<!-- Comma-separated list of prefixes for user-defined settings. -->
<custom_settings_prefixes></custom_settings_prefixes>

<!-- System profile of settings. This settings are used by internal processes (Distributed DDL worker and so on). -->
<!-- <system_profile>default</system_profile> -->

<!-- Buffer profile of settings.

This settings are used by Buffer storage to flush data to the underlying table.
Default: used from system_profile directive.
-->

<!-- Default database. -->
<default_database>default</default_database>

<!-- Server time zone could be set here.

Time zone is used when converting between String and DateTime types,
when printing DateTime in text formats and parsing DateTime from text,
it is used in date and time related functions, if specific time zone was not passed as an argument.

Time zone is specified as identifier from IANA time zone database, like UTC or Africa/Abidjan.
If not specified, system time zone at server startup is used.

Please note, that server could display time zone alias instead of specified name.
Example: W-SU is an alias for Europe/Moscow and Zulu is an alias for UTC.
-->

<!-- You can specify umask here (see "man umask"). Server will apply it on startup.

Number is always parsed as octal. Default umask is 027 (other users cannot read logs, data files, etc; group can only read).
-->

<!-- Perform mlockall after startup to lower first queries latency

and to prevent clickhouse executable from being paged out under high IO load.
Enabling this option is recommended but will lead to increased startup time for up to a few seconds.
-->
<mlock_executable>true</mlock_executable>

<!-- Reallocate memory for machine code ("text") using huge pages. Highly experimental. -->
<remap_executable>false</remap_executable>

<![CDATA[

Uncomment below in order to use JDBC table engine and function.

To install and run JDBC bridge in background:
* [Debian/Ubuntu]
export MVN_URL=https://repo1.maven.org/maven2/ru/yandex/clickhouse/clickhouse-jdbc-bridge
export PKG_VER=$(curl -sL … | sed -e 's|.*>\(.*\)<.*|\1…
…$PKG_VER/clickhouse-jdbc-bridge_$PKG_VER-1_all.deb
apt install --no-install-recommends -f ./clickhouse-jdbc-bridge_$PKG_VER-1_all.deb
clickhouse-jdbc-bridge &

* [CentOS/RHEL]
export MVN_URL=https://repo1.maven.org/maven2/ru/yandex/clickhouse/clickhouse-jdbc-bridge
export PKG_VER=$(curl -sL … | sed -e 's|.*>\(.*\)<.*|\1…
…$PKG_VER/clickhouse-jdbc-bridge-$PKG_VER-1.noarch.rpm
yum localinstall -y clickhouse-jdbc-bridge-$PKG_VER-1.noarch.rpm
clickhouse-jdbc-bridge &

Please refer to https://github.com/ClickHouse/clickhouse-jdbc-bridge#usage for more information.
]]>

<!-- Configuration of clusters that could be used in Distributed tables.

https://clickhouse.com/docs/en/operations/table_engines/distributed/
-->
<remote_servers>
    <test_shard_localhost>
        <!-- Inter-server per-cluster secret for Distributed queries
        default: no secret (no authentication will be performed)

        If set, then Distributed queries will be validated on shards, so at least:

          - such cluster should exist on the shard,
          - such cluster should have the same secret.

        And also (and which is more important), the initial_user will
        be used as current user for the query.

        Right now the protocol is pretty simple and it only takes into account:

          - cluster name
          - query

        Also it will be nice if the following will be implemented:

          - source hostname (see interserver_http_host), but then it will depends from DNS,
            it can use IP address instead, but then the you need to get correct on the initiator node.
          - target hostname / ip address (same notes as for source hostname)
          - time-based security tokens
        -->

        <shard>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <!-- <internal_replication>false</internal_replication> -->
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <!-- <weight>1</weight> -->
            <replica>
                <host>localhost</host>
                <port>9000</port>
                <!-- Optional. Priority of the replica for load_balancing. Default: 1 (less value has more priority). -->
                <!-- <priority>1</priority> -->
            </replica>
        </shard>
    </test_shard_localhost>
    <test_cluster_one_shard_three_replicas_localhost>
        <shard>
            <internal_replication>false</internal_replication>
            <replica>
                <host>127.0.0.1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>127.0.0.2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>127.0.0.3</host>
                <port>9000</port>
            </replica>
        </shard>
        <!--shard>
            <internal_replication>false</internal_replication>
            <replica>
                <host>127.0.0.1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>127.0.0.2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>127.0.0.3</host>
                <port>9000</port>
            </replica>
        </shard-->
    </test_cluster_one_shard_three_replicas_localhost>
    <test_cluster_two_shards_localhost>
        <shard>
            <replica>
                <host>localhost</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>localhost</host>
                <port>9000</port>
            </replica>
        </shard>
    </test_cluster_two_shards_localhost>
    <test_cluster_two_shards>
        <shard>
            <replica>
                <host>127.0.0.1</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>127.0.0.2</host>
                <port>9000</port>
            </replica>
        </shard>
    </test_cluster_two_shards>
    <test_cluster_two_shards_internal_replication>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>127.0.0.1</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>127.0.0.2</host>
                <port>9000</port>
            </replica>
        </shard>
    </test_cluster_two_shards_internal_replication>
    <test_shard_localhost_secure>
        <shard>
            <replica>
                <host>localhost</host>
                <port>9440</port>
                <secure>1</secure>
            </replica>
        </shard>
    </test_shard_localhost_secure>
    <test_unavailable_shard>
        <shard>
            <replica>
                <host>localhost</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>localhost</host>
                <port>1</port>
            </replica>
        </shard>
    </test_unavailable_shard>
</remote_servers>

<!-- The list of hosts allowed to use in URL-related storage engines and table functions.

If this section is not present in configuration, all hosts are allowed.
-->

<!-- Regular expression can be specified. RE2 engine is used for regexps.

Regexps are not aligned: don't forget to add ^ and $. Also don't forget to escape dot (.) metacharacter
(forgetting to do so is a common source of error).
-->

<!-- If element has 'incl' attribute, then for its value will be used corresponding substitution from another file.

By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
Values for substitutions are specified in /clickhouse/name_of_substitution elements in that file.
-->

<!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.

Optional. If you don't use replicated tables, you could omit that.

See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/
-->

<!--
<zookeeper>
    <node>
        <host>example1</host>
        <port>2181</port>
    </node>
    <node>
        <host>example2</host>
        <port>2181</port>
    </node>
    <node>
        <host>example3</host>
        <port>2181</port>
    </node>
</zookeeper>
-->

<!-- Substitutions for parameters of replicated tables.

Optional. If you don't use replicated tables, you could omit that.

See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/#creating-replicated-tables
-->

<!-- Reloading interval for embedded dictionaries, in seconds. Default: 3600. -->
<builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>

<!-- Maximum session timeout, in seconds. Default: 3600. -->
<max_session_timeout>3600</max_session_timeout>

<!-- Default session timeout, in seconds. Default: 60. -->
<default_session_timeout>60</default_session_timeout>

<!-- Sending data to Graphite for monitoring. Several sections can be defined. -->
<!--
    interval - send every X second
    root_path - prefix for keys
    hostname_in_path - append hostname to root_path (default = true)
    metrics - send data from table system.metrics
    events - send data from table system.events
    asynchronous_metrics - send data from table system.asynchronous_metrics
-->
<!--
<graphite>
    <host>localhost</host>
    <port>42000</port>
    <timeout>0.1</timeout>
    <interval>60</interval>
    <root_path>one_min</root_path>
    <hostname_in_path>true</hostname_in_path>

    <metrics>true</metrics>
    <events>true</events>
    <events_cumulative>false</events_cumulative>
    <asynchronous_metrics>true</asynchronous_metrics>
</graphite>
<graphite>
    <host>localhost</host>
    <port>42000</port>
    <timeout>0.1</timeout>
    <interval>1</interval>
    <root_path>one_sec</root_path>

    <metrics>true</metrics>
    <events>true</events>
    <events_cumulative>false</events_cumulative>
    <asynchronous_metrics>false</asynchronous_metrics>
</graphite>
-->

<!-- Serve endpoint for Prometheus monitoring. -->
<!--
    endpoint - metrics path (relative to root, starting with "/")
    port - port to setup server. If not defined or 0 then http_port used
    metrics - send data from table system.metrics
    events - send data from table system.events
    asynchronous_metrics - send data from table system.asynchronous_metrics
    status_info - send data from different component from CH, ex: Dictionaries status
-->
<!--
<prometheus>
    <endpoint>/metrics</endpoint>
    <port>9363</port>

    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>true</asynchronous_metrics>
    <status_info>true</status_info>
</prometheus>
-->

<!-- Query log. Used only for queries with setting log_queries = 1. -->
<query_log>
    <!-- What table to insert data. If table is not exist, it will be created.

    When query log structure is changed after system update,
    then old table will be renamed and new table will be created automatically.
    -->
    <database>system</database>
    <table>query_log</table>

    <partition_by>toYYYYMM(event_date)</partition_by>
    <!--
    Table TTL specification: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-table-ttl
    Example:
        event_date + INTERVAL 1 WEEK
        event_date + INTERVAL 7 DAY DELETE
        event_date + INTERVAL 2 WEEK TO DISK 'bbb'

    <ttl>event_date + INTERVAL 30 DAY DELETE</ttl>
    -->

    <!-- Instead of partition_by, you can provide full engine expression (starting with ENGINE = ) with parameters,

    Example: <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(event_date) ORDER BY (event_date, event_time) SETTINGS index_granularity = 1024</engine>
    -->

    <!-- Interval of flushing data. -->
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>

<!-- Trace log. Stores stack traces collected by query profilers.

See query_profiler_real_time_period_ns and query_profiler_cpu_time_period_ns settings. -->
<trace_log>
    <database>system</database>
    <table>trace_log</table>

    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</trace_log>

<!-- Query thread log. Has information about all threads participated in query execution.

Used only for queries with setting log_query_threads = 1. -->
<query_thread_log>
    <database>system</database>
    <table>query_thread_log</table>
    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_thread_log>

<!-- Query views log. Has information about all dependent views associated with a query.

Used only for queries with setting log_query_views = 1. -->
<query_views_log>
    <database>system</database>
    <table>query_views_log</table>
    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_views_log>

<!-- Uncomment if use part log.

Part log contains information about all actions with parts in MergeTree tables (creation, deletion, merges, downloads).-->
<part_log>
    <database>system</database>
    <table>part_log</table>
    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</part_log>

<!-- Uncomment to write text log into table.

Text log contains all information from usual server log but stores it in structured and efficient way.
The level of the messages that goes to the table can be limited (<level>), if not specified all messages will go to the table.
<text_log>
<database>system</database>
