StarRocks: From Getting Started to Proficiency

1 Deployment

1.1 Considerations

Design the cluster architecture rigorously around the business requirements. In general, pay attention to the following points:

1.1.1 FE count and high availability
  • The number of FE Followers must be odd, and deploying many is not advised; we typically recommend 1 or 3 Followers.
  • With three Followers the cluster achieves high availability (HA): if the Leader process dies or loses contact with the cluster, the other 2 Followers quickly elect a new Leader via the BDBJE protocol, keeping the cluster working (an abnormal FE Leader only affects writes; the cluster's external query service is unaffected).
  • More than half of the Follower nodes must be alive for a new FE Leader to be elected.
  • In most cases we recommend a 1 Follower + multiple Observers FE architecture, adding Observers to scale out high-concurrency read capacity.
  • As a rule of thumb, one FE node can serve 10-20 BE nodes; keep the total FE count below 10, and 3 is usually enough for the vast majority of workloads.
1.1.2 BE instance count
  • The BE count directly determines the cluster's overall query latency; in production, deploy at least 3 BEs.
  • To keep data safe, deploy at least three BE instances on different hosts.
1.1.3 Broker instance count
  • Brokers are usually co-located with the BEs, one per BE. Give all Brokers the same name, so that a Broker task can use multiple Broker instances in parallel.
  • If the business does not need to integrate with Hadoop-type systems, Brokers can be skipped entirely.
1.1.4 Whether to co-locate FE and BE
  • On a single machine, FEs of the same cluster cannot be co-located; BEs can be, but there is no value in doing so.
  • FE and BE can share a machine, but in production watch for CPU and memory contention, and keep the metadata directory and the data directory on different disks.

1.2 Cluster architecture design

If a rough estimate based on the actual business still does not yield a reasonably accurate cluster size, deploy a minimal cluster first and scale out step by step later.

1.2.1 Server configuration
Host                          Configuration
192.168.110.101 (node-101)    6 cores, 32 GB RAM, 10 GbE NIC, CPU with AVX2 support
192.168.110.102 (node-102)    6 cores, 32 GB RAM, 10 GbE NIC, CPU with AVX2 support
192.168.110.103 (node-103)    6 cores, 32 GB RAM, 10 GbE NIC, CPU with AVX2 support
192.168.110.104 (node-104)    6 cores, 32 GB RAM, 10 GbE NIC, CPU with AVX2 support
192.168.110.105 (node-105)    6 cores, 32 GB RAM, 10 GbE NIC, CPU with AVX2 support
1.2.2 Service layout
Node                          Services deployed
192.168.110.101 (node-101)    FE (Leader), MySQL client
192.168.110.102 (node-102)    FE (Observer)
192.168.110.103 (node-103)    BE, Broker
192.168.110.104 (node-104)    BE, Broker
192.168.110.105 (node-105)    BE, Broker
1.2.3 Node directory layout
FE deploy dir: /opt/module/starrocks/fe; log dir: /data/starrocks/log/fe; metadata dir: /data/starrocks/data/meta
BE deploy dir: /opt/module/starrocks/be; log dir: /data/starrocks/log/be; data storage dir: /data/starrocks/data/storage
Broker deploy dir: /opt/module/starrocks/apache_hdfs_broker

1.3 Using a regular user

In production, certain scenarios authenticate by OS user name. To stay close to real-world practice, the deployment steps below are all performed as a dedicated starrocks user (with a temporary password):

1.3.1 Create the user
# Setting the password as plain text (not usable for login: the user module expects a hashed value)
ansible cluster -m user -a "name=starrocks password='123456'"
# Set the starrocks user's login password properly, hashed with SHA-512
ansible cluster -m user -a "name=starrocks password={{'123456'|password_hash('sha512')}}"
# Check the starrocks user's password entry
grep starrocks /etc/shadow

Since the ansible cluster group targets every node, the user is created on all of them in one pass; on any node not managed by ansible, create the starrocks user by hand in the same way.

1.3.2 Create directories and grant ownership

As root, create the directories on each node and change their owners to the starrocks user:

ansible cluster -m shell -a "mkdir -p /opt/module/starrocks/"
ansible cluster -m shell -a "mkdir -p /data/starrocks/log/{fe,be}"
ansible cluster -m shell -a "mkdir -p /data/starrocks/data/{meta,storage}"
ansible cluster -m shell -a "chown -R starrocks:starrocks /opt/module/starrocks"
ansible cluster -m shell -a "chown -R starrocks:starrocks /data/starrocks"

1.3.3 Passwordless SSH

Configure passwordless SSH among the cluster nodes for the starrocks user. There are several ways to set this up; a typical one:

1. Generate a key pair
 su starrocks
 ssh-keygen -t rsa
2. Distribute the public key

Copy the public key to every other node in the cluster:

ssh-copy-id 192.168.110.101
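
To cover every node in one pass, a small loop also works (a sketch; it assumes ssh-copy-id can authenticate with the starrocks password on each host):

for host in 192.168.110.101 192.168.110.102 192.168.110.103 192.168.110.104 192.168.110.105
do
    # push the starrocks user's public key to each node, including the local one
    ssh-copy-id "$host"
done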

1.4 Unpack and install

1.4.1 Unpack the package
tar -zxf StarRocks-3.1.1.tar.gz -C /opt/module/
1.4.2 Distribute the deployment files
ansible cluster -m copy -a 'src=/opt/module/StarRocks-3.1.1/fe dest=/opt/module/starrocks/'
ansible cluster -m copy -a 'src=/opt/module/StarRocks-3.1.1/be dest=/opt/module/starrocks/'
ansible cluster -m copy -a 'src=/opt/module/StarRocks-3.1.1/apache_hdfs_broker dest=/opt/module/starrocks/'

1.4.3 Edit the FE configuration file
Adjust the settings below to fit the business:

a. Increase the Java heap; 16 GB or more is recommended to avoid frequent GC.
b. Set the metadata directory. The default is fe/meta; we point it at the directory created earlier.
c. Mind the default ports to avoid conflicts; normally they need no change.
d. Bind the IP (CIDR notation) so the FE finds the right address on multi-NIC machines. Again, if CIDR notation is unfamiliar, just write the full IP, e.g. priority_networks = 192.168.110.101, which is equivalent to priority_networks = 192.168.110.101/32.

vi fe/conf/fe.conf

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#####################################################################
## The uppercase properties are read and exported by bin/start_fe.sh.
## To see all Frontend configurations,
## see fe/src/com/starrocks/common/Config.java

# the output dir of stderr/stdout/gc
# LOG_DIR = ${STARROCKS_HOME}/log
LOG_DIR = /data/starrocks/log/fe

JAVA_HOME=/usr/java/jdk-17

DATE = "$(date +%Y%m%d-%H%M%S)"
JAVA_OPTS="-Dlog4j2.formatMsgNoLookups=true -Xmx16384m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:${LOG_DIR}/fe.gc.log.$DATE -XX:+PrintConcurrentLocks"

# For jdk 11+, this JAVA_OPTS will be used as default JVM options (heap raised to match the 16 GB recommendation above)
JAVA_OPTS_FOR_JDK_11="-Dlog4j2.formatMsgNoLookups=true -Xmx16384m -XX:+UseG1GC -Xlog:gc*:${LOG_DIR}/fe.gc.log.$DATE:time"

##
## the lowercase properties are read by main program.
##

# DEBUG, INFO, WARN, ERROR, FATAL
sys_log_level = INFO

# store metadata, create it if it is not exist.
# Default value is ${STARROCKS_HOME}/meta
# meta_dir = ${STARROCKS_HOME}/meta
meta_dir = /data/starrocks/data/meta

http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
mysql_service_nio_enabled = true

# Enable jaeger tracing by setting jaeger_grpc_endpoint
# jaeger_grpc_endpoint = http://localhost:14250

# Choose one if there are more than one ip except loopback address. 
# Note that there should at most one ip match this list.
# If no ip match this rule, will choose one randomly.
# use CIDR format, e.g. 10.10.10.0/24
# Default value is empty.
# priority_networks = 10.10.10.0/24;192.168.0.0/16
priority_networks = 192.168.110.101

# Advanced configurations 
# log_roll_size_mb = 1024
sys_log_dir = /data/starrocks/log/fe
# sys_log_roll_num = 10
# sys_log_verbose_modules = 
audit_log_dir = /data/starrocks/log/fe
# audit_log_modules = slow_query, query
# audit_log_roll_num = 10
# meta_delay_toleration_second = 10
# qe_max_connection = 1024
# max_conn_per_user = 100
# qe_query_timeout_second = 300
# qe_slow_log_ms = 5000
max_routine_load_batch_size = 524288000
routine_load_task_consume_second = 3
routine_load_task_timeout_second = 15

1.4.4 Edit the BE configuration file
Adjust per the business requirements, consulting the official documentation:

a. Mind the default ports to avoid conflicts; normally they need no change.
b. Bind the IP so the BE finds the right address on multi-NIC machines.
c. Set the data storage directory. The default is be/storage; we recommend creating directories according to the disk layout and updating the config.

vi be/conf/be.conf 
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# INFO, WARNING, ERROR, FATAL
sys_log_level = INFO

# ports for admin, web, heartbeat service 
be_port = 9060
be_http_port = 8040
heartbeat_service_port = 9050
brpc_port = 8060

# Enable jaeger tracing by setting jaeger_endpoint
# jaeger_endpoint = localhost:6831

# Choose one if there are more than one ip except loopback address. 
# Note that there should at most one ip match this list.
# If no ip match this rule, will choose one randomly.
# use CIDR format, e.g. 10.10.10.0/24
# Default value is empty.
# priority_networks = 10.10.10.0/24;192.168.0.0/16
priority_networks = 192.168.110.103

# data root path, separate by ';'
# you can specify the storage medium of each root path, HDD or SSD, seperate by ','
# eg:
# storage_root_path = /data1,medium:HDD;/data2,medium:SSD;/data3
# /data1, HDD;
# /data2, SSD;
# /data3, HDD(default);
#
# Default value is ${STARROCKS_HOME}/storage, you should create it by hand.
# storage_root_path = ${STARROCKS_HOME}/storage
storage_root_path = /data/starrocks/data/storage,medium:ssd

# Advanced configurations
sys_log_dir = /data/starrocks/log/be
# sys_log_roll_mode = SIZE-MB-1024
# sys_log_roll_num = 10
# sys_log_verbose_modules = *
# log_buffer_level = -1

# JVM options for be
JAVA_HOME=/usr/java/jdk-17
# eg:
# JAVA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
# For jdk 9+, this JAVA_OPTS will be used as default JVM options
# JAVA_OPTS_FOR_JDK_9="-Djava.security.krb5.conf=/etc/krb5.conf"
base_compaction_check_interval_seconds = 10
cumulative_compaction_num_threads_per_disk = 4
base_compaction_num_threads_per_disk = 2
cumulative_compaction_check_interval_seconds = 2
tablet_max_versions = 15000

1.4.5 Distribute the configuration files
1. Distribute the FE and BE configs
ansible cluster -m copy -a 'src=/opt/module/starrocks/fe/conf/fe.conf dest=/opt/module/starrocks/fe/conf/'
ansible cluster -m copy -a 'src=/opt/module/starrocks/be/conf/be.conf dest=/opt/module/starrocks/be/conf/'

2. Per node, adjust the IPs (and, where the disk layout differs, the directories) in the FE and BE configs

FE, e.g. on node-102:

priority_networks = 192.168.110.102

BE, e.g. on node-104 (change storage_root_path only if that node's disks deviate from the standard layout):

priority_networks = 192.168.110.104
storage_root_path = /opt/module/storage

1.5 Install mysql-client

rpm -ivh mysql-community-client-plugins-8.0.34-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.34-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-8.0.34-1.el7.x86_64.rpm
mysql --version

1.6 Install JDK 17

vi ~/.bashrc

export JAVA_HOME=/usr/java/jdk-17
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$PATH
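
The snippet above only sets the environment variables. A sketch of rolling the JDK out with ansible and verifying it (the tarball path and name are assumptions; adjust them so the archive unpacks to /usr/java/jdk-17):

# unpack a JDK 17 tarball to /usr/java on every node (tarball name assumed)
ansible cluster -m shell -a "mkdir -p /usr/java"
ansible cluster -m unarchive -a "src=/opt/software/jdk-17_linux-x64_bin.tar.gz dest=/usr/java/"
# verify the binary each node will use
ansible cluster -m shell -a "/usr/java/jdk-17/bin/java -version"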

1.7 Deploy the FE instances

The first FE to start automatically becomes the Leader.

1.7.1 Start the FE

If startup fails, trace the cause in the FE log directory: the main FE log is fe.log, and the audit log of all queries is fe.audit.log.

cd /opt/module/starrocks/fe/bin/
./start_fe.sh --daemon
1.7.2 Check the FE process
jps | grep StarRocksFe
1.7.3 Log in to StarRocks
mysql -h192.168.110.101 -P9030 -uroot
1.7.4 Add the other FE instances to the cluster

We recommend first adding an instance to the cluster, then starting it, one instance at a time. Add node-102's FE to the cluster as an Observer; the port used here is edit_log_port, 9010 by default.

alter system add observer '192.168.110.102:9010';

To add it as a Follower instead, the SQL is:

alter system add follower "fe_host:edit_log_port";

If the IP or port was entered incorrectly, or the instance has to be removed from the cluster for some other reason, the SQL is as follows.
Drop a Follower:

alter system drop follower "fe_host:edit_log_port";

Drop an Observer:

alter system drop observer "fe_host:edit_log_port";
1.7.5 Start the other FEs

Important: except for the first FE, every FE node must be pointed at an existing FE as its helper on first startup (subsequent restarts do not need it).
Start the FE on node-102 for the first time, using node-101's FE as the helper:

cd /opt/module/starrocks/fe/bin/
./start_fe.sh --helper 192.168.110.101:9010 --daemon

 jps | grep StarRocksFe
1.7.6 Check the cluster's FE status

In the mysql-client on node-101, check the FE status:
Alive being true for every FE means all is normal. If it is false, locate the problem in the logs; since this is the initial startup, if troubleshooting threatens to take long you can simply wipe the FE metadata directory and start over.

show frontends\G
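
To avoid eyeballing the full \G output, a one-liner like the following (a sketch built on the mysql-client already installed) prints just each FE's name, role and liveness:

mysql -h192.168.110.101 -P9030 -uroot -e 'show frontends\G' | grep -E 'Name|Role|Alive'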

1.8 Deploy the BE instances

1.8.1 Add the BE instances to the cluster

Using mysql-client, first add the 3 BE instances to the cluster; the port to use here is heartbeat_service_port, 9050 by default:

alter system add backend '192.168.110.103:9050';
alter system add backend '192.168.110.104:9050';
alter system add backend '192.168.110.105:9050';

If the IP or port was entered incorrectly, or a BE instance has to be removed from the cluster due to some other issue, the SQL is:

 alter system dropp backend "be_host:be_heartbeat_service_port";
  • Dropping a BE is a dangerous operation, which is why the command is deliberately spelled "DROPP". Once it runs, the cluster removes the BE node immediately; the lost replicas are re-replicated by the FE scheduler, but any single-replica data on that node is lost for good.
  • Once the cluster is in production every BE node stores a large amount of business data, so DROPP would immediately discard the replicas held on that instance.
  • The safe way to remove a BE is DECOMMISSION: it first ensures the replicas are replenished elsewhere, then retires the node.
  • For routine scale-in of a production cluster, DECOMMISSION is the recommended approach, as sketched after the command below.
alter system decommission backend "be_host:be_heartbeat_service_port";
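A sketch of a safe scale-in using it (node-105 here is just an example): start the decommission, watch the node's TabletNum drain toward 0 in show backends, and cancel if you change your mind:

# kick off the decommission, then watch replicas migrate off the node
mysql -h192.168.110.101 -P9030 -uroot -e "alter system decommission backend '192.168.110.105:9050'"
watch -n 10 "mysql -h192.168.110.101 -P9030 -uroot -e 'show backends\G' | grep -E 'IP|TabletNum'"
# abort an in-flight decommission and keep the node
mysql -h192.168.110.101 -P9030 -uroot -e "cancel decommission backend '192.168.110.105:9050'"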
1.8.2 Start the BE instances

If a process misbehaves, trace the cause in the BE log directory: the main BE log is be.INFO, and the rest goes to be.out.

cd /opt/module/starrocks/be/bin/
./start_be.sh --daemon

# Check the process
ps -ef | grep starrocks_be
1.8.3 Check the BE status

Access the StarRocks cluster with mysql-client:
Alive being true for every BE means the state is normal; if it is false, troubleshoot from the logs. As with the FEs, since this is the initial startup, a problem that resists quick diagnosis can be dealt with by wiping the storage data directory and the log directory and starting the service afresh.

show backends\G

1.9 Deploy the Broker instances

1.9.1 Edit the Broker configuration file

Broker instances do not need an IP bound. In production, the other settings in the Broker configuration file usually need no change either.

1.9.2 Add the Broker instances to the cluster

In the mysql-client on node-101, first add the 3 Broker instances to the cluster; the port to use here is broker_ipc_port, 8000 by default:

alter system add broker hdfs_broker '192.168.110.103:8000';
alter system add broker hdfs_broker '192.168.110.104:8000';
alter system add broker hdfs_broker '192.168.110.105:8000';

To remove a Broker from the cluster, the SQL is:

ALTER SYSTEM DROP BROKER broker_name "broker_host:broker_ipc_port";
1.9.3 Start the Broker service
Broker-based loading

If a dedicated group of Brokers is deployed, place the HDFS cluster's hdfs-site.xml under the {deploy}/conf directory on every Broker node, then restart the Broker processes for it to take effect.

# Distribute hdfs-site.xml
ansible cluster -m copy -a 'src=/opt/module/starrocks/apache_hdfs_broker/conf/hdfs-site.xml dest=/opt/module/starrocks/apache_hdfs_broker/conf/'

Start the Broker service

Note: just start the nodes one after another; if a process misbehaves, trace the cause in the logs.

cd /opt/module/starrocks/apache_hdfs_broker/bin/
./start_broker.sh --daemon
Check the Broker process
jps | grep BrokerBootstrap
1.9.4 Check the Brokers' status in the cluster

Alive being true for every Broker means all is normal. The Broker log is apache_hdfs_broker.log; if a status is false, locate the problem there.

mysql -h192.168.110.101 -P9030 -uroot
 
show broker\G
1.9.5 Broker-free loading

Place the hdfs-site.xml file under the {deploy}/conf directory of every FE node and every BE node.

# Distribute hdfs-site.xml
ansible cluster -m copy -a 'src=/opt/module/starrocks/fe/conf/hdfs-site.xml dest=/opt/module/starrocks/fe/conf'
ansible cluster -m copy -a 'src=/opt/module/starrocks/be/conf/hdfs-site.xml dest=/opt/module/starrocks/be/conf'

2 Cluster management operations

Once the cluster is deployed, services have to be started and stopped by hand or by script after machine reboots or service crashes.

2.1.1 Manual start/stop

1. FE
cd /opt/module/starrocks/fe/bin/
# Start the FE
./start_fe.sh --daemon
# Stop the FE
./stop_fe.sh
2. BE
cd /opt/module/starrocks/be/bin
# Start the BE
./start_be.sh --daemon
# Stop the BE
./stop_be.sh
3. Broker
 cd /opt/module/starrocks/apache_hdfs_broker/bin/
 # Start the Broker
 ./start_broker.sh --daemon
 # Stop the Broker
 ./stop_broker.sh

2.1.2 Cluster start/stop script

1. The script

On node-101, as the starrocks user, create a starrocks.sh file under /home/starrocks:

#!/bin/bash

# usage: starrocks.sh start|stop|restart

case $1 in
"start")
    for i in node-101 node-102 node-103 node-104 node-105
    do
        echo "=================== start $i's service ================"
        ssh $i "source /etc/profile.d/my_env.sh; cd /opt/module/starrocks; ./fe/bin/start_fe.sh --daemon"
        ssh $i "/opt/module/starrocks/be/bin/start_be.sh --daemon"
        ssh $i "source /etc/profile.d/my_env.sh; cd /opt/module/starrocks; ./apache_hdfs_broker/bin/start_broker.sh --daemon"
    done
;;
"stop")
    for i in node-101 node-102 node-103 node-104 node-105
    do
        echo "=================== stop $i's service ================"
        ssh $i "/opt/module/starrocks/apache_hdfs_broker/bin/stop_broker.sh"
        ssh $i "/opt/module/starrocks/be/bin/stop_be.sh"
        ssh $i "/opt/module/starrocks/fe/bin/stop_fe.sh"
    done
;;
"restart")
    starrocks.sh stop
    sleep 2
    starrocks.sh start
;;
*)
    echo "Parameter ERROR!!!"
;;
esac
2. Make it executable
chmod a+x starrocks.sh
3. Global invocation

Switch to root and move the script to /bin so it can be invoked from anywhere:

mv starrocks.sh /bin/
4. Run the script

Switch back to the starrocks user and test driving the cluster with the script (first make sure none of the instances are running):

# Start the cluster
starrocks.sh start
# Stop the cluster services
starrocks.sh stop
# Restart the cluster services
starrocks.sh restart
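
The script has no status subcommand; a quick sweep in the same spirit (a sketch reusing the process checks from the deployment steps) can stand in for one:

for i in node-101 node-102 node-103 node-104 node-105
do
    echo "=================== $i's processes ================"
    ssh $i "jps | grep StarRocksFe; jps | grep BrokerBootstrap; pgrep -f starrocks_be"
done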

2.1.3 Usage examples

1. Change the administrator password

In the current version, a freshly deployed StarRocks ships with one privileged user, root, whose default password is empty. Access the cluster from mysql-client on node-101:

mysql -h192.168.110.101 -P9030 -uroot
# Taking root as the example; use a complex password in production
set password=password('StarRocks*2308');
2. Create a database
create database ods;
3. Create a user and grant privileges
CREATE USER 'starrocks'@'%' IDENTIFIED BY 'StarRocks*2308' DEFAULT ROLE user_admin;
grant all on ods.* to 'starrocks'@'%';
grant create table on database ods to user 'starrocks'@'%';

4. Log in as the new user
mysql -h192.168.110.101 -P9030 -ustarrocks -pStarRocks*2308
5. Create a table
use ods;

CREATE TABLE IF NOT EXISTS `customer` (
  `c_custkey` int(11) NOT NULL COMMENT "",
  `c_name` varchar(26) NOT NULL COMMENT "",
  `c_address` varchar(41) NOT NULL COMMENT "",
  `c_city` varchar(11) NOT NULL COMMENT "",
  `c_nation` varchar(16) NOT NULL COMMENT "",
  `c_region` varchar(13) NOT NULL COMMENT "",
  `c_phone` varchar(16) NOT NULL COMMENT "",
  `c_mktsegment` varchar(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`c_custkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`c_custkey`) BUCKETS 3
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
6. Insert data
insert into customer
select 
1,'Customer#000000001','j5JsirBM9P','MOROCCO  0','MOROCCO','AFRICA','25-989-741-2988','BUILDING';
7. Stream Load
curl --location-trusted -u starrocks:StarRocks*2308 -H "label:ods_customer" -H "column_separator:|" -T /home/starrocks/customer.tbl http://192.168.110.101:8030/api/ods/customer/_stream_load
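
Stream Load answers with a JSON body; a quick way to confirm the load (a sketch) is to check that the response contains "Status": "Success" and then count the rows:

mysql -h192.168.110.101 -P9030 -ustarrocks -pStarRocks*2308 -e "select count(*) from ods.customer"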

3 Data synchronization

3.1 Broker Load

3.1.1 Create the table
CREATE TABLE dwd.dwd_enterprise_info(
  id BIGINT NOT NULL AUTO_INCREMENT COMMENT '自增主键', 
  `eid` string COMMENT '企业eid', 
  `credit_no` string COMMENT '统一社会信用代码', 
  `name` string COMMENT '企业名称', 
  `ipcs` string COMMENT '专利IPC分类号', 
  `titles` string COMMENT '专利标题', 
  `scope` string COMMENT '经营范围', 
  `introduction` string COMMENT '企业简介', 
  `main_business` string COMMENT '主营业务', 
  `industry_code` string COMMENT '行业代码', 
  `industrial_field_new` string COMMENT '行业领域', 
  `html` string COMMENT '网站', 
  `hat_name` string COMMENT '企业资质')
PRIMARY KEY (id,eid)
COMMENT '企业产业链匹配专用表 - 含企业简介主营专利信息'
DISTRIBUTED BY HASH(id,eid) BUCKETS 3
PROPERTIES (
"replication_num" = "1",
"replicated_storage" = "true"
);

3.1.2 Load the data
LOAD LABEL dwd.dwd_alg_enterprise_business_patents_scope_info
(
DATA INFILE("hdfs://IP:8020/warehouse/tablespace/external/hive/dwd.db/dwd_enterprise_info/*")
INTO TABLE dwd_enterprise_info
COLUMNS TERMINATED BY "|^|"
FORMAT AS "orc"
(eid, credit_no, name,ipcs,titles,scope,introduction,main_business,industry_code,industrial_field_new,html,hat_name)
)
WITH BROKER 'hdfs_broker';
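
Broker Load is asynchronous, so the statement above only submits the job. Its progress (PENDING, LOADING, FINISHED or CANCELLED) can be followed with SHOW LOAD, for example:

mysql -h192.168.110.101 -P9030 -ustarrocks -pStarRocks*2308 -e "show load from dwd where label = 'dwd_alg_enterprise_business_patents_scope_info'\G"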

4 Privileges

4.1 Regular users

4.1.1 Read-only user
1. Create a read-only role

CREATE ROLE readonly;
2. Grant SELECT to the read-only role
GRANT SELECT ON ALL TABLES IN DATABASE dim TO ROLE readonly;
GRANT SELECT ON ALL TABLES IN DATABASE tmp TO ROLE readonly;
GRANT SELECT ON ALL TABLES IN DATABASE ods TO ROLE readonly;
GRANT SELECT ON ALL TABLES IN DATABASE dwd TO ROLE readonly;
GRANT SELECT ON ALL TABLES IN DATABASE dws TO ROLE readonly;
GRANT SELECT ON ALL TABLES IN DATABASE ads TO ROLE readonly;
3. Create the read-only user
CREATE USER 'ReadOnly'@'%' IDENTIFIED BY 'Read-Only*2023' DEFAULT ROLE readonly;

4. Grant directly to the read-only user
GRANT SELECT ON dim.* TO 'ReadOnly'@'%';
5. Inspect privileges
SHOW GRANTS; -- show the current user's privileges.
SHOW GRANTS FOR ROLE <role_name>; -- show a given role's privileges.
SHOW GRANTS FOR <user_identity>; -- show a given user's privileges.

4.2 Resource grants

4.2.1 Create a Hive resource
1. Create a Hive resource named hive_pro
CREATE EXTERNAL RESOURCE "hive_pro"
PROPERTIES (
  "type" = "hive",
  "hive.metastore.uris" = "thrift://IP:9083,thrift://IP:9083"
);
2. List the resources created in StarRocks
SHOW RESOURCES;
3. Drop the resource named hive_pro
DROP RESOURCE "hive_pro";
4. Grant on the RESOURCE
GRANT ALL ON RESOURCE hive_pro TO 'starrocks'@'%';

4.3 Catalogs

4.3.1 Create a Hive catalog
1. Create a Hive catalog named hive_metastore_catalog
CREATE EXTERNAL CATALOG hive_metastore_catalog
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://IP:9083,thrift://IP:9083",
"enable_metastore_cache" = "true",
"metastore_cache_refresh_interval_sec"="30"
);
4.3.2 Grant on the Hive catalog
1. Grant to the starrocks user
grant all on catalog hive_metastore_catalog TO 'starrocks'@'%';

2. Browse the catalog
show databases from hive_metastore_catalog;
show tables from hive_metastore_catalog.ads;
3. Switch catalogs
set catalog hive_metastore_catalog;

SET catalog default_catalog;

show databases;
4. Grant on databases

set catalog hive_metastore_catalog;

grant all on dim.* to 'starrocks'@'%';
grant all on tmp.* to 'starrocks'@'%';
grant all on ods.* to 'starrocks'@'%';
grant all on dwd.* to 'starrocks'@'%';
grant all on dws.* to 'starrocks'@'%';
grant all on ads.* to 'starrocks'@'%';
5. Refresh Hive external table metadata

use ods_log;
set enable_profile =true;
refresh external table ods_sensors_cd_video_production_operationpositionexposure_hive;

4.4 Materialized views

4.4.1 Privileges
1. Grant
set catalog default_catalog;

grant create materialized view on database ods to user 'starrocks'@'%';
grant select on all materialized views in database ods to user 'starrocks'@'%';
grant drop on all materialized views in database ods to user 'starrocks'@'%';

4.4.2 Create a materialized view
1. Create an async materialized view
create materialized view if not exists ods.ods_qxb_license_info_num
comment "行政许可"
distributed by hash(eid,diff) buckets 2
refresh manual
as
select  t.eid,t.diff,count(1) as license_info_num
from (
select eid,end_date , 
(case 
	when datediff(CAST(end_date AS DATETIME),now()) >= 0 then '1'
	when datediff(CAST(end_date AS DATETIME),now()) < 0	then '-1'
	else '0' end
) as diff
from ods.ods_qxb_t_license_info ) t 
group by t.eid,t.diff;
2. List async materialized views and their definitions
show materialized views from ods where name = "ods_qxb_license_info_num" ;
show materialized views from ods where name like "ods_qxb_%" ;
3. Drop an async materialized view
DROP MATERIALIZED VIEW ods.ods_qxb_license_info_num;
4. Refresh an async materialized view manually
-- Submit the refresh task asynchronously.
REFRESH MATERIALIZED VIEW ods.ods_qxb_license_info_num;
-- Run the refresh task synchronously.
REFRESH MATERIALIZED VIEW ods.ods_qxb_license_info_num WITH SYNC MODE;
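
After submitting an asynchronous refresh, its task runs can be inspected (a sketch; assumes the information_schema.task_runs view available in StarRocks 3.x):

mysql -h192.168.110.101 -P9030 -ustarrocks -pStarRocks*2308 -e "select * from information_schema.task_runs limit 5\G"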
5. Check a materialized view's data size
show data from dws.dws_qxb_license_info_num;

5 Cluster administration

5.1 ADMIN SET CONFIG

This statement sets cluster configuration items at runtime. After an FE restart such settings revert to what fe.conf (or the default) specifies, so for a change that must persist, update fe.conf as well right after setting it.

5.1.1 Syntax
1. Examples
ADMIN SHOW FRONTEND CONFIG LIKE "%storage_usage_soft_limit_percent%";
ADMIN SHOW FRONTEND CONFIG LIKE "%storage_usage_soft_limit_reserve_bytes%";

ADMIN SET FRONTEND CONFIG ("storage_usage_soft_limit_percent" = "95");
ADMIN SET FRONTEND CONFIG ("storage_usage_soft_limit_reserve_bytes" = "50000000000");

ADMIN SHOW FRONTEND CONFIG LIKE "%storage_flood_stage_usage_percent%";
ADMIN SHOW FRONTEND CONFIG LIKE "%storage_flood_stage_left_capacity_bytes%";

ADMIN SET FRONTEND CONFIG ("storage_flood_stage_left_capacity_bytes" = "50000000000");

5.2 Memory usage estimation formula

Take a primary key table whose sort key is dt, id, with types date (4 bytes) and bigint (8 bytes), so the key occupies 12 bytes. If the table has 10 million rows stored as three replicas, the memory estimate is:

(12 + 9 (fixed per-row overhead)) bytes × 10,000,000 rows × 3 replicas × 1.5 (average hash-table overhead) = 945 MB
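
Step by step, under those assumptions:

$(12 + 9)\,\text{B} \times 10^{7}\,\text{rows} = 2.1\times10^{8}\,\text{B} = 210\,\text{MB per replica};\quad 210\,\text{MB} \times 3\,\text{replicas} \times 1.5 = 945\,\text{MB}$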

6 Table models

6.1 Dynamic partition tables

Partitions can be created on demand for incoming data while StarRocks automatically drops expired ones, keeping data fresh and giving partitions lifecycle management (time to live, "TTL") at a low operational cost. For example:

CREATE TABLE tmp.site_access(
event_day DATE,
site_id INT DEFAULT '10',
city_code VARCHAR(100),
user_name VARCHAR(32) DEFAULT '',
pv BIGINT DEFAULT '0'
)
DUPLICATE KEY(event_day, site_id, city_code, user_name)
COMMENT "用户订阅续订报表"
PARTITION BY RANGE(event_day)
(START ("2024-01-01") END ("2025-01-01") EVERY (INTERVAL 1 MONTH))
DISTRIBUTED BY HASH(event_day, site_id)
BUCKETS 3
PROPERTIES (
"replication_num" = "3",
"dynamic_partition.enable" = "true", -- 开启动态分区特性,取值为 true(默认)或 false
"dynamic_partition.time_unit" = "MONTH", -- HOUR(yyyyMMddHH)、DAY(yyyyMMdd)、WEEK(yyyy_ww)、MONTH(yyyyMM)或YEAR(yyyy)
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-24",		-- 保留近2年的数据;不设置此参数代表保存所有数据
"dynamic_partition.end" = "3", -- 提前创建的分区数量,取值范围为正整数
"dynamic_partition.prefix" = "p", -- 动态分区的前缀名,默认值为 p
"dynamic_partition.buckets" = "3", -- 动态分区的分桶数量
"dynamic_partition.history_partition_num" = "0", -- 动态分区的创建历史分区的个数,默认值为0。当值>0时会提前创建历史分区
"dynamic_partition.start_day_of_month" = "1", -- 指定每月的第一天,每月的第一天不支持为29日、30日、31日
"in_memory" = "false",
"enable_persistent_index" = "true",
"replicated_storage" = "true",
"compression" = "LZ4"
);
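
Once the table exists, the scheduler pre-creates the upcoming partitions; a sketch of verifying that and of pausing the feature per table (property names as in the DDL above):

# list the partitions the scheduler has created
mysql -h192.168.110.101 -P9030 -ustarrocks -pStarRocks*2308 -e "show partitions from tmp.site_access"
# pause (or resume with true) dynamic partitioning for this table
mysql -h192.168.110.101 -P9030 -ustarrocks -pStarRocks*2308 -e "alter table tmp.site_access set (\"dynamic_partition.enable\" = \"false\")"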

6.2 Creating partitions in batch

Creating date partitions in batch at table-creation time
When the partition key is a date, start and end in the CREATE TABLE statement give the first and last dates of the batch of partitions, and the every clause gives the increment; within every, the interval keyword expresses the step, which currently supports only day, week, month and year units.

# Below, the batch of partitions starts at 2021-01-01 and ends at 2021-01-04, stepping one day at a time:
create table site_access (
    datekey date,
    site_id int,
    city_code smallint,
    user_name varchar(32),
    pv bigint default '0'
)
engine=olap
duplicate key(datekey, site_id, city_code, user_name)
partition by range (datekey) (
    start ("2021-01-01") end ("2021-01-04") every (interval 1 day)
)
distributed by hash(site_id) buckets 10
properties (
    "replication_num" = "3" 
);
-- equivalent to the following partition by clause in the CREATE TABLE statement (intervals are left-closed, right-open):
partition by range (datekey) (
    partition p20210101 values [('2021-01-01'), ('2021-01-02')),
    partition p20210102 values [('2021-01-02'), ('2021-01-03')),
    partition p20210103 values [('2021-01-03'), ('2021-01-04'))
)

Creating region-code partitions in batch at table-creation time
When the partition key is a numeric region code, start and end bound each batch of partitions and the every clause gives the increment.

create table tmp.tmp_cbht (
      cbhtbm                string               comment "",
      fbfbm                 string               comment "",
      cbfbm                 string               comment "",
      cbfs                  string               comment "",
      cbqxq                 datetime             comment "",
      cbqxz                 datetime             comment "",
      partition_no          bigint               comment ""
) engine=olap
duplicate key(cbhtbm)
comment ""
partition  by   range(partition_no) (
start("110000")  end ("160000")  every (10000),
start("210000")  end ("240000")  every (10000),
start("310000")  end ("380000")  every (10000),
start("410000")  end ("470000")  every (10000),
start("500000")  end ("550000")  every (10000),
start("610000")  end ("660000")  every (10000),
start("710000")  end ("720000")  every (10000),
start("810000")  end ("830000")  every (10000)
)
distributed by hash(cbhtbm) buckets 8
properties (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "default"
);

6.3 Inspecting a partitioned table

-- list the partitions
show partitions from tmp.site_access;
-- list the tablets
show tablet from tmp.site_access;
show tablet from tmp.site_access partition(p20241105);
-- repair a tablet
alter tablet replace repair with tablet_id=95524579;

7 Tablet repair

7.1 Repairing tablet status

# Find which partition the problematic tablet belongs to:
show tablet 95524579;

# Check the status of every tablet in that partition
SHOW TABLET FROM shennong.ods_sensors_production_anrTrace partition(p20241028);

# Repair command: mark the bad replica so the FE rebuilds it
ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "95524579", "backend_id" = "10007", "status" = "bad");
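
Before (or instead of) marking a single replica bad, the unhealthy replicas in the partition can be listed, and a table-level repair can be requested (a sketch; statement availability may vary by version):

# list replicas whose status is not OK
mysql -h192.168.110.101 -P9030 -uroot -e "admin show replica status from shennong.ods_sensors_production_anrTrace partition(p20241028) where status != 'OK'"
# ask the scheduler to repair the partition with priority
mysql -h192.168.110.101 -P9030 -uroot -e "admin repair table shennong.ods_sensors_production_anrTrace partition(p20241028)"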