摘要:由于mysql的数据存储容量有限,考虑使用TiDB替代mysql,需要部署TiDB,将mysql数据迁移到TiDB。原来的FineBI是直接连的MySQL,后期使用FineBI直接连接TiDB。
问题点:MySQL和数据中台及Flink cdc 有关联,MySQL应该还是不能够完全替换掉。
- 数据中台是否提供了TiDB元数据管理、hive是否可以直接迁移到TiDB(均无)
- Flink cdc是否支持TiDB(目前不支持)
1. 离线安装部署TiDB,单机模拟集群环境
部署参考链接:https://my.oschina.net/hackenhu/blog/4987911
本地部署的是5.2.0
部署路径:/home/tidb/tidb-community-server-v5.2.0-linux-amd64
topology.yaml 文件
[root@cdh05 tidb]# cat /home/tidb/topology.yaml
global:
user: "tidb"
ssh_port: 22
deploy_dir: "/data/tidb/tidb-deploy"
data_dir: "/data/tidb/tidb-data"
monitored:
node_exporter_port: 9100
blackbox_exporter_port: 9115
server_configs:
tidb:
log.slow-threshold: 300
tikv:
readpool.storage.use-unified-pool: false
readpool.coprocessor.use-unified-pool: true
pd:
replication.enable-placement-rules: true
replication.location-labels: ["host"]
tiflash:
logger.level: "info"
pd_servers:
- host: 172.16.43.159
tidb_servers:
- host: 172.16.43.159
tikv_servers:
- host: 172.16.43.159
port: 20160
status_port: 20180
config:
server.labels: { host: "logic-host-1" }
- host: 172.16.43.159
port: 20161
status_port: 20181
config:
server.labels: { host: "logic-host-2" }
- host: 172.16.43.159
port: 20162
status_port: 20182
config:
server.labels: { host: "logic-host-3" }
tiflash_servers:
- host: 172.16.43.159
monitoring_servers:
- host: 172.16.43.159
grafana_servers:
- host: 172.16.43.159
检查集群状态,命令:tiup cluster display tidb-test
2. TiDB 工具包(TiDB-community-toolkit)下载
TiDB 工具包中包含了一些常用的 TiDB 工具,例如数据导出工具 Dumpling、数据导入工具 TiDB Lightning、备份恢复工具 BR。
下载地址:
https://pingcap.com/zh/product-community/
3. 使用TiDB Dumpling导出MySQL数据
3.1 dumpling介绍
dumpling是PingCAP之前所维护的Mydumper分支工具的下一代。针对TiDB的特性进行了优化,Mydumper绝大部分功能已经被Dumpling取代。该工具可以把存储在TiDB/MySQL中的数据导出为 SQL 或者 CSV 格式,可以用于逻辑备份或导出
3.2 dumpling安装
要初始化tiup,需要切换用户,sh local_install.sh,source /home/tidb/.bash_profile,不然命令行会报找不到tiup command
[tidb@cdh05 root]$ su tidb
Password:
[tidb@cdh05 root]$ cd /home/tidb/tidb-community-server-v5.2.0-linux-amd64
[tidb@cdh05 tidb-community-server-v5.2.0-linux-amd64]$ sh local_install.sh
Disable telemetry success
Successfully set mirror to /home/tidb/tidb-community-server-v5.2.0-linux-amd64
Detected shell: bash
Shell profile: /home/tidb/.bash_profile
/home/tidb/.bash_profile has been modified to to add tiup to PATH
open a new terminal or source /home/tidb/.bash_profile to use it
Installed path: /home/tidb/.tiup/bin/tiup
===============================================
1. source /home/tidb/.bash_profile
2. Have a try: tiup playground
===============================================
[tidb@cdh05 tidb-community-server-v5.2.0-linux-amd64]$ source /home/tidb/.bash_profile
推荐使用TiUP 执行 tiup install dumpling 命令安装dumpling。也可以下载tidb-toolkit安装包(https://docs.pingcap.com/zh/tidb/stable/download-ecosystem-tools#dumpling)进行安装。
[tidb@cdh05 tidb-community-server-v5.2.0-linux-amd64]$ tiup install dumpling
component dumpling version v5.2.0 is already installed
3.3 运行dumpling
使用 tiup dumpling 命令运行 Dumpling
[tidb@cdh05 tidb-community-server-v5.2.0-linux-amd64]$ tiup dumpling
Starting component `dumpling`: /home/tidb/.tiup/components/dumpling/v5.2.0/dumpling
Release version: v5.2.0
Git commit hash: 4e012e54ce89fab9551e7ff58ebcffe57590244a
Git branch: heads/refs/tags/v5.2.0
Build timestamp: 2021-08-23 02:22:44Z
Go version: go version go1.16.4 linux/amd64
[2022/07/13 14:38:53.594 +08:00] [INFO] [versions.go:55] ["Welcome to dumpling"] ["Release Version"=v5.2.0] ["Git Commit Hash"=4e012e54ce89fab9551e7ff58ebcffe57590244a] ["Git Branch"=heads/refs/tags/v5.2.0] ["Build timestamp"="2021-08-23 02:22:44"] ["Go Version"="go version go1.16.4 linux/amd64"]
[2022/07/13 14:38:53.597 +08:00] [INFO] [config.go:608] ["detect server type"] [type=TiDB]
[2022/07/13 14:38:53.597 +08:00] [INFO] [config.go:627] ["detect server version"] [version=5.2.0]
[2022/07/13 14:38:53.602 +08:00] [INFO] [client.go:214] ["[pd] create pd client with endpoints"] [pd-address="[172.16.43.159:2379]"]
[2022/07/13 14:38:53.606 +08:00] [INFO] [base_client.go:346] ["[pd] switch leader"] [new-leader=http://172.16.43.159:2379] [old-leader=]
[2022/07/13 14:38:53.606 +08:00] [INFO] [base_client.go:126] ["[pd] init cluster id"] [cluster-id=7117463424670356047]
[2022/07/13 14:38:53.606 +08:00] [INFO] [client.go:238] ["[pd] create tso dispatcher"] [dc-location=global]
[2022/07/13 14:38:53.607 +08:00] [INFO] [dump.go:1146] ["generate dumpling gc safePoint id"] [id=dumpling_1657694333607622949]
[2022/07/13 14:38:53.612 +08:00] [INFO] [dump.go:93] ["begin to run Dump"] [conf="{\"s3\":{\"endpoint\":\"\",\"region\":\"\",\"storage-class\":\"\",\"sse\":\"\",\"sse-kms-key-id\":\"\",\"acl\":\"\",\"access-key\":\"\",\"secret-access-key\":\"\",\"provider\":\"\",\"force-path-style\":true,\"use-accelerate-endpoint\":false},\"gcs\":{\"endpoint\":\"\",\"storage-class\":\"\",\"predefined-acl\":\"\",\"credentials-file\":\"\"},\"AllowCleartextPasswords\":false,\"SortByPk\":true,\"NoViews\":true,\"NoHeader\":false,\"NoSchemas\":false,\"NoData\":false,\"CompleteInsert\":false,\"TransactionalConsistency\":true,\"EscapeBackslash\":true,\"DumpEmptyDatabase\":true,\"PosAfterConnect\":false,\"CompressType\":0,\"Host\":\"127.0.0.1\",\"Port\":4000,\"Threads\":4,\"User\":\"root\",\"Security\":{\"CAPath\":\"\",\"CertPath\":\"\",\"KeyPath\":\"\"},\"LogLevel\":\"info\",\"LogFile\":\"\",\"LogFormat\":\"text\",\"OutputDirPath\":\"./export-2022-07-13T14:38:53+08:00\",\"StatusAddr\":\":8281\",\"Snapshot\":\"434554623386714115\",\"Consistency\":\"snapshot\",\"CsvNullValue\":\"\\\\N\",\"SQL\":\"\",\"CsvSeparator\":\",\",\"CsvDelimiter\":\"\\\"\",\"Databases\":[],\"Where\":\"\",\"FileType\":\"sql\",\"ServerInfo\":{\"HasTiKV\":true,\"ServerType\":3,\"ServerVersion\":\"5.2.0\"},\"Rows\":0,\"ReadTimeout\":900000000000,\"TiDBMemQuotaQuery\":0,\"FileSize\":0,\"StatementSize\":1000000,\"SessionParams\":{\"tidb_snapshot\":\"434554623386714115\"},\"Tables\":null}"]
[2022/07/13 14:38:53.679 +08:00] [INFO] [collector.go:232] ["backup success summary"] [total-ranges=1] [ranges-succeed=1] [ranges-failed=0] [total-take=663.356µs]
[2022/07/13 14:38:53.679 +08:00] [INFO] [main.go:81] ["dump data successfully, dumpling will exit now"]
3.4 dumpling常用选项
-B:指定需要导出的数据库,默认Dumpling会导出系统数据库外的所有库
-T:指定需要导出的表,只接受库名.表名形式,如-T employees.WorkOrder代表employees.WorkOrder表
-f:指定需要导出的表,支持通配符
-t:指定导出线程数,增加线程数会增加 Dumpling 并发度提高导出速度,但也会加大数据库内存消耗
-F:指定导出时数据拆分文件大小,如果想使用TiDB Lightning 将该文件加载到 TiDB 实例中,建议控制到256MB或以下
-o:指定导出路径
–where:用于过滤需要导出的数据
–filter:指定需要过滤的库表,如–filter "employees.*"代表导出employees库的所有表
3.5 dumpling 使用示例
dumpling成功执行后,会输出metadata、{schema}-schema-create.sql、{schema}.{table}-schema.sql这样的文件,其中metadata记录了position信息,其他文件则是建表和导入数据的SQL
3.5.1 将数据导出为SQL文件
运行之后,可以看到 dumpling命令的存放位置,Starting component dumpling
: /home/tidb/.tiup/components/dumpling/v5.2.0/dumpling
使用dumpling 将mysql 数据库的数据导出为sql数据文件
[tidb@cdh05 tidb-community-server-v5.2.0-linux-amd64]$ sudo /home/tidb/.tiup/components/dumpling/v5.2.0/dumpling -h 172.16.43.159 -P 3306 -u root -p root@123 -t 3 -F 256MB -B test -o /data/backup/mysql/20220713/
Release version: v5.2.0
Git commit hash: 4e012e54ce89fab9551e7ff58ebcffe57590244a
Git branch: heads/refs/tags/v5.2.0
Build timestamp: 2021-08-23 02:22:44Z
Go version: go version go1.16.4 linux/amd64
[2022/07/13 14:40:06.447 +08:00] [INFO] [versions.go:55] ["Welcome to dumpling"] ["Release Version"=v5.2.0] ["Git Commit Hash"=4e012e54ce89fab9551e7ff58ebcffe57590244a] ["Git Branch"=heads/refs/tags/v5.2.0] ["Build timestamp"="2021-08-23 02:22:44"] ["Go Version"="go version go1.16.4 linux/amd64"]
[2022/07/13 14:40:06.449 +08:00] [INFO] [config.go:608] ["detect server type"] [type=MySQL]
[2022/07/13 14:40:06.449 +08:00] [INFO] [config.go:627] ["detect server version"] [version=5.7.15-log]
[2022/07/13 14:40:06.449 +08:00] [INFO] [dump.go:93] ["begin to run Dump"] [conf="{\"s3\":{\"endpoint\":\"\",\"region\":\"\",\"storage-class\":\"\",\"sse\":\"\",\"sse-kms-key-id\":\"\",\"acl\":\"\",\"access-key\":\"\",\"secret-access-key\":\"\",\"provider\":\"\",\"force-path-style\":true,\"use-accelerate-endpoint\":false},\"gcs\":{\"endpoint\":\"\",\"storage-class\":\"\",\"predefined-acl\":\"\",\"credentials-file\":\"\"},\"AllowCleartextPasswords\":false,\"SortByPk\":true,\"NoViews\":true,\"NoHeader\":false,\"NoSchemas\":false,\"NoData\":false,\"CompleteInsert\":false,\"TransactionalConsistency\":true,\"EscapeBackslash\":true,\"DumpEmptyDatabase\":true,\"PosAfterConnect\":false,\"CompressType\":0,\"Host\":\"172.16.43.159\",\"Port\":3306,\"Threads\":3,\"User\":\"root\",\"Security\":{\"CAPath\":\"\",\"CertPath\":\"\",\"KeyPath\":\"\"},\"LogLevel\":\"info\",\"LogFile\":\"\",\"LogFormat\":\"text\",\"OutputDirPath\":\"/data/backup/mysql/20220713/\",\"StatusAddr\":\":8281\",\"Snapshot\":\"\",\"Consistency\":\"flush\",\"CsvNullValue\":\"\\\\N\",\"SQL\":\"\",\"CsvSeparator\":\",\",\"CsvDelimiter\":\"\\\"\",\"Databases\":[\"test\"],\"Where\":\"\",\"FileType\":\"sql\",\"ServerInfo\":{\"HasTiKV\":false,\"ServerType\":1,\"ServerVersion\":\"5.7.15-log\"},\"Rows\":0,\"ReadTimeout\":900000000000,\"TiDBMemQuotaQuery\":0,\"FileSize\":268435456,\"StatementSize\":1000000,\"SessionParams\":{},\"Tables\":null}"]
[2022/07/13 14:40:06.455 +08:00] [INFO] [dump.go:191] ["All the dumping transactions have started. Start to unlock tables"]
[2022/07/13 14:40:06.468 +08:00] [INFO] [collector.go:232] ["backup success summary"] [total-ranges=13] [ranges-succeed=13] [ranges-failed=0] [total-take=12.212346ms] [total-rows=12] [total-kv-size=1.185kB] [average-speed=97.54kB/s]
[2022/07/13 14:40:06.468 +08:00] [INFO] [main.go:81] ["dump data successfully, dumpling will exit now"]
运行成功之后就可以看到mysql 的test 数据库的数据已经导出到/data/backup/mysql/20220713/ 目录下了
把mysql 非系统数据库之外的数据库全部导出
[tidb@cdh05 tidb-community-server-v5.2.0-linux-amd64]$ sudo /home/tidb/.tiup/components/dumpling/v5.2.0/dumpling -h 172.16.43.159 -P 3306 -u root -p root@123 -t 3 -F 256MB -o /data/backup/mysql/20220713/
3.5.2 将数据导出为CSV文件
[tidb@cdh05 20220713]$ sudo /home/tidb/.tiup/components/dumpling/v5.2.0/dumpling -h 172.16.43.159 -P 3306 -u root -p root@123 -t 3 -F 256MB -B test --filetype csv --sql 'select * from test.users where id < 10' -o /data/backup/mysql/20220714
Release version: v5.2.0
Git commit hash: 4e012e54ce89fab9551e7ff58ebcffe57590244a
Git branch: heads/refs/tags/v5.2.0
Build timestamp: 2021-08-23 02:22:44Z
Go version: go version go1.16.4 linux/amd64
[2022/07/13 15:16:32.178 +08:00] [INFO] [versions.go:55] ["Welcome to dumpling"] ["Release Version"=v5.2.0] ["Git Commit Hash"=4e012e54ce89fab9551e7ff58ebcffe57590244a] ["Git Branch"=heads/refs/tags/v5.2.0] ["Build timestamp"="2021-08-23 02:22:44"] ["Go Version"="go version go1.16.4 linux/amd64"]
[2022/07/13 15:16:32.179 +08:00] [INFO] [config.go:608] ["detect server type"] [type=MySQL]
[2022/07/13 15:16:32.179 +08:00] [INFO] [config.go:627] ["detect server version"] [version=5.7.15-log]
[2022/07/13 15:16:32.180 +08:00] [INFO] [dump.go:93] ["begin to run Dump"] [conf="{\"s3\":{\"endpoint\":\"\",\"region\":\"\",\"storage-class\":\"\",\"sse\":\"\",\"sse-kms-key-id\":\"\",\"acl\":\"\",\"access-key\":\"\",\"secret-access-key\":\"\",\"provider\":\"\",\"force-path-style\":true,\"use-accelerate-endpoint\":false},\"gcs\":{\"endpoint\":\"\",\"storage-class\":\"\",\"predefined-acl\":\"\",\"credentials-file\":\"\"},\"AllowCleartextPasswords\":false,\"SortByPk\":true,\"NoViews\":true,\"NoHeader\":false,\"NoSchemas\":false,\"NoData\":false,\"CompleteInsert\":false,\"TransactionalConsistency\":true,\"EscapeBackslash\":true,\"DumpEmptyDatabase\":true,\"PosAfterConnect\":false,\"CompressType\":0,\"Host\":\"172.16.43.159\",\"Port\":3306,\"Threads\":3,\"User\":\"root\",\"Security\":{\"CAPath\":\"\",\"CertPath\":\"\",\"KeyPath\":\"\"},\"LogLevel\":\"info\",\"LogFile\":\"\",\"LogFormat\":\"text\",\"OutputDirPath\":\"/data/backup/mysql/20220714\",\"StatusAddr\":\":8281\",\"Snapshot\":\"\",\"Consistency\":\"flush\",\"CsvNullValue\":\"\\\\N\",\"SQL\":\"select * from test.users where id \\u003c 10\",\"CsvSeparator\":\",\",\"CsvDelimiter\":\"\\\"\",\"Databases\":[\"test\"],\"Where\":\"\",\"FileType\":\"csv\",\"ServerInfo\":{\"HasTiKV\":false,\"ServerType\":1,\"ServerVersion\":\"5.7.15-log\"},\"Rows\":0,\"ReadTimeout\":900000000000,\"TiDBMemQuotaQuery\":0,\"FileSize\":268435456,\"StatementSize\":1000000,\"SessionParams\":{},\"Tables\":null}"]
[2022/07/13 15:16:32.187 +08:00] [INFO] [dump.go:191] ["All the dumping transactions have started. Start to unlock tables"]
[2022/07/13 15:16:32.189 +08:00] [INFO] [collector.go:232] ["backup success summary"] [total-ranges=1] [ranges-succeed=1] [ranges-failed=0] [total-take=1.951148ms] [total-rows=4] [total-kv-size=256B] [average-speed=133kB/s]
[2022/07/13 15:16:32.190 +08:00] [INFO] [main.go:81] ["dump data successfully, dumpling will exit now"]
[tidb@cdh05 20220713]$ cd ..
[tidb@cdh05 mysql]$ cd 20220714
[tidb@cdh05 20220714]$ ll
total 8
-rw-r--r-- 1 root root 142 Jul 13 15:16 metadata
-rw-r--r-- 1 root root 256 Jul 13 15:16 result.000000000.csv
4. 使用TiDB Lightning导入数据
4.1TiDB Lightning工作流程
在导入数据前自动将TiKV集群切换为导入模式 (import mode),优化写入效率并停止自动压缩,集群将无法正常对外提供服务。若tidb-lightning 崩溃,集群会留在导入模式。此时需要使用tidb-lightning-ctl手动将集群转回普通模式
bin/tidb-lightning-ctl --switch-mode=normal
在目标数据库建立架构和表,并获取其元数据
· 每张表会被分割为多个连续的区块,这样对于大表 (200 GB+)数据就可以用增量方式并行导入
· tidb-lightning会为每一个区块准备一个引擎文件 (engine file)来处理键值对。tidb-lightning并发读取SQL dump并将数据源转换成TiDB相同编码的键值对,然后将这些键值对排序写入本地临时存储文件
· 当一个引擎文件数据写入完毕后会对目标TiKV集群数据进行分裂和调度,然后导入数据到TiKV集群
· 整张表相关联的所有引擎文件完成导入后,进行校验确保导入数据无损,然后TiDB 分析 (ANALYZE) 新增数据以优化日后的操作。同时tidb-lightning 调整 AUTO_INCREMENT 值防止之后新增数据时发生冲突(表的自增 ID 是通过行数的上界估计值得到的,与表的数据文件总大小成正比。如果最后自增 ID比实际行数大属于正常现象,因为在 TiDB 中自增 ID 不一定是连续分配的)。
· tidb-lightning 自动将 TiKV 切换回普通模式(normal mode),TiDB 集群可以正常对外提供服务
4.2 安装TiDB Lightning
在工具下载页面(https://docs.pingcap.com/zh/tidb/stable/download-ecosystem-tools)下载 TiDB Lightning 安装包。TiDB Lightning 完全兼容较低版本的 TiDB 集群,建议选择最新稳定版本。如果后期需要升级的话可以直接替换二进制文件,无需其他配置。
4.3 TiDB Lightning使用方法
· 创建tidb-lightning.toml配置文件,用于指定数据源文件和还原路径等信息,下面文件内容是Local-backend最基本配置,详细都可以参考https://docs.pingcap.com/zh/tidb/stable/tidb-lightning-configuration
[tidb@cdh05 tidb-community-toolkit-v5.2.0-linux-amd64]$ vi ../tidb-lightning.toml
[lightning]
# 转换数据的并发数,默认为逻辑 CPU 数量,不需要配置。
# # # 混合部署的情况下可以配置为逻辑 CPU 的 75% 大小。
# # # region-concurrency =
level = "info"
file = "/data/tidb/log/tidb-lightning.log"
[tikv-importer]
# # # backend 设置为 local 模式
backend = "local"
#本地临时存储路径
sorted-kv-dir = "/data/backup/tidb_temp/"
#备份文件路径
[mydumper]
data-source-dir = "/data/backup/mysql/20220713/"
[tidb]
host = "172.16.43.159"
port = 4000
user = "root"
password = ""
# 表架构信息在从 TiDB 的“状态端口”获取。
status-port = 10080
# # 集群 pd 的地址
pd-addr = "172.16.43.159:2379"
配置合适的参数运行 tidb-lightning。如果直接在命令行中用 nohup 启动程序,可能会因为 SIGHUP 信号而退出,建议把 nohup 放到脚本里面,如:
[tidb@cdh05 tidb-community-toolkit-v5.2.0-linux-amd64]$ sudo bin/tidb-lightning -check-requirements=false -config ../tidb-lightning.toml > nohup.out &
4.4 使用navicat 连接TiDB,查看test库数据
看到表及表数据已经同步到TiDB了
5. 使用FineBI连接TiDB
进入FineBI决策系统,【系统管理】-> 【数据库连接管理】-> 【新建数据连接】-> 【所有】 -搜索TiDB
点击【测试连接】,提示连接成功
创建SQL数据集,查询tidb数据,右边可以预览数据
创建了数据集之后,就可以使用FineBI的仪表盘等功能进行数据的图形渲染,这里不再描述,请参考FineBI官方文档https://help.fanruan.com/finebi/。
更多TiDB 学习参考:https://www.wenjiangs.com/doc/tidb-get-started-with-tidb-lightning
https://docs.pingcap.com/zh/tidb/stable/dumpling-overview