Building a near-real-time index with canal

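canal is a piece of Alibaba middleware whose source is MySQL and whose target is some other store. It builds on MySQL's master-slave replication mechanism: it poses as a MySQL replica, watches for changes in MySQL's binlog, and emits structured data that the consumer on the target side converts into its own data model. In this way, data that changes in MySQL is piped into other storage systems.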
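To make that data flow a little more concrete, here is a minimal Python sketch of the conversion step on the target side. It does not use the canal client API; `RowChange` and `to_index_action` are hypothetical names that only mimic the idea of a structured row change (table, event type, row data) being turned into an operation on another store.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RowChange:
    """Hypothetical stand-in for one structured row change parsed from the binlog."""
    table: str
    event_type: str      # "INSERT", "UPDATE" or "DELETE"
    row: Dict[str, Any]  # column name -> value carried by the event


def to_index_action(change: RowChange, id_column: str = "id") -> Dict[str, Any]:
    """Convert a row change into a generic action for the target store."""
    doc_id = change.row[id_column]
    if change.event_type == "DELETE":
        return {"op": "delete", "index": change.table, "id": doc_id}
    # INSERT and UPDATE are both treated as an upsert keyed by the primary key.
    return {"op": "upsert", "index": change.table, "id": doc_id, "doc": dict(change.row)}


if __name__ == "__main__":
    change = RowChange("shop", "UPDATE", {"id": 1, "name": "new name"})
    print(to_index_action(change))
```

The real adapter does this work for us from configuration; the sketch is only meant to show what "structured change in, target operation out" looks like.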
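Download and extract

Download Alibaba's canal component from the canal download page, upload it to the /opt/software directory on the cluster node, and extract it into /opt/apps: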

# For convenience I downloaded all four packages below; if you only need canal itself, the adapter and deployer are enough
# Before extracting, create the /opt/apps/adapter-1.1.4, /opt/apps/admin-1.1.4, /opt/apps/deployer-1.1.4 and /opt/apps/example-1.1.4 directories
[yangqi@yankee software]$ tar -zvxf canal.adapter-1.1.4.tar.gz -C ../apps/adapter-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.admin-1.1.4.tar.gz -C ../apps/admin-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.deployer-1.1.4.tar.gz -C ../apps/deployer-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.example-1.1.4.tar.gz -C ../apps/example-1.1.4
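Configure MySQL

Enable master-slave replication in MySQL:

# MySQL does not enable master-slave replication by default, so first configure this MySQL instance as the master node
# If MySQL was installed from an rpm on Linux, the my.cnf configuration file is usually /etc/my.cnf
[yangqi@yankee software]$ sudo vi /etc/my.cnf
# Append the following at the end (the settings must fall under the [mysqld] section):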
=====================================================================
server-id=1
binlog_format=ROW
log_bin=mysql_bin
=====================================================================
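# After saving the configuration, restart the mysqld service
[yangqi@yankee software]$ sudo systemctl restart mysqld

# Check whether the configuration took effect
# Connect to MySQL
[yangqi@yankee software]$ mysql -u root -pxiaoer
mysql> show variables like 'log_bin';
# The following output means the binlog is enabled on this node's MySQL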
=====================================================================
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.07 sec)
=====================================================================
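If you would rather check the file than eyeball it, the three settings can be verified with a few lines of Python. This is only a sketch: `check_binlog_settings` is a name I made up, and it assumes my.cnf is a plain INI-style file (an `!includedir` directive placed before the first section header, for example, would not parse).

```python
import configparser

# Options that must be present in [mysqld]; a value of None means "any value".
REQUIRED = {"server_id": None, "binlog_format": "ROW", "log_bin": None}


def check_binlog_settings(cnf_text: str) -> list:
    """Return a list of problems found in the [mysqld] section of a my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]
    # MySQL accepts '-' and '_' interchangeably in option names.
    options = {k.replace("-", "_"): v for k, v in parser.items("mysqld")}
    problems = []
    for key, expected in REQUIRED.items():
        if key not in options:
            problems.append(f"missing {key}")
        elif expected is not None and (options[key] or "").upper() != expected:
            problems.append(f"{key} should be {expected}")
    return problems


if __name__ == "__main__":
    sample = "[mysqld]\nserver-id=1\nbinlog_format=ROW\nlog_bin=mysql_bin\n"
    print(check_binlog_settings(sample))
```

An empty list means all three settings were found. To check the real file, pass it `open("/etc/my.cnf").read()`.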

# The root account is normally not handed to replication, so a dedicated account is needed; I had already created one, so I do not create it again here
# It still needs to be granted the select, replication slave and replication client privileges
# (grant ... identified by works on MySQL 5.7 and earlier; on MySQL 8.0 the user has to be created with create user first)
mysql> grant select,replication slave,replication client on *.* to 'yangqi'@'%' identified by 'xiaoer';
# The following output means the grant succeeded
=====================================================================
Query OK, 0 rows affected, 1 warning (0.05 sec)
=====================================================================
# The yangqi account also needs the same privileges for connections from localhost
mysql> grant select,replication slave,replication client on *.* to 'yangqi'@'localhost' identified by 'xiaoer';
# The following output means the grant succeeded
=====================================================================
Query OK, 0 rows affected, 1 warning (0.05 sec)
=====================================================================
# Reload the privileges
mysql> flush privileges;
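Configure the canal pipeline
[yangqi@yankee apps]$ cd deployer-1.1.4/conf/example
# Edit the instance.properties file
[yangqi@yankee example]$ vi instance.properties
# Change the following settings (slaveId must differ from the server-id set in my.cnf)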
=====================================================================
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=2

# username/password
canal.instance.dbUsername=yangqi
canal.instance.dbPassword=xiaoer
=====================================================================
# Start the deployer
[yangqi@yankee example]$ cd ../../
[yangqi@yankee deployer-1.1.4]$ bin/startup.sh
# Check whether the deployer is running
[yangqi@yankee deployer-1.1.4]$ ps -ef | grep canal
# Or check whether something is listening on port 11111
=====================================================================
[yangqi@yankee deployer-1.1.4]$ netstat -ntulp | grep 11111
# The following output means startup succeeded
tcp        0      0 0.0.0.0:11111           0.0.0.0:*               LISTEN      45531/java  
=====================================================================
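The same port check can be scripted, which is handy when you want to wait for the deployer from another program. A minimal sketch using only the Python standard library; `is_port_open` is a helper name of my own, and the host 127.0.0.1 is an assumption (use the node's address if you check from another machine).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, and so on.
        return False


if __name__ == "__main__":
    # 11111 is the port from the netstat output; the host is assumed to be local.
    print("canal deployer reachable:", is_port_open("127.0.0.1", 11111))
```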
Startup errors

Startup sometimes fails because of memory. Check the log file logs/canal/canal_stdout.log; if it contains errors like the following:

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1073741824 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/apps/deployer-1.1.4/bin/hs_err_pid45386.log
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=96m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000700000000, 1073741824, 0) failed; error='Cannot allocate memory' (errno=12)

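The error is probably caused by insufficient memory; you can adjust the following parameters in startup.sh:

# Adjust the -Xms, -Xmx and -Xmn parameters to suit your machine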
if [ -n "$str" ]; then
        JAVA_OPTS="-server -Xms256m -Xmx256m -Xmn256m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError"
else
        JAVA_OPTS="-server -Xms256m -Xmx256m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m "
fi
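To see why lowering the heap settings helps, it is worth converting the number in the os::commit_memory line of the log into something readable. A small sketch, where `failed_commit_mib` is a hypothetical helper and the regular expression is written against the single log line quoted in this post, not against any documented log format:

```python
import re

# Matches e.g. "os::commit_memory(0x0000000700000000, 1073741824, 0) failed"
COMMIT_RE = re.compile(r"os::commit_memory\((0x[0-9a-fA-F]+), (\d+), \d+\) failed")


def failed_commit_mib(log_text: str):
    """Return the size in MiB of the first failed os::commit_memory call, or None."""
    match = COMMIT_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(2)) / (1024 * 1024)


if __name__ == "__main__":
    line = ("Java HotSpot(TM) 64-Bit Server VM warning: INFO: "
            "os::commit_memory(0x0000000700000000, 1073741824, 0) failed; "
            "error='Cannot allocate memory' (errno=12)")
    print(failed_commit_mib(line))  # 1024.0
```

So the JVM tried to commit 1024 MiB in one go, which a small virtual machine may not be able to provide; with the settings above it asks for 256 MiB instead.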
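Configure the canal adapter
Modify the canal-adapter module source
# The adapter is not compatible with elasticsearch-7.3.0, so download the source package and recompile it locally; be sure to pick the matching version when downloading
# In the canal-adapter module's pom.xml, change the versions of the four elasticsearch-related dependencies to 7.3.0
# Open a command line, go to the root of the source tree (canal-canal-1.1.4 in my case) and run
mvn clean package -DskipTests

# The first run may fail with the following error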
=====================================================================
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile (default-compile) on project client-adapter.elasticsearch: Compilation failure
[ERROR] /E:/code/JavaEE/github/canal-canal-1.1.4/client-adapter/elasticsearch/src/main/java/com/alibaba/otter/canal/client/adapter/es/ESAdapter.java:[223,56] incompatible types: org.apache.lucene.search.TotalHits cannot be converted to long
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :client-adapter.elasticsearch
=====================================================================
# Change line 223 of the ESAdapter class (the line reported in the error above) to the following
=====================================================================
long rowCount = response.getHits().getTotalHits().value;
=====================================================================

# The second run may fail with the following error
=====================================================================
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile (default-compile) on project client-adapter.elasticsearch: Compilation failure
[ERROR] /E:/code/JavaEE/github/canal-canal-1.1.4/client-adapter/elasticsearch/src/main/java/com/alibaba/otter/canal/client/adapter/es/support/ESConnection.java:[420,47] method bulk in class org.elasticsearch.client.RestHighLevelClient cannot be applied to given types;
[ERROR]   required: org.elasticsearch.action.bulk.BulkRequest,org.elasticsearch.client.RequestOptions
[ERROR]   found: org.elasticsearch.action.bulk.BulkRequest
[ERROR]   reason: actual and formal argument lists differ in length
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :client-adapter.elasticsearch
=====================================================================
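# Change line 420 of the ESConnection class to the following (add an import for org.elasticsearch.client.RequestOptions if the class does not have one yet)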
=====================================================================
return restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
=====================================================================

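# Once packaging finishes, go to the client-adapter/launcher/target/ directory, upload the newly compiled canal-adapter to the apps directory on the cluster node, and delete the adapter-1.1.4 extracted earlier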
Configure adapter-1.1.4
# Rename the canal-adapter directory to adapter-1.1.4
[yangqi@yankee apps]$ mv canal-adapter adapter-1.1.4
# Edit the adapter-1.1.4 configuration
[yangqi@yankee adapter-1.1.4]$ vi ./conf/application.yml
# Change the following entries
=====================================================================
srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/recommendedsystem?useUnicode=true
      username: yangqi
      password: xiaoer

- name: es
  # address of your own es cluster
  hosts: 192.168.21.89:9300
  properties:
    mode: transport
    # security.auth: test:123456 #  only used for rest mode
    # name of your own es cluster
    cluster.name: Yankee

# Add the es mapping: create shop.yml in the adapter-1.1.4/conf/es directory
[yangqi@yankee adapter-1.1.4]$ vi ./conf/es/shop.yml
# Write the following content
=====================================================================
dataSourceKey: defaultDS
destination: example
groupId: 
esMapping: 
    _index: shop
    _type: _doc
    _id: id
    upsert: true
    sql: "select a.id, a.name, a.tags, concat(a.latitude, ',', a.longitude) as location, a.remark_score, a.price_per_man, a.category_id, b.name as category_name, a.seller_id, c.remark_score as seller_remark_score, c.disabled_flag as seller_disabled_flag from shop a inner join category b on a.category_id = b.id inner join seller c on c.id = a.seller_id"
    commitBatch: 3000
=====================================================================
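It helps to picture what one row of that SELECT turns into. The sketch below imitates the query in plain Python for a single shop and its joined category and seller rows. The function name `build_shop_document` and all sample values are made up for illustration, and it simply keeps every selected column; which columns the adapter ultimately stores in the document body is up to the adapter.

```python
from typing import Any, Dict, Tuple


def build_shop_document(shop: Dict[str, Any], category: Dict[str, Any],
                        seller: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Mimic the SELECT in shop.yml for one shop row and its joined rows."""
    # inner join conditions: a.category_id = b.id and c.id = a.seller_id
    if shop["category_id"] != category["id"] or shop["seller_id"] != seller["id"]:
        raise ValueError("rows do not satisfy the join conditions")
    document = {
        "id": shop["id"],
        "name": shop["name"],
        "tags": shop["tags"],
        # concat(a.latitude, ',', a.longitude) as location
        "location": f"{shop['latitude']},{shop['longitude']}",
        "remark_score": shop["remark_score"],
        "price_per_man": shop["price_per_man"],
        "category_id": shop["category_id"],
        "category_name": category["name"],
        "seller_id": shop["seller_id"],
        "seller_remark_score": seller["remark_score"],
        "seller_disabled_flag": seller["disabled_flag"],
    }
    # "_id: id" in shop.yml means the id column is used as the document id.
    return str(shop["id"]), document


if __name__ == "__main__":
    # Made-up sample rows, not data from the real database.
    shop = {"id": 1, "name": "demo shop", "tags": "demo", "latitude": "31.23",
            "longitude": "121.47", "remark_score": 4.5, "price_per_man": 80,
            "category_id": 2, "seller_id": 3}
    category = {"id": 2, "name": "food"}
    seller = {"id": 3, "remark_score": 4.0, "disabled_flag": 0}
    print(build_shop_document(shop, category, seller))
```

Note that the document mixes columns from three tables but is keyed only by the shop id, which matters for the limitation discussed at the end of this post.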
Start the adapter
# adapter-1.1.4 was newly compiled, so bin/startup.sh and bin/stop.sh need to be given execute permission
[yangqi@yankee adapter-1.1.4]$ chmod 764 bin/startup.sh
[yangqi@yankee adapter-1.1.4]$ chmod 764 bin/stop.sh

# Start the adapter
[yangqi@yankee adapter-1.1.4]$ bin/startup.sh
Startup errors

Startup sometimes fails because of memory. Check the JVM crash log (bin/hs_err_pid48030.log in my case); if it contains something like the following:

Memory: 4k page, physical 1863104k(71356k free), swap 4001788k(578132k free)

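The error is probably caused by insufficient memory; you can adjust the following parameters in startup.sh:

# Adjust the -Xms, -Xmx and -Xmn parameters to suit your machine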
if [ -n "$str" ]; then
        JAVA_OPTS="-server -Xms256m -Xmx256m -Xmn256m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError"
else
        JAVA_OPTS="-server -Xms256m -Xmx256m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m "
fi
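The Memory: line from the hs_err file is easier to judge once the kilobyte figures are converted. A short sketch: `parse_memory_line` is a hypothetical helper, and the pattern is written against the one line quoted in this post.

```python
import re

MEMORY_RE = re.compile(
    r"physical (\d+)k\((\d+)k free\), swap (\d+)k\((\d+)k free\)"
)


def parse_memory_line(line: str) -> dict:
    """Extract total and free physical/swap memory in MiB from an hs_err 'Memory:' line."""
    match = MEMORY_RE.search(line)
    if match is None:
        raise ValueError("not an hs_err Memory line")
    phys_total, phys_free, swap_total, swap_free = (
        int(value) / 1024 for value in match.groups()
    )
    return {
        "physical_total_mib": phys_total,
        "physical_free_mib": phys_free,
        "swap_total_mib": swap_total,
        "swap_free_mib": swap_free,
    }


if __name__ == "__main__":
    info = parse_memory_line(
        "Memory: 4k page, physical 1863104k(71356k free), swap 4001788k(578132k free)"
    )
    print(round(info["physical_free_mib"], 1))  # 69.7
```

With only about 70 MiB of physical memory free, even a 256 MiB heap has to lean on swap, so on a machine this small it may also be worth stopping other services before starting the adapter.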
Start
# Start the adapter
[yangqi@yankee adapter-1.1.4]$ bin/startup.sh
# Check whether the adapter is running
[yangqi@yankee adapter-1.1.4]$ ps -ef | grep canal
# Or check whether the adapter has connected to the deployer on port 11111
=====================================================================
[yangqi@yankee adapter-1.1.4]$ netstat -en | grep 11111
# The following output means startup succeeded
tcp        0      0 127.0.0.1:44766         127.0.0.1:11111         ESTABLISHED 1000       5460274 
tcp        0      0 127.0.0.1:11111         127.0.0.1:44766         ESTABLISHED 1000       5459446
=====================================================================
Test canal
# Keep watching adapter-1.1.4/logs/adapter/adapter.log
[yangqi@yankee adapter-1.1.4]$ tail -f logs/adapter/adapter.log

# Modify some data in the MySQL database; adapter.log prints the modified content almost immediately

[Image: adapter.log output showing the modified data]

Checking adapter.log shows the following error:

[Image: error message in adapter.log]

Edit application.yml and delete the following entry:

# Delete this line from the es section
mode: transport

Restart the adapter, test again, and watch adapter.log:

[Image: adapter.log output after the restart]

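Build approach
When canal detects that data in MySQL has changed, it updates the index in near real time. It works out which id changed and updates the content for that id, but it is not very smart about it. Suppose we change a name field: it only updates the name field of the document whose id is 1. If two different name fields exist at the same time, canal changes both of them to the value just written in the database.
So building the index directly with the adapter clearly cannot handle the more complex cases.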