1. Hadoop
Hadoop is an open-source distributed computing platform under the Apache Software Foundation.
It consists of three core subsystems: HDFS, YARN, and MapReduce. HDFS is a distributed file system; YARN is the resource management system; MapReduce is an application that runs on YARN and handles distributed data processing.
HDFS: a highly fault-tolerant distributed file system that provides high-throughput data access and is designed to run on large numbers of inexpensive machines.
YARN (Yet Another Resource Negotiator): the resource manager, which provides unified resource management and scheduling for upper-layer applications and supports multiple computing frameworks.
MapReduce: a distributed programming model that dispatches (Map) the processing of a large data set to multiple nodes on the network, then collects the partial results and combines them (Reduce) into a final result.
The wider ecosystem includes HBase (a column-oriented database), Cassandra (a distributed database), Hive (SQL-like queries), Pig (a data-flow scripting platform), and ZooKeeper (a coordination service for distributed applications).
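The Map → shuffle → Reduce flow can be sketched with an ordinary shell pipeline. This is only a single-machine analogy, not Hadoop itself: the "map" stage emits one key per line, `sort` plays the role of the shuffle (grouping identical keys together), and `uniq -c` reduces each group to a count.

```shell
# Word count as a map/shuffle/reduce pipeline (local analogy, not Hadoop):
#   map:     split the input into one word per line
#   shuffle: sort brings identical keys together
#   reduce:  uniq -c collapses each group into a count
printf 'dfs yarn dfs hdfs yarn dfs\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
# prints each word with its count, most frequent first
```

In real Hadoop the map and reduce stages run in parallel on many nodes and the shuffle moves data across the network, but the data flow is the same shape.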
1.1 Based on the official image
docker pull sequenceiq/hadoop-docker
docker run -it sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
Starting sshd: [ OK ]
Starting namenodes on [9e6e76143d3d]
9e6e76143d3d: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-9e6e76143d3d.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-9e6e76143d3d.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-9e6e76143d3d.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-9e6e76143d3d.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-9e6e76143d3d.out
bash-4.1# cat /usr/local/hadoop/logs/hadoop-root-namenode-9e6e76143d3d.out
ulimit -a for user root
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 6945
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
cd $HADOOP_PREFIX
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
bin/hdfs dfs -cat output/*
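The example job above searches the input for the regular expression dfs[a-z.]+. Before running it as a MapReduce job, you can preview what that pattern matches locally (assuming a grep with extended-regex support; the property names below are just illustrative Hadoop config keys):

```shell
# Preview the job's pattern: "dfs" followed by lowercase letters or dots.
# -E enables extended regex, -o prints only the matching part of each line.
printf 'dfs.replication\ndfs.namenode.name.dir\nyarn.nodemanager.aux-services\n' \
  | grep -Eo 'dfs[a-z.]+'
# matches: dfs.replication and dfs.namenode.name.dir
```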
2. Storm
Storm is a real-time computation framework. A Storm cluster has two kinds of nodes: master nodes and worker nodes. The master node runs a daemon called "Nimbus", similar to Hadoop's JobTracker: Nimbus distributes code across the cluster, assigns tasks to machines, and monitors for failures. Each worker node runs a "Supervisor" daemon, which listens for the tasks Nimbus assigns to its machine and manages worker processes accordingly; each worker process executes a subset of a topology's tasks.
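A topology is a graph of spouts (stream sources) and bolts (stream transformations). As a loose single-machine analogy in plain shell (no Storm involved), the idea behind the RollingTopWords example deployed later — a spout emitting words, feeding bolts that count and rank them — can be sketched as:

```shell
# "Spout": emits a stream of words.
spout() { printf 'storm\nnimbus\nstorm\nzookeeper\nstorm\nnimbus\n'; }
# "Bolt" 1: count occurrences of each word.
count_bolt() { sort | uniq -c; }
# "Bolt" 2: rank the counts and keep the two most frequent words.
top_bolt() { sort -rn | head -n 2; }
spout | count_bolt | top_bolt
```

Real Storm runs each spout/bolt as parallel tasks across worker processes and processes tuples continuously rather than in one batch, but the dataflow-graph structure is the same.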
2.1 Build a Storm cluster with Compose
The cluster contains the following containers:
zookeeper: a three-node Apache ZooKeeper ensemble;
nimbus: Storm Nimbus;
ui: Storm UI;
supervisor: Storm Supervisor (one or more);
topology: a topology deployment tool; the sample application is built from the official storm-starter code
2.2 Download the code
git clone https://github.com/denverdino/docker-storm.git
2.3 A docker-compose.yml describing a typical Storm application architecture
version: '2'
services:
  zookeeper1:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk1.cloud
    environment:
      - SERVER_ID=1
      - ADDITIONAL_ZOOKEEPER_1=server.1=0.0.0.0:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=zk2.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_3=server.3=zk3.cloud:2888:3888
  zookeeper2:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk2.cloud
    environment:
      - SERVER_ID=2
      - ADDITIONAL_ZOOKEEPER_1=server.1=zk1.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=0.0.0.0:2888:3888
      - ADDITIONAL_ZOOKEEPER_3=server.3=zk3.cloud:2888:3888
  zookeeper3:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk3.cloud
    environment:
      - SERVER_ID=3
      - ADDITIONAL_ZOOKEEPER_1=server.1=zk1.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=zk2.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_3=server.3=0.0.0.0:2888:3888
  ui:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: ui -c nimbus.host=nimbus
    environment:
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    restart: always
    container_name: ui
    ports:
      - 8080:8080
    depends_on:
      - nimbus
  nimbus:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: nimbus -c nimbus.host=nimbus
    restart: always
    environment:
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    container_name: nimbus
    ports:
      - 6627:6627
  supervisor:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: supervisor -c nimbus.host=nimbus -c supervisor.slots.ports=[6700,6701,6702,6703]
    restart: always
    environment:
      - affinity:role!=supervisor
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    depends_on:
      - nimbus
  topology:
    build: ../storm-starter
    command: -c nimbus.host=nimbus jar /topology.jar org.apache.storm.starter.RollingTopWords production-topology remote
    depends_on:
      - nimbus
networks:
  default:
    external:
      name: test-storm
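Note that the Compose file declares its default network as an external network named test-storm, so that network must exist before anything is started. Assuming the default bridge driver is acceptable, it can be created once up front:

```shell
# The compose file references an external network; create it before
# running docker-compose up, or Compose will refuse to start.
docker network create test-storm
```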
2.4 Build the test image
docker-compose build
2.5 Deploy
docker-compose up -d
2.6 Check that the deployment completed
docker-compose ps
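You can also ask Nimbus itself which topologies it knows about. This assumes the storm client is on the PATH inside the image (usual for Storm images) and uses the container_name nimbus from the Compose file above:

```shell
# List topologies registered with Nimbus, from inside the nimbus container.
docker exec -it nimbus storm list
```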
2.7 Scale the supervisor to 3 instances
docker-compose scale supervisor=3
2.8 Verify the topology is running
docker-compose logs topology
You can also open the Storm UI at http://localhost:8080 to confirm the topology has been submitted.
3. Elasticsearch
Elasticsearch provides real-time distributed data storage, search, and analytics. It scales easily to hundreds of servers and can handle petabytes of structured or unstructured data.
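Assuming an Elasticsearch 7.x node is reachable at localhost:9200 (for example, a container started as in 3.1 with -p 9200:9200 added), its REST API can index and search documents directly; the index name books below is just an example:

```shell
# Index a document, then search for it by a term in its title.
curl -s -X PUT 'localhost:9200/books/_doc/1' \
  -H 'Content-Type: application/json' \
  -d '{"title": "Hadoop in Practice"}'
curl -s 'localhost:9200/books/_search?q=title:hadoop'
```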
3.1 Based on the official image
docker run -d -e "discovery.type=single-node" elasticsearch:7.6.1
docker run -d -e "discovery.type=single-node" -e "node.name=TestNode" elasticsearch:7.6.1
Note that the 7.x image is configured through environment variables: the old -Des.node.name=... command-line flags were removed after Elasticsearch 2.x, and a standalone node needs discovery.type=single-node to pass the bootstrap checks.
3.2 Use a custom configuration
docker run -d -v "$PWD/config":/usr/share/elasticsearch/config elasticsearch:7.6.1
3.3 To persist data, mount a data volume
docker run -d -v "$PWD/esdata":/usr/share/elasticsearch/data elasticsearch:7.6.1
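One host-side caveat: Elasticsearch's default mmap-based storage typically requires raising the kernel's vm.max_map_count limit on the Docker host; 262144 is the value the Elasticsearch documentation recommends.

```shell
# Run on the Docker host, not inside the container; without this,
# Elasticsearch's bootstrap check may fail with a max_map_count error.
sysctl -w vm.max_map_count=262144
```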
3.4 Build Elasticsearch and Kibana with docker-compose
version: '3.1'
services:
  elasticsearch:
    image: elasticsearch:7.6.1
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
  kibana:
    image: kibana:7.6.1
    ports:
      - 5601:5601
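After docker-compose up -d, and assuming Elasticsearch's port 9200 is published to the host, both services can be checked from the host (Elasticsearch may take a minute to start accepting connections):

```shell
# Elasticsearch answers on 9200 with its node name and version info;
# Kibana's web UI should return HTTP 200 on 5601.
curl -s http://localhost:9200
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:5601
```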