Distributed Processing and Big Data Platforms

1. Hadoop

    Hadoop is an open-source distributed computing platform maintained by the Apache Software Foundation.

    It consists of three core subsystems: HDFS, YARN, and MapReduce. HDFS is a distributed file system; YARN is the resource management system; and MapReduce is an application framework that runs on YARN and manages distributed processing.

HDFS: a highly fault-tolerant distributed file system, designed to be deployed on large numbers of inexpensive machines while providing high-throughput data access.

YARN (Yet Another Resource Negotiator): the resource manager, which provides unified resource management and scheduling for upper-layer applications and is compatible with multiple computing frameworks.

MapReduce: a distributed programming model that distributes (Map) the processing of a large data set across many nodes in the network, then collects the partial results and reduces them (Reduce). A worked shell example appears at the end of section 1.1.

    The wider Hadoop ecosystem also includes HBase (a column-oriented database), Cassandra (a distributed database), Hive (a data warehouse with SQL support), Pig (a high-level data-flow language and execution framework), and Zookeeper (a coordination service for distributed applications).

1.1 Using the official image

docker pull sequenceiq/hadoop-docker
docker run -it sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
[root@kubernetes /data/docker/elasticsearch]# docker run -it sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
/
Starting sshd:                                             [  OK  ]
Starting namenodes on [9e6e76143d3d]
9e6e76143d3d: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-9e6e76143d3d.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-9e6e76143d3d.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-9e6e76143d3d.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-9e6e76143d3d.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-9e6e76143d3d.out
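
Before running the example job, it is worth confirming that all five daemons came up. A quick hedged check is to list the Java processes inside the container (this assumes the image's JDK puts jps on the PATH):

bash-4.1# jps

The output should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, plus Jps itself.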

bash-4.1# cat /usr/local/hadoop/logs/hadoop-root-namenode-9e6e76143d3d.out 
ulimit -a for user root
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 6945
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
cd $HADOOP_PREFIX

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

bin/hdfs dfs -cat output/*
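
The grep job above is one concrete instance of the Map/Reduce model from the overview. As a second illustration, Hadoop Streaming lets arbitrary executables act as the Map and Reduce steps. A minimal sketch in the spirit of the official streaming documentation, assuming the streaming jar sits at its usual path in the Hadoop 2.7.0 distribution:

# /bin/cat acts as an identity mapper; /usr/bin/wc reduces the records
# routed to it down to line/word/byte counts.
bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.0.jar \
    -input input \
    -output streaming-output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc

bin/hdfs dfs -cat streaming-output/*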

 

2. Storm

    Storm is a real-time computation framework. A Storm cluster contains two kinds of nodes: the master node and the worker nodes. The master node runs a daemon called "Nimbus", similar to Hadoop's JobTracker: Nimbus distributes code across the cluster, assigns tasks to machines, and monitors for failures. Each worker node runs a "Supervisor" daemon, which listens for the tasks Nimbus assigns to its machine and manages worker processes accordingly; each worker process executes a subset of a topology's tasks.
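
A topology is packaged as a jar and submitted to Nimbus, which then distributes it to the supervisors. Outside of Docker, the submission uses the standard storm CLI; for example, with the same storm-starter class that the Compose setup below deploys:

storm jar topology.jar org.apache.storm.starter.RollingTopWords production-topology remote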

2.1 Building a Storm cluster with Compose

The deployment contains the following containers:

zookeeper: a three-node Apache Zookeeper deployment;

nimbus: Storm Nimbus;

ui: Storm UI;

supervisor: Storm Supervisor (one or more);

topology: a tool container that deploys the topology; the sample application is built from the official storm-starter example code.

2.2 Download the code

git clone https://github.com/denverdino/docker-storm.git

2.3 The docker-compose.yml file describing a typical Storm application architecture

version: '2'
services:
  zookeeper1:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk1.cloud
    environment:
      - SERVER_ID=1
      - ADDITIONAL_ZOOKEEPER_1=server.1=0.0.0.0:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=zk2.cloud:2888:3888 
      - ADDITIONAL_ZOOKEEPER_3=server.3=zk3.cloud:2888:3888
  zookeeper2:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk2.cloud
    environment:
      - SERVER_ID=2
      - ADDITIONAL_ZOOKEEPER_1=server.1=zk1.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=0.0.0.0:2888:3888 
      - ADDITIONAL_ZOOKEEPER_3=server.3=zk3.cloud:2888:3888
  zookeeper3:
    image: registry.aliyuncs.com/denverdino/zookeeper:3.4.8
    container_name: zk3.cloud
    environment:
      - SERVER_ID=3
      - ADDITIONAL_ZOOKEEPER_1=server.1=zk1.cloud:2888:3888
      - ADDITIONAL_ZOOKEEPER_2=server.2=zk2.cloud:2888:3888 
      - ADDITIONAL_ZOOKEEPER_3=server.3=0.0.0.0:2888:3888
  ui:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: ui -c nimbus.host=nimbus
    environment:
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    restart: always
    container_name: ui
    ports:
      - 8080:8080
    depends_on:
      - nimbus
  nimbus:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: nimbus -c nimbus.host=nimbus
    restart: always
    environment:
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    container_name: nimbus
    ports:
      - 6627:6627
  supervisor:
    image: registry.aliyuncs.com/denverdino/baqend-storm:1.0.0
    command: supervisor -c nimbus.host=nimbus -c supervisor.slots.ports=[6700,6701,6702,6703]
    restart: always
    environment:
      - affinity:role!=supervisor
      - STORM_ZOOKEEPER_SERVERS=zk1.cloud,zk2.cloud,zk3.cloud
    depends_on:
      - nimbus
  topology:
    build: ../storm-starter
    command: -c nimbus.host=nimbus jar /topology.jar org.apache.storm.starter.RollingTopWords production-topology remote
    depends_on:
      - nimbus
networks:
  default:
    external: 
      name: test-storm
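
Note that the compose file declares its default network as an existing external network, so the network must be created once before the stack is started:

docker network create test-storm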

2.4 Build the test image

docker-compose build

2.5 Deploy

docker-compose up -d

2.6 Check that the deployment completed

docker-compose ps 

2.7 Scale the supervisor instances to 3

docker-compose scale supervisor=3

2.8 Verify that everything is running

Open the Storm UI (published on host port 8080 in the compose file) in a browser; Nimbus, the supervisors, and the submitted topology should all be visible. If the one-shot topology container exited before the cluster was ready, start it again to resubmit the topology:

 docker-compose start topology
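
The cluster state can also be queried from the command line. A sketch, assuming (as the command lines in the compose file suggest) that the baqend-storm image's entrypoint is the storm launcher, so that list invokes the standard storm list subcommand:

docker run --rm --network test-storm registry.aliyuncs.com/denverdino/baqend-storm:1.0.0 \
    list -c nimbus.host=nimbus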

 

3. Elasticsearch

     Elasticsearch provides real-time distributed data storage together with analytics and search. It scales out easily to hundreds of servers and can handle petabytes of structured or unstructured data.

3.1 Using the official image

Elasticsearch 7.x will not bootstrap a standalone node without discovery configuration, so discovery.type=single-node is passed below; other settings, such as the node name, can be supplied the same way (the elasticsearch -Des.node.name=... flag form used by old image versions predates 7.x):

docker run -d -e discovery.type=single-node elasticsearch:7.6.1
docker run -d -e discovery.type=single-node -e node.name=TestNode elasticsearch:7.6.1
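
To reach the node from the host, publish the HTTP port and query the REST API as a quick health check (assuming host port 9200 is free):

docker run -d -p 9200:9200 -e discovery.type=single-node elasticsearch:7.6.1
curl http://localhost:9200/_cluster/health?pretty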

3.2 Using a custom configuration

docker run -d -v "$PWD/config":/usr/share/elasticsearch/config elasticsearch:7.6.1

3.3 Persisting data requires a data volume

docker run -d -e discovery.type=single-node -v "$PWD/esdata":/usr/share/elasticsearch/data elasticsearch:7.6.1
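
A named volume works just as well and sidesteps host-path permission issues (a variant using a hypothetical volume named esdata):

docker volume create esdata
docker run -d -e discovery.type=single-node -v esdata:/usr/share/elasticsearch/data elasticsearch:7.6.1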

3.4 Setting up Elasticsearch with docker-compose

version: '3.1'
services:
  elasticsearch:
    image: elasticsearch:7.6.1
    environment:
      - discovery.type=single-node
  kibana:
    image: kibana:7.6.1
    ports:
      - 5601:5601
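
Bring the stack up with docker-compose up -d. The Kibana image connects to http://elasticsearch:9200 by default, which matches the service name above, so once both containers are healthy the Kibana UI is reachable on host port 5601:

docker-compose up -d
# then open http://localhost:5601 in a browser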

 
