Alluxio on Amazon EMR 集成实践

ec3fe263a477bab1eb084bcba65670a3.jpeg

背景

Alluxio 是大数据技术堆栈的分布式缓存,对于 S3, hdfs 等数据的 warm up 有显著的性能提升,且与上层计算引擎如 Hive, spark, Trino 都有深度的集成,做为大数据领域的查询加速是一个不可多得的功能组件。

Alluxio 社区与 Amazon EMR 服务有深入的交互和集成,官方提供了 on Amazon EMR 的集成方案,详见 Alluxio 社区文档,亚马逊云科技也提供了快速安装部署的 bootstrap 脚本及配置,详见亚马逊云科技官方 blog

以上文档基于 emr 5.2x 版本,其 hive,Spark 等组件版本较老,且没有考虑 EMR 的多主,Task 计算实例组的集成场景,在客户使用高版本 Amazon EMR,启用 HA 及 Task 计算实例的时候,其安装部署存在缺陷导致部署失败。

本文档从 Alluxio 整体架构作为切入点,详细介绍了 Alluxio 的设计思路,使得读者能更深入的理解在 Amazon EMR 上的集成方法,同时重新梳理并修正了 Alluxio 社区 on Amazon EMR 集成的方案的缺陷,新增加了对 EMR task 实例组及多主高可用集群的支持,使得 Alluxio 在 Amazon EMR 上更能适应客户的生产环境。

Alluxio 社区文档:

https://docs.alluxio.io/os/user/stable/en/cloud/AWS-EMR.html

亚马逊云科技官方 blog:

https://aws.amazon.com/cn/blogs/china/five-minitues-to-use-alluxio-guide-emr-spark/

Alluxio architecture overview

4dbe72c6dc044b0897b329d964cfe17e.png

主要功能组件有:

Master 节点: 类似 NN 的设计,同样有 standby Master(HA)和 secondary Master(元数据镜像合并)概念, Jounary 日志节点随 master 启动,做为快速 recovery

2a564f6397058be4fe4966f7244d5d05.png

Worker 节点:与 DataNode 类似,缓存层提供 Tier Storage(MEM, SSD, HDD 三层),短路读和常规缓存穿透,3 种写缓存模式(内存 only, cache_through 可以同步和异步,throught 不写缓存)

0cc36a9755b73ffb60547567b4c1f886.png

Job master & Job worker:缓存数据的读写,alluxio 提供了类似 hadoop MR 的框架,由 job master 负责资源分配, job worker 执行数据的 pipeline 管道,缓存副本默认为1

Alluxio 的主要业务场景有:

  • hdfs/S3 缓存,查询加速

  • 多对象存储统一 UFS 路径

  • 跨 bucket, hdfs 集群数据缓存

主要功能 feature:

  • 针对 hdfs, s3 多 layer 的 backend 存储

  • 缓存读写,写支持 cache through 模式,异步更新 backend storage;读支持 push 下压,缓存击穿后直接读 backend storage

  • ttl 缓存过期时间配置

e.g:
alluxio.user.file.create.ttl = -1
alluxio.user.file.create.ttl.action = DELETE
alluxio.master.ttl.checker.interval = 1hour

*左滑查看更多

  • Impersonal/Acl/SASL HDFS 类似的权限管控功能同样适用于 Alluxio

  • 缓存同步与清理

e.g: 
缓存清理:Alluxio rm -r -U alluxio:///<path>
缓存同步:alluxio load alluxio:///<path>

*左滑查看更多

Alluxio on Amazon EMR 集成

集成架构

Alluxio 在 Amazon EMR 上架构如下所示:

8410ced50877acd8b3b2686382643640.png

如上图所示,Alluxio Master 组件作为管理模块,安装部署在 Amazon EMR 主实例组,如果需要 Alluxio HA 高可用,可以通过将 EMR 部署为多主,在 bootstrap 中打开 alluxio  HA(-h) 的 switch 开关,部署脚本会将 Alluxio Master 部署到每个 EMR 主节点实例,并在 S3 注册目录以供 Alluxio 主节点 fail over 时做 Raft 选举。

Alluxio Worker 组件安装部署在 Amazon EMR 的核心及任务实例组,由于 task 实例组客户可能配置扩缩,扩缩 task 计算节点时 Alluxio work 也会相应扩缩,其上面的缓存节点会做 rebalance,造成缓存层性能抖动,因此对于 Task 任务实例组是否安装部署 Alluxio,在 bootstrap 脚本中同样提供了 switch 开关 (-g)。

Alluxio tier storage 配置为 mem layer,UFS backend 配置为 S3 数据湖存储。

相应的 Alluxio job master, job worker 组件,和 master, worker 节点同样的部署方式,分布安装在 EMR 主节点实例组和核心、任务实例组。

集成步骤

以下章节详细介绍 Alluxio 在 Amazon EMR 上集成的实施步骤 

  • alluxio 官网下载社区版tar安装包(本文采用 7.3 )

  • 可以通过 Amazon cli 或者 emr console,指定初始化配置 json 和 bootstrap 方式进行 EMR 上 alluxio 的集成安装和部署

  • Amazon emr cli 方式:

aws emr create-cluster \
--release-label emr-6.5.0 \
–instance-groups '[{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":64,"VolumeType":"gp2″},"InstanceGroupType":"CORE","InstanceType":"m5.xlarge","Name":"Core-2″}, \
{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":64,"VolumeType":"gp2″},"VolumesPerInstance":2}]},"InstanceGroupType":"MASTER","InstanceType":"m5.xlarge","Name":"Master-1″}]' \
--applications Name=Spark Name=Presto Name=Hive \
--name try-alluxio \
--bootstrap-actions \
Path=s3://xxxxxx.serverless-analytics/alluxiodemobucket/alluxio-emr.sh,\
Args=[s3://xxxxxx.serverless-analytics/alluxiodemobucket/data/,-d,"s3://xxxxxx.serverless-analytics/alluxiodemobucket/install/alluxio-2.7.3-bin.tar.gz",-p,"alluxio.user.block.size.bytes.default=122M|alluxio.user.file.writetype.default=CACHE_THROUGH",-s,"|"] \
--configurations s3://xxxxxx.serverless-analytics/alluxiodemobucket/ \
--ec2-attributes KeyName=ec203.pem

*左滑查看更多

  • emr 控制台上方式:

boostrap 初始化参数
s3://xxxxxx.serverless-analytics/alluxiodemobucket/data/ -d s3://xxxxxx.serverless-analytics/alluxiodemobucket/install/alluxio-2.7.3-bin.tar.gz -p alluxio.user.block.size.bytes.default=122M|alluxio.user.file.writetype.default=CACHE_THROUGH -s |

boostrap初始化参数
s3://xxxxxx.serverless-analytics/alluxiodemobucket/data/ -d s3://xxxxxx.serverless-analytics/alluxiodemobucket/install/alluxio-2.7.3-bin.tar.gz -p alluxio.user.block.size.bytes.default=122M|alluxio.user.file.writetype.default=CACHE_THROUGH -s |

配置文件及boostrap脚本:
s3://xxxxxx.serverless-analytics/alluxiodemobucket/install:安装tar包
s3://xxxxxx.serverless-analytics/alluxiodemobucket/data:测试under store底层存储
s3://xxxxxx.serverless-analytics/alluxiodemobucket/*.sh|*.json : bootstrap脚本及initial 配置

初始化Alluxio json集群配置:
{"Classification":"presto-connector-hive","Properties":{"hive.force-local-scheduling":"true","hive.metastore":"glue","hive.s3-file-system-type":"PRESTO"}},{"Classification":"hadoop-env","Configurations":[{"Classification":"export","Properties":{"HADOOP_CLASSPATH":"/opt/alluxio/client/alluxio-client.jar:${HADOOP_CLASSPATH}"}}],"Properties":{}}

*左滑查看更多

Boostrap 启动脚本说明

  • Bootstrap 主要完成 alluxio 集成步骤,包括解压 alluxio tar 安装包,等待 emr hdfs 等关键组件启动,然后解压并修改 alluxio 配置文件,启动 alluxio 各个组件进程

  • Alluxio 社区官方提供了和 Amazon emr 的集成 boostrap,但只限于 27 版本,高版本(e.g: emr6.5)上组件组件端口会冲突,且没有考虑 task 节点实例类型的扩缩及 HA 等场景,本方案将原有的脚本主要升级和优化如下:  

  • Bootstrap 脚本在 task 节点挂起,因为找不到 DataNode 进程,官方脚本内没有判断 task 实例类型,会一直循环等待

wait_for_hadoop func需要修改,如果是task,不再等待datanode进程,进入下一步骤
  local -r is_task="false"
  grep -i "instanceRole" /mnt/var/lib/info/job-flow-state.txt|grep -i task
  if [ $? = "0" ];then
     is_task="true"
  fi

*左滑查看更多

  • 如果不需要扩展 Task 实例上的 Alluxio worker,需要 boostrap 脚本中指定参数以便识别放过 Task 实例节点的 alluxio 安装部署过程

e)ignore_task_node="true"
        ;;
  if [[ "${ignore_task_node}" = "true" ]]; then
     "don't install alluxio on task node, boostrap exit!"
     exit 0
  fi

*左滑查看更多

  • 默认没有支持 HA 的 bootstrap 脚本,需要在 bootstrap 里面判断多个 master 节点并启动 standby alluxio master

  • 这里采用 embedded JN 日志节点的形式,不占用 EMR 上 Zookeeper 的资源:

  • Alluxio HA 模式下 task 节点需要增加 HA rpc 访问地址列表

if [[ "${ha_mode}" = "true" ]]; then
      namenodes=$(xmllint --xpath "/configuration/property[name='${namenode_prop}']/value/text()" "${ALLUXIO_HOME}/conf/hdfs-site.xml")

      alluxio_journal_addre=""
      alluxio_rpc_addre=""
      for namenode in ${namenodes//,/ }; do
        if [[ "${alluxio_rpc_addre}" != "" ]]; then
          alluxio_rpc_addre=$alluxio_rpc_addre","
          alluxio_journal_addre=$alluxio_journal_addre","
        fi
        alluxio_rpc_addre=$alluxio_rpc_addre"${namenode}:19998"
        alluxio_journal_addre=$alluxio_journal_addre"${namenode}:19200"
      done      
      set_alluxio_property alluxio.master.rpc.addresses $alluxio_rpc_addre
  fi

*左滑查看更多

验证 Alluxio works

EMR 启动后,会自动拉起  Alluxio master, worker 进程,在 Alluxio 的 admin 29999 端口的管理控制台 console 上,可以方便的查看到集群的状态及 capacity 容量、UFS 路径等信息

Alluxio console

cbc6dcfc4fcaf06f20814cbc5c4fd252.png

cf48e18278a506c0d5de9eb6d19c931a.png

计算框架集成

create external table s3_test1 (userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 's3://xxxxxx.serverless-analytics/alluxiodemobucket/data/s3_test1'

Hive alluxio读写
0: jdbc:hive2://xx.xx.xx.xx:10000/default> shwo create table alluxiodb.test1;
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `alluxiodb.test1`(           |
|   `userid` int,                                    |
|   `age` int,                                       |
|   `gender` char(1),                                |
|   `occupation` string,                             |
|   `zipcode` string)                                |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
| WITH SERDEPROPERTIES (                             |
|   'field.delim'='|',                               |
|   'serialization.format'='|')                      |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.mapred.TextInputFormat'       |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                           |
|   'alluxio:/testTable'                             |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2')                         |
+----------------------------------------------------+
0: jdbc:hive2://xx.xx.xx.xx:10000/default>INSERT INTO alluxiodb.test1 VALUES (2, 24, 'F', 'Developer', '12345');
0: jdbc:hive2://xx.xx.xx.xx:10000/default> select * from test1;
--+
| test1.userid  | test1.age  | test1.gender  | test1.occupation  | test1.zipcode  |
+---------------+------------+---------------+-------------------+----------------+
| 1             | 24         | F             | Developer         | 12345          |
| 4             | 46         | F             | Developer         | 12345          |
| 5             | 56         | A             | Developer         | 12345          |
| 2             | 224        | F             | Developer         | 12345        



Trino alluxio query:
trino:alluxiodb> select * from test1;
 userid | age | gender | occupation | zipcode
--------+-----+--------+------------+---------
      1 |  24 | F      | Developer  | 12345
      2 | 224 | F      | Developer  | 12345


Spark alluxio读写
>>> spark.sql("insert into  alluxiodb.test1 values (3,33,'T','Admin','222222')")
>>> spark.sql("select * from alluxiodb.test1").show(1000,False)                 +------+---+------+----------+-------+
|userid|age|gender|occupation|zipcode|
+------+---+------+----------+-------+
|2     |224|F     |Developer |12345  |
|3     |33 |T     |Admin     |222222 |
|1     |24 |F     |Developer |12345  |
+------+---+------+----------+-------+

*左滑查看更多

benchmark 测试

采用 hive tpcds benchmark utility 生成并 load 测试数据,可以方便的对比通过 s3 路径和 alluxio 缓存路径两种场景下查询性能。

  • alluxio hive benchmarch result:

hive -i testbench_alluxio.settings
hive> use tpcds_bin_partitioned_orc_30;
hive> source query55.sql;
+-----------+------------------------+---------------------+
| brand_id  |         brand          |      ext_price      |
+-----------+------------------------+---------------------+
| 2002002   | importoimporto #2      | 328158.27           |
| 4004002   | edu packedu pack #2    | 278740.06999999995  |
| 2004002   | edu packimporto #2     | 243453.09999999998  |
| 2001002   | amalgimporto #2        | 226828.09000000003  |
| 4003002   | exportiedu pack #2     | 194363.72000000003  |
| 5004002   | edu packscholar #2     | 178895.29000000004  |
| 5003002   | exportischolar #2      | 158463.69           |
| 3003002   | exportiexporti #2      | 126980.51999999999  |
| 4001002   | amalgedu pack #2       | 107703.01000000001  |
| 5002002   | importoscholar #2      | 104491.46000000002  |
| 3002002   | importoexporti #2      | 87758.88            |
| 8010006   | univmaxi #6            | 87110.54999999999   |
| 10004013  | edu packunivamalg #13  | 76879.23            |
| 8008006   | namelessnameless #6    | 74991.82            |
| 6010006   | univbrand #6           | 72163.57            |
| 7006008   | corpbrand #8           | 71066.42            |
| 2003002   | exportiimporto #2      | 69029.02            |
| 6015006   | scholarbrand #6        | 66395.84            |
| 4002002   | importoedu pack #2     | 65223.01999999999   |
| 8013002   | exportimaxi #2         | 63271.69            |
| 9007002   | brandmaxi #2           | 61539.36000000001   |
| 3001001   | edu packscholar #2     | 60449.65            |
| 10003014  | exportiunivamalg #14   | 56505.57000000001   |
| 3001001   | exportiexporti #2      | 55458.64            |
| 7015004   | scholarnameless #4     | 55006.78999999999   |
| 5002001   | exportischolar #2      | 54996.270000000004  |
| 6014008   | edu packbrand #8       | 54793.31999999999   |
| 4003001   | amalgcorp #8           | 53875.51000000001   |
| 8011006   | amalgmaxi #6           | 52845.8             |
| 1002002   | importoamalg #2        | 52328.259999999995  |
| 2003001   | maxinameless #6        | 50577.89            |
| 9016008   | corpunivamalg #8       | 49700.12            |
| 7015006   | scholarnameless #6     | 49592.7             |
| 9005004   | scholarmaxi #4         | 49205.19            |
| 4003001   | exportiimporto #2      | 48604.97            |
| 2002001   | edu packamalg #2       | 48451.979999999996  |
| 9012002   | importounivamalg #2    | 48429.990000000005  |
| 7012004   | importonameless #4     | 48303.979999999996  |
| 10009004  | edu packamalg #2       | 48301.05            |
| 1004001   | amalgexporti #2        | 48215.880000000005  |
| 1001002   | amalgamalg #2          | 47018.94            |
| 9015010   | scholarunivamalg #10   | 46495.380000000005  |
| 6005007   | importobrand #6        | 46233.630000000005  |
| 9010004   | univunivamalg #4       | 46164.04            |
| 8015006   | scholarmaxi #6         | 46143.41            |
| 7016002   | corpnameless #2        | 46133.31            |
| 10006011  | corpunivamalg #11      | 46085.81            |
| 9001003   | importoamalg #2        | 45303.18            |
| 10015011  | scholarnameless #2     | 45299.06            |
| 5002001   | importoexporti #2      | 44757.73000000001   |
| 10010004  | univamalgamalg #4      | 43347.899999999994  |
| 2004001   | importoamalg #2        | 43127.46000000001   |
| 9002011   | edu packcorp #8        | 41740.42            |
| 10008009  | namelessunivamalg #9   | 41369.479999999996  |
| 8002010   | importonameless #10    | 41046.02            |
| 6002008   | importocorp #8         | 40795.42999999999   |
| 7007010   | brandbrand #10         | 40591.95            |
| 6012002   | importobrand #2        | 40545.72            |
| 2003001   | amalgexporti #2        | 39679.76            |
| 8005007   | exportischolar #2      | 39593.39            |
| 9015011   | importoscholar #2      | 39419.41            |
| 9005012   | scholarmaxi #12        | 39151.020000000004  |
| 9016012   | corpunivamalg #12      | 39117.53            |
| 5003001   | exportiexporti #2      | 39061.0             |
| 9002002   | importomaxi #2         | 38763.61            |
| 6010004   | univbrand #4           | 38375.29            |
| 8016009   | edu packamalg #2       | 37759.44            |
| 8003010   | exportinameless #10    | 37605.38            |
| 10010013  | univamalgamalg #13     | 37567.33            |
| 4003001   | importoexporti #2      | 37455.68            |
| 4001001   | importoedu pack #2     | 36809.149999999994  |
| 8006003   | edu packimporto #2     | 36687.04            |
| 6004004   | edu packcorp #4        | 36384.1             |
| 5004001   | scholarbrand #8        | 36258.58            |
| 10006004  | importonameless #10    | 36226.62            |
| 2002001   | scholarbrand #4        | 36138.93            |
| 7001010   | amalgbrand #10         | 35986.36            |
| 8015005   | edu packunivamalg #4   | 35956.33            |
| 10014008  | edu packamalgamalg #8  | 35371.05            |
| 7004005   | amalgamalg #2          | 35265.32            |
| 6016004   | corpbrand #4           | 35256.990000000005  |
| 4002001   | amalgedu pack #2       | 35183.9             |
+-----------+------------------------+---------------------+

*左滑查看更多

  • s3 hive benchmarch result:

hive -i testbench_s3.settings
hive> use tpcds_bin_partitioned_orc_30;
hive> source query55.sql;
+-----------+------------------------+---------------------+
| brand_id  |         brand          |      ext_price      |
+-----------+------------------------+---------------------+
| 4003002   | exportiedu pack #2     | 324254.89           |
| 4004002   | edu packedu pack #2    | 241747.01000000004  |
| 2004002   | edu packimporto #2     | 214636.82999999996  |
| 3003002   | exportiexporti #2      | 158815.92           |
| 2002002   | importoimporto #2      | 126878.37000000002  |
| 2001002   | amalgimporto #2        | 123531.46           |
| 4001002   | amalgedu pack #2       | 114080.09000000003  |
| 5003002   | exportischolar #2      | 103824.98000000001  |
| 5004002   | edu packscholar #2     | 97543.4             |
| 3002002   | importoexporti #2      | 90002.6             |
| 6010006   | univbrand #6           | 72953.48000000001   |
| 6015006   | scholarbrand #6        | 67252.34000000001   |
| 7001010   | amalgbrand #10         | 60368.53            |
| 4002001   | amalgmaxi #12          | 59648.09            |
| 5002002   | importoscholar #2      | 59202.14            |
| 9007008   | brandmaxi #8           | 57989.22            |
| 2003002   | exportiimporto #2      | 57869.27            |
| 1002002   | importoamalg #2        | 57119.29000000001   |
| 3001001   | exportiexporti #2      | 56381.43            |
| 7010004   | univnameless #4        | 55796.41            |
| 4002002   | importoedu pack #2     | 55696.91            |
| 8001010   | amalgnameless #10      | 54025.19            |
| 9016012   | corpunivamalg #12      | 53992.149999999994  |
| 5002001   | exportischolar #2      | 53784.57000000001   |
| 4003001   | amalgcorp #8           | 52727.09            |
| 9001002   | amalgmaxi #2           | 52115.3             |
| 1002001   | amalgnameless #2       | 51994.130000000005  |
| 8003010   | exportinameless #10    | 51100.64            |
| 9003009   | edu packamalg #2       | 50413.2             |
| 10007003  | scholarbrand #6        | 50027.27            |
| 7006008   | corpbrand #8           | 49443.380000000005  |
| 9016010   | corpunivamalg #10      | 49181.66000000001   |
| 9005010   | scholarmaxi #10        | 49019.619999999995  |
| 4001001   | importoedu pack #2     | 47280.47            |
| 4004001   | amalgcorp #2           | 46830.21000000001   |
| 10007011  | brandunivamalg #11     | 46815.659999999996  |
| 9003008   | exportimaxi #8         | 46731.72            |
| 1003001   | amalgnameless #2       | 46250.08            |
| 8010006   | univmaxi #6            | 45460.4             |
| 8013002   | exportimaxi #2         | 44836.49            |
| 5004001   | scholarbrand #8        | 43770.06            |
| 10006011  | corpunivamalg #11      | 43461.3             |
| 2002001   | edu packamalg #2       | 42729.89            |
| 6016001   | importoamalg #2        | 42298.35999999999   |
| 5003001   | univunivamalg #4       | 42290.45            |
| 7004002   | edu packbrand #2       | 42222.060000000005  |
| 6009004   | maxicorp #4            | 42131.72            |
| 2002001   | importoexporti #2      | 41864.04            |
| 8006006   | corpnameless #6        | 41825.83            |
| 10008009  | namelessunivamalg #9   | 40665.31            |
| 4003001   | univbrand #2           | 40330.67            |
| 7016002   | corpnameless #2        | 40026.4             |
| 2004001   | corpmaxi #8            | 38924.82            |
| 7009001   | amalgedu pack #2       | 38711.04            |
| 6013004   | exportibrand #4        | 38703.41            |
| 8002010   | importonameless #10    | 38438.670000000006  |
| 9010004   | univunivamalg #4       | 38294.21            |
| 2004001   | importoimporto #2      | 37814.93            |
| 9010002   | univunivamalg #2       | 37780.55            |
| 3003001   | amalgexporti #2        | 37501.25            |
| 8014006   | edu packmaxi #6        | 35914.21000000001   |
| 8011006   | amalgmaxi #6           | 35302.51            |
| 8013007   | amalgcorp #4           | 34994.01            |
| 7003006   | exportibrand #6        | 34596.55            |
| 6009006   | maxicorp #6            | 44116.12            |
| 8002004   | importonameless #4     | 43876.82000000001   |
| 8001008   | amalgnameless #8       | 43666.869999999995  |
| 7002006   | importobrand #6        | 43574.33            |
| 7013008   | exportinameless #8     | 43497.73            |
| 6014008   | edu packbrand #8       | 43381.46            |
| 10014007  | edu packamalgamalg #7  | 42982.090000000004  |
| 9006004   | corpmaxi #4            | 42437.49            |
| 9016008   | corpunivamalg #8       | 41782.0             |
| 10006015  | amalgamalg #2          | 31716.129999999997  |
| 2003001   | univnameless #4        | 31491.340000000004  |
+-----------+------------------------+----------

*左滑查看更多

可以看到平均任务的 QPS 提升 30%~40% 左右,部分任务提升 50% 以上。

参考资料

Alluxio on Amazon EMR 安装部署 :

https://aws.amazon.com/cn/blogs/china/five-minitues-to-use-alluxio-guide-emr-spark

Alluxio 社区 EMR 集成指南: 

https://docs.alluxio.io/os/user/stable/en/cloud/AWS-EMR.html

Amazon EMR 集群:

https://docs.aws.amazon.com/zh_cn/emr/latest/ManagementGuide/emr-what-is-emr.html

小结

本文详细介绍了在 Amazon EMR 上 alluxio 集群的安装部署,包括 bootstrap 脚本及 EMR 集群初始化 json 配置,并通过 hive tpcds 标准 benchmark,比较了开启 Alluxio 加速的 EMR 集群上 hive sql 查询的性能提升。

本篇作者

e9ef34751a071485bb833afcfaffdb22.jpeg

唐清原

亚马逊云科技数据分析解决方案架构师,负责Amazon Data Analytic服务方案架构设计以及性能优化,迁移,治理等Deep Dive支持。10+数据领域研发及架构设计经验,历任Oracle 高级咨询顾问,咪咕文化数据集市高级架构师,澳新银行数据分析领域架构师职务。在大数据,数据湖,智能湖仓等项目有丰富实战经验。

e104d681bee556c248240c8768fce3c1.jpeg

陈昊

亚马逊云科技合作伙伴解决方案架构师,有将近 20 年的 IT 从业经验,在企业应用开发、架构设计及建设方面具有丰富的实践经验。目前主要负责 亚马逊云科技 (中国)合作伙伴的方案架构咨询和设计工作,致力于亚马逊云服务在国内的应用推广以及帮助合作伙伴构建更高效的亚马逊云服务解决方案。

22ecb9ff0c3df4345a6c73cdcac5acc1.jpeg

419899dc70d24d505b97094b4864124b.gif

c9ea7076bfe073a22ecfdd6a7b8dcd0a.gif

听说,点完下面4个按钮

就不会碰到bug了!

8dd844dc6ed00571ac11134b79476849.gif

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值