Ambari Design


Design Goals

Platform Independence

The system architecture must support any hardware and operating system, e.g., RHEL, SLES, Ubuntu, Windows, etc. Platform-dependent components (e.g., components that handle yum, rpm packages, Debian packages, etc.) should be made pluggable behind well-defined interfaces.


Pluggable Components

The architecture must not assume any specific tools and technologies. Any specific tools and technologies must be encapsulated behind pluggable components. The architecture will focus on the pluggability of Puppet and its related components, namely provisioning, optionally the configuration tool, and the database used to persist state. The goal is not to support an alternative to Puppet right away, but to keep the architecture easy to extend so that a replacement can be made in the future.
This pluggability goal does not extend to standardizing inter-component protocols or interfaces for working with third-party components.


Versioning and Upgrade

The Ambari components running on the various nodes must support multiple protocol versions, which allows components to be upgraded independently. Upgrading any Ambari component must not affect the cluster state.


Extensibility

The design should support easy addition of new services, components, and APIs. Extensibility also means it should be easy to modify any configuration or provisioning step for the Hadoop stack. In addition, the possibility of supporting Hadoop stacks other than HDP needs to be taken into account.


Failure Recovery

The system must be able to recover to a consistent state from any component failure. After recovering, the system should try to complete the pending operations. If some failures are unrecoverable, the failure should still leave the system in a consistent state.


Security

Security implies:

1) authentication and role-based authorization of Ambari users (both API and Web UI),

2) installation, management, and monitoring of a Hadoop stack secured via Kerberos, and

3) authentication and encryption of over-the-wire communication between Ambari components (e.g., Ambari master-agent communication).


Error Tracing

The design strives to simplify the process of tracing failures. Failures should be presented to the user with enough detail and pointers for analysis.


Near Real-Time Feedback for Operations

For operations that take a while to complete, the system needs to be able to provide feedback to the user in near real time: the progress of the currently running tasks, the percentage of the operation completed, a link to the operation logs, and so on. In previous versions of Ambari this was not available, owing to Puppet's master-agent architecture and its status reporting mechanism.


Terminology

Service

A service refers to a service in the Hadoop stack, e.g., HDFS, HBase, and Pig. A service may have multiple components (e.g., HDFS has NameNode, Secondary NameNode, DataNode, etc.). A service may also be just a client library (e.g., Pig does not have any daemons and is only a client library).


Component

A service consists of one or more components. For example, HDFS has three components: NameNode, DataNode, and Secondary NameNode. Components may be optional. A component may span multiple nodes (e.g., DataNode instances on multiple nodes).


Node/Host

A node refers to a machine in the cluster. Node and host are used interchangeably in this document.



Node-Component

A node-component refers to an instance of a component on a particular node. For example, a particular DataNode instance on a particular node is a node-component.


Operation

An operation refers to a set of changes or actions performed on a cluster to satisfy 
a user request or to achieve a desirable state change in the cluster. For example, 
starting of a service is an operation and running a smoke test is an operation. If a 
user requests to add a new service to the cluster and that includes running a smoke 
test as well, then the entire set of actions to meet the user request will constitute an 
operation.  An operation can consist of multiple “actions” that are ordered (see 
below).


Task

Task is the unit of work that is sent to a node to execute. A task is the work that a node has to carry out as part of an action.  For example, an “action” can consist of installing a datanode on Node n1 and installing a datanode and a secondary namenode on Node n2.  In this case, the “task” for n1 will be to install a datanode, and the “tasks” for n2 will be to install both a datanode and a secondary namenode.


Stage

A stage refers to a set of tasks, required to complete an operation, that are independent of each other; all tasks in the same stage can be run in parallel across different nodes.


Action

An “action” consists of one or more tasks on one or more machines. Each action is tracked by an action id, and nodes report status at least at the granularity of actions. An action can be considered a stage under execution. Unless specified otherwise, a stage and an action correspond one-to-one in this document. An action id maps one-to-one to a (request id, stage id) pair.




Stage Plan

An operation typically consists of many tasks executed on multiple machines, and these tasks usually have dependencies that require them to run in a particular order: some tasks need to complete before others can be scheduled. Therefore, the tasks required by an operation need to be divided into stages, such that the tasks of each stage must complete before the tasks of the next stage are run, while all tasks in the same stage can be scheduled to run in parallel on different nodes.
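
To make this concrete, here is a minimal sketch in Python of how a stage plan could be modeled; the class and field names are illustrative assumptions, not Ambari's actual data structures:

# A minimal sketch of a stage plan (hypothetical names, not Ambari's
# actual classes). Stages run strictly in order; the tasks within one
# stage are independent and may run in parallel on their nodes.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Task:
    role: str        # e.g. "datanode"
    command: str     # e.g. "INSTALL", "START"

@dataclass
class Stage:
    # hostname -> tasks to run on that node within this stage
    tasks_by_node: Dict[str, List[Task]] = field(default_factory=dict)

@dataclass
class StagePlan:
    request_id: int
    stages: List[Stage] = field(default_factory=list)

# Example: install DataNodes (and a Secondary NameNode) first, then
# start the NameNode in a later stage.
plan = StagePlan(request_id=1, stages=[
    Stage({"n1": [Task("datanode", "INSTALL")],
           "n2": [Task("datanode", "INSTALL"),
                  Task("secondary_namenode", "INSTALL")]}),
    Stage({"n1": [Task("namenode", "START")]}),
])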


Manifest

A manifest refers to the definition of a task that is sent to a node for execution. The manifest must completely define the task and must be serializable. A manifest can also be persisted to disk for recovery or for record keeping.


Role

A role maps to either a component (e.g., NameNode, DataNode) or an action (e.g., HDFS rebalancing, HBase smoke test, other admin commands, etc.).



Ambari Architecture

The following diagram describes the high-level architecture of Ambari.



The following diagram describes the design of the Ambari server:



The following diagram describes the design of the Ambari agent:



Use cases

In this section we cover a few basic use cases and describe, at a high level, how the requests are served by the system and how the components interact.


1.  Add service: Add a new service to an existing cluster.  Let's take a specific example of adding the HBase service to an existing cluster which is already running HDFS.  The HBase master and slaves will be added to a subset of the existing nodes (no additional nodes). It will go through the following steps:
• The request lands on the server via API and a request id is generated and 
attached to the request. A handler for this API is invoked in the Coordinator.
• The API Handler implements the steps needed to start a new service to an 
existing cluster. In this case the steps would be: install all the service 
components along with the required prerequisites, start the prerequisites and 
the service components in a specific order, and re-configure Nagios server to 
add new service monitoring as well.
• The Coordinator will lookup in Dependency Tracker and find the prerequisites 
for HBase. Dependency Tracker will return HDFS and ZooKeeper 
components. Coordinator will also look up the dependencies for the Nagios server, which will return the HBase client.  Dependency Tracker will also return the
required state of the required components. Thus, Coordinator will know the 
entire set of components and their required states.  Coordinator will set the 
desired state for all these components in the DB.
• During the previous step, the Coordinator may also determine that it requires
the user’s input to select nodes for ZooKeeper and may return an appropriate 
response.  This depends on the API semantics.
• The Coordinator will then pass the list of components and their desired states 
to the Stage Planner.  Stage Planner will return the staged sequence of 
operations that need to be performed on each node where these components 
are to be installed/started/modified. The Stage Planner will also generate the 
manifest (tasks for each individual nodes for each stage) using the Manifest 
Generator.
• Coordinator will pass this ordered list of stages to the Action Manager with 
the corresponding request id.
• Action Manager will update the state of each node-component, in the FSM, 
which will reflect that an operation is in progress. Note that the FSM for each 
affected node-component is updated.  In this step, the FSM may detect an 
invalid event and throw failure, which will abort the operation and all actions 
will be marked as failed with an error.
• Action Manager will create an action id for each operation and add it to the Plan. The Action Manager will pick the first Stage from the plan and add each action in this Stage to the queue for each of the affected nodes. The next Stage will be picked when the first Stage completes. Action Manager will also start a timer for scheduled actions (see the sketch after this list).
• Heartbeat Handler will receive the response for the actions and notify the 
Action Manager.  Action Manager will send an event to the FSM to update the 
state. In case of a timeout, the action will be scheduled again or marked as 
failed. Once all nodes for an action have reached completion (response 
received or final timeout) the action is considered completed. Once all actions 
for a Stage are completed the Stage is considered completed.
• Action completion is also recorded in the database.

• The Action Manager proceeds to the next Stage and repeats.
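
The stage-by-stage scheduling described in the steps above can be sketched as follows; the function and parameter names are assumptions for illustration, not the actual Ambari implementation:

# Illustrative sketch of the Action Manager's stage loop (hypothetical
# names). A stage's actions are queued to the affected nodes, and the
# next stage is picked only when the current one completes.
import time

def run_plan(plan, node_queues, stage_completed, timeout_s=600):
    # node_queues: hostname -> queue consumed by the Heartbeat Handler,
    # which delivers the queued tasks in heartbeat responses.
    for stage in plan.stages:
        for node, tasks in stage.tasks_by_node.items():
            for task in tasks:
                node_queues[node].put(task)
        # Wait until every action in this stage has a response or has
        # hit its final timeout; only then move on to the next stage.
        deadline = time.time() + timeout_s
        while not stage_completed(stage):
            if time.time() > deadline:
                raise RuntimeError("stage timed out; operation aborted")
            time.sleep(1)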


2.  Run Smoke Test: The cluster is already active with the HDFS and HBase services, and the user wants to run the HBase smoke test.
• The request lands on the server via API and a request id is generated and 
attached to the request. A handler for this API is invoked in the Coordinator.
• The API Handler invokes Dependency Tracker and finds that the HBase service should be up and running for this. The API Handler throws back an error if the HBase live status is not running. The Handler also determines where the smoke test should be run. Dependency Tracker will expose a method which will tell the Coordinator which client component is required on the host where the smoke test should be run. The Stage Planner is invoked to generate a plan for the smoke test. In this case the plan is a simple one: a single stage with a single node-component.
• The rest of the steps are similar to the previous use case.  In this case the FSM will be specifically for the hbase-smoketest role.


3.  Reconfigure Service

• The cluster is already active with the services and their dependencies.
• A request to save the configuration lands on the server and the new config snapshot is stored on the server. This request should also have information on what service(s) and/or roles-hosts the config change affects. This implies that the persistence layer needs to track, on a per-service/component/node basis, the last config version that was changed that affects the object in question (see the sketch after this list).
• A user can make multiple such calls to save multiple checkpoints.
• At some point, the user decides to deploy the new configs. In this scenario, 
the user will send a request to deploy the new configs to the required 
service/component/node-component.
• When this request lands on the server, via the coordinator handler, it will 
result in updating of the desired config version of the object in question to the 
version specified or the latest config based on the API specs. 
• The Coordinator will execute re-configure in two steps. First, it will generate a 
stage plan to stop the services in question. Then it will generate a stage plan 
to start the services with new configurations.  Coordinator will append the two 
plans, stop followed by start, and will pass it to the Action Manager. The rest of the steps proceed as in the previous use cases.
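
A hedged sketch of the per-object config versioning implied by this use case (the schema is an assumption; Ambari's actual persistence layer may differ):

# Hypothetical config snapshot store. Each save creates a new immutable
# version; each service/component/node-component tracks the config
# version it is desired to run with.
class ConfigStore:
    def __init__(self):
        self.versions = []   # config dicts; the list index is the version id
        self.affects = {}    # version id -> set of affected object keys
        self.desired = {}    # object key -> desired config version

    def save_snapshot(self, config, affected_objects):
        # A user may call this several times to save checkpoints.
        self.versions.append(dict(config))
        version = len(self.versions) - 1
        self.affects[version] = set(affected_objects)
        return version

    def deploy(self, obj, version=None):
        # Deploy the given version, or default to the latest, per the API specs.
        self.desired[obj] = version if version is not None else len(self.versions) - 1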


4.  Ambari master crashed and restarted

• Assuming the Ambari master dies, there are multiple scenarios that need to 
be addressed.
• Assumptions:
o Desired state has been persisted in the DB
o All pending actions are defined in the database. 
o Live status is tracked in the DB. However, the agent cannot send back live status currently, so recovery will be based on DB state.
o All actions are idempotent. 
• Actions required
o Ambari Master
! The action layer needs to re-queue all pending or previously queued (but incomplete) stages back to the Heartbeat Handler and allow the agents to re-execute the steps in question (see the sketch after this list).
! The Ambari server should not accept any new actions/requests 
until it has recovered completely.*
! If there is any discrepancy between the actual live state and the desired state (i.e., live state is STARTED or STARTING but desired state is STOPPED), there should be a trigger to the stage planner/transaction manager to initiate the actions that get the state to the desired state.  However, if the live state is STOP_FAILED and the desired state is STOPPED, then this is a no-op for recovery, as the Ambari server does not initiate actions itself in such scenarios. For the latter case, the onus is on the admin/user to re-initiate a stop for the nodes that failed [this may change depending on API specifications and product decision].
o Ambari Agent
! The agent should not die if the master suddenly disappears. It should continue to poll at regular intervals and recover as needed when the master comes back up.
! The Ambari agent should keep all the necessary information it planned to send to the master in case of a connection failure, and re-send the information after the master comes back up.
! It may need to re-register if it was previously in the process of registering.
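
A sketch of the master-restart recovery described above; the helper names (the db accessors, requeue, plan_transition) are hypothetical:

# Hypothetical recovery routine run when the Ambari master restarts.
def recover_on_restart(db, heartbeat_handler, stage_planner):
    # Re-queue every stage not recorded as completed or failed; since
    # actions are idempotent, re-executing them is safe.
    for stage in db.stages_not_in_state("COMPLETED", "FAILED"):
        heartbeat_handler.requeue(stage)
    # Reconcile live state against desired state.
    for nc in db.all_node_components():
        if nc.live_state in ("STARTED", "STARTING") and nc.desired_state == "STOPPED":
            stage_planner.plan_transition(nc, to_state="STOPPED")
        # STOP_FAILED with desired state STOPPED stays a no-op here: the
        # admin/user must re-initiate the stop for the failed nodes.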


5.  Decommission a subset of Datanodes.

• Decommissioning will be implemented as an action performed on the hadoop-admin role in the Puppet layer.  hadoop-admin will be a new role defined to cover hadoop admin operations. This role will require hadoop-client to be installed and admin privileges to be available.
• The manifest for the decommission operation will consist of the hadoop-admin role with certain parameters, like a list of datanodes and a flag specifying the action (see the sketch after this list).
• Coordinator will identify a node which is designated to run hadoop-admin 
actions. This information must be available in cluster state. 
• The decommission action will follow the state machine for node-roles that are actions. It will be considered successful when the decommissioning has been started at the namenode, i.e., when the admin command for the datanodes succeeds.
• Information on whether nodes have been decommissioned or not should be queried separately. This information is available in the namenode UI. Ambari should have another API to query decommissioning or decommissioned datanodes.
• Ambari does not keep track of whether a datanode is decommissioning or decommissioned. To get rid of decommissioned nodes from Ambari, a separate API call should be made to Ambari to stop/uninstall those datanodes.
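
As a hedged illustration, the decommission manifest for the hadoop-admin role could look roughly like this (the field names are assumptions, not Ambari's actual manifest format):

# Hypothetical manifest for the decommission action; field names are
# illustrative only. It names the role, the action flag, and the list
# of datanodes to decommission.
manifest = {
    "role": "hadoop-admin",
    "command": "DECOMMISSION",
    "params": {
        "slave_type": "datanode",
        "excluded_hosts": ["dn1.example.com", "dn2.example.com"],
    },
}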


Agent: 

Agents will heartbeat to the master every few seconds and will receive commands from the master in the heartbeat responses.  Heartbeat responses will be the only way for the master to send a command to the agent.  The command will be queued in the action queue, which will be picked up by the action executioner.  The action executioner will pick the right tool (Puppet, Python, etc.) for execution depending on the command type and action type. Thus the actions sent in the heartbeat response will be processed asynchronously at the agent. The action executioner will put the response or progress messages on the message queue. The agent will send everything on the message queue to the master in the next heartbeat.
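
A minimal sketch of that agent loop, assuming hypothetical interfaces for the server stub and the two queues:

# Minimal sketch of the agent heartbeat loop (hypothetical names).
import time

def agent_loop(server, action_queue, message_queue, interval_s=10):
    while True:
        # Send everything accumulated on the message queue; the heartbeat
        # response is the master's only channel for sending commands.
        response = server.heartbeat(messages=message_queue.drain())
        for command in response.get("commands", []):
            # Queued here and processed asynchronously by the action
            # executioner, which picks the right tool per command type.
            action_queue.put(command)
        time.sleep(interval_s)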


Recovery

There are two main choices in terms of master recovery.
• Recovery based on actions: In this case every action is persisted and, upon a restart, the master checks for pending actions and reschedules them. The state of the cluster is also persisted in the database, and the master rebuilds the state machines upon a restart. There could be a race condition where some actions complete but the master crashes before recording their completion; this is handled by ensuring actions are idempotent, and the master will re-schedule all actions that are not marked as completed or failed in the DB. Persisted actions can be viewed as redo logs, which is a very common approach.
• An alternative to the above is recovery based on the desired state. In this approach the master persists the desired state and, upon restart, matches the desired state against the live state and tries to restore the cluster to the desired state.
The approach based on actions gels better with our overall design because we pre-plan an operation and persist it; even if we persisted only the desired state, recovery would still require re-planning the actions. Also, the desired-state approach doesn't capture certain operations that don't change the state of the cluster from Ambari's point of view, e.g., smoke tests or HDFS re-balancing.
Agent recovery requires just an agent restart because the agent is stateless. An agent failure will be detected by the master through heartbeat loss beyond a threshold of time.

TBD: How is agent restarted?


Security

API Authentication Methods
One way is to use HTTP Basic Authentication.  This means the client would pass the 
credentials (or a base64 encoded version of it) to the server on every request.  The 
server would have to validate credentials on every call.  It makes it easy for CLI 
clients to use the API this way.  Also the session state is not stored on the server.
Another way is to utilize HTTP Session and Cookies.  Once the server validates the 
credentials, the server generates and stores the session ID in its store.  The session 
ID is sent back to the client to be stored as a cookie.  The client can pass this 
session ID on subsequent calls.  This is more efficient in that credentials need not be 
validated on every API request.  VMware’s vCloud REST API 1.0 used to support 
this.  However, they dropped this in later versions due to certain security concerns:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1025884
Also, this approach forces the server to store session state, which is a big no-no for RESTfulness (if we care).
We can support both approaches.
For CLI, the first method is preferred.  For browser-based applications, the latter method avoids making redundant credential validations.
Both authentication methods (and any API call that requires an authenticated user) should be done over HTTPS.  User credentials or the session token should never be transferred over HTTP.
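
For illustration, here are both methods with Python's requests library (the endpoint URL and credentials are placeholders):

# HTTP Basic Authentication: credentials accompany every call, so the
# server validates them each time. Must be HTTPS.
import requests

resp = requests.get(
    "https://ambari.example.com:8443/api/v1/clusters",  # illustrative URL
    auth=("admin", "secret"),   # sent base64-encoded on each request
)
resp.raise_for_status()

# Session/cookie alternative: after the first authenticated call, the
# Session object re-sends the server's session cookie, so credentials
# need not be validated on every request.
session = requests.Session()
session.auth = ("admin", "secret")
session.get("https://ambari.example.com:8443/api/v1/clusters")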


Bootstrap

In HMC (released as 1.0) bootstrapping was a very integrated process of 
installing/configuring the hosts. In Ambari 1.1, bootstrapping will be a helper routine 
to install the Ambari agents on the hosts. In HMC 1.0, the bootstrap process figures 
out the information about a host and adds that to the database. This will now be 
done as a part of Ambari agent registering to the server. The bootstrap process will 
just SSH onto the hosts and install the Ambari agent.  No database changes will be 
done as part of doing a bootstrap.
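
A hedged sketch of such a bootstrap helper (the user, package name, and install command are assumptions; the real routine would handle key setup, retries, and platform differences):

# Sketch of the bootstrap helper: SSH onto each host and install the
# agent. No database changes; the agent registers itself afterwards.
import subprocess

def bootstrap(hosts, user="root"):
    for host in hosts:
        subprocess.run(
            ["ssh", f"{user}@{host}", "yum", "install", "-y", "ambari-agent"],
            check=True,
        )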