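- Approach: query cluster resource and memory information through the REST API, then decide whether to keep submitting tasks or to block them on the client side.
1. YARN REST API
(1) Introduction to REST APIs
REST maps create, read, update, and delete operations directly onto the existing HTTP methods POST, GET, PUT, and DELETE.
(2) YARN REST API
Hadoop YARN ships with a set of web service REST APIs. Through them we can query the cluster, its nodes, its applications, and application history.
- Note: web service REST API URLs follow this syntax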
http://{http address of service}/ws/{version}/{resourcepath}
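① http address of service: the host and port of the service being queried, which can be the ResourceManager (RM), a NodeManager (NM), or the History Server
② ws: a fixed path segment
③ version: the API version (v1 in the examples in this document)
④ resourcepath: the path of the resource to query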
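The URL pattern above can be assembled mechanically. The following is a minimal sketch, assuming a hypothetical helper class `YarnUrls` that is not part of Hadoop or of the original project.

```java
final class YarnUrls {
    private YarnUrls() {}

    /** Builds http://{address}/ws/{version}/{resourcePath}. */
    static String build(String address, String version, String resourcePath) {
        // Strip leading slashes so callers may pass "cluster/metrics" or "/cluster/metrics".
        String path = resourcePath.replaceAll("^/+", "");
        return "http://" + address + "/ws/" + version + "/" + path;
    }
}
```

For example, `YarnUrls.build("rm-host:8088", "v1", "cluster/metrics")` yields the metrics URL used in the next section, where `rm-host` is a placeholder host name.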
(3) REST API for YARN cluster metrics
① API description
- Request path:
http://<rm http address:port>/ws/<version>/cluster/metrics
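- Query result
- Field reference
Element | Description |
---|---|
appsSubmitted | Number of applications submitted |
appsCompleted | Number of applications completed |
appsPending | Number of applications pending |
appsRunning | Number of applications running |
appsFailed | Number of applications that failed |
appsKilled | Number of applications killed |
reservedMB | Amount of memory reserved, in MB |
availableMB | Amount of memory available, in MB |
allocatedMB | Amount of memory allocated, in MB |
totalMB | Total amount of memory, in MB |
reservedVirtualCores | Number of virtual cores reserved |
availableVirtualCores | Number of virtual cores available |
allocatedVirtualCores | Number of virtual cores allocated |
totalVirtualCores | Total number of virtual cores |
containersAllocated | Number of containers allocated |
containersReserved | Number of containers reserved |
containersPending | Number of containers pending |
totalNodes | Total number of nodes |
activeNodes | Number of active nodes |
lostNodes | Number of lost nodes |
unhealthyNodes | Number of unhealthy nodes |
decommissionedNodes | Number of decommissioned nodes |
rebootedNodes | Number of rebooted nodes |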
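The request itself is a plain HTTP GET. The sketch below shows one way to issue it with the JDK's built-in `java.net.http` client (Java 11+), as a stand-in for the project's own `HttpUtil` helper, which is not shown in this document. The class name `MetricsRequest`, the 5-second timeout, and the host passed in are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class MetricsRequest {
    private MetricsRequest() {}

    /** Builds a GET request for the cluster metrics endpoint, asking for JSON. */
    static HttpRequest build(String rmAddress) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + rmAddress + "/ws/v1/cluster/metrics"))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(5)) // illustrative timeout
                .GET()
                .build();
    }

    /** Sends the request and returns the raw response body. */
    static String fetch(String rmAddress) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(build(rmAddress), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

The returned body is a JSON document whose `clusterMetrics` object carries the fields listed in the table.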
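② REST API hands-on example
- YARN connection properties file (note that the RM_ACTIVE_HOST and RM_STANDBY_HOST keys hold port numbers, despite their names)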
# RM
RM_ACTIVE = 10.148.15.6
RM_ACTIVE_HOST = 8088
RM_STANDBY = 127.0.0.1
RM_STANDBY_HOST = 8088
PERCENT=0.98
REST_RESOURCE_URL=/ws/v1/cluster/metrics
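- YARN configuration class
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import org.apache.log4j.Logger; // assumes log4j 1.x, whose API matches the calls below

// Lazily initialized singleton that loads yarn.properties once.
public class Configuration {
    private static final Logger logger = Logger.getLogger(Configuration.class);
    private static final Properties properties = new Properties();
    // volatile is required for double-checked locking to be safe
    private static volatile Configuration configuration;

    public static Configuration getConfiguration() {
        if (configuration == null) {
            synchronized (Configuration.class) {
                if (configuration == null) {
                    configuration = new Configuration();
                }
            }
        }
        return configuration;
    }

    public String get(String key) {
        return properties.getProperty(key);
    }

    private Configuration() {
        String filePath = System.getProperty("user.dir") + "/src/main/resources/yarn.properties";
        // try-with-resources closes the stream even when load() fails
        try (InputStream in = new BufferedInputStream(new FileInputStream(filePath))) {
            properties.load(in);
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException
            logger.error("Failed to load " + filePath, e);
        }
    }
}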
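The properties file above writes some entries with spaces around `=` (`RM_ACTIVE = 10.148.15.6`) and some without (`PERCENT=0.98`); `java.util.Properties` accepts both. The sketch below parses the same format from a string and adds fallback values, so a missing key does not surface later as a `NullPointerException`. The class `YarnSettings` and its method names are hypothetical, not part of the original project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class YarnSettings {
    private final Properties props = new Properties();

    private YarnSettings() {}

    /** Parses settings from text in java.util.Properties format. */
    static YarnSettings parse(String text) {
        YarnSettings settings = new YarnSettings();
        try {
            settings.props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    /** Returns the trimmed value, or the fallback when the key is absent. */
    String get(String key, String fallback) {
        String value = props.getProperty(key);
        return value == null ? fallback : value.trim();
    }

    /** Returns the value parsed as a double, or the fallback when the key is absent. */
    double getDouble(String key, double fallback) {
        String value = props.getProperty(key);
        return value == null ? fallback : Double.parseDouble(value.trim());
    }
}
```

A call such as `YarnSettings.parse(text).getDouble("PERCENT", 0.98)` would then replace the unchecked `Double.parseDouble(conf.get("PERCENT"))` pattern.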
- YARN cluster metrics monitoring
import java.io.IOException;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;
// Assumes fastjson and log4j 1.x, whose APIs match the calls below.
// HttpUtil is the project's own HTTP helper and is not shown here.

public class CheckRM {
    private static Logger logger = Logger.getLogger(CheckRM.class);
    public static volatile boolean isReady = true;

    public static void main(String[] args) throws IOException {
        // ① Read the configuration files
        PropertyConfigurator.configure(System.getProperty("user.dir")
                + "/src/main/resources/log4j.properties");
        Configuration conf = Configuration.getConfiguration();
        String hostA = conf.get("RM_ACTIVE");
        String portA = conf.get("RM_ACTIVE_HOST");   // holds the port despite the key name
        String hostS = conf.get("RM_STANDBY");
        String portS = conf.get("RM_STANDBY_HOST");  // holds the port despite the key name
        double percent = Double.parseDouble(conf.get("PERCENT"));
        System.out.println("active host: " + hostA + ", active port: " + portA
                + "; standby host: " + hostS + ", standby port: " + portS);
        String restResourceUrl = conf.get("REST_RESOURCE_URL");
        String restUrl = "http://" + hostA + ":" + portA + restResourceUrl;
        // ② Query the cluster through the REST API
        JSONObject resource = JSON.parseObject(HttpUtil.sendGet(restUrl)).getJSONObject("clusterMetrics");
        // ③ Read the relevant metrics
        long totalMem = resource.getLong("totalMB");
        long usedMem = resource.getLong("allocatedMB");
        long totalCpu = resource.getLong("totalVirtualCores");
        long usedCpu = resource.getLong("allocatedVirtualCores");
        System.out.println("total memory is " + totalMem + " MB, allocated memory is " + usedMem
                + " MB, total vcores is " + totalCpu + ", allocated vcores is " + usedCpu + " in YARN");
        // Busy when either memory or CPU usage exceeds the configured share of capacity
        if ((totalMem * percent < usedMem) || totalCpu * percent < usedCpu) {
            System.out.println("yarn is busy, please wait...");
        } else {
            System.out.println("commit yarn task");
        }
    }
}
- Query result
(4) REST API for monitoring ResourceManager state
① REST API overview
- Request path:
http://leidi01:8088/ws/v1/cluster
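- Query result
- Field reference
Element | Description |
---|---|
id | Cluster ID |
startedOn | Time the cluster started, in milliseconds since the epoch |
state | ResourceManager state; valid values are NOTINITED, INITED, STARTED, STOPPED |
haState | ResourceManager HA state; valid values are INITIALIZING, ACTIVE, STANDBY, STOPPED |
resourceManagerVersion | Version of the ResourceManager |
resourceManagerBuildVersion | ResourceManager build string containing the build version, user, and checksum |
resourceManagerVersionBuiltOn | Timestamp when the ResourceManager was built, in milliseconds since the epoch |
hadoopVersion | Version of Hadoop Common |
hadoopBuildVersion | Hadoop Common build string containing the build version, user, and checksum |
hadoopVersionBuiltOn | Timestamp when Hadoop Common was built, in milliseconds since the epoch |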
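Because the configuration lists both an active and a standby ResourceManager, the `haState` field gives a direct way to tell which one is currently serving. The sketch below assumes the `haState` value has already been fetched from each ResourceManager's cluster endpoint and collected into a map keyed by address; the class `ActiveRmSelector` and the addresses in the usage note are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

final class ActiveRmSelector {
    private ActiveRmSelector() {}

    /** Returns the address of the first ResourceManager whose haState is ACTIVE, if any. */
    static Optional<String> findActive(Map<String, String> haStateByAddress) {
        return haStateByAddress.entrySet().stream()
                .filter(entry -> "ACTIVE".equals(entry.getValue()))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

Given `{"rm1:8088": "STANDBY", "rm2:8088": "ACTIVE"}`, the method returns `rm2:8088`; an empty result means no ResourceManager reported itself as active.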
② REST API hands-on example
- Properties file
# RM
RM_ACTIVE = 192.168.6.102
RM_ACTIVE_HOST = 8088
RM_STANDBY = 127.0.0.1
RM_STANDBY_HOST = 8088
PERCENT=0.98
REST_RESOURCE_URL=/ws/v1/cluster/metrics
REST_CLUSTER_URL=/ws/v1/cluster
- Implementation
import java.io.IOException;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;
// Assumes fastjson and log4j 1.x, whose APIs match the calls below.
// HttpUtil is the project's own HTTP helper and is not shown here.

public class CheckRM {
    private static Logger logger = Logger.getLogger(CheckRM.class);
    public static volatile boolean isReady = true;

    public static void main(String[] args) throws IOException {
        // ① Read the configuration files
        PropertyConfigurator.configure(System.getProperty("user.dir")
                + "/src/main/resources/log4j.properties");
        Configuration conf = Configuration.getConfiguration();
        String hostA = conf.get("RM_ACTIVE");
        String portA = conf.get("RM_ACTIVE_HOST");   // holds the port despite the key name
        String hostS = conf.get("RM_STANDBY");
        String portS = conf.get("RM_STANDBY_HOST");  // holds the port despite the key name
        System.out.println("active host: " + hostA + ", active port: " + portA
                + "; standby host: " + hostS + ", standby port: " + portS);
        String restClusterUrl = conf.get("REST_CLUSTER_URL");
        String restUrl1 = "http://" + hostA + ":" + portA + restClusterUrl;
        // ② Send the HTTP request to the cluster and parse the response
        JSONObject cluster = JSON.parseObject(HttpUtil.sendGet(restUrl1)).getJSONObject("clusterInfo");
        // ③ Read the relevant fields
        long id = cluster.getLong("id");
        long startedOn = cluster.getLong("startedOn");
        String state = cluster.getString("state");
        String haState = cluster.getString("haState");
        System.out.println("cluster ID: " + id + ", cluster start time: " + startedOn
                + ", RM state: " + state + ", RM HA state: " + haState);
    }
}
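The example prints `startedOn` as a raw millisecond count, which is hard to read. A small sketch of converting it to an ISO-8601 UTC timestamp with `java.time.Instant` follows; the class name `ClusterTimes` is hypothetical.

```java
import java.time.Instant;

final class ClusterTimes {
    private ClusterTimes() {}

    /** Formats milliseconds since the epoch as an ISO-8601 UTC timestamp. */
    static String toIsoUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }
}
```

`ClusterTimes.toIsoUtc(0L)` returns `1970-01-01T00:00:00Z`, and the same call applies to the `startedOn`, `resourceManagerVersionBuiltOn`, and `hadoopVersionBuiltOn` fields.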
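- Run result
2. Role of the CheckRM thread
(1) CheckRM thread business logic
- Core role: fetch the YARN cluster's memory and CPU resource information and, based on it, decide whether task submission can continue
// Check YARN cluster resources to decide whether to submit tasks
Thread checkRMThread = new Thread(new CheckRM(), "checkRMThread");
logger.info("executor checkRM thread start...");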
checkRMThread.start();
① Send an HTTP request to YARN
String restUrl = "http://" + hostA + ":" + portA + restResourceUrl;
② Poll YARN resource information in a loop
long totalMem = resource.getLong("totalMB");
long usedMem = resource.getLong("allocatedMB");
long totalCpu = resource.getLong("totalVirtualCores");
long usedCpu = resource.getLong("allocatedVirtualCores");
// Poll once every 30 seconds
Thread.sleep(30000);
③ Decide whether YARN has enough resources and act accordingly
if ((totalMem * percent < usedMem) ||
        totalCpu * percent < usedCpu) {
    logger.warn("yarn is busy, please wait...");
    // isReady is a static field, so it is assigned without "this."
    isReady = false;
} else {
    isReady = true;
}
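The decision in step ③ depends only on four metrics and the `PERCENT` threshold, so it can be pulled out into a pure function that is easy to check in isolation. This is a sketch of that refactoring, not code from the original project; the class name `YarnCapacity` is hypothetical.

```java
final class YarnCapacity {
    private YarnCapacity() {}

    /**
     * Returns true when allocated memory or allocated vcores exceed
     * the given share (for example 0.98) of total capacity.
     */
    static boolean isBusy(long totalMem, long usedMem,
                          long totalCpu, long usedCpu, double percent) {
        return totalMem * percent < usedMem || totalCpu * percent < usedCpu;
    }
}
```

With a threshold of 0.98, a cluster with 1000 MB total and 990 MB allocated counts as busy, because 980 < 990, even if plenty of vcores remain free.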
(2) CheckRM thread source code
① Properties file
# RM
RM_ACTIVE = 10.148.15.6
RM_ACTIVE_HOST = 8088
RM_STANDBY = 127.0.0.1
RM_STANDBY_HOST = 8088
PERCENT=0.98
REST_RESOURCE_URL=/ws/v1/cluster/metrics
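② Configuration class
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import org.apache.log4j.Logger; // assumes log4j 1.x, whose API matches the calls below

// Lazily initialized singleton that loads executor.properties once.
public class Configuration {
    private static final Logger logger = Logger.getLogger(Configuration.class);
    private static final Properties properties = new Properties();
    // volatile is required for double-checked locking to be safe
    private static volatile Configuration configuration;

    public static Configuration getConfiguration() {
        if (configuration == null) {
            synchronized (Configuration.class) {
                if (configuration == null) {
                    configuration = new Configuration();
                }
            }
        }
        return configuration;
    }

    public String get(String key) {
        return properties.getProperty(key);
    }

    private Configuration() {
        String filePath = System.getProperty("user.dir") + "/conf/executor.properties";
        // try-with-resources closes the stream even when load() fails
        try (InputStream in = new BufferedInputStream(new FileInputStream(filePath))) {
            properties.load(in);
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException
            logger.error("Failed to load " + filePath, e);
        }
    }
}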
③ Thread logic
import java.io.IOException;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;
// Assumes fastjson and log4j 1.x, whose APIs match the calls below.
// HttpUtil is the project's own HTTP helper and is not shown here.

public class CheckRM implements Runnable {
    private static final Logger logger = Logger.getLogger(CheckRM.class);
    // Read by submitting threads; true means YARN still has spare capacity
    public static volatile boolean isReady = true;

    @Override
    public void run() {
        logger.info("checkRM start...");
        Configuration conf = Configuration.getConfiguration();
        String hostA = conf.get("RM_ACTIVE");
        String portA = conf.get("RM_ACTIVE_HOST");   // holds the port despite the key name
        String hostS = conf.get("RM_STANDBY");
        String portS = conf.get("RM_STANDBY_HOST");  // holds the port despite the key name
        double percent = Double.parseDouble(conf.get("PERCENT"));
        logger.debug("RM host : " + hostA + ", port : " + portA
                + "; hostS : " + hostS + ", portS : " + portS);
        String restResourceUrl = conf.get("REST_RESOURCE_URL");
        boolean isActive = true;
        String restUrl = "http://" + hostA + ":" + portA + restResourceUrl;
        while (true) {
            try {
                logger.debug("restUrl is " + restUrl);
                JSONObject resource = JSON.parseObject(HttpUtil.sendGet(restUrl)).getJSONObject("clusterMetrics");
                logger.debug("yarn's status is : " + resource.toString());
                long totalMem = resource.getLong("totalMB");
                long usedMem = resource.getLong("allocatedMB");
                long totalCpu = resource.getLong("totalVirtualCores");
                long usedCpu = resource.getLong("allocatedVirtualCores");
                logger.debug("totalMem is " + totalMem + ", usedMem is " + usedMem
                        + " and totalCpu is " + totalCpu + ", usedCpu is " + usedCpu + " in YARN");
                // Busy when memory or CPU usage exceeds the configured share of capacity
                if ((totalMem * percent < usedMem) ||
                        totalCpu * percent < usedCpu) {
                    logger.warn("yarn is busy, please wait...");
                    isReady = false;
                } else {
                    isReady = true;
                }
                logger.debug("Now YARN can schedule " + isReady);
            } catch (IOException e) {
                // Request failed: toggle between the active and standby RM addresses
                logger.error(e.getMessage() + ", switching RM address", e);
                if (isActive) {
                    restUrl = "http://" + hostS + ":" + portS + restResourceUrl;
                    isActive = false;
                } else {
                    restUrl = "http://" + hostA + ":" + portA + restResourceUrl;
                    isActive = true;
                }
            } catch (Exception e) {
                logger.error("check RM error ", e);
            }
            try {
                // Poll once every 30 seconds
                Thread.sleep(30000);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling
                logger.error(e.getMessage());
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        PropertyConfigurator.configure(System.getProperty("user.dir")
                + "/conf/log4j.properties");
        CheckRM checkRM = new CheckRM();
        Thread thread = new Thread(checkRM, "CheckRMThread");
        thread.start();
    }
}
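The thread above only maintains the `isReady` flag; the code that blocks task submission on the client side is not shown in this document. One possible shape for it is a polling gate that waits until the flag becomes true or a timeout expires. The class `SubmitGate`, its parameters, and the intervals used below are assumptions for illustration.

```java
import java.util.function.BooleanSupplier;

final class SubmitGate {
    private SubmitGate() {}

    /**
     * Polls the readiness check until it returns true or the timeout elapses.
     * Returns true if ready, false on timeout or interruption.
     */
    static boolean awaitReady(BooleanSupplier ready, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!ready.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while YARN was still busy
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A submitting thread could call `SubmitGate.awaitReady(() -> CheckRM.isReady, 1000, 600000)` before each submission and proceed only when it returns true.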