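CGROUP OOM Control
cgroups are a popular and widely used resource isolation mechanism; Docker and Hadoop both use cgroups for resource isolation. When memory is isolated this way and a process in the group runs out of memory (OOM), the kernel can either kill the process or leave it alive (the process then blocks until memory becomes available). By default the process is killed as soon as the OOM occurs. With the cgroup v1 memory controller, the OOM killer can be turned off for a group by writing 1 to that group's memory.oom_control file: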
echo 1 > memory.oom_control
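The same switch can be flipped and checked from C. The following is a minimal sketch, assuming the cgroup v1 layout in which reading memory.oom_control returns lines such as "oom_kill_disable 1" and "under_oom 0". The names struct oom_control, parse_oom_control, and set_oom_kill_disable are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Parsed view of a cgroup v1 memory.oom_control file (hypothetical helper type). */
struct oom_control {
    int oom_kill_disable; /* 1 if the OOM killer is disabled for the group */
    int under_oom;        /* 1 if the group is currently under OOM */
};

/* Parse the text read from memory.oom_control; unknown keys are ignored.
 * Returns 0 on success, -1 if no oom_kill_disable line was found. */
int parse_oom_control(const char *text, struct oom_control *out)
{
    const char *p = text;
    int found = 0;

    out->oom_kill_disable = 0;
    out->under_oom = 0;
    while (*p) {
        char key[64];
        int val;

        if (sscanf(p, "%63s %d", key, &val) == 2) {
            if (strcmp(key, "oom_kill_disable") == 0) {
                out->oom_kill_disable = val;
                found = 1;
            } else if (strcmp(key, "under_oom") == 0) {
                out->under_oom = val;
            }
        }
        p = strchr(p, '\n');   /* advance to the next line, if any */
        if (!p)
            break;
        p++;
    }
    return found ? 0 : -1;
}

/* Write "1" (disable the OOM killer) or "0" (enable it) to the given
 * memory.oom_control path. Returns 0 on success, -1 on failure. */
int set_oom_kill_disable(const char *path, int disable)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(disable ? "1" : "0", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_oom_kill_disable with the group's memory.oom_control path and 1 is the equivalent of the echo command above. Reading the file back and passing its contents to parse_oom_control confirms whether the setting took effect.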
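Capturing OOM Events
However, once the OOM killer has killed a process, the OOM log is hard to get hold of. For this case, cgroup provides a way to listen for OOM events, and a C implementation is available: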
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/eventfd.h>
#include <errno.h>
#include <stdint.h>   /* uint64_t */
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* read, write, close */

/* Print the failing step together with errno, then exit. */
static inline void die(const char *msg)
{
    fprintf(stderr, "error: %s: %s(%d)\n", msg, strerror(errno), errno);
    exit(EXIT_FAILURE);
}

static inline void usage(void)
{
    fprintf(stderr, "usage: oom_eventfd_test <cgroup.event_control> <memory.oom_control>\n");
    exit(EXIT_FAILURE);
}

#define BUFSIZE 256

int main(int argc, char *argv[])
{
    char buf[BUFSIZE];
    int efd, cfd, ofd, wb;
    uint64_t u;

    if (argc != 3)
        usage();

    /* The kernel signals this eventfd when the cgroup hits OOM. */
    if ((efd = eventfd(0, 0)) == -1)
        die("eventfd");
    if ((cfd = open(argv[1], O_WRONLY)) == -1)
        die("cgroup.event_control");
    if ((ofd = open(argv[2], O_RDONLY)) == -1)
        die("memory.oom_control");

    /* Register by writing "<eventfd> <fd of memory.oom_control>" to cgroup.event_control. */
    if ((wb = snprintf(buf, BUFSIZE, "%d %d", efd, ofd)) >= BUFSIZE)
        die("buffer too small");
    if (write(cfd, buf, wb) == -1)
        die("write cgroup.event_control");
    if (close(cfd) == -1)
        die("close cgroup.event_control");

    /* Each read blocks until the next OOM notification arrives. */
    for (;;) {
        if (read(efd, &u, sizeof(uint64_t)) != sizeof(uint64_t))
            die("read eventfd");
        printf("mem_cgroup oom event received\n");
    }
    return 0;
}
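The listener above blocks in read() indefinitely. If the caller needs to give up after a while, one option is to poll() the eventfd first. This is a minimal sketch of that idea, not part of the original program, and wait_event is a name made up for this illustration. It relies on standard eventfd behavior: without EFD_SEMAPHORE, a read returns the accumulated counter and resets it to zero.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for the eventfd to be signaled.
 * Returns 1 and stores the accumulated counter in *count if an event
 * arrived, 0 on timeout, -1 on error. */
int wait_event(int efd, int timeout_ms, uint64_t *count)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0)
        return rc;   /* 0 = timeout, -1 = poll error */

    /* Readable: fetch the counter, which also resets it to zero. */
    if (read(efd, count, sizeof(*count)) != (ssize_t)sizeof(*count))
        return -1;
    return 1;
}
```

In the listener, the for (;;) loop could call wait_event(efd, 1000, &u) instead of read() and use the timeout branch to check a shutdown flag. Because the counter accumulates, several OOM notifications that arrive between two reads show up as a single wake-up with a count greater than one.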
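Capturing OOM Kill Events in Java
To capture OOM kill events in Java, I have Java call into C. Several JNI-based tools make calling C code from Java easy. My project is built with Maven, and I used this one: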
<dependency>
    <groupId>org.fusesource.hawtjni</groupId>
    <artifactId>hawtjni-runtime</artifactId>
    <version>1.9</version>
</dependency>
Then write your own native method that corresponds to the C function:
package ji;

import org.fusesource.hawtjni.runtime.JniArg;
import org.fusesource.hawtjni.runtime.JniClass;
import org.fusesource.hawtjni.runtime.JniMethod;
import org.fusesource.hawtjni.runtime.Library;

/**
 * Created by ji on 17-5-18.
 */
@JniClass
public class OomNotifierNative {

    private static final Library LIBRARY = new Library("native-oom-notifier", OomNotifierNative.class);

    static {
        LIBRARY.load();
    }

    // The C function returns an int status code, so the Java side declares int
    // (the original declared long with a "char *" cast on the return value).
    @JniMethod
    public static final native int oom_event_listener(@JniArg(cast = "char *") String eventCtrl, @JniArg(cast = "char *") String oomCtrl);
}
The corresponding C function is as follows:
#include "notifier.h"
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <stdint.h>   /* uint64_t */
#include <unistd.h>   /* read, write, close, access */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/eventfd.h>
#include <errno.h>
#include <string.h>

/* Note: exit() here terminates the whole JVM that loaded this library. */
static inline void die(const char *msg)
{
    fprintf(stderr, "error: %s: %s(%d)\n", msg, strerror(errno), errno);
    exit(EXIT_FAILURE);
}

#define BUFSIZE 256

/*
 * Blocks until the eventfd registered for the cgroup is signaled.
 * Returns 1 for an OOM event, 2 if the cgroup no longer exists.
 */
int oom_event_listener(char *event_ctrl, char *oom_ctrl)
{
    char buf[BUFSIZE];
    int efd, cfd, ofd, wb, rc;
    uint64_t u;

    if ((efd = eventfd(0, 0)) == -1)
        die("eventfd");
    if ((cfd = open(event_ctrl, O_WRONLY)) == -1)
        die("cgroup.event_control");
    if ((ofd = open(oom_ctrl, O_RDONLY)) == -1)
        die("memory.oom_control");

    /* Register by writing "<eventfd> <fd of memory.oom_control>" to cgroup.event_control. */
    if ((wb = snprintf(buf, BUFSIZE, "%d %d", efd, ofd)) >= BUFSIZE)
        die("buffer too small");
    if (write(cfd, buf, wb) == -1)
        die("write cgroup.event_control");
    if (close(cfd) == -1)
        die("close cgroup.event_control");

    /* Block until the kernel signals the eventfd. */
    if (read(efd, &u, sizeof(uint64_t)) != sizeof(uint64_t))
        die("read eventfd");

    /* The eventfd is also signaled when the cgroup is removed, so check that it still exists. */
    if (access(event_ctrl, F_OK) == -1) {
        printf("cgroup no longer exists\n");
        rc = 2;
    } else {
        printf("mem_cgroup oom event received\n");
        rc = 1;
    }

    /* Release the descriptors so that repeated calls do not leak them. */
    close(efd);
    close(ofd);
    return rc;
}
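The caller has to pass the paths of both control files and then interpret the returned code. The helpers below are a small sketch of that glue. cgroup_path and oom_result_name are names made up for this illustration, and the directory /sys/fs/cgroup/memory/demo mentioned afterwards is only an assumed mount point and group name.

```c
#include <stdio.h>

/* Join a cgroup directory and a control-file name into out.
 * Returns 0 on success, -1 if the result would not fit in outsz bytes. */
int cgroup_path(char *out, size_t outsz, const char *cgroup_dir, const char *file)
{
    int n = snprintf(out, outsz, "%s/%s", cgroup_dir, file);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Describe the status codes returned by oom_event_listener:
 * 1 = OOM event received, 2 = the cgroup was removed. */
const char *oom_result_name(int rc)
{
    switch (rc) {
    case 1:  return "oom";
    case 2:  return "cgroup removed";
    default: return "unknown";
    }
}
```

For a group at /sys/fs/cgroup/memory/demo, the two arguments would be built with cgroup_path(buf, sizeof buf, dir, "cgroup.event_control") and cgroup_path(buf2, sizeof buf2, dir, "memory.oom_control"). Checking the return value guards against silently truncated paths.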
When packaging with Maven, I used the following plugin:
<plugin>
    <groupId>org.fusesource.hawtjni</groupId>
    <artifactId>maven-hawtjni-plugin</artifactId>
    <version>1.9</version>
</plugin>