I write down every clever trick I come across. This time, let's look at some clever tricks in the HBase source code:
I. Creating objects:
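How can we flexibly add or remove implementations of an interface, and make sure each one is loaded only once when it is used? Let's see how the HBase source does it.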
1.1
@InterfaceAudience.Private
public class CompatibilitySingletonFactory extends CompatibilityFactory {
// Enum-based singleton. Why an enum singleton rather than lazy initialization or double-checked locking?
public static enum SingletonStorage {
INSTANCE;
private final Object lock = new Object();
private final Map<Class, Object> instances = new HashMap<>();
}
private static final Logger LOG = LoggerFactory.getLogger(CompatibilitySingletonFactory.class);
/**
* This is a static only class don't let anyone create an instance.
*/
protected CompatibilitySingletonFactory() { }
/**
* Get the singleton instance of any class defined by the compatibility jars
*
* @return the singleton
*/
@SuppressWarnings("unchecked")
public static <T> T getInstance(Class<T> klass) {
synchronized (SingletonStorage.INSTANCE.lock) {
T instance = (T) SingletonStorage.INSTANCE.instances.get(klass);
if (instance == null) {
try {
// What is this? (ServiceLoader, covered below)
ServiceLoader<T> loader = ServiceLoader.load(klass);
Iterator<T> it = loader.iterator();
instance = it.next();
if (it.hasNext()) {
StringBuilder msg = new StringBuilder();
msg.append("ServiceLoader provided more than one implementation for class: ")
.append(klass)
.append(", using implementation: ").append(instance.getClass())
.append(", other implementations: {");
while (it.hasNext()) {
msg.append(it.next()).append(" ");
}
msg.append("}");
LOG.warn(msg.toString());
}
} catch (Exception e) {
throw new RuntimeException(createExceptionString(klass), e);
} catch (Error e) {
throw new RuntimeException(createExceptionString(klass), e);
}
// If there was nothing returned and no exception then throw an exception.
if (instance == null) {
throw new RuntimeException(createExceptionString(klass));
}
SingletonStorage.INSTANCE.instances.put(klass, instance);
}
return instance;
}
}
}
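A minimal sketch of the same structure, assuming we only care about the caching part: one lock and one map live inside an enum singleton, and each requested type is created once and then reused. To stay self-contained it swaps ServiceLoader for a caller-supplied Supplier; SingletonRegistry and its getInstance(Class, Supplier) signature are hypothetical, not HBase API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry mirroring the structure of CompatibilitySingletonFactory.
final class SingletonRegistry {
    // The enum holds the shared lock and cache; the JVM creates INSTANCE exactly once.
    private enum Storage {
        INSTANCE;
        private final Object lock = new Object();
        private final Map<Class<?>, Object> instances = new HashMap<>();
    }

    private SingletonRegistry() { }

    // Returns the cached instance for klass, creating it with factory on first use.
    // Assumption: a Supplier replaces ServiceLoader so no META-INF files are needed.
    static <T> T getInstance(Class<T> klass, Supplier<? extends T> factory) {
        synchronized (Storage.INSTANCE.lock) {
            Object cached = Storage.INSTANCE.instances.get(klass);
            if (cached == null) {
                cached = factory.get();
                if (cached == null) {
                    throw new RuntimeException("No implementation for " + klass);
                }
                Storage.INSTANCE.instances.put(klass, cached);
            }
            return klass.cast(cached);
        }
    }
}
```

A second call for the same type returns the cached object and never runs its factory again, which is the "load only once" guarantee the HBase factory provides.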
1. Eager initialization: the singleton instance is built when the class is loaded, ahead of first use (pre-loading).
class Test {
// Created once, when the class is initialized
private static final Test instance = new Test();
private Test() {
}
public static Test getInstance() {
return instance;
}
}
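Pros:
1. Thread-safe.
2. The static instance already exists once the class is loaded, so calls return quickly.
Cons:
Not resource-efficient: getInstance() may never be called, yet if another static method of the class is invoked, or the class is initialized some other way (e.g. Class.forName), the instance is still created.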
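The drawback in the last point can be observed directly. In this sketch (all class names are hypothetical), calling an unrelated static method initializes the class, and class initialization runs the eager field initializer, so the singleton is built even though getInstance() was never called. The counter lives in a separate class so that reading it does not itself initialize EagerSingleton.

```java
// Hypothetical counter kept outside EagerSingleton so reading it does not trigger initialization.
final class CreationLog {
    static int eagerCreated = 0;
    private CreationLog() { }
}

final class EagerSingleton {
    // Built during class initialization, before any static method body runs.
    private static final EagerSingleton INSTANCE = new EagerSingleton();

    private EagerSingleton() {
        CreationLog.eagerCreated++;
    }

    static EagerSingleton getInstance() {
        return INSTANCE;
    }

    // An unrelated static helper: calling it still initializes the class.
    static String version() {
        return "1.0";
    }
}

class EagerDemo {
    public static void main(String[] args) {
        System.out.println(CreationLog.eagerCreated); // 0: EagerSingleton not initialized yet
        EagerSingleton.version();                     // only the helper is called...
        System.out.println(CreationLog.eagerCreated); // ...but the instance now exists: 1
    }
}
```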
2. Lazy initialization: the singleton instance is built the first time it is used (deferred initialization).
class Test {
private Test() {
}
private static Test instance = null;
public static Test getInstance() {
// Several callers can reach this check at the same time, so two instances may be created
if (instance == null) {
instance = new Test();
}
return instance;
}
}
3.1 A first improvement:
class Test {
private Test() {
}
private static Test instance = null;
public static Test getInstance() {
synchronized (Test.class) {
if (instance == null) {
instance = new Test();
}
}
return instance;
}
}
This is too slow: every single call has to acquire the lock just to check whether the instance exists.
3.2 A further improvement (double-checked locking):
class Test {
private Test() {
}
private static Test instance = null;
public static Test getInstance() {
if (instance == null) {
synchronized (Test.class) {
if (instance == null) {
instance = new Test();
}
}
}
return instance;
}
}
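Perfect now? Not quite, there is still a pitfall. Inside a synchronized block the code can be reasoned about as if it were single-threaded, and for single-threaded code the as-if-serial rule applies: the compiler and CPU may reorder instructions as long as that one thread cannot tell the difference. instance = new Test() is roughly three steps: allocate memory, run the constructor, and assign the reference to instance. The last two may be reordered, so another thread doing the first, unlocked instance == null check can see a non-null reference to an object whose constructor has not finished. People often bring up happens-before at this point, so a word on what it covers: happens-before is about visibility between threads. It does not fix the order of two independent operations; it only says that if A happens-before B, A's effects are visible to B, and without such a relationship the operations may be reordered freely. The outer null check runs outside the lock, so nothing orders it after the constructor's writes. And where does thread B read the value from? From memory, which leads into the Java memory model. But that is a digression.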
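Declaring instance as volatile fixes this. Since Java 5 (JSR-133), a write to a volatile field happens-before every subsequent read of that field, so a thread that reads a non-null instance also sees the fully constructed object.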
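A sketch of the corrected double-checked locking from 3.2, assuming Java 5 or later. The class name and the CREATED counter are hypothetical, added only so the example can check that a single instance is built. Copying the field into a local variable is a common refinement that avoids a second volatile read on the fast path.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SafeLazySingleton {
    // Counts constructor calls so the example can verify only one instance was built.
    static final AtomicInteger CREATED = new AtomicInteger();

    // volatile: the write of the reference cannot be reordered before the constructor's writes.
    private static volatile SafeLazySingleton instance;

    private SafeLazySingleton() {
        CREATED.incrementAndGet();
    }

    static SafeLazySingleton getInstance() {
        SafeLazySingleton local = instance;      // single volatile read on the fast path
        if (local == null) {
            synchronized (SafeLazySingleton.class) {
                local = instance;                // re-check under the lock
                if (local == null) {
                    local = new SafeLazySingleton();
                    instance = local;            // volatile write publishes the finished object
                }
            }
        }
        return local;
    }
}
```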
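4. What does the enum version buy us? First, safety: reflection cannot be used to create a second instance. Second, serialization: deserializing an enum constant returns the existing instance instead of a new one. And, like the eager version, it is thread-safe, because the JVM creates the constant during class initialization.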
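Both claims can be checked with a short sketch (EnumSingleton and EnumSingletonChecks are hypothetical names). Constructor.newInstance refuses enum types with an IllegalArgumentException, and Java serialization writes an enum constant by name, so reading it back yields the existing constant rather than a copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.Constructor;

// Hypothetical enum singleton used only for this demonstration.
enum EnumSingleton {
    INSTANCE
}

final class EnumSingletonChecks {
    private EnumSingletonChecks() { }

    // Tries to build a second instance via reflection; the JDK rejects enum types.
    static boolean reflectionBlocked() {
        try {
            Constructor<?> c = EnumSingleton.class.getDeclaredConstructors()[0];
            c.setAccessible(true);
            c.newInstance("SECOND", 1);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // "Cannot reflectively create enum objects"
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    // Serializes and deserializes INSTANCE; enums are resolved by name on the way back.
    static EnumSingleton roundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(EnumSingleton.INSTANCE);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (EnumSingleton) in.readObject();
        }
    }
}
```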
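The ServiceLoader class loads what is configured under the META-INF/services directory. A file there, named after an interface's fully qualified name, lists implementation classes of that interface, one per line. It is a neat trick that makes implementations easy to plug in or swap out.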
https://blog.csdn.net/shi2huang/article/details/80308531
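A minimal ServiceLoader sketch, assuming Java 8 or later; GreetingService, DefaultGreetingService and GreetingServices are hypothetical. A provider jar would register an implementation by shipping a file META-INF/services/GreetingService (the interface's fully qualified name; here it has no package) whose content is the implementation's class name. With no such file on the classpath the loop finds nothing and this sketch falls back to a default, whereas the HBase factory above treats that case as an error.

```java
import java.util.ServiceLoader;

// Hypothetical service interface. Providers are listed in META-INF/services/GreetingService.
interface GreetingService {
    String greet(String name);
}

// Hypothetical fallback used when no provider is registered on the classpath.
final class DefaultGreetingService implements GreetingService {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

final class GreetingServices {
    private GreetingServices() { }

    // Returns the first provider ServiceLoader finds, or the default if there is none.
    static GreetingService load() {
        ServiceLoader<GreetingService> loader = ServiceLoader.load(GreetingService.class);
        for (GreetingService service : loader) {
            return service; // first provider found on the classpath wins
        }
        return new DefaultGreetingService();
    }
}
```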
Got it?
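II. The MapReduce model. When reading Hadoop code, it is easy to get lost in its layers; usually that is because we have not stopped to think about the design. So let's think: if you were designing it, how would you do it?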
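1. Anything a program handles must follow rules or be describable; otherwise it cannot be programmed. Rules mean constraints, and constraints mean scope.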
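Start with the input. A mapper knows what it reads, but the framework cannot know in advance which key/value types each job will read or emit. How do we handle that? With generics. Hence: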
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
This is the top-level abstraction: input and output are both abstract, and there is a single map stage.
protected void map(KEYIN key, VALUEIN value,
Context context) throws IOException, InterruptedException {
context.write((KEYOUT) key, (VALUEOUT) value);
}
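This abstracts the first action: take an input, process it, and emit output. The output environment (Context) is abstracted as well. By default, map simply passes each pair through unchanged.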
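A stripped-down sketch of this abstraction, with hypothetical names (MiniMapper, WordSplitMapper, ListContext) and no Hadoop dependency. The base class leaves all four types open and defaults to passing pairs through; a concrete mapper pins the types down; and the mapper only ever writes to a Context, never to a concrete output.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of the idea: generic types, output through a context.
abstract class MiniMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // The "output environment": the mapper only knows it can write pairs to it.
    interface Context<KO, VO> {
        void write(KO key, VO value);
    }

    // Default behavior mirrors the identity map above: pass the pair through unchanged.
    @SuppressWarnings("unchecked")
    protected void map(KEYIN key, VALUEIN value, Context<KEYOUT, VALUEOUT> context) {
        context.write((KEYOUT) key, (VALUEOUT) value);
    }
}

// A concrete mapper that fixes all four types: (offset, line) -> (word, 1).
class WordSplitMapper extends MiniMapper<Long, String, String, Integer> {
    @Override
    protected void map(Long offset, String line, Context<String, Integer> context) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1);
            }
        }
    }
}

// Collects pairs in memory, standing in for the framework's output side.
class ListContext<K, V> implements MiniMapper.Context<K, V> {
    final List<Map.Entry<K, V>> out = new ArrayList<>();

    @Override
    public void write(K key, V value) {
        out.add(new SimpleEntry<>(key, value));
    }
}
```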
Let's look at an example:
public class CellCounter extends Configured implements Tool
First, this one:
public class Configured implements Configurable {
private Configuration conf;
public Configured() {
this((Configuration)null);
}
public Configured(Configuration conf) {
this.setConf(conf);
}
public void setConf(Configuration conf) {
this.conf = conf;
}
public Configuration getConf() {
return this.conf;
}
}
@Public
@Stable
public interface Tool extends Configurable {
int run(String[] var1) throws Exception;
}
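Why not merge the two into a single type? Isn't the split redundant?
It keeps the configuration separate from the act of running.
But if they are separate, why does a tool still extend a configuration class?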
public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
if (conf == null) {
conf = new Configuration();
}
GenericOptionsParser parser = new GenericOptionsParser(conf, args);
// Look here: this is where the configuration is injected into the tool. Do I have to write this in every implementation? No.
tool.setConf(conf);
String[] toolArgs = parser.getRemainingArgs();
return tool.run(toolArgs);
}
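Look here, everyone:
Configured implements setConf, and every tool that extends it inherits that method. When the tool (the subclass) is passed to ToolRunner.run, it brings the parent's conf field and setter along, so ToolRunner can inject the configuration once and no implementation has to repeat that code.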
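The division of labor can be sketched without Hadoop. Here a plain Map<String, String> stands in for Configuration, and every Mini* name is a hypothetical stand-in for the corresponding Hadoop type. The runner injects the configuration once, and a concrete tool only writes run().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Configurable / Configured / Tool / ToolRunner.
interface MiniConfigurable {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
}

// Holds the configuration once, so every tool inherits the field and accessors.
class MiniConfigured implements MiniConfigurable {
    private Map<String, String> conf;

    @Override
    public void setConf(Map<String, String> conf) {
        this.conf = conf;
    }

    @Override
    public Map<String, String> getConf() {
        return conf;
    }
}

// "Running" is a separate contract from "being configurable".
interface MiniTool extends MiniConfigurable {
    int run(String[] args) throws Exception;
}

final class MiniToolRunner {
    private MiniToolRunner() { }

    // Injects the configuration before running, so tools never do it themselves.
    static int run(Map<String, String> conf, MiniTool tool, String[] args) throws Exception {
        if (conf == null) {
            conf = new HashMap<>();
        }
        tool.setConf(conf);
        return tool.run(args);
    }
}

// A tool implements only run(); setConf/getConf come from MiniConfigured.
class CountArgsTool extends MiniConfigured implements MiniTool {
    @Override
    public int run(String[] args) {
        int base = Integer.parseInt(getConf().getOrDefault("count.base", "0"));
        return base + args.length;
    }
}
```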
The Context class looks like this:
public abstract class Context
implements MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
}
First, look at MRJobConfig:
// Put all of the attribute names in here so that Job and JobContext are
// consistent.
public static final String INPUT_FORMAT_CLASS_ATTR = "mapreduce.job.inputformat.class";
public static final String MAP_CLASS_ATTR = "mapreduce.job.map.class";
public static final String MAP_OUTPUT_COLLECTOR_CLASS_ATTR
= "mapreduce.job.map.output.collector.class";
public static final String COMBINE_CLASS_ATTR = "mapreduce.job.combine.class";
public static final String REDUCE_CLASS_ATTR = "mapreduce.job.reduce.class";
public static final String OUTPUT_FORMAT_CLASS_ATTR = "mapreduce.job.outputformat.class";
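These constants are configuration keys: each names the property whose value is the class to use for the input format, mapper, map output collector, combiner, reducer, or output format.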
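A sketch of how such keys are typically used, assuming the configuration is a plain Map<String, String>: the job stores a class name under a key, and the framework loads and instantiates it by reflection. This is only a simplified picture of what Hadoop does with Configuration.getClass and ReflectionUtils.newInstance; JobKeys and UpperCaseMapper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class *names* live under well-known keys and are turned
// back into objects with reflection.
final class JobKeys {
    // Same key string as MRJobConfig.MAP_CLASS_ATTR.
    static final String MAP_CLASS_ATTR = "mapreduce.job.map.class";

    private JobKeys() { }

    // Loads the class named under key, checks its type, and calls its no-arg constructor.
    static <T> T newInstance(Map<String, String> conf, String key, Class<T> expected)
            throws ReflectiveOperationException {
        String className = conf.get(key);
        if (className == null) {
            throw new ClassNotFoundException("No class configured for " + key);
        }
        Class<? extends T> clazz = Class.forName(className).asSubclass(expected);
        return clazz.getDeclaredConstructor().newInstance();
    }

    // Convenience for building a configuration map in examples.
    static Map<String, String> confWith(String key, Class<?> value) {
        Map<String, String> conf = new HashMap<>();
        conf.put(key, value.getName());
        return conf;
    }
}

// Hypothetical mapper type used only to exercise the lookup.
class UpperCaseMapper {
    String map(String value) {
        return value.toUpperCase();
    }
}
```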
public interface JobContext extends MRJobConfig {
This interface is centered on running a job: it gives tasks a read-only view of the job's settings.
public Configuration getConfiguration();
public Credentials getCredentials();
public JobID getJobID();
public int getNumReduceTasks();
public Path getWorkingDirectory() throws IOException;
public Class<?> getOutputKeyClass();
public Class<?> getOutputValueClass();
public Class<?> getMapOutputKeyClass();
public Class<?> getMapOutputValueClass();
public String getJobName();
public Class<? extends InputFormat<?,?>> getInputFormatClass()
throws ClassNotFoundException;
public Class<? extends Mapper<?,?,?,?>> getMapperClass()
throws ClassNotFoundException;
public Class<? extends Reducer<?,?,?,?>> getCombinerClass()
throws ClassNotFoundException;
public Class<? extends Reducer<?,?,?,?>> getReducerClass()
throws ClassNotFoundException;
public Class<? extends OutputFormat<?,?>> getOutputFormatClass()
throws ClassNotFoundException;
public Class<? extends Partitioner<?,?>> getPartitionerClass()
throws ClassNotFoundException;
public RawComparator<?> getSortComparator();
public String getJar();
public RawComparator<?> getCombinerKeyGroupingComparator();
public RawComparator<?> getGroupingComparator();
public boolean getJobSetupCleanupNeeded();
public boolean getTaskCleanupNeeded();
public boolean getProfileEnabled();
public String getProfileParams();
public IntegerRanges getProfileTaskRange(boolean isMap);
public String getUser();
public boolean getSymlink();
public Path[] getArchiveClassPaths();
public URI[] getCacheArchives() throws IOException;
public URI[] getCacheFiles() throws IOException;
public Path[] getLocalCacheArchives() throws IOException;
public Path[] getLocalCacheFiles() throws IOException;
public Path[] getFileClassPaths();
public String[] getArchiveTimestamps();
public String[] getFileTimestamps();
public int getMaxMapAttempts();
public int getMaxReduceAttempts();
These are all the things a running job needs to know.
public interface TaskAttemptContext extends JobContext, Progressable {
The next layer down carries task-level information:
Then a layer for reading input and writing output:
public interface TaskInputOutputContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
extends TaskAttemptContext {
public boolean nextKeyValue() throws IOException, InterruptedException;
public KEYIN getCurrentKey() throws IOException, InterruptedException;
public VALUEIN getCurrentValue() throws IOException, InterruptedException;
public void write(KEYOUT key, VALUEOUT value)
throws IOException, InterruptedException;
public OutputCommitter getOutputCommitter();
Then the map-specific layer, which adds the input split:
public interface MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
extends TaskInputOutputContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
public InputSplit getInputSplit();
}
And that brings us here:
public abstract class Context
implements MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
}
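The layering can be summarized in a heavily trimmed sketch. The interface names below are hypothetical shortened copies, each keeping one or two methods from the real interface at that level. The point is that each layer adds one concern, and code written against a narrow layer accepts every deeper one unchanged.

```java
// Hypothetical, trimmed copies of the Hadoop interfaces; each layer adds one concern.
interface JobCtx {
    String getJobName();                       // job-wide, read-only settings
}

interface TaskAttemptCtx extends JobCtx {
    String getTaskAttemptId();                 // which attempt of which task this is
}

interface TaskIOCtx<KI, VI, KO, VO> extends TaskAttemptCtx {
    boolean nextKeyValue();                    // record-level input...
    KI getCurrentKey();
    VI getCurrentValue();
    void write(KO key, VO value);              // ...and output
}

interface MapCtx<KI, VI, KO, VO> extends TaskIOCtx<KI, VI, KO, VO> {
    String getInputSplit();                    // only a map task has an input split
}

final class LayeringDemo {
    private LayeringDemo() { }

    // Needs only job-level information, so it accepts any deeper context as well.
    static String describe(JobCtx ctx) {
        return "job=" + ctx.getJobName();
    }
}
```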
The class that supplies a concrete Context:
WrappedMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
It uses the decorator pattern:
public Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context
getMapContext(MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> mapContext) {
return new Context(mapContext);
}
You hand it a MapContext, and the returned Context implements the full interface on top of it.
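A minimal sketch of that decorator, using hypothetical names and a trimmed-down interface. In Hadoop, WrappedMapper's inner Context essentially forwards each call to the MapContext it wraps; the record counter here is an extra, added only to show where a decorator can hook in behavior without touching the wrapped object.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical trimmed-down MapContext: read input records, write output records.
interface SimpleMapContext<KI, VI, KO, VO> {
    boolean nextKeyValue();
    KI getCurrentKey();
    VI getCurrentValue();
    void write(KO key, VO value);
}

// The "real" context: iterates in-memory lines (key = line index) and collects output.
class InMemoryMapContext implements SimpleMapContext<Integer, String, String, Integer> {
    private final Iterator<String> input;
    private int index = -1;
    private String current;
    final List<String> output = new ArrayList<>();

    InMemoryMapContext(List<String> lines) {
        this.input = lines.iterator();
    }

    @Override public boolean nextKeyValue() {
        if (!input.hasNext()) {
            return false;
        }
        current = input.next();
        index++;
        return true;
    }
    @Override public Integer getCurrentKey() { return index; }
    @Override public String getCurrentValue() { return current; }
    @Override public void write(String key, Integer value) { output.add(key + "=" + value); }
}

// Decorator: same interface, every call delegated; counts records as a hypothetical extra.
class CountingContext<KI, VI, KO, VO> implements SimpleMapContext<KI, VI, KO, VO> {
    private final SimpleMapContext<KI, VI, KO, VO> inner;
    int recordsRead = 0;

    CountingContext(SimpleMapContext<KI, VI, KO, VO> inner) {
        this.inner = inner;
    }

    @Override public boolean nextKeyValue() {
        boolean more = inner.nextKeyValue();
        if (more) {
            recordsRead++;
        }
        return more;
    }
    @Override public KI getCurrentKey() { return inner.getCurrentKey(); }
    @Override public VI getCurrentValue() { return inner.getCurrentValue(); }
    @Override public void write(KO key, VO value) { inner.write(key, value); }
}
```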
So by implementing this one class, we get every method of the interfaces above. How does the context get passed in, then?
MapRunner(Context context) throws IOException, InterruptedException {
// Normal case: everything is obtained through the context
mapper = ReflectionUtils.newInstance(mapClass,
context.getConfiguration());
try {
Constructor c = context.getClass().getConstructor(
Mapper.class,
Configuration.class,
TaskAttemptID.class,
RecordReader.class,
RecordWriter.class,
OutputCommitter.class,
StatusReporter.class,
InputSplit.class);
c.setAccessible(true);
subcontext = (Context) c.newInstance(
mapper,
outer.getConfiguration(),
outer.getTaskAttemptID(),
new SubMapRecordReader(),
new SubMapRecordWriter(),
context.getOutputCommitter(),
new SubMapStatusReporter(),
outer.getInputSplit());
} catch (Exception e) {
try {
// Fallback: build a MapContextImpl as a template, since there are many parts to combine
Constructor c = Class.forName("org.apache.hadoop.mapreduce.task.MapContextImpl").getConstructor(
Configuration.class,
TaskAttemptID.class,
RecordReader.class,
RecordWriter.class,
OutputCommitter.class,
StatusReporter.class,
InputSplit.class);
c.setAccessible(true);
// Fill the template with the inputs
MapContext mc = (MapContext) c.newInstance(
outer.getConfiguration(),
outer.getTaskAttemptID(),
new SubMapRecordReader(),
new SubMapRecordWriter(),
context.getOutputCommitter(),
new SubMapStatusReporter(),
outer.getInputSplit());
Class<?> wrappedMapperClass = Class.forName("org.apache.hadoop.mapreduce.lib.map.WrappedMapper");
Method getMapContext = wrappedMapperClass.getMethod("getMapContext", MapContext.class);
subcontext = (Context) getMapContext.invoke(
wrappedMapperClass.getDeclaredConstructor().newInstance(), mc);
} catch (Exception ee) { // FindBugs: REC_CATCH_EXCEPTION
// rethrow as IOE
throw new IOException(e);
}
}
}
So where does the context actually come from? Here it is decorated, there it is built by reflection.
public void run(Context context) throws IOException, InterruptedException {
outer = context;
int numberOfThreads = getNumberOfThreads(context);
mapClass = getMapperClass(context);
if (LOG.isDebugEnabled()) {
LOG.debug("Configuring multithread runner to use " + numberOfThreads +
" threads");
}
executor = Executors.newFixedThreadPool(numberOfThreads);
for(int i=0; i < numberOfThreads; ++i) {
MapRunner thread = new MapRunner(context);
executor.execute(thread);
}
executor.shutdown();
while (!executor.isTerminated()) {
// wait till all the threads are done
Thread.sleep(1000);
}
}
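The structure of this run method can be sketched in plain Java with hypothetical names. Several workers share one input source, and every read of it is synchronized on that shared object, roughly the role SubMapRecordReader plays for the outer context. This sketch waits with awaitTermination instead of the Thread.sleep polling loop above; both wait for the pool to drain, but awaitTermination returns as soon as the work is done. It assumes no null records, since null marks the end of input.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of the multithreaded-mapper idea: N workers share one input source.
final class MultiThreadedRunner<I> {
    private final Iterator<I> outer;                 // shared input, like the outer context
    private final AtomicInteger processed = new AtomicInteger();

    MultiThreadedRunner(List<I> input) {
        this.outer = input.iterator();
    }

    // Each worker's read is synchronized on the shared source.
    private I nextOrNull() {
        synchronized (outer) {
            return outer.hasNext() ? outer.next() : null;
        }
    }

    // Starts numberOfThreads workers that pull records until the input is exhausted.
    int run(int numberOfThreads, Function<I, ?> mapFn) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
        for (int i = 0; i < numberOfThreads; i++) {
            executor.execute(() -> {
                I record;
                while ((record = nextOrNull()) != null) {
                    mapFn.apply(record);
                    processed.incrementAndGet();
                }
            });
        }
        executor.shutdown();
        // Blocks until all workers finish instead of polling with Thread.sleep.
        if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return processed.get();
    }
}
```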
Now look at the higher-level design, the Mapper's run method that consumes the Context:
public void run(Context context) throws IOException, InterruptedException {
setup(context);
try {
while (context.nextKeyValue()) {
map(context.getCurrentKey(), context.getCurrentValue(), context);
}
} finally {
cleanup(context);
}
}
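In other words, by the time run is called, a Context instance already exists: whoever calls run (the framework's map task, or the MapRunner above for each worker thread) builds it and passes it in.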
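A sketch of this template method under the same assumptions (hypothetical RecordContext, TemplateMapper and TrimmingMapper, no Hadoop types). The loop in run is fixed, subclasses override only setup, map, and cleanup, and the context always comes from the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory context: input records plus an output list.
final class RecordContext {
    private final Iterator<String> input;
    private String current;
    final List<String> output = new ArrayList<>();

    RecordContext(List<String> lines) {
        this.input = lines.iterator();
    }

    boolean nextKeyValue() {
        if (!input.hasNext()) {
            return false;
        }
        current = input.next();
        return true;
    }

    String getCurrentValue() {
        return current;
    }

    void write(String value) {
        output.add(value);
    }
}

// Template method in the shape of Mapper.run: the loop is fixed, the steps are overridable.
abstract class TemplateMapper {
    protected void setup(RecordContext context) { }
    protected abstract void map(String value, RecordContext context);
    protected void cleanup(RecordContext context) { }

    // The caller builds the context and hands it in; the mapper never creates one.
    public void run(RecordContext context) {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}

class TrimmingMapper extends TemplateMapper {
    @Override
    protected void map(String value, RecordContext context) {
        context.write(value.trim());
    }

    @Override
    protected void cleanup(RecordContext context) {
        context.write("<done>");
    }
}
```

Because cleanup sits in a finally block, it runs even if map throws, which is the same guarantee the run method above gives.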