Druid defines a set of interfaces for its pluggable peripheral functionality; for deep storage alone it defines the following (a sketch of how an extension wires them up follows the list):
- DataSegmentArchiver: archives and restores segment files; on a store such as S3, it can move segments that are temporarily unused into a separate bucket.
- DataSegmentFinder: finds Druid segments under a given directory, optionally updating all descriptor.json files on deep storage with the correct loadSpec.
- DataSegmentKiller: deletes segment files.
- DataSegmentMover: moves segment files.
- DataSegmentPuller: pulls a given segment's data down into a local directory.
- DataSegmentPusher: pushes a segment's data from a local directory up to deep storage.
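Before diving into the HDFS classes, it helps to see how an extension hands implementations of these interfaces to Druid. Below is a minimal sketch, loosely modeled on HdfsStorageDruidModule; the Binders helper methods and the "hdfs" scheme key are assumptions based on Druid's PolyBind pattern, in which the druid.storage.type property selects one named binding at startup.

```java
import java.util.List;

import com.fasterxml.jackson.databind.Module;
import com.google.common.collect.ImmutableList;
import com.google.inject.Binder;
import io.druid.guice.Binders;
import io.druid.guice.LazySingleton;
import io.druid.initialization.DruidModule;
import io.druid.storage.hdfs.HdfsDataSegmentKiller;
import io.druid.storage.hdfs.HdfsDataSegmentPusher;

// Sketch only: the real HdfsStorageDruidModule also binds the puller and
// finder, registers Jackson modules, and injects the Hadoop Configuration.
public class ExampleStorageModule implements DruidModule
{
  @Override
  public List<? extends Module> getJacksonModules()
  {
    return ImmutableList.of();
  }

  @Override
  public void configure(Binder binder)
  {
    // Each call registers an implementation under the "hdfs" scheme;
    // setting druid.storage.type=hdfs selects these bindings at runtime.
    Binders.dataSegmentPusherBinder(binder).addBinding("hdfs").to(HdfsDataSegmentPusher.class).in(LazySingleton.class);
    Binders.dataSegmentKillerBinder(binder).addBinding("hdfs").to(HdfsDataSegmentKiller.class).in(LazySingleton.class);
  }
}
```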
The HDFS storage extension implements four of these interfaces:
- HdfsDataSegmentFinder
- HdfsDataSegmentKiller
- HdfsDataSegmentPuller
- HdfsDataSegmentPusher
1. HdfsDataSegmentFinder:
```java
@Override
public Set<DataSegment> findSegments(String workingDirPathStr, boolean updateDescriptor)
    throws SegmentLoadingException
{
  final Set<DataSegment> segments = Sets.newHashSet();
  final Path workingDirPath = new Path(workingDirPathStr);
  FileSystem fs;
  try {
    fs = workingDirPath.getFileSystem(config);

    log.info(fs.getScheme());
    log.info("FileSystem URI:" + fs.getUri().toString());

    if (!fs.exists(workingDirPath)) {
      throw new SegmentLoadingException("Working directory [%s] doesn't exist.", workingDirPath);
    }

    if (!fs.isDirectory(workingDirPath)) {
      throw new SegmentLoadingException("Working directory [%s] is not a directory!?", workingDirPath);
    }

    final RemoteIterator<LocatedFileStatus> it = fs.listFiles(workingDirPath, true);
    while (it.hasNext()) {
      final LocatedFileStatus locatedFileStatus = it.next();
      final Path path = locatedFileStatus.getPath();
      if (path.getName().equals("descriptor.json")) {
        final Path indexZip = new Path(path.getParent(), "index.zip");
        if (fs.exists(indexZip)) {
          final DataSegment dataSegment = mapper.readValue(fs.open(path), DataSegment.class);
          log.info("Found segment [%s] located at [%s]", dataSegment.getIdentifier(), indexZip);

          final Map<String, Object> loadSpec = dataSegment.getLoadSpec();
          final String pathWithoutScheme = indexZip.toUri().getPath();

          if (!loadSpec.get("type").equals(HdfsStorageDruidModule.SCHEME)
              || !loadSpec.get("path").equals(pathWithoutScheme)) {
            loadSpec.put("type", HdfsStorageDruidModule.SCHEME);
            loadSpec.put("path", pathWithoutScheme);
            if (updateDescriptor) {
              log.info("Updating loadSpec in descriptor.json at [%s] with new path [%s]", path, pathWithoutScheme);
              mapper.writeValue(fs.create(path, true), dataSegment);
            }
          }
          segments.add(dataSegment);
        } else {
          throw new SegmentLoadingException(
              "index.zip didn't exist at [%s] while descriptor.json exists!?",
              indexZip
          );
        }
      }
    }
  }
  catch (IOException e) {
    throw new SegmentLoadingException(e, "Problems interacting with filesystem[%s].", workingDirPath);
  }

  return segments;
}
```
As the code shows, findSegments recursively walks the given HDFS directory looking for descriptor.json files and returns the segments they describe. Whenever a descriptor's loadSpec no longer matches the actual location of its index.zip, the loadSpec is corrected in memory; if the updateDescriptor argument is true, the descriptor.json on HDFS is rewritten as well.
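For context, here is a hedged usage sketch of driving findSegments directly; the two-argument constructor (a Hadoop Configuration plus a Jackson ObjectMapper) is an assumption inferred from the config and mapper fields the method uses, and the HDFS path is hypothetical. This scan-and-fix pass is essentially what Druid's insert-segment-to-db tool performs when rebuilding segment metadata from deep storage.

```java
import java.util.Set;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.conf.Configuration;

import io.druid.jackson.DefaultObjectMapper;
import io.druid.storage.hdfs.HdfsDataSegmentFinder;
import io.druid.timeline.DataSegment;

public class FindSegmentsDemo
{
  public static void main(String[] args) throws Exception
  {
    Configuration hadoopConfig = new Configuration();
    ObjectMapper jsonMapper = new DefaultObjectMapper(); // Druid would inject its configured mapper
    HdfsDataSegmentFinder finder = new HdfsDataSegmentFinder(hadoopConfig, jsonMapper);

    // true: also rewrite descriptor.json files whose loadSpec is stale
    Set<DataSegment> segments = finder.findSegments("hdfs://namenode:8020/druid/segments", true);
    for (DataSegment segment : segments) {
      System.out.println(segment.getIdentifier() + " -> " + segment.getLoadSpec().get("path"));
    }
  }
}
```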
2. HdfsDataSegmentKiller: deletes a given segment from HDFS. The code is as follows:
```java
@Override
public void kill(DataSegment segment) throws SegmentLoadingException
{
  final Path path = getPath(segment);
  log.info("killing segment[%s] mapped to path[%s]", segment.getIdentifier(), path);

  try {
    if (path.getName().endsWith(".zip")) {
      final FileSystem fs = path.getFileSystem(config);

      if (!fs.exists(path)) {
        log.warn("Segment Path [%s] does not exist. It appears to have been deleted already.", path);
        return;
      }

      // path format -- > .../dataSource/interval/version/partitionNum/xxx.zip
      Path partitionNumDir = path.getParent();
      if (!fs.delete(partitionNumDir, true)) {
        throw new SegmentLoadingException(
            "Unable to kill segment, failed to delete dir [%s]",
            partitionNumDir.toString()
        );
      }

      // try to delete other directories if possible
      Path versionDir = partitionNumDir.getParent();
      if (safeNonRecursiveDelete(fs, versionDir)) {
        Path intervalDir = versionDir.getParent();
        if (safeNonRecursiveDelete(fs, intervalDir)) {
          Path dataSourceDir = intervalDir.getParent();
          safeNonRecursiveDelete(fs, dataSourceDir);
        }
      }
    } else {
      throw new SegmentLoadingException("Unknown file type[%s]", path);
    }
  }
  catch (IOException e) {
    throw new SegmentLoadingException(e, "Unable to kill segment");
  }
}

// A non-recursive delete succeeds only on an empty directory, so parent
// directories still shared with other segments are left untouched.
private boolean safeNonRecursiveDelete(FileSystem fs, Path path)
{
  try {
    return fs.delete(path, false);
  }
  catch (Exception ex) {
    return false;
  }
}

private Path getPath(DataSegment segment)
{
  return new Path(String.valueOf(segment.getLoadSpec().get(PATH_KEY)));
}
```
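The interesting detail is the bottom-up cleanup: a recursive delete removes the partition directory, then non-recursive deletes climb the tree and succeed only on directories the kill has just emptied. A self-contained sketch of that pattern against the Hadoop FileSystem API (the paths are hypothetical):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BottomUpDeleteDemo
{
  public static void main(String[] args) throws IOException
  {
    // Hypothetical segment layout: .../dataSource/interval/version/partitionNum/index.zip
    FileSystem fs = FileSystem.get(new Configuration());
    Path partitionNumDir = new Path("/druid/segments/wikipedia/2015-01-01_2015-01-02/v1/0");

    // Step 1, as in kill(): recursively remove this segment's partition dir.
    fs.delete(partitionNumDir, true);

    // Step 2: climb at most three levels (version, interval, dataSource).
    // delete(dir, false) fails on a non-empty directory, so parents that
    // still hold other segments survive untouched.
    Path dir = partitionNumDir.getParent();
    for (int i = 0; i < 3 && dir != null && tryDelete(fs, dir); i++) {
      dir = dir.getParent();
    }
  }

  private static boolean tryDelete(FileSystem fs, Path dir)
  {
    try {
      return fs.delete(dir, false);
    }
    catch (IOException e) {
      return false; // non-empty or otherwise undeletable: stop climbing
    }
  }
}
```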
3. HdfsDataSegmentPuller: pulls a segment from HDFS down into a local directory.
```java
@Override
public void getSegmentFiles(DataSegment segment, File dir) throws SegmentLoadingException
{
  getSegmentFiles(getPath(segment), dir);
}

public FileUtils.FileCopyResult getSegmentFiles(final Path path, final File outDir) throws SegmentLoadingException
{
  final LocalFileSystem localFileSystem = new LocalFileSystem();
  try {
    final FileSystem fs = path.getFileSystem(config);
    if (fs.isDirectory(path)) {

      // -------- directory ---------

      try {
        return RetryUtils.retry(
            new Callable<FileUtils.FileCopyResult>()
            {
              @Override
              public FileUtils.FileCopyResult call() throws Exception
              {
                if (!fs.exists(path)) {
                  throw new SegmentLoadingException("No files found at [%s]", path.toString());
                }

                final RemoteIterator<LocatedFileStatus> children = fs.listFiles(path, false);
                final ArrayList<FileUtils.FileCopyResult> localChildren = new ArrayList<>();
                final FileUtils.FileCopyResult result = new FileUtils.FileCopyResult();
                while (children.hasNext()) {
                  final LocatedFileStatus child = children.next();
                  final Path childPath = child.getPath();
                  final String fname = childPath.getName();
                  if (fs.isDirectory(childPath)) {
                    log.warn("[%s] is a child directory, skipping", childPath.toString());
                  } else {
                    final File outFile = new File(outDir, fname);

                    // Actual copy
                    fs.copyToLocalFile(childPath, new Path(outFile.toURI()));
                    result.addFile(outFile);
                  }
                }
                log.info(
                    "Copied %d bytes from [%s] to [%s]",
                    result.size(),
                    path.toString(),
                    outDir.getAbsolutePath()
                );
                return result;
              }
            },
            shouldRetryPredicate(),
            DEFAULT_RETRY_COUNT
        );
      }
      catch (Exception e) {
        throw Throwables.propagate(e);
      }
    } else if (CompressionUtils.isZip(path.getName())) {

      // -------- zip ---------

      final FileUtils.FileCopyResult result = CompressionUtils.unzip(
          new ByteSource()
          {
            @Override
            public InputStream openStream() throws IOException
            {
              return getInputStream(path);
            }
          }, outDir, shouldRetryPredicate(), false
      );

      log.info(
          "Unzipped %d bytes from [%s] to [%s]",
          result.size(),
          path.toString(),
          outDir.getAbsolutePath()
      );

      return result;
    } else if (CompressionUtils.isGz(path.getName())) {

      // -------- gzip ---------

      final String fname = path.getName();
      final File outFile = new File(outDir, CompressionUtils.getGzBaseName(fname));
      final FileUtils.FileCopyResult result = CompressionUtils.gunzip(
          new ByteSource()
          {
            @Override
            public InputStream openStream() throws IOException
            {
              return getInputStream(path);
            }
          },
          outFile
      );

      log.info(
          "Gunzipped %d bytes from [%s] to [%s]",
          result.size(),
          path.toString(),
          outFile.getAbsolutePath()
      );

      return result;
    } else {
      throw new SegmentLoadingException("Do not know how to handle file type at [%s]", path.toString());
    }
  }
  catch (IOException e) {
    throw new SegmentLoadingException(e, "Error loading [%s]", path.toString());
  }
}
```
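The puller dispatches on what the path points at: a directory is copied file by file with retries, a .zip is streamed and unzipped, and a .gz is gunzipped. A hedged usage sketch follows; the one-argument constructor (a Hadoop Configuration) is an assumption inferred from the config field above, and the paths are hypothetical.

```java
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

import io.druid.storage.hdfs.HdfsDataSegmentPuller;

public class PullSegmentDemo
{
  public static void main(String[] args) throws Exception
  {
    HdfsDataSegmentPuller puller = new HdfsDataSegmentPuller(new Configuration());

    File localDir = new File("/tmp/druid/segment-cache/wikipedia"); // hypothetical cache dir
    localDir.mkdirs();

    // Ends in .zip, so the puller takes the unzip branch shown above,
    // retrying transient failures via shouldRetryPredicate().
    puller.getSegmentFiles(
        new Path("hdfs://namenode:8020/druid/segments/wikipedia/2015-01-01_2015-01-02/v1/0/index.zip"),
        localDir
    );
  }
}
```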
4. HdfsDataSegmentPusher: copies segment files from a local directory up to HDFS:
```java
@Override
public DataSegment push(File inDir, DataSegment segment) throws IOException
{
  final String storageDir = DataSegmentPusherUtil.getHdfsStorageDir(segment);

  log.info(
      "Copying segment[%s] to HDFS at location[%s/%s]",
      segment.getIdentifier(),
      config.getStorageDirectory(),
      storageDir
  );

  Path outFile = new Path(String.format("%s/%s/index.zip", config.getStorageDirectory(), storageDir));
  FileSystem fs = outFile.getFileSystem(hadoopConfig);

  fs.mkdirs(outFile.getParent());
  log.info("Compressing files from[%s] to [%s]", inDir, outFile);

  final long size;
  try (FSDataOutputStream out = fs.create(outFile)) {
    size = CompressionUtils.zip(inDir, out);
  }

  return createDescriptorFile(
      segment.withLoadSpec(makeLoadSpec(outFile))
             .withSize(size)
             .withBinaryVersion(SegmentUtils.getVersionFromDir(inDir)),
      outFile.getParent(),
      fs
  );
}

private DataSegment createDescriptorFile(DataSegment segment, Path outDir, final FileSystem fs) throws IOException
{
  final Path descriptorFile = new Path(outDir, "descriptor.json");
  log.info("Creating descriptor file at[%s]", descriptorFile);
  ByteSource
      .wrap(jsonMapper.writeValueAsBytes(segment))
      .copyTo(new HdfsOutputStreamSupplier(fs, descriptorFile));
  return segment;
}

private ImmutableMap<String, Object> makeLoadSpec(Path outFile)
{
  return ImmutableMap.<String, Object>of("type", "hdfs", "path", outFile.toString());
}
```
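Note that push writes index.zip and descriptor.json side by side under the dataSource/interval/version/partitionNum layout produced by getHdfsStorageDir (which, as far as I can tell, replaces characters HDFS rejects in path names, such as ':' in the interval). To close the loop, a hedged sketch of a push call; the three-argument constructor is an assumption inferred from the config, hadoopConfig, and jsonMapper fields above, and the DataSegment built here is a minimal stand-in.

```java
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.joda.time.Interval;

import io.druid.jackson.DefaultObjectMapper;
import io.druid.storage.hdfs.HdfsDataSegmentPusher;
import io.druid.storage.hdfs.HdfsDataSegmentPusherConfig;
import io.druid.timeline.DataSegment;

public class PushSegmentDemo
{
  public static void main(String[] args) throws Exception
  {
    // storageDirectory would normally come from druid.storage.storageDirectory
    HdfsDataSegmentPusherConfig pusherConfig = new HdfsDataSegmentPusherConfig();

    HdfsDataSegmentPusher pusher = new HdfsDataSegmentPusher(
        pusherConfig,
        new Configuration(),
        new DefaultObjectMapper()
    );

    // Minimal stand-in segment; a real one comes out of the indexing task.
    DataSegment segment = DataSegment.builder()
                                     .dataSource("wikipedia")
                                     .interval(Interval.parse("2015-01-01/2015-01-02"))
                                     .version("v1")
                                     .build();
    File inDir = new File("/tmp/druid/persist/wikipedia/0"); // hypothetical merged-segment dir

    // Zips inDir to .../index.zip, writes descriptor.json beside it, and
    // returns the segment with its hdfs loadSpec, size, and binary version.
    DataSegment pushed = pusher.push(inDir, segment);
    System.out.println(pushed.getLoadSpec());
  }
}
```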