hadoop-hdfs
houzhizhen
Focused on big data processing and distributed computing.
Enable HTTPS for HDFS
Hadoop HTTPS configuration: generate the CA on hadoop1 and copy it to hadoop2. (Set any password longer than six characters, e.g. adminadmin.) cd /etc/https openssl req -new -x509 -keyout hdfs_ca_key -out hdfs_...Reposted 2018-05-11 11:15:08 · 819 views · 0 comments
Dealing with the Primary Datanode when the HDFS Client Writes Data
The primary datanode is used either to set up the pipeline or to write data through the pipeline. Setup pipeline: the method DFSOutputStream.nextBlockOutputStream is called when setting up the pipeline. 1. locateFollowingB...Original 2018-04-13 19:13:24 · 166 views · 0 comments
The Hadoop DistributedFileSystem client can buffer 5120 KB at most
By default, the client can buffer 80 packets at 64 KB per packet, so it can buffer 80 * 64 KB = 5120 KB. public static final String DFS_CLIENT_WRITE_MAX_PACKETS_IN_FLIGHT_KEY = "dfs.client.write.max-...Original 2018-04-13 15:15:10 · 229 views · 0 comments
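A quick sanity check of the arithmetic above, as a standalone sketch. The constants mirror the defaults named in the excerpt (80 in-flight packets, 64 KB per packet); the class itself is my own illustration, not Hadoop code:

```java
public class ClientBufferMath {
    // Defaults described above; the real values come from the
    // dfs.client.write.max-packets-in-flight setting and the 64 KB packet size.
    static final int MAX_PACKETS_IN_FLIGHT = 80;
    static final int PACKET_SIZE_KB = 64;

    // Maximum data the client can buffer, in KB.
    static int maxBufferedKb() {
        return MAX_PACKETS_IN_FLIGHT * PACKET_SIZE_KB;
    }

    public static void main(String[] args) {
        System.out.println(maxBufferedKb() + " KB"); // 80 * 64 = 5120 KB
    }
}
```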
hadoop-env.sh is loaded twice when starting the namenode
hadoop-config.sh: hadoop-config.sh sources hadoop-env.sh: if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then . "${HADOOP_CONF_DIR}/hadoop-env.sh"; fi hadoop-daemon.sh: at the top of hadoop-daemon.sh,...Original 2018-03-23 15:57:47 · 331 views · 0 comments
HDFS does not update the modification time when adding a block, while the Linux file system does
Modification time on HDFS: when a file is created on HDFS, the system sets its modification time, and the modification time is not updated again until the file is closed. In addition to closing a file, the modification ti...Original 2018-03-07 16:13:39 · 285 views · 0 comments
A Secure HDFS Client Example
It takes about 3 lines of Java code to write a simple HDFS client that can then be used to upload, read, or list files. Here is an example (Java)...Reposted 2017-11-06 15:00:52 · 450 views · 0 comments
Increasing the handler count can improve NameNode performance significantly
In this test, NNThroughputBenchmark is used to create directories; the dirs parameter is always 10000, dirsPerDir is 128, and the machine has 8 CPU cores. Set the handler count to 2: <property> <name>dfs.namenode.handl...Original 2017-11-13 14:35:13 · 295 views · 0 comments
"hdfs dfsadmin -setBalancerBandwidth" Set Balancer Bandwidth Dynamically
DFSAdmin.setBalancerBandwidth calls namenode.setBalancerBandwidth. /** * Command to ask the namenode to set the balancer bandwidth for all of the * datanodes. * Usage: hdfs dfsadmin -setB...Original 2017-10-17 14:20:16 · 1144 views · 0 comments
Compressing files with Hadoop
Compress the files in one directory into another directory: hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.2-cdh3u2.jar \ -Dmapred.output.compress=true \ -Dmapred.compress.map.output=true...Original 2018-05-16 09:55:24 · 273 views · 0 comments
Set a file unreadable by the superuser on Hadoop
public static final String SECURITY_XATTR_UNREADABLE_BY_SUPERUSER = "security.hdfs.unreadable.by.superuser"; FSNamesystem.getBlockLocationsInt checks checkUnreadableBySuperuser; if pr...Original 2018-07-20 15:08:33 · 195 views · 0 comments
DelegationTokenSecretManager does not run the ExpiredTokenRemover thread by default
DelegationTokenSecretManager does not run the ExpiredTokenRemover thread in AuthenticationMethod.SIMPLE mode. AbstractDelegationTokenSecretManager.startSecretManager starts the ExpiredTokenRemover thr...Original 2018-08-10 16:07:14 · 510 views · 0 comments
A Hadoop user can set the modification time and access time of a file at will if he/she has write permission
Client: @Test public void testSettime() throws IOException { String src = "/user/hadoop/hosts2"; long day = 1000 * 60 * 60 * 24; client.setTimes(src, System.currentTimeMil...Original 2018-08-20 17:06:18 · 235 views · 0 comments
ClientProtocol.getContentSummary
ContentSummary is the type returned by ClientProtocol.getContentSummary: public class ContentSummary implements Writable { private long length; private long fileCount; private long directoryCount; private...Original 2018-08-23 10:01:21 · 1248 views · 0 comments
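A rough, dependency-free sketch of what such a summary aggregates. The field names follow the snippet above; the merge method is my own illustration of how subtree summaries could be combined, not Hadoop's API (the real class also implements Writable for RPC serialization):

```java
public class SimpleContentSummary {
    private final long length;         // total bytes of all files under the path
    private final long fileCount;      // number of files under the path
    private final long directoryCount; // number of directories under the path

    public SimpleContentSummary(long length, long fileCount, long directoryCount) {
        this.length = length;
        this.fileCount = fileCount;
        this.directoryCount = directoryCount;
    }

    public long getLength() { return length; }
    public long getFileCount() { return fileCount; }
    public long getDirectoryCount() { return directoryCount; }

    // Merge the summaries of two subtrees, as a recursive namespace walk would.
    public SimpleContentSummary add(SimpleContentSummary other) {
        return new SimpleContentSummary(length + other.length,
                fileCount + other.fileCount,
                directoryCount + other.directoryCount);
    }
}
```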
Namenode upgrade
Active Namenode: NamenodeRpcServer.rollingUpgrade: try { checkOperation(OperationCategory.WRITE); if (isRollingUpgrade()) { return rollingUpgradeInfo; } long startTime...Original 2018-08-13 17:39:19 · 358 views · 0 comments
Namenode storage restore
FSEditLog.startLogSegment calls storage.attemptRestoreRemovedStorage(). NNStorage.attemptRestoreRemovedStorage checks whether any of the removed storages is "writable" again and can be returned ...Original 2018-08-13 15:27:58 · 455 views · 0 comments
NameNodeResourceMonitor
FSNamesystem.startActiveServices starts the NameNodeResourceMonitor thread: this.nnrmthread = new Daemon(new NameNodeResourceMonitor()); nnrmthread.start(); NameNodeResourceMonitor periodically ...Original 2018-08-13 14:52:19 · 358 views · 0 comments
FSEditLogLoader.applyEditLogOp
private long applyEditLogOp(FSEditLogOp op, FSDirectory fsDir, StartupOption startOpt, int logVersion, long lastInodeId) throws IOException { long inodeId = INodeId.GRANDFATHER_INODE_ID; ...Original 2018-08-08 11:33:11 · 367 views · 0 comments
PermissionStatusFormat stores MODE, GROUP, and USER in a 64-bit long
static enum PermissionStatusFormat { MODE(null, 16), GROUP(MODE.BITS, 25), USER(GROUP.BITS, 23); final LongBitFormat BITS; private PermissionStatusFormat(LongBitFormat previous, ...Original 2018-08-08 11:22:44 · 256 views · 0 comments
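The widths above sum to exactly 64 bits (16 + 25 + 23). A standalone sketch of the same packing idea using plain shifts and masks; the layout (mode in the low bits, then group, then user) is assumed from the enum order, and the class is mine, not Hadoop's LongBitFormat:

```java
// Illustrative bit packing in the spirit of PermissionStatusFormat:
// mode in the low 16 bits, a 25-bit group id above it, a 23-bit user id on top.
public class PermissionBits {
    static final int MODE_BITS = 16, GROUP_BITS = 25, USER_BITS = 23;
    static final int GROUP_OFFSET = MODE_BITS;             // 16
    static final int USER_OFFSET = MODE_BITS + GROUP_BITS; // 41

    // Pack the three fields into one long.
    static long pack(int user, int group, int mode) {
        return ((long) user << USER_OFFSET)
             | ((long) group << GROUP_OFFSET)
             | (mode & 0xFFFFL);
    }

    static int mode(long packed)  { return (int) (packed & 0xFFFF); }
    static int group(long packed) { return (int) ((packed >>> GROUP_OFFSET) & ((1L << GROUP_BITS) - 1)); }
    static int user(long packed)  { return (int) ((packed >>> USER_OFFSET) & ((1L << USER_BITS) - 1)); }
}
```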
NameNodeHttpServer deals with fsck
NameNodeHttpServer.setupServlets: private static void setupServlets(HttpServer2 httpServer, Configuration conf) { httpServer.addInternalServlet("startupProgress", StartupProgressServlet.PA...Original 2018-08-14 17:28:04 · 208 views · 0 comments
Hadoop DirectoryScanner
Default value: reportCompileThreadPool -> 1. DirectoryScanner(DataNode datanode, FsDatasetSpi<?> dataset, Configuration conf) { this.datanode = datanode; this.dataset = dataset; int interval...Original 2017-10-16 14:31:13 · 818 views · 0 comments
Using Hadoop Encryption Zone
Background: encryption can be done at different layers in a traditional data management software/hardware stack. Choosing to encrypt at a given layer comes with different advantages and disadvantages. Ap...Original 2017-10-23 17:17:01 · 1304 views · 0 comments
HDFS Namenode Audit Design and Implementation
The Hadoop Namenode can audit any operation, including "rename, open, delete, listStatus, create, setPermission, getfileinfo, mkdirs"...Original 2017-04-01 15:48:39 · 923 views · 0 comments
The Ultimate Hadoop Snappy Installation Tutorial
Original work; reposting is permitted, but when reposting please credit the original source, the author information, and this notice via hyperlink, or legal action may be taken. http://shitouer.cn/2013/01/hadoop-hbase-snappy-setup-final-tutorial/ Driven by product needs, I spent the last couple of days looking into Hadoop Snappy. Leaving aside performance comparisons between compression algorithms, the installation process alone is painful. Many bloggers have written about H...Reposted 2017-03-29 10:38:10 · 1428 views · 0 comments
Class Hierarchy Analysis of ScriptBasedMapping, CachedDNSToSwitchMapping, AbstractDNSToSwitchMapping, and DNSToSwitchMapping
DNSToSwitchMapping is an interface that defines the following methods: public interface DNSToSwitchMapping { /** * Resolves a list of DNS-names/IP-addresses and returns back a list of * switch information (network paths). One-t...Original 2016-05-09 17:17:51 · 1034 views · 0 comments
Hadoop 2.6 Trash Cleanup Mechanism Source Code Analysis
In the Namenode's trash mechanism, only when a file is deleted from the command line does the system check whether Trash is enabled; if it is, the delete becomes a rename operation. In the constructor of the Namenode class there is this.haContext = createHAContext(); createHAContext() initializes a NamenodeHAContext object, as follows: protected HAContext createHACo...Original 2016-01-26 18:35:39 · 1372 views · 0 comments
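The "delete becomes rename" behavior can be pictured with a small path helper. This is my own sketch of the per-user Trash layout (a path under the user's .Trash/Current directory), not Hadoop code:

```java
// When trash is enabled, a deleted path is renamed under the deleting
// user's .Trash/Current directory instead of being removed outright.
public class TrashPath {
    static String trashDestination(String user, String path) {
        // e.g. /user/hadoop/.Trash/Current/user/hadoop/data.txt
        return "/user/" + user + "/.Trash/Current" + path;
    }

    public static void main(String[] args) {
        System.out.println(trashDestination("hadoop", "/user/hadoop/data.txt"));
    }
}
```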
Hadoop 2.6.3 BlockPlacementPolicyDefault Source Code Analysis
BlockPlacementPolicyDefault is the class responsible for choosing target datanodes for the replicas of a block. The replica placement policy is: if the writer runs on a datanode, the first replica is placed on that local node; otherwise a random node is chosen. The second replica is placed on a different rack, and the third replica is placed on the same rack as the second but on a different node. /** * The class is responsible for choosing the des...Original 2016-01-28 18:51:46 · 1805 views · 1 comment
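The three-replica rule described above can be sketched as a toy placement routine. This is only an illustration of the policy, not Hadoop's implementation (which also handles exclusions, load, and storage types); node and rack names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy version of the default policy: replica 1 on the writer's node,
// replica 2 on a different rack, replica 3 on the same rack as replica 2
// but on a different node.
public class ToyPlacement {
    static List<String> place(String writerNode, Map<String, String> nodeToRack) {
        List<String> chosen = new ArrayList<>();
        chosen.add(writerNode); // first replica: the writer's local node
        String localRack = nodeToRack.get(writerNode);
        String second = null;
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (!e.getValue().equals(localRack)) { second = e.getKey(); break; }
        }
        chosen.add(second); // second replica: a node on a remote rack
        String secondRack = nodeToRack.get(second);
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (e.getValue().equals(secondRack) && !e.getKey().equals(second)) {
                chosen.add(e.getKey()); // third replica: same rack, different node
                break;
            }
        }
        return chosen;
    }
}
```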
Hadoop 2.6.3 BlockPlacementPolicy Analysis
BlockPlacementPolicyDefault is the implementation class of BlockPlacementPolicy; you can write your own implementation and configure it with the dfs.block.replicator.classname parameter. First, the interface description: the following method chooses numOfReplicas datanodes for the writer to store replicas of a block of size blocksize. If the number is less than numOfReplic...Original 2016-01-28 15:25:14 · 3011 views · 0 comments
Hadoop 2.6.3 CachedDNSToSwitchMapping Source Code Analysis
CachedDNSToSwitchMapping caches the mapping from already-resolved IP addresses to rack locations. /** * A cached implementation of DNSToSwitchMapping that takes an * raw DNSToSwitchMapping and stores the resolved network location in * a...Original 2016-01-28 10:34:13 · 826 views · 0 comments
Hadoop 2.6.0 CorruptReplicasMap Source Code Analysis
CorruptReplicasMap stores information about corrupt blocks in the file system. A block is considered corrupt only when all of its replicas are corrupt. When a replica of a block is reported, any message about corrupt replicas is suppressed. If a block has good replicas of the expected data, the corrupt replicas are removed immediately. /** * Stores information about all corrupt blocks in the File System. * A...Original 2016-01-27 18:28:56 · 696 views · 0 comments
Hadoop CharacterTreeAuditLogger
package org.apache.hadoop.hdfs.namenode; import java.io.BufferedInputStream; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.lang.management.GarbageCollector...Original 2017-07-26 10:14:58 · 381 views · 0 comments
hdfs dfsadmin -reconfig reloads dfs.datanode.data.dir without restarting the datanode
Reconfig command: hdfs dfsadmin [-reconfig <datanode|...> <host:ipc_port> <start|status>]. Code analysis: DFSAdmin.run: if ("-reconfig".equals(cmd)) { exitCode = reconfig(argv, i); } DFSAdmin.reconfig: if o...Original 2017-09-15 16:09:12 · 1491 views · 0 comments
The Process of Writing a File to an Encryption Zone
protected void copyStreamToTarget(InputStream in, PathData target) throws IOException { if (target.exists && (target.stat.isDirectory() || !overwrite)) { throw new PathExistsException(targe...Original 2017-10-20 19:41:51 · 425 views · 0 comments
HardLinkCGUnix uses "stat -c%h" to get the hard link count
HardLinkCommandGetter: /** * This abstract class bridges the OS-dependent implementations of the * needed functionality for querying link counts. * The particular implementation class is chosen...Original 2017-10-13 11:12:13 · 418 views · 0 comments
RoundRobinVolumeChoosingPolicy causes storage imbalance when multiple storage types are specified
Both FsVolumeList.getNextVolume and getNextTransientVolume call chooseVolume(list, blockSize), but with different lists. /** * Get next volume. * * @param blockSize free space needed on th...Original 2017-10-13 10:51:24 · 318 views · 0 comments
The HDFS BlockPoolSlice runs du on its directory regularly
Part of the constructor of BlockPoolSlice: // Use cached value initially if available. Or the following call will // block until the initial du command completes. this.dfsUsage = new DU(bpDir,...Original 2017-10-12 17:37:49 · 409 views · 0 comments
The Datanode Multi-Directory Replica Placement Policy in Hadoop 2.0
In Hadoop 2.0, a datanode has two disk-selection policies for storing replica data: the first carries over the round-robin directory selection of Hadoop 1.0, implemented in RoundRobinVolumeChoosingPolicy.java; the second stores data on the disk with the most available space, implemented in AvailableSpaceVolumeChoosingPolicy.java. The configuration option for the policy is:...Reposted 2017-10-12 14:37:05 · 513 views · 0 comments
Ranger HDFS Plugin Details
The main entry point is specified in hdfs-site.xml: <property> <name>dfs.namenode.inode.attributes.provider.class</name> <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer...Original 2017-10-27 11:53:14 · 1832 views · 0 comments
Hadoop Multi-Namespace Namenode Failover
hdfs haadmin -ns yq01-ns1 -failover nn2 nn1...Original 2017-08-31 19:50:25 · 286 views · 0 comments
The default retry policy does not retry every command
The default retry policy is FailoverOnNetworkExceptionRetry: when the exception is a socket exception and the command is not isIdempotentOrAtMostOnce, it fails immediately. @Override...Original 2017-09-26 11:28:11 · 396 views · 0 comments
Hadoop 2.6.0 NNStorageRetentionManager Source Code Analysis: Cleaning Up Namenode Image and Edit Files
NNStorageRetentionManager deletes old files on a rotating schedule. Its constructor is as follows: public NNStorageRetentionManager(Configuration conf, NNStorage storage, LogsPurgeable purgeableLogs) { this(conf,...Original 2016-01-27 17:15:15 · 1510 views · 0 comments