Using Hadoop Encryption Zone

Background

Encryption can be done at different layers in a traditional data management software/hardware stack. Choosing to encrypt at a given layer comes with different advantages and disadvantages.

  • Application-level encryption. This is the most secure and most flexible approach. The application has ultimate control over what is encrypted and can precisely reflect the requirements of the user. However, writing applications to do this is hard. This is also not an option for customers of existing applications that do not support encryption.

  • Database-level encryption. Similar to application-level encryption in terms of its properties. Most database vendors offer some form of encryption. However, there can be performance issues. One example is that indexes cannot be encrypted.

  • Filesystem-level encryption. This option offers high performance, application transparency, and is typically easy to deploy. However, it is unable to model some application-level policies. For instance, multi-tenant applications might want to encrypt based on the end user. A database might want different encryption settings for each column stored within a single file.

  • Disk-level encryption. Easy to deploy and high performance, but also quite inflexible. Only really protects against physical theft.

HDFS-level encryption fits between database-level and filesystem-level encryption in this stack. This has a lot of positive effects. HDFS encryption is able to provide good performance and existing Hadoop applications are able to run transparently on encrypted data. HDFS also has more context than traditional filesystems when it comes to making policy decisions.

Architecture

Concept

  • An encryption zone is a special directory whose contents are transparently encrypted on write and transparently decrypted on read. Each zone is associated with a single encryption zone key (EZ key), specified when the zone is created.
  • Each file within a zone has its own unique data encryption key (DEK), which is used to encrypt and decrypt that file's data. HDFS never handles DEKs directly.
  • HDFS only ever stores and handles the encrypted data encryption key (EDEK), i.e. the DEK encrypted with the zone's EZ key.
  • A new cluster service is required to manage keys: the Hadoop Key Management Server (KMS).

In the context of HDFS encryption, the KMS performs three basic responsibilities:

  • Providing access to stored encryption zone keys
  • Generating new encrypted data encryption keys for storage on the NameNode
  • Decrypting encrypted data encryption keys for use by HDFS clients

Configuration

etc/hadoop/kms-site.xml

Note: kms.keystore need not exist beforehand; it is created automatically on first use.

  <property>
    <name>hadoop.kms.key.provider.uri</name>
    <value>jceks://file@/home/houzhizhen/kms.keystore</value>
    <description>
      URI of the backing KeyProvider for the KMS.
    </description>
  </property>

  <property>
    <name>hadoop.security.keystore.JavaKeyStoreProvider.password</name>
    <value>any</value>
    <description>
      If using the JavaKeyStoreProvider, the password for the keystore file.
    </description>
  </property>
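With the JCEKS provider configured, the backing keystore can also be inspected directly from a Hadoop client by pointing `-provider` at the same URI (the keystore path below matches the configuration above and is otherwise illustrative):

```shell
# List the keys (with metadata) held in the JCEKS keystore that
# backs the KMS -- same URI as hadoop.kms.key.provider.uri above.
hadoop key list -provider jceks://file@/home/houzhizhen/kms.keystore -metadata
```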

Start the KMS

sbin/kms.sh start
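A quick way to check that the KMS is up is to hit its REST interface. With the default pseudo authentication a `user.name` query parameter is required; port 16000 is the default for these releases:

```shell
# Should return a JSON list of key names (empty at this point).
curl -s "http://localhost:16000/kms/v1/keys/names?user.name=$USER"
```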

etc/hadoop/core-site.xml

  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
  </property>

etc/hadoop/hdfs-site.xml

  <property>
    <name>dfs.encryption.key.provider.uri</name>
    <value>kms://http@localhost:16000/kms</value>
  </property>

Start HDFS

hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
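Before creating any zones, it is worth confirming that both the daemons and the client can see the key provider (these commands assume the configuration above):

```shell
# Show the provider HDFS will use for encryption zones.
hdfs getconf -confKey dfs.encryption.key.provider.uri

# The client should be able to reach the KMS via core-site.xml.
hadoop key list
```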

Example usage

# As the normal user, create a new encryption key
hadoop key create mykey
# output:
# mykey has been successfully created with options Options{cipher='AES/CTR/NoPadding', bitLength=128, description='null', attributes=null}.
# KMSClientProvider[http://localhost:16000/kms/v1/] has been updated.

# As the super user, create a new empty directory and make it an encryption zone
hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName mykey -path /zone

# chown it to the normal user
hadoop fs -chown myuser:myuser /zone

# As the normal user, put a file in, read it out
hadoop fs -put /etc/hosts  /zone
hadoop fs -cat /zone/hosts
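The zone and the per-file encryption metadata can be verified from the command line (`-listZones` must be run as the superuser; `-getFileEncryptionInfo` is available on releases that support it):

```shell
# List all encryption zones and their keys (superuser only).
hdfs crypto -listZones

# Show the cipher suite, EDEK and IV stored for the file.
hdfs crypto -getFileEncryptionInfo -path /zone/hosts
```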

KMS ACL

Open the file as the normal user:

hadoop fs -cat /zone/hosts

The file is read successfully.

etc/hadoop/kms-acls.xml

Add the following contents.

  <property>
    <name>key.acl.mykey.ALL</name>
    <value>myuser</value>
    <description>
      ACL for ALL operations.
    </description>
  </property>

ALL is equivalent to the following per-operation settings. The default value for each is *, which means any user is allowed.

  <property>
    <name>key.acl.mykey.MANAGEMENT</name>
    <value>myuser</value>
    <description>
      ACL for MANAGEMENT operations (createKey, deleteKey, rolloverNewVersion) on mykey.
    </description>
  </property>

  <property>
    <name>key.acl.mykey.GENERATE_EEK</name>
    <value>myuser</value>
    <description>
      ACL for GENERATE_EEK operations on mykey.
    </description>
  </property>

  <property>
    <name>key.acl.mykey.DECRYPT_EEK</name>
    <value>myuser</value>
    <description>
      ACL for DECRYPT_EEK operations on mykey.
    </description>
  </property>

  <property>
    <name>key.acl.mykey.READ</name>
    <value>myuser</value>
    <description>
      ACL for READ operations on mykey.
    </description>
  </property>

Read the file again as the superuser:

hadoop fs -cat /zone/hosts
cat: User [houzhizhen] is not authorized to perform [DECRYPT_EEK] on key with ACL name [mykey]!!

Read the file as the authorized user by impersonating myuser:

export HADOOP_USER_NAME=myuser
hadoop fs -cat /zone/hosts

Move file test

As the superuser:

hadoop fs -mkdir /dir1
hadoop fs -chown myuser:myuser /dir1

As myuser:

hadoop fs -put /etc/profile /dir1
hadoop fs -mv /dir1/profile /zone/
mv: /dir1/profile can't be moved into an encryption zone.
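Unlike rename, a copy works, because `-cp` rewrites the data through the client, which encrypts the bytes on the way into the zone:

```shell
# Copy instead of move: the client re-writes (and encrypts) the data.
hadoop fs -cp /dir1/profile /zone/
hadoop fs -cat /zone/profile
```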

Conclusion

1. The KMS has no built-in HA. Even if the KMS is placed behind a VIP or load balancer, there is no obvious way to ensure that only one process writes kms.keystore at a time.
2. Impersonation cannot be prevented when a user sets the HADOOP_USER_NAME environment variable (unless Kerberos authentication is enabled).
3. Because a file cannot be moved from a normal directory into an encryption zone, if the location of a Hive table is inside a zone, the Hive staging directory must be inside an encryption zone too.
4. For the same reason, a file cannot be moved from an encryption zone to the trash, which is a normal directory.
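A common workaround for point 4 is to bypass the trash when deleting inside a zone (newer Hadoop releases later added a per-zone .Trash directory, but on the versions discussed here):

```shell
# Delete without moving the file to the (unencrypted) trash directory.
hadoop fs -rm -skipTrash /zone/hosts
```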

The Process of Writing a File to an Encryption Zone
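In brief, the write path is: the client creates the file; the NameNode obtains a fresh EDEK from the KMS and persists it in the file's extended attributes; the client asks the KMS to decrypt the EDEK back into a DEK; the client then encrypts the stream with AES/CTR before it ever reaches the DataNodes. The key handling can be sketched locally with openssl as a toy stand-in for the KMS (all key values, IVs, and paths below are illustrative, not real HDFS internals):

```shell
# 1. The KMS holds the encryption zone key (the "mykey" created earlier).
EZ_KEY=$(openssl rand -hex 16)

# 2. On file create, the NameNode asks the KMS for a new EDEK:
#    the KMS generates a fresh DEK and encrypts it with the EZ key.
DEK=$(openssl rand -hex 16)
IV=$(openssl rand -hex 16)
EDEK=$(printf '%s' "$DEK" | openssl enc -aes-128-ctr -K "$EZ_KEY" -iv "$IV" -a -A)
# The NameNode stores only the EDEK (never the DEK) in the file's
# extended attributes.

# 3. To write data, the client sends the EDEK to the KMS, which checks
#    the DECRYPT_EEK ACL and returns the plaintext DEK.
DEK2=$(printf '%s' "$EDEK" | openssl enc -d -aes-128-ctr -K "$EZ_KEY" -iv "$IV" -a -A)

# 4. The client encrypts the file contents with the DEK (AES/CTR) and
#    streams only ciphertext to the DataNodes.
printf 'hello encryption zone\n' > /tmp/plain.txt
openssl enc -aes-128-ctr -K "$DEK2" -iv "$IV" -in /tmp/plain.txt -out /tmp/cipher.bin
```

Decryption on read is the mirror image: the client fetches the EDEK from the NameNode, has the KMS unwrap it, and decrypts the blocks locally.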
