Many HDP users are increasing their focus on security within Hadoop and are looking for ways to encrypt their data. Fortunately, Hadoop provides several options for encrypting data at rest. At the lowest level there is volume encryption, which encrypts all the data on a node and requires no changes to Hadoop; it protects against physical theft of disks but offers no fine-grained control.
Often, you want to encrypt only selected files or directories in HDFS to reduce overhead and protect performance. This is now possible with HDFS Transparent Data Encryption (TDE), which lets users take advantage of HDFS-native data encryption without any application code changes.
Once an HDFS admin sets up encryption, HDFS takes care of the actual encryption/decryption without the end-user having to manually encrypt/decrypt a file.
The building blocks of this solution are:
- Encryption Zone: An HDFS admin creates an encryption zone and links it to an empty HDFS directory and an encryption key. Any files put in the directory are automatically encrypted by HDFS.
- Key Management Server (KMS): The KMS is responsible for storing encryption keys. It provides a REST API and access control on the keys it stores.
- Key Provider API: The Key Provider API is the glue that the HDFS NameNode and clients use to connect to the Key Management Server.
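Concretely, the NameNode and HDFS clients locate the KMS through a key provider URI. The form below is the one used throughout this guide, with <kms-host> standing in for the node where the KMS runs (the Sandbox configuration later in this guide uses localhost and the default port 16000):

kms://http@<kms-host>:16000/kms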
This guide covers:
- Configuring the Key Management Server
- Creating Encryption Zones
- Reading/Writing Data in Encrypted File System
This technical preview uses the HDP 2.2 Sandbox, and it is recommended that you do the same when following this guide. If you have deployed an HDP 2.2 cluster, Kerberized or not, this guide has additional sections covering those deployment options as well.
Configure the Key Management Server (KMS)
Extract the Key Management Server bits from the package included in Apache Hadoop
# mkdir -p /usr/kms-demo
# cp /usr/hdp/current/hadoop-client/mapreduce.tar.gz /usr/kms-demo/
# export KMS_ROOT=/usr/kms-demo
Here KMS_ROOT refers to the directory where mapreduce.tar.gz will be extracted (/usr/kms-demo):
# cd $KMS_ROOT
# tar -xvf mapreduce.tar.gz
Start the Key Management Server
Appendix A covers advanced configuration of the Key Management Server. The following basic scenario uses the default configurations:
# cd $KMS_ROOT/hadoop/sbin/
# ./kms.sh run
You’ll see the following console output on a successful start:
Jan 10, 2015 11:07:33 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 1764 ms
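At this point you can optionally confirm that the KMS is responding by asking its REST API for the list of key names. This is a minimal check that assumes the default simple authentication and the localhost:16000 endpoint used in this guide; the returned list will be empty until keys are created:

# curl "http://localhost:16000/kms/v1/keys/names?user.name=hdfs"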
Configure Hadoop to use the KMS as the key provider
Hadoop configuration can be managed either through Ambari or by editing the XML configuration files directly. Both options are shown here.
Configure Hadoop to use KMS using Ambari
You can use Ambari to configure this in the HDFS configuration section.
Log in to Ambari through your web browser (admin/admin).
On the Ambari Dashboard, click HDFS service and then the “Configs” tab.
Add the following custom properties so that Hadoop key management and the HDFS encryption zone feature can find the right KMS key provider:
- Custom core-site
Add property “hadoop.security.key.provider.path” with value “kms://http@localhost:16000/kms”
Note: Make sure the host in kms://http@localhost:16000/kms matches the node where you started the KMS
- Custom hdfs-site
Add the property “dfs.encryption.key.provider.uri” with the value “kms://http@localhost:16000/kms”
Make sure the host in kms://http@localhost:16000/kms matches the node where you started the KMS
Save the configuration and restart HDFS after setting these properties.
Manually Configure Hadoop to use KMS using the XML configuration files
If you are not using Ambari, you can manually edit the site files as described in this section.
First edit your hdfs-site.xml file:
# cd /etc/hadoop/conf
# vi hdfs-site.xml
Add the following entry to hdfs-site.xml:
<property>
  <name>dfs.encryption.key.provider.uri</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
And edit the core-site.xml file as well.
# cd /etc/hadoop/conf
# vi core-site.xml
Add the following entry to core-site.xml, then restart HDFS so that the new key provider settings take effect:
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
Create Encryption Keys
Log into the Sandbox as the hdfs superuser. Run the following commands to create a key named "key1" with a length of 256 bits and show the result:
# su - hdfs
# hadoop key create key1 -size 256
# hadoop key list -metadata
As an Admin, Create an Encryption Zone in HDFS
Run the following commands to create an encryption zone under /secureweblogs with the zone key named "key1" and show the results:
# hdfs dfs -mkdir /secureweblogs
# hdfs crypto -createZone -keyName key1 -path /secureweblogs
# hdfs crypto -listZones
Note: The crypto command requires HDFS superuser privileges
As an HDFS User, Read and Write Files From/To an Encryption Zone in HDFS
HDFS file encryption/decryption is transparent to clients. Users and applications can read/write files from/to an encryption zone as long as they have permission to access it.
As an example, the ‘/secureweblogs’ directory in HDFS has been set up to be only read/write accessible by the ‘hive’ user:
# hdfs dfs -ls /
…
drwxr-x---   - hive   hive          0 2015-01-11 23:12 /secureweblogs
The same directory, '/secureweblogs', is an encryption zone in HDFS; you can verify that with HDFS superuser privileges:
# hdfs crypto -listZones
/secureweblogs  key1
As the ‘hive’ user, you can transparently write data to that directory.
[hive@sandbox ~]# hdfs dfs -copyFromLocal web.log /secureweblogs
[hive@sandbox ~]# hdfs dfs -ls /secureweblogs
Found 1 items
-rw-r--r-- 1 hive hive 1310 2015-01-11 23:28 /secureweblogs/web.log
As the ‘hive’ user, you can transparently read data from that directory and verify that the exact file that was loaded into HDFS is readable in its unencrypted form:
[hive@sandbox ~]# hdfs dfs -copyToLocal /secureweblogs/web.log read.log
[hive@sandbox ~]# diff web.log read.log
Other users will not be able to write data to or read data from the encryption zone:
[root@sandbox ~]# hdfs dfs -copyFromLocal install.log /secureweblogs
copyFromLocal: Permission denied: user=root, access=EXECUTE, inode="/secureweblogs":hive:hive:drwxr-x---
[root@sandbox ~]# hdfs dfs -copyToLocal /secureweblogs/web.log read.log
copyToLocal: Permission denied: user=root, access=EXECUTE, inode="/secureweblogs":hive:hive:drwxr-x---
Appendices
A: HDFS TDE in a multi-node Hadoop Cluster
Extract the Key Management Server bits to a node of your cluster, and make sure to use the FQDN of that node's host in your HDFS configuration for hadoop.security.key.provider.path and dfs.encryption.key.provider.uri.
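For example, if the KMS were started on a hypothetical host named kms-host.example.com, both properties would point at the same URI:

hadoop.security.key.provider.path = kms://http@kms-host.example.com:16000/kms
dfs.encryption.key.provider.uri   = kms://http@kms-host.example.com:16000/kms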
B: HDFS TDE in a Kerberos Enabled Cluster
Step 1: Enable Kerberos for the Hadoop Cluster and validate that it is working
Step 2: Configure the KMS to use Kerberos by adding the following configuration to $KMS_ROOT/hadoop/etc/hadoop/kms-site.xml:
<property>
  <name>hadoop.kms.authentication.type</name>
  <value>kerberos</value>
  <description>Authentication type for the KMS. Can be either "simple" or "kerberos".</description>
</property>

<property>
  <name>hadoop.kms.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
  <description>Path to the keytab with credentials for the configured Kerberos principal.</description>
</property>

<property>
  <name>hadoop.kms.authentication.kerberos.principal</name>
  <value>HTTP/FQDN for KMS host@YOUR HADOOP REALM</value>
  <description>The Kerberos principal to use for the HTTP endpoint. The principal must start with 'HTTP/' as per the Kerberos HTTP SPNEGO specification.</description>
</property>
The value for hadoop.kms.authentication.kerberos.principal must match your environment. To get the FQDN of your KMS host, you can use the output of "hostname -f".
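For example, with a hypothetical KMS host kms-host.example.com and Kerberos realm EXAMPLE.COM, the value would be:

HTTP/kms-host.example.com@EXAMPLE.COM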
Step 3: Start the KMS:
./hadoop/sbin/kms.sh run
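Once the KMS is running with Kerberos authentication, key operations from clients require a valid Kerberos ticket. As a sketch, assuming an Ambari-style headless hdfs keytab (the keytab path and principal are examples and may differ in your cluster):

# klist -kt /etc/security/keytabs/hdfs.headless.keytab
# kinit -kt /etc/security/keytabs/hdfs.headless.keytab <principal shown by klist>
# hadoop key list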
C: Accessing Raw Bytes of an Encrypted File
HDFS provides access to the raw encrypted bytes of files, which enables an admin to move encrypted data without decrypting it.
A hidden namespace under /.reserved/raw allows distcp to access raw encrypted files and avoid unnecessary decrypt/encrypt overhead when copying encrypted files between clusters. It is accessible only to the HDFS superuser.
hdfs dfs -cat /.reserved/raw/zone1/localfile.dat
hdfs dfs -cat /.reserved/raw/secureweblogs/web.log
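For example, an admin could copy an encryption zone to another cluster in its encrypted form by pointing distcp at the /.reserved/raw paths. This is a sketch assuming a hypothetical destination NameNode at backupnn.example.com; run it as the HDFS superuser, and use -px to preserve the extended attributes that carry the encryption metadata:

hadoop distcp -px /.reserved/raw/secureweblogs hdfs://backupnn.example.com:8020/.reserved/raw/secureweblogs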
D: Error Creating Key when KeyProvider is not configured
hadoop key create key1 -size 256
"There are no valid KeyProviders configured. No key was created. You can use the -provider option to specify a provider to use."
This error message appears if you haven't configured the two KMS-related properties or haven't restarted HDFS after setting them.
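A quick way to check which key provider the configuration currently resolves to (if any) is to query the configuration directly, for example:

# hdfs getconf -confKey dfs.encryption.key.provider.uri
# hdfs getconf -confKey hadoop.security.key.provider.path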
E: Error putting file in Encrypted zone with Key Size of 256
An AES key of size 256 requires the JCE Unlimited Strength Jurisdiction Policy files.
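To use 256-bit keys, install the JCE Unlimited Strength Jurisdiction Policy files into the JDK used by HDFS and the KMS, or create the key with -size 128 instead. As a sketch, assuming the Sandbox JDK location shown in Appendix G and that the downloaded policy jars are in the current directory:

# cp local_policy.jar US_export_policy.jar /usr/jdk64/jdk1.7.0_67/jre/lib/security/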
F: Error Creating Key when KMS is not running
hadoop key create key2 -size 128
key2 has not been created. Connection refused
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
Solution: Start the KMS server:
cd $KMS_ROOT/hadoop/sbin
./kms.sh run
G: Error Starting KMS Server – 1
java.lang.ClassNotFoundException: javax.management.modelmbean.ModelMBeanNotificationBroadcaster not found
Make sure JAVA_HOME is set to the location of the JDK on the node.
For example, export JAVA_HOME=/usr/jdk64/jdk1.7.0_67/
H: Error Starting KMS Server – 2
SEVERE: Exception looking up UserDatabase under key UserDatabase
javax.naming.NamingException: /usr/kms-demo/hadoop/share/hadoop/kms/tomcat/conf/tomcat-users.xml (Permission denied)
Make sure to run the KMS as a user who has access to everything under $KMS_ROOT, such as the root user.
I: Change the default password for KMS Keystore
By default, the KMS uses JCEKS and stores the keys in $USER_HOME/kms.keystore. The default configuration also does not use a password for this keystore. This is obviously not secure and should not be used in a production environment. We recommend that you set a password for this file and configure the KMS to use the password-protected keystore.
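As a sketch of one possible approach (verify the exact mechanism against the Hadoop KMS documentation for your version): the keystore password itself can be changed with keytool, after which the KMS must be supplied the new password, for example through the JavaKeyStoreProvider password-file setting, before it will open the protected keystore.

# keytool -storepasswd -storetype jceks -keystore ~/kms.keystore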
Other Issues
Please post to the Hortonworks Security Forum if you need help.