HDFS Transparent Data Encryption

Many HDP users are increasing their focus on security within Hadoop and are looking for ways to encrypt their data. Fortunately, Hadoop provides several options for encrypting data at rest. At the lowest level there is volume encryption, which encrypts all the data on a node and doesn’t require any changes to Hadoop. Volume-level encryption protects data if the physical media is compromised, but it lacks a fine-grained approach.

Often, you want to encrypt only selected files or directories in HDFS to save on overhead and protect performance. This is now possible with HDFS Transparent Data Encryption (TDE). HDFS TDE allows users to take advantage of HDFS-native data encryption without any application code changes.

Once an HDFS admin sets up encryption, HDFS takes care of the actual encryption/decryption without the end user having to manually encrypt or decrypt a file.

The building blocks of this solution are:

  1. Encryption Zone: An HDFS admin creates an encryption zone and links it to an empty HDFS directory and an encryption key. Any files put in the directory are automatically encrypted by HDFS.
  2. Key Management Server (KMS): The KMS is responsible for storing encryption keys. It provides a REST API and access control on the keys it stores.
  3. Key Provider API: The Key Provider API is the glue used by the HDFS NameNode and client to connect to the Key Management Server.

This guide covers:

  • Configuring the Key Management Server
  • Creating Encryption Zones
  • Reading/Writing Data in Encrypted File System

This technical preview uses the HDP 2.2 Sandbox, and it is recommended that you do the same when following this guide. If you have deployed an HDP 2.2 cluster, whether Kerberized or not, this guide has additional sections that cover those deployment options as well.

Configure the Key Management Service (KMS)

Extract the Key Management Server bits from the package included in Apache Hadoop

# mkdir -p /usr/kms-demo
# cp /usr/hdp/current/hadoop-client/mapreduce.tar.gz /usr/kms-demo/
# export KMS_ROOT=/usr/kms-demo

Here KMS_ROOT refers to the directory that holds mapreduce.tar.gz (/usr/kms-demo). Extract the archive:

# cd $KMS_ROOT
# tar -xvf mapreduce.tar.gz

Start the Key Management Server

The appendices cover more advanced configuration of the Key Management Server. The following basic scenario uses the default configurations:

# cd $KMS_ROOT/hadoop/sbin/
# ./kms.sh run

You’ll see the following console output on a successful start: 

Jan 10, 2015 11:07:33 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 1764 ms
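
To verify that the KMS is answering requests, you can query its REST API from the same node. This is an optional sanity check assuming the default port of 16000 and the default simple authentication (the user.name query parameter identifies the caller); it should return a JSON list of key names, which is empty at this point:

# curl "http://localhost:16000/kms/v1/keys/names?user.name=hdfs"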

Configure Hadoop to use the KMS as the key provider

Hadoop configuration can be managed either through Ambari or by editing the XML configuration files directly. Both options are shown here.

Configure Hadoop to use KMS using Ambari

You can use Ambari to configure this in the HDFS configuration section.

Log in to Ambari through your web browser (admin/admin).

On the Ambari Dashboard, click the HDFS service and then the “Configs” tab.

 

Add the following custom properties for Hadoop key management and the HDFS encryption zone feature so that HDFS can find the right KMS key provider:

  • Custom core-site

Add the property “hadoop.security.key.provider.path” with the value “kms://http@localhost:16000/kms”.


Note: Make sure the host in kms://http@localhost:16000/kms matches the node where you started the KMS.

  • Custom hdfs-site

Add the property “dfs.encryption.key.provider.uri” with the value “kms://http@localhost:16000/kms”


Make sure the host in kms://http@localhost:16000/kms matches the node where you started the KMS.

Save the configuration and restart HDFS after setting these properties.

Manually Configure Hadoop to use KMS using the XML configuration files

If you are not using Ambari, you can manually edit the site files as shown in this section.

First edit your hdfs-site.xml file:

# cd /etc/hadoop/conf
# vi hdfs-site.xml

Add the following entry to hdfs-site.xml:

<property>
     <name>dfs.encryption.key.provider.uri</name>
     <value>kms://http@localhost:16000/kms</value>
</property>

Then edit the core-site.xml file as well:

# cd /etc/hadoop/conf
# vi core-site.xml

Add the following entry to core-site.xml:

<property>
      <name>hadoop.security.key.provider.path</name>
      <value>kms://http@localhost:16000/kms</value>
</property>
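
Restart HDFS after saving both files so that the NameNode and clients pick up the new key provider. As a quick sanity check, you can read the value back on a node that has the updated configuration:

# hdfs getconf -confKey dfs.encryption.key.provider.uri
kms://http@localhost:16000/kms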

Create Encryption Keys

Log in to the Sandbox as the hdfs superuser. Run the following commands to create a key named “key1” with a length of 256 bits and show the result:

# su - hdfs
# hadoop key create key1  -size 256
# hadoop key list -metadata
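
If creating the 256-bit key fails because the JCE Unlimited Strength policy files are not installed (see Appendix E), you can create a 128-bit key instead; “key2” below is just an illustrative name:

# hadoop key create key2 -size 128
# hadoop key list -metadata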

As an Admin, Create an Encryption Zone in HDFS

Run the following commands to create an encryption zone under /secureweblogs with the zone key “key1” and show the results:

# hdfs dfs -mkdir /secureweblogs
# hdfs crypto -createZone -keyName key1 -path /secureweblogs
# hdfs crypto -listZones

Note: The crypto command requires HDFS superuser privileges.

As an HDFS User, Read and Write Files From/To an Encryption Zone in HDFS

HDFS file encryption/decryption is transparent to clients. Users and applications can read and write files from/to an encryption zone as long as they have permission to access it.

As an example, the ‘/secureweblogs’ directory in HDFS has been set up so that only the ‘hive’ user can read from and write to it:

# hdfs dfs -ls /
 …
drwxr-x---   - hive   hive            0 2015-01-11 23:12 /secureweblogs

The same directory, ‘/secureweblogs’, is an encryption zone in HDFS; you can verify that with HDFS superuser privileges:

 # hdfs crypto -listZones
/secureweblogs  key1

As the ‘hive’ user, you can transparently write data to that directory:

[hive@sandbox ~]# hdfs dfs -copyFromLocal web.log /secureweblogs
[hive@sandbox ~]# hdfs dfs -ls /secureweblogs

Found 1 items

-rw-r--r--   1 hive hive       1310 2015-01-11 23:28 /secureweblogs/web.log

As the ‘hive’ user, you can transparently read data from that directory, and verify that the exact file that was loaded into HDFS is readable in its unencrypted form:

[hive@sandbox ~]# hdfs dfs -copyToLocal /secureweblogs/web.log read.log
[hive@sandbox ~]# diff web.log read.log
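
As the HDFS superuser, you can also confirm that the data at rest is not plaintext by reading the same file through the /.reserved/raw namespace described in Appendix C; the output should be ciphertext rather than the original log lines:

# hdfs dfs -cat /.reserved/raw/secureweblogs/web.log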

Other users will not be able to write data to or read data from the encryption zone:

[root@sandbox ~]# hdfs dfs -copyFromLocal install.log /secureweblogs
copyFromLocal: Permission denied: user=root, access=EXECUTE, inode="/secureweblogs":hive:hive:drwxr-x---
[root@sandbox ~]# hdfs dfs -copyToLocal /secureweblogs/web.log read.log
copyToLocal: Permission denied: user=root, access=EXECUTE, inode="/secureweblogs":hive:hive:drwxr-x---

Appendices

A: HDFS TDE in a multi-node Hadoop Cluster

Extract the Key Management Server bits to a node of your cluster, and make sure to use the FQDN of that node in your HDFS configuration for hadoop.security.key.provider.path and dfs.encryption.key.provider.uri.
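
For example, if “hostname -f” on the KMS node returns kms-host.example.com (a hypothetical name), set both properties to kms://http@kms-host.example.com:16000/kms instead of the localhost value used on the Sandbox.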

B: HDFS TDE in a Kerberos Enabled Cluster

Step 1: Enable Kerberos for the Hadoop Cluster and validate that it is working

Step 2: Configure KMS to use Kerberos by adding the following configuration in $KMS_ROOT/hadoop/etc/hadoop/kms-site.xml :

<property>
     <name>hadoop.kms.authentication.type</name>
     <value>kerberos</value>
     <description> Authentication type for the KMS. Can be either &quot;simple&quot; or &quot;kerberos&quot;.</description>
</property>
<property>
     <name>hadoop.kms.authentication.kerberos.keytab</name>
     <value>/etc/security/keytabs/spnego.service.keytab</value>
     <description> Path to the keytab with credentials for the configured Kerberos principal.</description>
</property>
<property>
     <name>hadoop.kms.authentication.kerberos.principal</name>
     <value>HTTP/FQDN for KMS host@YOUR HADOOP REALM</value>
     <description> The Kerberos principal to use for the HTTP endpoint. The principal must start with 'HTTP/' as per the Kerberos HTTP SPNEGO specification.</description>
</property>

The value for hadoop.kms.authentication.kerberos.principal must match your environment. To get the FQDN of your KMS host, use the output of “hostname -f”.
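
For example, on a KMS host named kms-host.example.com in the realm EXAMPLE.COM (both hypothetical), the value would be HTTP/kms-host.example.com@EXAMPLE.COM.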

Step 3: Start the KMS:

./hadoop/sbin/kms.sh run

C: Accessing Raw Bytes of an Encrypted File

HDFS provides access to the raw encrypted bytes of files. This enables an admin to move encrypted data without decrypting it.

A hidden namespace under /.reserved/raw lets distcp access raw encrypted files, avoiding unnecessary decrypt/encrypt overhead when copying encrypted files between clusters. It is accessible only to the HDFS superuser.

hdfs dfs -cat /.reserved/raw/zone1/localfile.dat
hdfs dfs -cat /.reserved/raw/secureweblogs/web.log
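
For example, a sketch of copying the raw bytes of the encrypted zone to the corresponding location on another cluster, assuming a destination NameNode at nn2.example.com (a hypothetical host); the -px flag preserves the extended attributes that carry the encryption metadata:

# hadoop distcp -px /.reserved/raw/secureweblogs hdfs://nn2.example.com:8020/.reserved/raw/secureweblogs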

D: Error Creating Key when KeyProvider is not configured

hadoop key create key1 -size 256
“There are no valid KeyProviders configured. No key was created. You can use the -provider option to specify a provider to use.”

This error message appears if you haven’t configured the two KMS-related properties or have not restarted HDFS after configuring them.

E: Error putting file in Encrypted zone with Key Size of 256

An AES key of size 256 requires the JCE Unlimited Strength Jurisdiction Policy files.
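
To fix this, install the JCE Unlimited Strength Jurisdiction Policy files (downloaded separately from Oracle for JDK 7) into the JRE security directory of the JDK used by Hadoop and the KMS, then restart the affected services. A minimal sketch, assuming the Sandbox JDK path used elsewhere in this guide:

# cp local_policy.jar US_export_policy.jar /usr/jdk64/jdk1.7.0_67/jre/lib/security/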

F: Error Creating Key when KMS is not running

hadoop key create key2  -size 128
key2 has not been created. Connection refused
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)

Solution: Start the KMS server:

cd $KMS_ROOT/hadoop/sbin
./kms.sh run

G: Error Starting KMS Server – 1

java.lang.ClassNotFoundException: javax.management.modelmbean.ModelMBeanNotificationBroadcaster not found

Make sure to set JAVA_HOME to the location of the JDK on the node.

For example, export JAVA_HOME=/usr/jdk64/jdk1.7.0_67/

H: Error Starting KMS Server – 2

SEVERE: Exception looking up UserDatabase under key UserDatabase

javax.naming.NamingException: /usr/kms-demo/hadoop/share/hadoop/kms/tomcat/conf/tomcat-users.xml (Permission denied)

Make sure to run the KMS as a user who has access to everything under $KMS_ROOT, such as the root user.

I: Change the default password for KMS Keystore

By default, KMS uses JCEKS and stores the keys in $USER_HOME/kms.keystore. The default configuration also does not use a password for this keystore. This is obviously not secure and should not be used in a production environment. We recommend that you set a password for this file and configure KMS to use the password-protected keystore.
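
A minimal sketch of one way to do this, assuming the stock kms-site.xml and that no keys have been created yet: put a password in a file that the KMS can find on its classpath (the KMS configuration directory is used here), and make sure the hadoop.security.keystore.java-keystore-provider.password-file property in $KMS_ROOT/hadoop/etc/hadoop/kms-site.xml points at that file name. The password and file name below are illustrative.

# cd $KMS_ROOT/hadoop/etc/hadoop
# echo 'MySecretPassword' > kms.keystore.password
# chmod 600 kms.keystore.password

If the keystore already exists with the default password, change its password first with keytool (keytool -storepasswd -storetype jceks -keystore ~/kms.keystore) before restarting the KMS.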

Other Issues

Please post to the Hortonworks Security Forum if you need help.
