This document discusses data encryption in Hadoop. It describes two common cases for encrypting data: using a Crypto API to encrypt/decrypt with an AES key stored in a keystore, and encrypting MapReduce outputs using a CryptoContext. It also covers the Hadoop Encryption Framework APIs, HBase encryption via HBASE-7544, and related JIRAs around Hive and Pig encryption. Key management tools like keytool and potential future improvements like Knox gateway integration are also mentioned.
4. WITHOUT KERBEROS
• Authorization
Ensuring the user can only do things that they are allowed to do
• Yes: Owner/Group Permission
• Authentication
Ensuring the user is who they claim to be
• No
10. HADOOP GATEWAY - NOW
• WebHDFS
• REST: curl "http://GATEWAYHOST/webhdfs/v1/PATH?[user.name=USER&]op=…"
• Hadoop: hadoop fs -fs webhdfs://GATEWAYHOST:14000 -cat FILE_PATH
• Oozie
• REST API: supports direct submission of MapReduce, Pig, and Hive jobs
• Steps
• Use WebHDFS to upload your files and jars
• Create an Oozie workflow
• HBase
• HBase Stargate REST gateway
• HBase Thrift server
11. HADOOP GATEWAY - FUTURE
• Apache Knox Gateway
Provides a single point of authentication and access for Apache™ Hadoop® services in a cluster
12. HADOOP GATEWAY - FUTURE
• Apache Knox Gateway
• Integrates with existing frameworks for Active Directory/LDAP
• Shell and REST interface support
• Currently working on kerberized cluster support
14. HADOOP DATA ENCRYPTION
• Disk Encryption
• Partition encryption: dm-crypt
• File System Encryption
• Folder encryption: eCryptfs
• Hadoop Encryption Framework
• Encrypt only the data that needs to be encrypted
16. HADOOP ENCRYPTION FRAMEWORK - MR
[Diagram: File (HDFS) → Map → File → Reduce → File (HDFS). Encryption/decryption applies along the whole path, at every stage.]
17. JIRAS
• HADOOP-9331: Hadoop crypto codec framework and crypto codec implementations
• HADOOP-9332: Crypto codec implementations for AES
• HADOOP-9333: Hadoop crypto codec framework based on compression codec
• MAPREDUCE-5025: Key Distribution and Management for supporting crypto codec in Map Reduce
• HBASE-7544: Transparent table/CF encryption
18. Brief
• Two typical crypto cases in Hadoop
• Crypto API case: using an AES key (stored in a KeyStore) to encrypt/decrypt data
• MR CryptoContext case: encrypting the MR output
• Tool – Distcrypto
• HBase encryption
• Other related JIRAs and the security key store (manager)
• TODOs
19. KEY STORE TOOL - KEYTOOL
A key and certificate management utility.
• Create & store an AES key
• keytool -keystore /tmp/hbase.jks -storetype jceks -storepass 123456 -genseckey -keyalg AES -keysize 256 -alias hbase
• Create & store an RSA private key
• keytool -genkey -keyalg RSA -keysize 2048 -storetype jceks -storepass 123456 -keystore privateKeyStore.jks -alias testPrivate
• Export a certificate from a KeyStore to a cert file
• keytool -export -keystore privateKeyStore.jks -storetype jceks -storepass 123456 -alias testPrivate -file publicKey.crt
• Import a cert file into a KeyStore
• keytool -import -trustcacerts -file publicKey.crt -storetype jceks -storepass 123456 -alias testPublic -keystore publicKeyStore.jks
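A key created with `-genseckey` can be read back from Java through the standard `java.security.KeyStore` API. The following is a minimal sketch, not the Hadoop KeyProvider itself: it builds a JCEKS store in memory instead of reading /tmp/hbase.jks so that it runs anywhere, and the class name `KeyStoreSketch` and its helper methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical helper class, for illustration only.
final class KeyStoreSketch {

    /** Builds an in-memory JCEKS keystore holding one fresh AES key under the alias. */
    static byte[] createStore(String alias, char[] password, int bits) {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(bits);
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, password); // null stream: start from an empty store
            ks.setEntry(alias,
                    new KeyStore.SecretKeyEntry(gen.generateKey()),
                    new KeyStore.PasswordProtection(password));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ks.store(out, password);
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Loads the store bytes and returns the secret key stored under the alias. */
    static SecretKey loadKey(byte[] storeBytes, String alias, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(new ByteArrayInputStream(storeBytes), password);
            return (SecretKey) ks.getKey(alias, password);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char[] password = "123456".toCharArray(); // same demo password as the slide
        byte[] store = createStore("hbase", password, 128);
        SecretKey key = loadKey(store, "hbase", password);
        System.out.println(key.getAlgorithm() + " key, "
                + key.getEncoded().length * 8 + " bits");
    }
}
```

To read a store produced by keytool, a `FileInputStream` on the .jks file would replace the `ByteArrayInputStream`. The sketch uses a 128-bit key so that it also runs on older JREs with restricted key-length policies.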
21. CRYPTO API CASE: USING AES KEY (STORED IN KEYSTORE) TO ENCRYPT/DECRYPT DATA
Use the Crypto API to retrieve an AES secret key from a key store file, then use that key to encrypt/decrypt data
• KeyProvider
• CryptoContext
• CryptoCodec
• Sample Code
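The slide's own sample code is not part of this transcript. As a stand-in, here is a minimal sketch of the same idea (take an AES key, encrypt, decrypt) using only the JDK's `javax.crypto` classes rather than the Hadoop KeyProvider/CryptoContext/CryptoCodec API. The class name `AesRoundTrip`, the choice of AES/GCM, and the all-zero demo key are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class; it does not use the Hadoop crypto API.
final class AesRoundTrip {
    private static final int IV_LEN = 12;    // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128; // authentication tag length

    /** Encrypts with AES/GCM and returns IV || ciphertext || tag. */
    static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv); // fresh IV for every message
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plain);
            byte[] out = Arrays.copyOf(iv, IV_LEN + ct.length);
            System.arraycopy(ct, 0, out, IV_LEN, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encrypt(): reads the IV from the first 12 bytes, then decrypts. */
    static byte[] decrypt(SecretKey key, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, data, 0, IV_LEN));
            return c.doFinal(data, IV_LEN, data.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Demo key only; in the slide's scenario the key comes from a key store.
        SecretKey key = new SecretKeySpec(new byte[16], "AES");
        byte[] secret = "hello hadoop".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decrypt(key, encrypt(key, secret));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```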
23. CryptoContext
• Stores key-related information
• Key attributes
• Raw key data
• Key type: SYMMETRIC_KEY, PUBLIC_KEY, PRIVATE_KEY, CERTIFICATE
• Cryptographic algorithm, e.g. AES
• Cryptographic length
24. CryptoCodec
• A wrapper that contains a CryptoContext and provides crypto I/O streams
• Major members
• CryptoContext
• Crypto IO Stream Method
• createOutputStream(……)
• createInputStream(……)
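The wrapper idea can be sketched with the JDK's `CipherOutputStream` and `CipherInputStream`. This is not the Hadoop CryptoCodec: the class `StreamCodecSketch` is hypothetical, only the method names `createOutputStream` and `createInputStream` are borrowed from the slide (their real parameters are elided there), and AES/CTR with a caller-supplied IV is an assumed choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical codec: holds the key material and hands out crypto streams.
final class StreamCodecSketch {
    private final SecretKey key;
    private final byte[] iv; // 16 bytes; must never be reused with the same key

    StreamCodecSketch(SecretKey key, byte[] iv) {
        this.key = key;
        this.iv = iv.clone();
    }

    private Cipher cipher(int mode) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, key, new IvParameterSpec(iv));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Everything written to the returned stream is encrypted into raw. */
    OutputStream createOutputStream(OutputStream raw) {
        return new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE));
    }

    /** Everything read from the returned stream is decrypted from raw. */
    InputStream createInputStream(InputStream raw) {
        return new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE));
    }

    static byte[] encrypt(StreamCodecSketch codec, byte[] plain) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = codec.createOutputStream(sink)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decrypt(StreamCodecSketch codec, byte[] cipherText) {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = codec.createInputStream(new ByteArrayInputStream(cipherText))) {
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                plain.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) {
        // Demo key and IV only; real code would take both from a key context.
        StreamCodecSketch codec =
                new StreamCodecSketch(new SecretKeySpec(new byte[16], "AES"), new byte[16]);
        byte[] data = "stream me".getBytes(StandardCharsets.UTF_8);
        byte[] back = decrypt(codec, encrypt(codec, data));
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

Because the streams are ordinary `InputStream`/`OutputStream` decorators, they can be layered over file or compression streams in the same way a compression codec is.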
30. CryptoContextProvider
Provides several static helper methods for updating crypto-related job configuration. For example, it stores the following parameters and secrets in the job credentials, in the secret key list:
• mapred.[[[STAGE]]].crypto.context.provider.parameters
• mapred.[[[STAGE]]].crypto.context.secrets
[[[STAGE]]]: input, output, map.output
AbstractCryptoContextProvider
FileMatchCryptoContextProvider
KeyProviderCryptoContextProvider
Credentials credentials = jobConf.getCredentials();
credentials.addSecretKey(new Text("mapred.map.output.crypto.context.provider.parameters"), parameters);
credentials.addSecretKey(new Text("mapred.map.output.crypto.context.secrets"), secrets);
32. FileMatchCryptoContextProvider
Provides the ability to select the appropriate CryptoContext according to the file path
FileMatches fileMatches = new FileMatches(KeyContext.derive("12345678"));
fileMatches.addMatch("^.*/input1.intel_aes$", KeyContext.derive("1234"));
fileMatches.addMatch("^.*/input2.intel_aes$", KeyContext.derive("5678"));
FileMatchCryptoContextProvider.setInputCryptoContextProvider(jobConf,
fileMatches, null);
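The selection rule (a default context plus regex-specific overrides) can be illustrated in plain Java. This is a sketch of the matching logic only, under the assumption that the first matching pattern wins; the class `PathKeySelector` and the string key labels are hypothetical and stand in for FileMatches and KeyContext.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for FileMatches: maps path patterns to key labels.
final class PathKeySelector {
    private final String defaultKey;
    // LinkedHashMap keeps insertion order, so earlier rules take precedence.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    PathKeySelector(String defaultKey) {
        this.defaultKey = defaultKey;
    }

    PathKeySelector addMatch(String regex, String keyLabel) {
        rules.put(Pattern.compile(regex), keyLabel);
        return this;
    }

    /** Returns the label of the first rule whose pattern matches the whole path. */
    String select(String path) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(path).matches()) {
                return rule.getValue();
            }
        }
        return defaultKey; // no rule matched
    }

    public static void main(String[] args) {
        PathKeySelector selector = new PathKeySelector("default-key")
                .addMatch("^.*/input1.intel_aes$", "key-1")
                .addMatch("^.*/input2.intel_aes$", "key-2");
        System.out.println(selector.select("/data/input1.intel_aes"));
        System.out.println(selector.select("/data/other.txt"));
    }
}
```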
33. KeyProviderCryptoContextProvider
Includes the capabilities of FileMatchCryptoContextProvider and can also retrieve keys from a key store
FileMatches fileMatches = new FileMatches(KeyContext.refer("KEY00",
Key.KeyType.SYMMETRIC_KEY, "AES", 128));
String keyStoreFile = "file:///" + KEYSTORE_HOME + "/mr.jks";
String keyStorePassword = "12345678";
KeyProviderConfig keyProviderConfig =
KeyProviderCryptoContextProvider.getKeyStoreKeyProviderConfig(
keyStoreFile, "JCEKS", keyStorePassword, null, true);
KeyProviderCryptoContextProvider.setInputCryptoContextProvider(jobConf, fileMatches,
true, keyProviderConfig);
34. SAMPLE CODE - ENCRYPT THE MR OUTPUT
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
JobConf jobConf = (JobConf)job.getConfiguration();
// Encrypt the job output with AES, and compress with Snappy inside the crypto codec
FileOutputFormat.setOutputCompressorClass(job, AESCodec.class);
jobConf.set(AESCodec.CRYPTO_COMPRESSOR,
"org.apache.hadoop.io.compress.SnappyCodec");
36. SAMPLE CODE - ENCRYPT THE MR OUTPUT (CONTINUED)
FileMatches fileMatches = new FileMatches(KeyContext.refer("KEY00",
Key.KeyType.SYMMETRIC_KEY, "AES", 256));
40. MORE IN KeyProviderCryptoContextProvider
• Using asymmetric key (RSA) to protect Parameters & Secrets
CredentialProtection credentialProtection = new CredentialProtection(jobConf,
RSACredentialProtectionCodec.class,
encryptionKeyProviderConfig, encryptionKeyName,
decryptionKeyProviderConfig, decryptionKeyName);
KeyProviderCryptoContextProvider.setInputCryptoContextProvider(
jobConf,
fileMatches,
false,
keyProviderConfig,
credentialProtection);
41. MORE IN KeyProviderCryptoContextProvider (CONTINUED)
• How to use a customized KeyProvider in KeyProviderCryptoContextProvider
String keyProviderParameters = KeyStoreKeyProvider.getKeyStoreParameterString(
keyStoreFile, keyStoreType,
keyStorePassword,
keyStorePasswordFile,
sharedPassword);
KeyProviderConfig keyProviderConfig = new KeyProviderConfig(
CustomizeKeyStoreKeyProvider.class.getName(),
keyProviderParameters);
43. TOOL – DISTCRYPTO (CONTINUED)
• Source Definition File (XML format)
• src
• path
• format:
• raw
• Sequence
• the full class name of a class that implements CryptoHandler, for a customized format
• includeFilter & excludeFilter
• stripSuffix & appendSuffix
• keyClassName & valueClassName
44. TOOL – DISTCRYPTO (CONTINUED)
• Encryption Sample
• Command
• hadoop distcrypto -op encrypt -ek 21EF7D7487F69A19E552C1274A9FCAC721EF7D7487F69A19E552C1274A9FCAC7 -log /tmp/log.distcrypto.encrypt -src file:///working/crypto_encrypt.xml
• Source Definition File (crypto_encrypt.xml)
<configuration><src>
<path>/tmp/install.log</path>
<format>raw</format>
<appendSuffix>.encrypted</appendSuffix>
</src></configuration>
• TODO: retrieving keys from a key store is not supported yet, so the key must be passed on the command line (not good)
46. HBASE ENCRYPTION – HBASE-7544
• Introduces transparent encryption of HBase on-disk data
• Transparent encryption at the column family (CF) level
• Two-tier key architecture, consistent with best practices for this feature in the RDBMS world
• Flexible and non-intrusive key rotation
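One common reading of a two-tier key architecture is that a master key wraps (encrypts) each data key, so changing the master key only requires re-wrapping the small data keys and not re-encrypting the data. The sketch below illustrates that idea with the JDK's `AESWrap` cipher; it is an assumption about the general pattern, not the HBASE-7544 implementation, and the class `TwoTierKeySketch` is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical illustration of a master key protecting a data key.
final class TwoTierKeySketch {

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts the data key under the master key (AES key wrap). */
    static byte[] wrap(SecretKey master, SecretKey dataKey) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.WRAP_MODE, master);
            return c.wrap(dataKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recovers the data key from its wrapped form. */
    static SecretKey unwrap(SecretKey master, byte[] wrapped) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.UNWRAP_MODE, master);
            return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Master key rotation: only the wrapped key changes, the data key stays the same. */
    static byte[] rewrap(SecretKey oldMaster, SecretKey newMaster, byte[] wrapped) {
        return wrap(newMaster, unwrap(oldMaster, wrapped));
    }

    public static void main(String[] args) {
        SecretKey master = newAesKey();
        SecretKey dataKey = newAesKey();
        byte[] wrapped = wrap(master, dataKey);

        SecretKey newMaster = newAesKey();
        byte[] rewrapped = rewrap(master, newMaster, wrapped);
        boolean same = Arrays.equals(dataKey.getEncoded(),
                unwrap(newMaster, rewrapped).getEncoded());
        System.out.println("data key unchanged after master rotation: " + same);
    }
}
```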
48. HBASE ENCRYPTION – HBASE-7544
HFile layout
• Block 0 … Block N
• Meta Block 0 … Meta Block N
• File Info
• Data Block Index
• Meta Block Index
• Fixed File Trailer
• Encryption KeyBlock Offset
Key block data format
• 1 byte ordinal
• 4 bytes key data length
• Encrypted key data
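The key block data format listed on this slide (1 byte ordinal, 4 bytes key data length, then the encrypted key data) can be sketched with `java.nio.ByteBuffer`. The class `KeyBlockSketch` is hypothetical, and big-endian byte order for the length field is an assumption (it is the `ByteBuffer` default); the slide does not state it.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical encoder/decoder for the three-field key block layout.
final class KeyBlockSketch {

    /** Lays out: ordinal (1 byte) | key length (4 bytes) | encrypted key bytes. */
    static byte[] encode(byte ordinal, byte[] encryptedKey) {
        return ByteBuffer.allocate(1 + 4 + encryptedKey.length)
                .put(ordinal)
                .putInt(encryptedKey.length) // big-endian by default (assumed)
                .put(encryptedKey)
                .array();
    }

    static byte ordinal(byte[] block) {
        return block[0];
    }

    /** Reads the length field and returns exactly that many key bytes. */
    static byte[] encryptedKey(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        buf.get(); // skip the ordinal
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("corrupt key block, length=" + length);
        }
        byte[] key = new byte[length];
        buf.get(key);
        return key;
    }

    public static void main(String[] args) {
        byte[] fakeEncryptedKey = {10, 20, 30, 40}; // placeholder bytes, not a real key
        byte[] block = encode((byte) 1, fakeEncryptedKey);
        System.out.println("block size: " + block.length);
        System.out.println("key recovered: "
                + Arrays.equals(fakeEncryptedKey, encryptedKey(block)));
    }
}
```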
49. HBASE-7544 SETTINGS
1. Set up the keystore with a secret key
Create a secret key of appropriate length for AES.
$ keytool -keystore /path/to/hbase/conf/hbase.jks \
-storetype jceks -storepass password \
-genseckey -keyalg AES -keysize 256 \
-alias ${USER}
Press RETURN to store the key with the same password as the store.
50. HBASE-7544 SETTINGS
2. Configure HBase to use the keystore
Add this to the hbase-site.xml file:
<property>
<name>hbase.crypto.keyprovider</name>
<value>org.apache.hadoop.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
<name>hbase.crypto.keyprovider.parameters</name>
<value><![CDATA[keyStoreUrl=file:///path/to/hbase/conf/hbase.jks&keyStoreType=JCEKS&password=password]]></value>
</property>
52. HBASE-7544
• CF key rotation
• The CF key is changed by modifying the column descriptor via HBaseAdmin.
• A major compaction is then triggered, either on the whole table at once or region by region.
• Performance
• With the AES-NI codec, the HFile read path incurs an overhead roughly on par with GZIP compression, and the write path about half of that.
53. OTHER RELATED JIRAS
• MAPREDUCE-4491: Encryption and Key Protection
• MAPREDUCE-4550: Key Protection: Define Encryption and Key Protection interfaces and default implementation
• MAPREDUCE-4551: Key Protection: Add ability to read keys and protect keys in JobClient and TTS/NodeManagers
• MAPREDUCE-4552: Encryption: Add support for PGP Encryption
• MAPREDUCE-4553: Key Protection: Implement KeyProvider to read key from a WebService Based KeyStore
• MAPREDUCE-5025: Key Distribution and Management for supporting crypto codec in Map Reduce
54. SECURITY WEB KEYSTORE SERVER
safe (http://benoyantony.github.com/safe/)
Web-service-based keystore
Supports per-key ACLs
Authenticates the user using SPNEGO
Based on Cloudera Alfredo, a Java library consisting of client and server components that enable Kerberos SPNEGO authentication for HTTP.
[Diagram: MR/HBase with WebStoreKeyProvider, the web server (safe, built on Alfredo), and the KDC; labels: user, authentication, authorization.]
55. OTHER TODOs
• Hive support
• https://issues.apache.org/jira/browse/HIVE-5207
• Support data encryption for Hive tables
• https://issues.apache.org/jira/browse/HIVE-4227
• Add column level encryption to ORC files (Created: 25/Mar/13 17:14)
• Pig support
• https://issues.apache.org/jira/browse/PIG-3289
• Encryption aware load and store functions