MapR Hadoop Security (March 2014)
MapR Technologies
1.
© 2014 MapR Technologies
2.
Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core
3.
MapR Distribution for Hadoop
[Figure: the MapR Data Platform (MapR-FS for files, MapR-DB for tables, plus management; marked "Patent Pending") underlying the Apache Hadoop and OSS ecosystem: Hue, ..., Shark, Impala, Drill, Hive/Stinger/Tez, Sqoop, Storm, Sentry, Spark, Solr, Cascading, Mahout, Flume, Oozie, HBase, MapReduce, YARN, Pig, Whirr, ZooKeeper]
• Enterprise-grade
  – High availability
  – Data protection
  – Disaster recovery
• Interoperability
  – Standard file access
  – Standard database access
  – Pluggable services
  – Broad developer support
• Security
  – Enterprise security authorization
  – Wire-level authentication
  – Data governance
• Operational
  – Ability to support predictive analytics, real-time database operations, and high-arrival-rate data
• Multi-tenancy
  – Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
• Performance
  – Ability to deliver 2X to 7X performance
  – Consistent low latency
4.
The Cloud Leaders Pick MapR
• Google chose MapR to provide Hadoop on Google Compute Engine
• Amazon EMR is the largest Hadoop provider by revenue and by number of clusters
5.
Why Secure Hadoop Now?
• Historically, security wasn’t a high priority
  – This reflected the type of data and the type of organizations using Hadoop
• Hadoop is now being used by more traditional firms, as well as by organizations with high security requirements
  – Highly regulated
  – Sensitive data sets
  – People with security experience in existing enterprise technologies (e.g., databases) are asking for the same in Hadoop
• Take a moment to imagine the value of the data in a Hadoop cluster used as a data lake
  – Lots of valuable operational data about your customers, systems, sales, etc.
6.
Typical Hadoop Deployment Weaknesses
• The client operating system is trusted to identify the user (weak authentication)
  – If I can compromise a client, I can run jobs or access HDFS as anyone
  – Think about virtual machines with root access
• Hadoop servers trust anyone who can reach them on the network
  – Could I impersonate a data node, job tracker, etc.?
• Hive Server runs as a ‘system’ user
  – All jobs submitted through Hive Server run as that ‘system’ user
• Intruders can see and modify all network traffic
7.
Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core
8.
MapR 3.1: Securing MapR Hadoop
• Core goals
  – Authenticate network traffic
    • Users authenticate
    • Servers authenticate to each other
  – Encrypt network traffic
  – Authorization
    • Integrate with existing authorization functionality
    • Enhance MapR Tables authorization with fine-grained controls
  – Low barrier to entry
    • Low performance overhead
    • Simple and easy to administer
    • Support, but do not require, Kerberos
  – Leverage Apache Hadoop functionality
9.
MapR Native Security
• Hadoop security without Kerberos
  – But borrows heavily from the Kerberos design
• Kerberos integration if desired
10.
Architecture
• Shared secrets, as in Kerberos
  – Managed at the cluster level
  – Two shared keys: the CLDB key and the server key
• Identity is represented by a ticket issued by the MapR CLDB (Container Location DataBase) servers
11.
Tickets
• A ticket represents a valid authenticated identity
• Contains
  – An expiration time, renewal lifetime, and creation time
  – A randomly generated secret key
  – Information about the identity: userid, group ids
• Signed and encrypted by the CLDB when issued
  – CLDB key used for ‘permanent’ server tickets
  – Server key used for ephemeral tickets issued to users
• A client authenticates to trusted servers using the ticket
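A ticket's timing fields can be pictured as a small record. The Python sketch below is illustrative only. Several parts of it are assumptions: the field names, the `new_ticket` helper, and the reading of "renewal lifetime" as a hard cap on how far renewals may extend a ticket (as in Kerberos). The CLDB's signing and encryption of the ticket are left out.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Illustrative MapR-style ticket record (field names are hypothetical)."""
    uid: int
    gids: tuple
    creation_time: float
    expiration_time: float
    renew_until: float  # assumption: creation_time + renewal lifetime caps renewals
    # Randomly generated per-ticket secret key (256 bits), hidden from repr().
    secret_key: bytes = field(default_factory=lambda: secrets.token_bytes(32), repr=False)

    def is_valid(self, now: float) -> bool:
        """Usable from creation until (but not at) the expiration time."""
        return self.creation_time <= now < self.expiration_time

    def renew(self, now: float, lifetime: float) -> None:
        """Extend the expiration by `lifetime`, never past the renewal deadline."""
        if not self.is_valid(now):
            raise ValueError("cannot renew an expired ticket")
        self.expiration_time = min(now + lifetime, self.renew_until)


def new_ticket(uid, gids, now, lifetime, renewal_lifetime):
    """Hypothetical issuing helper; all times are seconds since the epoch."""
    return Ticket(uid, tuple(gids), now, now + lifetime, now + renewal_lifetime)
```

For example, take a ticket issued at t=0 with a one-hour lifetime and a two-hour renewal lifetime. It can be renewed at t=3000, giving a new expiry of 6600. It can be renewed again at t=6000, where the expiry is capped at 7200.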
12.
User Experience
• User invokes maprlogin
  – maprlogin connects to the CLDB (over HTTPS)
    • The user provides a userid & password (or Kerberos ticket) for validation by the CLDB
  – The ticket is returned and saved in a file in /tmp that is accessible only by the owning user
    • The file name is /tmp/maprticket_<uid>
• MapR PAM module
  – An optional MapR-provided PAM module creates MapR tickets automatically during Unix login
• All processes automatically pick up the ticket (nothing to do)
  – Java and C/C++ clients implicitly look for a valid ticket and use it
  – Clients can optionally use an existing Kerberos identity to get a MapR ticket
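Client-side ticket lookup and the owner-only check can be sketched in Python as follows. The default path comes from the slide. Two things here are assumptions: treating MAPR_TICKET_LOCATION (which appears later in the deck, in the Job Tracker example) as an override of that default, and the helper names themselves. The code is POSIX-only.

```python
import os
import stat


def ticket_path(uid=None, environ=None):
    """Return the ticket file a client would read.

    Assumption: a non-empty MAPR_TICKET_LOCATION overrides the default
    /tmp/maprticket_<uid> location described on the slide.
    """
    env = os.environ if environ is None else environ
    override = env.get("MAPR_TICKET_LOCATION")
    if override:
        return override
    if uid is None:
        uid = os.getuid()  # POSIX only
    return f"/tmp/maprticket_{uid}"


def is_private_to_caller(path):
    """True if `path` is owned by the calling user and has no group/other permission bits."""
    st = os.stat(path)
    return st.st_uid == os.getuid() and stat.S_IMODE(st.st_mode) & 0o077 == 0
```

A client following this sketch would refuse a ticket file for which `is_private_to_caller` returns False, because another local user could read it.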
13.
Maprlogin
• Primary user-visible security tool
• Actions are
  – password - authenticate to a MapR cluster using a valid password
  – kerberos - authenticate to a MapR cluster using Kerberos
  – print - print information on your existing credentials
  – authtest - test authentication as a generic client
  – end / logout - log out of the cluster
  – renew - renew an existing ticket
• User information is obtained using PAM and the Linux pwent APIs
  – Fully pluggable
  – MapR can authenticate using any registry that is PAM-enabled and that provides user information via the NSSwitch-controlled Unix APIs
    • Basically, if it works with Linux authentication, it should work with MapR
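The slide says user information comes from the NSSwitch-controlled Unix APIs. The identity that ends up in a ticket (userid and group ids) is therefore whatever those APIs return. The Python sketch below shows that lookup using the standard `pwd`, `grp` and `os.getgrouplist` calls. PAM password verification is not shown. Treating this as exactly what MapR does is an assumption, and the helper names are hypothetical.

```python
import grp
import os
import pwd


def lookup_identity(username):
    """Resolve a user name to (uid, sorted group ids) via NSS-backed Unix APIs.

    Raises KeyError if the user is unknown to every configured NSS source
    (files, LDAP, NIS, ...).
    """
    entry = pwd.getpwnam(username)
    # Primary group plus all supplementary groups, as reported by NSS.
    gids = os.getgrouplist(username, entry.pw_gid)
    return entry.pw_uid, sorted(set(gids))


def group_names(gids):
    """Map group ids to names, skipping ids that have no group entry."""
    names = []
    for gid in gids:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            continue
    return names
```

Because the lookup goes through NSS, switching the registry (for example, from local files to LDAP) changes nothing in this code, which matches the "fully pluggable" point above.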
14.
CLI Example
$ hadoop fs -ls /
Bad connection to FS. command aborted. exception: failure to login: Unable to obtain MapR credentials
$ maprlogin password
[Password for user 'fred' at cluster 'my.cluster.com': ]
MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1001'
$ hadoop fs -ls /
Found 3 items
-rwxr-xr-x   3 mapr mapr          0 2013-12-10 13:25 /hbase
drwxr-xr-x   - mapr mapr          1 2013-12-10 13:25 /user
drwxr-xr-x   - mapr mapr          1 2013-12-10 13:25 /var
15.
Maprlogin – Under the Covers
[Diagram: maprlogin, the MapR CLDB, an LDAP/Kerberos/NIS registry, and a FileServer/CLDB]
1. maprlogin sends the username/password to the CLDB over HTTPS
2. The CLDB uses PAM to authenticate the user (against LDAP, Kerberos, or NIS)
3. The ticket and user key are returned
4. The ticket and key are saved in a file in /tmp
5. A command (e.g., hadoop fs -ls /) picks up the ticket and key from the file
6. The client sends RPCs encrypted with the user key, together with the ticket
7. The server (FileServer or CLDB) decrypts the ticket to authenticate the user and checks permissions against the ACL
16.
Client First Contact
• The client sends the ticket and data encrypted using the secret key
• The receiving server
  – Extracts and decrypts the ticket to obtain the secret key
  – Checks expiration
  – Uses the secret key to decrypt the data
    • This proves that the client possesses the key that corresponds to the ticket
  – Extracts identity information from the ticket and uses it for authorization
  – Returns an encrypted response to the client
• MapR user identity is independent of host or operating system identity
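The first-contact exchange can be sketched end to end. Python's standard library has no AES, so this sketch substitutes a toy encrypt-then-MAC construction where MapR uses AES-256-GCM. The keystream is HMAC-SHA256 in counter mode, and an HMAC-SHA256 tag provides integrity. The JSON ticket layout and every function name are hypothetical. The encrypted response is omitted, and the code is not suitable for production use.

```python
import hashlib
import hmac
import json
import secrets

NONCE_LEN, TAG_LEN = 12, 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode as a stream cipher (toy stand-in for AES)."""
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _subkeys(key: bytes):
    """Derive separate encryption and MAC keys from one shared secret."""
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC; output layout is nonce || ciphertext || tag."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    """Check the tag before decrypting; ValueError on tampering or a wrong key."""
    enc_key, mac_key = _subkeys(key)
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("message too short")
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in zip(ct, _keystream(enc_key, nonce, len(ct))))


def issue_ticket(server_key, uid, gids, now, lifetime):
    """CLDB side: seal the identity and a fresh session key under the server key."""
    session_key = secrets.token_bytes(32)
    body = {"uid": uid, "gids": list(gids), "expires": now + lifetime,
            "key": session_key.hex()}
    return seal(server_key, json.dumps(body).encode()), session_key


def server_first_contact(server_key, ticket, sealed_request, now):
    """Receiving server: open the ticket, check expiry, decrypt the request."""
    fields = json.loads(unseal(server_key, ticket))
    if now >= fields["expires"]:
        raise PermissionError("ticket expired")
    session_key = bytes.fromhex(fields["key"])
    # Succeeds only if the client sealed the request with the ticket's session key.
    request = unseal(session_key, sealed_request)
    return fields["uid"], fields["gids"], request
```

The ticket is sealed under the server key, so a client cannot alter the identity inside it. A request sealed with any key other than the ticket's session key fails the tag check. That check corresponds to the "proves that the client possesses the key" step above.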
17.
Server First Contact
• When a trusted server starts, it uses a local server ticket to authenticate to the CLDB
  – The CLDB verifies the ticket’s authenticity using the secret CLDB key
  – The CLDB returns the server key that is used to create and validate user tickets
  – The server is now a trusted member of the cluster
18.
Component Security
• Security between MapR-specific components (CLDB, file server, etc.) is handled via changes to the MapR RPC layer
• Apache components support pluggable security mechanisms, typically SASL
  – We are providing a new mechanism called ‘maprsasl’
  – maprsasl secures communication using the same techniques as the MapR RPC layer
• Existing authorization code simply leverages the securely authenticated identity
  – File access
  – Job submission
  – Queue ACLs
  – And so on …
19.
Example: Job Tracker Integration
The JobTracker (JT) can create user tickets. The TaskTracker (TT) copies the ticket to a private job directory on local disk. The taskcontroller copies it to the user's private local disk directory, and tasks set MAPR_TICKET_LOCATION to that location.
[Diagram: JobClient → JobTracker (submit job, over maprsasl) → TaskTracker (schedule job, over maprsasl), plus the file system]
1. The JobClient (JC) copies the job conf securely to the file system
2. The JT creates a user ticket
3. The TT fetches the ticket
4. The TT launches the job using the ticket identity
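The last hand-off in that flow places the ticket where a task can find it, and might look like the following. `stage_ticket_for_task` is a hypothetical helper, not TaskTracker or taskcontroller code. It only illustrates three pieces: a private (0700) directory, an owner-only (0600) copy of the ticket, and a task environment whose MAPR_TICKET_LOCATION points at that copy.

```python
import os
from pathlib import Path


def stage_ticket_for_task(ticket_file, private_dir, base_env):
    """Copy a user ticket into a private directory and build the task environment.

    Hypothetical helper mirroring the slide: the ticket is copied to a
    user-private local directory and MAPR_TICKET_LOCATION points at the copy.
    """
    private = Path(private_dir)
    private.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(private, 0o700)  # mkdir's mode is filtered by the umask; enforce it
    dest = private / Path(ticket_file).name
    data = Path(ticket_file).read_bytes()
    # Create the copy as owner-only from the start, so it is never world-readable.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    os.chmod(dest, 0o600)  # in case the file already existed with wider permissions
    env = dict(base_env)
    env["MAPR_TICKET_LOCATION"] = str(dest)
    return env
```

The task process would then be started with the returned environment, so its MapR client picks up the user's ticket rather than any ticket belonging to the TaskTracker's own account.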
20.
Out of the Box Defaults
• User experience
  – Users authenticate using maprlogin and passwords
  – User ‘mapr’ is the admin, as always
    • That user must authenticate, however; OS identity is irrelevant
  – Operating system identity (on or off cluster) is no longer relevant to MapR security
    • Obviously, the root user and the ‘mapr’ user can read/write /opt/mapr
    • We’ve also tightened permissions for many directories under /opt/mapr
  – Web UIs require authentication
  – MapR CLIs require authentication
    • hadoop fs/mfs/jar/job/etc.
    • maprcli
  – Any user can submit jobs, but can only administer their own jobs
21.
Out of the Box Defaults
• Cluster operations
  – All MapR servers authenticate to each other
    • Most communication paths are encrypted
  – All nodes share a common maprserverticket
    • Nodes can only join the cluster if they have the maprserverticket
  – Self-signed wildcard certificates are created for HTTPS traffic
    • ssl_keystore contains the certificate and private key; ssl_truststore contains the certificate
  – We set the JVM system property javax.net.ssl.trustStore
    • Used by the Web UIs, MCS, and maprlogin connections to the CLDB
    • The hostname command is used to get the cluster's DNS domain, which is put into the certificate
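The wildcard certificate is built from the cluster's DNS domain. A Python sketch of deriving the wildcard name from a node's fully qualified host name and checking which hosts it covers is below. The exact certificate fields MapR writes are not shown on the slide, so the helper names and the single-label matching rule (the usual RFC 6125-style TLS behavior) are assumptions.

```python
def wildcard_for(fqdn: str) -> str:
    """Build '*.<domain>' from a fully qualified host name such as
    'node1.my.cluster.com'; the domain part is what `hostname -d` reports."""
    host, dot, domain = fqdn.strip().rstrip(".").partition(".")
    if not host or not dot or not domain:
        raise ValueError(f"not a fully qualified host name: {fqdn!r}")
    return "*." + domain.lower()


def matches_wildcard(hostname: str, pattern: str) -> bool:
    """Check a host name against a certificate name; '*' stands for exactly
    one left-most label, as in common TLS host-name matching."""
    hostname, pattern = hostname.lower().rstrip("."), pattern.lower()
    if not pattern.startswith("*."):
        return hostname == pattern
    label, dot, rest = hostname.partition(".")
    return bool(label) and bool(dot) and rest == pattern[2:]
```

One consequence of the single-label rule is that a certificate for `*.my.cluster.com` covers `node7.my.cluster.com` but not hosts in deeper sub-domains. Every cluster node therefore needs to sit directly in the domain reported by `hostname`.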
22.
Cryptography
• Encrypted using current NIST standards
  – AES-256 in GCM mode for encryption and authentication (integrity)
    • http://en.wikipedia.org/wiki/Galois/Counter_Mode
    • NIST standard - http://csrc.nist.gov/publications/fips/fips140-2/fips1402annexa.pdf
  – Leverage Intel hardware encryption where available, software otherwise
• Use the open source crypto++ library for our C++ cryptography
  – http://cryptopp.com
• Random number generation
  – Use secure random number generation, as documented here: http://www.cryptopp.com/docs/ref/class_auto_seeded_random_pool.html#_details
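Python's `secrets` module plays roughly the role that Crypto++'s AutoSeededRandomPool plays in the C++ code: it draws from the operating system's cryptographically secure random number generator. The sketch below only generates an AES-256 key and 96-bit GCM nonces. The encryption itself would need a cryptography library and is not shown. The per-key cap follows NIST SP 800-38D's limit for randomly generated IVs. The class and function names are hypothetical.

```python
import secrets

AES256_KEY_BYTES = 32  # 256-bit key for AES-256
GCM_NONCE_BYTES = 12   # 96-bit IV, the length SP 800-38D recommends for GCM


def new_aes256_key() -> bytes:
    """Draw a fresh key from the operating system's CSPRNG."""
    return secrets.token_bytes(AES256_KEY_BYTES)


class RandomNonceSource:
    """Random 96-bit GCM nonces for a single key.

    SP 800-38D caps random-IV GCM at 2**32 invocations per key; past that
    the key must be rotated rather than risk a repeated nonce.
    """
    MAX_INVOCATIONS = 2 ** 32

    def __init__(self) -> None:
        self.invocations = 0

    def next_nonce(self) -> bytes:
        if self.invocations >= self.MAX_INVOCATIONS:
            raise RuntimeError("random-nonce limit reached: rotate the key")
        self.invocations += 1
        return secrets.token_bytes(GCM_NONCE_BYTES)
```

The nonce counter matters because GCM's security collapses if a nonce is ever reused under the same key; tracking invocations per key is the simplest way to respect the limit.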
23.
Let’s Build a Secure Cluster!
Node 1
apt-get install mapr…
configure.sh -C … -Z … -secure -genkeys
– Generates all needed keys for MapR-RPC as well as for HTTPS
Node N
apt-get install mapr…
scp rootORmapr@node1:/opt/mapr/conf/{cldb.key,maprserverticket,ssl_keystore,ssl_truststore} /opt/mapr/conf
configure.sh -C … -Z … -secure
Clients
apt-get install mapr…
scp anyuser@nodeN:/opt/mapr/conf/ssl_truststore /opt/mapr/conf
configure.sh … -secure
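Nodes can only join the cluster if they hold the copied maprserverticket. A pre-flight check before running configure.sh -secure can therefore catch a missed copy step early. The Python sketch below is hypothetical and is not part of MapR. It only checks for the file names used in the scp commands above, and its server/client split mirrors the Node N and Clients steps.

```python
from pathlib import Path

# File names taken from the scp commands on the slide.
SERVER_NODE_FILES = ("cldb.key", "maprserverticket", "ssl_keystore", "ssl_truststore")
CLIENT_FILES = ("ssl_truststore",)


def missing_secure_files(conf_dir="/opt/mapr/conf", role="server"):
    """List the security files still missing before running configure.sh -secure.

    Hypothetical check: `role` is "server" for cluster nodes and "client"
    for client machines.
    """
    if role == "server":
        required = SERVER_NODE_FILES
    elif role == "client":
        required = CLIENT_FILES
    else:
        raise ValueError(f"unknown role: {role!r}")
    conf = Path(conf_dir)
    return [name for name in required if not (conf / name).is_file()]
```

An empty list means every file named on the slide is present. A non-empty list names exactly what still has to be copied from Node 1.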
24.
Kerberos
• Not required, but can be used
• Kerberos SSO
  – Explicitly, using ‘maprlogin kerberos’
  – Implicitly
    • If no MapR ticket is available, the client automatically detects a Kerberos ticket and uses it to obtain a MapR ticket
• Kerberos SSO requires only
  – A Kerberos client on the CLDB and client machines
  – A Kerberos identity only for the CLDBs, typically 3 to 5 of them
    • No need to manage identities for every node
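The implicit single sign-on rule can be summarized as a small decision function. In this Python sketch, `exchange_with_cldb` is a hypothetical callback standing in for the CLDB round trip. The Kerberos ticket is reduced to a principal name, and the `MaprTicket` record is an assumed simplification, not a MapR API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CredentialError(Exception):
    """Raised when neither a MapR ticket nor a Kerberos ticket is usable."""


@dataclass
class MaprTicket:
    user: str
    expires: float  # seconds since the epoch


def obtain_mapr_ticket(
    mapr_ticket: Optional[MaprTicket],
    kerberos_principal: Optional[str],
    exchange_with_cldb: Callable[[str], MaprTicket],
    now: float,
) -> MaprTicket:
    """Implicit Kerberos SSO as described on the slide (sketch, not MapR code)."""
    # 1. A still-valid MapR ticket always wins.
    if mapr_ticket is not None and now < mapr_ticket.expires:
        return mapr_ticket
    # 2. Otherwise use the Kerberos identity; the CLDB (the only service that
    #    needs a Kerberos identity) exchanges it for a MapR ticket.
    if kerberos_principal is not None:
        return exchange_with_cldb(kerberos_principal)
    # 3. Nothing usable: fail the way the earlier CLI example does.
    raise CredentialError("Unable to obtain MapR credentials")
```

Only the CLDB appears in the exchange step, which is why the slide needs Kerberos identities for the CLDBs alone rather than for every node.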
25.
Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core
26.
Hadoop Map Reduce Clients
• Many components simply generate MapReduce jobs, so they implicitly leverage the MapReduce security described previously. They are:
  – Hive (except Hive Server)
  – Pig
  – Mahout
  – Sqoop
27.
Ecosystem Security
• All ecosystem components also run securely in a secure MapR cluster
  – Some by default
  – Some with minor configuration
• Most Web UIs are enhanced to use userid & password authentication and HTTPS
  – Kerberos SPNEGO can be configured, same as in Apache
28.
MapR Ecosystem Security – by Default
• By default, out of the box, when security is enabled
  – Hive Server 2 supports password authentication
    • Kerberos and SSL can be configured, same as in Apache, including secure impersonation
  – Oozie supports MapR ticket authentication
    • Kerberos and SSL can be configured, same as in Apache, including secure impersonation
• HBase and the Hive MetaServer require Kerberos to be secured
• MapR Tables (HBase APIs) use native MapR security; no configuration is needed
29.
MapR Tables Authorization
• Boolean logic constraints on access to M7 tables
  – Uses user & group information
  – Very powerful
    • (u:bob | g:admins)
    • (g:managers & !g:restricted)
    • (g:managers & g:businessunity) | g:executives
  – Settable at the table, column family, and column level for various actions
  – Queries silently hide data you are not authorized to see
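Evaluating expressions like the ones above comes down to a tiny Boolean grammar over user and group principals. The Python sketch below is hypothetical and is not MapR's implementation. The grammar, the operator precedence (`!` binds tightest, then `&`, then `|`) and the function names are all assumptions. The real syntax may accept forms not handled here.

```python
import re

# One token: an operator/parenthesis, or a "u:name" / "g:name" principal.
_TOKEN = re.compile(r"\s*(?:(?P<op>[()&|!])|(?P<kind>[ug]):\s*(?P<name>[\w.-]+))")


def tokenize(expr):
    tokens, pos, expr = [], 0, expr.rstrip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"bad ACE syntax near {expr[pos:]!r}")
        tokens.append(m.group("op") or (m.group("kind"), m.group("name")))
        pos = m.end()
    return tokens


def parse(expr):
    """Recursive-descent parser returning a nested-tuple expression tree."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of ACE")
        pos += 1
        return tokens[pos - 1]

    def or_expr():
        node = and_expr()
        while peek() == "|":
            take()
            node = ("or", node, and_expr())
        return node

    def and_expr():
        node = not_expr()
        while peek() == "&":
            take()
            node = ("and", node, not_expr())
        return node

    def not_expr():
        tok = take()
        if tok == "!":
            return ("not", not_expr())
        if tok == "(":
            node = or_expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if isinstance(tok, tuple):
            return tok  # ("u", name) or ("g", name)
        raise ValueError(f"unexpected token {tok!r}")

    tree = or_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in ACE")
    return tree


def allowed(tree, user, groups):
    """Evaluate a parsed expression for one user and that user's group names."""
    op = tree[0]
    if op == "or":
        return allowed(tree[1], user, groups) or allowed(tree[2], user, groups)
    if op == "and":
        return allowed(tree[1], user, groups) and allowed(tree[2], user, groups)
    if op == "not":
        return not allowed(tree[1], user, groups)
    if op == "u":
        return tree[1] == user
    return tree[1] in groups  # op == "g"
```

Parsing once and evaluating the tree per request keeps the check cheap. The group names would come from the identity carried in the caller's ticket.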
30.
MapR Hadoop Advantage
• Vastly simpler
  – Core secured by default in one step
  – No requirement for Kerberos in the core, with its associated complexity
• Easier integration
  – Leverage existing Linux authentication (PAM and NSSwitch)
• Faster
  – Leverage Intel AES hardware cryptography
31.
Further Reading
• MapR
  – http://mapr.com
• MapR Native Security
  – http://www.mapr.com/blog/getting-started-mapr-security-0
  – http://www.mapr.com/press-release/mapr-technologies-integrates-security-into-hadoop
  – http://www.mapr.com/products/only-with-mapr/mapr-integrates-security-into-hadoop
• Adding Security to Apache Hadoop
  – http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf
• The Evolution of Hadoop’s Security Model
  – http://www.infoq.com/articles/HadoopSecurityModel/
32.
Q&A
Engage with us!
@mapr
maprtech
kbotzum@mapr.com
MapR
maprtech
mapr-technologies
33.
Appendix
34.
Encrypted Shuffle (?)
• No need to special-case encrypting the shuffle
• MapR-FS is the store for Map output
  – The shuffle inherits the same encryption, authentication, and authorization functionality as the rest of MapR-FS
35.
Persistent Keys and Tickets
[Diagram: CLDB/ZK nodes 1 through N and cluster nodes 1 through N, each holding a persistent key (K)]
36.
Apache Hadoop Security
• Kerberos as the core authentication technology
  – Kerberos to access HDFS, JT, Oozie, etc.
  – Kerberos for server-to-server traffic
• But Kerberos doesn’t fit perfectly with the Hadoop model
  – Delegation tokens were introduced to carry identity in many scenarios
• Kerberos is complicated
  – Need a Kerberos identity for every server in the cluster
    • Lots to manage!
  – Every user needs a Kerberos identity to access the cluster, Web UIs, etc.
  – Lots of steps
    • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.3.0/CDH4-Security-Guide/cdh4sg_topic_3.html
37.
Key Design Elements
• User authentication and authorization information is obtained from standard operating system sources
  – PAM and nsswitch
• MapR-specific shared secret keys
  – Easier to manage
  – No dependencies on complex external security systems
  – Better performance
• MapR servers (running as ‘mapr’) have access to the maprserverticket and are therefore privileged processes
• MapR-RPC altered to encrypt and authenticate traffic
• maprsasl created so Apache Java code can leverage similar security
  – Leverages the same keys, authentication model, etc.
  – Reuses the C/C++ code via JNI