Historically, security has not been a high priority for Hadoop (a reflection of the type of data and organizations using it), but Hadoop is now being used by more traditional firms with heightened security requirements. MapR's Senior Principal Technologist, Keys Botzum, gives a talk on how you can build a secure cluster.
MapR provides a complete distribution for Apache Hadoop. MapR has integrated, tested, and hardened a broad array of packages as part of this distribution, including Hive, Pig, Oozie, and Sqoop, plus additional packages such as Cascading. We have spent more than two years on a well-funded effort to make deep architectural improvements and create the next-generation distribution for Hadoop. MapR has made significant updates and combined them with a dozen open source packages. All of the innovations MapR has delivered maintain 100% compatibility with the Apache Hadoop APIs. This is in stark contrast with the alternative distributions from Cloudera, Hortonworks, and Apache, which are all equivalent.
MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technological advantages of MapR's distribution. Amazon, through its Elastic MapReduce (EMR) service, hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution offered, sold, and supported as a service by Amazon to its customers. Google, the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop, has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features, such as mirroring between sites and multi-tenancy support, that further enhance cloud deployments.
In the initial release, the server key and the CLDB key never change. The server ticket is also shared by all servers and does not expire.
Note: this does create a “race condition” in the install process, since configure.sh must be run on the first node before it is run on any other node. This might be an issue with certain parallel install processes. You can work around it by running configure.sh once somewhere (specifying the domain for the SSL certificates as needed) to create the required keys, and then copying them to all nodes at once.
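The copy step of that workaround can be scripted. The following is only a minimal sketch, not an official MapR procedure: it builds one scp command per node and prints it without executing anything. The directory /opt/mapr/conf, the four file names, and the host names are assumptions made for illustration, so check them against what configure.sh actually generated in your release before relying on them.

```python
import shlex
from typing import List

# ASSUMPTION: where configure.sh writes the generated security files.
CONF_DIR = "/opt/mapr/conf"
# ASSUMPTION: the files that must be identical on every node; adjust to your release.
KEY_FILES = ["cldb.key", "maprserverticket", "ssl_keystore", "ssl_truststore"]


def build_copy_commands(
    nodes: List[str],
    conf_dir: str = CONF_DIR,
    key_files: List[str] = KEY_FILES,
) -> List[List[str]]:
    """Return one scp argument list per node; nothing is executed here."""
    sources = [f"{conf_dir}/{name}" for name in key_files]
    commands = []
    for node in nodes:
        # -p preserves modes and timestamps so file permissions stay restrictive.
        commands.append(["scp", "-p", *sources, f"{node}:{conf_dir}/"])
    return commands


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for argv in build_copy_commands(["node2.example.com", "node3.example.com"]):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them, or to hand them to whatever parallel install tool you already use, before running configure.sh on the remaining nodes.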