11. Deployment challenges
● Infrastructure is different everywhere
○ e.g. Each cloud provider has their own API
○ e.g. Each provider has different networking methods
● OS/images are different everywhere
● How to do service discovery?
● How to dynamically scale/manage?
See prior operations workshops
13. Options for Automation
- Many combinations of tools
- e.g. Foreman, Ansible, Chef, Puppet, docker-ambari,
shell scripts, CloudFormation, …
- Provider specific
- Cisco UCS, Teradata, HP, Google’s bdutil, …
- Docker with Cloudbreak
Using Ambari with all of the above!
32. Requirement: a Docker host
● OSX or Windows: http://boot2docker.io/
○ boot2docker init
○ boot2docker up
○ eval "$(boot2docker shellinit)"
○ boot2docker ssh
● Linux: Install the docker daemon
● Anywhere: docker-machine “lets you create Docker hosts on your
computer, on cloud providers, and inside your own data center”
○ Example on Rackspace:
■ docker-machine create --driver rackspace \
    --rackspace-api-key $OS_PASSWORD \
    --rackspace-username $OS_USERNAME \
    --rackspace-region DFW docker-rax
■ docker-machine ssh docker-rax
38. 3. Use your Cluster
Ambari available as expected
To reach your Hadoop hosts:
● SSH to Docker Host
○ Hosts are listed in “Cloud stack description”
○ ssh cloudbreak@IPofHost
● Open a shell in the “ambari-agent” container
○ sudo docker ps | grep ambari-agent
■ note the CONTAINER ID
○ sudo docker exec -it CONTAINERID bash
● Use the hosts as usual. e.g.:
○ hadoop fs -ls /
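The container lookup on this slide can be scripted. The following is a minimal sketch, assuming the default `docker ps` layout where the container ID is the first column. The function name `ambari_agent_id` is hypothetical, and it reads `docker ps` output on stdin so it can be tried without a Docker host.

```shell
# Print the ID of the first container whose `docker ps` line mentions ambari-agent.
# Assumption: the container ID is the first whitespace-separated column.
ambari_agent_id() {
  awk '/ambari-agent/ { print $1; exit }'
}

# On the Docker host (not run here):
#   id=$(sudo docker ps | ambari_agent_id)
#   sudo docker exec -it "$id" bash
```

If several ambari-agent containers run on one host, this picks only the first match; adjust the filter if you need a specific one.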
65. Rackspace
Cloud Big Data Platform
● Rapidly spin up on-demand HDP clusters
● Integrated with Cloud Files (OpenStack Swift)
● Opt-in for Managed Services by Rackspace
Managed Big Data Platform
● Fully Managed HDP on Dedicated and/or Cloud
● Leverage Fanatical Support and Industry Leading SLAs
● Supported by Rackspace with escalation to Hortonworks
68. Microsoft Azure
● Deployment
○ Deploy using Cloudbreak
○ Deploy using HWX Azure Gallery Image
● Integrated with Azure Blob Storage
● Supported directly by Hortonworks
● Other offerings
○ Microsoft HDInsight
○ HDP Sandbox
69. Azure Deployment Guideline
● All in same Region
● Instance Types
○ Typical: A7
○ Performance: D14
○ 8x1TB Standard LRS x3 Virtual Hard Disk per
server
● Multiple Storage Accounts are recommended
○ Recommend no more than 40 Virtual Hard Disks
per Storage Account
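The storage account guideline reduces to simple arithmetic. A rough sketch, assuming 8 Virtual Hard Disks per server (read from the 8x1TB line) and the recommended ceiling of 40 Virtual Hard Disks per Storage Account; the function name is hypothetical.

```shell
# Estimate how many Azure Storage Accounts a cluster needs.
# Assumptions: 8 VHDs per server, at most 40 VHDs per Storage Account.
storage_accounts_needed() {
  servers=$1
  disks_per_server=${2:-8}
  max_disks_per_account=${3:-40}
  total=$(( servers * disks_per_server ))
  # Integer ceiling division
  echo $(( (total + max_disks_per_account - 1) / max_disks_per_account ))
}

storage_accounts_needed 12   # 12 servers x 8 disks = 96 disks -> prints 3
```

With these defaults, each Storage Account covers five servers.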
70. Azure Blob Store
Azure Blob Store (Object Storage)
● wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
Can be used as a replacement for HDFS
● Thoroughly tested in HDP release test suites
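To make the URI layout concrete, here is a small sketch that assembles a wasb or wasbs URI from its parts. The helper name `wasb_uri` and the container, account, and path values are placeholders, not names from the deck.

```shell
# Build a wasb:// (or wasbs:// for TLS) URI from container, account, and path.
wasb_uri() {
  container=$1; account=$2; path=$3; scheme=${4:-wasb}
  # Strip one leading slash from the path so the URI has exactly one separator.
  printf '%s://%s@%s.blob.core.windows.net/%s\n' \
    "$scheme" "$container" "$account" "${path#/}"
}

wasb_uri mycontainer myaccount /data/logs
wasb_uri mycontainer myaccount data/logs wasbs

# On a cluster where the account is configured (not run here):
#   hadoop fs -ls "$(wasb_uri mycontainer myaccount /data)"
```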
71. Amazon Web Services
● Deploy using Cloudbreak
● Integrated with AWS S3 (object storage)
● Supported directly by Hortonworks
72. Amazon Deployment Guideline
● All in same Region/AZ
● Instances with Enhanced
Networking
Master Nodes:
● Choose EBS Optimized
● Boot: 100GB on EBS
● Data: 4+ 1TB on EBS
Worker Nodes:
● Boot: 100GB on EBS
● Data: Instance Storage
○ EBS can be used, but local
is preferred
Instance Types:
● Typical: d2.
● Performance: i2.
https://aws.amazon.com/ec2/instance-types/
73. AWS RDS
● Some services rely on MySQL, Oracle or PostgreSQL:
○ Apache Ambari
○ Apache Hive
○ Apache Oozie
○ Apache Ranger
● Use RDS for these instead of managing yourself.
74. AWS S3 (Object Storage)
● s3n:// with HDP 2.2 (Hadoop 2.6)
● s3a:// with HDP 2.3 (Hadoop 2.7)
Not currently a direct replacement for HDFS
Recommended to configure access with IAM Role/Policy
● https://docs.aws.amazon.com/IAM/latest/UserGuide/policies_examples.html#iam-policy-example-s3
● Example: http://git.io/vLoGY
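As an illustration of the IAM Role/Policy approach, the sketch below prints a policy document scoped to a single bucket. The bucket name, the helper name `s3_bucket_policy`, and the chosen action list are assumptions to adapt; this is not the policy from the linked example.

```shell
# Print a minimal IAM policy document granting access to one S3 bucket.
# Assumption: list access on the bucket plus read/write/delete on its objects.
s3_bucket_policy() {
  bucket=$1
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}

s3_bucket_policy my-hdp-bucket
```

Attaching a policy like this to the instance role keeps access keys out of the Hadoop configuration files.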
76. Google Cloud
● Deploy using
○ Cloudbreak
○ Google bdutil with Apache Ambari plug-in
● Integrated with Google Cloud Storage
● Supported directly by Hortonworks
77. Google Deployment Guideline
● Instance Types
○ Typical: n1-standard-4 with a single 1.5 TB
persistent disk
○ Performance: n1-standard-8 with 1 TB SSD
● Google GCS (Object Storage)
● gs://<CONFIGBUCKET>/dir/file
● Not currently a replacement for HDFS
78. S3 & GCS as Secondary storage system
The connectors are currently eventually consistent, so they do not replace HDFS
Backup
● Falcon, DistCp, hadoop fs, HBase ExportSnapshot
● Kafka+Storm bolt sends messages to S3/GCS
providing backup & point-in-time recovery source
Input/Output
● Convenient & broadly used upload/download method
○ As a middleware to ease integration with Hadoop & limit access
● Publishing static content (optionally with CloudFront)
○ Removes need to manage any web services
● Storage for temporary/ephemeral clusters
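For the DistCp backup option on this slide, a hedged sketch: the helper below only prints the command it would run, so it can be reviewed before use. The helper name, bucket, and paths are placeholders; `-update` asks DistCp to copy only files that differ at the destination.

```shell
# Print (do not run) a DistCp command copying an HDFS directory to an object store.
backup_cmd() {
  src=$1; dest=$2
  printf 'hadoop distcp -update %s %s\n' "$src" "$dest"
}

backup_cmd hdfs:///apps/hive/warehouse s3a://my-backup-bucket/hive/warehouse
```

Pipe the output to `sh` on a cluster node once the source and destination look right.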