
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud

Apache Hadoop YARN is the modern distributed operating system for big data applications. In Apache Hadoop 3.1.0, YARN added a service framework that supports long-running services. This new capability goes hand in hand with the recent improvements in YARN to support Docker containers. Together these features have made it significantly easier to bring new applications and services to YARN.

In this talk you will learn about the YARN service framework, its new containerization capabilities, and how it lays the foundation for a hybrid and uniform architecture for compute and storage across on-prem and multi-cloud environments. It includes examples showing how easy it is to bring applications to the YARN service framework and how to containerize them.

Here's what to expect in this talk:
- Motivation for YARN service framework and containerization
- YARN service framework overview
- YARN service examples
- Containerization overview
- Containerization for Big Data and non-Big Data workloads

Published in: Technology

  1. Apache Hadoop YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
     Billie Rinaldi
     © Cloudera, Inc. All rights reserved.
  2. AGENDA
     • Emergence of Containers
     • Journey to a Container Cloud
     • Building Blocks of a Container Cloud
     • YARN Service APIs
     • YARN Service Examples
     • Enabling Hybrid Deployments
  3. CONTAINERIZATION IS GAINING MOMENTUM
     • Industry adoption continues
        • "Number of containerized applications will rise by 80% in the next two years" [1]
        • Multi-cloud and hybrid strategies
        • Adoption of microservices
     • Exponential ecosystem growth
        • Dozens of container orchestrators
        • Thousands of plugins
     • Market moves
     [1] http://i.dell.com/sites/doccontent/business/solutions/whitepapers/en/Documents/Containers_Real_Adoption_2017_Dell_EMC_Forrester_Paper.pdf
  4. WHY ARE CONTAINERS GAINING POPULARITY?
     • Improved hardware utilization through increased density
        • No virtual machine operating system overhead
        • Image layer reuse limits data duplication on disk
     • Strong resource isolation
        • Namespaces and cgroups
     • Better software packaging
        • Package applications and dependencies together
        • Improved reuse vs. VM images
        • Distribution mechanism
     • Improved developer self-service
        • More control over the execution environment
     • Promise of portability
        • On-premises and across multiple clouds
  5. CONTAINER ARCHITECTURE PATTERNS
     • Mix of services
        • Long-lived services and ephemeral/batch jobs
     • Decoupled compute and storage
        • Scale independently
     • Hybrid deployments
        • Desire for consistency between cloud and on-premises
  6. ON PREM VS. CLOUD: VERY DIFFERENT MODELS
     Cloud
     • Multiple clusters
     • Decoupled compute and storage
     • Infrastructure as a Service
     • Improved agility and self-service
     On Prem
     • Large, multi-tenant clusters
     • Co-located compute and storage
     • Shared security and governance
     • Less agile due to physical hardware
     [Diagram: Data Center (compute, storage, shared security, governance, and operations) running EDW, Data Science, and Stream Processing, vs. Public Cloud with separate EDW, Data Science, and Stream Processing compute clusters over Public Cloud Storage]
  7. WHAT IS NEEDED TO BRIDGE THE GAP?
     Across clusters
     • Consistent deployment, security, and governance
     Within clusters
     • Decoupled compute and storage
     • Eliminate physical hardware as a barrier to agility
     How does Apache Hadoop YARN help enable portability?
  9. JOURNEY TO A CONTAINER CLOUD
     • Started off with on-prem hardware
     • Quickly exceeded capacity, moved to public cloud
        • Costs were higher than we wanted
        • The bigger concern was the rate of expense growth
     • Then back to on-prem
        • VM-based infrastructure
        • CloudStack followed by OpenStack
     • Challenges before the container cloud
        • Low density
        • Significant overhead per test
        • Many images with minimal differences, limited composition
        • More and more tests and products onboarding
        • The existing environment could no longer keep up with the testing demands
  10. ASSESSING THE CHALLENGES
     • How is the industry addressing these same challenges?
     • Can we leverage our existing investment in hardware?
     • How can we reduce overhead and improve density and hardware utilization?
     • What about improving reuse of packaging and automation?
  11. SOLUTION: ON-PREM CONTAINER CLOUD BUILT ON YARN
     • Containers (think Docker)
        • Containers eliminate the bulk of the virtualization overhead
        • Containers help improve reuse of images through composition
        • Container startup is fast; there is no real boot sequence
     • Apache Hadoop YARN
        • Good technical fit
        • Good strategic fit
  12. WHY YARN? Container Model
     • YARN is Apache Hadoop's resource management framework
     • At its core, YARN is responsible for orchestrating "containers" across a collection of servers
     • What is a YARN container?
        • Linux process
        • Local resources (scripts, jars, security tokens)
        • Resource constraints (CPU, memory, I/O)
     • Aligns well with container technologies such as Docker
  13. WHY YARN? Strategic Advantages
     • YARN is widely deployed
     • YARN is a superior scheduler
        • Hardened by customer feedback
     • Leverage our existing expertise
        • "Use what we ship and ship what we use"
     • No big leap to containerization
        • Existing "Hadoop native" frameworks run unchanged on the same infrastructure
  14. DOGFOODING: CONTAINER CLOUD FOR RELEASE TESTING
     Testing HDP and HDF releases in container clusters (soon CDH)
     [Diagram: Jenkins submits and launches tests on Worker (Docker) and HDP (Docker) containers, running on shared services: Resource Management (YARN), Management and Monitoring (Ambari), Storage (HDFS), Service Discovery and REST API (YARN Services), Security and Governance (Ranger and Atlas)]
  16. BUILDING BLOCKS FOR A CONTAINER CLOUD ON YARN
     • YARN Container Runtimes – Enable support for Docker containers, making it easier to onboard new applications and services on YARN.
     • YARN Services Framework – Provides an ApplicationMaster (AM) implementation, a REST API, and various improvements that enable long-running services on YARN.
     • YARN Service Discovery – Allows services running on YARN to discover one another.
  18. NEW ABSTRACTION: YARN CONTAINER RUNTIMES
     Choose the container runtime at app submission time!
     • DefaultLinuxContainerRuntime – Existing Linux process-based execution
     • DockerLinuxContainerRuntime – Uses Docker to run and monitor the containers
  19. DISTRIBUTED SHELL AND MAPREDUCE EXAMPLES
     The only difference is setting environment variables!
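The environment variables themselves are not captured in this transcript. The Apache Hadoop "Launching Applications Using Docker Containers" documentation uses `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE`. The sketch below assumes those names and builds the DistributedShell and MapReduce command lines. The jar paths and image names are hypothetical placeholders.

```python
"""Build YARN command lines that run containers under the Docker runtime.

Assumption: the variable names follow the Apache Hadoop "Launching
Applications Using Docker Containers" docs; jar paths and image names
used below are hypothetical placeholders.
"""
from typing import Dict, List


def docker_env(image: str) -> Dict[str, str]:
    # The only change versus a normal submission: request the Docker
    # runtime and name the image the containers should run in.
    return {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }


def dshell_cmd(dshell_jar: str, image: str, shell_command: str) -> List[str]:
    cmd = ["yarn", "jar", dshell_jar,
           "org.apache.hadoop.yarn.applications.distributedshell.Client",
           "-jar", dshell_jar]
    for key, value in docker_env(image).items():
        # DistributedShell forwards each -shell_env KEY=VALUE to its containers.
        cmd += ["-shell_env", f"{key}={value}"]
    return cmd + ["-shell_command", shell_command]


def mapreduce_pi_cmd(examples_jar: str, image: str) -> List[str]:
    # MapReduce takes one comma-separated env list each for the AM, maps, and reduces.
    env = ",".join(f"{k}={v}" for k, v in docker_env(image).items())
    return ["hadoop", "jar", examples_jar, "pi",
            f"-Dyarn.app.mapreduce.am.env={env}",
            f"-Dmapreduce.map.env={env}",
            f"-Dmapreduce.reduce.env={env}",
            "10", "100"]


if __name__ == "__main__":
    print(" ".join(dshell_cmd("distributedshell.jar", "library/centos:7", "sleep 120")))
    print(" ".join(mapreduce_pi_cmd("hadoop-mapreduce-examples.jar", "hadoop-docker")))
```

The sketch only prints the argument lists, so you can try it without a cluster. To submit for real, pass a list to `subprocess.run` on a host with the Hadoop client. For MapReduce, the image must contain a Hadoop installation.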
  20. DOCKER CONTAINER SUPPORT EVOLVING
     • Recent efforts
        • Container security
           • ACLs for privileged containers
           • Improved out-of-the-box security for untrusted images
        • Entrypoint support (systemd as PID 1 fixes)
        • Exec-to-container support
     • Ongoing efforts
        • Improving image management and lifecycle (YARN-9228)
        • runc/squashfs (YARN-9014)
        • CSI support (YARN-8811)
  22. YARN SERVICES FRAMEWORK OVERVIEW
     • Long running
        • Simplify the deployment and management of long-running apps on YARN
     • Easy onboarding
        • Remove the tedious process of bringing new services to YARN
     • Declarative configuration
        • JSON specification describing the desired state of the service to be managed
     • Standard interfaces
        • REST API that lives in the ResourceManager, CLI tools for clients
  23. DEFINING SERVICES THROUGH THE JSON SPEC
     $ curl -H "Content-Type: application/json" -X POST http://RM_HOST:8088/app/v1/services -d @sleeper.json
     $ yarn app -launch serviceName sleeper.json
     • This spec creates two component instances, sleeper-0 and sleeper-1
     • Optional features include readiness checks, placement policies, and creating/mounting resources such as config files
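The contents of `sleeper.json` appeared only as an image on the slide. The sketch below builds a spec of the same shape as the sleeper example in the Apache Hadoop YARN Service quick start, assuming that document's field names (`name`, `version`, `components`, `number_of_containers`, `launch_command`, `resource`). The sleep duration, CPU, and memory values are illustrative assumptions. It also derives the component instance names the slide mentions.

```python
"""Produce a minimal YARN service spec in the shape of the sleeper example.

Assumption: field names follow the Apache Hadoop YARN Service quick start;
the values (sleep time, CPU, memory) are illustrative.
"""
import json


def sleeper_spec(name: str = "sleeper-service", containers: int = 2) -> dict:
    return {
        "name": name,
        "version": "1.0.0",
        "components": [
            {
                "name": "sleeper",
                # Desired state: the framework keeps this many instances running.
                "number_of_containers": containers,
                "launch_command": "sleep 900000",
                "resource": {"cpus": 1, "memory": "256"},
            }
        ],
    }


def instance_names(spec: dict) -> list:
    # Component instances are named <component>-<index>, e.g. sleeper-0.
    return [f"{c['name']}-{i}"
            for c in spec["components"]
            for i in range(c["number_of_containers"])]


if __name__ == "__main__":
    spec = sleeper_spec()
    print(json.dumps(spec, indent=2))  # save this output as sleeper.json
    print(instance_names(spec))  # ['sleeper-0', 'sleeper-1']
```

Because the spec is declarative, changing `number_of_containers` and resubmitting describes a new desired state. The framework then works toward that state.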
  25. SIMPLIFIED SERVICE DISCOVERY VIA DNS
     Existing YARN Service Registry
     • Allows apps to register themselves
     • Stores entries in Apache ZooKeeper
     • Provides native Java, REST, and CLI clients to enable service discovery
     YARN Registry DNS Server
     • Watches the YARN Service Registry (ZK) for new application and container records
     • Creates user-friendly DNS records based on those records
     • Supports zone transfers, zone forwarding, upstream querying, and DNSSEC
     Examples:
     • componentInstanceName.serviceName.user.domain
     • sleeper-0.sleeper-service.billie.domain
     • ctr-e138-1518143905142-215498-01-000007.domain
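The naming scheme in the examples above can be expressed as two small functions. This is a sketch, not Registry DNS code. The container-record rule (`container_` becomes `ctr-`, underscores become hyphens) is an assumption inferred from the `ctr-e138-...` example on the slide. The container ID passed in is that example rewritten in YARN's usual underscore form. `domain` stands in for the cluster's configured DNS domain.

```python
"""Compose YARN Registry DNS names following the slide's examples.

Assumption: the container-record mapping (container_ -> ctr-, '_' -> '-')
is inferred from the slide's example, not taken from Registry DNS code.
"""


def instance_fqdn(instance: str, service: str, user: str, domain: str) -> str:
    # componentInstanceName.serviceName.user.domain, most specific label first.
    return ".".join([instance, service, user, domain])


def container_fqdn(container_id: str, domain: str) -> str:
    prefix = "container_"
    if container_id.startswith(prefix):
        # Shorten the prefix as in the slide's "ctr-..." example.
        container_id = "ctr_" + container_id[len(prefix):]
    # DNS labels cannot contain underscores, so use hyphens instead.
    return container_id.replace("_", "-") + "." + domain


if __name__ == "__main__":
    print(instance_fqdn("sleeper-0", "sleeper-service", "billie", "domain"))
    print(container_fqdn("container_e138_1518143905142_215498_01_000007", "domain"))
```

A client whose resolver points at the Registry DNS server could then look up these names with ordinary DNS calls such as `socket.gethostbyname`.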
  27. YARN SERVICE REST API
     Create a service
     • POST URL - http://RM_HOST:8088/app/v1/services
     Get service status
     • GET URL - http://RM_HOST:8088/app/v1/services/tensorflow
     Update service
     • PUT URL - http://RM_HOST:8088/app/v1/services/tensorflow
        • Extend lifetime
        • STOP service
        • START service
        • Flex UP/DOWN the # of containers of one or more components
     • DELETE (destroy) service
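The sketch below builds these requests with Python's standard library, without sending them. `RM_HOST` and the `tensorflow` service name come from the slide. The `worker` component name is hypothetical. The PUT bodies (`{"state": "STOPPED"}`/`{"state": "STARTED"}`, `{"lifetime": ...}`, and the per-component flex body) are assumptions based on our reading of the Apache Hadoop YARN Service REST API documentation. Verify them against your Hadoop version before relying on them.

```python
"""Build YARN Service REST API requests with the standard library.

Assumptions: the PUT bodies for state, lifetime, and flex follow the Apache
Hadoop YARN Service REST API docs as we understand them; the "worker"
component name is hypothetical. Nothing here sends a request.
"""
import json
import urllib.request
from typing import Optional

BASE = "http://RM_HOST:8088/app/v1/services"


def _request(method: str, url: str, body: Optional[dict] = None) -> urllib.request.Request:
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"})


def create(spec: dict) -> urllib.request.Request:
    return _request("POST", BASE, spec)


def status(service: str) -> urllib.request.Request:
    return _request("GET", f"{BASE}/{service}")


def set_state(service: str, state: str) -> urllib.request.Request:
    # Assumed: "STOPPED" stops the service, "STARTED" starts it again.
    return _request("PUT", f"{BASE}/{service}", {"state": state})


def extend_lifetime(service: str, seconds: int) -> urllib.request.Request:
    return _request("PUT", f"{BASE}/{service}", {"lifetime": seconds})


def flex(service: str, component: str, count: int) -> urllib.request.Request:
    # Assumed: flex one component to an absolute instance count.
    return _request("PUT", f"{BASE}/{service}/components/{component}",
                    {"name": component, "number_of_containers": count})


def destroy(service: str) -> urllib.request.Request:
    return _request("DELETE", f"{BASE}/{service}")


if __name__ == "__main__":
    for req in (status("tensorflow"), set_state("tensorflow", "STOPPED"),
                flex("tensorflow", "worker", 4), destroy("tensorflow")):
        # To call a real ResourceManager, pass req to urllib.request.urlopen.
        print(req.get_method(), req.full_url, req.data)
```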
  28. YARN APP CLI
     Usage: yarn app
        -launch serviceName jsonfile
        -flex serviceName -component componentName count
        -save serviceName jsonfile
        -start serviceName
        -status serviceName
        -stop serviceName
        -destroy serviceName
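For scripting, the usage above can be wrapped in a small helper that builds argument lists. This is a sketch that mirrors only the options shown on the slide. `run` shells out only when you call it yourself on a host with the `yarn` client installed.

```python
"""Build argument lists for the `yarn app` service commands on the slide.

Assumption: options and their order mirror the slide's usage text;
nothing is executed unless run() is called explicitly.
"""
import subprocess
from typing import List

_SIMPLE_ACTIONS = {"launch", "save", "start", "status", "stop", "destroy"}


def yarn_app(action: str, service: str, *extra: str) -> List[str]:
    # -launch and -save take a JSON file as the extra argument.
    if action not in _SIMPLE_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["yarn", "app", f"-{action}", service, *extra]


def yarn_app_flex(service: str, component: str, count: int) -> List[str]:
    return ["yarn", "app", "-flex", service, "-component", component, str(count)]


def run(argv: List[str]) -> int:
    # Passing a list (not a shell string) avoids shell quoting problems.
    return subprocess.run(argv, check=False).returncode
```

For example, `run(yarn_app("status", "sleeper-service"))` checks a service, and `run(yarn_app_flex("sleeper-service", "sleeper", 3))` flexes the sleeper component to three instances.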
  31. DOCKER EXAMPLE
     To convert the sleeper example into a Docker example, add an artifact:
     "artifact": { "id": "library/centos:7", "type": "DOCKER" }
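The conversion can be applied programmatically. This sketch assumes the artifact belongs at the component level, which is where the Hadoop YARN Service examples place it. The image name comes from the slide. The inline spec is a stand-in for `sleeper.json`, whose contents the transcript does not show.

```python
"""Turn a component-based YARN service spec into a Docker one.

Assumption: the artifact goes on each component; the inline sleeper spec
is a stand-in for sleeper.json.
"""
import copy
import json


def dockerize(spec: dict, image: str = "library/centos:7") -> dict:
    out = copy.deepcopy(spec)  # leave the caller's spec untouched
    for component in out["components"]:
        component["artifact"] = {"id": image, "type": "DOCKER"}
    return out


if __name__ == "__main__":
    sleeper = {"name": "sleeper-service", "version": "1.0.0",
               "components": [{"name": "sleeper", "number_of_containers": 2,
                               "launch_command": "sleep 900000",
                               "resource": {"cpus": 1, "memory": "256"}}]}
    print(json.dumps(dockerize(sleeper), indent=2))
```

Nothing else in the spec changes. The launch command now runs inside the `library/centos:7` image instead of directly on the host.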
  32. APACHE HBASE TARBALL EXAMPLE
     HBase tarball service
     • TARBALL artifact type
     • ENV variables
     • Config files
  33. APACHE HBASE DOCKER EXAMPLE
     • Replace TARBALL artifact with DOCKER artifact
     • Remove unneeded env vars and add Docker mounts
     • Optionally use absolute paths for generated config files
     • Remove unneeded config files that already exist in the image
     • Adjust launch command based on location in image
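The HBase specs themselves are not captured in this transcript. The sketch below applies the conversion steps listed above to a hypothetical TARBALL component. All HBase paths, env var names, and image names are placeholders. Docker mounts are added through `YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS`, which the Hadoop Docker documentation describes as a comma-separated list of `source:dest:mode` entries. The optional step of switching generated config files to absolute paths is not shown.

```python
"""Apply the slide's TARBALL -> DOCKER conversion steps to a component spec.

Hypothetical throughout: HBase paths, env var names, and the image are
placeholders; YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS takes comma-separated
source:dest:mode entries per the Hadoop Docker docs.
"""
import copy
from typing import Iterable


def tarball_to_docker(component: dict, image: str, drop_env: Iterable[str],
                      mounts: Iterable[str], files_in_image: Iterable[str],
                      launch_command: str) -> dict:
    out = copy.deepcopy(component)
    # 1. Replace the TARBALL artifact with a DOCKER artifact.
    out["artifact"] = {"id": image, "type": "DOCKER"}
    conf = out.setdefault("configuration", {})
    env = conf.setdefault("env", {})
    # 2. Remove env vars the image already provides, then add Docker mounts.
    for name in drop_env:
        env.pop(name, None)
    mounts = list(mounts)
    if mounts:
        env["YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS"] = ",".join(mounts)
    # 3. Drop generated config files that already exist in the image.
    skip = set(files_in_image)
    conf["files"] = [f for f in conf.get("files", [])
                     if f.get("dest_file") not in skip]
    # 4. Point the launch command at the binary's location inside the image.
    out["launch_command"] = launch_command
    return out


if __name__ == "__main__":
    hbase_master = {  # hypothetical tarball-based component
        "name": "hbasemaster",
        "number_of_containers": 1,
        "artifact": {"id": "hdfs:///apps/hbase/hbase.tar.gz", "type": "TARBALL"},
        "launch_command": "$HBASE_HOME/bin/hbase master start",
        "configuration": {
            "env": {"HBASE_HOME": "./lib/hbase", "HBASE_CONF_DIR": "./conf"},
            "files": [{"type": "XML", "dest_file": "hbase-site.xml", "properties": {}},
                      {"type": "TEMPLATE", "dest_file": "log4j.properties",
                       "src_file": "hdfs:///apps/hbase/log4j.properties"}],
        },
    }
    print(tarball_to_docker(hbase_master, "example/hbase:latest",
                            ["HBASE_HOME"], ["/etc/passwd:/etc/passwd:ro"],
                            ["log4j.properties"],
                            "/usr/hbase/bin/hbase master start"))
```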
  35. REALITY: MULTI-CLOUD AND ON-PREM
     [Figure: map of deployment locations; only the label "Canada East (GCP)" survives]
  37. ON PREM VS. CLOUD: BRIDGING THE GAP
     Cloud
     • Shared Sec/Gov services, multi-cluster, multi-cloud
     On Prem
     • Shared Sec/Gov services, multi-cluster, containerized
     [Diagram: EDW, Data Science, and Stream Processing clusters with shared security, governance, and operations, running on Public Cloud Compute over Public Cloud Storage in the cloud, and on an Apache Hadoop YARN Container Cloud over Data Center Storage on-prem]
  38. CLOUDERA DATA PLATFORM
     • Public, private & hybrid cloud
     • Shared data experience
     • Powered by open source
     • Analytics from the Edge to AI
     • Unified data control plane
     [Diagram: Infrastructure (Private Cloud, Hybrid Cloud, Public Multi-Cloud, Edge); Data management (Catalog | Schema | Migration | Security | Governance); Analytic experiences (Data Flow & Streaming, Data Engineering, Data Warehouse, Operational Database, Machine Learning); Unified control plane (Identity | Orchestration | Management | Operations); also labeled: DSX, Altus, DataPlane]
  39. THANK YOU
