Quick Housekeeping Rules

• Q&A panel is available if you have any questions during the
 webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and recording




Hadoop, R, and Google Chart Tools
Data Visualization for Application Developers


Jeff Markham
Solution Engineer

jmarkham@hortonworks.com




Agenda
•   Introductions
•   Use Case Description
•   Preparation
•   Demo
•   Review
•   Q&A




Use Case Description
• Visualizing data
  • Tools vs. application development
  • Choosing the technology
      • Hortonworks Data Platform
      • RHadoop
      • Google Charts




Preparation: Install HDP

Hortonworks Data Platform (HDP): Enterprise Hadoop

[Architecture diagram]
• OPERATIONAL SERVICES (manage & operate at scale): AMBARI, OOZIE
• DATA SERVICES (store, process, and access data): FLUME, SQOOP, WEBHDFS, HIVE, PIG, HCATALOG, HBASE
• HADOOP CORE (distributed storage & processing): HDFS, MAP REDUCE, YARN (in 2.0)
• PLATFORM SERVICES (enterprise readiness): HA, DR, snapshots, security, …
• Runs on OS, cloud, VM, or appliance

• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Preparation: Install R
• Install R language

• Install appropriate packages (see the example commands below)
  – rhdfs
  – rmr2
  – googleVis
  – shiny
  – Dependencies for all above
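
A hedged sketch of what the installs can look like from an R session on the HDP node (tarball names and versions are illustrative, not from the slides; rmr2 and rhdfs come from the RHadoop project rather than CRAN):

 Sys.setenv(HADOOP_CMD = '/usr/bin/hadoop')        # rmr2/rhdfs locate Hadoop via this variable
 install.packages(c('googleVis', 'shiny'))         # CRAN packages, dependencies pulled automatically
 install.packages('rhdfs_1.0.5.tar.gz', repos = NULL, type = 'source')   # RHadoop tarball
 install.packages('rmr2_2.0.2.tar.gz',  repos = NULL, type = 'source')   # RHadoop tarball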




Preparation
• rmr2
   – Functions to allow for MapReduce in R apps


• rhdfs
   – Functions allowing HDFS access in R apps


• googleVis
   – Use of Google Chart Tools in R apps


• shiny
   – Interactive web apps for R developers




Demo Walkthrough
              Using Hadoop, R, and Google Chart Tools




Visualization Use Case
• Data from CDC
  – Vital statistics publicly available data
  – 2010 US birth data file

SAMPLE RECORD

  [one fixed-width record from the 2010 US natality detail file, shown on the slide]

  source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
Visualization Use Case
• Put data into HDFS
                     – Create input directory
                     – Put data into input directory
 CREATE HDFS DIR




 > hadoop fs -mkdir /user/jeff/natality
PUT DATA INTO HDFS




 > hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
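
An optional check (not shown in the slides) that the file landed where the R script will look for it:

 > hadoop fs -ls /user/jeff/natality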




Visualization Use Case
• Write R script
           – Specify use of RHadoop packages
           – Initialize HDFS
           – Specify data input and output location

R SCRIPT

 #!/usr/bin/env Rscript

 require('rmr2')
 require('rhdfs')
 hdfs.init()

 hdfs.data.root = 'natality'
 hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
 hdfs.out.root = hdfs.data.root
 hdfs.out = file.path(hdfs.out.root, 'out')

 ...


Visualization Use Case
• Write R script
           – Write mapper function
           – Write reducer function



R SCRIPT

 ...

 mapper = function(k, fields) {
   keyval(as.integer(substr(fields, 89, 90)), 1)
 }

 reducer = function(key, vv) {
   # count values for each key
   keyval(key, sum(as.numeric(vv), na.rm=TRUE))
 }

 ...
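
The record layout itself is not reproduced in the slides, but the chart axes later in the deck ("Age of Mother" vs. "Number of Births") suggest what the mapper extracts; a hedged reading:

 # assumption: columns 89-90 of each fixed-width record encode the mother's age,
 # so the mapper emits (age, 1) per birth and the reducer sums births per age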



Visualization Use Case
• Write R script
           – Write job function




R SCRIPT

 ...

 job = function (input, output) {
   mapreduce(input = input,
             output = output,
             input.format = "text",
             map = mapper,
             reduce = reducer,
             combine = T)
 }

 ...
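
Before submitting to the cluster, the same job can be smoke-tested with rmr2's in-process backend; a sketch, assuming a small local sample file (the file name is hypothetical):

 rmr.options(backend = 'local')                       # run map/reduce locally, no cluster needed
 from.dfs(job('sample_records.txt', 'sample_out'))    # same mapper/reducer on a tiny input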




Visualization Use Case
• Write R script
           – Write result to HDFS output directory




R SCRIPT

 ...

 out = from.dfs(job(hdfs.data, hdfs.out))
 results.df = as.data.frame(out, stringsAsFactors=F)




Visualization Use Case
• Create Shiny application

                – Create directory
                – Create ui.R
                – Create server.R
SHINY APP DIR




                 > mkdir ~/my-shiny-app




Visualization Use Case
• Create Shiny application
              – Create ui.R


UI.R SOURCE

 shinyUI(pageWithSidebar(

   # Application title
   headerPanel("2010 US Births"),

   sidebarPanel(. . .),

   mainPanel(
     tabsetPanel(
       tabPanel("Line Chart", htmlOutput("lineChart")),
       tabPanel("Column Chart", htmlOutput("columnChart"))
     )
   )
 ))



Visualization Use Case
• Create Shiny application
                  – Create server.R


SERVER.R SOURCE

 library(googleVis)
 library(shiny)
 library(rmr2)
 library(rhdfs)

 hdfs.init()

 hdfs.data.root = 'natality'
 hdfs.data = file.path(hdfs.data.root, 'out')
 df = as.data.frame(from.dfs(hdfs.data))
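 # (assumption) from.dfs() returns key/value pairs, so df has two columns;
 # renaming them keeps the chart tooltips readable, e.g. names(df) = c('age', 'births')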

                    ...




Visualization Use Case
• Create Shiny application
                  – Create server.R



SERVER.R SOURCE

 ...
 shinyServer(function(input, output) {

   output$lineChart <- renderGvis({
     gvisLineChart(df, options=list(
       vAxis="{title:'Number of Births'}",
       hAxis="{title:'Age of Mother'}",
       legend="none"
     ))
   })
 ...
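
ui.R above also declares a "Column Chart" tab; the matching output function is not shown in the slides, but it would plausibly mirror the line chart using googleVis's gvisColumnChart (a sketch, not the original code):

   output$columnChart <- renderGvis({
     gvisColumnChart(df, options=list(
       vAxis="{title:'Number of Births'}",
       hAxis="{title:'Age of Mother'}",
       legend="none"
     ))
   })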




Visualization Use Case
• Run Shiny application

RUN SHINY APP

 > shiny::runApp('~/my-shiny-app')
 Loading required package: shiny

 Welcome to googleVis version 0.4.0

 ...

 HADOOP_CMD=/usr/bin/hadoop

 Be sure to run hdfs.init()

 Listening on port 8100
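
With the console output above, the app is reachable in a browser at http://localhost:8100; runApp() also accepts an explicit port argument if a fixed one is preferred (a sketch):

 > shiny::runApp('~/my-shiny-app', port = 8100)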




Visualization Use Case
• View Shiny application




Demo Live
              Using Hadoop, R, and Google Chart Tools




Visualization Use Case
• Architecture recap
  –   Analyze data sets with R on Hadoop
  –   Choose RHadoop packages
  –   Visualize data with Google Chart Tools via googleVis package
  –   Render googleVis output in Shiny applications


• Architecture next steps
  – Integrate Shiny application into existing web apps
  – Create further data models with R




HDP: Enterprise Hadoop Distribution

[Architecture diagram]
• OPERATIONAL SERVICES (manage & operate at scale): AMBARI, OOZIE
• DATA SERVICES (store, process, and access data): FLUME, SQOOP, WEBHDFS, HIVE, PIG, HCATALOG, HBASE
• HADOOP CORE (distributed storage & processing): HDFS, MAP REDUCE, YARN (in 2.0)
• PLATFORM SERVICES (enterprise readiness): HA, DR, snapshots, security, …
• Runs on OS, cloud, VM, or appliance

• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
HDP Sandbox




Thank You!


Jeff Markham
Solution Engineer

jmarkham@hortonworks.com





Editor's Notes

  1. Hi, I’m Jeff Markham and I wanted to talk today about
  2. Agenda points
  3. Describe the use case and how to choose the tech
  4. Start by installing HDP
  5. Install R and dependencies
  6. Go into more detail on the R packages
  7. Walk through the demo before actually doing the demo
  8. Describe the data set
  9. Start with the very beginning: getting the downloaded data into Hadoop
  10. Start explaining the R script. Kick it off with explanation of RHadoop packages and what they’re doing
  11. Explain the mapper and reducer functions
  12. Explain the job function
  13. Wrap up with showing where the data lands
  14. Show how to create the Shiny app. Start with creating the directory.
  15. This is the entirety of the Shiny UI. Help text in the sidebar is omitted for real estate.
  16. Explain the server.R code. Note the imports of the relevant R packages.
  17. Move to one of the functions that describes how Shiny wraps googleVis which wraps Google Chart Tools
  18. Show how to kick off the Shiny app and note the listening port
  19. Go to the browser and view the Shiny app
  20. Cut to the live demo.
  21. Recap what we just saw and suggest possible future steps to further develop the app
  22. Hammer home HDP as the bedrock for the app
  23. Suggest getting started with the Sandbox
  24. Wrap up with Q&A