Process and Visualize Your Data with Revolution R, Hadoop and GoogleVis

In this session, attendees will learn how to use R in the distributed environment of Hadoop using the rmr package. The R package googleVis will also be used to show how application development teams can quickly and easily bring the power of R and Google Chart Tools into their applications. The result is a rich custom data visualization with far less coding than would otherwise be required. The session will begin with R basics and then move to concrete examples of statistical analysis on data sets. An application development example will accompany this, showing custom visualization of the analysis using googleVis. That example is a browser-based app that both kicks off the data set analysis in R and visualizes the result. Visualization examples will use both googleVis and basic Google Chart Tools. Attendees will leave the session with a concrete example of how to incorporate R into their existing application development practices and how to use Hadoop and its ecosystem to build custom visualizations.

Published in: Education


  1. Quick Housekeeping Rules
     • Q&A panel is available if you have any questions during the webinar
     • There will be time for Q&A at the end
     • We will record the webinar for future viewing
     • All attendees will receive a copy of the slides and recording
     © Hortonworks Inc. 2013
  2. Hadoop, R, and Google Chart Tools
     Data Visualization for Application Developers
     Jeff Markham, Solution Engineer, jmarkham@hortonworks.com
  3. Agenda
     • Introductions
     • Use Case Description
     • Preparation
     • Demo
     • Review
     • Q&A
  4. Use Case Description
     • Visualizing data
       • Tools vs. application development
       • Choosing the technology
         • Hortonworks Data Platform
         • RHadoop
         • Google Charts
  5. Preparation: Install HDP
     Hortonworks Data Platform (HDP): Enterprise Hadoop
     • The ONLY 100% open source and complete distribution
     • Enterprise grade, proven and tested at scale
     • Ecosystem endorsed to ensure interoperability
     [Diagram: HDP stack. Layers: Operational Services (manage and operate at scale), Data Services (store, process and access data), Hadoop Core (distributed storage and processing), Platform Services (enterprise readiness: HA, DR, snapshots, security, …). Components shown: Ambari, Oozie, Flume, Sqoop, Hive, Pig, HBase, HCatalog, WebHDFS, HDFS, MapReduce, YARN (in 2.0). Deployment targets: OS, Cloud, VM, Appliance.]
  6. Preparation: Install R
     • Install R language
     • Install appropriate packages
       – rhdfs
       – rmr2
       – googleVis
       – shiny
       – Dependencies for all above
  7. Preparation
     • rmr2
       – Functions to allow for MapReduce in R apps
     • rhdfs
       – Functions allowing HDFS access in R apps
     • googleVis
       – Use of Google Chart Tools in R apps
     • shiny
       – Interactive web apps for R developers
  8. Demo Walkthrough: Using Hadoop, R, and Google Chart Tools
  9. Visualization Use Case
     • Data from CDC
       – Vital statistics publicly available data
       – 2010 US birth data file
     SAMPLE RECORD
         S 201001 7 2 2 30105 2 011 06 1 123 3405 1 06 01 2 2 0321 1006 314 2000 2 222 22 2 2 2 122222 11 3 094 1 M 04 200940 39072 3941 083 22 2 2 22 110 110 00 0000000 00 000000000 000000 000 000000000000000000011 101 1 111 10 1 1 1 111111 11 1 1 11
     source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
  10. Visualization Use Case
      • Put data into HDFS
        – Create input directory
        – Put data into input directory
      CREATE HDFS DIR
          > hadoop fs -mkdir /user/jeff/natality
      PUT DATA INTO HDFS
          > hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
  11. Visualization Use Case
      • Write R script
        – Specify use of RHadoop packages
        – Initialize HDFS
        – Specify data input and output location
      R SCRIPT
          #!/usr/bin/env Rscript
          require(rmr2)
          require(rhdfs)
          hdfs.init()
          hdfs.data.root = "natality"
          hdfs.data = file.path(hdfs.data.root, "VS2010NATL.DETAILUS.DAT")
          hdfs.out.root = hdfs.data.root
          hdfs.out = file.path(hdfs.out.root, "out")
          ...
  12. Visualization Use Case
      • Write R script
        – Write mapper function
        – Write reducer function
      R SCRIPT
          ...
          mapper = function(k, fields) {
            keyval(as.integer(substr(fields, 89, 90)), 1)
          }
          reducer = function(key, vv) {
            # count values for each key
            keyval(key, sum(as.numeric(vv), na.rm=TRUE))
          }
          ...
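The mapper and reducer on this slide amount to a keyed count: take two characters from a fixed position in each record, emit a 1 per record, and sum the 1s per key. The following is a minimal base R sketch of that logic with no Hadoop or rmr2 involved. The `make_record` helper and the sample values are hypothetical and made up for illustration; only the column positions 89 to 90 come from the slide.

```r
# Hypothetical helper: build a fixed-width record with a two-digit
# value at columns 89-90, padded with filler characters on both sides.
make_record <- function(age) {
  paste0(strrep("x", 88), sprintf("%02d", as.integer(age)), strrep("x", 10))
}

# Made-up input standing in for lines of the data file.
records <- vapply(c(24, 31, 24, 19, 31, 24), make_record, character(1))

# Map step: one (key, 1) pair per record, keyed on columns 89-90.
keys <- as.integer(substr(records, 89, 90))
ones <- rep(1, length(keys))

# Reduce step: sum the values for each distinct key.
counts <- tapply(ones, keys, sum, na.rm = TRUE)

result <- data.frame(key = as.integer(names(counts)),
                     val = as.numeric(counts))
print(result)
```

Running the logic locally on a handful of records like this is a cheap way to check the column offsets before submitting a job to the cluster.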
  13. Visualization Use Case
      • Write R script
        – Write job function
      R SCRIPT
          ...
          job = function (input, output) {
            mapreduce(input = input,
                      output = output,
                      input.format = "text",
                      map = mapper,
                      reduce = reducer,
                      combine = TRUE)
          }
          ...
  14. Visualization Use Case
      • Write R script
        – Write result to HDFS output directory
      R SCRIPT
          ...
          out = from.dfs(job(hdfs.data, hdfs.out))
          results.df = as.data.frame(out, stringsAsFactors=FALSE)
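The conversion on this slide turns the job's key-value result into a two-column data frame. A base R sketch of that step follows, under the assumption that the value returned by `from.dfs` behaves like a list with `key` and `val` components. The `out` list here is a hypothetical stand-in with made-up numbers, and the sort is an addition not shown on the slide.

```r
# Hypothetical stand-in for the value returned by from.dfs(); assumed
# to behave like a list with 'key' and 'val' components.
out <- list(key = c(31L, 19L, 24L), val = c(2, 1, 3))

results.df <- as.data.frame(out, stringsAsFactors = FALSE)

# Job output may not arrive sorted, so order by key before charting.
results.df <- results.df[order(results.df$key), ]
rownames(results.df) <- NULL
print(results.df)
```

Sorting matters for a line chart, where rows are drawn in the order they appear.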
  15. Visualization Use Case
      • Create Shiny application
        – Create directory
        – Create ui.R
        – Create server.R
      SHINY APP DIR
          > mkdir ~/my-shiny-app
  16. Visualization Use Case
      • Create Shiny application
        – Create ui.R
      UI.R SOURCE
          shinyUI(pageWithSidebar(
            # Application title
            headerPanel("2010 US Births"),
            sidebarPanel(. . .),
            mainPanel(
              tabsetPanel(
                tabPanel("Line Chart", htmlOutput("lineChart")),
                tabPanel("Column Chart", htmlOutput("columnChart"))
              )
            )
          ))
  17. Visualization Use Case
      • Create Shiny application
        – Create server.R
      SERVER.R SOURCE
          library(googleVis)
          library(shiny)
          library(rmr2)
          library(rhdfs)
          hdfs.init()
          hdfs.data.root = "natality"
          hdfs.data = file.path(hdfs.data.root, "out")
          df = as.data.frame(from.dfs(hdfs.data))
          ...
  18. Visualization Use Case
      • Create Shiny application
        – Create server.R
      SERVER.R SOURCE
          ...
          shinyServer(function(input, output) {
            output$lineChart <- renderGvis({
              gvisLineChart(df, options=list(
                vAxis="{title:'Number of Births'}",
                hAxis="{title:'Age of Mother'}",
                legend="none"
              ))
            })
          ...
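The axis settings passed to googleVis are strings holding JavaScript-style object literals, so the title text needs its own quotes inside the braces. The sketch below builds such an options list outside of Shiny and only creates the chart if googleVis is installed. The data frame contents are made up, and converting the key to character is an assumption made here so that it is treated as a category label; the slide passes its data frame unchanged.

```r
# Made-up counts standing in for the job output; 'key' is converted to
# character so it is treated as a category label on the x-axis.
df <- data.frame(key = as.character(c(19, 24, 31)),
                 val = c(1, 3, 2),
                 stringsAsFactors = FALSE)

# Chart options are passed as strings; the axis titles are quoted
# inside the braces.
chart.options <- list(
  vAxis  = "{title:'Number of Births'}",
  hAxis  = "{title:'Age of Mother'}",
  legend = "none"
)

# Only build the chart when googleVis is installed.
if (requireNamespace("googleVis", quietly = TRUE)) {
  chart <- googleVis::gvisLineChart(df, xvar = "key", yvar = "val",
                                    options = chart.options)
  # plot(chart) would open the rendered chart in a browser.
}
```

Building the chart object on its own like this makes it possible to inspect the output before wiring it into `renderGvis`.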
  19. Visualization Use Case
      • Run Shiny application
      RUN SHINY APP
          > shiny::runApp("~/my-shiny-app")
          Loading required package: shiny
          Welcome to googleVis version 0.4.0
          ...
          HADOOP_CMD=/usr/bin/hadoop
          Be sure to run hdfs.init()
          Listening on port 8100
  20. Visualization Use Case
      • View Shiny application
  21. Demo Live: Using Hadoop, R, and Google Chart Tools
  22. Visualization Use Case
      • Architecture recap
        – Analyze data sets with R on Hadoop
        – Choose RHadoop packages
        – Visualize data with Google Chart Tools via googleVis package
        – Render googleVis output in Shiny applications
      • Architecture next steps
        – Integrate Shiny application into existing web apps
        – Create further data models with R
  23. HDP: Enterprise Hadoop Distribution
      Hortonworks Data Platform (HDP): Enterprise Hadoop
      • The ONLY 100% open source and complete distribution
      • Enterprise grade, proven and tested at scale
      • Ecosystem endorsed to ensure interoperability
      [Diagram: HDP stack with Operational Services, Data Services, Hadoop Core and Platform Services layers; deployment targets: OS, Cloud, VM, Appliance.]
  24. HDP Sandbox
  25. Thank You!
      Jeff Markham, Solution Engineer, jmarkham@hortonworks.com
