Microsoft R Server for Data Science

Microsoft R Server for Data Science
by Krit Kamtuo
Data Science Thailand Meetup #9
Open Source tools for Data Science

Published in: Data & Analytics

  1. Data Science Team: Data Engineering, Data Science, Application Development, Business Acumen, Data Management, Data Dividend
  2. Typical advanced analytics lifecycle. Stages: Ingest, Transform, Explore, Model, Deploy, Score, Visualize, Measure. Phases: Preparation, Modeling, Operationalization.
  3. Data scientists should be creating / testing models. Data scientists are rare and expensive. [Lifecycle diagram as on slide 2]
  4. But the reality is different … Data scientist focus time. Phases: Preparation, Modeling, Operationalization. Percentages shown: 80%, 5%, 15%. [Lifecycle diagram as on slide 2]
  5. Decisions, Operationalize, Preparation, Model
  6. Goals: Broaden the Talent Pool, Increase Productivity, Modernize Infrastructure, Maximize Innovation, Drive Down TCO
     • Embrace Open Source
     • Evolutionary Path to Cloud
     • Democratize Data Science
     • Skill Re-Use
     • Transparent Scaling
     • Facilitate Collaboration
     • Decouple Data Science from Platforms
     • Leverage Hybrid Cloud Architecture
     • Accelerate Experimentation
     • Streamline Deployment
  7. From Data To Action: Data, Intelligence, Action. Diagram labels: People, Data Sources, Apps, Sensors and devices, On Premises, Automated Systems, Microsoft R Server & SQL R Services, Apps, Cortana Intelligence
  8. Challenges posed by open source R: Lack of Commercial Support, Inadequate Modeling Performance, Complex Deployment Processes, Limited Data Scale
  9. R from Microsoft brings: peace of mind, efficiency, speed and scalability, flexibility and agility
  10. R Server Technology: High-performance, Scalable R. Linux, Windows, Hadoop & Teradata.
  11. Open (community): Revolution R Open, now R Open. Commercial: Revolution R Enterprise, now R Server.
  12. R Server:
      • Escapes R’s traditional memory limits
      • Scales predictive modeling using parallelization
      • Distributes computation across cores & nodes
      • Minimizes data movement using in-database, in-MapReduce and in-Apache Spark execution
  13. • Remote Execution • Transparent Parallelization • Shared Resource Management. Diagram labels: Data Nodes, Corporate Applications, Desktops & Servers, direct, web services, Microsoft R Server, Hadoop
  14. Distributed R: How Does Remote Compute Context Work?
      Microsoft R Server functions:
      • A compute context defines where processing takes place.
      • Example: a remote context such as Hadoop MapReduce.
      • Microsoft R functions are prefixed with rx.
      • The currently set compute context determines the processing location.
      Diagram: a Microsoft R Server “Client” (console, R IDE or command line) “packs and ships” requests to a Microsoft R Server “Server” in the remote context and receives the results. In the remote environment, an algorithm master distributes work and compiles results, while the predictive algorithm loads the big data one block at a time and analyzes blocks in parallel.
      Copyright Microsoft Corporation. All rights reserved.
  15. ScaleR models can be deployed from a server or edge node to run in Hadoop without any functional R model re-coding for map-reduce. The slide shows two alternative compute context scripts and one functional model script.

      Compute context R script (sets where the model will run), local parallel processing on Linux or Windows:

      ```r
      ### SETUP LOCAL ENVIRONMENT VARIABLES ###
      myLocalCC <- "localpar"

      ### LOCAL COMPUTE CONTEXT ###
      rxSetComputeContext(myLocalCC)

      ### CREATE LINUX, DIRECTORY AND FILE OBJECTS ###
      localFS <- RxNativeFileSystem()
      AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf", fileSystem = localFS)
      ```

      Compute context R script, in Hadoop:

      ```r
      ### SETUP HADOOP ENVIRONMENT VARIABLES ###
      myHadoopCC <- RxHadoopMR()

      ### HADOOP COMPUTE CONTEXT ###
      rxSetComputeContext(myHadoopCC)

      ### CREATE HDFS, DIRECTORY AND FILE OBJECTS ###
      hdfsFS <- RxHdfsFileSystem()
      hdfsFS  # print the file system object
      # The slide does not show AirlineDataSet being defined on HDFS; it would
      # need to point at a data source on hdfsFS before the analysis is run.
      ```

      Functional model R script (does not need to change to run in Hadoop):

      ```r
      ### ANALYTICAL PROCESSING ###
      ### Statistical summary of the data
      rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)

      ### Cross-tabulate the data
      rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)

      ### Linear model and plot
      hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
      plot(hdfsXdfArrLateLinMod$coefficients)
      ```
  16. DeployR: integrates R into application infrastructures
      • Web services software development kit for integrating analytics via APIs: Java, JavaScript, .NET
      Capabilities:
      • Enterprise authentication & security
      • Horizontal scaling
      • Invokes R scripts from web services calls
      • RESTful interface for easy integration
      • Works with: web & mobile apps; leading BI & visualization tools; business rules and streaming engines
      Diagram labels: DeployR, DevelopR
  17. On-demand sales forecasting; real-time social media analysis; leveraging the power of Office365
  18. Microsoft R Server provides a unique opportunity to deliver advanced analytics capabilities to customers who have already invested in storing their data on non-Microsoft platforms such as Hadoop, Teradata and Linux. Hadoop: Cloudera CDH, Hortonworks HDP, and HDInsight.
  19. Write Once – Deploy Anywhere. R Server portfolio: Cloud, RDBMS, Desktops & Servers, Hadoop & Spark, EDW. R Server Technology.
  20. Included in SQL Server 2016; reuse and optimize existing R code; eliminate data movement; in-database deployment; memory and disk scalability; no R memory limits; write once, deploy anywhere; enterprise speed and scale; near-DB analytics; parallel threading and processing; reuse SQL skills for data engineering; cost effectiveness; scalability and choice; simplicity and agility
  21. Summary:
      • The industry’s broadest R-based platform
      • Enterprise scale atop Spark, Hadoop, RDBMSs & EDWs
      • Freedom from memory limits
      • Choice of Windows and Linux IDEs
      • Stable deployment
      • Write-once-deploy-anywhere portability
      • Investment protection
      • Hybrid cloud evolution
  22. Introduces the following topics:
      1. Creating an R Server on Spark HDInsight cluster
      2. Installing RStudio for the cluster
      3. Running R using RStudio on the web
      Reference: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-get-started/
  23. Get Essentials
      • Microsoft Developer Resources and R Server Developer Edition: aka.ms/ch9.th
      • Microsoft R Server on-premises: www.microsoft.com/R-Server
      • Microsoft R Server on Azure (Cloud): https://azure.microsoft.com/en-us/marketplace/partners/microsoft-r-products/microsoft-r-server/
  24. What is R? Headings on the slide: Language, Platform, Community, Ecosystem
      • A statistics programming language
      • A data visualization tool
      • Open source
      • 2.5+M users
      • Taught in most universities
      • Thriving user groups worldwide
      • 7000+ free algorithms in CRAN
      • Scalable to big data
      • New and recent grads use it
      • Rich application & platform integration
  25. Convergence with Flexibility: Scalable Algorithms; R: Write Once Deploy Anywhere; Templates & Samples; Microsoft R Server Family; R & Python to AML Interop.; Cortana Intelligence
  26. Code Portability Across Platforms (DistributedR, ScaleR, ConnectR, DevelopR). In the Cloud: Azure HDI/Spark. Workstations & Servers: Linux, Windows. Clustered Systems: Linux Clusters (LSF For Now), Microsoft HPC. EDW: Teradata. Hadoop: Hortonworks, Cloudera, MapR & HDInsight.
  27. DistributedR: Delivers High Performance Parallel Distributed Analytics Across Individual and Clustered Systems. Components shown: R+CRAN, MicrosoftR, DistributedR, DeployR, DevelopR, ScaleR, ConnectR.
      • Cloudera
      • Hortonworks
      • MapR
      • Apache Spark
      • IBM Platform LSF
      • Microsoft HPC Clusters
      • Teradata Database
      • Red Hat
      • SuSE Servers
      • Windows
  28. RevoDeployR Web Services. Client applications: Desktop Applications (e.g. Excel), Business Intelligence (PowerBI), Interactive Web or Mobile Applications. Client libraries (JavaScript, Java, .NET) communicate over HTTP/HTTPS with JSON/XML. Server functions: Session Management, Authentication, Data/Script Management, Administration. Grid nodes run R and R scripts. Roles: End User, Application Developer, Admin, Data Scientist.
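
Slide 14 describes block-wise processing ("Load Block At A Time", "Analyze Blocks In Parallel", "Distribute Work, Compile Results"). The idea can be sketched in base R for a simple linear model. This is only an illustration under stated assumptions, not how Microsoft R Server is implemented: the function names `block_stats`, `combine_stats` and `chunked_lm` are hypothetical, the data is simulated, and the blocks are processed sequentially with `lapply()` rather than on a cluster.

```r
# Illustrative sketch in base R only; NOT the RevoScaleR implementation.
# All function names below are hypothetical.

# Per-block step: reduce one block of rows to small sufficient statistics.
block_stats <- function(block) {
  X <- cbind(Intercept = 1, x = block$x)   # design matrix for y ~ x
  list(xtx = crossprod(X),                 # X'X for this block
       xty = crossprod(X, block$y),        # X'y for this block
       n   = nrow(block))
}

# Combine step: sufficient statistics from different blocks simply add up.
combine_stats <- function(a, b) {
  list(xtx = a$xtx + b$xtx, xty = a$xty + b$xty, n = a$n + b$n)
}

# Driver: split the data into blocks, process each block, compile the results.
chunked_lm <- function(data, block_size) {
  block_id <- ceiling(seq_len(nrow(data)) / block_size)
  blocks   <- split(data, block_id)
  # Blocks are independent of each other, so this step could run in parallel.
  stats    <- lapply(blocks, block_stats)
  total    <- Reduce(combine_stats, stats)
  coefs    <- solve(total$xtx, total$xty)  # solve the normal equations
  list(coefficients = drop(coefs), n = total$n)
}

# Simulated data (an assumption for the demo, not the airline data set).
set.seed(42)
demo <- data.frame(x = runif(1000))
demo$y <- 3 + 2 * demo$x + rnorm(1000, sd = 0.1)

fit_chunked <- chunked_lm(demo, block_size = 100)
fit_full    <- lm(y ~ x, data = demo)

# The block-wise result should agree with the in-memory fit up to rounding.
print(fit_chunked$coefficients)
print(all.equal(unname(fit_chunked$coefficients), unname(coef(fit_full))))
```

Because each block is reduced to a small summary before anything is combined, only one block has to be held in memory at a time, which is the property the slides refer to as escaping R's memory limits.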