Taking R Analytics to SQL and the Cloud

Presentation by Andrie de Vries at SQL Relay, 6 October 2015

Published in: Technology

  1. WHO: The leading provider of advanced analytics software and services based on open source R, since 2007
     WHAT: REVOLUTION R: The enterprise-grade predictive analytics application platform based on the R language
     WHERE: “This acquisition will help customers use advanced analytics within Microsoft data platforms” -- Joseph Sirosh, CVP C+E
  2. [No text on this slide]
  3. • Situation
     • Complication
     • Critical question?
     • Answer
  4. • A high level overview of R
     • Data science in the cloud
     • Connecting R to SQL
     • Scalable R
     • R in SQL Server
     • Moving your workflow to the cloud
  5. A high level overview of R
  6. • Most widely used data analysis software
     • Most powerful statistical programming language
     • Create beautiful and unique data visualizations
     • Thriving open-source community
     • Fills the talent gap
     www.revolutionanalytics.com/what-is-r
  7. • 1993: Research project in Auckland, NZ
     • 1995: Open source
     • 1997: R-core
     • 2000: R-1.0.0
     • 2003: R Foundation
     • 2004: First UseR!
     • 2009: New York Times
     • 2015: R-3.2.0; R Consortium
     Photo credit: Robert Gentleman
  8. The New York Times
     Interactive Features:
     • Election Forecast
     • Dialect Quiz
     Data Journalism:
     • NFL Draft Picks
     • Wealth distribution in USA
  9. Data science in the Azure cloud
  10. Trends
  11. Software Revenues; New License Revenues
      http://redmonk.com/sogrady/2013/11/21/selling-software/
  12. The Azure Cloud (map legend: Operational, Announced)
      • Central US: Iowa
      • West US: California
      • North Europe: Ireland
      • East US: Virginia
      • East US 2: Virginia
      • US Gov: Virginia
      • North Central US: Illinois
      • US Gov: Iowa
      • South Central US: Texas
      • Brazil South: Sao Paulo
      • West Europe: Netherlands
      • China North *: Beijing
      • China South *: Shanghai
      • Japan East: Saitama
      • Japan West: Osaka
      • India West: TBD
      • India East: TBD
      • East Asia: Hong Kong
      • SE Asia: Singapore
      • Australia West: Melbourne
      • Australia East: Sydney
      * Operated by 21Vianet
  13. http://blog.revolutionanalytics.com/2015/06/r-build-keynote.html/
  14. Connecting R to SQL
  15. mran.revolutionanalytics.com
  16. Demo: Using ODBC to connect R to SQL
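The demo itself is not reproduced in this transcript. As a rough, language-neutral sketch of the general pattern of querying a database from an analysis session, the Python below uses the standard library's `sqlite3` module with an in-memory database as a stand-in for an ODBC connection to SQL Server. The `sales` table, its columns, and its values are hypothetical, and nothing here is taken from the talk's R code. It contrasts pulling raw rows into the session with letting the database aggregate first.

```python
import sqlite3
from statistics import mean

# Stand-in for an ODBC connection: an in-memory SQLite database.
# The table name, columns, and rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0), ("south", 15.0)],
)

# Option 1: pull the raw rows into the analysis session and aggregate there.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
by_region = {}
for region, amount in rows:
    by_region.setdefault(region, []).append(amount)
client_side = {region: mean(values) for region, values in by_region.items()}

# Option 2: let the database aggregate and fetch only the summary rows.
server_side = dict(
    conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region").fetchall()
)

conn.close()
```

Both options give the same per-region means; the second moves far less data across the connection, which matters more as the table grows.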
  17. Solving the scalability problem with R
  18. is… the big data big analytics platform based on open source R
  19. Data Step
      • Data import: Delimited, Fixed, SAS, SPSS, ODBC
      • Variable creation & transformation
      • Recode variables
      • Factor variables
      • Missing value handling
      • Sort, Merge, Split
      • Aggregate by category (means, sums)
      Descriptive Statistics
      • Min / Max, Mean, Median (approx.)
      • Quantiles (approx.)
      • Standard Deviation
      • Variance
      • Correlation
      • Covariance
      • Sum of Squares (cross product matrix for set variables)
      • Pairwise Cross tabs
      • Risk Ratio & Odds Ratio
      • Cross-Tabulation of Data (standard tables & long form)
      • Marginal Summaries of Cross Tabulations
      Statistical Tests
      • Chi Square Test
      • Kendall Rank Correlation
      • Fisher’s Exact Test
      • Student’s t-Test
      Sampling
      • Subsample (observations & variables)
      • Random Sampling
      Predictive Models
      • Sum of Squares (cross product matrix for set variables)
      • Multiple Linear Regression
      • Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user defined distributions & link functions
      • Covariance & Correlation Matrices
      • Logistic Regression
      • Classification & Regression Trees
      • Predictions/scoring for models
      • Residuals for all models
      Cluster Analysis
      • K-Means
      Classification
      • Decision Trees
      • Decision Forests
      • Stochastic Gradient Boosted Decision Trees
      Variable Selection
      • Stepwise Regression: Linear, Logistic and GLM
      Simulation
      • Monte Carlo
      • Parallel Random Number Generation
      Combination
      • Using Revolution rxDataStep and rxExec functions to combine open source R with Revolution R
      • PEMA API
  20. Demo: Using RRE to solve the scalability problem
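The capability list above includes mean, variance, and a PEMA API, but the transcript does not show how any of them are computed. One general way to make such statistics scale, sketched here in Python rather than R, is to summarise each chunk of data independently and then merge the summaries, so the full dataset never has to sit in memory at once. This is an illustration of the chunk-and-merge idea only, not RevoScaleR code; the names `summarize_chunk`, `merge`, and `chunked_mean_variance` are hypothetical, and the merge step uses the standard pairwise update for count, mean, and sum of squared deviations.

```python
from typing import Iterable, List, Tuple

# A chunk summary: (count, mean, sum of squared deviations from the mean).
Summary = Tuple[int, float, float]


def summarize_chunk(chunk: List[float]) -> Summary:
    """Summarise one chunk that fits in memory."""
    n = len(chunk)
    if n == 0:
        return (0, 0.0, 0.0)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return (n, mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries without revisiting the underlying data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def chunked_mean_variance(chunks: Iterable[List[float]]) -> Tuple[float, float]:
    """Return (mean, sample variance) over all chunks, one chunk at a time."""
    total: Summary = (0, 0.0, 0.0)
    for chunk in chunks:
        total = merge(total, summarize_chunk(chunk))
    n, mean, m2 = total
    if n < 2:
        raise ValueError("need at least two observations")
    return mean, m2 / (n - 1)


# Hypothetical data split into three chunks, as if read from disk in pieces.
chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
mean, variance = chunked_mean_variance(chunks)
```

Because `merge` is associative, the per-chunk summaries could also be computed on separate workers and combined afterwards, which is the property that makes this style of algorithm parallelisable.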
  21. R in SQL Server
  22. SQL Server 2016: R Integration
      • Data Scientist: Interact directly with data
      • Data Developer/DBA: Manage data and analytics together
      • Built-in to SQL Server
      • Example Solutions: Fraud detection; Sales forecasting; Warehouse efficiency; Predictive maintenance
      • Diagram labels: Relational Data; Analytic Library; T-SQL Interface; Extensibility; New R scripts; Microsoft Azure Machine Learning Marketplace
  23. Run R script:
      • Use your preferred R IDE
      • Set compute context to SQL Server
      • Use RevoScaleR rx functions
      Create SQL query:
      • Create stored procedure
      • Execute directly in SSMS query
  24. Demo: Using RRE directly in SQL Server
  25. Demo: Running R inside a SQL stored procedure
  26. [No text on this slide]
  27. Moving your workflow to the cloud
  28. Model:
      • Model in cloud
      • Model in SQL Server using Revolution R
      • Model on a sample of data
      Score:
      • Score in cloud
      • Score in SQL Server
      • Score using R
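The slide above pairs modelling options with scoring options without showing code. As a minimal sketch of one such pairing (fit on a sample pulled to the client, then score every row inside the database), the Python below again uses `sqlite3` as a stand-in for SQL Server. Registering a Python function with `create_function` is only a loose analogy for running analytic code next to the data and is not how SQL Server's R integration works. The `readings` and `scores` tables, the synthetic data, and the one-variable least-squares model are all hypothetical.

```python
import sqlite3

# Stand-in database with hypothetical, perfectly linear data: y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO readings (x, y) VALUES (?, ?)",
    [(float(i), 2.0 * i + 1.0) for i in range(100)],
)

# Step 1: model on a sample of data pulled into the client session.
sample = conn.execute("SELECT x, y FROM readings WHERE id % 10 = 0").fetchall()
n = len(sample)
mean_x = sum(x for x, _ in sample) / n
mean_y = sum(y for _, y in sample) / n
sxx = sum((x - mean_x) ** 2 for x, _ in sample)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


# Step 2: score inside the database by exposing the fitted model as a SQL function.
def predict(x):
    return intercept + slope * x


conn.create_function("predict", 1, predict)
conn.execute("CREATE TABLE scores AS SELECT id, predict(x) AS y_hat FROM readings")

# Check the scores against the known values without pulling the rows out.
scored = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
max_error = conn.execute(
    "SELECT MAX(ABS(s.y_hat - r.y)) FROM scores s JOIN readings r ON r.id = s.id"
).fetchone()[0]
conn.close()
```

Only the ten sampled rows leave the database for fitting; all 100 rows are scored where they are stored.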
  29. Andrie de Vries, Senior Programmer Manager, R Community Projects
      @RevoAndrie
      adevries@microsoft.com
