
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on Teradata

8,516 views

Published on

[Presentation by Skylar Lyon at DataWeek 2014, September 17 2014.]

I recently faced the task of scaling out an existing analytics process. The schedule was compressed - it always is in my world. The data was big - 400+ million rows waiting in the database. What did I do? I offered my favorite type of solution - quick and dirty.

At the outset, I wasn't sure how easy it would be. Nor was I certain of realized performance gains. But the concept seemed sound and the exercise fun. Let's move the compute to the data via Revolution R Enterprise for Teradata.

This presentation outlines my approach in leveraging a colleague's R models as I experimented with running R in-database. Would my path lead to significant improvement? Could it be used to productionalize the workflow?

Published in: Technology


  1. Rapid Productionalization of Predictive Models
     In-database Modeling with Revolution Analytics on Teradata
     Skylar Lyon, Accenture Analytics
  2. Introduction
     Skylar Lyon, Accenture Analytics
     • 7 years of experience with a focus on big data and predictive analytics, using discrete choice modeling, random forest classification, ensemble modeling, and clustering
     • Technology experience includes Hadoop, Accumulo, PostgreSQL, qGIS, JBoss, Tomcat, R, GeoMesa, and more
     • Worked from Army installations across the nation, and traveled twice to Baghdad to deploy solutions downrange
     Copyright © 2014 Accenture. All rights reserved. 2
  3. How we got here
     Project background and my involvement
     • New Customer Analytics team for a Silicon Valley Internet eCommerce giant
     • Data scientists developing predictive models
     • Deferred focus on productionalization
     • Joined as Big Data Infrastructure and Analytics Lead
  4. Colleague's CRAN R model
     Binomial logistic regression
     • 50+ independent variables, including categoricals with indicator variables
     • Train from a small sample (many thousands) – not a problem in and of itself
     • Score across the entire corpus (many hundred millions) – slightly more challenging
  5. We optimized the current productionalization process
     We moved compute to data
     [Before/after architecture diagram]
     Reduced a 5+ hour process to 40 seconds
  6. Benchmarking our optimized process
     5+ hours to 40 seconds: recommendation is that this now become the de facto productionalization process
     [Benchmark chart: rows vs. minutes]
  7. Optimization process
     Recode CRAN R to Rx R

     Before:
     trainit <- glm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxit = iters)
     fits <- predict(trainit, newdata = test.data, type = 'response')

     After:
     trainit <- rxGlm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxIterations = iters)
     fits <- rxPredict(trainit, newdata = test.data, type = 'response')
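     The recode above only swaps function names; the actual move of compute to data happens by pointing RevoScaleR at the appliance via a Teradata compute context before calling rxGlm/rxPredict. A minimal sketch of how that setup might look, assuming Revolution R Enterprise 7.1 with the RevoScaleR package - the connection string, table names, and paths are placeholders, not details from the deck:

     ```r
     # Sketch only: requires Revolution R Enterprise with RevoScaleR and a
     # reachable Teradata appliance. All connection details are placeholders.
     library(RevoScaleR)

     tdConnString <- "DRIVER=Teradata;DBCNAME=<appliance-host>;UID=<user>;PWD=<password>"

     # Compute context: tells RevoScaleR to run rx* functions in-database
     tdCompute <- RxInTeradata(connectionString = tdConnString,
                               shareDir       = "/tmp/revoShare",   # local scratch dir (placeholder)
                               remoteShareDir = "/tmp/revoShare",   # scratch dir on the appliance (placeholder)
                               revoPath       = "/usr/lib64/Revo-7.1/R-3.0.2/lib64/R")  # R install on the nodes (placeholder)
     rxSetComputeContext(tdCompute)

     # Data sources: score the full table in-database, write results to a new table
     scoreData <- RxTeradata(table = "customer_features", connectionString = tdConnString)
     scoredOut <- RxTeradata(table = "customer_scores",   connectionString = tdConnString)
     rxPredict(trainit, data = scoreData, outData = scoredOut, type = "response")
     ```

     With the compute context set, the same rxGlm/rxPredict calls shown above execute on the Teradata nodes instead of pulling 400+ million rows back to the client, which is what accounts for the 5+ hours to 40 seconds difference the deck reports.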
  8. Additional benefits to new process
     Technology is increasing the data science team's options and opportunities
     • Train in-database on a much larger set – reduces the need to sample
     • Nearly "native" R language – decreases deploy time
     • Hadoop support – score in multiple data warehouses
  9. Appendix
     Table of Contents
     • Technical Considerations
  10. Technical considerations
      Environment setup
      • Teradata environment – 4-node, 1700-series appliance server
      • Revolution R Enterprise – version 7.1, running R 3.0.2
