
Shortening the time from analysis to deployment with ML-as-a-Service, by Luiz Andrade and Gabriel De Bodt Sivieri (TEVEC Sistemas SA), at PAPIs Connect, São Paulo 2017

400 views

A Data Scientist's daily job spans a variety of tasks, from improving model performance to dealing with framework structure implementations. Machine learning as a service, a hot topic in the field, implies thinking about an architecture that allows constant improvements in the performance of our products. This presentation shows one architecture design that uses RESTful resources, document-oriented databases and pre-trained pipelines to achieve real-time predictions of time series with high availability and scalability, and that frees Data Scientists to work directly on improving the accuracy of our products. We fine-tuned it for time series forecasting, a very challenging field that still needs better solutions in terms of innovative modeling. The presentation shows how these decisions keep our Data Scientists focused on working with real data and on improvements that can reach a large volume of time series, instead of singular and localized actions.
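The idea of a pipeline that is stored as a document and then applied to a time series can be illustrated with a minimal Python sketch. This is not TEVEC's framework: the `Node` and `Pipeline` classes, the two node functions and the document layout are all hypothetical names chosen for illustration, and the "model" is just a three-point moving average.

```python
from dataclasses import dataclass
from typing import Callable, List

Series = List[float]


@dataclass
class Node:
    """One pipeline step: a name plus a function from series to series (hypothetical)."""
    name: str
    transform: Callable[[Series], Series]


@dataclass
class Pipeline:
    """Ordered nodes; the output of the last node is treated as the forecast."""
    name: str
    nodes: List[Node]

    def run(self, series: Series) -> Series:
        # Feed the output of each node into the next one.
        for node in self.nodes:
            series = node.transform(series)
        return series

    def to_document(self) -> dict:
        # JSON-like description that a document-oriented database could store.
        return {"name": self.name, "nodes": [node.name for node in self.nodes]}


def last_three(series: Series) -> Series:
    """Keep only the three most recent observations."""
    return series[-3:]


def mean_forecast(series: Series) -> Series:
    """One-step-ahead forecast: the mean of the values received."""
    return [sum(series) / len(series)]


pipeline = Pipeline(
    name="moving_average_3",
    nodes=[Node("last_three", last_three), Node("mean_forecast", mean_forecast)],
)
forecast = pipeline.run([10.0, 12.0, 11.0, 13.0, 15.0])
document = pipeline.to_document()
print(forecast, document)
```

In a service of the kind described, a REST handler would presumably load such a document, rebuild the pipeline and call `run` on the incoming series; that part is left out here because the source does not describe the API.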

Published in: Technology

  1. Shortening the time from analysis to deployment with ML-as-a-Service. TEVEC Systems. Luiz Augusto Canito Gallego de Andrade, Gabriel De Bodt Sivieri.
  2. Time Series Forecasting. Brazil’s GDP; Industrial Capacity; Sales. What will sales be like in the coming periods?
  3. Time Series Forecasting. Some strategies to deal with the problem.
  4. Time Series Forecasting. Some strategies to deal with the problem: embedding strategy; feature engineering strategy.
  5. API Customer Story (Customer Side / Consulting Side). (1) The customer needs insights about its data and wants to build value upon its database. (2) Data Science teams come onto the scene to crunch data and deliver powerful models and insights about customer data. (3) The customer is thrilled with the results and eagerly wants to deploy this newly acquired knowledge in its business processes. (4) What are the requirements?
  6. API Customer Story. API service level; cloud standards; improved accuracy over time; fresh insights to increase value; code standards and release workflow; new variables from public sources.
  7. API Customer Story. API service level; cloud standards; improved accuracy over time; fresh insights to increase value; code standards and release workflow; new variables from public sources. Some objectives/requirements are extremely software related.
  8. API Customer Story. API service level; cloud standards; improved accuracy over time; fresh insights to increase value; code standards and release workflow; new variables from public sources. Others are Data Science related.
  9. Machine learning as a Service: focus-group strategies. Focus Group 1, Focus Group 2, Focus Group 3, Focus Group 4. Collaboration is hard; problems are solved locally; problem oriented; there is no long-term strategy.
  10. Machine learning as a Service: product-oriented strategy. Limited API problem range; software problems become the focus; “distance from data”; “one size fits all”. Software engineering, Customer service, Data Science, User Experience.
  11. Our view of the matter: experimentation framework; commonly used frameworks and APIs; Model 1, Model 2, Model 3, Model 4; pipelines; document-based database; trained models; production structure; REST; continuous Data Science.
  12. What’s a pipeline? Nodes (five shown) and a Target. By combining effective software architecture and state-of-the-art ML and DS tools, we are able to quickly test and deploy fresh pipelines for different problems.
  13. Experimenting (Agile Data Science). Steps: run the accuracy report (ML engineering); subsample datasets to focus on an improvement (Data Science); design new models in small/medium-scale testing (Data Science). Focus on business metrics (MAPE, ROC), with secondary use of “math” metrics such as RMSE or LogLoss. Accuracy is reported based on production forecasts versus updated information. Cluster accuracy by dataset theme or key statistical metrics. Use of TEVEC’s pipelining framework for quick model design. Prototype using small-scale testing in a console application (JupyterHub).
  14. Experimenting (Agile Data Science). Steps: run the accuracy report (ML engineering); subsample datasets to focus on an improvement (Data Science); design new models in small/medium-scale testing (Data Science); large-scale testing on the production framework using production data (Data Science/ML engineering); push pipelines to production and monitor operations (ML engineering); analyze the accuracy report and decide whether to push to production (business decision). The experimenting structure is an actual document in TEVEC’s ODM data structure. An experiment connects with pipelines and applies them to a sequence of datasets. A/B testing compares performance in the same format as the accuracy report. Business has business-like inputs to decide and to communicate expected results to the customer. The new pipeline was validated throughout the whole experiment, so it is safe to push to production.
  15. Experimenting (Agile Data Science). Steps: run the accuracy report (ML engineering); subsample datasets to focus on an improvement (Data Science); design new models in small/medium-scale testing (Data Science); large-scale testing on the production framework using production data (Data Science/ML engineering); push pipelines to production and monitor operations (ML engineering); analyze the accuracy report and decide whether to push to production (business decision). We try to repeat the cycle every week.
  16. Experimenting (Agile Data Science). Large-scale experimenting is an inherent part of the system.
  17. Conclusions. We achieved process stability once we separated our Data Science team from the production software ecosystem. Through collaboration between the Data Science team and ML engineers, we were able to design a continuous experimentation process. Caring about standards and interfaces at the experimentation stage saves time in deployment, and also reduces the risk of unexpected errors in production. The pipeline structure uses state-of-the-art packages and frameworks while enforcing interfaces and software architecture, not coding standards, which saves time to focus on Data Science. We are still learning from this new “continuous” DS process, but so far we have had excellent results in growing the team and incrementally improving our software.
  18. Luiz Augusto Canito Gallego de Andrade, +55 (11) 9 7163-2619, luiz.andrade@tevec.com.br. Gabriel Sivieri, +55 (11) 9 7191-3783, gabriel.sivieri@tevec.com.br.
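
The accuracy report and A/B comparison mentioned in slides 13 and 14 could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the function names, the report layout and the sample numbers are invented here, and the slides do not say how TEVEC computes or compares its metrics. MAPE is used as the business metric and RMSE as the secondary "math" metric, as the slides suggest.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def accuracy_report(actual, forecasts_by_pipeline):
    """Score each pipeline's forecasts against the updated actual values."""
    return {
        name: {"mape": mape(actual, forecast), "rmse": rmse(actual, forecast)}
        for name, forecast in forecasts_by_pipeline.items()
    }


# Hypothetical data: realized values and the forecasts of two pipelines.
actual = [100.0, 200.0, 400.0]
report = accuracy_report(
    actual,
    {
        "production": [110.0, 180.0, 400.0],
        "candidate": [100.0, 190.0, 380.0],
    },
)

# A/B decision on the business metric: the pipeline with the lowest MAPE wins.
best = min(report, key=lambda name: report[name]["mape"])
print(best, report)
```

In this made-up example both pipelines have the same RMSE while the candidate has the lower MAPE, which shows why the choice of primary metric can change the decision to push a pipeline to production.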
