Microservices, serverless and real-time: Building blocks of the modern data pipeline
1. Manisha Sule | @tweetDataS
Microservices, serverless and real-time: Building blocks for modern data pipelines
#GHC18
2. PAGE 2 | GRACE HOPPER CELEBRATION 2018
PRESENTED BY ANITAB.ORG AND THE ASSOCIATION FOR COMPUTING MACHINERY
Outline
• Building data pipelines
• Whats, Whys and Hows of microservices
• Whats, Whys and Hows of serverless
• Whats, Whys and Hows of real-time
• Summary
3. Building Data Pipelines
"You can have data without information, but you cannot have information without data."
- Daniel Keys Moran, computer programmer and science fiction author
4. 3 steps of a data pipeline
Data: Collect input data
• Streaming, IoT, legacy systems, social media, apps
Process: Process and analyze
• Clean, join, standardize, correct
• SQL, statistics, ML, deep learning
Insights: Output or embed analysis
• Embed insights into apps, make business decisions and measure effectiveness
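The three steps can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the record fields and the hard-coded sample records are assumptions made for the example, standing in for real sources such as streams, IoT devices or legacy systems.

```python
from statistics import mean


def collect():
    """Step 1 (Data): gather raw records. Hard-coded stand-ins for real sources."""
    return [
        {"source": "app", "customer": " Alice ", "amount": "12.50"},
        {"source": "iot", "customer": "bob", "amount": "7.25"},
        {"source": "legacy", "customer": "ALICE", "amount": None},  # dirty record
    ]


def process(records):
    """Step 2 (Process): clean and standardize, then compute a simple statistic."""
    cleaned = [
        {"customer": r["customer"].strip().lower(), "amount": float(r["amount"])}
        for r in records
        if r["amount"] is not None  # drop records with a missing amount
    ]
    return {"rows": len(cleaned), "mean_amount": mean(r["amount"] for r in cleaned)}


def output(insight):
    """Step 3 (Insights): format the result for an app or a business report."""
    return f"{insight['rows']} valid rows, mean amount {insight['mean_amount']:.2f}"


if __name__ == "__main__":
    print(output(process(collect())))
```

Each step only depends on the return value of the previous one, which is what later makes it possible to split the steps into separate services.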
5. Players building a data pipeline
Data Engineer
• Collect data and move it to storage.
• Prepare data as part of ETL or ELT processes.
• Stitch different data processing steps together.
• Create frameworks and microservices to serve data.
• Ensure data is ready to use.
Data Scientist
• Explore and examine data to find patterns and structure.
• Prepare data for predictive modelling.
• Apply statistical techniques, machine learning and deep learning to model data.
• Deliver concrete insights to be consumed by applications or the business.
Data Analyst
• Explore and examine data to find patterns and structure.
• Apply SQL and statistical analysis to extract trends and patterns in data.
• Prepare reports and dashboards as per the business needs.
• Interact with the business to manage its data analysis needs.
Data Architect
• Responsible for designing, creating and deploying an organization's data architecture.
• Establishes the required infrastructure to support data projects.
• Owner of data governance and data security.
• Monitors and maintains data quality.
6. Challenges between players
Data Engineer
• Who owns cleaning of data? Data engineer or data scientist?
• Productionizing poorly written predictive models.
• Maintaining end-to-end pipelines in production.
Data Scientist
• Who owns cleaning of data? Data engineer or data scientist?
• Predictive model is created, but not in production.
• Access issues to data stores and data infrastructure.
Data Analyst
• Data is dirty and unprocessed.
• Data infrastructure failures.
Data Architect
• Infrastructure is used incorrectly.
• Heavy load on infrastructure due to poorly written code.
Solution: "You build it, you own it" model. Give autonomy and ownership of services all the way to production.
8. The microservices architecture
• Breaking down one large monolithic application into smaller independent services.
• Each service can be independently executed on demand.
• Each service can be consumed via a RESTful API with pre-defined contracts.
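As a rough illustration of a "pre-defined contract", the sketch below models the request and response of a hypothetical data-cleaning service as dataclasses. The names `CleanRequest`, `CleanResponse` and `handle_clean`, and the fields they carry, are assumptions for the example; the HTTP transport is left out so that only the contract check is shown.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class CleanRequest:
    """Input contract: the fields a caller must send (hypothetical)."""
    customer: str
    amount: str


@dataclass
class CleanResponse:
    """Output contract: the fields the service promises to return (hypothetical)."""
    customer: str
    amount: float


def handle_clean(body: str) -> Tuple[int, str]:
    """Handle one request body; return an HTTP-style status code and a JSON payload."""
    try:
        request = CleanRequest(**json.loads(body))  # rejects missing or extra fields
        response = CleanResponse(
            customer=request.customer.strip().lower(),
            amount=float(request.amount),
        )
    except (TypeError, ValueError, AttributeError) as exc:
        # Any payload that violates the contract is answered with a 400.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps(asdict(response))


if __name__ == "__main__":
    print(handle_clean('{"customer": " Alice ", "amount": "12.50"}'))
    print(handle_clean('{"customer": "bob"}'))
```

Because callers depend only on the two contract classes, the cleaning logic inside `handle_clean` can change or be redeployed without touching other services, as long as the contract stays the same.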
9. Microservices in the data analytics world
• Data Collector Service
• Data Cleaner Service
• Data Enhancer Service
• Training Predictor Service
• Prediction Service
• Customer 360 degree View Builder
10. Architecting Microservices
[Diagram: Data Collector Service, Data Cleaner Service, Data Enhancer Service, Training Predictor Service and Prediction Service, with four data stores; labels: Prediction; Applications, Social media, Marketing tools, Sales tools etc.]
11. Advantages of microservices
1. Allows development in parallel
2. Allows agile development; easier to upgrade features and redeploy
3. Diverse tech stacks
4. Better failure detection
5. Enhanced CI/CD: allows easier pushes to a code base that is spread out
6. "Build it, own it" principle adds autonomy and ownership all the way into production
12. Disadvantages of microservices
1. Operations overhead; maintenance can be cumbersome when 20 services explode to 100+
2. DevOps is a must; needs a good understanding of how to deploy, run, optimize and support services
3. A change in communication contracts can affect all other services
4. Code duplication
14. What is serverless?
• Cloud computing execution model.
• Infrastructure is managed by the cloud provider.
• Developer focuses on application logic only.
• Event-driven, stateless applications.
Image source: https://www.slideshare.net/CodeOps/serverless-architecture-a-gentle-overview?qid=aecf8d27-8b16-4da5-987f-600fe1cb0655&v=&b=&from_search=5
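A minimal sketch of what "event-driven, stateless" means in code: the platform calls a handler once per event, and the result depends only on that event. The `handler(event, context)` shape follows the convention used by AWS Lambda's Python runtime; the event fields (`order_id`, `items`, `price`, `quantity`) are made up for this example.

```python
def handler(event, context=None):
    """Compute an order total from the incoming event alone.

    No module-level state is read or written, so any number of copies of
    this function can run in parallel and be started or stopped at will.
    """
    items = event.get("items", [])
    total = sum(item["price"] * item["quantity"] for item in items)
    return {"order_id": event.get("order_id"), "total": total}


if __name__ == "__main__":
    sample_event = {
        "order_id": "o-1",
        "items": [
            {"price": 2.5, "quantity": 2},
            {"price": 1.0, "quantity": 3},
        ],
    }
    print(handler(sample_event))
```

Anything that must survive between invocations (for example the order history) would live in an external store rather than in the function itself.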
15. Advantages of serverless
1. Fully managed; the cloud provider manages the servers.
2. Highly available, scalable, no provisioning needed and zero administration.
3. Not just compute containers; also includes NoSQL databases, interactive query services, storage services and messaging services.
4. Cost efficient; you never pay for idle time.
5. Support for continuous integration/continuous delivery pipelines.
6. Developers can focus on architecture and code only.
7. Several use cases: utility logic, scheduled processing, event-driven architecture, microservices, full-blown applications.
16. Disadvantages of serverless
1. Vendor lock-in
2. Lack of visibility into granular architectures
3. Not for stateful applications
4. Limited by an execution timeout (vendor specific)
17. Serverless cloud providers
Source: https://www.slideshare.net/loige/building-a-serverless-company-with-nodejs-react-and-the-serverless-framework-jsday-2017-verona
18. AWS serverless services
S3: Object storage
• Secure, durable, highly scalable object storage.
• Access by web interface or programmatic API.
• Object upload/delete triggers Lambda compute.
Lambda: Compute environment
• Run event-driven, stateless code without provisioning servers.
• Pay only for the compute time; eliminates idle-time cost.
• Scales automatically.
• Supports Python, Java, Node.js, C#.
DynamoDB: NoSQL database
• Fully managed NoSQL database.
• Supports both key-value and document stores.
• Scales consistently.
• Latency in single-digit milliseconds.
API Gateway: Networking service
• Lets you create, maintain and deploy API services.
• Handles hundreds of thousands of API requests concurrently.
• Provides secure authentication and authorization for APIs.
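The "object upload triggers Lambda" behaviour can be sketched as a handler that reads the S3 event notification it is invoked with. The nested `Records` / `s3` / `bucket` / `object` layout mirrors the S3 notification format; the function name `on_s3_event` and the bucket and key values in the sample event are assumptions for illustration, and a real handler would go on to read and process the uploaded object.

```python
from urllib.parse import unquote_plus


def on_s3_event(event, context=None):
    """Return the (bucket, key) pairs for objects created in this event."""
    uploaded = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue  # skip deletes and other event types
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "raw-data"},
                    "object": {"key": "2018/my+file.csv"},
                },
            }
        ]
    }
    print(on_s3_event(sample_event))
```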
19. AWS serverless services
Others:
• SQS (messaging)
• Cognito (authentication)
• IoT
• QuickSight (visualization)
• CloudWatch (monitoring and logging)
• Athena (query service)
• Kinesis (stream processing and analytics)
20. Architecting Microservices
[Diagram: Data Collector Service, Data Cleaner Service, Data Enhancer Service, Training Predictor Service and Prediction Service, with four data stores; labels: Prediction; Applications, Social media, Marketing tools, Sales tools etc. Question posed: "What will customer XYZ want to buy next?"]
23. What is real time?
Real-time processing is the ability of a data pipeline not
only to ingest data as it becomes available, but also to process it
and create predictions or analysis based on it.
Why real time?
This is a sophisticated but critical capability, especially in
fast-changing scenarios like the stock market or monitoring
patient health in ICUs, or high-risk cases like fraud
detection and clickstream analysis.
How to implement real-time?
• Cloud providers' serverless real-time services
• Open source alternatives: Kafka, Spark Streaming
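To make the idea concrete, the sketch below processes readings one at a time as they arrive and emits an analysis result for each, instead of waiting for a complete batch. It uses a plain Python generator rather than Kafka, Spark Streaming or a cloud service; the function name `rolling_alerts`, the window size and the alert threshold are assumptions chosen for the example.

```python
from collections import deque


def rolling_alerts(readings, window=3, threshold=1.5):
    """Yield (value, baseline, alert) for each reading as soon as it arrives.

    baseline is the mean of up to `window` previous readings; alert is True
    when the new value exceeds threshold * baseline.
    """
    recent = deque(maxlen=window)  # keeps only the latest `window` readings
    for value in readings:
        baseline = sum(recent) / len(recent) if recent else value
        alert = value > threshold * baseline
        recent.append(value)
        yield value, baseline, alert


if __name__ == "__main__":
    # Any iterable works as the source, including one that never ends.
    for value, baseline, alert in rolling_alerts([10, 10, 10, 30, 10]):
        print(value, round(baseline, 2), alert)
```

Only the last few readings are held in memory, so the same loop works on an unbounded stream, which is the property that a streaming service provides at scale.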
24. AWS solutions for real-time analytics
Source: https://aws.amazon.com/serverless/
25. Google Cloud solutions for real-time analytics
Source: https://cloud.google.com/solutions/big-data/stream-analytics/
26. Real-time analytics with microservices and serverless
[Diagram: Data Collector Service, Data Cleaner Service, Data Enhancer Service, Training Predictor Service and Prediction Service, with four data stores; labels: Prediction; Applications, Social media, Marketing tools, Sales tools etc. Question posed: "What will customer XYZ want to buy next?"]
27. Summary
• Data pipelines are the backbone of any data science project.
• The "build it, own it" model makes team collaboration efficient.
• A microservices architecture enables parallel development, eases deployment and maintenance, and promotes increased ownership.
• Serverless services from cloud providers make development faster and less complex.
• Adding real-time serverless services yields a working solution to the complexities of stream processing.