Lambda Architecture
Lambda architecture, devised by Nathan Marz, is a layered architecture that addresses the
problem of computing arbitrary functions on arbitrary data in real time. In a real time system,
every query can be thought of as
result = function (all data)
As the volume of data grows, running this function over all the data at query time takes a
significant amount of time, regardless of the resources available.
Lambda Architecture solves this problem with three layers and the concept of pre-computed
views. The three layers are
● Batch Layer
● Speed Layer
● Serving Layer
Batch Layer
The batch layer stores the immutable master data, computes arbitrary functions on all of it, and creates batch views.
The function of the batch layer can be summarized as
batch view = function (all data)
The batch layer runs this computation continuously, updating the batch views each time.
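A minimal sketch of this idea in plain Java, assuming an in-memory list stands in for the master data. The class BatchLayerSketch and the countByUrl view function are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: an append-only master dataset plus a full recomputation.
class BatchLayerSketch<T, V> {
    private final List<T> masterData = new ArrayList<>(); // records are only ever appended
    private final Function<List<T>, V> viewFunction;      // batch view = function(all data)

    BatchLayerSketch(Function<List<T>, V> viewFunction) {
        this.viewFunction = viewFunction;
    }

    // New facts are appended; existing records are never updated or deleted.
    void append(T record) {
        masterData.add(record);
    }

    // Recomputes the view from scratch over a read-only view of every record.
    V recomputeView() {
        return viewFunction.apply(Collections.unmodifiableList(masterData));
    }

    // Example view function (an assumption for this sketch): visits per URL.
    static Map<String, Long> countByUrl(List<String> urls) {
        Map<String, Long> counts = new TreeMap<>();
        for (String url : urls) {
            counts.merge(url, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        BatchLayerSketch<String, Map<String, Long>> batch =
                new BatchLayerSketch<>(BatchLayerSketch::countByUrl);
        batch.append("/home");
        batch.append("/about");
        batch.append("/home");
        System.out.println(batch.recomputeView()); // prints {/about=1, /home=2}
    }
}
```

Because the view is always rebuilt from the full dataset, a bug in the view function can be corrected by fixing the function and recomputing.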
Serving Layer
The purpose of the serving layer is to store the batch views produced by the batch layer and provide random access to them.
When the batch layer computes new views, it updates them in the serving layer.
The serving layer can be implemented with a random access database.
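As a rough illustration, the serving layer's contract can be sketched with an in-memory map in place of a real database. ServingLayerSketch, publish and get are hypothetical names, and replacing the whole view in one step is an assumption of this sketch rather than a requirement stated above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-in for a random access database holding one batch view.
class ServingLayerSketch {
    private final AtomicReference<Map<String, Long>> batchView =
            new AtomicReference<>(Collections.<String, Long>emptyMap());

    // Called by the batch layer when a recomputation finishes:
    // the old view is replaced by a read-only copy of the new one in a single step.
    void publish(Map<String, Long> newView) {
        batchView.set(Collections.unmodifiableMap(new HashMap<>(newView)));
    }

    // Random access read by key; keys absent from the view count as zero.
    long get(String key) {
        return batchView.get().getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        ServingLayerSketch serving = new ServingLayerSketch();
        Map<String, Long> view = new HashMap<>();
        view.put("/home", 2L);
        view.put("/about", 1L);
        serving.publish(view);
        System.out.println(serving.get("/home"));    // prints 2
        System.out.println(serving.get("/missing")); // prints 0
    }
}
```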
Speed Layer
While the batch layer is recomputing a batch view, data that arrives during the recomputation is not included in that view.
The purpose of the speed layer is to compute incremental views on this recent data, which the batch views do not yet cover.
These views are called real time views.
The speed layer can be summarized as
real time view = function (real time view, new data)
So the final query is answered by merging results from the serving layer and the speed layer:
batch view = function (all data)
real time view = function (real time view, new data)
result = merge (query (batch view), query (real time view))
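The last two formulas can be sketched in plain Java for the simple case of counting events. LambdaQuerySketch and its method names are hypothetical. Merging by addition is an assumption that holds for counts; other functions need their own merge step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the incremental update and the merged query for event counts.
class LambdaQuerySketch {
    // real time view = function(real time view, new data):
    // folds one new event into a copy of the existing view.
    static Map<String, Long> updateRealTimeView(Map<String, Long> realTimeView, String newEvent) {
        Map<String, Long> updated = new HashMap<>(realTimeView);
        updated.merge(newEvent, 1L, Long::sum);
        return updated;
    }

    // result = merge(query(batch view), query(real time view)); for counts, merge is addition.
    static long query(Map<String, Long> batchView, Map<String, Long> realTimeView, String key) {
        return batchView.getOrDefault(key, 0L) + realTimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> batchView = new HashMap<>();
        batchView.put("spark", 10L);

        Map<String, Long> realTimeView = new HashMap<>();
        realTimeView = updateRealTimeView(realTimeView, "spark");
        realTimeView = updateRealTimeView(realTimeView, "java");

        System.out.println(query(batchView, realTimeView, "spark")); // prints 11
        System.out.println(query(batchView, realTimeView, "java"));  // prints 1
    }
}
```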
An Example using Apache Spark
Suppose we want to build a system that finds popular hashtags in a Twitter stream. We can implement the Lambda Architecture
for this system with Apache Spark.
Batch Layer Implementation - The batch layer reads a file of tweets, calculates a hashtag frequency map, and saves
it to a Cassandra table.
Batch.java
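A minimal plain-Java sketch of the counting step could look like the following. It is not the original Batch.java and uses neither Spark nor Cassandra: the tweets are passed in as a list, and the result is returned rather than saved. The class name, the hashtag pattern and the lower-casing of tags are assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the batch job: computes the hashtag frequency map over all tweets.
class HashtagBatchSketch {
    // Assumption: a hashtag is '#' followed by one or more word characters.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Counts every hashtag in every tweet; tags are lower-cased so #Spark and #spark match.
    static Map<String, Long> hashtagFrequencies(List<String> tweets) {
        Map<String, Long> counts = new TreeMap<>();
        for (String tweet : tweets) {
            Matcher matcher = HASHTAG.matcher(tweet);
            while (matcher.find()) {
                counts.merge(matcher.group(1).toLowerCase(Locale.ROOT), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "Learning #Spark and #Java",
                "#spark streaming is fun",
                "no tags here");
        System.out.println(hashtagFrequencies(tweets)); // prints {java=1, spark=2}
    }
}
```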
Speed Layer Implementation - The speed layer can also be written in Apache Spark, using its Spark Streaming feature.
We take a stream of recent tweets and calculate the real time view from it. For simplicity, we also save this
real time view to Cassandra.
Speed.java :
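The incremental update can be sketched in plain Java as well. This is not the original Speed.java and uses no Spark Streaming: each call to onMicroBatch stands for one small batch of recent tweets arriving from the stream. The class and method names are hypothetical, and the reset step reflects an assumption that the real time view is discarded once a batch run has covered the same data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the streaming job: keeps a running hashtag count for recent tweets.
class HashtagSpeedSketch {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");
    private final Map<String, Long> realTimeView = new HashMap<>();

    // real time view = function(real time view, new data):
    // only the new tweets are scanned; earlier counts are reused.
    void onMicroBatch(List<String> recentTweets) {
        for (String tweet : recentTweets) {
            Matcher matcher = HASHTAG.matcher(tweet);
            while (matcher.find()) {
                realTimeView.merge(matcher.group(1).toLowerCase(Locale.ROOT), 1L, Long::sum);
            }
        }
    }

    // Assumption: called after a batch run has absorbed these tweets,
    // so the same tweet is not counted in both views.
    void reset() {
        realTimeView.clear();
    }

    // Sorted copy of the current view, for querying or printing.
    Map<String, Long> snapshot() {
        return new TreeMap<>(realTimeView);
    }

    public static void main(String[] args) {
        HashtagSpeedSketch speed = new HashtagSpeedSketch();
        speed.onMicroBatch(Arrays.asList("#Spark rocks"));
        speed.onMicroBatch(Arrays.asList("more #spark and #kafka"));
        System.out.println(speed.snapshot()); // prints {kafka=1, spark=2}
    }
}
```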
Serving Layer Implementation - The serving layer can be implemented as a RESTful web service that queries the
Cassandra tables to return the final result in real time.
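The query behind such a service can be sketched without the web or database parts. Here two maps stand in for the Cassandra tables holding the batch view and the real time view. TopHashtagsSketch and topHashtags are hypothetical names, and breaking ties alphabetically is an assumption made to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query logic: merge both views, then return the n most frequent hashtags.
class TopHashtagsSketch {
    static List<String> topHashtags(Map<String, Long> batchView,
                                    Map<String, Long> realTimeView,
                                    int n) {
        // merge(query(batch view), query(real time view)): counts for the same tag are added.
        Map<String, Long> merged = new HashMap<>(batchView);
        realTimeView.forEach((tag, count) -> merged.merge(tag, count, Long::sum));

        // Highest count first; equal counts are ordered alphabetically by tag.
        List<Map.Entry<String, Long>> entries = new ArrayList<>(merged.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Long> batchView = new HashMap<>();
        batchView.put("spark", 10L);
        batchView.put("java", 7L);
        batchView.put("scala", 3L);

        Map<String, Long> realTimeView = new HashMap<>();
        realTimeView.put("scala", 5L);
        realTimeView.put("java", 1L);

        System.out.println(topHashtags(batchView, realTimeView, 2)); // prints [spark, java]
    }
}
```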
References and image credits
http://www.databasetube.com/database/big-data-lambda-architecture/
Big Data: Principles and best practices of scalable realtime data systems, by Nathan Marz and James Warren

Lambda Architecture using Apache Spark – with Java code examples
