1
Confidential
2
Big Data in the Advertising Industry
3
Agenda
- Introduction to the Ad Exchange business area
- Big Data tools overview
- Architectural approach
- JVM-based processing in Big Data analytics
4
Introduction to the Ad Exchange business area
5
Ad Evolution (1990s to now)
- Reservation Buying: ads sold via direct transactions between advertisers/agencies and publishers
- Ad Networks: aggregated inventory and sold it to advertisers; helped publishers by selling inventory they could not sell themselves
- Ad Exchanges & SSPs: real-time marketplaces with large pools of liquid inventory not sold in direct buys; SSPs give publishers more controls to optimize yield
- DSPs: bidding technology designed to help advertisers/agencies target and optimize their buys across multiple ad exchanges/publisher inventory pools
- Private Exchanges & Automated Guaranteed: exclusive advertiser-to-publisher inventory relationships for programmatic purchasing in brand-safe environments
Selling models along the timeline: Direct Sold/Guaranteed/Reserved; Indirect/Programmatic/Unreserved; Programmatic Premium
6
Ad Ecosystem: How does it work?
Chain from buyers to sellers: Brand → Agency → DSP → Ad Exchange → SSP → Publisher → Audience
Also in the diagram: two Ad Networks, DMP/Data Supply, RTB
7
Big Data tools overview
8
What is Big Data?
We’ve all heard the term “big data,” but you may not know exactly what it means. Most experts agree that the term describes information that shares three attributes, commonly summarized as volume, velocity, and variety.
9
Typical Big Data pipeline
Data Sources
- Structured
- Unstructured
Data Ingestion
- Batch layer
- Stream layer
Storage
BI / Data Warehouse
Visualization and Reporting Tools
Processing Layer
- Data Mining
- Machine Learning
Cross-cutting concerns: Governance and Privacy; Security; Quality Management; High Scale, Low Cost
10
Storages (non-relational)
- Key-value
- Document
- Column-oriented
- Graph
- Full-text (search engine)
- BLOB
11
Data ingestion or ETL
- Modes: batch, near real-time, real-time
- Flow: Source → ETL → Destination
12
Hadoop
- Distributed storage: HDFS
- Resource management: YARN
- Processing: MapReduce 1.0, MapReduce 2.0
13
MapReduce
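A minimal sketch of the MapReduce programming model, written with plain Scala collections rather than the Hadoop API. The word-count task and the names `mapPhase`, `shuffle`, and `reducePhase` are illustrative assumptions and do not come from the deck.

```scala
object MapReduceSketch {
  // Map phase: emit a (key, 1) pair for every word in a line.
  def mapPhase(line: String): Seq[(String, Long)] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).map(word => (word, 1L)).toSeq

  // Shuffle phase: group all emitted values by key.
  def shuffle(pairs: Seq[(String, Long)]): Map[String, Seq[Long]] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }

  // Reduce phase: fold the values of each key into a single result.
  def reducePhase(grouped: Map[String, Seq[Long]]): Map[String, Long] =
    grouped.map { case (key, values) => key -> values.sum }

  // The whole job: map every input line, shuffle by key, reduce per key.
  def wordCount(lines: Seq[String]): Map[String, Long] =
    reducePhase(shuffle(lines.flatMap(mapPhase)))
}
```

In a real cluster the map and reduce phases run in parallel on different nodes and the shuffle moves data over the network; here everything runs in one process, which only illustrates the data flow.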
14
Spark
Apache Spark is a unified analytics engine for large-scale data processing
15
MapReduce vs Spark: Which one to pick?
MapReduce
● Good old, slow and reliable
● Written in Java
● Natively supports Java, though any JVM-compatible language can be adapted
● Easy to learn and tune
● Batch processing only
● Hard to implement complex pipelines
● Unit testing
Spark
● “Brand-new”, fast and flexible
● Written in Scala
● Natively supports Scala and Java (plus R and Python)
● Provides a rich set of functionality
● Batch and micro-batch processing
● Complex pipelines are its strength
● Unit testing
16
Architectural approach
17
High level overview
Participants: Buyers, Seller
Platforms: Bid Platform, Ad Platform, Analytical Platform
21
Big Data analytics: What’s the challenge?
Daily
● 65B raw ad and bid events
● over 100 TB of serialized and compressed raw input data
● around 150K analytic queries over 110 dimensions in an analytic data store
● 4 s query time at the 98th percentile and 1 s average query time
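To put the daily figures above in perspective, the following sketch converts them to per-second averages. It assumes a uniform load across the day and decimal units (1 TB = 10^12 bytes); both are simplifications, and the object and value names are illustrative.

```scala
object DailyLoad {
  val secondsPerDay: Double = 24 * 60 * 60 // 86,400

  val eventsPerDay: Double  = 65e9   // 65B raw ad and bid events
  val bytesPerDay: Double   = 100e12 // 100 TB, a lower bound ("over 100 TB")
  val queriesPerDay: Double = 150e3  // around 150K analytic queries

  // Average rates, assuming traffic is spread evenly over the day.
  val eventsPerSecond: Double  = eventsPerDay / secondsPerDay
  val bytesPerSecond: Double   = bytesPerDay / secondsPerDay
  val queriesPerSecond: Double = queriesPerDay / secondsPerDay

  // Average size of one serialized, compressed event.
  val bytesPerEvent: Double = bytesPerDay / eventsPerDay
}
```

Under these assumptions the averages come out to roughly 750K events per second, about 1.16 GB of compressed input per second, about 1.5 KB per event, and fewer than 2 queries per second. Real traffic is likely to peak above these averages.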
22
Big Data pipeline applied
Components: Ad & Bid Platforms, Data Collector, HDFS, MapReduce, Spark, Druid, Performance Analytics
24
UI
25
JVM-based processing in Big Data analytics
28
Let’s solve a problem: Keywords
Seller
“I want performance reports that go beyond the standard account, site, zone, size, geography, etc.”
Ad Exchange Company
“I want to satisfy the high demand for this functionality (let’s call it Keywords), but I also want to reduce processing and retention costs by serving only sellers with a limited number of distinct keywords.”
Engineering
“There are two steps to solving the Keywords problem: first, identify the sellers that comply with the threshold; second, prepare reports only for them.”
29
Spark: Let’s write some code
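// Returns the (account, keyword) pair of a log record, if it carries a keyword
def getKeyword(adLog: AdLog): Option[(AccountId, Keyword)]

AdLog.getDataset(inputPath)(sparkSession)
  .flatMap(getKeyword)          // skip records without a keyword
  .distinct                     // one entry per (account, keyword) pair
  .mapValues(_ => 1L)
  .reduceByKey(_ + _)           // number of distinct keywords per account
  .filter { case (_, totalKeywords) => totalKeywords <= maxKeywordsNumber }
  .keys
  .collect()
  .toSet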
Step #1: Identify valid sellers
(AdLogs: SeqFile[ID,AdLog], maxKeywords: Long) => Set[AccountId]
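The logic of Step #1 can be tried out without a cluster. The sketch below mirrors the same chain of operations on plain Scala collections; the `AdLog` case class and the `AccountId` and `Keyword` aliases are hypothetical stand-ins for the real types, which the deck does not show.

```scala
object ValidSellersSketch {
  // Hypothetical stand-ins for the deck's AdLog, AccountId and Keyword types.
  type AccountId = Long
  type Keyword   = String
  final case class AdLog(accountId: AccountId, keyword: Option[Keyword])

  def getKeyword(log: AdLog): Option[(AccountId, Keyword)] =
    log.keyword.map(k => (log.accountId, k))

  def validSellers(logs: Seq[AdLog], maxKeywordsNumber: Long): Set[AccountId] =
    logs
      .flatMap(log => getKeyword(log))          // drop records without a keyword
      .distinct                                 // one entry per (account, keyword)
      .groupBy { case (account, _) => account }
      .map { case (account, pairs) => account -> pairs.size.toLong }
      .filter { case (_, totalKeywords) => totalKeywords <= maxKeywordsNumber }
      .keySet
}
```

As in the Spark version, a seller whose records carry no keywords at all never appears in the result, because such records are dropped before counting.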
30
Spark: Let’s write some code
case class KeywordsRecord( … ) // fields which represent dimensions and metrics
object KeywordsRecord { .. } // helper functions for reading input and writing output

import sparkSession.implicits._ // encoders needed by .toDS and .as[KeywordsRecord]

AdLog.getDataset(inputPath)(sparkSession)
  .filter(adLog => validSellers.contains(adLog.getAccountId)) // keep valid sellers only
  .map(KeywordsRecord.fromAdLog)
  .toDS
  .groupBy(KeywordsRecord.groupBy: _*) // dimensions
  .agg(KeywordsRecord.aggregations.head, KeywordsRecord.aggregations.tail: _*) // metrics
  .select(KeywordsRecord.allCols: _*)
  .as[KeywordsRecord]
  .map(KeywordsRecord.toJson)
  .write
  .text(outputPath)
Step #2: Prepare Keywords Report
(AdLogs: SeqFile[ID,AdLog], validSellers: Set[AccountId]) => TextFile[Json]
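Step #2 follows a filter, group, aggregate, serialize pattern, which the sketch below reproduces on plain Scala collections. The record fields (`impressions`, `revenue`) and the hand-written `toJson` are assumptions made for illustration, since the deck elides the real dimensions and metrics.

```scala
object KeywordsReportSketch {
  // Hypothetical record shapes; the real dimensions and metrics are not shown in the deck.
  final case class AdLog(accountId: Long, keyword: String, impressions: Long, revenue: Double)
  final case class KeywordsRecord(accountId: Long, keyword: String, impressions: Long, revenue: Double)

  // Naive JSON rendering without string escaping; good enough for a sketch only.
  def toJson(r: KeywordsRecord): String =
    s"""{"accountId":${r.accountId},"keyword":"${r.keyword}","impressions":${r.impressions},"revenue":${r.revenue}}"""

  def prepareReport(logs: Seq[AdLog], validSellers: Set[Long]): Seq[String] =
    logs
      .filter(log => validSellers.contains(log.accountId)) // valid sellers only
      .groupBy(log => (log.accountId, log.keyword))         // dimensions
      .map { case ((accountId, keyword), group) =>          // metrics
        KeywordsRecord(accountId, keyword, group.map(_.impressions).sum, group.map(_.revenue).sum)
      }
      .toSeq
      .sortBy(r => (r.accountId, r.keyword))                // deterministic output order
      .map(toJson)
}
```

Unlike the Spark job, this version returns the JSON lines instead of writing them to a text file, which keeps it easy to check in a unit test.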
31
Spark: Let’s write some code
object KeywordsApplication {
  // Parameter and result types follow the step signatures; String paths are an assumption
  def getValidSellers(inputPath: String, maxKeywordsNumber: Long)(implicit spark: SparkSession): Set[AccountId] = … // Step #1
  def prepareReport(inputPath: String, outputPath: String, validSellers: Set[AccountId])(implicit spark: SparkSession): Unit = … // Step #2

  def main(args: Array[String]): Unit = {
    …
    implicit val sparkSession: SparkSession = SparkSession.builder()
      .appName(jobName)
      .getOrCreate()
    val validSellers = getValidSellers(inputPath, maxKeywordsNumber)
    prepareReport(inputPath, outputPath, validSellers)
    …
  }
}
Put it together: Step #1 + Step #2
(AdLogs: SeqFile[ID,AdLog], maxKeywords: Long) => TextFile[Json]
34
Is this all about writing clean code?
Nope!
It may be a bottleneck:
- Network Bandwidth
- Storage I/O
- CPU
- RAM
It may help to overcome the bottleneck:
- Compression algorithms
- MapReduce & Spark jobs tuning
- Storage formats
- Data access patterns
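As one illustration of the trade-off listed above, compression spends CPU time to reduce the bytes that travel over the network and hit storage. The sketch below uses gzip from the JDK on made-up, repetitive log lines; the codec and the sample data are assumptions, since the deck does not name the ones used in production.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

object CompressionSketch {
  // Gzip-compress a byte array in memory.
  def gzip(raw: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out    = new GZIPOutputStream(buffer)
    try out.write(raw) finally out.close() // close() also writes the gzip trailer
    buffer.toByteArray
  }

  // Made-up, highly repetitive log lines standing in for raw ad events.
  val rawEvents: Array[Byte] =
    Seq.fill(10000)("""{"account":42,"site":"example.com","zone":"top","size":"300x250"}""")
      .mkString("\n")
      .getBytes(StandardCharsets.UTF_8)

  val compressed: Array[Byte] = gzip(rawEvents)

  // How many times smaller the compressed payload is than the raw one.
  val ratio: Double = rawEvents.length.toDouble / compressed.length
}
```

The ratio on real data depends on how repetitive the events are and on the codec chosen, so it is worth measuring on a sample of actual input before tuning a job around it.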
35
MapReduce + Spark: One must use them right
36
Q&A session
37
Thank you!

Big Data in Advertising Industry, by Oleksandr Fedirko and Danylo Stepanchuk

NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 

Big Data in Advertising Industry — Oleksandr Fedirko, Danylo Stepanchuk

  • 2. Big Data in the Advertising Industry
  • 3. Agenda
      - Introduction to the Ad Exchange business area
      - Big Data tools overview
      - Architectural approach
      - JVM-based processing in Big Data analytics
  • 4. Introduction to the Ad Exchange business area
  • 5. Ad Evolution (1990s to now)
      - Reservation Buying: ads sold via direct transactions between advertisers/agencies and publishers.
      - Ad Networks: ad networks aggregated inventory and sold it to advertisers. They helped publishers by selling inventory the publishers could not sell themselves.
      - Ad Exchanges & SSPs: real-time marketplaces with large pools of liquid inventory not sold in direct buys. SSPs give publishers more controls to optimize yield.
      - DSPs: bidding technology designed to help advertisers/agencies target and optimize their buys across multiple ad exchanges and publisher inventory pools.
      - Private Exchanges & Automated Guaranteed: an exclusive advertiser-to-publisher inventory relationship for programmatic purchasing in brand-safe environments.
      - Timeline labels: Direct Sold / Guaranteed / Reserved; Indirect / Programmatic / Unreserved; Programmatic Premium.
  • 6. Ad Ecosystem: how does it work? Diagram labels: Buyers; Sellers; Brand; Agency; DSP; Ad Network (appears twice); Ad Exchange; SSP; Publisher; Audience; DMP/Data Supply; RTB.
  • 8. What is Big Data? We’ve all heard the term “big data,” but you may not know exactly what it means. Most experts agree the term describes information that shares these three attributes:
  • 9. Typical Big Data pipeline
      - Data Sources: structured, unstructured
      - Data Ingestion: batch layer, stream layer
      - Storage
      - BI / Data Warehouse
      - Visualization and Reporting Tools
      - Processing Layer: data mining, machine learning
      - Cross-cutting concerns: Governance and Privacy; Security; Quality Management; High Scale, Low Cost
  • 10. Storages (non-relational): key-value, document, column-oriented, graph, full-text (search engine), BLOB
  • 11. Data ingestion or ETL: batch, near-real-time, or real-time. Flow: Source -> ETL -> Destination
  • 14. Spark: Apache Spark is a unified analytics engine for large-scale data processing.
  • 15. MapReduce vs Spark: which one to pick?
      MapReduce
      - Good old, slow, and reliable
      - Written in Java
      - Natively supports Java, though other JVM-compatible languages can be adapted
      - Easy to learn and tune
      - Batch processing only
      - Complex pipelines are hard to implement
      - Unit testing
      Spark
      - “Brand-new”, fast, and flexible
      - Written in Scala
      - Natively supports Scala and Java (plus R and Python)
      - Provides a rich set of functionality
      - Batch and micro-batch processing
      - Built to support complex pipelines
      - Unit testing
  • 17. High-level overview. Diagram labels: Buyer (three shown), Bid Platform, Ad Platform, Analytical Platform, Seller.
  • 21. Big Data analytics: what’s the challenge? Daily figures:
      - 65B raw ad and bid events
      - over 100 TB of serialized and compressed raw input data
      - around 150K analytic queries over 110 dimensions in an analytic data store
      - query time of 4 s at the 98th percentile and 1 s on average
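
To get a feel for these daily figures, they can be converted into average per-second rates. The sketch below is only back-of-envelope arithmetic: it assumes the load is spread evenly over the day (real traffic has peaks) and that "TB" means decimal terabytes. The object and value names are illustrative and do not come from the slides.

```scala
object DailyLoad {
  val SecondsPerDay: Long = 24L * 60 * 60            // 86,400 seconds

  // Figures quoted on the slide
  val eventsPerDay: Long  = 65000000000L             // 65B raw ad and bid events
  val bytesPerDay: Double = 100e12                   // 100 TB, assuming decimal terabytes
  val queriesPerDay: Long = 150000L                  // around 150K analytic queries

  // Average rates, assuming a perfectly uniform load over the day
  val eventsPerSecond: Long      = eventsPerDay / SecondsPerDay
  val gigabytesPerSecond: Double = bytesPerDay / SecondsPerDay / 1e9
  val queriesPerSecond: Double   = queriesPerDay.toDouble / SecondsPerDay

  def main(args: Array[String]): Unit = {
    println(s"events per second:  $eventsPerSecond")
    println(f"input GB per second: $gigabytesPerSecond%.2f")
    println(f"queries per second:  $queriesPerSecond%.2f")
  }
}
```

On average this is roughly 750K events and a bit over 1 GB of compressed input per second, while the query rate is below two per second. Under these assumptions, the pressure is on ingestion and processing throughput rather than on query concurrency.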
  • 22. Big Data pipeline applied. Diagram labels: Ad & Bid Platforms, Data Collector, HDFS, Druid, Performance Analytics, MapReduce, Spark.
  • 28. Let’s solve a problem: Keywords
      - Seller: “I want to be able to get performance reports beyond the standard account, site, zone, size, geography, etc.”
      - Ad Exchange Company: “I want to satisfy the high demand for this functionality (let’s call it Keywords), but I also want to reduce processing and retention costs by serving only sellers with a limited number of distinct keywords.”
      - Engineering: “There are two steps to solving the Keywords problem: first, we need to identify the sellers that comply with the threshold; second, we need to prepare reports only for them.”
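
The two steps can be illustrated on plain Scala collections before moving to Spark. This is a minimal sketch, not the speakers' implementation: `AdEvent`, its fields, and the `impressions` metric are hypothetical stand-ins for the real ad log record, and everything runs in memory on a small `Seq`.

```scala
object KeywordsSketch {
  // Hypothetical, simplified stand-in for one ad log event.
  final case class AdEvent(accountId: String, keyword: Option[String], impressions: Long)

  // Step 1: keep sellers whose number of distinct keywords is within the threshold.
  def validSellers(events: Seq[AdEvent], maxKeywords: Int): Set[String] =
    events
      .flatMap(e => e.keyword.map(k => (e.accountId, k)))  // drop events without a keyword
      .distinct                                            // one entry per (account, keyword)
      .groupBy(_._1)                                       // account -> its distinct keywords
      .collect { case (account, pairs) if pairs.size <= maxKeywords => account }
      .toSet

  // Step 2: sum a metric per (account, keyword), for valid sellers only.
  def report(events: Seq[AdEvent], sellers: Set[String]): Map[(String, String), Long] =
    events
      .filter(e => sellers.contains(e.accountId))
      .flatMap(e => e.keyword.map(k => ((e.accountId, k), e.impressions)))
      .groupBy(_._1)
      .map { case (key, rows) => key -> rows.map(_._2).sum }
}
```

Note that a seller with no keywords at all never appears in the result of step 1, so it gets no Keywords report. Whether that matches the intended business rule is an assumption of this sketch.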
  • 29. Spark: let’s write some code. Step #1: Identify valid sellers
      Signature: (AdLogs: SeqFile[ID, AdLog], maxKeywords: Long) => Set[AccountId]

      // Returns the (account, keyword) pair of a log record, if it has a keyword
      def getKeyword(adLog: AdLog): Option[(AccountId, Keyword)]

      AdLog.getDataset(inputPath)(sparkSession)
        .flatMap( getKeyword )
        .distinct                  // one entry per (account, keyword)
        .mapValues(_ => 1L)
        .reduceByKey(_ + _)        // number of distinct keywords per account
        .filter { case (_, totalKeywords) => totalKeywords <= maxKeywordsNumber }
        .keys
        .collect()
        .toSet
  • 30. Spark: let’s write some code. Step #2: Prepare Keywords Report
      Signature: (AdLogs: SeqFile[ID, AdLog], validSellers: Set[AccountId]) => TextFile[Json]

      case class KeywordsRecord( … )   // fields which represent dimensions and metrics
      object KeywordsRecord { .. }     // functions pack to operate with input/output data

      AdLog.getDataset(inputPath)(sparkSession)
        .filter( adLog => validSellers.contains(adLog.getAccountId) )
        .map( KeywordsRecord.fromAdLog )
        .toDS
        .groupBy( KeywordsRecord.groupBy: _* )   // dimensions
        .agg( KeywordsRecord.aggregations.head, KeywordsRecord.aggregations.tail: _* )   // metrics
        .select( KeywordsRecord.allCols: _* )
        .as[ KeywordsRecord ]
        .map( KeywordsRecord.toJson )
        .write
        .text(outputPath)
  • 31. Spark: let’s write some code. Put it together: Step #1 + Step #2
      Signature: (AdLogs: SeqFile[ID, AdLog], maxKeywords: Long) => TextFile[Json]

      object KeywordsApplication {
        // Signatures are abbreviated, as on the slide
        def getValidSellers(inputPath, maxKeywordsNumber)(implicit SparkSession)
        def prepareReport(inputPath, outputPath, validSellers)(implicit SparkSession)

        def main(args: Array[String]) = {
          …
          implicit val sparkSession = SparkSession.builder()
            .appName(jobName)
            .getOrCreate
          val validSellers = getValidSellers(inputPath, maxKeywordsNumber)
          prepareReport(inputPath, outputPath, validSellers)
          …
        }
      }
  • 34. Is this all about writing clean code? Nope!
      - It may be a bottleneck: network bandwidth, storage I/O, CPU, RAM
      - It may help to overcome the bottleneck: compression algorithms, MapReduce & Spark job tuning, storage formats, data access patterns
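
As a small illustration of the compression point, the sketch below gzips a batch of repetitive, log-like lines with the JDK's `GZIPOutputStream` and compares sizes. The log format is made up for the example, and gzip is only one possible codec; the slides do not say which algorithms were used in production.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

object CompressionSketch {
  // Compresses a byte array in memory with gzip.
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    try out.write(bytes) finally out.close()   // close() also writes the gzip trailer
    buffer.toByteArray
  }

  // Hypothetical log lines: field names and many values repeat across records.
  val raw: Array[Byte] =
    (1 to 10000)
      .map(i => s"account=${i % 50},site=news,zone=top,size=300x250,keyword=kw${i % 20}\n")
      .mkString
      .getBytes(StandardCharsets.UTF_8)

  val compressed: Array[Byte] = gzip(raw)

  def main(args: Array[String]): Unit =
    println(s"raw=${raw.length} bytes, gzip=${compressed.length} bytes")
}
```

Compression trades CPU time for fewer bytes on disk and on the network, so it helps when storage I/O or bandwidth is the bottleneck and can hurt when CPU already is.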
  • 35. MapReduce + Spark: one must use them right