BigData @ comScore
Michael Brown, CTO, comScore, Inc.
March 25th, 2011
comScore is a Global Leader in Measuring the Digital World
NASDAQ: SCOR
Clients: 1600+ worldwide
Employees: 1,000+
Headquarters: Reston, VA
Global Coverage: 170+ countries under measurement; 43 markets reported
Local Presence: 30+ locations in 21 countries
© comScore, Inc. Proprietary.
V0910
Broad Client Base and Deep Expertise Across Key Industries
Media, Agencies, Telecom/Mobile, Financial, Retail, Travel, CPG, Pharma, Technology
The Trusted Source for Digital Intelligence Across Vertical Markets
4 out of the top 4 WIRELESS CARRIERS
9 out of the top 10 INVESTMENT BANKS
9 out of the top 10 INTERNET SERVICE PROVIDERS
9 out of the top 10 AUTO INSURERS
47 out of the top 50 ONLINE PROPERTIES
45 out of the top 50 ADVERTISING AGENCIES
9 out of the top 10 MAJOR MEDIA COMPANIES
9 out of the top 10 PHARMACEUTICAL COMPANIES
9 out of the top 10 CONSUMER FINANCE COMPANIES
9 out of the top 10 CPG COMPANIES
comScore History of Leadership and Innovation
1st
– To measure the search market
– To measure video streaming
– To provide behavioral ad effectiveness
– To meter mobile user behavior
– To unify census + panel measurement
– To build and project from 2 million+ longitudinal panel
– To monitor and report e-commerce data
– To deliver a worldwide Internet audience measurement
Global Shaper Company 2010
Average Records Captured per Day (2005-2009)
(Chart: records captured per day by year; y-axis from 0 to 1,800,000,000)
Launching the 3rd Generation
In 2009, in the midst of the recession, comScore decided to build and release its 3rd Generation Product – Unified Digital Measurement (UDM or Hybrid)
Technology Goals
– Ramp up data collection
– Deploy new methodologies for data processing and analysis
– Be able to scale linearly to the environment to support growth
– Have yesterday's data available today
And one more thing … do it in 4 months or less.
Unified Digital Measurement™ (UDM) Establishes Platform For
Panel + Census Data Integration
Global PERSON Measurement: PANEL
Global MACHINE Measurement: PAGE TAGS
Unified Digital Measurement (UDM)
Patent-Pending Methodology
Adopted by 88% of Top U.S. Media Properties
How Does the Hybrid Process Work?
Collect Traffic from PCs and devices
Clean Traffic – remove non-human, bots, apply edit rules
Apply comScore URL Dictionary
(Diagram labels: Total Traffic, Filtered Traffic)
URL Dictionary (CFD): Advertising Industry “Currency”
Intelligent grouping of Properties with 7+ levels of detail
– Property (e.g., Yahoo! Properties, Microsoft Sites)
– Media Title (e.g., Yahoo!, MSN)
– Channel (e.g., Yahoo! Search, MSN Homepages)
– Subchannel (e.g., Yahoo! Image Search, MSNBC)
– Group/Subgroup (e.g., Yahoo! Calendar, Today)
URL Dictionary (CFD) Coverage Statistics
11MM Unique Domains Average/Month in 2010
• Over 80% of pages viewed came from the top 131K domains in 2010 vs. 392K in 2009
• 2,360K patterns in January 2011 represent 85% of all pages
• 1,254K syndicated entities in January 2010
• 41K patterns added/month in 2010.
Worldwide UDM™ Penetration
Percentage of Machines Included in UDM Measurement (July 2010 Penetration Data)
Europe: Austria 80%, Belgium 85%, Switzerland 84%, Germany 84%, Denmark 82%, Spain 90%, Finland 85%, France 91%, Ireland 91%, Italy 80%, Netherlands 88%, Norway 84%, Portugal 86%, Sweden 85%, United Kingdom 90%
Asia Pacific: Australia 91%, Hong Kong 88%, India 84%, Japan 73%, Malaysia 87%, New Zealand 88%, Singapore 91%
North America: Canada 94%, United States 91%
Latin America: Argentina 94%, Brazil 92%, Chile 94%, Colombia 95%, Mexico 93%, Puerto Rico 92%
Middle East & Africa: Israel 93%, South Africa 73%
Worldwide Tags per Day
(Chart: # of records per day, Jul 2009 to Feb 2011; y-axis from 0 to 25,000,000,000; series: Beacon Records, Panel Records)
Monthly Totals
(Chart: # of records per month, Jul 2009 to Feb 2011; y-axis from 0 to 600,000,000,000; series: Beacon Records, Panel Records)
High Level Data Flow
(Diagram labels: Panel, ETL, Census, ETL, Delivery)
Enterprise Data Warehouse : Sybase IQ 15.2 Multiplex
EDW is currently comprised of 20 servers running Windows 2003 R2 x64
– Currently 220 Intel CPUs
– Dedicated EDW technical team of 3 DBAs and 1 Administrator
– Ability to grow compute capacity and storage capacity independently
EDW data repository housed on both EMC VMAX and CLARiiON
– 4 EDW instances (2 in Virginia and 2 in Illinois)
– One EDW instance is 147TB usable (approx. 200TB of raw data)
– Production EDW Drive Layout
   416 x 1TB SATA, RAID6, 14+2
   42 x 600GB 15K, RAID1
   8 x 400GB Flash, RAID5, 7+1
Current Capacity and Performance Metrics
– 1,835,412,793,799 Rows loaded
– 140TB in 14,168 tables
– Capable of Loading 56 Billion rows per hour
Subsystem
System designed using multiple subsystems
Easy to take out and replace different components as demands change
In some cases, such as first-stage tag processing, moved from a single server to a cluster of servers in a few months
Periodically redesign different subsystems to support increased processing demands
Many systems are on their third generation of technology
Homegrown Distributed Processing
2002 – comScore distributed processing framework
– Reduced core aggregation from 48 hours to 7 hours
– Reduced final product creation from 24 hours to 2 hours
Scalability Wall
Open Source Hadoop framework
GreenPlum
GreenPlum MPP
– 80 Node Cluster: 1 Master; 6 ETL; 72 Workers
– Using Dell R510 with 12 600GB 15K RAID, 64GB RAM, 24 cores (HT)
– Support analytic end users with access to record-level data through a SQL interface
– Ability to load over 400 billion rows in 8 hours
– Hourly data loading in place
– Allow the analysts to mine the data for business uses
– Use for quick analysis of raw event data and for the ideation and creation of new products
Hadoop
Hadoop
– Dev - 6x Dell 2950 w/6 1TB
– Prod - 10x Dell R710 w/ 6 600GB
– Prod in 2 weeks – 10x Dell R710 w/6 600GB & 20x Dell R510 w/12 2TB
– Moving large processing jobs that are constrained by our current framework to Hadoop. We have some large analytical runs that currently go for over 40 hours on 32 servers, and we are re-engineering them to reduce processing time.
– We have found that the Fair Scheduler works well for our job loads
– We use a “homegrown” workflow system (BORG) that manages tasks inside and outside Hadoop.
Sharding
Sharding divides work across multiple systems using different mechanisms
Shard data as far upstream as possible
Breaking data into multiple chunks early in processing makes it possible to add compute capacity downstream to accommodate large volume increases in data ingest
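A minimal sketch of upstream sharding, assuming records carry a cookie identifier and that the shard is chosen by hashing it. The slide does not say which mechanisms are actually used; `shard_for`, `shard_records`, `NUM_SHARDS`, and the sample records are hypothetical names for illustration.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, not from the slides


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_records(records, num_shards: int = NUM_SHARDS):
    """Split (cookie, url) records into per-shard lists as early as possible."""
    shards = defaultdict(list)
    for cookie, url in records:
        shards[shard_for(cookie, num_shards)].append((cookie, url))
    return shards


# Hypothetical sample input
records = [("c1", "/a"), ("c2", "/b"), ("c1", "/c"), ("c3", "/a")]
shards = shard_records(records)
```

Because every record for a given cookie lands in the same shard, each downstream worker can aggregate its shard independently, and capacity can be added by raising the shard count.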
Sorting
We use DMExpress from Syncsort across hundreds of servers; this allows for efficient data processing
We sort input data based on a column in advance
To calculate uniques, check whether the current value differs from the prior value and, if so, increment a counter
We now have aggregation systems that can process over 50 GB of data with 357 million rows in less than an hour on a Dell R710 2U server
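The unique-counting step described above can be sketched as a single pass over pre-sorted input. This is an illustration in Python, not the DMExpress job itself; `count_uniques_sorted` and the sample cookies are hypothetical.

```python
def count_uniques_sorted(values):
    """Count distinct values in an already sorted iterable in one pass."""
    count = 0
    prior = object()  # sentinel that compares unequal to every real value
    for current in values:
        if current != prior:  # value changed, so a new unique starts here
            count += 1
            prior = current
    return count


# Hypothetical sample: sort on the column first, then scan once
cookies = sorted(["c3", "c1", "c2", "c1", "c3", "c1"])
unique_cookies = count_uniques_sorted(cookies)
```

The scan keeps only the prior value in memory, so it needs no hash table regardless of how many distinct values exist.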
Compression w/Sorting
Compress Log Files when processing large volumes of log data
Several advantages to Sorting Data First:
– Reduces the size of the data
– Improves application performance
Examples:
– 1 Hour of our data (313 GB raw, 815 million rows)
– Standard compression of time ordered data is 93GB (30% of original)
– Standard compression on a 2 key sorted set is 56GB (18% of original)
– For one day it saves 800GB
– For one month it saves 25 TB
– For 90 days it saves 75TB
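The effect can be tried on synthetic data. The sketch below assumes tab-separated (cookie, url) log lines and zlib as the "standard compression"; both are assumptions, as are all names and sizes in the code. It only shows the direction of the effect, not the ratios quoted above.

```python
import random
import zlib

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Hypothetical log records: (cookie, url) pairs in arrival (time) order
rows = [
    (f"cookie{random.randrange(200):04d}", f"/page/{random.randrange(50):02d}")
    for _ in range(20_000)
]


def compressed_size(records) -> int:
    """Serialize records as tab-separated lines and return the zlib size."""
    payload = "".join(f"{cookie}\t{url}\n" for cookie, url in records)
    return len(zlib.compress(payload.encode("utf-8")))


time_ordered_size = compressed_size(rows)
sorted_size = compressed_size(sorted(rows))  # sorted on both keys
```

Sorting places identical and near-identical lines next to each other, which gives the compressor longer repeated runs to work with.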
Big data makes you think differently
Question: How many distinct cookies over 3 months?
Data: 3 monthly tables with distinct cookies, indexed
Size: 10B records per table
Platform: Sybase IQ
Attempt: UNION select count(cookies) over 3 monthly tables
– UNION operator applies a DISTINCT
Result: FAIL. Out of temp space. Out of luck.
– Failed after 30 minutes.
Why? UNION performs a SELECT and then a DISTINCT (sorting 30B rows)
Rethink the problem!
INNER joins are cheaper
No sort, they use existing indexes
Remember set theory? Of course you do!
Let months be {A, B, C}
(Venn diagram of sets A, B, and C)
INNER join on only 2 tables of data at a time
2-month intersections took 2 hours each and were less taxing on memory
Used intersection of intermediate (indexed!) results… 5 mins
|A ∪ B ∪ C| = |A| + |B| + |C| – |A ∩ B| – |A ∩ C| – |C ∩ B| + |A ∩ B ∩ C|
A ∩ B ∩ C = (A ∩ B) ∩ (A ∩ C) ∩ (C ∩ B)
Total query time: 6.5 hours
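The inclusion-exclusion count can be checked on small sets. This sketch uses Python sets as a stand-in for the indexed monthly tables; `distinct_over_three` and the sample cookie values are hypothetical.

```python
def distinct_over_three(a: set, b: set, c: set) -> int:
    """Count |A ∪ B ∪ C| from set sizes and pairwise intersections."""
    ab, ac, cb = a & b, a & c, c & b  # pairwise intersections (the INNER joins)
    abc = ab & ac & cb                # intersection of the intermediate results
    return len(a) + len(b) + len(c) - len(ab) - len(ac) - len(cb) + len(abc)


# Hypothetical distinct cookies for three months
month_a = {"c1", "c2", "c3", "c4"}
month_b = {"c3", "c4", "c5"}
month_c = {"c1", "c4", "c6", "c7"}
total = distinct_over_three(month_a, month_b, month_c)
```

Only pairwise intersections touch the full tables; the triple intersection is computed from the much smaller intermediate results.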
TCO with Large Cluster Systems
Examine replication factor and disk configuration for systems with
replication built into the framework to support redundancy and
concurrency
Example:
Hadoop cluster that supports 108TB of base compressed data
Hypothetical Configurations:
– Replication Factor of 3
   R710 (6x drives, JBOD); requires 162 servers
   R510 (12x drives, JBOD); requires 68 servers
– Replication Factor of 2
   R710 (6x drives, RAID 5); requires 129 servers
   R510 (12x drives, RAID 5); requires 54 servers
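The sizing arithmetic can be sketched as follows. The slide does not state the usable capacity per server, so the code works backwards from the quoted server counts; the function names are hypothetical and the derived per-server figures are implied values, not numbers from the presentation.

```python
import math


def servers_needed(base_tb: float, replication: int, usable_tb_per_server: float) -> int:
    """Servers required to hold base_tb of data stored `replication` times."""
    return math.ceil(base_tb * replication / usable_tb_per_server)


def implied_tb_per_server(base_tb: float, replication: int, servers: int) -> float:
    """Usable capacity per server implied by a quoted server count."""
    return base_tb * replication / servers


BASE_TB = 108  # base compressed data, from the slide

# (replication factor, server count) pairs quoted on the slide
quoted = {
    "R710 JBOD": (3, 162),
    "R510 JBOD": (3, 68),
    "R710 RAID 5": (2, 129),
    "R510 RAID 5": (2, 54),
}
implied = {
    name: implied_tb_per_server(BASE_TB, rf, count)
    for name, (rf, count) in quoted.items()
}
```

Under these assumptions the R710 JBOD option implies about 2 TB of usable data per server and the R510 RAID 5 option about 4 TB, which is why the denser chassis with a lower replication factor needs roughly a third as many servers.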
Useful Factoids
Colorful, bite-sized graphical representations of the best discoveries we unearth.
Visit www.comscoredatamine.com or follow @datagems for the latest gems.
Thank You!
Michael Brown
CTO
comScore, Inc.
mbrown@comscore.com
28© comScore, Inc. Proprietary.

Weitere ähnliche Inhalte

Was ist angesagt?

Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...Matt Stubbs
 
Meruvian - Introduction to MapR
Meruvian - Introduction to MapRMeruvian - Introduction to MapR
Meruvian - Introduction to MapRThe World Bank
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architecturesArun Kejariwal
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Big data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryBig data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryThuyen Ho
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The EdgeArun Kejariwal
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataMathieu Dumoulin
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
Rhino: Efficient Management of Very Large Distributed State for Stream Proces...
Rhino: Efficient Management of Very Large Distributed State for Stream Proces...Rhino: Efficient Management of Very Large Distributed State for Stream Proces...
Rhino: Efficient Management of Very Large Distributed State for Stream Proces...Bonaventura Del Monte
 
High Performance Big Data Loading for AWS: Deep Dive and Best Practices from ...
High Performance Big Data Loading for AWS: Deep Dive and Best Practices from ...High Performance Big Data Loading for AWS: Deep Dive and Best Practices from ...
High Performance Big Data Loading for AWS: Deep Dive and Best Practices from ...Amazon Web Services
 

Was ist angesagt? (20)

Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
 
Meruvian - Introduction to MapR
Meruvian - Introduction to MapRMeruvian - Introduction to MapR
Meruvian - Introduction to MapR
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architectures
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Big data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryBig data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQuery
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The Edge
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
Rhino: Efficient Management of Very Large Distributed State for Stream Proces...
Rhino: Efficient Management of Very Large Distributed State for Stream Proces...Rhino: Efficient Management of Very Large Distributed State for Stream Proces...
Rhino: Efficient Management of Very Large Distributed State for Stream Proces...
 
High Performance Big Data Loading for AWS: Deep Dive and Best Practices from ...
High Performance Big Data Loading for AWS: Deep Dive and Best Practices from ...High Performance Big Data Loading for AWS: Deep Dive and Best Practices from ...
High Performance Big Data Loading for AWS: Deep Dive and Best Practices from ...
 

Andere mochten auch

Bitcoin 101 - Certified Bitcoin Professional Training Session
Bitcoin 101 - Certified Bitcoin Professional Training SessionBitcoin 101 - Certified Bitcoin Professional Training Session
Bitcoin 101 - Certified Bitcoin Professional Training SessionLisa Cheng
 
Broadband tech 2005
Broadband tech 2005Broadband tech 2005
Broadband tech 2005eaiti
 
KULTPRIT LookBook %231
KULTPRIT LookBook %231KULTPRIT LookBook %231
KULTPRIT LookBook %231Flavia Furtos
 
Daniel Niersbach Resume 2014
Daniel Niersbach Resume 2014Daniel Niersbach Resume 2014
Daniel Niersbach Resume 2014Daniel Niersbach
 
Official short presentation (eng)
Official short presentation (eng)Official short presentation (eng)
Official short presentation (eng)Ivelin Stoyanov
 
Spring2016Report
Spring2016ReportSpring2016Report
Spring2016ReportErika Hang
 
Have a taste of Cocktail Advertising - Digital & Social Media
Have a taste of Cocktail Advertising - Digital & Social MediaHave a taste of Cocktail Advertising - Digital & Social Media
Have a taste of Cocktail Advertising - Digital & Social MediaFlavia Furtos
 
How To Structure Large Applications With AngularJS
How To Structure Large Applications With AngularJSHow To Structure Large Applications With AngularJS
How To Structure Large Applications With AngularJSStefan Unterhofer
 
Video presentation
Video presentationVideo presentation
Video presentationszeming_teoh
 
Hitesh cross cultural comm in business
Hitesh cross cultural comm in businessHitesh cross cultural comm in business
Hitesh cross cultural comm in businessSolanki Hitesh
 
Ctolinux 2001
Ctolinux 2001Ctolinux 2001
Ctolinux 2001eaiti
 
Ping solutions overview_111904
Ping solutions overview_111904Ping solutions overview_111904
Ping solutions overview_111904eaiti
 
Cto forum nirav_kapadia_2006_03_31_2006
Cto forum nirav_kapadia_2006_03_31_2006Cto forum nirav_kapadia_2006_03_31_2006
Cto forum nirav_kapadia_2006_03_31_2006eaiti
 

Andere mochten auch (20)

Cosso cox
Cosso coxCosso cox
Cosso cox
 
Bitcoin 101 - Certified Bitcoin Professional Training Session
Bitcoin 101 - Certified Bitcoin Professional Training SessionBitcoin 101 - Certified Bitcoin Professional Training Session
Bitcoin 101 - Certified Bitcoin Professional Training Session
 
Broadband tech 2005
Broadband tech 2005Broadband tech 2005
Broadband tech 2005
 
KULTPRIT LookBook %231
KULTPRIT LookBook %231KULTPRIT LookBook %231
KULTPRIT LookBook %231
 
Daniel Niersbach Resume 2014
Daniel Niersbach Resume 2014Daniel Niersbach Resume 2014
Daniel Niersbach Resume 2014
 
Journal
JournalJournal
Journal
 
Psych comic strip
Psych comic stripPsych comic strip
Psych comic strip
 
Official short presentation (eng)
Official short presentation (eng)Official short presentation (eng)
Official short presentation (eng)
 
Meritlist nbf
Meritlist nbfMeritlist nbf
Meritlist nbf
 
Spring2016Report
Spring2016ReportSpring2016Report
Spring2016Report
 
Have a taste of Cocktail Advertising - Digital & Social Media
Have a taste of Cocktail Advertising - Digital & Social MediaHave a taste of Cocktail Advertising - Digital & Social Media
Have a taste of Cocktail Advertising - Digital & Social Media
 
How To Structure Large Applications With AngularJS
How To Structure Large Applications With AngularJSHow To Structure Large Applications With AngularJS
How To Structure Large Applications With AngularJS
 
Hitesh renuwel
Hitesh renuwelHitesh renuwel
Hitesh renuwel
 
English essay
English essayEnglish essay
English essay
 
Video presentation
Video presentationVideo presentation
Video presentation
 
Hitesh cross cultural comm in business
Hitesh cross cultural comm in businessHitesh cross cultural comm in business
Hitesh cross cultural comm in business
 
Ctolinux 2001
Ctolinux 2001Ctolinux 2001
Ctolinux 2001
 
Ping solutions overview_111904
Ping solutions overview_111904Ping solutions overview_111904
Ping solutions overview_111904
 
Awardees b
Awardees bAwardees b
Awardees b
 
Cto forum nirav_kapadia_2006_03_31_2006
Cto forum nirav_kapadia_2006_03_31_2006Cto forum nirav_kapadia_2006_03_31_2006
Cto forum nirav_kapadia_2006_03_31_2006
 

Ähnlich wie BigData @ comScore

How to Suceed in Hadoop
How to Suceed in HadoopHow to Suceed in Hadoop
How to Suceed in HadoopPrecisely
 
Demantra Case Study Doug
Demantra Case Study DougDemantra Case Study Doug
Demantra Case Study Dougsichie
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Germany
 
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...Teradata Aster
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data ArchitectureWei-Chiu Chuang
 
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...Edwin Poot
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsData Driven Innovation
 
DataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestrationDataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestrationpzjnjr6rsg
 
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...
Big Data, Physics, and the Industrial Internet: How Modeling & Analytics are ...mattdenesuk
 
New Technologies For The Sustainable Enterprise; keynote @Wharton
New Technologies For The Sustainable Enterprise; keynote @WhartonNew Technologies For The Sustainable Enterprise; keynote @Wharton
New Technologies For The Sustainable Enterprise; keynote @WhartonPaul Hofmann
 
Inside 6 Dimensional Model for Industry 4.0 Smart Factory by Webonise
Inside 6 Dimensional Model for Industry 4.0 Smart Factory by WeboniseInside 6 Dimensional Model for Industry 4.0 Smart Factory by Webonise
Inside 6 Dimensional Model for Industry 4.0 Smart Factory by WeboniseWebonise Lab
 
Applying linear regression and predictive analytics
Applying linear regression and predictive analyticsApplying linear regression and predictive analytics
Applying linear regression and predictive analyticsMariaDB plc
 
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsConcept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsSeeling Cheung
 
Miguel Angel Perdiguero - Head of BIG data & analytics Atos Iberia - semanain...
Miguel Angel Perdiguero - Head of BIG data & analytics Atos Iberia - semanain...Miguel Angel Perdiguero - Head of BIG data & analytics Atos Iberia - semanain...
Miguel Angel Perdiguero - Head of BIG data & analytics Atos Iberia - semanain...COIICV
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBMongoDB
 
NI Automated Test Outlook 2016
NI Automated Test Outlook 2016NI Automated Test Outlook 2016
NI Automated Test Outlook 2016Hank Lydick
 
STS. Smarter devices. Smarter test systems.
STS. Smarter devices. Smarter test systems.STS. Smarter devices. Smarter test systems.
STS. Smarter devices. Smarter test systems.Hank Lydick
 
Pentaho Reporting Solution for a Leading Energy Company in US
Pentaho Reporting Solution for a Leading Energy Company in USPentaho Reporting Solution for a Leading Energy Company in US
Pentaho Reporting Solution for a Leading Energy Company in USSigma Infosolutions, LLC
 

Ähnlich wie BigData @ comScore (20)

How to Suceed in Hadoop
How to Suceed in HadoopHow to Suceed in Hadoop
How to Suceed in Hadoop
 
Demantra Case Study Doug
Demantra Case Study DougDemantra Case Study Doug
Demantra Case Study Doug
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data Analytics
 
comScore
comScorecomScore
comScore
 
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data Architecture
 
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
 
Critical Breakthroughs and Challenges in Big Data and Analytics
BigData @ comScore

  • 1. BigData @ comScore. Michael Brown, CTO, comScore, Inc. March 25th, 2011
  • 2. comScore is a Global Leader in Measuring the Digital World
      – NASDAQ: SCOR
      – Clients: 1,600+ worldwide
      – Employees: 1,000+
      – Headquarters: Reston, VA
      – Global Coverage: 170+ countries under measurement; 43 markets reported
      – Local Presence: 30+ locations in 21 countries
      © comScore, Inc. Proprietary.
  • 3. Broad Client Base and Deep Expertise Across Key Industries: Media, Agencies, Telecom/Mobile, Financial, Retail, Travel, CPG, Pharma, Technology
  • 4. The Trusted Source for Digital Intelligence Across Vertical Markets
      – 47 out of the top 50 online properties
      – 45 out of the top 50 advertising agencies
      – 4 out of the top 4 wireless carriers
      – 9 out of the top 10 investment banks
      – 9 out of the top 10 Internet service providers
      – 9 out of the top 10 auto insurers
      – 9 out of the top 10 major media companies
      – 9 out of the top 10 pharmaceutical companies
      – 9 out of the top 10 consumer finance companies
      – 9 out of the top 10 CPG companies
  • 5. comScore History of Leadership and Innovation
      1st:
      – To measure the search market
      – To measure video streaming
      – To provide behavioral ad effectiveness
      – To meter mobile user behavior
      – To unify census + panel measurement
      – To build and project from a 2 million+ longitudinal panel
      – To monitor and report e-commerce data
      – To deliver a worldwide Internet audience measurement
      Global Shaper Company 2010
  • 6. Average Records Captured per Day (2005-2009) [chart: vertical axis from 0 to 1,800,000,000 records]
  • 7. Launching the 3rd Generation
      In 2009, in the midst of the recession, comScore decided to build and release its 3rd Generation Product, Unified Digital Measurement (UDM or Hybrid).
      Technology Goals
      – Ramp up data collection
      – Deploy new methodologies for data processing and analysis
      – Be able to scale the environment linearly to support growth
      – Have yesterday's data available today
      And one more thing … do it in 4 months or less.
  • 8. Unified Digital Measurement™ (UDM) Establishes Platform For Panel + Census Data Integration
      – Global PERSON Measurement (PANEL)
      – Global MACHINE Measurement (PAGE TAGS)
      – Unified Digital Measurement (UDM): Patent-Pending Methodology, Adopted by 88% of Top U.S. Media Properties
  • 9. How Does the Hybrid Process Work?
      – Collect traffic from PCs and devices
      – Clean traffic: remove non-human traffic and bots, apply edit rules
      – Apply the comScore URL Dictionary
      Diagram stages: Total Traffic, Filtered Traffic
  • 10. URL Dictionary (CFD): Advertising Industry “Currency”
      Intelligent grouping of Properties with 7+ levels of detail
      – Property (e.g., Yahoo! Properties, Microsoft Sites)
      – Media Title (e.g., Yahoo!, MSN)
      – Channel (e.g., Yahoo! Search, MSN Homepages)
      – Subchannel (e.g., Yahoo! Image Search, MSNBC)
      – Group/Subgroup (e.g., Yahoo! Calendar, Today)
  • 11. URL Dictionary (CFD) Coverage Statistics
      – 11MM unique domains on average per month in 2010
      – Over 80% of pages viewed came from the top 131K domains in 2010 vs. 392K in 2009
      – 2,360K patterns in January 2011 represent 85% of all pages
      – 1,254K syndicated entities in January 2010
      – 41K patterns added per month in 2010
  • 12. Worldwide UDM™ Penetration
      Percentage of Machines Included in UDM Measurement (July 2010 Penetration Data)
      – Europe: Austria 80%, Belgium 85%, Switzerland 84%, Germany 84%, Denmark 82%, Spain 90%, Finland 85%, France 91%, Ireland 91%, Italy 80%, Netherlands 88%, Norway 84%, Portugal 86%, Sweden 85%, United Kingdom 90%
      – Asia Pacific: Australia 91%, Hong Kong 88%, India 84%, Japan 73%, Malaysia 87%, New Zealand 88%, Singapore 91%
      – North America: Canada 94%, United States 91%
      – Latin America: Argentina 94%, Brazil 92%, Chile 94%, Colombia 95%, Mexico 93%, Puerto Rico 92%
      – Middle East & Africa: Israel 93%, South Africa 73%
  • 13. Worldwide Tags per Day [chart: # of records per day, axis from 0 to 25,000,000,000; monthly from Jul 2009 to Feb 2011; series: Beacon Records, Panel Records]
  • 14. Monthly Totals [chart: # of records per month, axis from 0 to 600,000,000,000; monthly from Jul 2009 to Feb 2011; series: Beacon Records, Panel Records]
  • 15. High Level Data Flow [diagram labels: Panel ETL, Census ETL, Delivery]
  • 16. Enterprise Data Warehouse: Sybase IQ 15.2 Multiplex
      EDW currently comprises 20 servers running Windows 2003 R2 x64
      – Currently 220 Intel CPUs
      – Dedicated EDW technical team of 3 DBAs and 1 Administrator
      – Ability to grow compute capacity and storage capacity independently
      EDW data repository housed on both EMC VMAX and CLARiiON
      – 4 EDW instances (2 in Virginia and 2 in Illinois)
      – One EDW instance is 147TB usable (approx. 200TB of raw data)
      – Production EDW drive layout: 416 x 1TB SATA, RAID6, 14+2; 42 x 600GB 15K, RAID1; 8 x 400GB Flash, RAID5, 7+1
      Current Capacity and Performance Metrics
      – 1,835,412,793,799 rows loaded
      – 140TB in 14,168 tables
      – Capable of loading 56 billion rows per hour
  • 17. Subsystem
      – System designed using multiple subsystems
      – Components can easily be taken out and replaced as demands change
      – In some cases, moved from a single server to a cluster of servers within a few months (first-stage tag processing)
      – Periodically redesign different subsystems to support increased processing demands
      – Many systems are on their third generation of technology
  • 18. Homegrown Distributed Processing
      – 2002: comScore distributed processing framework
      – Reduced core aggregation from 48 hours to 7 hours
      – Reduced final product creation from 24 hours to 2 hours
      – Scalability Wall
      – Open Source Hadoop framework
  • 19. GreenPlum
      GreenPlum MPP
      – 80 Node Cluster: 1 Master; 6 ETL; 72 Workers
      – Using Dell R510 with 12 x 600GB 15K (RAID), 64GB RAM, 24 cores (HT)
      – Supports analytic end users with access to record-level data through a SQL interface
      – Ability to load over 400 billion rows in 8 hours
      – Hourly data loading in place
      – Allows analysts to mine the data for business uses
      – Used for quick analysis of raw event data and for the ideation and creation of new products
  • 20. Hadoop
      – Dev: 6x Dell 2950 w/ 6 x 1TB
      – Prod: 10x Dell R710 w/ 6 x 600GB
      – Prod in 2 weeks: 10x Dell R710 w/ 6 x 600GB & 20x Dell R510 w/ 12 x 2TB
      – Moving large processing jobs that are constrained by our current framework to Hadoop. Some large analytical runs currently go for over 40 hours on 32 servers, and we are re-engineering them to reduce processing time.
      – We have found that the Fair Scheduler works well for our job loads
      – We use a “homegrown” workflow system (BORG) that manages tasks inside and outside Hadoop
  • 21. Sharding
      – Sharding divides work across multiple systems using different mechanisms
      – Shard data as far upstream as possible
      – Breaking data into multiple chunks early in processing makes it possible to scale compute capacity downstream to accommodate large volume increases in data ingest
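The slide does not say which sharding mechanisms were used. As one possible illustration, the sketch below assigns each record to a shard with a stable hash of a key, so that all records for the same key land on the same downstream system. The function names, the `cookie` field, and the use of MD5 are assumptions made for this example, not details from the talk.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (assumed mechanism)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_records(records, key_field: str, num_shards: int):
    """Group records by shard index as early as possible in the pipeline."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_for(record[key_field], num_shards)].append(record)
    return shards


# Hypothetical records; "cookie" is an illustrative field name.
records = [
    {"cookie": "c-001", "url": "example.com/a"},
    {"cookie": "c-002", "url": "example.com/b"},
    {"cookie": "c-001", "url": "example.com/c"},
]
shards = shard_records(records, "cookie", num_shards=4)
```

Because the shard index depends only on the key, more downstream workers can be added by raising `num_shards` and re-partitioning upstream, without changing the per-shard processing code.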
  • 22. Sorting
      – We use DMExpress from SyncSort across hundreds of servers, which allows for efficient data processing
      – We sort input data on a column in advance
      – To calculate uniques, check whether the current value differs from the prior value and, if so, increment a counter
      – We now have aggregation systems that can process over 50 GB of data with 357 million rows in less than an hour on a Dell R710 2U server
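The unique-counting step on this slide can be sketched in a few lines. This is a minimal Python illustration of the idea, not the DMExpress job itself; the function name is hypothetical. It assumes the input is already sorted on the column of interest, so equal values are adjacent and only the prior value needs to be remembered.

```python
from typing import Iterable

_NO_PRIOR = object()  # sentinel: no value has been seen yet


def count_uniques_sorted(values: Iterable[str]) -> int:
    """Count distinct values in a stream that is already sorted.

    Memory use is constant: only the prior value is kept, never a set
    of everything seen so far.
    """
    count = 0
    prior = _NO_PRIOR
    for value in values:
        if prior is _NO_PRIOR or value != prior:
            count += 1
        prior = value
    return count


cookies = ["b", "a", "b", "c", "a"]
unique_cookies = count_uniques_sorted(sorted(cookies))
```

The trade-off is that the sort has to be paid for up front, which is why the slide pairs this technique with a dedicated sorting tool.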
  • 23. Compression w/Sorting
      Compress log files when processing large volumes of log data
      Several advantages to sorting data first:
      – Reduces the size of the data
      – Improves application performance
      Examples:
      – 1 hour of our data (313 GB raw, 815 million rows)
      – Standard compression of time-ordered data is 93GB (30% of original)
      – Standard compression on a 2-key sorted set is 56GB (18% of original)
      – For one day it saves 800GB
      – For one month it saves 25 TB
      – For 90 days it saves 75TB
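The effect described on this slide can be reproduced on a small scale. The sketch below is an illustration on synthetic data, not comScore's pipeline: it builds a log of (cookie, page id) rows in arrival order, then compresses it once as-is and once sorted on both keys. The row layout, the sizes, and the use of `zlib` as a stand-in for "standard compression" are all assumptions made for the example.

```python
import random
import zlib


def build_log(num_rows: int = 50_000, num_cookies: int = 2_500, seed: int = 7):
    """Synthetic (cookie, page_id) rows in arrival (time) order."""
    rng = random.Random(seed)
    cookies = [f"{rng.getrandbits(128):032x}" for _ in range(num_cookies)]
    return [(rng.choice(cookies), rng.randrange(100)) for _ in range(num_rows)]


def compressed_size(rows) -> int:
    """Size in bytes of the rows after serializing to CSV and deflating."""
    text = "\n".join(f"{cookie},{page_id}" for cookie, page_id in rows)
    return len(zlib.compress(text.encode("ascii")))


rows = build_log()
time_ordered_bytes = compressed_size(rows)
key_sorted_bytes = compressed_size(sorted(rows))  # sort on (cookie, page_id)

print(f"time-ordered: {time_ordered_bytes} bytes")
print(f"2-key sorted: {key_sorted_bytes} bytes")
```

Sorting places identical cookie values next to each other, so the compressor can encode each repeat as a short back-reference instead of finding the previous occurrence too far away to reuse. The exact ratio depends on the data, so the percentages on the slide should not be expected from this toy input.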
  • 24. Big data makes you think differently
      – Question: How many distinct cookies over 3 months?
      – Data: 3 monthly tables with distinct cookies, indexed
      – Size: 10B records per table
      – Platform: Sybase IQ
      – Attempt: UNION select count(cookies) over 3 monthly tables (the UNION operator removes duplicates)
      – Result: FAIL. Out of temp space. Out of luck. Failed after 30 minutes.
      – Why? UNION performs a SELECT and then a DISTINCT (sorting 30B rows)
  • 25. Rethink the problem!
      – INNER joins are cheaper: no sort, they use existing indexes
      – Remember set theory? Of course you do! Let months be {A, B, C}
      – |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |C ∩ B| + |A ∩ B ∩ C|
      – A ∩ B ∩ C = (A ∩ B) ∩ (A ∩ C) ∩ (C ∩ B)
      – INNER join on only 2 tables of data at a time
      – 2-month intersections took 2 hours each and were less taxing on memory
      – Used intersection of intermediate (indexed!) results: 5 mins
      – Total query time: 6.5 hours
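The inclusion-exclusion identity on this slide can be checked on small in-memory sets. The sketch below uses Python sets in place of the indexed monthly tables and set intersection in place of the INNER joins; the function name and sample data are illustrative only.

```python
def distinct_count_inclusion_exclusion(a: set, b: set, c: set) -> int:
    """Count |A ∪ B ∪ C| without ever materializing the union."""
    # Pairwise intersections: the analogue of joining 2 tables at a time.
    ab = a & b
    ac = a & c
    cb = c & b
    # Triple intersection built from the (smaller) intermediate results.
    abc = ab & ac & cb
    return (len(a) + len(b) + len(c)
            - len(ab) - len(ac) - len(cb)
            + len(abc))


month_a = {1, 2, 3, 4}
month_b = {3, 4, 5}
month_c = {4, 5, 6, 7}
total = distinct_count_inclusion_exclusion(month_a, month_b, month_c)
```

On these sample sets the result matches `len(month_a | month_b | month_c)`. The point of the slide is that, at 10B rows per table, the intersections were affordable where the sort behind a UNION was not.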
  • 26. TCO with Large Cluster Systems
      Examine replication factor and disk configuration for systems with replication built into the framework to support redundancy and concurrency
      Example: Hadoop cluster that supports 108TB of base compressed data
      Hypothetical Configurations:
      – Replication Factor of 3
          R710 (6x drives, JBOD); requires 162 servers
          R510 (12x drives, JBOD); requires 68 servers
      – Replication Factor of 2
          R710 (6x drives, RAID 5); requires 129 servers
          R510 (12x drives, RAID 5); requires 54 servers
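The slide gives server counts but not the usable capacity assumed per server. The sketch below back-calculates that implied capacity from the slide's own figures and shows the sizing formula that would reproduce the counts. It assumes the counts were derived as total replicated data divided by usable capacity per server, rounded up; that assumption and the function names are not from the talk.

```python
import math

BASE_TB = 108  # base compressed data, from the slide

# (configuration, replication factor, servers required), from the slide
SLIDE_CONFIGS = [
    ("R710, 6 drives, JBOD", 3, 162),
    ("R510, 12 drives, JBOD", 3, 68),
    ("R710, 6 drives, RAID 5", 2, 129),
    ("R510, 12 drives, RAID 5", 2, 54),
]


def implied_usable_tb(base_tb: float, replication: int, servers: int) -> float:
    """Usable TB per server implied by a given server count."""
    return base_tb * replication / servers


def servers_needed(base_tb: float, replication: int, usable_tb: float) -> int:
    """Assumed sizing rule: replicated data / usable TB per server, rounded up."""
    return math.ceil(base_tb * replication / usable_tb)


for name, replication, servers in SLIDE_CONFIGS:
    usable = implied_usable_tb(BASE_TB, replication, servers)
    print(f"{name}: RF={replication}, {servers} servers, "
          f"~{usable:.2f} TB usable per server (implied)")
```

Read this way, the comparison is between storing 324TB (replication factor 3) on servers that expose every drive and storing 216TB (replication factor 2) on servers that give up some capacity to RAID 5 parity.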
  • 27. Useful Factoids: Colorful, bite-sized graphical representations of the best discoveries we unearth. Visit www.comscoredatamine.com or follow @datagems for the latest gems.
  • 28. Thank You! Michael Brown, CTO, comScore, Inc., mbrown@comscore.com