SlideShare ist ein Scribd-Unternehmen logo
1 von 38
eBay EDW  Metadata Management and Applications Dec 2011 熊家治  eBay 数据分析平台架构师 [email_address]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The Birth of eBay . . .  . . . sold for $14.83 USD Started with a  Broken  Laser Pointer . . . AuctionWeb was born on the Labor Day weekend in September 1995 Pierre Omidyar $30 eBay Founder
The Birth of eBay . . .  FREE Service Running Off from a Home Server . . . $240 USD/month Pierre Omidyar
The Birth of eBay . . .  Requesting for donations . . . Coins Personal Check Bills Money Order Coupons Movie Tickets
The Birth of eBay . . .  Start Profitable . . .
The Birth of eBay . . .  Initial Business Model and Target Users . . . Build  equitable electronic marketplace for Americans to buy and sell their stuff
eBay Facts 450+ Million Registered Users Over 2 Billion Photos 220+ Million Active Item Listing for sale 50,000 Categories 2 Petabytes  Stored 25 Petabytes Processed daily 300+ Features per quarter 100,000 lines  of code rolled out every 2 weeks 48 Billion SQL Calls Per day 5.5 Billion API Calls Per month > 4.4 GB Source Code - 16 Years After . . . Global Presents In 33 International Markets 10+ Million New Items Added Per Day $2,000+ USD Trading Value Per Second
Analytical Data Platforms Singularity EDW Low End Enterprise-class System Discover & Explore Analyze & Report 20-50  concurrent users 500+  concurrent users Enterprise-class System >5  concurrent users Structure the Unstructured Detect Patterns Hadoop Developer System EDW/ODW (Primary& Secondary) “ Compare User Activity against last year” Trending and Forecast Analysis (large history) Operational Analytics Transactional Analytics High volume ad hoc queries Contextual-Complex Analytics Deep, Seasonal, Consumable Data Sets Production Data Warehousing Large Concurrent User-base Image Fingerprinting Image Classification Pattern Recognition Detect Counterfeits & SNADs
eBay EDW ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
EDW Architecture
Closed loop, active Data Warehouse Site Databases Analytical Reporting Enterprise DW Raw data:  daily, hourly feeds Knowledge:  Integrated, aggregated, augmented www.ebay.com Trust & Safety Customer Support ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Wisdom:  informed, fact based actions Marketing
APD– Resource Distribution Chennai, India Cognizant Technology Services (on shore / off shore model) Shanghai, CN DW Core Team, APD Ops  anchor point for China based outsourcers (HP, DX).  Core competencies DW Development, Business System Analysis, Quality Assurance, Architecture, Project Management Office and Production Support. Seattle, WA DW Core Team & anchor point for India based outsourcing.  Core competencies in VLDB and highly efficient / scaleable arch (Next Gen). San Jose, CA BU Dedicated Teams (IMS, DMS, MRM, UBI), DW Core, and Arch & Ops.  Core competencies in rapid development, VLDB, MPP, business analysis, DW Dev.
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
EDW Metadata ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],We designed a tool to collect up-to-date ETL metadata automatically.
Subject Areas and Tables
Domains
Data Model Management
ETL Metadata Collecting Automation Meta Data Analysis Engine Meta Data Repository DBQL ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],TUM ,[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],Killer App 1: Data Flow Diagram(DFD)
BEFORE/AFTER ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DFD Tool Architecture DBQL Table Usage Info ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],DFD Repository  is the collection of DFDs, we organize and display online Data Flow Analysis Engine DFD Meta Data DFD Repository
How to Read DFD? Step2: the step number is ordered by the job start time  Job Start/End Time(HH:MM:SS) The script(job) name to populate the table in the step The output table of step1, also, it is the input table of step2 Round Corner Rectangle: The upstream tables from other subject area Blue line:  Stands for the process  critical path  Set Background as gray to highlight the target table of the diagram
DFD’s   Homepage
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],Killer App 2: Data Rationalization ,[object Object]
Table Usage Metrics Platform ,[object Object]
Table Usage Metrics ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],2011-07-20/2011-07-26 Primary Secondary TableName  Table Size(GB)  Table Skew Factor Daily CPU Cost Batch Hits End User Hits Loading Strategy  Finish Time  Table Size(GB)  Table Skew Factor Daily CPU Cost Batch Hits End User Hits Loading Strategy  Finish Time  Table 1 3,990 5 308,778 0 15 Daily Batch 15 3,969 2 323,822 0 26 Daily Batch 14 Table2  3,125 4 76,274 24,837 28,338 Daily Batch 2 3,071 2 124,307 13,710 21,126 Daily Batch 3
Data Rationalization Process
Typical Approaches ,[object Object],[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ETL JOB RUNTIME INFO from all ETL SERVER UC4 TABLE USAGE MASTER DATA FLOW JOBTRACK REPOSITORY TERADATA QUERYLOG from TD1/TD2/TD3/TD5  TABLE DEPENDECY QUERY PATTERN QUERY USER BEHAVIOR  USER QUERY/BATCTH JOB ENHANCEMENT MDR TABLE USAGE INRO ETL JOB STATUS JOB TRACK REPOSITORY DATA SOURCE Applications … JOBTRACK   OVERVIEW
JOBTRACK FEATURES AUTOMATION for  Any  Table +  Any  ETL JOB REALTIME + HISTORY + FORECAST ALL  INFORMATION IN  ONE   PAGE NOT ONLY Dataflow, you can get all data about Data  info you need
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Other Applications
Q &A

Weitere ähnliche Inhalte

Was ist angesagt?

Dataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesDataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClasses
InformaticaTrainingClasses
 
SAP BW vs Teradat; A White Paper
SAP BW vs Teradat; A White PaperSAP BW vs Teradat; A White Paper
SAP BW vs Teradat; A White Paper
Vipul Neema
 

Was ist angesagt? (20)

Dataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesDataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClasses
 
User 2013-oracle-big-data-analytics-1971985
User 2013-oracle-big-data-analytics-1971985User 2013-oracle-big-data-analytics-1971985
User 2013-oracle-big-data-analytics-1971985
 
Datawarehouse
DatawarehouseDatawarehouse
Datawarehouse
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with Example
 
02. Data Warehouse and OLAP
02. Data Warehouse and OLAP02. Data Warehouse and OLAP
02. Data Warehouse and OLAP
 
OLAP & Data Warehouse
OLAP & Data WarehouseOLAP & Data Warehouse
OLAP & Data Warehouse
 
Star schema
Star schemaStar schema
Star schema
 
An overview of data warehousing and OLAP technology
An overview of  data warehousing and OLAP technology An overview of  data warehousing and OLAP technology
An overview of data warehousing and OLAP technology
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 
Star schema PPT
Star schema PPTStar schema PPT
Star schema PPT
 
Introducing to Datamining vs. OLAP - مقدمه و مقایسه ای بر داده کاوی و تحلیل ...
Introducing to Datamining vs. OLAP -  مقدمه و مقایسه ای بر داده کاوی و تحلیل ...Introducing to Datamining vs. OLAP -  مقدمه و مقایسه ای بر داده کاوی و تحلیل ...
Introducing to Datamining vs. OLAP - مقدمه و مقایسه ای بر داده کاوی و تحلیل ...
 
Business Intelligence: Data Warehouses
Business Intelligence: Data WarehousesBusiness Intelligence: Data Warehouses
Business Intelligence: Data Warehouses
 
SAP BW vs Teradat; A White Paper
SAP BW vs Teradat; A White PaperSAP BW vs Teradat; A White Paper
SAP BW vs Teradat; A White Paper
 
ETL QA
ETL QAETL QA
ETL QA
 
Module 02 teradata basics
Module 02 teradata basicsModule 02 teradata basics
Module 02 teradata basics
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Oracle: DW Design
Oracle: DW DesignOracle: DW Design
Oracle: DW Design
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
 

Andere mochten auch

Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Erik Fransen
 

Andere mochten auch (6)

Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
 
Business Intelligence Release Management Best Practices
Business Intelligence Release Management Best PracticesBusiness Intelligence Release Management Best Practices
Business Intelligence Release Management Best Practices
 
Business intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lakeBusiness intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lake
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
85 business analyst interview questions and answers
85 business analyst interview questions and answers85 business analyst interview questions and answers
85 business analyst interview questions and answers
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
 

Ähnlich wie eBay EDW元数据管理及应用

Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
vivekjv
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Cloudera, Inc.
 

Ähnlich wie eBay EDW元数据管理及应用 (20)

Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsight
 
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
 
Applying linear regression and predictive analytics
Applying linear regression and predictive analyticsApplying linear regression and predictive analytics
Applying linear regression and predictive analytics
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Database Shootout: What's best for BI?
Database Shootout: What's best for BI?
 
3dw
3dw3dw
3dw
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data.
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapers
 
3dw
3dw3dw
3dw
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligence
 

Mehr von mysqlops

Mehr von mysqlops (20)

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautiful
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-management
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replication
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimization
 
Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internals
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDL
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护
 
Memcached
MemcachedMemcached
Memcached
 
DevOPS
DevOPSDevOPS
DevOPS
 
MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

eBay EDW元数据管理及应用

  • 1. eBay EDW Metadata Management and Applications Dec 2011 熊家治 eBay 数据分析平台架构师 [email_address]
  • 2.
  • 3. The Birth of eBay . . . . . . sold for $14.83 USD Started with a Broken Laser Pointer . . . AuctionWeb was born on the Labor Day weekend in September 1995 Pierre Omidyar $30 eBay Founder
  • 4. The Birth of eBay . . . FREE Service Running Off from a Home Server . . . $240 USD/month Pierre Omidyar
  • 5. The Birth of eBay . . . Requesting for donations . . . Coins Personal Check Bills Money Order Coupons Movie Tickets
  • 6. The Birth of eBay . . . Start Profitable . . .
  • 7. The Birth of eBay . . . Initial Business Model and Target Users . . . Build equitable electronic marketplace for Americans to buy and sell their stuff
  • 8. eBay Facts 450+ Million Registered Users Over 2 Billion Photos 220+ Million Active Item Listing for sale 50,000 Categories 2 Petabytes Stored 25 Petabytes Processed daily 300+ Features per quarter 100,000 lines of code rolled out every 2 weeks 48 Billion SQL Calls Per day 5.5 Billion API Calls Per month > 4.4 GB Source Code - 16 Years After . . . Global Presents In 33 International Markets 10+ Million New Items Added Per Day $2,000+ USD Trading Value Per Second
  • 9. Analytical Data Platforms Singularity EDW Low End Enterprise-class System Discover & Explore Analyze & Report 20-50 concurrent users 500+ concurrent users Enterprise-class System >5 concurrent users Structure the Unstructured Detect Patterns Hadoop Developer System EDW/ODW (Primary& Secondary) “ Compare User Activity against last year” Trending and Forecast Analysis (large history) Operational Analytics Transactional Analytics High volume ad hoc queries Contextual-Complex Analytics Deep, Seasonal, Consumable Data Sets Production Data Warehousing Large Concurrent User-base Image Fingerprinting Image Classification Pattern Recognition Detect Counterfeits & SNADs
  • 10.
  • 11.
  • 13.
  • 14. APD– Resource Distribution Chennai, India Cognizant Technology Services (on shore / off shore model) Shanghai, CN DW Core Team, APD Ops anchor point for China based outsourcers (HP, DX). Core competencies DW Development, Business System Analysis, Quality Assurance, Architecture, Project Management Office and Production Support. Seattle, WA DW Core Team & anchor point for India based outsourcing. Core competencies in VLDB and highly efficient / scaleable arch (Next Gen). San Jose, CA BU Dedicated Teams (IMS, DMS, MRM, UBI), DW Core, and Arch & Ops. Core competencies in rapid development, VLDB, MPP, business analysis, DW Dev.
  • 15.
  • 16.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25. How to Read DFD? Step2: the step number is ordered by the job start time Job Start/End Time(HH:MM:SS) The script(job) name to populate the table in the step The output table of step1, also, it is the input table of step2 Round Corner Rectangle: The upstream tables from other subject area Blue line: Stands for the process critical path Set Background as gray to highlight the target table of the diagram
  • 26. DFD’s Homepage
  • 27.
  • 28.
  • 29.
  • 30.
  • 32.
  • 33.
  • 34. ETL JOB RUNTIME INFO from all ETL SERVER UC4 TABLE USAGE MASTER DATA FLOW JOBTRACK REPOSITORY TERADATA QUERYLOG from TD1/TD2/TD3/TD5 TABLE DEPENDECY QUERY PATTERN QUERY USER BEHAVIOR USER QUERY/BATCTH JOB ENHANCEMENT MDR TABLE USAGE INRO ETL JOB STATUS JOB TRACK REPOSITORY DATA SOURCE Applications … JOBTRACK OVERVIEW
  • 35. JOBTRACK FEATURES AUTOMATION for Any Table + Any ETL JOB REALTIME + HISTORY + FORECAST ALL INFORMATION IN ONE PAGE NOT ONLY Dataflow, you can get all data about Data info you need
  • 36.
  • 37.