Extreme Analytics @ eBay

  1. 1. Extreme Analytics @ eBay: Evolution of Governed Self Service Analytics
  2. 2. Agenda • eBay Today • Big Data @ eBay • The HOW? • Q&A
  4. 4. Most Powerful Selling Platform For business sellers: the potential to drive profitable sales and build a brand For consumer sellers: an easy way to declutter, sell and make money A partnership not a competition Best Choice Providing the greatest selection of inventory for our buyers From new, everyday items to rare and unique goods And incredible deals only found on eBay Most Relevance A shopping experience that is simple, data-driven and personalized Enabling buyers to easily find, compare and purchase items they need and want Highlighting the unique value that eBay brings OUR STRATEGY
  5. 5. EBAY INC AT A GLANCE $2.1B Revenue in Q1 2016 $20.5B GMV in Q1 2016 162M Global Active Buyers 57% International revenue Q1 2016 data $9B Mobile Volume 314M App downloads
  6. 6. EBAY MARKETPLACE AT A GLANCE $19.6B GMV in Q1 2016 9.5M New listings added via mobile per week 300M Searches each day 63% Transactions that ship for free (in US, UK, DE) 79% Items sold as new Q1 2016 data ~900M Live listings One of the world’s largest and most vibrant marketplaces
  7. 7. VELOCITY STATS US 3 car parts or accessories are sold every A smartphone is sold every A dress is sold every 1 sec 4 sec 6 sec UK A necklace is sold every A make-up product is sold every A Lego product is sold every 10 sec 3 sec 19 sec GERMANY A truck or car is sold every A pair of women’s jeans is sold every A video game is sold every 5 min 4 sec 11 sec AUSTRALIA A pair of men’s sunglasses is sold every A home décor item is sold every A car or truck part is sold every 1 min 12 sec 4 sec
  8. 8. MOBILE VELOCITY STATS US A woman’s handbag is sold every A car or truck is sold every An action figure is sold every 10 sec 5 min 10 sec UK A tablet is sold every A cookware item is sold every A car is sold every 1 min 6 sec 2 min GERMANY A pair of women’s shoes is sold every A watch is sold every A tire or car part is sold every 20 sec 48 sec 35 sec AUSTRALIA A piece of jewelry is sold every A baby clothing item is sold every A motorcycle part is sold every 12 sec 46 sec 51 sec
  9. 9. THREE KEY TRENDS ARE REDEFINING COMMERCE: Smart Commerce, Seamless Commerce, True Global Commerce
  11. 11. TRUE GLOBAL COMMERCE 57% of eBay’s business is international • 95% of commercial sellers engage in exporting* • 8 languages *Sellers with $10,000 or more/year in sales
  12. 12. SMART COMMERCE Identify an interesting set of candidate items, trends, events, etc. Personalize the results Inspiration at scale!
  14. 14. Volume Variety Velocity VALUE
  17. 17. Big Data @ eBay
  18. 18. BIG Data VVC • >50 TB/day new data • >100 PB/day processed • >100 trillion pairs of information • Millions of queries/day • >7,500 business users & analysts • >50k chains of logic • 24x7x365 • 99.98+% availability • Turning over a TB every second • Active/Active • Near-real-time • >100k data elements • Always online • >1.5 x 10^12 new records/day
  19. 19. 21 TECHNOLOGY EDW Analytics Application Analysts and Data Scientists Management Integrators Business Owners Application Servers Data Processing Clusters Aggregation & Summarization Visualization & Reporting ClicktoInsights Kylin OVER Billings EVENTS FROM 162M EBAY BUYERS CAPTURED, TRANSFORMED, SYNTHESIZED TO PROVIDE ACTIONABLE INSIGHTS
  20. 20. eBay has one of the largest, most active data platforms in the world, with a diverse set of users.
  23. 23. 25 Agile Data Warehousing EDWs + VDMs1
  24. 24. Semi-Structured SQL++ Structured SQL Low End Enterprise-class System Contextual-complex analytics, deep, seasonal, consumable datasets Production data warehousing, large concurrent user base Discover & Explore Analyze & Report Enterprise-class System Unstructured JAVA / C Structure the unstructured, detect patterns Commodity Hardware System Singularity Hadoop Teradata Enterprise Data Warehouse DISCOVER & EXPLORE ANALYZE & REPORT
  25. 25. Biggest complexity drivers are • Maintaining separate databases • Weekly/daily/hourly data transfers • Data inconsistencies • Data duplication • Increased complexity • Loss of centralized viz & control DMs A data mart cannot be cheap enough to justify its existence
  26. 26. ...the wrong way: Data Marts in the Cloud [diagram labels: separate marts, each repeating Customer, Product and Trx]
  27. 27. Virtual Data Marts [diagram labels: Customer, Product, Transaction, Behavior; multiple Virtual DataMart boxes]
  28. 28. 30 Deep Data Platforms Hadoop + ddDBMS2
  29. 29. Semi-Structured SQL++ Structured SQL Low End Enterprise-class System Contextual-complex analytics, deep, seasonal, consumable datasets Production data warehousing, large concurrent user base Discover & Explore Analyze & Report Enterprise-class System Unstructured JAVA / C Structure the unstructured, detect patterns Commodity Hardware System Singularity Hadoop Teradata Deep-Data Platforms DISCOVER & EXPLORE ANALYZE & REPORT Behavioral Data Centric
  30. 30. 32 The Data Hub Collaborative Analytics3
  31. 31. Collaborative Analytics Compose Write and discover queries with ease; understand and reuse code easily; drives time and savings. Catalog & Govern Document and discover data and concepts; structured and crowd sourced tagging of content in a stewarded environment. Answers Fast, trusted answers for everyone; search for analytic products (metrics, reports, KPIs). Forensics Insightful IT and operational data to expose and eliminate redundancies Experts / Stewards Govern Simple Data Management Analyst Compose Better, Faster Queries Business Users Answers Google for your Data IT Forensics Intelligence about your data
  32. 32. Wiki + metadata repository Alation SQL Assistant Metadata repository + + Storytelling Mixing textual analysis with graphs WHAT IT’S LIKE COLLABORATION TOOL 2013 2009 2014 AnswerHub Discussion forum moderated by support DataHub + for data 2010 COLLABORATION JOURNEY 2014
  33. 33. 35 The App Platform Analytics Application Platform4
  34. 34. ENTERPRISE DATA PLATFORM 36 Data Warehouse Data Streams Batch Humans Sets of data Streams Systems Sets of data Data Services Services Applications Specific calls Populated Used by How Enterprise Populated Used by How
  35. 35. DQRecon Data Processing Ecosystem 37 Curated Streams Applications Data Services ApplicationAnalytics Data Scientists Analysts BU/PD Leaders Site DBs Real-Time Data Sources External Data Sources ETL Enterprise Data Warehouse Deep Data Analytics Platform Hadoop Engineers Stream Processing Caching DOE DQFirewall Buyers/Sellers
  37. 37. 39 Automated Decision Support Signal Detection @ Scale5
  38. 38. Automated Signal Detection: Prediction – anomaly signal detection. Massively scalable and automated signal detection and prediction • Phase 1: Signal detection • Phase 2: Root cause analysis
  39. 39. 41 ANALYTICS IN EBAY Measure Everything Embedded in our daily life Bottom-up & Top-down Think and Live Analytics Always But know when to avoid Analysis Paralysis! Analytics DNA
  40. 40. page 43 IKEA Job Interview Please have a seat
  41. 41. page 44 Analytics at eBay Go use data
  42. 42. 45 The Diverse User Community
  43. 43. Diverse User Community Data Scientists Financial Planning & Analytics Site Analysts Business Analysts Consumers One-off Analysis Descriptive, Predictive & Prescriptive Modeling Experimentation & Mining Standard Reports Dashboards Hadoop R/SAS/SQL on Teradata Excel Tableau MicroStrategy, Diverse Needs & Diverse Tools
  44. 44. The Analytics Environment at eBay • Direct SQL access • User datasets • MicroStrategy • Tableau • Web based App • 1000+ files • 10,000+ tables • 5000+ reports • 10,000+ • 100+ named apps • Tough to find the right metrics and reports • Hard to build new metrics and reports • Impossible to know which metrics and reports are correct vs old
  45. 45. 48
  46. 46. “We can’t solve problems by using the same kind of thinking we used when we created them.” - Albert Einstein
  47. 47. 51 Organizing for Success Governed Self Service0
  48. 48. Self-service Strategy changes everything… 52 The data user experience is…. Incoherent Isolated Disjointed Uncertain Consolidates all knowledge about data for “Just-in- Time” use Unifies a consistent set of Data Products on the hub Makes it easy to find and trace the path from Business Insights and summaries to the underlying SQL, metrics and metadata Delivers transparency and build trust with Data Governance and Stewardship
  49. 49. Comprehensive & Documented -- Self-directed Experience Insights Hub ONE portal, ONE framework, ONE analytics app Store Targeted & Simplified -- Self-service Experiences SQL Writer Search Collaboration Knowledge Management Subject Matter Expert (SME) Directory and Subject Domain pages Business Metrics Glossary Certified data assets, endorsements, descriptions. More Detailed More Summary Technical Analysis Business Insight Self Service Strategy, Governed Exploration for Analysis and Business Insight
  50. 50. DATA GOVERNANCE 54 Business Glossary – Managed articles about logic and language. Knowledge: What should it be? Data Asset Certification Trust: Is this the right view? Who says so? As of when? Well Managed – Quality checks, release notes, load updates Trust: Is it ok to use RIGHT NOW?
  51. 51. DATA GOVERNANCE Business Glossary Data Asset Certification Well Managed What we do: Data knowledge management and data stewardship Goals: • Demystify our data warehouse of tens of thousands of datasets • Increase trust in data by increasing transparency • Save analysts’ time and reduce their opportunities for error
  52. 52. 56 Value Generation Governed Self Service0
  53. 53. 57 Organizing for Success Purified Data Science6
  54. 54. Data Prep Data Science 58
  55. 55. 59 Data ScienceData +
  56. 56. A COMPLETE VIEW OF OUR CUSTOMERS Behavior Demographics & Interests Attitude Value to eBay
  57. 57. DATA SCIENCE Data Data Science Business Impact Insights: Customer Insights used to make decisions and set strategy Predictive Models: Models that predict outcomes to achieve optimal targeting Segments: New ways to assess value and attitudes of our customers DNA
  58. 58. CONVERSION MODEL
      User        Category  Probability
      111602****  1564**    10.1%
      111602****  1562**    6.54%
      111602****  1569**    5.67%
      111602****  3564**    4.33%
      111602****  1397**    1.19%
      111602****  3877**    1.11%
      111602****  9282**    1.01%
      111602****  3607**    0.91%
      111602****  1040**    0.81%
      111602****  1564**    0.76%
      111602****  1040**    0.66%
      111602****  4250**    0.01%
      111602****  5235**    0.01%
      • Cart data • Watch data • Mobile watch • Search pages • Browse data • Purchase history Models
  59. 59. Thanks! • ALEX LIANG • hliang@ebay.com • http://www.linkedin.com/in/alexlianghu

Editor's notes

  • eBay is the world’s most vibrant marketplace where the world goes to shop, sell, and give. Whether you are buying something new or used, luxurious or modest, rare or commonplace, trendy or one-of-a-kind – if it exists in the world, it’s probably for sale on eBay. Our mission is to be the world’s favorite destination for discovering great value and unique selection.
    eBay connects millions of buyers and sellers around the globe, empowering people and creating opportunity. Our vision for commerce is one that is enabled by people, powered by technology, and open to everyone.
    We give sellers the platform, solutions, and support they need to grow their businesses and thrive, but we never compete with them. We measure our success by our customers' success.
  • Our vision for commerce is one that is enabled by people, powered by technology, and open to everyone.
    Our strategy is to drive the best choice, have the most relevance, and deliver the most powerful selling platform.
  • eBay Inc. is a global commerce leader including our Marketplace, StubHub and Classifieds platforms.
    Collectively, we connect millions of buyers and sellers around the world.
    The technologies and services that power our platforms are designed to enable sellers worldwide to organize and offer their inventory for sale and buyers to find and buy it virtually anytime and anywhere.
    eBay Inc. employs approximately 11,600 people globally (as of Dec. 31, 2015)
  • Today’s eBay isn’t what it used to be - many people think of us only as an auction site, but that perception hasn’t kept up with reality.
    The reality is that 79% of what is sold on eBay is new merchandise, available for purchase immediately.
    We have more than 900 million items listed for sale and 162 million active buyers, effectively making us the world’s biggest shopping destination.
  • From our vantage point, we believe the impact of these three trends will transform the commerce landscape.
  • Seamless commerce is much more than a mobile experience. To engage with consumers in the “new retail”, brands must take a multi-screen approach.
    We must stop thinking of experiences across individual devices - and start thinking of holistic shopping experiences, where consumers can seamlessly engage with your brand across multiple screens, literally from wherever they are.
    Brands also recognize that online and offline are not mutually exclusive. Consumers want the best of both worlds, shopping online and across multiple devices, and offline in-store. The continued proliferation of mobile will deliver a richer consumer experience that help shoppers navigate seamlessly between the digital and physical worlds.
    At eBay, we’re finding that the multi-screen consumer is more highly engaged. They visit sites more frequently, and they buy significantly more when online. Multiscreen is device agnostic, which means every screen is shoppable. Because we can’t predict what the next great device will be, we must focus on providing customers with the best possible experience - regardless of the device – so consumers can shop when they want, for what they want.
    At eBay, we are innovating across devices, creating seamless buying and selling experiences for iOS, Android, desktop, and even wearables to make sure our customers can engage at every touch point. We are also allowing people to shop the way they want: online, offline and mobile are coming together in services like Click & Collect offered by eBay with Argos in the UK, which allows buyers to pick-up their purchase in-store if they choose.
  • Consumers are increasingly able to shop the world. Their market, or where they shop, is no longer defined by borders. They go online to explore the world – interests/likes come to life in different places.
    Because the brands and products they love can be difficult to find in their markets, they’re willing to shop foreign websites. They tolerate friction in buying in order to access the selection that a global marketplace has to offer.
    At eBay, 57% of our business is international and 95% of our commercial sellers engage in exporting. The eBay app is available in 190 countries, we host 25 localized websites across the globe and are available in 8 languages.
    We are offering innovative approaches to eliminate friction points in global shopping, such as programs like the Global Shipping Program – which enables sellers to more easily ship to 64 countries around the world.

  • Consumers are overwhelmed by the number of choices they face day-to-day. Smart brands are using data to surface inventory to their consumers in ways that feel relevant, helpful and familiar.
    At eBay, we are curating and simplifying content in ways that align to users’ stated (and sometimes unstated) preferences, serving up content in new, simplified interfaces that surprise and delight them. We are also experimenting with machine learning to help bridge the gap between intent and understanding.

  • Big data is data an order of magnitude larger than you are used to, Grasshopper
  • This one takes more time. Big Data – size, complexity, velocity. The intersection of #products with customers and activity causes huge volumes.
    All of this needs to be loaded and maintained – daily, hourly, 15-minutes, near-real-time
    These users generate millions of requests per day. Add HA.
    Make a big deal about 24x7, no place for batch or query windows. We are a global company with analysts and users all over the world. We load and process and query 24x7. If we take a backup, it has to happen with everything else.
    100 PB/day, processed by our systems, going over data over and over again to find new patterns, etc. Vivaldi touches a TB/second itself. That’s 86 PB/day on one system.
    It’s easy to build a large PB store for 10s even 100s of PBs. But accessing that data and using it in a meaningful way is the challenge. We design our systems for extremely high usage.
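The throughput figure in this note can be checked with simple arithmetic. The sketch below assumes decimal units (1 PB = 1,000 TB) and a sustained rate of exactly 1 TB per second; the note states neither explicitly.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day
TB_PER_PB = 1_000               # assumption: decimal units, not binary

def pb_per_day(tb_per_second: float) -> float:
    """Convert a sustained rate in TB/second to PB/day."""
    return tb_per_second * SECONDS_PER_DAY / TB_PER_PB

print(pb_per_day(1.0))  # 86.4
```

Under these assumptions a single system sustaining 1 TB/second accounts for about 86 PB/day, which matches the figure quoted for Vivaldi.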
  • Our technology is proprietary – but leverages a lot of Open Source Stack

    Most of the Data Processing heavy lifting happens on Hadoop Clusters. Majority of it – MR jobs.

    We also leverage Scala/Scoobi and have a custom built framework (Cascading Based) through a host of libraries, all internally customized.

    Our approach to reporting is very “democratic”. Since a large part of the analytics are for internal consumption, we have to deal with a wide variety of data customers with different degrees of data knowledge and data handling maturity.

    A strategy that has worked very well for us is to provide a top line Analytical Tool ( combination of reports and dashboards), depending on the use case and then, provide curated data sets – to allow for interactive querying and analysis.
  • What exactly do my teams do? Data engineering and technology development at scale

    You can’t see/touch/feel most of what we do.
    We build and manage platforms, used by over 10K distinct users in the last year.
    Fully integrated DP with history back to beginning.   RJ saying it's our most powerful weapon
    Emphasize engineering org and expertise. - scale and complexity, 

    Search science and best match
    Include detail slides in deck, Advertising buildout example.- Ilari  
    And/or Buildout what is required to make trending campaign work in nous and customer Dna
    Real time PLA
    Data management slide 
    Similar to commerce OS for site Dev we do for data
    Use product slide to answer the question -- but what exactly do you do -- emphasize most resources are working on the platform -- some of what we do is very visible, but most is not, as it's a platform that enables others.
  • 5 Stages to something they refer to as the SENTIENT ENTERPRISE
    Framework for maximizing speed/value/agility of investments in Data
    Data Management at eBay roughly follows a 5-stage model developed by
    OLIVER  RATZESBERGER | Teradata MOHAN SAWHNEY | Kellogg School of Management
    I’ve discussed how agile businesses create a balance between imposing and loosening structure – centralizing the definition of data rules but decentralizing use cases to drive innovation.
    Agile businesses are able to make more strategic decisions based on higher levels of both breadth and depth of data.
  • The Agile Data Warehouse moves traditional central DW structures to a balanced decentralized framework built for agility.
    Centralized data – decentralized access. Data Labs that support experimentation and self service
    Promotion process for VDM to Prod.
  • We avoid federation of DMs – no pooling, redundant data, inconsistencies, HC to manage, probably 10x more expensive than they appear
  • But DMs provide agility and speed. The way we do it is completely different. The right way to do cloud for analytics. When we provision a virtual DM, yes it has 100 GB of empty space, but it also has access to PB of reusable company data instantly. Also used for iterative development and test.
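One way to picture the virtual data mart idea from this note is as a sandbox with a small private quota that reads shared warehouse tables by reference instead of copying them. The sketch below is purely illustrative: the class names, the row-count quota, and the table names are all hypothetical and do not describe eBay's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Warehouse:
    """Shared, centrally managed tables (table name -> rows)."""
    tables: dict[str, list[dict]] = field(default_factory=dict)

@dataclass
class VirtualDataMart:
    """Sandbox with a small private quota plus read access to shared data."""
    name: str
    warehouse: Warehouse
    quota_rows: int  # hypothetical stand-in for the "100 GB of empty space"
    private_tables: dict[str, list[dict]] = field(default_factory=dict)

    def read(self, table: str) -> list[dict]:
        # Private tables shadow shared ones; shared data is referenced, not copied.
        if table in self.private_tables:
            return self.private_tables[table]
        return self.warehouse.tables[table]

    def write(self, table: str, rows: list[dict]) -> None:
        # Count rows already used by the other private tables.
        used = sum(len(r) for n, r in self.private_tables.items() if n != table)
        if used + len(rows) > self.quota_rows:
            raise ValueError("quota exceeded")
        self.private_tables[table] = rows

# Made-up example data.
edw = Warehouse({"customer": [{"id": 1}, {"id": 2}]})
vdm = VirtualDataMart("fashion_lab", edw, quota_rows=10)
vdm.write("my_segment", [row for row in vdm.read("customer") if row["id"] == 1])
```

The point of the design is that provisioning a new mart costs only its private quota, while every mart sees the same single copy of the company data.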
  • Behavioral Data Platform
    From Transactional to Behavioral Data. Value comes from behaviors rather than transactions
  • LinkedIn for Analytics
    Harnessing the power of social & crowdsourcing to empower the enterprise to collaborate on analytics at scale
    Share, Follow, Like
  • Working with 3rd party vendor on building out our collaborative data hub
  • But these are just the initial steps towards an even more exciting future for the enterprise.
    Our efforts today around creating agile and transparent data architectures and systems will enable us to create the sentient enterprise.

    What do I mean by “sentient”? Do I want companies to have feelings?
    It’s true that the word “sentient” is derived from the Latin word “sentīre,” meaning “to feel,” and it refers to any entity that can feel or perceive things.
    This is key to why we think it’s the perfect descriptor. At Teradata we work on helping companies build a corporation that can sense when something’s wrong and report it to the humans in charge of fixing it.
    In the sentient enterprise, an entire company operates like a single organism, where the left hand knows what the right hand is doing, and where human beings can get signals and suggestions that inform and guide their critical business decisions.

    In the sentient enterprise, your data talks to you, like it has a brain of its own.
    In the sentient enterprise, the CFO could have answers within hours instead of days.
    In fact, imagine if long before she received her weekly revenue trends report, the CFO could get an alert from the enterprise pointing her to the root cause so she could do something about it.
    Or, better yet, what if she didn’t get an alert at all – because the revenue dip was prevented in the first place?
  • Analytical Application Platform
    Analytical Apps. From static applications and ETL to agile Self Service Apps.
    From Extraction of Data to Enterprise Listening.
    From centralized ETL heavy static code to agile frameworks (if you want your data integrated, conform to this service API)
    From manual data extraction after the fact to real-time Data Listening – Streaming Transactions + Pulsar
  • Introduce EDP

    We’ve been on the path of the agile data warehouse for a long time
    We have the requisite user created data labs, and support experimentation and testing as well as highly integrated core (production) data
    Our recent change has been the addition of enterprise data streams
    Enterprise data streams are near real time streaming data designed to mirror the critical core data from the EDW but in real time – enhanced with history from the EDW so that actions and recommendations can be made based upon new (live) actions while taking into account rich context

    Without this key enabler we could not progress into stage 5: autonomous decision making

    Reusing what we learned from EDW in the other two areas
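The idea of a live stream "enhanced with history from the EDW" can be sketched as a lookup that attaches historical context to each event as it arrives. This is a minimal illustration under stated assumptions: the field names (`buyer_id`, `lifetime_orders`) and the in-memory dictionary standing in for warehouse history are hypothetical.

```python
from typing import Iterable, Iterator

def enrich(events: Iterable[dict], history: dict[str, dict]) -> Iterator[dict]:
    """Attach historical context to each live event as it arrives."""
    for event in events:
        # Unknown buyers get an empty context rather than failing the stream.
        context = history.get(event["buyer_id"], {})
        yield {**event, "history": context}

# Made-up data: history would come from the warehouse, events from the stream.
history = {"b1": {"lifetime_orders": 42}}
live = [
    {"buyer_id": "b1", "action": "bid"},
    {"buyer_id": "b2", "action": "view"},
]
enriched = list(enrich(live, history))
```

A downstream service can then act on the new event while taking the attached context into account, which is the property the note describes as the enabler for automated decisions.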
  • doe.corp.ebay.com
  • Enterprise data streams and real time services
  • Automated Decisioning Platform
    Predictive Technologies and Algorithms. From 10% of time on decision making and 90% sifting through data to 90% on decision making with the help of automated algorithms.
    Implementing Predictive Technologies and Algorithms at scale and operationalizing them throughout the enterprise
    Let systems deal with the ever increasing combinations and intersections of data
    Focus the human brain on making decisions

  • 50K intersection points across our customer experiences modeled each day
    Example: total GMV in Fashion, New listings in Electronics
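The notes do not say which detection method is used. As one possible illustration of phase 1 (signal detection), the sketch below flags a metric when its latest value sits far from its own recent history, using a plain z-score. The threshold, the metric names, and all the numbers are assumptions made up for the example.

```python
from statistics import mean, stdev

def detect_signals(series: dict[str, list[float]],
                   threshold: float = 3.0) -> dict[str, float]:
    """Return z-scores for metrics whose latest value deviates from history."""
    signals = {}
    for name, values in series.items():
        if len(values) < 3:
            continue  # need at least two past points plus the latest one
        *past, latest = values
        sigma = stdev(past)
        if sigma == 0:
            continue  # flat history: z-score is undefined
        z = (latest - mean(past)) / sigma
        if abs(z) >= threshold:
            signals[name] = z
    return signals

# Made-up daily values for two intersection points; the last entry is "today".
metrics = {
    "GMV / Fashion": [100.0, 102.0, 98.0, 101.0, 99.0, 60.0],
    "New listings / Electronics": [50.0, 51.0, 49.0, 50.0, 52.0, 51.0],
}
signals = detect_signals(metrics)
```

Here only the Fashion series is flagged, because its latest value drops far below its history. Running the same function over every intersection point is what makes the approach automatable; root cause analysis (phase 2) would start from the flagged metrics.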
  • 5:00 - 2 minutes with next – 5:02

    This joke reminded me of the situation at eBay in terms of how to use data. It’s very confusing, just like putting together an IKEA piece of furniture.
  • 5:00 - 2 minutes with previous– 5:02

    Same situation at eBay when it comes to finding/using data
    Transition 1
    In reality it’s much more confusing…also note no instructions
  • 5:02 - 2 minutes with next – 5:04

    Before I talk about how we are transforming analytics at eBay, let me tell you more about our analytic environment.

    At eBay, a lot of what we do – the decisions we make – are driven by data. We have a large and diverse community of analysts, and we have an even larger and diverse community of data consumers – from Executives all the way through to Business users.
  • 5:02 - 2 minutes with previous – 5:04

    We have a diverse user community…
    Transition 1
    Diverse types of individuals that use data in their day-to-day job, from Data Scientists through traditional FP&A, from Site and Business Analysts to the Exec & Business consumer.
    Transition 2
    This diverse community has a diverse set of needs in using data – from one-off analysis to statistical modeling; from A/B experimentation comparisons and deep data mining of unstructured data to standard reports and dashboards using transaction data.
    Transition 3
    And we have a number of tools to enable all of this – from Hadoop to R, SAS & SQL on the large Teradata stores that I have shown, reporting from Excel to visualization tools like Tableau and enterprise class tools like MicroStrategy. But such diversity of Users, needs and tools…
    Transition 4
    …does cause some chaos.

  • 5:04 - 3 minutes – 5:07

    Whaddya mean Chaos?
    Users can have direct SQL access to the biggest data systems on the planet
    They can create their own datasets
    They have enterprise class tools like MicroStrategy
    And slick visualization builders like Tableau and Excel
    Transition 1
    What’s the problem? Why is there a need to “transform analytics”?
    Transition 2
    Well, it is a classic problem that I am sure many of you share or recognize.
    Transition 3
    There are hundreds of files sent via email or squirreled away on SharePoint or shared network drives
    There are thousands of tables built without consideration of reuse, retention or rationalization
    There are over 5,000 reports in MicroStrategy with little capability to know if they are relevant or if the data/thinking is stale
    And there are tens of thousands of workbooks in Tableau that share these same issues
    Transition 4
    So, like I said earlier, at eBay you are expected to “go use data”
    but it is tough to find existing metrics and reports – there are so many
    If you are adept at SQL and finding data, it is easy to build metrics – but you might be adding to the chaos
    And in an unstructured environment, it is tough to know which metrics and reports are the right ones to use, or which ones are old or stale
  • So we need to think differently about solving these big data problems.