Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013

Published in: Technology

  1. Real-time “OLAP” for Big Data (+ use cases). Cosmin Lehene | Adobe. #bigdataro - 30 January 2013. © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
  2. What we needed … and built
     • OLAP Semantics
     • Low Latency Ingestion
     • High Throughput
     • Real-time Query API
  3. “Physical” Building Blocks
  4. Logical Building Blocks
     • Dimensions, Metrics
     • Aggregations
     • Roll-up, drill-down, slicing and dicing, sorting
  5. OLAP 101 – Queries example

     Date        Country  City     OS       Browser  Sale
     2012-05-21  USA      NY       Windows  FF        0.0
     2012-05-21  USA      NY       Windows  FF       10.0
     2012-05-22  USA      SF       OSX      Chrome   25.0
     2012-05-22  Canada   Ontario  Linux    Chrome    0.0
     2012-05-23  USA      Chicago  OSX      Safari   15.0

     Totals:
     • 5 visits, 3 days
     • 2 countries (USA: 4, Canada: 1)
     • 4 cities (NY: 2, SF: 1)
     • 3 OS-es (Win: 2, OSX: 2)
     • 3 browsers (FF: 2, Chrome: 2)
     • 50.0 in sales, from 3 sales
  6. OLAP 101 – Queries example
     • Rolling up to country level:
         SELECT COUNT(visits), SUM(sales) GROUP BY country

         Country  visits  sales
         USA      4       $50
         Canada   1       0
     • “Slice” by browser:
         SELECT COUNT(visits), SUM(sales) GROUP BY country HAVING browser = "FF"

         Country  visits  sales
         USA      2       $10
         Canada   0       0
     • Top browsers by sales:
         SELECT SUM(sales), COUNT(visits) GROUP BY browser ORDER BY sales

         Browser  sales  visits
         Chrome   $25    2
         Safari   $15    1
         FF       $10    2
  7. OLAP – Runtime Aggregation vs. Pre-aggregation
     • Aggregate at runtime
       • Most flexible
       • Fast – scatter gather
       • Space efficient
       • But:
         • I/O, CPU intensive
         • Slow for larger data
         • Low throughput
     • Pre-aggregate
       • Fast
       • Efficient – O(1)
       • High throughput
       • But:
         • More effort to process (latency)
         • Combinatorial explosion (space)
         • No flexibility
  8. SaasBase Map
  9. SaasBase Domain Model Mapping
  10. SaasBase - Domain Model Mapping
  11. SaasBase - Ingestion, Processing, Indexing, Querying
  12. SaasBase - Ingestion, Processing, Indexing, Querying
  13. Ingestion
  14. Ingestion (ETL) throughput vs. latency
      • Historical data (large batches)
        • Optimize for throughput
      • Increments (latest data, smaller)
        • Optimize for latency
  15. Processing
  16. Processing
      • Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality) and generating cubes that can be queried in real time
      • “Super Processor” code running in Storm, Map-Reduce, HBase
  17. Processing for OLAP semantics
      • GROUP BY (process, query)
      • COUNT, SUM, AVG, etc. (process, query)
      • SORT (process, query)
      • HAVING (mostly query, can define pre-process constraints)
  18. SaasBase vs. SQL Views Comparison
  19. Query Engine
      • Always reads indexed, compact data
      • Query parsing
      • Scan strategy
      • Single vs. multiple scans
      • Start/stop rows (prefixes, index positions, etc.)
      • Index selection (volatile indexes with incremental processing)
      • Deserialization
      • Post-aggregation, sorting, fuzzy-sorting etc.
      • Paging
      • Custom dimension/metric class loading
  20. Adobe Business Catalyst
      • Online business presence: e-commerce, marketing, web analytics etc.
      • Use case: Web Analytics (visitors, channels, content, e-commerce, campaigns, etc.)
  21. BC - Workflow
  22. Adobe Business Catalyst - Stats
      • 3 active datacenters
      • Raw data ~6TB (from ~1TB 18 months ago)
      • Visits table: ~1TB each (compressed)
      • OLAP cubes (stats): 49GB – 64GB (compressed)
      • ~30 minutes latency (from actual pageview/sale to chart in UI)
      • 10s – 100s of milliseconds latency for queries
      • ~3000/s max concurrent OLAP queries (actual traffic is much lower)
  23. Adobe Pass for TV Everywhere
      • Authentication & Authorization
      • Single sign-on to Programmer content (e.g. Turner, NBC, Hulu, MTV, etc.) with Cable operator credentials (e.g. Comcast, Dish, etc.)
  24. Adobe Pass – Use Case
      • Analytics use case: Operational metrics (users, devices, latencies, etc.)
      • Real-time ingestion in HBase
      • High Frequency Map Reduce jobs (every 2 minutes)
  25. Adobe Pass - Stats (London Olympics 2012)
      • 67M streams ~ 5.3M hours
      • 1.5M concurrent streams
      • > 7M unique users
      • 1 Technical & Engineering Emmy Award ;)
  26. Adobe Primetime – Real-time Video Analytics
      • Unified video platform (acquisition, transcoding, broadcast, ads, analytics)
  27. Adobe Primetime – Use Case
      • Use Cases:
        • Audience metrics – minutes latency ok
        • Ads metrics – seconds to minutes ok
        • Streaming QoS metrics – seconds must
      • Requirements:
        • Massive throughput (millions of streams, multiple heartbeats every 10 seconds)
        • Low latency (end-to-end)
  29. Conclusions
      • OLAP semantics on a simple data model
      • Data as first class citizen
      • Domain Specific “Language” for Dimensions, Metrics, Aggregations
      • Framework for vertical analytics systems
      • Tunable performance, resource allocation
  30. Thank you!
      Cosmin Lehene, @clehene
      http://hstack.org
  31. Related
      • http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/
      • http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012
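The three “OLAP 101” queries (roll-up by country, slice by browser, top browsers by sales) can be reproduced on the five sample visits with a few lines of Python. This is only an illustrative sketch: the `group_by` helper and the field names are assumptions made for the example, not part of SaasBase or of the talk.

```python
from collections import defaultdict

# The five visits from the "OLAP 101" table: date, country, city, os, browser, sale.
FIELDS = ("date", "country", "city", "os", "browser", "sale")
VISITS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]
ROWS = [dict(zip(FIELDS, visit)) for visit in VISITS]


def group_by(rows, dimension, where=None):
    """Hypothetical helper: {dimension value: (visit count, total sales)}.

    `where` is an optional row predicate, applied before grouping.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        if where is not None and not where(row):
            continue
        cell = totals[row[dimension]]
        cell[0] += 1            # COUNT(visits)
        cell[1] += row["sale"]  # SUM(sales)
    return {key: (count, sales) for key, (count, sales) in totals.items()}


# Roll-up to country level.
rollup = group_by(ROWS, "country")

# "Slice" by browser: only Firefox visits, still grouped by country.
ff_slice = group_by(ROWS, "country", where=lambda r: r["browser"] == "FF")

# Top browsers by sales, highest first.
top_browsers = sorted(
    group_by(ROWS, "browser").items(),
    key=lambda item: item[1][1],
    reverse=True,
)
```

One difference from the slide: a country with no matching visits (Canada in the Firefox slice) is simply absent from the result here, whereas the slide shows it as a row of zeros.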

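The trade-off on the “Runtime Aggregation vs. Pre-aggregation” slide, together with the batch and increment split on the ingestion slide, can be made concrete with a toy cube. The sketch below is an assumption-laden illustration, not the SaasBase implementation: the key layout, the function names and the choice to materialize every subset of dimensions are all made up for the example. It shows why lookups become a single dictionary access while the number of stored keys per row grows as 2^n in the number of dimensions.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("country", "os", "browser")  # assumed dimension set for the example


def cube_keys(row, dimensions=DIMENSIONS):
    """Yield one key per subset of dimensions: 2**len(dimensions) keys per row."""
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            yield tuple((d, row[d]) for d in dims)


def pre_aggregate(rows, cube=None):
    """Fold rows into a cube (key -> summed sales).

    Passing an existing cube adds an increment without reprocessing history.
    """
    cube = Counter() if cube is None else cube
    for row in rows:
        for key in cube_keys(row):
            cube[key] += row["sale"]
    return cube


def cube_sum(cube, **filters):
    """Pre-aggregated query: a single lookup, independent of data size."""
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return cube.get(key, 0.0)


def runtime_sum(rows, **filters):
    """Runtime query: flexible, but scans every row on each call."""
    return sum(r["sale"] for r in rows if all(r[d] == v for d, v in filters.items()))


historical = [
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 0.0},
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 10.0},
    {"country": "USA", "os": "OSX", "browser": "Chrome", "sale": 25.0},
]
increment = [
    {"country": "Canada", "os": "Linux", "browser": "Chrome", "sale": 0.0},
    {"country": "USA", "os": "OSX", "browser": "Safari", "sale": 15.0},
]

cube = pre_aggregate(historical)       # large batch, throughput-oriented
cube = pre_aggregate(increment, cube)  # small increment, latency-oriented
all_rows = historical + increment
```

The “no flexibility” drawback shows up as soon as a query needs a dimension that was not in `DIMENSIONS` (city, for instance): the cube cannot answer it, while `runtime_sum` could if the rows carried that field.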
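The “Query Engine” slide lists start/stop rows, post-aggregation, sorting and paging. HBase keeps rows sorted by row key and a scan takes an inclusive start row and an exclusive stop row, so a prefix query becomes a range scan. The sketch below imitates that with a sorted in-memory list; the `"<country>|<date>"` key layout and all function names are hypothetical, since the slides do not show the actual SaasBase key design.

```python
import bisect

# Hypothetical pre-aggregated cube rows, sorted by row key "<country>|<date>".
CUBE_ROWS = sorted([
    ("Canada|2012-05-22", 0.0),
    ("USA|2012-05-21", 10.0),
    ("USA|2012-05-22", 25.0),
    ("USA|2012-05-23", 15.0),
])
KEYS = [key for key, _ in CUBE_ROWS]


def scan(start_row, stop_row):
    """Return rows with start_row <= key < stop_row, like a start/stop-row scan."""
    lo = bisect.bisect_left(KEYS, start_row)
    hi = bisect.bisect_left(KEYS, stop_row)
    return CUBE_ROWS[lo:hi]


def prefix_scan(prefix):
    """Scan every key starting with `prefix`.

    The stop row is the prefix with its last character incremented.
    """
    stop_row = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return scan(prefix, stop_row)


def page(rows, page_number, page_size):
    """Return one page of an already sorted result."""
    start = page_number * page_size
    return rows[start:start + page_size]


usa_rows = prefix_scan("USA|")                          # single scan, prefix-based
usa_total = sum(sales for _, sales in usa_rows)         # post-aggregation
by_sales = sorted(usa_rows, key=lambda r: r[1], reverse=True)  # sorting
first_page = page(by_sales, 0, 2)                       # paging
```

Because only the matching key range is read, the cost of the query depends on the size of the result range rather than on the size of the whole table.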