Interana
Puree through Trillions of clicks in seconds
Agenda
Behaviour Queries
Fast Data
Ingest
Deployment
Who Am I
Big Data Engineer at Interana
SQL, Cassandra, Redis, Mongo, SOLR and now Interana
jag@interana.com/Github/LinkedIn
If you want to create a big problem, build a database;
if you want to create a huge problem, never delete anything.
Data > Opinion Journey
● Three engineers: Lior Abraham, Bobby Johnson and Ann Johnson
● Scuba at Facebook
● Take it to the masses
● Full-stack solution - UI/API, ingest and storage tiers
Customers
Use Cases
● Dynamic sessions
● Purchase/Ad funnels
● Continuous deployment - A/B testing
● Bot detection
Want answers over TRILLIONS of data points in REAL TIME
What is Behaviour
Concepts: Event Stream
Event - Actor, Behaviour, Time: at time T, Jack makes a purchase
Session - Events between login and logout, or bounded by inactive time
Cohort - Male, 25, California, In-N-Out Burger
Funnel - Click on Ad => Viewed Item => Add to Cart => Made Purchase => Satisfaction
Metric - A column computed by an equation over existing columns; storageless
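The Session concept can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not Interana's implementation: the `Event` type, the "Login"/"Logout" behaviour names and the 30-minute inactivity gap are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    actor: str       # who is being followed, e.g. a user (hypothetical type)
    ts: int          # event time in seconds
    behaviour: str   # what the actor did

def sessionize(events, gap_seconds=30 * 60):
    """Group each actor's time-ordered events into sessions.

    A new session starts at the actor's first event, at a "Login", right after
    a "Logout", or when more than gap_seconds pass between consecutive events.
    Returns {actor: [[Event, ...], ...]}.
    """
    sessions = {}
    previous = {}  # actor -> that actor's most recent event
    for ev in sorted(events, key=lambda e: (e.actor, e.ts)):
        prev = previous.get(ev.actor)
        if (
            prev is None
            or ev.behaviour == "Login"
            or prev.behaviour == "Logout"
            or ev.ts - prev.ts > gap_seconds
        ):
            sessions.setdefault(ev.actor, []).append([])  # open a new session
        sessions[ev.actor][-1].append(ev)
        previous[ev.actor] = ev
    return sessions

events = [
    Event("jack", 0, "Login"),
    Event("jack", 60, "Clicked on Ad"),
    Event("jack", 120, "Logout"),
    Event("jack", 5_000, "Clicked on Ad"),  # after a logout: new session
    Event("jill", 10, "Clicked on Ad"),
    Event("jill", 4_000, "Added To Cart"),  # over 30 minutes idle: new session
]
for actor, actor_sessions in sessionize(events).items():
    print(actor, [len(s) for s in actor_sessions])
```

Run as-is, it prints `jack [3, 1]` and `jill [1, 1]`: Jack's logout closes his first session, and Jill's two events are more than 30 minutes apart. In this sketch sessions are derived from the raw event stream when queried, so changing `gap_seconds` changes the sessions without rewriting any stored data.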
Time-Ordered Event Data
Timestamp   User (SK)   Ad_id (SK)   Behaviour       Is_Alcoholic (DM)
July 1st    Jack        Beer         Clicked on Ad   True
July 1st    Jill        Juice        Clicked on Ad   False
July 2nd    Jack                     Added To Cart
July 3rd    Jack                     Purchase
July 5th    Jill                     Added To Cart
July 10th   Jill                     Logged Out
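A funnel over the event table above can be sketched as follows. This is a minimal illustration, assuming an actor counts toward a step only after completing the previous step in time order; the steps used are the three behaviours that appear in the table (the full funnel on the concepts slide also includes Viewed Item and Satisfaction), and the `funnel` helper and the day-number encoding are hypothetical.

```python
def funnel(events, steps):
    """Count how many actors reach each funnel step, in order.

    events: iterable of (day, actor, behaviour) tuples.
    steps:  ordered list of behaviours. An actor reaches step k only after
            completing steps 0..k-1; unrelated events in between are ignored.
    Returns a list whose k-th entry is the number of actors reaching step k.
    """
    progress = {}  # actor -> number of steps completed so far
    for _day, actor, behaviour in sorted(events, key=lambda e: e[0]):
        done = progress.get(actor, 0)
        if done < len(steps) and behaviour == steps[done]:
            progress[actor] = done + 1
    return [sum(1 for n in progress.values() if n > k) for k in range(len(steps))]

# Rows from the time-ordered event table (July N encoded as day N).
events = [
    (1, "Jack", "Clicked on Ad"),
    (1, "Jill", "Clicked on Ad"),
    (2, "Jack", "Added To Cart"),
    (3, "Jack", "Purchase"),
    (5, "Jill", "Added To Cart"),
    (10, "Jill", "Logged Out"),
]
steps = ["Clicked on Ad", "Added To Cart", "Purchase"]
print(funnel(events, steps))  # [2, 2, 1]
```

Jack completes all three steps while Jill drops off after Added To Cart, which is the kind of drop-off a funnel view is meant to expose.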
Performance, Performance, Performance
● Columnar store designed for fast scanning
● C++/MMAP
● Pipelining
● Compression
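The slide names a columnar layout and compression without giving details. As an assumption, the sketch below uses dictionary encoding, one common columnar compression technique, to show why a column store scans fast: a filtered count touches only the two columns it needs and compares small integer codes instead of strings. Plain Python lists stand in for memory-mapped column files; `DictColumn` and `count_where` are hypothetical names.

```python
from collections import Counter

class DictColumn:
    """One column stored as small integer codes plus a dictionary of distinct values."""

    def __init__(self, values):
        self.dictionary = []  # code -> value
        self._index = {}      # value -> code
        self.codes = []
        for v in values:
            if v not in self._index:
                self._index[v] = len(self.dictionary)
                self.dictionary.append(v)
            self.codes.append(self._index[v])

    def code_of(self, value):
        return self._index.get(value)  # None if the value never occurs

def count_where(filter_col, value, group_col):
    """COUNT(*) WHERE filter_col = value GROUP BY group_col.

    Only the two referenced columns are scanned, and the filter compares
    integer codes rather than strings.
    """
    target = filter_col.code_of(value)
    counts = Counter()
    if target is None:
        return counts
    for f, g in zip(filter_col.codes, group_col.codes):
        if f == target:
            counts[group_col.dictionary[g]] += 1
    return counts

user = DictColumn(["Jack", "Jill", "Jack", "Jack", "Jill", "Jill"])
behaviour = DictColumn(["Clicked on Ad", "Clicked on Ad", "Added To Cart",
                        "Purchase", "Added To Cart", "Logged Out"])
print(dict(count_where(behaviour, "Added To Cart", user)))  # {'Jack': 1, 'Jill': 1}
```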
I/O - keep it near the core
Sampling
● Lies, Damn Lies and Sampling
● Sampling takes advantage of the SK <-> actor relationship
● Confidence depends on the shape of the distributions
● Sample rate is key; sparse data gets tricky across hundreds of shards
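Sampling on the shard key rather than on individual rows can be sketched like this. It is a minimal illustration, assuming the actor ID is the shard key and that an actor is kept or dropped by hashing its ID; the hash choice and the function names are hypothetical, not Interana's documented scheme. Because the decision depends only on the actor, all of a kept actor's events stay together, and a count is estimated by scaling by 1/rate.

```python
import hashlib

def in_sample(actor_id: str, rate: float) -> bool:
    """Keep an actor with probability ~rate, deterministically, by hashing its ID.

    All events of a kept actor are kept together, so sessions and funnels
    are not cut in half by the sample.
    """
    digest = hashlib.sha1(actor_id.encode("utf-8")).digest()
    position = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # uniform in [0, 1)
    return position < rate

def estimate_count(events, rate: float) -> float:
    """Estimate COUNT(*) by counting events of sampled actors and scaling by 1/rate."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    sampled = sum(1 for actor, _behaviour in events if in_sample(actor, rate))
    return sampled / rate

# 10,000 actors with one event each: the estimate should land near 10,000.
events = [(f"user{i}", "click") for i in range(10_000)]
print(estimate_count(events, rate=0.1))
```

With many similar actors the scaled estimate is close to the true count; when a few actors dominate (a heavy tail) the same scaling becomes unreliable, which is why confidence depends on the shape of the distribution.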
Sampling
Ingest
● Schemaless, evolving organically
● Transformers and pipelines
● Pull (S3, Blob, FS) or push (HTTP)
● Dedupe- and replay-friendly
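One way to make schemaless ingest dedupe- and replay-friendly is to key every record by a deterministic hash and skip keys already stored. The sketch below assumes that approach: hashing the canonical JSON form of a record is a hypothetical choice (a real pipeline might key on an event ID or a source/offset pair), and `IdempotentIngest` is an illustrative name, not an Interana API.

```python
import hashlib
import json

class IdempotentIngest:
    """Append-only, schemaless ingest that ignores records it has already stored.

    Records are keyed by a SHA-256 hash of their canonical JSON form, so
    re-pulling the same file or replaying a pushed batch is harmless.
    """

    def __init__(self):
        self.seen = set()
        self.rows = []

    @staticmethod
    def key(record: dict) -> str:
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def ingest(self, records, transform=None) -> int:
        """Optionally transform each record, append unseen ones, return the number added."""
        added = 0
        for raw in records:
            record = transform(raw) if transform else raw
            k = self.key(record)
            if k in self.seen:
                continue  # duplicate: already ingested on an earlier pull or replay
            self.seen.add(k)
            self.rows.append(record)
            added += 1
        return added

# Records need not share a schema; new fields simply appear.
batch = [
    {"ts": 1412210780000, "userId": 130065, "page": "NextSong"},
    {"ts": 1412210790000, "userId": 130066, "page": "Home", "level": "free"},
]
store = IdempotentIngest()
print(store.ingest(batch))  # 2
print(store.ingest(batch))  # 0 (replay of the same batch)
```

The optional `transform` hook stands in for a transformer stage; as long as it is deterministic, replayed records hash to the same key and are still dropped.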
Columnar Format
Operational Approach
● Managed Service - Hosted in AWS/Azure environments
● Coming Soon - Container-based solution (AMI), self-serve import as well
● Performance is critical - SATA vs SSD, tiered disks, RAM
● Redundancy - currently no live redundancy; use backup and playback
The Cluster
Typical Performance
Query              Sampled   Unsampled   Sampled/Tiered
Count *            2 s       11 s        20 s
Group/Filter       2 s       17 s        10 s
Session duration   5 s       45 s        10 s
Funnel             10 s      60 s        20 s
Ex. AWS - 32 x i2.xlarge - 800 GB SSD + 1.6 TB tiered per node = 70 TB of storage
2 trillion rows x 1000 columns
Import (peak)
Throughput   500 MB/s
Lines        10 M rows/min
Latency      5-20 minutes
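The sizing figures on this slide can be cross-checked with a little arithmetic. Only the inputs come from the slide; the derived numbers are illustrative. The raw sum of 32 x (800 GB + 1.6 TB) is 76.8 TB, somewhat above the quoted 70 TB; the slide does not say why (rounding or usable-versus-raw capacity are plausible, but that is a guess). The implied row size also assumes decimal megabytes and that the peak byte rate and peak row rate describe the same workload.

```python
# Inputs copied from the slide; the rest is derived arithmetic.
NODES = 32
SSD_GB_PER_NODE = 800        # i2.xlarge local SSD
TIERED_GB_PER_NODE = 1_600   # 1.6 TB tiered storage per node
PEAK_MB_PER_S = 500          # peak import throughput
PEAK_ROWS_PER_MIN = 10_000_000
TOTAL_ROWS = 2_000_000_000_000

# Raw capacity: every node contributes SSD + tiered storage.
raw_gb = NODES * (SSD_GB_PER_NODE + TIERED_GB_PER_NODE)

# Average row size implied by the two peak figures (decimal MB assumed).
bytes_per_row = PEAK_MB_PER_S * 60 * 1_000_000 / PEAK_ROWS_PER_MIN

# Time to import the full 2-trillion-row data set at the peak row rate.
days_to_load = TOTAL_ROWS / PEAK_ROWS_PER_MIN / 1_440

print(f"raw capacity: {raw_gb / 1000} TB")             # 76.8 TB
print(f"implied row size: {bytes_per_row:.0f} bytes")  # 3000 bytes
print(f"load time at peak: {days_to_load:.0f} days")   # 139 days
```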
Explorer
Music Data Set - 4B Rows
ts: 1412210780000
userId (sk): 130065
sessionId (sk): 999FAFD51ASD
artist: Audioslave
auth: Logged In
lastName: Brown
level: free
page: NextSong
song: Gasoline
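The sample record's `ts` value appears to be a Unix epoch timestamp in milliseconds (an inference from its magnitude, not stated on the slide). A short Python sketch that holds the record as a dictionary, converts `ts` to UTC and builds a shard-key tuple from the two columns marked (sk); `SHARD_KEYS`, `event_time` and `shard_key` are hypothetical names.

```python
from datetime import datetime, timezone

record = {
    "ts": 1412210780000,
    "userId": 130065,
    "sessionId": "999FAFD51ASD",
    "artist": "Audioslave",
    "auth": "Logged In",
    "lastName": "Brown",
    "level": "free",
    "page": "NextSong",
    "song": "Gasoline",
}

SHARD_KEYS = ("userId", "sessionId")  # the columns marked (sk) on the slide

def event_time(rec):
    """Convert the millisecond epoch timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(rec["ts"] / 1000, tz=timezone.utc)

def shard_key(rec):
    """Tuple of shard-key values for this event."""
    return tuple(rec[k] for k in SHARD_KEYS)

print(event_time(record).isoformat())  # 2014-10-02T00:46:20+00:00
print(shard_key(record))               # (130065, '999FAFD51ASD')
```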
Demo
● Music Data Set - 4B rows
● Dashboards
● Explorer
● Funnels
Thanks
For more tech advice, visit our blog:
https://www.interana.com/blog/

Speaker Notes

  1. Analytics engine
  2. Lingua franca - analytics engine
  3. Started working with event data 15 years ago, representing satellite ground events in a finite state machine. It could fit into a single database (10M rows). We now generate 10B rows a day, so those 10M rows arrive every 1/1000 of a day (about 86 seconds).
  4. High-dimensional. Know the question. Naming things and cache invalidation. How many people have worked in an environment where people use your software, or want to? How many people have made feature decisions based on a manager's decision? And how many have used data to drive their product decisions?
  5. Came out of stealth. Billions to trillions. Startups to enterprise. Consumer to B2B.
  6. We do both! Unauthorized application: took funnel users for devices. Sonos funnel, 3 steps: first an attempt at an action, then an error, then the action. Funnel drop-off. Drill down into the cohort, matrix. User ID and event counts, different types of errors.
  7. Circle: follow the link. It is sparse. It is event-based (time-varying vs. time series). Questions:
  8. The actor is the thing you are following. Most systems have only one or two of these. The lingua franca of clickstreams/event streams.
  9. Sessions are generated dynamically depending on the state. The highlighted rows are sessions; they turn a set of events into a shard key. Funnels go across sessions. They describe the population as a whole.
  10. Like Cassandra: mmap/malloc. Pipelining: using caches, keeping data and instructions local. Compression: focus on streaming data compression vs. block-level compression.
  11. Real time, near time and batch time.
  12. Why does it matter? Sampling impacts the answer, but you save 10x the time producing it. Google example. Directionally correct. Power-law vs. uniform distribution. Distribution of counts (clicks). 100 shards, 100 users. Uniform across shards (i.e. the data is evenly distributed). Outliers. Unsampled. Number of actors in the result. Density: actors/shards. Summing when the distribution is too wide. Tail distribution / Pareto distribution: heavy-tailed. In those cases we alert, and the query takes the full unsampled amount of time. Quality distribution. Confidence measure. Shard key. Effectively high availability. Dirac delta: one user; scaling a delta distribution. Count * events. Delta on the check engine light: one user. Hackers' behaviour: get 0 back. Check engine light. With a Dirac delta, scaling has a big impact.
  13. Ensure all distributions are similar, i.e. uniformly distributed across shards. Sharding.
  14. Default: continuous temporal window over time. Time-ordered append writes. Ask during the demo: how many have datasets on these backing stores? How much data: gigabytes, terabytes, petabytes?
  15. Question: where is your data housed? Import/export vs. ingest. Formatting. Other things that systems struggle with. File/stream/custom protocol (Kafka, et al.). More comfortable with Unix command-line tools.
  16. How many people use Amazon Marketplace? How many use a virtual solution (.hvi)? How many just use a plain old tarball?
  17. https://demo.interana.com/?dashboard=dashboard-test-2&name=Music%20Dashboard. 1440 minutes a day
  18. Demographic, flexible behaviour-driven cohorts. The music data.
  19. Demographic (old-school metal), flexible behaviour-driven cohorts.
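Note 12's Dirac-delta case (one user produces every event of interest, so a sampled query either returns 0 or a badly scaled number) can be illustrated with a small sketch. The hash-based actor sampling and the `min_actors` alert threshold are assumptions for illustration, not Interana's documented behaviour.

```python
import hashlib

def in_sample(actor_id: str, rate: float) -> bool:
    # Hash-based actor (shard-key) sampling; an assumed scheme for illustration.
    digest = hashlib.sha1(actor_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53 < rate

def sampled_count(events, rate, min_actors=30):
    """Return (scaled COUNT(*) estimate, low_confidence flag).

    min_actors is a hypothetical threshold: when fewer distinct actors
    contribute to the sample, the scaled estimate is flagged as unreliable.
    """
    kept = [(actor, b) for actor, b in events if in_sample(actor, rate)]
    distinct_actors = {actor for actor, _ in kept}
    return len(kept) / rate, len(distinct_actors) < min_actors

# Dirac-delta-like data: one actor produces every "check engine light" event.
delta = [("user42", "check_engine_light")] * 500
estimate, low_confidence = sampled_count(delta, rate=0.1)
# estimate is 0.0 if user42 falls outside the sample, or 5000.0 (500 scaled by 10x)
# if it falls inside; neither equals the true count of 500.
print(estimate, low_confidence)
```

In both outcomes the flag is raised, which is the point at which a system could alert or fall back to an unsampled query.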