Discontinuities Demo
Shrayes Ramesh, PhD.
Data Tactics Corporation

Thursday, November 14, 13
Challenges
• How do you decide which variables capture
what happened?
• /When/ did an event happen?
• What is the effect of the event on the variables?
• Can we construct a UI and algorithm to tackle
all three problems simultaneously?

Goals
• The goal is to feed in raw data as the sole
input, and obtain answers to all three
questions:
• (1) when did an event likely occur
• (2) what variables can we use to measure the
event
• (3) what was the effect of the event on those
variables

Simple example
[Figure: variable outcome plotted against time, with the event and its effect marked.]

Scaled up

With limited insight…
• If we know the timing and the important
variables, we can measure the effect of the
shock on the variables (standard regression
techniques).
• If we know the set of important variables and
track them over time, we can identify the
timing of shocks.
• If we know the timing and have a long history of
variable evolution, we can cluster variables by
their behavior at the important point in time
(relative to other points in time).

Methodology
[Figure: estimated effect, with the correct effect and the correct timing marked.]

• For every time T and variable K, run an OLS regression under the hypothesis that a shock
occurred at time T to variable K.
• The sample is restricted to observations in a neighborhood around T, i.e. t in
[T - bandwidth, T + bandwidth]:
    Y(K,t) = A(K,T) + B(K,T) S(t) + e(K,t)
  where S(t) = 1(t > T) is an indicator and T is the candidate break time being tested.
• Results are stored as the matrix of coefficients B(K,T).
• OLS estimates of B(K,T) are biased towards zero to the extent that S(t) is
misspecified.
• In other words, B(K,T) will be maximally different from zero (and unbiased) at the
true break T.
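
The scan described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the demo: the function name break_scan, the array layout (variables in rows, periods in columns) and the choice to leave NaN where the window does not fit are all assumptions. It relies on the fact that, with an intercept and a single regressor, the OLS slope is the covariance of S and Y divided by the variance of S.

```python
import numpy as np

def break_scan(Y, bandwidth):
    """Return B with B[k, T] = OLS slope on the step dummy S(t) = 1(t > T).

    Y is a (variables x periods) array. Entries of B stay NaN where the
    window [T - bandwidth, T + bandwidth] does not fit inside the sample.
    Hypothetical helper for illustration; bandwidth must be at least 1.
    """
    n_vars, n_periods = Y.shape
    B = np.full((n_vars, n_periods), np.nan)
    offsets = np.arange(-bandwidth, bandwidth + 1)
    S = (offsets > 0).astype(float)        # step dummy inside the window
    S_centered = S - S.mean()
    denom = (S_centered ** 2).sum()
    for T in range(bandwidth, n_periods - bandwidth):
        window = Y[:, T - bandwidth : T + bandwidth + 1]
        # Slope of Y on S with an intercept: cov(S, Y) / var(S),
        # computed for every variable at once.
        B[:, T] = window @ S_centered / denom
    return B
```

Because S is binary, each entry of B is simply the mean of the variable after T minus its mean up to and including T, within the window.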

Methodology

Answers
• when did an event likely occur?
– aggregate (sum) effects across all variables

• what variables can we use to measure the
event?
– which variables had the largest effect at that time point?

• what was the effect of the event on those
variables?
– we just measured that

• what variables move together often across
time?
– show similar variables
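
The first three answers can be read off a coefficient matrix B(K,T) roughly as follows. This is a sketch under assumptions: the function name summarize_breaks and the top_n parameter are hypothetical, and summing absolute effects (so that positive and negative jumps do not cancel) is one possible reading of "aggregate (sum)"; the slides do not say which form of sum was used.

```python
import numpy as np

def summarize_breaks(B, top_n=5):
    """Pick the most likely event time and the variables that moved most.

    B is a (variables x periods) matrix of estimated effects; NaN entries
    (periods that were not tested) are ignored when aggregating.
    """
    # Assumption: aggregate with absolute values so opposite-signed
    # effects do not cancel each other out.
    total_effect = np.nansum(np.abs(B), axis=0)
    event_time = int(np.argmax(total_effect))
    effects_at_event = B[:, event_time]
    # Variables ordered by the size of their effect at the event time.
    order = np.argsort(-np.abs(effects_at_event))[:top_n]
    return event_time, order, effects_at_event[order]
```

The returned effects answer the third question directly, since they are the coefficients already estimated at the chosen time.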
Example 1: Super Bowl tweets
• Twitter streaming API (every tweet)
• Sample of data selected from Sunday,
February 3, 1600-2210 hours
• Binned into minute-by-minute word counts
• Out of 651k 1-grams, kept the 1035 least sparse
words (> 30% sparse).
• Input data is a 371 x 1035 matrix (minutes x words)
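
The binning and filtering steps above might look like the following sketch. Everything specific here is an assumption made for illustration: the input format (pairs of minute index and tweet text), the whitespace tokenizer, and measuring sparsity as the share of minutes in which a word never appears. The function name minute_word_matrix is hypothetical.

```python
from collections import Counter
import numpy as np

def minute_word_matrix(tweets, n_minutes, n_keep):
    """Build a (minutes x words) count matrix from (minute_index, text) pairs."""
    counts = [Counter() for _ in range(n_minutes)]
    for minute, text in tweets:
        # Naive lowercase/whitespace tokenizer (assumption, not from the slides).
        counts[minute].update(text.lower().split())
    # Number of minutes in which each word appears at least once.
    minutes_present = Counter()
    for c in counts:
        minutes_present.update(c.keys())
    # Keep the n_keep least sparse words; ties are broken alphabetically.
    vocab = sorted(minutes_present, key=lambda w: (-minutes_present[w], w))[:n_keep]
    matrix = np.array([[c[w] for w in vocab] for c in counts], dtype=float)
    return matrix, vocab
```

With n_minutes=371 and n_keep=1035 the result would have the 371 x 1035 shape quoted above.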

SUPERBOWL SHINY

Network graph of variables with correlations > .95
[Figure: network graph; labeled regions: Halftime show, Power outage.]
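
A thresholded correlation graph of this kind can be built as an edge list, sketched below. The slides do not say whether the correlations were taken over the raw series or over the estimated effects, so the input X here is simply any (periods x variables) array; the function name correlation_edges is hypothetical.

```python
import numpy as np

def correlation_edges(X, names, threshold):
    """Return (name_i, name_j, correlation) for pairs above the threshold.

    X is a (periods x variables) array and names labels its columns.
    """
    corr = np.corrcoef(X, rowvar=False)          # variables are columns
    i_idx, j_idx = np.triu_indices(len(names), k=1)
    return [(names[i], names[j], float(corr[i, j]))
            for i, j in zip(i_idx, j_idx)
            if corr[i, j] > threshold]
```

The resulting list can be passed to any graph library for layout; a threshold of 0.95 corresponds to the graph above.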

Deployable and Repeatable
• The model only requires data to be
transformed to a KxT matrix.
– K variables
– T time periods

We could use this model on many other data sets!
• minute-by-minute word count in twitter
• stock prices
• chatter on social media forums

Network graph of forums with correlations > .27
[Figure: network graph; labeled: Hezbollah.]

Future improvements
• OLS is simple and efficient, but other models
may estimate effects more accurately in
some cases.
• Explore different approaches to choosing
which variables to consider and to
aggregating variable effects.
• Run massively parallel on all 630k words
simultaneously?
• Real-time analytics on streaming data.
