Detecting a Hacked Tweet with Machine Learning (5 Minute Presentation)


This is part of a presentation for QCon New York 2015. On April 23, 2013, the stock market experienced one of its biggest flash-crash drops of the year, with the Dow Jones Industrial Average falling 143 points (over 1%) in a matter of minutes. Unlike the 2012 stock market blip, this one wasn't caused by an individual trade, but by a single tweet from The Associated Press account on Twitter. The tweet, of course, wasn't written by AP, but by an impostor who had temporarily gained control of the account. Could a computer program have detected the tweet as hacked? In this presentation, we'll discuss how machine learning was used to classify tweets as authored by The Associated Press or not. As a final test, the program was run on the hacked tweet, and we'll reveal whether it correctly classified the tweet as authentic or hacked. Full article: http://www.primaryobjects.com/cms/article158


DETECTING A HACKED
TWEET
with Machine Learning and Artificial Intelligence
Sponsored by
Kory Becker 2015
http://primaryobjects.com/cms/article158
http://linkedin.com/in/korybecker
http://twitter.com/primaryobjects

ALL YOUR DATA ARE BELONG TO US
• Accord.NET SVM, Tried Gaussian (96%), then linear (97%) kernel
• Extract Tweets with TweetSharp
• Create Document Corpus (6,054 tweets)
• Create Vocabulary (2,225 words)
• Digitize Corpus
• Porter-Stemmer (“talking” => “talk”, “explosion” => “explos”)
• Term Frequency Inverse Document Frequency (TF*IDF)
• Word Existence
• Vector Size = Vocabulary Size | Matrix = double[6054][2225]
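
The slide names TF*IDF without giving a formula. As a rough sketch, here is one common textbook variant (raw term count multiplied by log(N / document frequency)) in Python. The toy documents, the `tf_idf` function name, and the choice of variant are all assumptions for illustration; the weighting actually used in the project is not stated on the slide.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Uses tf = raw count in the document and
    idf = log(N / df), one common textbook variant (an assumption here).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical, already-stemmed toy tweets (not from the real corpus).
docs = [
    ["break", "explos", "white", "hous"],
    ["break", "market", "drop"],
    ["market", "ralli", "market"],
]
weights = tf_idf(docs)
```

With this variant, a term that appears in only one document (such as "explos") gets a higher weight than a term shared by several documents (such as "break"), and a term that appears in every document would get a weight of zero.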

ACCURACY
100% TRAINING
97.38% CV
96.23% TEST
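
Each of the three figures is an accuracy measured on a separate subset of the corpus. A minimal sketch of a three-way split and an accuracy calculation follows. The 60/20/20 proportions, the fixed seed, and the function names are assumptions; the slides do not say how the 6,054 tweets were divided.

```python
import random

def three_way_split(items, train=0.6, cv=0.2, seed=0):
    """Shuffle and split into (training, cross-validation, test) lists.

    The 60/20/20 proportions are an assumption; the talk does not give them.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_cv = int(len(shuffled) * cv)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Tweet indices stand in for the tweets themselves.
train_set, cv_set, test_set = three_way_split(range(6054))
print(len(train_set), len(cv_set), len(test_set))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 3 of 4 correct -> 0.75
```

Keeping the test set untouched until the end is what makes the last figure an estimate of performance on unseen tweets.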

CONCLUSION
Kory Becker
http://linkedin.com/in/korybecker
http://twitter.com/primaryobjects
Detecting a Hacked Tweet with Machine Learning
http://primaryobjects.com/CMS/Article158
An Intelligent Approach to Image Classification By Color
http://primaryobjects.com/CMS/Article154
Self-Programming Artificial Intelligence
http://primaryobjects.com/CMS/Article149


2. APRIL 23, 2013 1:15PM 143 POINT DROP

Editor's Notes

1. Introduction My name is Kory Becker. I'm a Software Architect at The Associated Press. I develop web applications by day, and have a fascination with artificial intelligence. If you like, you can follow the (short) slides for this talk at slideshare.net/korybecker.
2. What? On April 23, 2013, the stock market experienced one of its biggest flash-crash drops of the year, with the Dow Jones Industrial Average falling 143 points (over 1%) in a matter of minutes. Unlike the 2012 stock market blip, this one wasn't caused by an individual trade, but by a single tweet from The Associated Press account on Twitter. The tweet, of course, wasn't written by AP, but by an impostor who had temporarily gained control of the account (the Syrian Electronic Army claimed responsibility). Could a computer program have detected the tweet as hacked? The tweet was "Breaking: Two Explosions in the White House and Barack Obama is injured". Now, there are a couple of specific characteristics about the text in question. The term "Breaking" has the wrong casing for an AP tweet; AP would usually write it in all capitals. The combination of "White House" + "and" + "Barack Obama" is rare. Maybe a computer could pick up on this? So, what did we do?
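
The casing observation can be expressed as a simple hand-written check. This is only an illustrative sketch, not the classifier from the talk: the function names and the "text before the first colon" rule are assumptions, and the notes do not say whether the project's tokenizer preserved letter case at all.

```python
def leading_label(tweet):
    """Return the text before the first colon, or None if there is no colon."""
    head, sep, _ = tweet.partition(":")
    return head.strip() if sep else None

def label_is_all_caps(tweet):
    """True when the tweet opens with an all-uppercase label such as 'BREAKING:'."""
    label = leading_label(tweet)
    return label is not None and label.isupper()

hacked = "Breaking: Two Explosions in the White House and Barack Obama is injured"
print(label_is_all_caps(hacked))                             # False
print(label_is_all_caps("BREAKING: hypothetical headline"))  # True
```

A single rule like this is brittle, which is one reason to let a learning algorithm weigh many word-level signals instead.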
3. How? The idea was to write a program using artificial intelligence: specifically, a machine learning algorithm with supervised learning. The computer would be given a list of tweets and told whether each one is real or fake. It can then learn the common terms in each category and (hopefully) figure out how to detect the hacked tweet. Using the Accord.NET machine learning library, I started by implementing a support vector machine (SVM) with a Gaussian kernel. SVMs work with different kernels, and a Gaussian kernel allows fitting data points in a variety of non-linear shapes (round, curvy, etc.). I extracted tweets using the TweetSharp library. I created a document corpus of about 6,000 tweets and a vocabulary of about 2,000 words. The documents were digitized by tokenizing the tweets, running the Porter stemmer to shorten words, and then creating a bag-of-words model. Each tweet's unique terms were added to the vocabulary. Then you loop through each tweet and, for each word in the vocabulary, mark a 1 in that tweet's array if the tweet contains the word and a 0 if it doesn't. You end up with an array of 1's and 0's for each tweet. This is perfect for training a machine learning program. To train and test the accuracy, the tweets were split into training, cross-validation, and test sets. The computer uses the training set to learn, seeing which tweets it classifies right or wrong and fine-tuning its model. It then runs against the cross-validation set to see how it does on tweets that it hasn't trained on. So, what were the results?
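
The word-existence encoding described in this note can be sketched in a few lines of Python. The project itself used C# with Accord.NET and a Porter stemmer; this sketch skips stemming entirely, uses a simplified regex tokenizer, and runs on two made-up tweets, so every name and input below is an assumption for illustration.

```python
import re

def tokenize(text):
    """Lowercase and split into alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(tweets):
    """Map each unique token to a column index, in first-seen order."""
    vocabulary = {}
    for tweet in tweets:
        for token in tokenize(tweet):
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary

def digitize(tweet, vocabulary):
    """Return a 0/1 vector with one slot per vocabulary word."""
    vector = [0] * len(vocabulary)
    for token in tokenize(tweet):
        index = vocabulary.get(token)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] = 1
    return vector

# Hypothetical tweets, not taken from the real corpus.
tweets = ["Markets drop after false report", "False alarm: markets recover"]
vocabulary = build_vocabulary(tweets)
matrix = [digitize(t, vocabulary) for t in tweets]
```

Each row of `matrix` has as many entries as the vocabulary has words, which is how a corpus of 6,054 tweets and 2,225 words would yield a 6054 x 2225 matrix.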
4. Result? The Gaussian kernel did pretty well. It scored 99.7% accuracy on the training set and 96% on the cross-validation set. The SVM was then switched to a linear kernel, which bumped the accuracy up to 100% on training and 97% on cross-validation. OK, but did it detect the hacked tweet? The initial training set contained random tweets from AP and non-AP Twitter accounts. The model correctly classified AP tweets, but failed on the particular hacked tweet. I fed the training set additional tweets, using searches such as "-from:AP obama" and "-from:AP breaking", so it had knowledge of the actual topic. And what do you know, it worked!
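
The two kernels compared in this note differ in how they score the similarity of two tweet vectors. The sketch below shows the standard textbook definitions in Python; it is not Accord.NET code, and the `sigma` default and the example vectors are assumptions.

```python
import math

def linear_kernel(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    """exp(-||x - y||^2 / (2 * sigma^2)); sigma is a free parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))

# Two hypothetical word-existence vectors over a five-word vocabulary.
a = [1, 0, 1, 1, 0]
b = [1, 1, 0, 1, 0]
print(linear_kernel(a, b))    # 2: the tweets share two vocabulary words
print(gaussian_kernel(a, b))  # exp(-1), about 0.368
```

On 0/1 vectors the linear kernel simply counts shared words, while the Gaussian kernel decays with the number of words on which the two tweets differ.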
5. Conclusion There are a lot more details in this project, including some cool learning curve charts and examples of tweets being classified. You can read my full article at http://www.primaryobjects.com/cms/article158 (the top link in the last slide). There are some code samples for setting up the SVM and you can even download the test set results. If you're curious about artificial intelligence, I also have some other interesting articles, including Self-Programming Artificial Intelligence (the last link in the slide), where a computer program uses genetic algorithms to successfully write its own computer programs. Scary stuff! In conclusion, my name is Kory Becker. Feel free to chat if you have any questions or connect online via @primaryobjects on Twitter or Kory Becker on LinkedIn. Thanks.