SlideShare a Scribd company logo
1 of 9
Download to read offline
© 2013 Sqor, Inc.
Sqor
Using R For Social Media and Sports Data
Athletes SuccessData
Noah Gift: CTO @ Sqor
© 2013 Sqor, Inc.
What is Sqor?

•  Social Network hyper-focused on enhancing fan/athlete relationships. We only do Sports!: Now
•  Marketplace for athletes to build and market their digital brand: Now
•  Social Analytics and Prediction Engine as a Service: Q1 2015
•  Micro-endorsement platform: Q1 2015
•  Crowdfunding for athletes: Now
•  Game platform: First Homegrown game featuring Brett Favre: Now
•  Cross-Social Network Publishing Platform: Facebook, Twitter, Embeddable posts.: Now
•  Website, Android App, and iOS App:
© 2013 Sqor, Inc.
Key Aspects of Data Pipeline
•  Multiple languages involved: Python, R, Erlang, C#, SQL and Javascript.
•  Multiple persistence options: SQL Server (RDS), Riak (No SQL), CSV Files, Mnesia (Distributed Soft Realtime
DB)
•  RabbitMQ and Erlang handle messaging and job communication
•  Easy to debug: daily and nightly scripts, intermediate CSV files, deep storage in K/V store and reports live in
RDS.
•  R is used exclusively for machine learning and statistics (Although recommendation engine v1 was written in
Python. We are going to replace it with R/Erlang code though)
© 2013 Sqor, Inc.
Things They Don’t Tell You Building A Data Pipeline 
From Scratch (Our you should have paid attention to) 
•  Getting the data in the right format and making sure it is accurate is back breaking work. It truly is horrible.
•  Keeping track of model prediction accuracy over time: both with new data and new models is really important
•  Non-linear regression is non-trivial
•  Automation and debuggability of every step is very important. Think Unix Tools
•  Expensive, exotic solutions sometimes aren’t worth it at first…or maybe ever. Weird databases, etc.
•  Making predictions involving real money with limited data is scary and really hard. If your not scared about this,
you should be.)
© 2013 Sqor, Inc.
Predicting Top Athletic Performers in Social Media
•  Sqor finds influential athletes and collaborates with them using our prediction algorithms
© 2013 Sqor, Inc.
Our Prediction Algorithms Appear To Work
•  Or we got really lucky….
© 2013 Sqor, Inc.
Clustering
•  We use R clustering packages for classification, visualization of patterns and diagnostics for predictions
© 2013 Sqor, Inc.
Clustering
•  We use kNN clustering for NBA and MLB Sports. Plan on expanding this further in the near future.
© 2013 Sqor, Inc.
Erlang/R Bridge
•  Sqor is a heavy user of Erlang
•  We like Erlang because it has unique concurrency abilities and high uptime (and also because I had a lot of
bosses who told me I couldn’t use).
•  ➜ ~ curl -v -X PUT -H 'content-type: application/json' http://127.0.0.1:8080/api/script/foo -d
'{"script":"execute <- function (A) { A * 2 }", "docs":"this doubles stuff"}'
•  ➜ ~ curl -v http://127.0.0.1:8080/api/script/foo -X POST -H 'content-type: application/json' -d '[25]’
•  Returns: [50.0]
•  We plan on open sourcing this in next 2 months: Run scripts, runs jobs, scales R

More Related Content

What's hot

Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with AnacondaTravis Oliphant
 
Distributed processing of large graphs in python
Distributed processing of large graphs in pythonDistributed processing of large graphs in python
Distributed processing of large graphs in pythonJose Quesada (hiring)
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engineLars Marius Garshol
 
Agile data science with scala
Agile data science with scalaAgile data science with scala
Agile data science with scalaAndy Petrella
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)Andy Petrella
 
The Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesThe Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesAlice Zheng
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSpark Summit
 
What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.Andy Petrella
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and GiraphDoug Needham
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkLi Jin
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedTuri, Inc.
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners Jen Stirrup
 
EDHREC @ Data Science MD
EDHREC @ Data Science MDEDHREC @ Data Science MD
EDHREC @ Data Science MDDonald Miner
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLMLconf
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Turi, Inc.
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlySarah Guido
 
The MADlib Analytics Library
The MADlib Analytics Library The MADlib Analytics Library
The MADlib Analytics Library EMC
 

What's hot (20)

Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
 
Distributed processing of large graphs in python
Distributed processing of large graphs in pythonDistributed processing of large graphs in python
Distributed processing of large graphs in python
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
 
Agile data science with scala
Agile data science with scalaAgile data science with scala
Agile data science with scala
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)
 
The Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesThe Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the Masses
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya Hristakeva
 
What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySpark
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners
 
EDHREC @ Data Science MD
EDHREC @ Data Science MDEDHREC @ Data Science MD
EDHREC @ Data Science MD
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
The MADlib Analytics Library
The MADlib Analytics Library The MADlib Analytics Library
The MADlib Analytics Library
 

Similar to Using R for Social Media and Sports Analytics

SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?Nicolas Georgeault
 
San Francisco Atlassian User Group - February 2014
San Francisco Atlassian User Group - February 2014San Francisco Atlassian User Group - February 2014
San Francisco Atlassian User Group - February 2014Nicholas Muldoon
 
Measuring Programmer Performance with SourceKibitzer EyeQ
Measuring Programmer Performance with SourceKibitzer EyeQMeasuring Programmer Performance with SourceKibitzer EyeQ
Measuring Programmer Performance with SourceKibitzer EyeQMark Kofman
 
How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with DatabricksGrega Kespret
 
RightScale Webinar: Get Top Performance for Your Games
RightScale Webinar: Get Top Performance for Your GamesRightScale Webinar: Get Top Performance for Your Games
RightScale Webinar: Get Top Performance for Your GamesRightScale
 
Snowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWSSnowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWSGiuseppe Gaviani
 
Extending the Reach of R to the Enterprise with TERR and Spotfire
Extending the Reach of R to the Enterprise with TERR and SpotfireExtending the Reach of R to the Enterprise with TERR and Spotfire
Extending the Reach of R to the Enterprise with TERR and SpotfireLou Bajuk
 
Software Development & Architecture @ LinkedIn
Software Development & Architecture @ LinkedInSoftware Development & Architecture @ LinkedIn
Software Development & Architecture @ LinkedInC4Media
 
Splunk bangalore user group 2020-06-01
Splunk bangalore user group   2020-06-01Splunk bangalore user group   2020-06-01
Splunk bangalore user group 2020-06-01NiketNilay
 
Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuou...
Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuou...Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuou...
Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuou...DeNA
 
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017MLconf
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Kent Graziano
 
Software Development Services
Software Development ServicesSoftware Development Services
Software Development ServicesCassie Nguyen
 
Redefining the Role of IT in a Self-Help Data Integration Environment
Redefining the Role of IT in a Self-Help Data Integration EnvironmentRedefining the Role of IT in a Self-Help Data Integration Environment
Redefining the Role of IT in a Self-Help Data Integration EnvironmentUNIFI Software
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Data Con LA
 
Designing a pragmatic back-end service for mobile games
Designing a pragmatic back-end service for mobile gamesDesigning a pragmatic back-end service for mobile games
Designing a pragmatic back-end service for mobile gamesiFunFactory Inc.
 
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriThinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Manage the Velocity of Change with Cloud Computing
Manage the Velocity of Change with Cloud Computing Manage the Velocity of Change with Cloud Computing
Manage the Velocity of Change with Cloud Computing Janine Sneed
 

Similar to Using R for Social Media and Sports Analytics (20)

SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?
 
San Francisco Atlassian User Group - February 2014
San Francisco Atlassian User Group - February 2014San Francisco Atlassian User Group - February 2014
San Francisco Atlassian User Group - February 2014
 
Measuring Programmer Performance with SourceKibitzer EyeQ
Measuring Programmer Performance with SourceKibitzer EyeQMeasuring Programmer Performance with SourceKibitzer EyeQ
Measuring Programmer Performance with SourceKibitzer EyeQ
 
How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with Databricks
 
RightScale Webinar: Get Top Performance for Your Games
RightScale Webinar: Get Top Performance for Your GamesRightScale Webinar: Get Top Performance for Your Games
RightScale Webinar: Get Top Performance for Your Games
 
Snowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWSSnowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWS
 
Extending the Reach of R to the Enterprise with TERR and Spotfire
Extending the Reach of R to the Enterprise with TERR and SpotfireExtending the Reach of R to the Enterprise with TERR and Spotfire
Extending the Reach of R to the Enterprise with TERR and Spotfire
 
Software Development & Architecture @ LinkedIn
Software Development & Architecture @ LinkedInSoftware Development & Architecture @ LinkedIn
Software Development & Architecture @ LinkedIn
 
Splunk bangalore user group 2020-06-01
Splunk bangalore user group   2020-06-01Splunk bangalore user group   2020-06-01
Splunk bangalore user group 2020-06-01
 
Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuou...
Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuou...Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuou...
Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuou...
 
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
Software Development Services
Software Development ServicesSoftware Development Services
Software Development Services
 
Redefining the Role of IT in a Self-Help Data Integration Environment
Redefining the Role of IT in a Self-Help Data Integration EnvironmentRedefining the Role of IT in a Self-Help Data Integration Environment
Redefining the Role of IT in a Self-Help Data Integration Environment
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
 
Designing a pragmatic back-end service for mobile games
Designing a pragmatic back-end service for mobile gamesDesigning a pragmatic back-end service for mobile games
Designing a pragmatic back-end service for mobile games
 
Yuvaraj
YuvarajYuvaraj
Yuvaraj
 
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriThinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
 
RootandLeaves.pptx
RootandLeaves.pptxRootandLeaves.pptx
RootandLeaves.pptx
 
Manage the Velocity of Change with Cloud Computing
Manage the Velocity of Change with Cloud Computing Manage the Velocity of Change with Cloud Computing
Manage the Velocity of Change with Cloud Computing
 

More from Ajay Ohri

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay OhriAjay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RAjay Ohri
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionAjay Ohri
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for freeAjay Ohri
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10Ajay Ohri
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri ResumeAjay Ohri
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientistsAjay Ohri
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...Ajay Ohri
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessAjay Ohri
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data ScienceAjay Ohri
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data ScientistsAjay Ohri
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in PythonAjay Ohri
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen OomsAjay Ohri
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha Ajay Ohri
 
Summer school python in spanish
Summer school python in spanishSummer school python in spanish
Summer school python in spanishAjay Ohri
 
Introduction to sas in spanish
Introduction to sas in spanishIntroduction to sas in spanish
Introduction to sas in spanishAjay Ohri
 

More from Ajay Ohri (20)

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 Election
 
Pyspark
PysparkPyspark
Pyspark
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for free
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri Resume
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data Science
 
Tradecraft
Tradecraft   Tradecraft
Tradecraft
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data Scientists
 
Craps
CrapsCraps
Craps
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen Ooms
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha
 
Summer school python in spanish
Summer school python in spanishSummer school python in spanish
Summer school python in spanish
 
Introduction to sas in spanish
Introduction to sas in spanishIntroduction to sas in spanish
Introduction to sas in spanish
 

Recently uploaded

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 

Recently uploaded (20)

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 

Using R for Social Media and Sports Analytics

  • 1. © 2013 Sqor, Inc. Sqor Using R For Social Media and Sports Data Athletes SuccessData Noah Gift: CTO @ Sqor
  • 2. © 2013 Sqor, Inc. What is Sqor? •  Social Network hyper-focused on enhancing fan/athlete relationships. We only do Sports!: Now •  Marketplace for athletes to build and market their digital brand: Now •  Social Analytics and Prediction Engine as a Service: Q1 2015 •  Micro-endorsement platform: Q1 2015 •  Crowdfunding for athletes: Now •  Game platform: First Homegrown game featuring Brett Favre: Now •  Cross-Social Network Publishing Platform: Facebook, Twitter, Embeddable posts.: Now •  Website, Android App, and iOS App:
  • 3. © 2013 Sqor, Inc. Key Aspects of Data Pipeline •  Multiple languages involved: Python, R, Erlang, C#, SQL and Javascript. •  Multiple persistence options: SQL Server (RDS), Riak (No SQL), CSV Files, Mnesia (Distributed Soft Realtime DB) •  RabbitMQ and Erlang handle messaging and job communication •  Easy to debug: daily and nightly scripts, intermediate CSV files, deep storage in K/V store and reports live in RDS. •  R is used exclusively for machine learning and statistics (Although recommendation engine v1 was written in Python. We are going to replace it with R/Erlang code though)
  • 4. © 2013 Sqor, Inc. Things They Don’t Tell You Building A Data Pipeline From Scratch (Our you should have paid attention to) •  Getting the data in the right format and making sure it is accurate is back breaking work. It truly is horrible. •  Keeping track of model prediction accuracy over time: both with new data and new models is really important •  Non-linear regression is non-trivial •  Automation and debuggability of every step is very important. Think Unix Tools •  Expensive, exotic solutions sometimes aren’t worth it at first…or maybe ever. Weird databases, etc. •  Making predictions involving real money with limited data is scary and really hard. If your not scared about this, you should be.)
  • 5. © 2013 Sqor, Inc. Predicting Top Athletic Performers in Social Media •  Sqor finds influential athletes and collaborates with them using our prediction algorithms
  • 6. © 2013 Sqor, Inc. Our Prediction Algorithms Appear To Work •  Or we got really lucky….
  • 7. © 2013 Sqor, Inc. Clustering •  We use R clustering packages for classification, visualization of patterns and diagnostics for predictions
  • 8. © 2013 Sqor, Inc. Clustering •  We use kNN clustering for NBA and MLB Sports. Plan on expanding this further in the near future.
  • 9. © 2013 Sqor, Inc. Erlang/R Bridge •  Sqor is a heavy user of Erlang •  We like Erlang because it has unique concurrency abilities and high uptime (and also because I had a lot of bosses who told me I couldn’t use). •  ➜ ~ curl -v -X PUT -H 'content-type: application/json' http://127.0.0.1:8080/api/script/foo -d '{"script":"execute <- function (A) { A * 2 }", "docs":"this doubles stuff"}' •  ➜ ~ curl -v http://127.0.0.1:8080/api/script/foo -X POST -H 'content-type: application/json' -d '[25]’ •  Returns: [50.0] •  We plan on open sourcing this in next 2 months: Run scripts, runs jobs, scales R