The Path to TrstRank
         Building One-Click Twitter Influence Metrics
                                    Since the launch of Twitter, people have clamored for ways to
                                    access and “slice and dice” its data. One of the most common
                                    ways people use the Twitter data corpus is to measure a person’s
                                    importance and influence. Klout is an example of one product that
                                    specializes in this kind of “influencer” data.

                                    A few years ago, we created our own special version of Klout,
                                    one that took advantage of our vast historical record of Twitter
                                    relationships to create an accurate number describing how
                                    influential a Twitter user is. It’s called TrstRank, and it ranks
                                    a user on a scale of 1-10, with 10 being the most influential
                                    you can get.

                                    Coming up with a number like TrstRank is no small task.
                                    Setting aside the issues of getting the data, there are some very
                                    real Big Data problems surrounding the product that require
                                    special tools for getting it done efficiently. And when you’re a
                                    bootstrapped startup, like we were at the time, you have to be
                                    resourceful if you are going to get by.

                                    The biggest issue with pursuing a new data product like TrstRank
                                    is the same one any company faces when it decides to venture
                                    into new territory: the high risk of wasting time and money.

                                    What is TrstRank?
                                    TrstRank is an Infochimps-developed dataset and API that
                                    provides Twitter influence metrics with the click of a button!
                                    TrstRank measures Twitter user reputation, importance and
                                    influence in a far more robust way than counting the number of
                                    followers. It is a sophisticated measure of a user’s relative
                                    importance within the entire Twitter network.


                                    Wasting Time
                                    One of the first problems you run into as a small team trying your
                                    hand at data science is the excess time spent on server and
                                    machine configuration, instead of focusing on modeling, algorithms,
                                    and manipulating the data.

© 2012 Infochimps, Inc. All rights reserved.                                                           1
                                    Ramp-up time for even the first phase of a project like TrstRank
                                    can be a whole day or more of engineering time.


                                    Wasting Money
                                    From our earliest days, Infochimps has been based on Amazon
                                    Web Services’ (AWS) cloud, taking advantage of the flexibility
                                    and scalability it provides. With AWS, you pay for what you use,
                                    so you are always inclined to eliminate waste. In our early days
                                    we even created decision trees for whether to shut down a cluster,
                                    depending on how many hours it would stay up but go unused.


                                    This can set conflicting goals for the data scientist who would
                                    prefer to leave a cluster up overnight, even if it’s unused, so they
                                    don’t have to deal with setting everything up again the next day!
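
The trade-off described in this section can be sketched as a simple cost comparison. This is only an illustration, not the decision tree Infochimps actually used: the function name `should_shut_down`, the hourly instance price, the setup time, and the engineer's hourly cost are all hypothetical inputs.

```python
def should_shut_down(idle_hours, machines, price_per_machine_hour,
                     setup_hours, engineer_cost_per_hour):
    """Return True when leaving the cluster idle costs more than rebuilding it.

    Every rate passed in here is an assumption made for illustration;
    substitute your own figures.
    """
    # Cost of paying for machines that sit unused.
    idle_cost = idle_hours * machines * price_per_machine_hour
    # Cost of the engineering time needed to set the cluster up again.
    rebuild_cost = setup_hours * engineer_cost_per_hour
    return idle_cost > rebuild_cost


if __name__ == "__main__":
    # Overnight gap on a 30-machine cluster, with placeholder rates.
    print(should_shut_down(12, 30, 0.50, 8, 50.0))
    # A long weekend on the same cluster.
    print(should_shut_down(60, 30, 0.50, 8, 50.0))
```

With these placeholder rates, the overnight gap favors leaving the cluster up, while the longer gap favors shutting it down. The shorter the setup time, the more often shutting down wins.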


                                    Enter Ironfan
                                    We created Ironfan to solve our own problems of how to save
                                    time and money during our data science operations in the cloud.
                                    When we came up with the idea for TrstRank, it was a simple
                                    operation to spin up a cluster for early analysis and
                                    experimentation. We could validate some of our algorithms and ideas
                                    on a simple cluster before moving to something more heavyweight.

                                    Ironfan and TrstRank, Now
                                    Ironfan has continued as a key tool for our monthly TrstRank
                                    operation. We continue to scrape Twitter for follower information,
                                    and with the updated data every month we crunch the TrstRank
                                    numbers again.


                                    With Ironfan, we’re able to run a multi-step operation on
                                    8 billion tweets on clusters of 30 m1.xlarge EC2 machines,
                                    while running only the resources we need, when they’re needed.
                                    TrstRank takes 72 hours to complete, with resources being paid
                                    for commensurately. Without Ironfan, we’d be looking at 2-3x the
                                    costs in time and money!
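
To put these figures in perspective, the compute footprint of one run can be worked out directly. The sketch below assumes all 30 machines run for the full 72 hours, which is an upper bound since the text says resources run only when needed. The hourly price is a placeholder, not an actual AWS rate, and `run_cost` is a hypothetical helper.

```python
MACHINES = 30               # m1.xlarge instances, from the text
RUN_HOURS = 72              # duration of one TrstRank run, from the text
HYPOTHETICAL_PRICE = 0.50   # USD per instance-hour; placeholder, not an AWS rate


def run_cost(machines, hours, price_per_hour, overhead_factor=1.0):
    """Return (instance_hours, cost) for one run.

    overhead_factor models the "2-3x" multiplier the text attributes to
    working without Ironfan.
    """
    instance_hours = machines * hours * overhead_factor
    return instance_hours, instance_hours * price_per_hour


if __name__ == "__main__":
    for factor in (1.0, 2.0, 3.0):
        hours, cost = run_cost(MACHINES, RUN_HOURS, HYPOTHETICAL_PRICE, factor)
        print(f"{factor:.0f}x: {hours:.0f} instance-hours, ${cost:.2f}")
```

At most, a single run comes to 30 × 72 = 2,160 instance-hours. A 2-3x overhead would raise that to between 4,320 and 6,480 instance-hours per monthly run.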



About Infochimps
                                    Our mission is to make the world’s data more accessible.
                                    Infochimps helps companies understand their data. We provide
                                    tools and services that connect their internal data, leverage the
                                    power of cloud computing and new technologies such as Hadoop,
                                    and provide a wealth of external datasets, which organizations
                                    can connect to their own data.


                                    Contact Us
                                    Infochimps, Inc.
                                    1214 W 6th St. Suite 202
                                    Austin, TX 78703


                                    1-855-DATA-FUN (1-855-328-2386)


                                    www.infochimps.com
                                    info@infochimps.com


                                    Twitter: @infochimps




                      Get a free Big Data consultation
                          Let’s talk Big Data in the enterprise!

     Get a free conference with the leading big data experts regarding your enterprise big data
     project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop
     about your project objectives, design, infrastructure, tools, etc. Find out how other
     companies are solving similar problems. Learn best practices and get recommendations, free.




