SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
What’s Really the Buzz?
Stefan Groschupf
sg@datameer.com




    © Datameer, Inc 2010
How it started...




2    © Datameer, Inc 2010
How it started...




3    © Datameer, Inc 2010
How it started...




4    © Datameer, Inc 2010
Agenda




5   © Datameer, Inc 2010
Street Cred

                            Long time open source contributor




                            http://github.com/sgroschupf/zkclient
                            http://github.com/sgroschupf/aws-tasks




6    © Datameer, Inc 2010
Cubicle Cred
                            Cloud Computing Architect
                            Hadoop consultant at e.g.




                            Co-Founder/CEO Scale Unlimited

                            Co-Founder / CEO Datameer Inc.




7    © Datameer, Inc 2010
Hadoop vs. DB




Hadoop(mergesort)           DB(b-tree/index)

8    © Datameer, Inc 2010
Twitter




9    © Datameer, Inc 2010
Stack




10    © Datameer, Inc 2010
Stack
     AWS
     •     ...
     Hadoop
     •     ...
     Datameer
     •     Spreadsheet compiles into MapReduce Jobs, + much more
     •     Comercial
     Gephi
     •     Graph Visualization, GPL, Netbeans based




11       © Datameer, Inc 2010
Expenses
         What                   Price          Description
      Client Ec2               $65.00       0.085/h * 24 * 31
      S3 Storage                $7.50       0.15/GB/m * 50
         EMR                   $24.00         0.20 * 20 * 6
      Datameer                  ~$20          private beta
        Gephi                   $0.00             GPL
     Development             $8.99 + tax   6pk Pilsner Urquell




12    © Datameer, Inc 2010
Pipeline


                                        DATAMEER


     Client on Ec2                        EMR                 Gephi

                                                   CSV/GEXF
                                   S3




13          © Datameer, Inc 2010
Twitter Client

                         Basic Auth


                                                                                S3
                             TwitterClient       Compression           Upload
                               Thread              Thread              Thread


                                                 EC2 Server




                                        256 MB                 50 MB




14    © Datameer, Inc 2010
Twitter API

     Streaming API
     curl http://stream.twitter.com/1/statuses/sample.json 
     -ustefan:somepwd

     streams, once a while interruped
     5% of public statuses by default
     Search API
     curl http://search.twitter.com/search.json?q=datameer 
     -ustefan:somepwd

     150 requests per hour




15    © Datameer, Inc 2010
Some Twitter Fields

     user_screen_name        text
     user_followers_count    source
     user_friends_count      geo_type
     user_statuses_count     geo_longitude
     user_location           geo_latitude
     created_at




16    © Datameer, Inc 2010
Simple Analytics

     Trending topics
     Tweet spread timing
     Topic Reach
     Topic Location Reach
     Etc.




17    © Datameer, Inc 2010
Demo




18   © Datameer, Inc 2010   http://en.wikipedia.org/wiki/Social_network   http://www.orgnet.com
From Stream to Network

     Understand tweets as network plus metadata (time, geo, etc)
     ReTweet => Citation => Link => Pagerank
     Twitter friends (not accessible from Streaming API)
     •     Technically not realistic to get
     •     Quality?




19       © Datameer, Inc 2010
Social Network Analytics
                             David Krackhardt




                                  Degree
                                  Number of direct connections a node has (Diane)
                                  Betweenness
                                  broker, point of failure, high influence of flow (Heather)
                                  Closeness
                                  Shortest path to all others (Fernando, Garth)



20    © Datameer, Inc 2010
Social Network Analytics
     Centralization
     Centralized network dominated by one or a few very central nodes
     Single point of failure
     Prestige
     Prestige is the term for a node's centrality
     Reach
     Degree to which any member of a network can reach other members
     Boundary Spanners
     Central in overall network, bridging clusters, innovators since information
     comes from multiple clusters
     Path Length
     The distances between pairs of nodes in the network. Average path length
     is the average of these distances between all pairs of nodes.


21    © Datameer, Inc 2010   http://en.wikipedia.org/wiki/Social_network   http://www.orgnet.com
Demo




22   © Datameer, Inc 2010   http://en.wikipedia.org/wiki/Social_network   http://www.orgnet.com
Yes, we hiring too...




23    © Datameer, Inc 2010   http://en.wikipedia.org/wiki/Social_network   http://www.orgnet.com
Resources...

     http://www.slideshare.net/padday/the-real-life-social-network-v2
     (Paul Adams)
     http://xrime.sourceforge.net/
     •     SNA Metrics and Structures
     www.datameer.com
     twitter.com/datameer
     http://github.com/sgroschupf
     sg@datameer.com




24       © Datameer, Inc 2010

Weitere ähnliche Inhalte

Andere mochten auch

Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009yhadoop
 
The Bixo Web Mining Toolkit
The Bixo Web Mining ToolkitThe Bixo Web Mining Toolkit
The Bixo Web Mining ToolkitTom Croucher
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataYahoo Developer Network
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce APITom Croucher
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduceHadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Hadoop User Group
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroOwen O'Malley
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduceOwen O'Malley
 

Andere mochten auch (19)

Searching At Scale
Searching At ScaleSearching At Scale
Searching At Scale
 
Mumak
MumakMumak
Mumak
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009
 
The Bixo Web Mining Toolkit
The Bixo Web Mining ToolkitThe Bixo Web Mining Toolkit
The Bixo Web Mining Toolkit
 
File Context
File ContextFile Context
File Context
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big Data
 
Nov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.HNov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.H
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce API
 
HUG Nov 2010: HDFS Raid - Facebook
HUG Nov 2010: HDFS Raid - FacebookHUG Nov 2010: HDFS Raid - Facebook
HUG Nov 2010: HDFS Raid - Facebook
 
Cloudera Desktop
Cloudera DesktopCloudera Desktop
Cloudera Desktop
 
3 avro hug-2010-07-21
3 avro hug-2010-07-213 avro hug-2010-07-21
3 avro hug-2010-07-21
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
 
January 2011 HUG: Howl Presentation
January 2011 HUG: Howl PresentationJanuary 2011 HUG: Howl Presentation
January 2011 HUG: Howl Presentation
 
January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 Intro
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduce
 

Mehr von Hadoop User Group

Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsHadoop User Group
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Hadoop User Group
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupHadoop User Group
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation HadoopHadoop User Group
 

Mehr von Hadoop User Group (17)

Common crawlpresentation
Common crawlpresentationCommon crawlpresentation
Common crawlpresentation
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Cascalog internal dsl_preso
Cascalog internal dsl_presoCascalog internal dsl_preso
Cascalog internal dsl_preso
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Pig at Linkedin
Pig at LinkedinPig at Linkedin
Pig at Linkedin
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 

Kürzlich hochgeladen

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Kürzlich hochgeladen (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

HUG August 2010:Whats the buzz_bay_area

  • 1. What’s Really the Buzz? Stefan Groschupf sg@datameer.com © Datameer, Inc 2010
  • 2. How it started... 2 © Datameer, Inc 2010
  • 3. How it started... 3 © Datameer, Inc 2010
  • 4. How it started... 4 © Datameer, Inc 2010
  • 5. Agenda 5 © Datameer, Inc 2010
  • 6. Street Cred Long time open source contributor http://github.com/sgroschupf/zkclient http://github.com/sgroschupf/aws-tasks 6 © Datameer, Inc 2010
  • 7. Cubicle Cred Cloud Computing Architect Hadoop consultant at e.g. Co-Founder/CEO Scale Unlimited Co-Founder / CEO Datameer Inc. 7 © Datameer, Inc 2010
  • 8. Hadoop vs. DB Hadoop(mergesort) DB(b-tree/index) 8 © Datameer, Inc 2010
  • 9. Twitter 9 © Datameer, Inc 2010
  • 10. Stack 10 © Datameer, Inc 2010
  • 11. Stack AWS • ... Hadoop • ... Datameer • Spreadsheet compiles into MapReduce Jobs, + much more • Comercial Gephi • Graph Visualization, GPL, Netbeans based 11 © Datameer, Inc 2010
  • 12. Expenses What Price Description Client Ec2 $65.00 0.085/h * 24 * 31 S3 Storage $7.50 0.15/GB/m * 50 EMR $24.00 0.20 * 20 * 6 Datameer ~$20 private beta Gephi $0.00 GPL Development $8.99 + tax 6pk Pilsner Urquell 12 © Datameer, Inc 2010
  • 13. Pipeline DATAMEER Client on Ec2 EMR Gephi CSV/GEXF S3 13 © Datameer, Inc 2010
  • 14. Twitter Client Basic Auth S3 TwitterClient Compression Upload Thread Thread Thread EC2 Server 256 MB 50 MB 14 © Datameer, Inc 2010
  • 15. Twitter API Streaming API curl http://stream.twitter.com/1/statuses/sample.json -ustefan:somepwd streams, once a while interruped 5% of public statuses by default Search API curl http://search.twitter.com/search.json?q=datameer -ustefan:somepwd 150 requests per hour 15 © Datameer, Inc 2010
  • 16. Some Twitter Fields user_screen_name text user_followers_count source user_friends_count geo_type user_statuses_count geo_longitude user_location geo_latitude created_at 16 © Datameer, Inc 2010
  • 17. Simple Analytics Trending topics Tweet spread timing Topic Reach Topic Location Reach Etc. 17 © Datameer, Inc 2010
  • 18. Demo 18 © Datameer, Inc 2010 http://en.wikipedia.org/wiki/Social_network http://www.orgnet.com
  • 19. From Stream to Network Understand tweets as network plus metadata (time, geo, etc) ReTweet => Citation => Link => Pagerank Twitter friends (not accessible from Streaming API) • Technically not realistic to get • Quality? 19 © Datameer, Inc 2010
  • 20. Social Network Analytics David Krackhardt Degree Number of direct connections a node has (Diane) Betweenness broker, point of failure, high influence of flow (Heather) Closeness Shortest path to all others (Fernando, Garth) 20 © Datameer, Inc 2010
  • 21. Social Network Analytics Centralization Centralized network dominated by one or a few very central nodes Single point of failure Prestige Prestige is the term for a node's centrality Reach Degree to which any member of a network can reach other members Boundary Spanners Central in overall network, bridging clusters, innovators since information comes from multiple clusters Path Length The distances between pairs of nodes in the network. Average path length is the average of these distances between all pairs of nodes. 21 © Datameer, Inc 2010 http://en.wikipedia.org/wiki/Social_network http://www.orgnet.com
  • 22. Demo 22 © Datameer, Inc 2010 http://en.wikipedia.org/wiki/Social_network http://www.orgnet.com
  • 23. Yes, we hiring too... 23 © Datameer, Inc 2010 http://en.wikipedia.org/wiki/Social_network http://www.orgnet.com
  • 24. Resources... http://www.slideshare.net/padday/the-real-life-social-network-v2 (Paul Adams) http://xrime.sourceforge.net/ • SNA Metrics and Structures www.datameer.com twitter.com/datameer http://github.com/sgroschupf sg@datameer.com 24 © Datameer, Inc 2010