Citizen Activism using Scrubyt and RoR
• Only partially available online
  • Formatted as web page or PDF
• Hard to search
• Can’t subscribe
• Can’t visualize
• Can’t re-use
Publishing Structured Feeds
• Ability to subscribe to interesting data
• Data streams can be ‘mashed’ in new ways.

Data Visualization
• Makes it easy to find new patterns.

Collaborative Organization
• Tagging, Voting, Sharing

Crowdsourcing
• Combines skills and input of large numbers of people
• Governments publish data streams
• 3rd parties create tools for analysis and oversight
• Citizens collaboratively monitor their government
• Citizens detect issues, give feedback
• Issues are resolved

[Cycle diagram: Governments publish data streams, 3rd Party Tools, Citizens monitor data streams, Issues are detected, Issues are resolved]
Why can’t the government do everything?

• Government has little incentive
  ▪ Usually has disincentive
• Don’t want a single monolithic solution
  ▪ Want to allow evolution of best-of-breed tools
• Tools created by citizens, for citizens
• Focus:
  • US Congress
  • California Legislature
• Gives grants to online transparency tools
• $3.5 M Seed
A recent US Congress bill

[Diagram labels: Groups for bill, Groups against bill, Votes, Donations]
Publishing Structured Feeds
• MAPLight is a mashup of data streams from different sources.

Data Visualization
• MAPLight makes relationship between money and votes visible.

Collaborative Organization
• Advocacy group tags donating companies as belonging to interest groups.

Crowdsourcing
• Thousands of journalists, advocates, and citizens can browse data and flag issues.
• Accelerate online transparency
• Raise Awareness
  • With public
  • With government
• Raise Money
• Fund External Development:
  • Grants
  • Contests

[Diagram labels: Ideas, Skills, Funds]
Prove Concept
Get Publicity
Direct Attention and Money to Online Tools For Transparency

• Raise Awareness
• Show What’s Possible
• 2003 Directive: Must publish travel and hospitality expenses on the web
• No standards for presentation defined
124 Departments
  - All different
Standardize
• Scrape data into standard format

Stream
• Publish RSS feeds

Visualize
• Provide basic visualization app
• Run contest
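
The Standardize and Stream steps above could be sketched roughly as follows. This is a minimal illustration in plain Ruby, not the project's code: the `ExpenseRecord` fields, the `xml_escape` and `rss_feed` helpers, the department name, and the example.org URLs are all assumptions made up for the sketch.

```ruby
require "time" # adds Time#rfc2822, the date format RSS 2.0 uses for pubDate

# Hypothetical standard format for one scraped expense report.
# The field names are assumptions, not the project's actual schema.
ExpenseRecord = Struct.new(:department, :official, :date, :amount_cents, :url)

# Escape the characters that are special in XML text.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# Build an RSS 2.0 document (as a String) with one <item> per record.
def rss_feed(title, link, records)
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    "    <title>#{xml_escape(title)}</title>",
    "    <link>#{xml_escape(link)}</link>",
    "    <description>#{xml_escape(title)}</description>"
  ]
  records.each do |r|
    # Amounts are kept as integer cents to avoid floating-point rounding.
    amount = format("%d.%02d", r.amount_cents / 100, r.amount_cents % 100)
    item_title = "#{r.department}: #{r.official}, $#{amount}"
    lines << "    <item>"
    lines << "      <title>#{xml_escape(item_title)}</title>"
    lines << "      <link>#{xml_escape(r.url)}</link>"
    lines << "      <pubDate>#{r.date.rfc2822}</pubDate>"
    lines << "    </item>"
  end
  lines << "  </channel>" << "</rss>"
  lines.join("\n") + "\n"
end

# Made-up sample data, for illustration only.
records = [
  ExpenseRecord.new("Health & Welfare", "J. Doe", Time.utc(2008, 3, 31),
                    123_456, "http://example.org/reports/1")
]
puts rss_feed("Travel and hospitality expenses", "http://example.org/", records)
```

Once every department's reports are in one record shape, a single feed builder like this can serve all of them, and a visualization app only has to understand that one shape.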
1. LEARNING TEMPLATE
  Input:
  • Example Page
  • Example Text
  Output:
  • XML
  • Production Scraper

2. PRODUCTION SCRAPER
  Input:
  • Any Page with Same Format
  Output:
  • XML
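
The two-stage idea above (learn from one example page and one example text, then run on any page with the same format) can be illustrated with a toy. The slides do not show how scrubyt's learning mode works, and this sketch does not reproduce it: it simply remembers the literal markup around the example text. The names `learn_template` and `scrape` and the sample HTML are hypothetical.

```ruby
# Toy "learn by example" extractor. This is NOT scrubyt's algorithm; it only
# illustrates the learning-template / production-scraper split.

# Stage 1: find the example text in the example page and turn the markup
# immediately before and after it into a reusable pattern.
def learn_template(example_page, example_text, context: 20)
  start = example_page.index(example_text)
  raise ArgumentError, "example text not found in example page" if start.nil?

  finish = start + example_text.length
  prefix = example_page[[start - context, 0].max...start]
  suffix = example_page[finish, context]
  # Match the surrounding markup literally; capture whatever sits between.
  Regexp.new(Regexp.escape(prefix) + "(.*?)" + Regexp.escape(suffix),
             Regexp::MULTILINE)
end

# Stage 2: apply the learned pattern to any page with the same format.
def scrape(template, page)
  page.scan(template).flatten
end

# Made-up sample pages, for illustration only.
example_page = '<tr><td class="name">Jane Doe</td><td class="amount">$120.50</td></tr>'
other_page   = '<tr><td class="name">John Roe</td><td class="amount">$89.00</td></tr>'

template = learn_template(example_page, "$120.50")
puts scrape(template, other_page).inspect
```

A pattern learned this way breaks as soon as the surrounding markup changes, which hints at why learning from a single example can fail on real pages.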
• Create a system where non-coders can train a scraper.
PRO
• Ability to use ‘learning’ example (sometimes)
• Syntax integrates XML builder
• Supports all hpricot XPath operations

CON
• Learning mode fails hard
• Doesn’t always learn

Note: For compatibility reasons, this project uses an older version of scrubyt. Issues may be fixed in a newer version.
• Create a system where non-coders can train a scraper.


.... Didn’t work.
Still need coders w/ the following expertise:

 1. XPath XML resolution


 2. Regular Expressions


 3. Firebug
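
As an illustration of the regular-expression part of that list, here is a small sketch of the kind of cleanup a coder might apply to scraped text. It covers regular expressions only, not XPath or Firebug. The patterns, the helper names, and the assumed input formats (dollar amounts with optional thousands separators, ISO-style dates) are assumptions; real department pages may use quite different formats.

```ruby
require "date"

# Assumed formats, for illustration: "$1,234.56" and "2008-03-31".
AMOUNT_PATTERN = /\$\s*(\d[\d,]*(?:\.\d{2})?)/
DATE_PATTERN   = /\b(\d{4})-(\d{2})-(\d{2})\b/

# Return the first dollar amount in the text as integer cents, or nil.
def parse_amount_cents(text)
  match = AMOUNT_PATTERN.match(text)
  return nil if match.nil?

  dollars, cents = match[1].delete(",").split(".")
  dollars.to_i * 100 + cents.to_i # cents is nil when absent; nil.to_i is 0
end

# Return the first valid YYYY-MM-DD date in the text as a Date, or nil.
def parse_date(text)
  match = DATE_PATTERN.match(text)
  return nil if match.nil?

  year, month, day = match.captures.map(&:to_i)
  Date.valid_date?(year, month, day) ? Date.new(year, month, day) : nil
end

puts parse_amount_cents("Total: $1,234.56").inspect
puts parse_date("Period ending 2008-03-31").inspect
```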
1. Open This Link



2. Paste This Text
...created in the
   background
Go To Next Level
Split Level: Two Types of Links

                    Open This Link
Select Element




Get the XPath
Split Level: Two Types of Links
...created in the
   background
Test Random Reports



    Send Home
• Goal: Finish scraping in one day
  • 12/124 Completed: 112 to go
  • 5-20 Volunteers
  • 5-20 min. per department
  • Downloadable app w/ setup instructions
  • Integrated examples

• Benefits:
  • Excuse to use scrubyt, firebug
  • On-site tutorial + guidance
  • Easy intro to a Rails App
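
A quick back-of-the-envelope check of the one-day goal, using only the figures on the slide above. The sketch assumes the work splits evenly across volunteers and ignores setup and tutorial time, neither of which the slides state; `hours_per_volunteer` is a hypothetical helper.

```ruby
TOTAL_DEPARTMENTS = 124
COMPLETED = 12
REMAINING = TOTAL_DEPARTMENTS - COMPLETED # 112 departments to go

# Hours of scraping work per volunteer, assuming an even split.
def hours_per_volunteer(departments, minutes_each, volunteers)
  (departments * minutes_each) / (volunteers * 60.0)
end

best_case  = hours_per_volunteer(REMAINING, 5, 20)  # fast pages, many volunteers
worst_case = hours_per_volunteer(REMAINING, 20, 5)  # slow pages, few volunteers

puts format("Best case:  %.1f hours per volunteer", best_case)
puts format("Worst case: %.1f hours per volunteer", worst_case)
```

Under these assumptions the load ranges from about half an hour to about seven and a half hours per volunteer, so the one-day goal is plausible even at the slow end of the stated ranges.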
Jennifer Bell
visiblegovernment.ca

VisibleGovernment.ca Expense Visualizer Pilot - Montreal on Rails
