3.
• Only partially available online
  - Formatted as web page or PDF
• Hard to search
• Can't subscribe
• Can't visualize
• Can't re-use
4.
Publishing Structured Data Feeds
• Ability to subscribe to interesting data
• Data streams can be "mashed" in new ways.

Visualization
• Makes it easy to find new patterns.

Collaborative Organization
• Tagging, Voting, Sharing

Crowdsourcing
• Combines skills and input of large numbers of people
5.
• Governments publish data streams
• 3rd parties create tools for analysis and oversight
• Citizens collaboratively monitor their government
• Citizens detect issues, give feedback
• Issues are resolved
6.
7. Why can't the government do everything?
• Government has little incentive
  - Usually has disincentive
• Don't want a single monolithic solution
  - Want to allow evolution of best-of-breed tools
• Tools created by citizens, for citizens
8.
• Focus:
  - US Congress
  - California Legislature
• Gives grants to online transparency tools
• $3.5M seed
9. A recent US Congress bill
[Chart: Groups for bill vs. Groups against bill]
11.
Publishing Structured Data Feeds
• MAPLight is a mashup of data streams from different sources.

Visualization
• MAPLight makes the relationship between money and votes visible.

Collaborative Organization
• An advocacy group tags donating companies as belonging to interest groups.

Crowdsourcing
• Thousands of journalists, advocates, and citizens can browse data and flag issues.
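The "mashup" above amounts to joining several streams on shared keys. A minimal sketch, assuming three hypothetical inputs (contribution records, crowdsourced company-to-interest-group tags, and each group's position on one bill) plus a vote list. All names and numbers are made up for illustration; this is not MAPLight's actual data model or code.

```python
from collections import defaultdict

# All values below are hypothetical toy data, not real MAPLight records.
contributions = [  # (donating company, legislator, amount in dollars)
    ("Acme Corp", "Rep. A", 5000),
    ("Beta Inc", "Rep. A", 1000),
    ("Beta Inc", "Rep. B", 4000),
    ("Gamma LLC", "Rep. B", 2500),
]
# Crowdsourced tags: company -> interest group
company_group = {"Acme Corp": "Energy", "Beta Inc": "Environment", "Gamma LLC": "Environment"}
# Each interest group's position on one bill
group_position = {"Energy": "support", "Environment": "oppose"}
votes = {"Rep. A": "yes", "Rep. B": "no"}


def money_vs_votes(contributions, company_group, group_position, votes):
    """Per legislator: dollars from the bill's supporters and opponents, plus the vote."""
    totals = defaultdict(lambda: {"support": 0, "oppose": 0})
    for company, legislator, amount in contributions:
        position = group_position.get(company_group.get(company))
        if position is not None:  # skip untagged companies and groups with no position
            totals[legislator][position] += amount
    return {leg: (totals[leg], vote) for leg, vote in votes.items()}


for leg, (money, vote) in money_vs_votes(contributions, company_group, group_position, votes).items():
    print(leg, money, "voted", vote)
```

Printing money next to each vote is the raw material a visualization would then chart.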
12.
13.
14. [Graphic: Ideas, Skills, Funds]
• Accelerate online transparency
• Raise Awareness
  - With public
  - With government
• Raise Money
• Fund External Development:
  - Grants
  - Contests
15. [Diagram]
• Prove Concept
• Get Publicity and Money
• Direct Attention and Money to Online Tools for Transparency
• Raise Awareness
• Show What's Possible
16.
17.
• 2003 Directive: Must publish travel and hospitality expenses on the web
• No standards for presentation defined
23.
Standardize
• Scrape data into standard format

Stream
• Publish RSS feeds

Visualize
• Provide basic visualization app
• Run contest
24.
25.
1. LEARNING TEMPLATE
Input:
• Example Page
• Example Text
Output:
• XML
• Production Scraper

2. PRODUCTION SCRAPER
Input:
• Any Page with Same Format
Output:
• XML
26.
27.
28.
• Create a system where non-coders can train a scraper.
29.
PRO
• Ability to use "learning" example
• Syntax integrates XML builder
• Supports all hpricot XPath operations

CON
• Learning mode fails hard (sometimes)
• Doesn't always learn

Note: For compatibility reasons, this project uses an older version of scrubyt. These issues may be fixed in newer versions.
30.
• Create a system where non-coders can train a scraper.
... Didn't work.
31. Still need coders w/ the following expertise:
1. XPath XML resolution
2. Regular Expressions
3. Firebug
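As an illustration of the regular-expression work, a minimal sketch that normalizes a dollar amount found in scraped text into a `Decimal`. The `$1,234.56` input format and the `parse_amount` helper are assumptions; real department pages may format amounts differently.

```python
import re
from decimal import Decimal

# Hypothetical pattern: a dollar sign, optional space, digits with optional
# thousands separators, and optional cents.
AMOUNT_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d{2})?)")


def parse_amount(text):
    """Return the first dollar amount in text as a Decimal, or None if absent."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


print(parse_amount("Total: $1,234.56"))  # 1234.56
```

Using `Decimal` instead of `float` keeps cents exact when totals are later summed.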
43.
• Goal: Finish scraping in one day
  - 12/124 completed: 112 to go
  - 5-20 volunteers
  - 5-20 min. per department
  - Downloadable app w/ setup instructions
  - Integrated examples
• Benefits:
  - Excuse to use scrubyt, Firebug
  - On-site tutorial + guidance
  - Easy intro to a Rails app
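A quick sanity check of the one-day goal, assuming the remaining departments are split evenly across volunteers and ignoring setup time:

```latex
\begin{aligned}
N_{\text{left}} &= 124 - 12 = 112 \text{ departments} \\
T_{\text{total}} &= 112 \times (5 \text{ to } 20)\ \text{min} = 560 \text{ to } 2240\ \text{min} \approx 9.3 \text{ to } 37.3 \text{ person-hours} \\
T_{\text{wall}} &= T_{\text{total}} / V, \qquad V = 5 \text{ to } 20 \text{ volunteers} \\
T_{\text{wall}}^{\text{best}} &= 560 / 20 = 28\ \text{min}, \qquad
T_{\text{wall}}^{\text{worst}} = 2240 / 5 = 448\ \text{min} \approx 7.5\ \text{h}
\end{aligned}
```

Even the worst case (5 volunteers at 20 minutes per department) fits within a single working day.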