4. If you would like a copy of
these slides they are here:
http://cl.ly/6233b0f56bb686e57b74
(or at http://twitter.com/knowtheory)
Sunday, August 29, 2010
5. Labor Rights
•Eight Hours for Work
•Eight Hours for Rest
•Eight Hours for What We Will!
(Pie chart: Work 8, Rest 8, What We Will 8)
This may not be a
pattern that hackers are
all that familiar with.
6. We trade our time and
expertise for money
for 8+ hours a day at work
7. But now the 8 hours of our
free time are just as valuable
to companies as our work time.
8. Who collects your data?
Do you know what data they collect?
What do you get in return?
9. What do you get for your Data?
• Google: Gmail, Search
• Apple: iTunes Genius
• Amazon: Recommendation
• Last.fm: Rec’s & Neighbors
• Facebook: ??? (Your friends’
families’ crazy rants)
10. Companies benefit from our
data and can ask and answer
questions about our behavior.
11. We benefit indirectly,
but why can’t we benefit
directly as well?
12. We can, if we know
where and how to look.
16. It would be nice to analyze
our search histories, but...
Google doesn’t provide an API.
17. But, we can search our
Google Chrome histories!
~/Library/Application Support/Google/
Chrome/Default/History
(make a copy of your History.
sqlite3 dbs are easy to corrupt)
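A minimal sketch of the "make a copy first" step, using only the Ruby standard library. The helper name snapshot_history and the copy's filename are assumptions made for illustration; the source path is whatever your own History file is.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: copy the History file into a scratch directory
# so that all queries run against the copy, never the live database.
def snapshot_history(source, dest_dir = Dir.mktmpdir("chrome-history"))
  raise ArgumentError, "no such file: #{source}" unless File.file?(source)
  dest = File.join(dest_dir, "History.copy")
  FileUtils.cp(source, dest)
  dest
end

# Usage (path as given on the slide, expanded for the current user):
#   live = File.expand_path("~/Library/Application Support/Google/Chrome/Default/History")
#   copy = snapshot_history(live)
```

Working from a snapshot also means Chrome can stay open while you poke around.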
18. Once we have a datasource
we need to answer yes to
at least one of three questions
about the format of our source.
19. • Does a DataMapper
Adapter already exist?
• Can you write an adapter?
• Can you write a scraper to
import your data?
20. Does a DataMapper
Adapter already exist?
Yep! Google Chrome’s History is an
sqlite3 database!
21. Urls Table
CREATE TABLE urls(
  id INTEGER PRIMARY KEY,
  url LONGVARCHAR,
  title LONGVARCHAR,
  visit_count INTEGER DEFAULT 0 NOT NULL,
  typed_count INTEGER DEFAULT 0 NOT NULL,
  last_visit_time INTEGER NOT NULL,
  hidden INTEGER DEFAULT 0 NOT NULL,
  favicon_id INTEGER DEFAULT 0 NOT NULL
);
Querying requires us to map data out of our source.
To do this we have to tell DataMapper what the source schema is.
22. Url model (naive)
class Url
  include DataMapper::Resource
  property :id, Serial # Integer, :key=>true
  property :url, String
  property :title, String
  property :visit_count, Integer, :default => 0
  property :typed_count, Integer, :default => 0
  property :last_visit_time, Integer, :required => true
  property :hidden, Integer, :default => 0
  property :favicon_id, Integer, :default => 0
  has n, :segments
  has n, :visits, :through => :segments
end
23. Url model (naive)
class Url
  include DataMapper::Resource
  property :id, Serial
  property :url, String
  property :title, String
  property :visit_count, Integer, :default => 0
  property :typed_count, Integer, :default => 0
  property :last_visit_time, Integer, :required => true
  property :hidden, Integer, :default => 0
  property :favicon_id, Integer, :default => 0
  has n, :segments
  has n, :visits, :through => :segments
end
Inline Validations
24. Urls Table
CREATE TABLE urls(
  id INTEGER PRIMARY KEY,
  url LONGVARCHAR,
  title LONGVARCHAR,
  visit_count INTEGER DEFAULT 0 NOT NULL,
  typed_count INTEGER DEFAULT 0 NOT NULL,
  last_visit_time INTEGER NOT NULL,
  hidden INTEGER DEFAULT 0 NOT NULL,
  favicon_id INTEGER DEFAULT 0 NOT NULL
);
Database Constraints
25. Sanity Check
The schemata match! Now let's test.
>> Url.first(:url => "http://rubykaigi.org/")
=> #<Url @id=1294 @url="http://rubykaigi.org/"
   @title="RubyKaigi 2010, August 27-29"
   @visit_count=8 ... >
>> Url.count
=> 47007
>> Url.count("visit_count.lt" => 1)
=> 20
>> # wat.
26. Url model (w/ Sanity)
Let's add some business rule validations
class Url
  include DataMapper::Resource
  property :id, Serial
  property :url, String, :format => :url
  property :title, String
  property :visit_count, Integer, :min => 1
  property :typed_count, Integer, :default => 0
  property :last_visit_time, Integer, :required => true
  property :hidden, Integer, :default => 0
  property :favicon_id, Integer, :default => 0
  has n, :segments
  has n, :visits, :through => :segments
end
27. Data Manipulation
require 'dm-types'
class Url
  include DataMapper::Resource
  property :id, Serial
  property :url, URI, :format => :url
  property :title, String
  property :visit_count, Integer, :min => 1
  property :typed_count, Integer, :default => 0
  property :last_visit_time, Integer, :required => true
  property :hidden, Integer, :default => 0
  property :favicon_id, Integer, :default => 0
  has n, :segments
  has n, :visits, :through => :segments
end
28. Data Manipulation
>> u = Url.first("url.like" => "%rubykaigi%")
=> #<Url @id=1294 @url=#<Addressable::URI:0x81c7a1b0 URI:http://rubykaigi.com/>
   @title="RubyKaigi 2010, August 27-29"
   @last_visit_time=12927095498867853 ...>
>> u.url
=> #<Addressable::URI:0x81c7a1b0 URI:http://rubykaigi.com/>
>> u.url.host
=> "rubykaigi.com" # oops, .org is canonical
>> u.url.host = "rubykaigi.org"; u.url
=> #<Addressable::URI:0x81ccfdf4 URI:http://rubykaigi.org/>
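The same host rewrite can be tried without DataMapper or Addressable. This sketch swaps in Ruby's standard-library URI class (not the Addressable::URI that the model returns), and the helper name with_host is made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: return a copy of the URL with its host replaced.
# Uses stdlib URI as a stand-in for Addressable::URI.
def with_host(url_string, new_host)
  url = URI.parse(url_string)
  url.host = new_host
  url.to_s
end

puts with_host("http://rubykaigi.com/", "rubykaigi.org")
# prints "http://rubykaigi.org/"
```

The point is the same as on the slide: once the column is a URI object rather than a String, its parts can be read and changed individually.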
29. Data Manipulation
>> u = Url.first("url.like" => "%rubykaigi%")
=> #<Url @id=1294 @url=#<Addressable::URI:0x81c7a1b0 URI:http://rubykaigi.com/>
   @title="RubyKaigi 2010, August 27-29"
   @last_visit_time=12927095498867853 ...>
>> u.last_visit_time
=> 12927095498867853 # wtf is this?
30. Urls Table
CREATE TABLE urls(
  id INTEGER PRIMARY KEY,
  url LONGVARCHAR,
  title LONGVARCHAR,
  visit_count INTEGER DEFAULT 0 NOT NULL,
  typed_count INTEGER DEFAULT 0 NOT NULL,
  last_visit_time INTEGER NOT NULL,
  hidden INTEGER DEFAULT 0 NOT NULL,
  favicon_id INTEGER DEFAULT 0 NOT NULL
);
Not a lot of clues here...
Okay, it’s an integer time, but it’s also freaking huge:
12927095498867853?
31. chromium/src/base/time.h
// Time represents an absolute point in time, internally represented as
// microseconds (s/1,000,000) since a platform-dependent epoch. Each
// platform's epoch, along with other system-dependent clock interface
// routines, is defined in time_PLATFORM.cc.
32. chromium/src/base/time_mac.cc
// Core Foundation uses a double second count since 2001-01-01 00:00:00 UTC.
// The UNIX epoch is 1970-01-01 00:00:00 UTC.
// Windows uses a Gregorian epoch of 1601. We need to match this internally
// so that our time representations match across all platforms. See bug 14734.
// irb(main):010:0> Time.at(0).getutc()
// => Thu Jan 01 00:00:00 UTC 1970
// irb(main):011:0> Time.at(-11644473600).getutc()
// => Mon Jan 01 00:00:00 UTC 1601
Examples already in Ruby? Nice.
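Putting the two comments together, the mystery integer can be decoded by hand: divide by 1,000,000 to get seconds, then subtract the 11644473600 seconds between 1601 and 1970. A rough sketch with plain Ruby; the constant and method names are chosen here for illustration and are not from Chromium or DataMapper.

```ruby
# Seconds between 1601-01-01 and 1970-01-01 (from the time_mac.cc comment).
EPOCH_OFFSET_SECONDS = 11_644_473_600
MICROSECONDS_PER_SECOND = 10**6

# Microseconds since 1601 -> Ruby Time (whole seconds only, in UTC).
def chrome_to_time(value)
  Time.at(value / MICROSECONDS_PER_SECOND - EPOCH_OFFSET_SECONDS).utc
end

# Ruby Time -> microseconds since 1601 (sub-second precision is dropped).
def time_to_chrome(time)
  (time.to_i + EPOCH_OFFSET_SECONDS) * MICROSECONDS_PER_SECOND
end

puts chrome_to_time(12927095498867853)
# prints "2010-08-24 03:51:38 UTC", i.e. 12:51:38 +0900
```

Note that integer division discards the microseconds, so converting there and back gives 12927095498000000 rather than the original value.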
33. Url model v2 (lib types)
Write ChromeEpochTime
class Url
  include DataMapper::Resource
  property :id, Serial
  property :url, URI, :format => :url
  property :title, String
  property :visit_count, Integer, :min => 1
  property :typed_count, Integer, :default => 0
  property :last_visit_time, ChromeEpochTime, :required => true
  property :hidden, Integer, :default => 0
  property :favicon_id, Integer, :default => 0
  has n, :segments
  has n, :visits, :through => :segments
end
34. chrome_epoch_time.rb
module DataMapper
  class Property
    class ChromeEpochTime < Integer
      def load(value)
        return value unless value.respond_to?(:to_i)
        ::Time.at((value/10**6)-11644473600)
      end
      def dump(value)
        case value
        when ::Integer, ::Time then (value.to_i + 11644473600) * 10**6
        when ::DateTime then (value.to_time.to_i + 11644473600) * 10**6
        end
      end
    end # class ChromeEpochTime
  end # class Property
end # module DataMapper
35. Data Manipulation
>> u = Url.first("url.like" => "%rubykaigi.com%")
=> #<Url @id=42846 @url=#<Addressable::URI:0x81e232f0 URI:http://rubykaigi.com/>
   @title="RubyKaigi 2010, August 27-29"
   @last_visit_time=Tue Aug 24 12:51:38 +0900 2010 ...>
>> u.last_visit_time
=> Tue Aug 24 12:51:38 +0900 2010
36. Histograms, yay! (Analysis)
hour_histogram = Hash.new(0)
Visit.all.each do |v|
  hour_histogram[v.visit_time.hour] += 1
end
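The counting idiom above does not depend on DataMapper. Here is a small sketch over plain Time objects, plus a text rendering for quick inspection in irb. Both method names and the sample timestamps are invented for illustration.

```ruby
# Count how many timestamps fall into each hour of the day.
# Hash.new(0) makes missing hours start at zero.
def hour_histogram(times)
  histogram = Hash.new(0)
  times.each { |t| histogram[t.hour] += 1 }
  histogram
end

# One line per hour, one '#' per visit (hypothetical helper).
def render_histogram(histogram)
  (0..23).map { |hour| format("%02d %s", hour, "#" * histogram.fetch(hour, 0)) }
end

# Made-up sample data, not real browsing history.
sample_visits = [
  Time.utc(2010, 8, 24, 3, 51, 38),
  Time.utc(2010, 8, 24, 3, 5, 0),
  Time.utc(2010, 8, 25, 14, 0, 0)
]
puts render_histogram(hour_histogram(sample_visits))
```

Time#hour uses the Time object's own zone, so it matters whether the timestamps were loaded as local time or UTC.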
37. Over what span of time?
>> Visit.first.visit_time
=> Fri May 28 17:04:39 +0900 2010
>> Visit.last.visit_time
=> Thu Aug 26 01:51:32 +0900 2010
39. More Histograms, yay!
ruby_doc = Url.all("url.like" => "%ruby-doc%");
hour_histogram = Hash.new(0)
ruby_doc.visits.each do |v|
  hour_histogram[v.visit_time.hour] += 1
end
40. Aggregate Browsing for ruby-doc.org by Hour
(Bar chart: visits by hour of day; x-axis Midnight, 3am, 6am, 9am, Noon, 3pm, 6pm, 9pm; y-axis 0 to 50)
41. But what happens when
we have a data source
which isn’t well behaved?
42. "Does Edge have an anti-PS3 bias?"
http://arstechnica.com/civis/viewtopic.php?f=22&t=62024
Last year a thread on Ars Technica titled
"Does Edge have an anti-PS3 bias?"
resulted in a flame war between
PS3 fans and Xbox360 fans over whether
or not the PS3 was receiving unfair
treatment, particularly when held up against a
game's score on metacritic.com.
48. Yeah, that’s not pretty.
require 'open-uri'
require 'nokogiri'

# PLATFORMS (a list of platform names) is defined elsewhere, not on this slide.
def scores_for(game)
  game_page = case
    when (game.is_a? String)
      begin
        # Kernel#open on a URL comes from open-uri (Ruby 3.0+ needs URI.open).
        Nokogiri::HTML(open(game))
      rescue
        puts "[FAIL] Failed to open #{game}"
        return # give up on this page; the caller gets nil
      end
    when (game.is_a? Nokogiri::HTML::Document)
      game
    else
      raise StandardError, "you need to provide either a url, or a nokogiri document"
  end
  page_title = game_page.css('title').text
  junk, title, platform, year = page_title.match(/^(.+)\s*\((#{PLATFORMS.join("|")}): (\d+)\): Reviews$/).to_a
  title.strip!
  metascore = game_page.css('table#scoretable img').
    select { |i| /Metascore:/ =~ i['alt'] }.
    first['alt'].to_s.split.last
  puts "[WIN] #{title} on the #{platform} (#{year}) has a score of #{metascore}"
  #review_count = game_page.to_s.match(/based on <b>(\d+) reviews/).to_a.last
  reviews = game_page.css('div.scoreandreview')
  review_count = reviews.size
  # Cross-check the scraped reviews against the count the page itself claims.
  checksum = game_page.to_s.match(/based on <b>(\d+) reviews/).to_a.last.to_i
  checksum_message = "Number of Reviews on the page not equal to the claimed number of reviews"
  raise StandardError, checksum_message unless review_count == checksum
  scores = reviews.map do |review|
    score = review.css('div.criticscore').text
    pub = review.css('span.publication').text
    [score,pub]
  end
  return { :title =>title.strip, :metascore => metascore, :platform => platform, :publish_year => year, :reviews => scores }
end
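The title-parsing step is the easiest part of the scraper to try in isolation. This is a sketch only: the PLATFORMS values, the sample page title, and the name parse_page_title are assumptions for illustration, and the real page titles may be formatted differently.

```ruby
# Assumed platform list; the talk's actual PLATFORMS constant is not shown.
PLATFORMS = %w[ps3 xbox360 wii]

# Expects titles shaped like "Name (platform: year): Reviews".
TITLE_PATTERN = /^(.+)\s*\((#{PLATFORMS.join("|")}): (\d+)\): Reviews$/

# Returns a hash of the captured parts, or nil when the title does not match.
def parse_page_title(page_title)
  match = TITLE_PATTERN.match(page_title)
  return nil unless match
  { :title => match[1].strip, :platform => match[2], :publish_year => match[3].to_i }
end

p parse_page_title("Example Game (ps3: 2009): Reviews")
p parse_page_title("Some unrelated page")
```

Returning nil on a mismatch avoids the NoMethodError that title.strip! would raise on a page whose title does not fit the pattern.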
50. Models
class Game
  include DataMapper::Resource
  property :id, Serial
  property :title, String, :length=>255
  property :platform, String
  property :release_date, DateTime
  property :esrb_rating, String
  property :metascore, Float
  property :review_count, Integer
  property :created_at, DateTime
  property :updated_at, DateTime
  class Review
    include DataMapper::Resource
    property :game_id, Integer, :key => true
    property :review_publisher_id, Integer, :key => true
    property :score, Integer
    belongs_to :review_publisher
    belongs_to :game
  end
  class Developer
    include DataMapper::Resource
    property :id, Serial
    property :name, String, :length => 255
    has n, :games
  end
end
class ReviewPublisher
  include DataMapper::Resource
  property :id, Serial
  property :name, String, :length => 255
  has n, :reviews, :model => "Game::Review"
  has n, :games, :through => :reviews
end
51. Student’s T-Test (Analysis!)
def t_value(prop1, collection1, prop2, collection2)
  c1_std = collection1.std(prop1)
  c1_avg = collection1.avg(prop1)
  c1_count = collection1.count
  c2_std = collection2.std(prop2)
  c2_avg = collection2.avg(prop2)
  c2_count = collection2.count
  (c1_avg - c2_avg) /
    Math.sqrt(
      (c1_std**2 / c1_count)+(c2_std**2 / c2_count)
    )
end
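The same statistic can be checked on plain arrays. The formula, difference of means over the square root of s1²/n1 + s2²/n2, is the unequal-variance (Welch) form of the t statistic. This sketch assumes the sample standard deviation (n - 1 denominator); whether the .std used above does the same is not shown in the slides. All method names here are illustrative.

```ruby
def mean(values)
  values.sum.to_f / values.size
end

# Sample variance with the n - 1 denominator (an assumption, see above).
def sample_variance(values)
  m = mean(values)
  values.sum { |v| (v - m)**2 } / (values.size - 1)
end

# t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)
def t_statistic(a, b)
  (mean(a) - mean(b)) /
    Math.sqrt(sample_variance(a) / a.size + sample_variance(b) / b.size)
end

# Means 3 and 4, both variances 2.5, so t = -1 / sqrt(0.5 + 0.5) = -1.0
puts t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

Converting to Float early matters: with Integer inputs, a division such as std**2 / count would otherwise truncate.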
52. PS3 Reviewers vs Metascore
outlets = ReviewPublisher.all("games.platform"=>"ps3")
t_scores = outlets.map do |outlet|
  t_value(:metascore, outlet.games(:platform=>"ps3"),
          :score, outlet.reviews("game.platform"=>"ps3"))
end # .size => 140
significant = t_scores.select do |t|
  (t > 1.96 or t < -1.96) and not t.infinite?
end
low = significant.select{ |s| s < -1.96} # .size => 20
high = significant.select{ |s| s > 1.96} # .size => 10
53. Xbox360 Reviewers vs Metascore
outlets = ReviewPublisher.all("games.platform"=>"xbox360")
t_scores = outlets.map do |outlet|
  t_value(:metascore, outlet.games(:platform=>"xbox360"),
          :score, outlet.reviews("game.platform"=>"xbox360"))
end # .size => 169
significant = t_scores.select do |t|
  (t > 1.96 or t < -1.96) and not t.infinite?
end
low = significant.select{ |s| s < -1.96} # .size => 37
high = significant.select{ |s| s > 1.96} # .size => 29
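The filtering step used for both platforms can be written once. The cutoff of 1.96 is the two-sided 5% critical value of the standard normal distribution. The method name and the sample scores below are made up for illustration.

```ruby
CRITICAL_VALUE = 1.96

# Keep finite scores beyond the cutoff, then split them by sign.
# Infinite and NaN values (e.g. from a zero standard deviation) are dropped.
def split_significant(t_scores)
  significant = t_scores.select { |t| t.finite? && t.abs > CRITICAL_VALUE }
  low, high = significant.partition { |t| t < 0 }
  { :low => low, :high => high }
end

# Made-up t values, not results from the talk.
p split_significant([2.5, -3.0, 0.4, Float::INFINITY, -1.0])
```

Which sign counts as "scores higher than the Metascore" depends on the argument order passed to t_value, so it is worth checking that before reading anything into low versus high.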
54. What about Edge Magazine?
>> outlet = ReviewPublisher.first("name.like"=>"%Edge%")
=> #<ReviewPublisher @id=36 @name="Edge Magazine">
>> t = t_value(:metascore, outlet.games(:platform=>"ps3"),
               :score, outlet.reviews("game.platform"=>"ps3"))
=> 5.10786212293491
>> t > 1.96
=> true # Edge has a PRO PS3 bias, not Anti!
55. There are lots of other possibilities!
What would you like to learn?
56. Learn about DataMapper perhaps?
http://www.datamapper.org
irc://irc.freenode.net#datamapper
57. Thanks!
@knowtheory
ted@knowtheory.net