SlideShare ist ein Scribd-Unternehmen logo
1 von 52
Learning from Web Activity
Jake Hofman
Yahoo! Research
November 18, 2010
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 1 / 33
Outline
1 Agenda: Just enough philosophy
2 Case study: Demographic diversity on the Web
3 Conclusion: Lessons learned
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 2 / 33
Agenda
Size (only kind of) matters
Big Data
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 3 / 33
Agenda
Size (only kind of) matters
Big Data
Lots of data means lots to learn (from)
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 3 / 33
Agenda
Size (only kind of) matters
Big Data
But the “big” part isn’t intrinsically interesting
(although large sample sizes are always good)
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 3 / 33
Agenda
Size (only kind of) matters
Big Data
Regardless of size, it’s really about “data jeopardy”
(To what question are these data the answer?)
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 3 / 33
Agenda
Tools
Data tools:
• Shell scripting & Python
Munging, Glue
• R
Modeling, Visualization
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 4 / 33
Agenda
Tools
Big Data tools:
• Hadoop & Pig
Filtering, Aggregating
• Shell scripting & Python
Munging, Glue
• R
Modeling, Visualization
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 4 / 33
Agenda
The clean real story
“We have a habit in writing articles published in
scientific journals to make the work as finished as
possible, to cover all the tracks, to not worry about the
blind alleys or to describe how you had the wrong idea
first, and so on. So there isn’t any place to publish, in
a dignified manner, what you actually did in order to
get to do the work ...”
-Richard Feynman
Nobel Lecture1, 1965
1
http://bit.ly/feynmannobel
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 5 / 33
Outline
1 Agenda: Just enough philosophy
2 Case study: Demographic diversity on the Web
3 Conclusion: Lessons learned
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 6 / 33
Demographic diversity on the Web
The clean story
(covering our tracks)
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 7 / 33
Demographic diversity on the Web
with Irmak Sirer and Sharad Goel
How diverse is the Web?
To what extent do online experiences vary across demographic
groups?
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 8 / 33
Diversity of the Web
Data
• Representative sample of 265,000 individuals in the US, paid
via the Nielsen MegaPanel2
• Log of anonymized, complete browsing activity from June
2009 through May 2010 (URLs viewed, timestamps, etc.)
• Detailed individual and household demographic information
(age, education, income, race, sex, etc.)
2
http://bit.ly/nielsenonline
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 9 / 33
Diversity of the Web
Data
• Transform all demographic attributes to binary variables
e.g., Age → Over/Under 25, Race → White/Non-White,
Sex → Female/Male
• Normalize pageviews to at most three domain levels, sans www
e.g. www.yahoo.com → yahoo.com,
us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com
• Restrict to top 100k most popular sites
• Aggregate activity at the site, group, and user levels
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 10 / 33
Diversity of the Web
Pig to the rescue
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 11 / 33
Diversity of the Web
Site-level skew
How diverse are site audiences?
• For each site and attribute,
calculate the skew in visitors
(e.g., 93% of pageviews on
foxnews.com are by White
users)
• For each attribute, plot the
distribution of visitor skew
across all sites
Proportion White Visitors
Density
0.0 0.2 0.4 0.6 0.8 1.0
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 12 / 33
Diversity of the Web
Site-level skew
Proportion Female Visitors
Density
0.0 0.2 0.4 0.6 0.8 1.0
Proportion White VisitorsDensity
0.0 0.2 0.4 0.6 0.8 1.0
Proportion College Educated Visitors
Density
0.0 0.2 0.4 0.6 0.8 1.0
Proportion Adult Visitors
Density
0.0 0.2 0.4 0.6 0.8 1.0 Proportion of Visitors With
Household Incomes Under $50,000
Density
0.0 0.2 0.4 0.6 0.8 1.0
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 13 / 33
Diversity of the Web
Site-level skew
Many sites have skew close the average, but there also popular,
highly-skewed sites
Greater Than 90% Less Than 10%
Female
youravon.com
collectionsetc.com
coveritlive.com
needlive.com
White
foxnews.com
wunderground.com
blackplanet.com
mediatakeout.com
College Educated
news.google.com
nytimes.com
slumz.boxden.com
sythe.com
Over 25 Years Old
mail.yahoo.com
apps.facebook.com
nanowrimo.org
cbox.ws
Household Income
Under $50,000
scarleteen.com
boards.adultswim.com
opentable.com
marketwatch.com
Table 1: A selection of popular sites that are homogeneous along various demographic dimensions.
ilyPer−CapitaPageviews
20
30
40
50
60
70
!
!
!
!Non−White
Male
Non−White
Male
No College
Under 25
No College
Under 25
White
Female
White
FemaleCollege
Over 25
College
Over 25
visually apparent from Figure 5, there are significant differ-
ences in how groups distribute their time on the web. These
differences—which, as mentioned above, hold for highly fre-
quented sites such as Facebook and YouTube—are in some
cases even more pronounced for lower traffic sites. For in-
stance, the gaming site pogo.com accounts for less than 1%
of pageviews among both low and high income users, but
low income users spend almost twice as much of their time
there.Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 14 / 33
Diversity of the Web
Site-level skew
Many sites have skew close the average, but there also popular,
highly-skewed sites
Greater Than 90% Less Than 10%
Female
youravon.com
collectionsetc.com
coveritlive.com
needlive.com
White
foxnews.com
wunderground.com
blackplanet.com
mediatakeout.com
College Educated
news.google.com
nytimes.com
slumz.boxden.com
sythe.com
Over 25 Years Old
mail.yahoo.com
apps.facebook.com
nanowrimo.org
cbox.ws
Household Income
Under $50,000
scarleteen.com
boards.adultswim.com
opentable.com
marketwatch.com
Table 1: A selection of popular sites that are homogeneous along various demographic dimensions.
ilyPer−CapitaPageviews
20
30
40
50
60
70
!
!
!
!Non−White
Male
Non−White
Male
No College
Under 25
No College
Under 25
White
Female
White
FemaleCollege
Over 25
College
Over 25
visually apparent from Figure 5, there are significant differ-
ences in how groups distribute their time on the web. These
differences—which, as mentioned above, hold for highly fre-
quented sites such as Facebook and YouTube—are in some
cases even more pronounced for lower traffic sites. For in-
stance, the gaming site pogo.com accounts for less than 1%
of pageviews among both low and high income users, but
low income users spend almost twice as much of their time
there.
This skew persists even when we restrict attention to the top 10k
or 1k sites
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 14 / 33
Diversity of the Web
Sites vs. ZIPs
How do diversity of the online and offline worlds compare?
Proportion Female
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Proportion White
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Proportion College Educated
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 15 / 33
Diversity of the Web
Sites vs. ZIPs
How do diversity of the online and offline worlds compare?
Proportion Female
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Proportion White
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Proportion College Educated
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
As expected, neighborhoods are more gender-balanced than sites
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 15 / 33
Diversity of the Web
Sites vs. ZIPs
How do diversity of the online and offline worlds compare?
Proportion Female
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Proportion White
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Proportion College Educated
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
But sites typically have more racially diverse audiences than
neighborhoods have residents
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 15 / 33
Diversity of the Web
Sites vs. ZIPs
How do diversity of the online and offline worlds compare?
Proportion Female
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Proportion White
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Proportion College Educated
Density
0.0 0.2 0.4 0.6 0.8 1.0
Sites
ZIPs
Skew by education is comparable, with online showing a bias
towards higher education
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 15 / 33
Diversity of the Web
Group-level activity
How does browsing activity vary at the group level?
DailyPer−CapitaPageviews
0
10
20
30
40
50
60
70
q
q
q
qNon−White
Male
Non−White
Male
No College
Under 25
No College
Under 25
White
Female
White
FemaleCollege
Over 25
College
Over 25
Race Education Sex Age
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 16 / 33
Diversity of the Web
Group-level activity
How does browsing activity vary at the group level?
DailyPer−CapitaPageviews
0
10
20
30
40
50
60
70
q
q
q
qNon−White
Male
Non−White
Male
No College
Under 25
No College
Under 25
White
Female
White
FemaleCollege
Over 25
College
Over 25
Race Education Sex Age
Large differences exist even at the aggregate level
(e.g. women on average generate 40% more pageviews than men)
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 16 / 33
Diversity of the Web
Group-level activity
All groups spend more than a third of their time on a handful of
email, search, and social networking sites
PercentofTotalTimeSpentonSite
0.1%
1%
10%
facebook.com
m
ail.yahoo.com
google.com
apps.facebook.com
m
ail.google.com
m
ail.live.com
youtube.com
w
ebm
ail.aol.com
m
w
fb.zynga.com
channel.facebook.com
view
m
orepics.m
yspace.com
search.yahoo.com
m
yspace.com
m
sn.com
am
azon.com
shop.ebay.com
yahoo.com
im
ages.google.com
hom
e.m
yspace.com
m
ail.com
cast.net
bing.com
w
w
w
.yahoo.com
cgi.ebay.com
espn.go.com
m
essaging.m
yspace.com
tw
itter.com
cim
.m
eebo.com
m
y.ebay.com
en.w
ikipedia.org
login.yahoo.com
facebook.m
afiawars.com
m
y.yahoo.com
gam
e3.pogo.com
friends.m
yspace.com
tagged.com
w
orldw
inner.com
m
eebo.com
login.live.com
m
ypoints.com
m
aps.google.com
aol.com
pogo.com
m
w
m
s.zynga.com
new
s.yahoo.com
w
inster.com
netflix.com
fantasysports.yahoo.com
search.aol.com
com
cast.net
alotm
etrics.com
female
male
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 17 / 33
Diversity of the Web
Group-level activity
But different groups distribute their time differently, both on
universally popular and on more niche sites
PercentofTotalTimeSpentonSite
0.1%
1%
10%
facebook.com
m
ail.yahoo.com
google.com
apps.facebook.com
m
ail.google.com
m
ail.live.com
youtube.com
w
ebm
ail.aol.com
m
w
fb.zynga.com
channel.facebook.com
view
m
orepics.m
yspace.com
search.yahoo.com
m
yspace.com
m
sn.com
am
azon.com
shop.ebay.com
yahoo.com
im
ages.google.com
hom
e.m
yspace.com
m
ail.com
cast.net
bing.com
w
w
w
.yahoo.com
cgi.ebay.com
espn.go.com
m
essaging.m
yspace.com
tw
itter.com
cim
.m
eebo.com
m
y.ebay.com
en.w
ikipedia.org
login.yahoo.com
facebook.m
afiawars.com
m
y.yahoo.com
gam
e3.pogo.com
friends.m
yspace.com
tagged.com
w
orldw
inner.com
m
eebo.com
login.live.com
m
ypoints.com
m
aps.google.com
aol.com
pogo.com
m
w
m
s.zynga.com
new
s.yahoo.com
w
inster.com
netflix.com
fantasysports.yahoo.com
search.aol.com
com
cast.net
alotm
etrics.com
female
male
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 17 / 33
Diversity of the Web
Group-level activity
But different groups distribute their time differently, both on
universally popular and on more niche sites
PercentofTotalTimeSpentonSite
0.1%
1%
10%
facebook.com
m
ail.yahoo.com
google.com
apps.facebook.com
m
ail.google.com
m
ail.live.com
youtube.com
w
ebm
ail.aol.com
m
w
fb.zynga.com
channel.facebook.com
view
m
orepics.m
yspace.com
search.yahoo.com
m
yspace.com
m
sn.com
am
azon.com
shop.ebay.com
yahoo.com
im
ages.google.com
hom
e.m
yspace.com
m
ail.com
cast.net
bing.com
w
w
w
.yahoo.com
cgi.ebay.com
espn.go.com
m
essaging.m
yspace.com
tw
itter.com
cim
.m
eebo.com
m
y.ebay.com
en.w
ikipedia.org
login.yahoo.com
facebook.m
afiawars.com
m
y.yahoo.com
gam
e3.pogo.com
friends.m
yspace.com
tagged.com
w
orldw
inner.com
m
eebo.com
login.live.com
m
ypoints.com
m
aps.google.com
aol.com
pogo.com
m
w
m
s.zynga.com
new
s.yahoo.com
w
inster.com
netflix.com
fantasysports.yahoo.com
search.aol.com
com
cast.net
alotm
etrics.com
white
non.white
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 17 / 33
PercentofTotalTimeSpentonSite
0.1%
1%
10%
0.1%
1%
10%
0.1%
1%
10%
0.1%
1%
10%
0.1%
1%
10%
facebook.com
m
ail.yahoo.com
google
.com
apps.facebook.com
m
ail.google
.com
m
ail.live.com
youtube.com
w
ebm
ail.aol.com
m
w
fb.zynga.com
channel.facebook.com
vie
w
m
orepic
s.m
yspace.com
search.yahoo.com
m
yspace.com
m
sn.com
am
azon.com
shop.ebay.com
yahoo.com
im
ages.google
.com
hom
e.m
yspace.com
m
ail.com
cast.net
bin
g.com
w
w
w
.yahoo.com
cgi.ebay.com
espn.go.com
m
essagin
g.m
yspace.com
tw
itter.com
cim
.m
eebo.com
m
y.ebay.com
en.w
ik
ip
edia
.org
lo
gin
.yahoo.com
facebook.m
afia
wars.com
m
y.yahoo.com
gam
e3.pogo.com
frie
nds.m
yspace.com
tagged.com
w
orld
w
in
ner.com
m
eebo.com
lo
gin
.live.com
m
ypoin
ts.com
m
aps.google
.comaol.com
pogo.com
m
w
m
s.zynga.com
new
s.yahoo.com
w
in
ster.com
netflix.com
fantasysports.yahoo.com
search.aol.com
com
cast.net
alo
tm
etric
s.com
AgeSexRaceEducationIncome
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 18 / 33
Diversity of the Web
Individual-level prediction
How well can one predict an individual’s demographics from their
browsing activity?
• Represent each user by the set of sites visited
• Fit linear models to predict majority/minority for each
attribute on 80% of users
• Tune model parameters using a 10% validation set
• Evaluate final performance on held-out 10% test set
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 19 / 33
Diversity of the Web
GNU-fu
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 20 / 33
Diversity of the Web
Individual-level prediction
• Reasonable (∼70-85%)
accuracy and AUC across all
attributes
• Similar performance even
when restricted to top 1k
sites
• Can achieve substantially
better performance when
restricted to “stereotypical”
users (∼80-90%)
College/No College
Under/Over $50,000
Household Income
White/Non−White
Female/Male
Over/Under 25
Years Old
AUC
q
q
q
q
q
.5 .6 .7 .8 .9 1
Accuracy
q
q
q
q
q
.5 .6 .7 .8 .9 1
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 21 / 33
Diversity of the Web
Individual-level prediction
Highly-weighted sites under the fitted models
Large positive weight Large negative weight
Female
winster.com
lancome-usa.com
sports.yahoo.com
espn.go.com
White
marlboro.com
cmt.com
mediatakeout.com
bet.com
College Educated
news.yahoo.com
linkedin.com
youtube.com
myspace.com
Over 25 Years Old
evite.com
classmates.com
addictinggames.com
youtube.com
Household Income
Under $50,000
eharmony.com
tracfone.com
rownine.com
matrixdirect.com
Table 2: A selection of the most predictive (i.e., most highly weighted) sites for each classification task.
College/No College
Under/Over $50,000
Household Income
White/Non−White
Female/Male
Over/Under 25
Years Old
AUC
!
!
!
!
!
Accuracy
!
!
!
!
!
Figure 7, a measure that effectively re-normalizes the ma-
jority and minority classes to have equal size. Intuitively,
AUC is the probability that a model scores a randomly se-
lected positive example higher than a randomly selected neg-
ative one (e.g., the probability that the model correctly dis-
tinguishes between a randomly selected female and male).
Though an uninformative rule would correctly discriminateJake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 22 / 33
Diversity of the Web
Individual-level prediction
Proof of concept browser demo
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 23 / 33
Diversity of the Web
Individual-level prediction
Proof of concept browser demo
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 23 / 33
Diversity of the Web
The real story
(what we actually did)
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 24 / 33
Diversity on the Web
The real story
• Got several hundred GBs of MegaPanel data from Nielsen3
3
Special thanks to Mainak Mazumdar
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 25 / 33
Diversity on the Web
The real story
• Got several hundred GBs of MegaPanel data from Nielsen3
• Discussed possible projects
• Predict user demographics (e.g. real-valued age) from a few
minutes of browsing activity for ad-targeting?
• Infer the number of individuals using the same browser or
behind the same ip?
• Determine number of actual uniques advertisers are receiving?
• . . .
3
Special thanks to Mainak Mazumdar
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 25 / 33
Diversity on the Web
The real story (cont’d)
• Started with predicting real-valued age
• Worked on this for an embarassingly long time
(various methods, feature selection, etc.)
• Turns out to be difficult to do better than within 10 years of
true age, on average
• Settled for classification on binary outcomes (e.g.,
adult/non-adult) over entire history
• Classification worked reasonably well for age and other
attributes
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 26 / 33
Diversity on the Web
The real story (cont’d)
• Became curious about why classification worked well
compared to regression
• Generated descriptive statistics across all attributes at the site
and group levels
• Compared site statistics to ZIP code data from the US Census
• Compared time distribution across groups
• Realized that we now had the largest comprehensive study of
demographic diversity on the web
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 27 / 33
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity

Weitere ähnliche Inhalte

Was ist angesagt?

Extreme Democracy: Strategy
Extreme Democracy: StrategyExtreme Democracy: Strategy
Extreme Democracy: StrategyPaul Schumann
 
Career of mark zuckerberg
Career of mark zuckerbergCareer of mark zuckerberg
Career of mark zuckerbergnayanbanik
 
Period 3 Michael Murtaugh The Negative Effects of Technology on Teenagers
Period 3 Michael Murtaugh The Negative Effects of Technology on TeenagersPeriod 3 Michael Murtaugh The Negative Effects of Technology on Teenagers
Period 3 Michael Murtaugh The Negative Effects of Technology on Teenagersmrsalcido
 
Activism
ActivismActivism
Activismsunnyuf
 
How the Net aids dictatorships
How the Net aids dictatorshipsHow the Net aids dictatorships
How the Net aids dictatorshipsevgeny.morozov
 
A guide-to-using-facebook-in-dissemination
A guide-to-using-facebook-in-disseminationA guide-to-using-facebook-in-dissemination
A guide-to-using-facebook-in-disseminationFHI 360
 
Authoritarian Governments in Cyberspace
Authoritarian Governments in CyberspaceAuthoritarian Governments in Cyberspace
Authoritarian Governments in Cyberspaceevgeny.morozov
 
Independent Journalism: Doing good and doing well.
Independent Journalism: Doing good and doing well.Independent Journalism: Doing good and doing well.
Independent Journalism: Doing good and doing well.Kevin Anderson
 
Going beyond google 2 philadelphia loss conference
Going beyond google 2 philadelphia loss conferenceGoing beyond google 2 philadelphia loss conference
Going beyond google 2 philadelphia loss conferencemikep007
 
Not Evenly Distributed
Not Evenly DistributedNot Evenly Distributed
Not Evenly DistributedJason Griffey
 
Outreach for law librarians
Outreach for law librariansOutreach for law librarians
Outreach for law librariansMeg Kribble
 
MARK ZUCKERBERG presentation
MARK ZUCKERBERG presentationMARK ZUCKERBERG presentation
MARK ZUCKERBERG presentationNaitik Patel
 

Was ist angesagt? (19)

Freakanomics
FreakanomicsFreakanomics
Freakanomics
 
Extreme Democracy: Strategy
Extreme Democracy: StrategyExtreme Democracy: Strategy
Extreme Democracy: Strategy
 
Career of mark zuckerberg
Career of mark zuckerbergCareer of mark zuckerberg
Career of mark zuckerberg
 
Period 3 Michael Murtaugh The Negative Effects of Technology on Teenagers
Period 3 Michael Murtaugh The Negative Effects of Technology on TeenagersPeriod 3 Michael Murtaugh The Negative Effects of Technology on Teenagers
Period 3 Michael Murtaugh The Negative Effects of Technology on Teenagers
 
Activism
ActivismActivism
Activism
 
How the Net aids dictatorships
How the Net aids dictatorshipsHow the Net aids dictatorships
How the Net aids dictatorships
 
Website Usability Study
Website Usability StudyWebsite Usability Study
Website Usability Study
 
FACEBOOK
FACEBOOKFACEBOOK
FACEBOOK
 
Digital media
Digital mediaDigital media
Digital media
 
A guide-to-using-facebook-in-dissemination
A guide-to-using-facebook-in-disseminationA guide-to-using-facebook-in-dissemination
A guide-to-using-facebook-in-dissemination
 
Authoritarian Governments in Cyberspace
Authoritarian Governments in CyberspaceAuthoritarian Governments in Cyberspace
Authoritarian Governments in Cyberspace
 
Independent Journalism: Doing good and doing well.
Independent Journalism: Doing good and doing well.Independent Journalism: Doing good and doing well.
Independent Journalism: Doing good and doing well.
 
Going beyond google 2 philadelphia loss conference
Going beyond google 2 philadelphia loss conferenceGoing beyond google 2 philadelphia loss conference
Going beyond google 2 philadelphia loss conference
 
Not Evenly Distributed
Not Evenly DistributedNot Evenly Distributed
Not Evenly Distributed
 
Outreach for law librarians
Outreach for law librariansOutreach for law librarians
Outreach for law librarians
 
MARK ZUCKERBERG presentation
MARK ZUCKERBERG presentationMARK ZUCKERBERG presentation
MARK ZUCKERBERG presentation
 
Facebook
FacebookFacebook
Facebook
 
TLA 2009
TLA 2009TLA 2009
TLA 2009
 
Mark Zuckerberg
Mark ZuckerbergMark Zuckerberg
Mark Zuckerberg
 

Andere mochten auch

Charter of demands -From the Resident's of Dwarka,Delhi Sub city
Charter of demands -From the Resident's of  Dwarka,Delhi Sub cityCharter of demands -From the Resident's of  Dwarka,Delhi Sub city
Charter of demands -From the Resident's of Dwarka,Delhi Sub cityMadhukar Varshney
 
Maherprofessional c vbio216
Maherprofessional c vbio216Maherprofessional c vbio216
Maherprofessional c vbio216Pat Maher
 
Dwarka Forum Organizes Interaction with Candidates from Matiala Constituency
Dwarka Forum Organizes Interaction with Candidates from Matiala Constituency Dwarka Forum Organizes Interaction with Candidates from Matiala Constituency
Dwarka Forum Organizes Interaction with Candidates from Matiala Constituency Madhukar Varshney
 
Xay dung co so du lieu chi phi san xuat lua
Xay dung co so du lieu chi phi san xuat luaXay dung co so du lieu chi phi san xuat lua
Xay dung co so du lieu chi phi san xuat luaHo Cao Viet
 
C:\Documents And Settings\Lakshmi Menon\My Documents\Twinkling Stars
C:\Documents And Settings\Lakshmi Menon\My Documents\Twinkling StarsC:\Documents And Settings\Lakshmi Menon\My Documents\Twinkling Stars
C:\Documents And Settings\Lakshmi Menon\My Documents\Twinkling StarsLAKSHMI A R
 
Observations made by the group (Edited)
Observations made by the group (Edited)Observations made by the group (Edited)
Observations made by the group (Edited)d3ath2u
 
Intro computer
Intro computerIntro computer
Intro computerprajug2503
 
Los Gusanos De Seda
Los Gusanos De SedaLos Gusanos De Seda
Los Gusanos De Sedabeasolo
 
SEPM Outsourcing
SEPM OutsourcingSEPM Outsourcing
SEPM Outsourcingasherad
 
Introduction to Steens Furniture
Introduction to Steens FurnitureIntroduction to Steens Furniture
Introduction to Steens FurnitureSteens Furniture
 
Kamila Ppt
Kamila PptKamila Ppt
Kamila Pptnagorego
 
Domain Name System
Domain Name SystemDomain Name System
Domain Name SystemJOYJOYJOYJOY
 
Hieu qua san xuat bap lai tren dat lua dbscl
Hieu qua san  xuat bap lai tren dat lua dbsclHieu qua san  xuat bap lai tren dat lua dbscl
Hieu qua san xuat bap lai tren dat lua dbsclHo Cao Viet
 

Andere mochten auch (20)

D.psicologia
D.psicologiaD.psicologia
D.psicologia
 
Blogging at SinauOnline - Open Social Learning
Blogging at SinauOnline - Open Social LearningBlogging at SinauOnline - Open Social Learning
Blogging at SinauOnline - Open Social Learning
 
Charter of demands -From the Resident's of Dwarka,Delhi Sub city
Charter of demands -From the Resident's of  Dwarka,Delhi Sub cityCharter of demands -From the Resident's of  Dwarka,Delhi Sub city
Charter of demands -From the Resident's of Dwarka,Delhi Sub city
 
Maherprofessional c vbio216
Maherprofessional c vbio216Maherprofessional c vbio216
Maherprofessional c vbio216
 
Brite zeynep 2012
Brite zeynep 2012Brite zeynep 2012
Brite zeynep 2012
 
Dwarka Forum Organizes Interaction with Candidates from Matiala Constituency
Dwarka Forum Organizes Interaction with Candidates from Matiala Constituency Dwarka Forum Organizes Interaction with Candidates from Matiala Constituency
Dwarka Forum Organizes Interaction with Candidates from Matiala Constituency
 
Presentatie carrieredagen 2011 linkedin
Presentatie carrieredagen 2011 linkedinPresentatie carrieredagen 2011 linkedin
Presentatie carrieredagen 2011 linkedin
 
Xay dung co so du lieu chi phi san xuat lua
Xay dung co so du lieu chi phi san xuat luaXay dung co so du lieu chi phi san xuat lua
Xay dung co so du lieu chi phi san xuat lua
 
C:\Documents And Settings\Lakshmi Menon\My Documents\Twinkling Stars
C:\Documents And Settings\Lakshmi Menon\My Documents\Twinkling StarsC:\Documents And Settings\Lakshmi Menon\My Documents\Twinkling Stars
C:\Documents And Settings\Lakshmi Menon\My Documents\Twinkling Stars
 
Observations made by the group (Edited)
Observations made by the group (Edited)Observations made by the group (Edited)
Observations made by the group (Edited)
 
Intro computer
Intro computerIntro computer
Intro computer
 
Los Gusanos De Seda
Los Gusanos De SedaLos Gusanos De Seda
Los Gusanos De Seda
 
SEPM Outsourcing
SEPM OutsourcingSEPM Outsourcing
SEPM Outsourcing
 
Introduction to Steens Furniture
Introduction to Steens FurnitureIntroduction to Steens Furniture
Introduction to Steens Furniture
 
Kamila Ppt
Kamila PptKamila Ppt
Kamila Ppt
 
Graduacion 8vo
Graduacion 8voGraduacion 8vo
Graduacion 8vo
 
Lec4
Lec4Lec4
Lec4
 
Domain Name System
Domain Name SystemDomain Name System
Domain Name System
 
Hieu qua san xuat bap lai tren dat lua dbscl
Hieu qua san  xuat bap lai tren dat lua dbsclHieu qua san  xuat bap lai tren dat lua dbscl
Hieu qua san xuat bap lai tren dat lua dbscl
 
Bx CRM
Bx CRMBx CRM
Bx CRM
 

Ähnlich wie Learning from Web Activity

Better Health through Social Networking
Better Health through Social NetworkingBetter Health through Social Networking
Better Health through Social NetworkingCharlene Chausis
 
Library 2.0: the Future of Libraries?
Library 2.0: the Future of Libraries?Library 2.0: the Future of Libraries?
Library 2.0: the Future of Libraries?Ben Ropp
 
J201 Firstdaylecture
J201 FirstdaylectureJ201 Firstdaylecture
J201 Firstdaylecturelinville
 
History & influencers of Digital & Social Media
History & influencers of Digital & Social MediaHistory & influencers of Digital & Social Media
History & influencers of Digital & Social MediaSusan Chesley Fant
 
Hei Leeds 2008 B
Hei Leeds 2008 BHei Leeds 2008 B
Hei Leeds 2008 BRay Poynter
 
Fake news and fact finding
Fake news and fact findingFake news and fact finding
Fake news and fact findingYumonomics
 
Social Media and You (for tweeners/teens)
Social Media and You (for tweeners/teens)Social Media and You (for tweeners/teens)
Social Media and You (for tweeners/teens)Anne Arendt
 
The Digital Museum
The Digital MuseumThe Digital Museum
The Digital MuseumMitch Maxson
 
What We Learned From Social Media
What We Learned From Social MediaWhat We Learned From Social Media
What We Learned From Social MediaAdrian Monck
 
Waterford (Dave Pattern)
Waterford (Dave Pattern)Waterford (Dave Pattern)
Waterford (Dave Pattern)daveyp
 
Bridging the Real and Virtual Worlds: The Next Evolution of Social and Mobile...
Bridging the Real and Virtual Worlds: The Next Evolution of Social and Mobile...Bridging the Real and Virtual Worlds: The Next Evolution of Social and Mobile...
Bridging the Real and Virtual Worlds: The Next Evolution of Social and Mobile...Georgiana Cohen
 
Ithaca College Essay Questions. Online assignment writing service.
Ithaca College Essay Questions. Online assignment writing service.Ithaca College Essay Questions. Online assignment writing service.
Ithaca College Essay Questions. Online assignment writing service.Julie Oden
 
Ltr2 - Gaming and Libraries
Ltr2 - Gaming and LibrariesLtr2 - Gaming and Libraries
Ltr2 - Gaming and Librariesryanoceros
 
Ltr Gaming And Libraries
Ltr Gaming And LibrariesLtr Gaming And Libraries
Ltr Gaming And LibrariesStargazerNJ
 
Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015
Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015
Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015jkcomerford
 
Facebook- Beyond Friends, Family and FarmVille
Facebook- Beyond Friends, Family and FarmVilleFacebook- Beyond Friends, Family and FarmVille
Facebook- Beyond Friends, Family and FarmVilleSelena Garrison
 
Prof. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisProf. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisHendrik Speck
 

Ähnlich wie Learning from Web Activity (20)

Better Health through Social Networking
Better Health through Social NetworkingBetter Health through Social Networking
Better Health through Social Networking
 
Library 2.0: the Future of Libraries?
Library 2.0: the Future of Libraries?Library 2.0: the Future of Libraries?
Library 2.0: the Future of Libraries?
 
J201 Firstdaylecture
J201 FirstdaylectureJ201 Firstdaylecture
J201 Firstdaylecture
 
History & influencers of Digital & Social Media
History & influencers of Digital & Social MediaHistory & influencers of Digital & Social Media
History & influencers of Digital & Social Media
 
Hei Leeds 2008 B
Hei Leeds 2008 BHei Leeds 2008 B
Hei Leeds 2008 B
 
Fake news and fact finding
Fake news and fact findingFake news and fact finding
Fake news and fact finding
 
Social Media and You (for tweeners/teens)
Social Media and You (for tweeners/teens)Social Media and You (for tweeners/teens)
Social Media and You (for tweeners/teens)
 
The Digital Museum
The Digital MuseumThe Digital Museum
The Digital Museum
 
4 hstaff final
4 hstaff final4 hstaff final
4 hstaff final
 
What We Learned From Social Media
What We Learned From Social MediaWhat We Learned From Social Media
What We Learned From Social Media
 
Waterford (Dave Pattern)
Waterford (Dave Pattern)Waterford (Dave Pattern)
Waterford (Dave Pattern)
 
2007 open everything at gnomedex 4.4
2007 open everything at gnomedex 4.42007 open everything at gnomedex 4.4
2007 open everything at gnomedex 4.4
 
Bridging the Real and Virtual Worlds: The Next Evolution of Social and Mobile...
Bridging the Real and Virtual Worlds: The Next Evolution of Social and Mobile...Bridging the Real and Virtual Worlds: The Next Evolution of Social and Mobile...
Bridging the Real and Virtual Worlds: The Next Evolution of Social and Mobile...
 
Ithaca College Essay Questions. Online assignment writing service.
Ithaca College Essay Questions. Online assignment writing service.Ithaca College Essay Questions. Online assignment writing service.
Ithaca College Essay Questions. Online assignment writing service.
 
Ltr2 - Gaming and Libraries
Ltr2 - Gaming and LibrariesLtr2 - Gaming and Libraries
Ltr2 - Gaming and Libraries
 
Ltr Gaming And Libraries
Ltr Gaming And LibrariesLtr Gaming And Libraries
Ltr Gaming And Libraries
 
Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015
Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015
Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015
 
AprilMay
AprilMayAprilMay
AprilMay
 
Facebook- Beyond Friends, Family and FarmVille
Facebook- Beyond Friends, Family and FarmVilleFacebook- Beyond Friends, Family and FarmVille
Facebook- Beyond Friends, Family and FarmVille
 
Prof. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisProf. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network Analysis
 

Mehr von jakehofman

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2jakehofman
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1jakehofman
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networksjakehofman
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classificationjakehofman
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationjakehofman
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1jakehofman
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in Rjakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overviewjakehofman
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systemsjakehofman
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayesjakehofman
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studiesjakehofman
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Sciencejakehofman
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classificationjakehofman
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regressionjakehofman
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experimentsjakehofman
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wranglingjakehofman
 

Mehr von jakehofman (20)

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systems
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
 

Kürzlich hochgeladen

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Kürzlich hochgeladen (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Learning from Web Activity

  • 1. Learning from Web Activity Jake Hofman Yahoo! Research November 18, 2010 Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 1 / 33
  • 2. Outline 1 Agenda: Just enough philosophy 2 Case study: Demographic diversity on the Web 3 Conclusion: Lessons learned Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 2 / 33
  • 3. Agenda Size (only kind of) matters Big Data Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 3 / 33
  • 4. Agenda Size (only kind of) matters Big Data Lots of data means lots to learn (from) Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 3 / 33
  • 5. Agenda Size (only kind of) matters Big Data But the “big” part isn’t intrinsically interesting (although large sample sizes are always good) Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 3 / 33
  • 6. Agenda Size (only kind of) matters Big Data Regardless of size, it’s really about “data jeopardy” (To what question are these data the answer?) Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 3 / 33
  • 7. Agenda Tools Data tools: • Shell scripting & Python Munging, Glue • R Modeling, Visualization Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 4 / 33
  • 8. Agenda Tools Big Data tools: • Hadoop & Pig Filtering, Aggregating • Shell scripting & Python Munging, Glue • R Modeling, Visualization Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 4 / 33
  • 9. Agenda The clean real story “We have a habit in writing articles published in scientific journals to make the work as finished as possible, to cover all the tracks, to not worry about the blind alleys or to describe how you had the wrong idea first, and so on. So there isn’t any place to publish, in a dignified manner, what you actually did in order to get to do the work ...” -Richard Feynman Nobel Lecture1, 1965 1 http://bit.ly/feynmannobel Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 5 / 33
  • 10. Outline 1 Agenda: Just enough philosophy 2 Case study: Demographic diversity on the Web 3 Conclusion: Lessons learned Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 6 / 33
  • 11. Demographic diversity on the Web The clean story (covering our tracks) Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 7 / 33
  • 12. Demographic diversity on the Web with Irmak Sirer and Sharad Goel How diverse is the Web? To what extent do online experiences vary across demographic groups? Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 8 / 33
  • 13. Diversity of the Web Data • Representative sample of 265,000 individuals in the US, paid via the Nielsen MegaPanel2 • Log of anonymized, complete browsing activity from June 2009 through May 2010 (URLs viewed, timestamps, etc.) • Detailed individual and household demographic information (age, education, income, race, sex, etc.) 2 http://bit.ly/nielsenonline Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 9 / 33
  • 14. Diversity of the Web Data • Transform all demographic attributes to binary variables e.g., Age → Over/Under 25, Race → White/Non-White, Sex → Female/Male • Normalize pageviews to at most three domain levels, sans www e.g. www.yahoo.com → yahoo.com, us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com • Restrict to top 100k most popular sites • Aggregate activity at the site, group, and user levels Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 10 / 33
  • 15. Diversity of the Web Pig to the rescue Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 11 / 33
  • 16. Diversity of the Web Site-level skew How diverse are site audiences? • For each site and attribute, calculate the skew in visitors (e.g., 93% of pageviews on foxnews.com are by White users) • For each attribute, plot the distribution of visitor skew across all sites Proportion White Visitors Density 0.0 0.2 0.4 0.6 0.8 1.0 Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 12 / 33
  • 17. Diversity of the Web Site-level skew Proportion Female Visitors Density 0.0 0.2 0.4 0.6 0.8 1.0 Proportion White VisitorsDensity 0.0 0.2 0.4 0.6 0.8 1.0 Proportion College Educated Visitors Density 0.0 0.2 0.4 0.6 0.8 1.0 Proportion Adult Visitors Density 0.0 0.2 0.4 0.6 0.8 1.0 Proportion of Visitors With Household Incomes Under $50,000 Density 0.0 0.2 0.4 0.6 0.8 1.0 Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 13 / 33
  • 18. Diversity of the Web Site-level skew Many sites have skew close the average, but there also popular, highly-skewed sites Greater Than 90% Less Than 10% Female youravon.com collectionsetc.com coveritlive.com needlive.com White foxnews.com wunderground.com blackplanet.com mediatakeout.com College Educated news.google.com nytimes.com slumz.boxden.com sythe.com Over 25 Years Old mail.yahoo.com apps.facebook.com nanowrimo.org cbox.ws Household Income Under $50,000 scarleteen.com boards.adultswim.com opentable.com marketwatch.com Table 1: A selection of popular sites that are homogeneous along various demographic dimensions. ilyPer−CapitaPageviews 20 30 40 50 60 70 ! ! ! !Non−White Male Non−White Male No College Under 25 No College Under 25 White Female White FemaleCollege Over 25 College Over 25 visually apparent from Figure 5, there are significant differ- ences in how groups distribute their time on the web. These differences—which, as mentioned above, hold for highly fre- quented sites such as Facebook and YouTube—are in some cases even more pronounced for lower traffic sites. For in- stance, the gaming site pogo.com accounts for less than 1% of pageviews among both low and high income users, but low income users spend almost twice as much of their time there.Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 14 / 33
  • 19. Diversity of the Web Site-level skew Many sites have skew close the average, but there also popular, highly-skewed sites Greater Than 90% Less Than 10% Female youravon.com collectionsetc.com coveritlive.com needlive.com White foxnews.com wunderground.com blackplanet.com mediatakeout.com College Educated news.google.com nytimes.com slumz.boxden.com sythe.com Over 25 Years Old mail.yahoo.com apps.facebook.com nanowrimo.org cbox.ws Household Income Under $50,000 scarleteen.com boards.adultswim.com opentable.com marketwatch.com Table 1: A selection of popular sites that are homogeneous along various demographic dimensions. ilyPer−CapitaPageviews 20 30 40 50 60 70 ! ! ! !Non−White Male Non−White Male No College Under 25 No College Under 25 White Female White FemaleCollege Over 25 College Over 25 visually apparent from Figure 5, there are significant differ- ences in how groups distribute their time on the web. These differences—which, as mentioned above, hold for highly fre- quented sites such as Facebook and YouTube—are in some cases even more pronounced for lower traffic sites. For in- stance, the gaming site pogo.com accounts for less than 1% of pageviews among both low and high income users, but low income users spend almost twice as much of their time there. This skew persists even when we restrict attention to the top 10k or 1k sites Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 14 / 33
  • 20. Diversity of the Web Sites vs. ZIPs How do diversity of the online and offline worlds compare? Proportion Female Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Proportion White Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Proportion College Educated Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 15 / 33
  • 21. Diversity of the Web Sites vs. ZIPs How do diversity of the online and offline worlds compare? Proportion Female Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Proportion White Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Proportion College Educated Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs As expected, neighborhoods are more gender-balanced than sites Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 15 / 33
  • 22. Diversity of the Web Sites vs. ZIPs How do diversity of the online and offline worlds compare? Proportion Female Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Proportion White Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Proportion College Educated Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs But sites typically have more racially diverse audiences than neighborhoods have residents Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 15 / 33
  • 23. Diversity of the Web Sites vs. ZIPs How do diversity of the online and offline worlds compare? Proportion Female Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Proportion White Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Proportion College Educated Density 0.0 0.2 0.4 0.6 0.8 1.0 Sites ZIPs Skew by education is comparable, with online showing a bias towards higher education Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 15 / 33
  • 24. Diversity of the Web Group-level activity How does browsing activity vary at the group level? DailyPer−CapitaPageviews 0 10 20 30 40 50 60 70 q q q qNon−White Male Non−White Male No College Under 25 No College Under 25 White Female White FemaleCollege Over 25 College Over 25 Race Education Sex Age Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 16 / 33
  • 25. Diversity of the Web Group-level activity How does browsing activity vary at the group level? DailyPer−CapitaPageviews 0 10 20 30 40 50 60 70 q q q qNon−White Male Non−White Male No College Under 25 No College Under 25 White Female White FemaleCollege Over 25 College Over 25 Race Education Sex Age Large differences exist even at the aggregate level (e.g. women on average generate 40% more pageviews than men) Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 16 / 33
  • 26. Diversity of the Web Group-level activity All groups spend more than a third of their time on a handful of email, search, and social networking sites PercentofTotalTimeSpentonSite 0.1% 1% 10% facebook.com m ail.yahoo.com google.com apps.facebook.com m ail.google.com m ail.live.com youtube.com w ebm ail.aol.com m w fb.zynga.com channel.facebook.com view m orepics.m yspace.com search.yahoo.com m yspace.com m sn.com am azon.com shop.ebay.com yahoo.com im ages.google.com hom e.m yspace.com m ail.com cast.net bing.com w w w .yahoo.com cgi.ebay.com espn.go.com m essaging.m yspace.com tw itter.com cim .m eebo.com m y.ebay.com en.w ikipedia.org login.yahoo.com facebook.m afiawars.com m y.yahoo.com gam e3.pogo.com friends.m yspace.com tagged.com w orldw inner.com m eebo.com login.live.com m ypoints.com m aps.google.com aol.com pogo.com m w m s.zynga.com new s.yahoo.com w inster.com netflix.com fantasysports.yahoo.com search.aol.com com cast.net alotm etrics.com female male Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 17 / 33
  • 27. Diversity of the Web Group-level activity But different groups distribute their time differently, both on universally popular and on more niche sites PercentofTotalTimeSpentonSite 0.1% 1% 10% facebook.com m ail.yahoo.com google.com apps.facebook.com m ail.google.com m ail.live.com youtube.com w ebm ail.aol.com m w fb.zynga.com channel.facebook.com view m orepics.m yspace.com search.yahoo.com m yspace.com m sn.com am azon.com shop.ebay.com yahoo.com im ages.google.com hom e.m yspace.com m ail.com cast.net bing.com w w w .yahoo.com cgi.ebay.com espn.go.com m essaging.m yspace.com tw itter.com cim .m eebo.com m y.ebay.com en.w ikipedia.org login.yahoo.com facebook.m afiawars.com m y.yahoo.com gam e3.pogo.com friends.m yspace.com tagged.com w orldw inner.com m eebo.com login.live.com m ypoints.com m aps.google.com aol.com pogo.com m w m s.zynga.com new s.yahoo.com w inster.com netflix.com fantasysports.yahoo.com search.aol.com com cast.net alotm etrics.com female male Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 17 / 33
  • 28. Diversity of the Web Group-level activity But different groups distribute their time differently, both on universally popular and on more niche sites PercentofTotalTimeSpentonSite 0.1% 1% 10% facebook.com m ail.yahoo.com google.com apps.facebook.com m ail.google.com m ail.live.com youtube.com w ebm ail.aol.com m w fb.zynga.com channel.facebook.com view m orepics.m yspace.com search.yahoo.com m yspace.com m sn.com am azon.com shop.ebay.com yahoo.com im ages.google.com hom e.m yspace.com m ail.com cast.net bing.com w w w .yahoo.com cgi.ebay.com espn.go.com m essaging.m yspace.com tw itter.com cim .m eebo.com m y.ebay.com en.w ikipedia.org login.yahoo.com facebook.m afiawars.com m y.yahoo.com gam e3.pogo.com friends.m yspace.com tagged.com w orldw inner.com m eebo.com login.live.com m ypoints.com m aps.google.com aol.com pogo.com m w m s.zynga.com new s.yahoo.com w inster.com netflix.com fantasysports.yahoo.com search.aol.com com cast.net alotm etrics.com white non.white Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 17 / 33
  • 29. PercentofTotalTimeSpentonSite 0.1% 1% 10% 0.1% 1% 10% 0.1% 1% 10% 0.1% 1% 10% 0.1% 1% 10% facebook.com m ail.yahoo.com google .com apps.facebook.com m ail.google .com m ail.live.com youtube.com w ebm ail.aol.com m w fb.zynga.com channel.facebook.com vie w m orepic s.m yspace.com search.yahoo.com m yspace.com m sn.com am azon.com shop.ebay.com yahoo.com im ages.google .com hom e.m yspace.com m ail.com cast.net bin g.com w w w .yahoo.com cgi.ebay.com espn.go.com m essagin g.m yspace.com tw itter.com cim .m eebo.com m y.ebay.com en.w ik ip edia .org lo gin .yahoo.com facebook.m afia wars.com m y.yahoo.com gam e3.pogo.com frie nds.m yspace.com tagged.com w orld w in ner.com m eebo.com lo gin .live.com m ypoin ts.com m aps.google .comaol.com pogo.com m w m s.zynga.com new s.yahoo.com w in ster.com netflix.com fantasysports.yahoo.com search.aol.com com cast.net alo tm etric s.com AgeSexRaceEducationIncome Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 18 / 33
  • 30. Diversity of the Web Individual-level prediction How well can one predict an individual’s demographics from their browsing activity? • Represent each user by the set of sites visited • Fit linear models to predict majority/minority for each attribute on 80% of users • Tune model parameters using a 10% validation set • Evaluate final performance on held-out 10% test set Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 19 / 33
  • 31. Diversity of the Web GNU-fu Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 20 / 33
  • 32. Diversity of the Web Individual-level prediction • Reasonable (∼70-85%) accuracy and AUC across all attributes • Similar performance even when restricted to top 1k sites • Can achieve substantially better performance when restricted to “stereotypical” users (∼80-90%) College/No College Under/Over $50,000 Household Income White/Non−White Female/Male Over/Under 25 Years Old AUC q q q q q .5 .6 .7 .8 .9 1 Accuracy q q q q q .5 .6 .7 .8 .9 1 Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 21 / 33
  • 33. Diversity of the Web Individual-level prediction Highly-weighted sites under the fitted models Large positive weight Large negative weight Female winster.com lancome-usa.com sports.yahoo.com espn.go.com White marlboro.com cmt.com mediatakeout.com bet.com College Educated news.yahoo.com linkedin.com youtube.com myspace.com Over 25 Years Old evite.com classmates.com addictinggames.com youtube.com Household Income Under $50,000 eharmony.com tracfone.com rownine.com matrixdirect.com Table 2: A selection of the most predictive (i.e., most highly weighted) sites for each classification task. College/No College Under/Over $50,000 Household Income White/Non−White Female/Male Over/Under 25 Years Old AUC ! ! ! ! ! Accuracy ! ! ! ! ! Figure 7, a measure that effectively re-normalizes the ma- jority and minority classes to have equal size. Intuitively, AUC is the probability that a model scores a randomly se- lected positive example higher than a randomly selected neg- ative one (e.g., the probability that the model correctly dis- tinguishes between a randomly selected female and male). Though an uninformative rule would correctly discriminateJake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 22 / 33
  • 34. Diversity of the Web Individual-level prediction Proof of concept browser demo Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 23 / 33
  • 35. Diversity of the Web Individual-level prediction Proof of concept browser demo Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 23 / 33
  • 36. Diversity of the Web The real story (what we actually did) Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 24 / 33
  • 37. Diversity on the Web The real story • Got several hundred GBs of MegaPanel data from Nielsen3 3 Special thanks to Mainak Mazumdar Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 25 / 33
  • 38. Diversity on the Web The real story • Got several hundred GBs of MegaPanel data from Nielsen3 • Discussed possible projects • Predict user demographics (e.g. real-valued age) from a few minutes of browsing activity for ad-targeting? • Infer the number of individuals using the same browser or behind the same ip? • Determine number of actual uniques advertisers are receiving? • . . . 3 Special thanks to Mainak Mazumdar Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 25 / 33
  • 39. Diversity on the Web The real story (cont’d) • Started with predicting real-valued age • Worked on this for an embarassingly long time (various methods, feature selection, etc.) • Turns out to be difficult to do better than within 10 years of true age, on average • Settled for classification on binary outcomes (e.g., adult/non-adult) over entire history • Classification worked reasonably well for age and other attributes Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 26 / 33
  • 40. Diversity on the Web The real story (cont’d) • Became curious about why classification worked well compared to regression • Generated descriptive statistics across all attributes at the site and group levels • Compared site statistics to ZIP code data from the US Census • Compared time distribution across groups • Realized that we now had the largest comprehensive study of demographic diversity on the web Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 27 / 33