Archiving the Mobile Web
Frank McCown, Monica Yarbrough, & Keith Enlow
Computer Science Dept
Harding University
WADL 2013
Indianapolis, IN
July 25, 2013
Mobile vs. Stationary Web
Mobile Web-Related Markup Languages
http://en.wikipedia.org/wiki/File:Mobile_Web_Standards_Evolution_Vector.svg
Smartphone era
Two Types of Mobile Web
• Feature Phone Web: cHTML (iMode), WML, WAP, etc.
• Smartphone Web: XHTML, HTML5, etc.
Serving Up Mobile Sites
1. Responsive web design
• Same HTML content to desktop and mobile
• CSS media queries alter appearance
<!-- CSS media query on a link element -->
<link rel="stylesheet" media="(max-width: 800px)" href="example.css" />
<!-- CSS media query within a style sheet -->
<style>
@media (max-width: 600px) {
.sidebar { display: none; }
}
</style>
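As a rough illustration of how an archiving tool might notice responsive design, the sketch below pulls max-width breakpoints out of markup like the two snippets above. The function name and the regular expression are assumptions made for this example; real CSS can express media queries in many more ways (em units, min-width, external style sheets) that this does not cover.

```python
import re
from typing import List

# Matches conditions such as "(max-width: 600px)" in a media attribute or @media rule.
MAX_WIDTH_RE = re.compile(r"\(\s*max-width\s*:\s*(\d+)px\s*\)", re.IGNORECASE)


def max_width_breakpoints(markup: str) -> List[int]:
    """Return the sorted, de-duplicated max-width breakpoints (in px) found in markup."""
    return sorted({int(px) for px in MAX_WIDTH_RE.findall(markup)})


sample = """
<link rel="stylesheet" media="(max-width: 800px)" href="example.css" />
<style>
@media (max-width: 600px) { .sidebar { display: none; } }
</style>
"""
print(max_width_breakpoints(sample))
```

A page that yields no breakpoints may still be responsive through linked style sheets, so an empty result is only weak evidence.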
Example of Responsive Web Design
Serving Up Mobile Sites
1. Responsive web design
• Same HTML content to desktop and mobile
• CSS media queries alter appearance
2. Redirect mobile user agent to mobile site
• Client-side redirection
• Server-side redirection
Client-Side Redirection
• JavaScript detects mobile user agent
// From www.harding.edu
var ua = navigator.userAgent.toLowerCase();
// Assumption: queryString is not defined on the slide; the page's query string is used here.
var queryString = window.location.search;
if (queryString.match('version=mobile') ||
    ua.match(/IEMobile|Windows CE|NetFront|PlayStation|like Mac OS X|MIDP|UP\.Browser|Symbian|Nintendo|BlackBerry|mobile/i)) {
  // iPads keep the desktop site
  if (!ua.match('ipad')) {
    if (window.location.pathname.match(/\.html/))
      window.location = window.location.pathname.replace('.html', '.m.html');
    else
      window.location = window.location.pathname + 'index.m.html';
  }
}
Server-Side Redirection
• Server routes mobile user agent to different page
Apache Example:
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} (android|bb\d+|meego).+mobile|avantgo|bada\/|blackberry|blazer|etc…|zte-) [NC]
RewriteRule ^$ http://detectmobilebrowser.com/mobile [R,L]
https://developers.google.com/webmasters/smartphone-sites/details
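The routing decision made by a rewrite rule like the one above can be sketched in a few lines. This is only an illustration: the shortened user-agent pattern, the function name redirect_location, and the m.example.com target are all assumptions, and a production rule would use a much longer pattern list.

```python
import re
from typing import Optional

# Deliberately short, illustrative pattern (assumption); real lists are far longer.
MOBILE_UA_RE = re.compile(
    r"(android|bb\d+|meego).+mobile|avantgo|blackberry|iphone", re.IGNORECASE
)


def redirect_location(path: str, user_agent: str,
                      mobile_base: str = "http://m.example.com") -> Optional[str]:
    """Return the Location for a redirect, or None to serve the desktop page.

    Like the RewriteRule pattern ^$, only requests for the site root are redirected.
    """
    if path == "/" and MOBILE_UA_RE.search(user_agent):
        return mobile_base + "/"
    return None


iphone = "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us)"
desktop = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0"
print(redirect_location("/", iphone))
print(redirect_location("/", desktop))
```

A crawler that only ever sends a desktop user agent always takes the None branch, which is why the mobile site never gets archived.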
Serving Up Mobile Sites
1. Responsive web design
• Same HTML content to desktop and mobile
• CSS media queries alter appearance
2. Redirect mobile user agent to mobile site
• Client-side redirection
• Server-side redirection
3. User-agent content negotiation
• Dynamically serving different HTML for the same URL
User-Agent Content Negotiation
• Server serves up different content for same URL
• Use Vary: User-Agent header in response
• Best method for serving content quickly
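A crawler can check for this signal directly in the response headers. The sketch below assumes the headers are available as a plain name-to-value mapping; the function name is made up for this example.

```python
from typing import Mapping


def varies_on_user_agent(headers: Mapping[str, str]) -> bool:
    """True if the response's Vary header lists User-Agent (or the wildcard *)."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "vary":
            fields = {field.strip().lower() for field in value.split(",")}
            if "user-agent" in fields or "*" in fields:
                return True
    return False


print(varies_on_user_agent({"Vary": "Accept-Encoding, User-Agent"}))
print(varies_on_user_agent({"Content-Type": "text/html"}))
```

A missing header does not prove the content is the same for every device; a server may vary its output without declaring it.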
Archiving Mobile Sites
1. Responsive web design
• Easy: Crawl like normal
• Use client tools to view page formatted for mobile
2. Redirect mobile user agent to mobile site
• Need to crawl with mobile user agent
• Need JavaScript-enabled crawler to handle client-side redirection
3. User-agent content negotiation
• Need to crawl with mobile user agent
• Need to distinguish mobile vs. desktop for same URL
How are we doing archiving mobile sites so far?
Earliest archived page
Earliest 2007 archived page: WML
Finally some news!
Really???
Great…
Only desktop version is archived!
Mobile Finder
By Monica Yarbrough
Google’s Suggestions for SEO
• Vary HTTP Header
• Annotations within the HTML:
• On desktop page:
• <link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.example.com/page-1">
• On mobile page:
• <link rel="canonical" href="http://www.example.com/page-1">
• Media queries
https://developers.google.com/webmasters/smartphone-sites/
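The link annotations can be read back out of a page with the standard-library HTML parser. This is a minimal sketch; the class and function names are assumptions for this example, and it only looks at link elements, not at the Vary header or at media queries in style sheets.

```python
from html.parser import HTMLParser


class LinkAnnotationParser(HTMLParser):
    """Collects rel="alternate" links that carry a media attribute, and rel="canonical"."""

    def __init__(self):
        super().__init__()
        self.alternates = []   # list of (media, href) pairs
        self.canonical = None  # href of the canonical link, if any

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("media"):
            self.alternates.append((attr["media"], attr.get("href")))
        elif "canonical" in rel:
            self.canonical = attr.get("href")


def find_annotations(html):
    """Return (alternates, canonical) for the given HTML text."""
    parser = LinkAnnotationParser()
    parser.feed(html)
    return parser.alternates, parser.canonical


desktop_page = ('<head><link rel="alternate" media="only screen and (max-width: 640px)" '
                'href="http://m.example.com/page-1"></head>')
mobile_page = '<head><link rel="canonical" href="http://www.example.com/page-1"></head>'
print(find_annotations(desktop_page))
print(find_annotations(mobile_page))
```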
How Mobile Finder Works
• Use both desktop and mobile user agents
• Look for:
• Redirect
• Different content
• Different stylesheets
• Media queries
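The checks listed above could be combined roughly as follows. This is a sketch of the general idea, not Mobile Finder's actual code: the Fetch record, the order of the checks, the regular expressions, and the 0.9 similarity threshold are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Fetch:
    final_url: str  # URL after any redirects were followed
    html: str       # response body


STYLESHEET_RE = re.compile(r'<link[^>]+rel=["\']stylesheet["\'][^>]*>', re.IGNORECASE)
MEDIA_QUERY_RE = re.compile(r"\((?:max|min)-(?:device-)?width\s*:", re.IGNORECASE)


def mobile_evidence(desktop: Fetch, mobile: Fetch, threshold: float = 0.9) -> Optional[str]:
    """Compare fetches made with a desktop and a mobile user agent.

    Returns the first kind of evidence found, or None if the two look the same.
    """
    if mobile.final_url != desktop.final_url:
        return "redirect"
    similarity = SequenceMatcher(None, desktop.html, mobile.html).ratio()
    if similarity < threshold:
        return "differing content"
    if set(STYLESHEET_RE.findall(desktop.html)) != set(STYLESHEET_RE.findall(mobile.html)):
        return "different stylesheets"
    if MEDIA_QUERY_RE.search(mobile.html):
        return "media queries"
    return None


same = Fetch("http://www.example.com/", "<p>hello</p>")
moved = Fetch("http://m.example.com/", "<p>hello</p>")
print(mobile_evidence(same, moved))
```

The threshold matters: pages with rotating ads or timestamps differ between any two fetches, so a value that is too strict would report a mobile version where there is none.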
How Mobile Finder Works
• Change the URL to fit common mobile URL patterns
ex: www.t-mobile.com → m.t-mobile.com
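The trial-and-error step can be sketched as a function that generates candidate URLs to probe. The specific patterns tried here (an m. or mobile. host prefix and a /mobile path) are assumptions for this example; the slide does not list the patterns Mobile Finder actually tries.

```python
from typing import List, Sequence
from urllib.parse import urlsplit, urlunsplit


def candidate_mobile_urls(url: str, prefixes: Sequence[str] = ("m", "mobile")) -> List[str]:
    """Return guesses for where a mobile version of url might live."""
    parts = urlsplit(url)
    host = parts.netloc
    # Drop a leading "www." before adding a mobile prefix.
    bare = host[4:] if host.startswith("www.") else host
    candidates = [
        urlunsplit((parts.scheme, prefix + "." + bare, parts.path, parts.query, ""))
        for prefix in prefixes
    ]
    # Path-based variant on the original host.
    candidates.append(urlunsplit((parts.scheme, host, "/mobile", "", "")))
    return candidates


for candidate in candidate_mobile_urls("http://www.t-mobile.com/"):
    print(candidate)
```

Each candidate still has to be fetched and checked, since a guessed host may not exist or may simply redirect back to the desktop site.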
PhantomJS
• Headless WebKit (browser)
• Well-known and widely used
• Used to get the content of a page
• Takes snapshots of the sites it visits
• Scriptable with CoffeeScript or JavaScript
Web Service
• Query string with 2 parameters
• url (required)
• useragent (optional)
• http://cs.harding.edu/mobilefinder/service.php?url=URL&useragent=USER_AGENT
• Default useragent = Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; mediaqueries/1.0; +http://cs.harding.edu)
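A request URL for the service can be assembled with proper percent-encoding as sketched below. The endpoint and parameter names come from the slide; the helper name is an assumption, and the sketch only builds the URL without contacting the service.

```python
from typing import Optional
from urllib.parse import urlencode

SERVICE_ENDPOINT = "http://cs.harding.edu/mobilefinder/service.php"


def build_service_url(url: str, useragent: Optional[str] = None) -> str:
    """Build a Mobile Finder query; useragent is omitted so the service default applies."""
    params = {"url": url}
    if useragent is not None:
        params["useragent"] = useragent
    return SERVICE_ENDPOINT + "?" + urlencode(params)


print(build_service_url("http://www.cnn.com/"))
```

Encoding matters here because both parameters routinely contain characters such as ":", "/", ";" and spaces that would otherwise corrupt the query string.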
Results
<MobileFinder>
<url>http://www.cnn.com/</url>
<mobileUrl>http://www.cnn.com/</mobileUrl>
<reason>
<code>400</code>
<message>differing content</message>
</reason>
<useragent>Mozilla/5.0 (Android; Linux armv7l; rv:9.0) Gecko/20111216 Firefox/9.0 Fennec/9.0</useragent>
<timeAccessed>2013-07-20 15:23:42</timeAccessed>
<error/>
</MobileFinder>
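A client could read such a response with the standard-library XML parser. The sketch below assumes a well-formed document whose root element closes as </MobileFinder>; the dictionary keys are names chosen for this example.

```python
import xml.etree.ElementTree as ET


def parse_result(xml_text: str) -> dict:
    """Turn a Mobile Finder response into a flat dictionary of strings (or None)."""
    root = ET.fromstring(xml_text)

    def text(path):
        node = root.find(path)
        return node.text.strip() if node is not None and node.text else None

    return {
        "url": text("url"),
        "mobile_url": text("mobileUrl"),
        "code": text("reason/code"),
        "message": text("reason/message"),
        "useragent": text("useragent"),
        "time_accessed": text("timeAccessed"),
        "error": text("error"),
    }


sample_response = """<MobileFinder>
<url>http://www.cnn.com/</url>
<mobileUrl>http://www.cnn.com/</mobileUrl>
<reason><code>400</code><message>differing content</message></reason>
<useragent>Mozilla/5.0 (Android; Linux armv7l; rv:9.0) Gecko/20111216 Firefox/9.0 Fennec/9.0</useragent>
<timeAccessed>2013-07-20 15:23:42</timeAccessed>
<error/>
</MobileFinder>"""
result = parse_result(sample_response)
print(result["mobile_url"], result["message"])
```

In this sample the desktop and mobile URLs are identical, so the reason element is what tells the caller that a mobile version exists.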
Limitations
• Crashing
• Inconsistent results
• Problems executing JavaScript redirection
• Reports failure even when it actually gets the content
• Fails to get URL of page accessed
• Slow
Limitations
• Client-side Redirects
www.golferen.no/wip4/ (right)
www.ng.kz/ (below)
Analysis Results
• Accuracy (of 100 random hand-checked results)
• 96% accurate overall
• 1% incorrectly reported as not found when there is in fact a mobile version
• 3% incorrectly reported as mobile found when there is no mobile version
NYTimes desktop vs. mobile
Rakuten.co.jp desktop vs. mobile
Are Google’s Suggestions Used?
• 28% found a mobile version following Google’s suggestions
• 85% found as having some sort of mobile version
Are Google’s Suggestions Used?
• 28% found following Google’s suggestions
• Of the 82% that were found as not following the rules:
• 93% missing Vary HTTP header
• 89% missing alternate and canonical links
Are Google’s Suggestions Used?
• 28% found following Google’s suggestions
• 85% found as having some sort of mobile version
• Redirect: 35%
• “Significantly” different content: 28%
• Stylesheets alone: 9%
• Stylesheets and media queries: 11%
• Media queries alone: 6%
• Differing URLs (trial and error): 11%
End Result
• As a whole, mobile web pages do not adhere to Google’s standards
• There are no truly consistent ways for finding a mobile version of a site
Heritrix Mobile
By Keith Enlow
Introduction
• Heritrix 3.1
• Mobile Finder Web Service
• 2 Options
• Crawl desktop web pages (default)
• Crawl mobile web pages using Mobile Finder and exclude mobile web pages that use media queries.
Experiment
• Decision Making Heritrix
• Web Service (Mobile Finder) Heritrix
• Modified Heritrix 3.1 to include two options for crawling
• Option 0: Crawl with desktop user agent
• Option 1: Crawl with mobile user agent using Mobile Finder
• Added built-in mobile user agent adapted from Googlebot
• Crawled a small set of URLs
• Used Mobile Finder to find if the given URL has a mobile version
• Wrote a small script to discover differences between the mobile and desktop versions
<property name="userAgentTemplate"
value="Mozilla/5.0 (compatible; heritrix/@VERSION@ +@OPERATOR_CONTACT_URL@)"/>
<property name="userAgentTemplateMobile"
value="Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; heritrix/@VERSION@ +@OPERATOR_CONTACT_URL@)"/>
<!-- Option # = Description
0 [Default] Crawl using desktop user agent
1 Crawl using mobile user agent + Mobile Finder Web Service -->
<property name="CrawlOption" value="0" />
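How the option might select and fill in a template is sketched below. This is an illustration only and not code from the modified Heritrix (which is written in Java): the function name is an assumption, and the templates are normalized versions of the values on the slide, with the placeholder spelled @VERSION@ and the closing parenthesis present.

```python
DESKTOP_TEMPLATE = "Mozilla/5.0 (compatible; heritrix/@VERSION@ +@OPERATOR_CONTACT_URL@)"
MOBILE_TEMPLATE = (
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) "
    "AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 "
    "Safari/6531.22.7 (compatible; heritrix/@VERSION@ +@OPERATOR_CONTACT_URL@)"
)


def build_user_agent(crawl_option: int, version: str, contact_url: str) -> str:
    """Pick the template for the crawl option (0 = desktop, 1 = mobile) and fill it in."""
    if crawl_option not in (0, 1):
        raise ValueError("CrawlOption must be 0 (desktop) or 1 (mobile)")
    template = MOBILE_TEMPLATE if crawl_option == 1 else DESKTOP_TEMPLATE
    return (template
            .replace("@VERSION@", version)
            .replace("@OPERATOR_CONTACT_URL@", contact_url))


print(build_user_agent(0, "3.1.0", "http://example.org/crawl"))
```

Keeping the heritrix token and contact URL in the mobile string lets site operators still identify the crawler even though it presents itself as an iPhone browser.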
URLs Crawled
Desktop URL | Mobile URL
www.huffingtonpost.com | m.huffpost.com
www.foxnews.com | foxnews.mobi
www.nbcnews.com | www.nbcnews.com
www.whitehouse.gov | m.whitehouse.gov
www.nasa.gov | mobile.nasa.gov
www.ssa.gov | www.ssa.gov/mobile
www.cornell.edu | m.cornell.edu/#home
www.stanford.edu | m.stanford.edu
www.mit.edu | m.mit.edu / mobile.mit.edu
Redirection/Delivery
• 200 Response (server side redirect)
• 302 “Temporary” relocation
• 301 “Permanent” relocation
• JavaScript Redirection (client side redirect)
• Media Queries
• Style Sheets
Tiny Limits
• No JavaScript Engine
• Heritrix is unable to execute JavaScript code
• Unable to catch client-side redirection and will instead continue to crawl the desktop version of the web page.
Note: The Mobile Finder Web Service will find the mobile page and therefore Heritrix will continue the crawl.
• www.nasa.gov
• www.ssa.gov
• www.cornell.edu
Total Link Count
Huffington | Fox News | NBC News | NASA | SSA | White House | Stanford | Cornell | MIT
56774 | 12703 | 8894 | 4960 | 2380 | 8121 | 2351 | 2901 | 120
2134 | 110 | 3545 | 63 | 53 | 570 | 116 | 94 | 124
HTML Distribution
Huffington | Fox News | NBC News | NASA | SSA | White House | Stanford | Cornell | MIT
11550 | 2681 | 2302 | 851 | 20 | 3251 | 385 | 596 | 12
493 | 35 | 488 | 18 | 0 | 76 | 16 | 31 | 26
JavaScript Distribution
Huffington | Fox News | NBC News | NASA | SSA | White House | Stanford | Cornell | MIT
245 | 107 | 46 | 589 | 12 | 83 | 104 | 525 | 2
33 | 4 | 14 | 8 | 0 | 13 | 4 | 8 | 0
CSS Distribution
Huffington | Fox News | NBC News | NASA | SSA | White House | Stanford | Cornell | MIT
587 | 301 | 72 | 304 | 1 | 154 | 214 | 86 | 3
36 | 3 | 17 | 1 | 0 | 19 | 8 | 4 | 3
Image Distribution
Huffington | Fox News | NBC | NASA | SSA | White House | Stanford | Cornell | MIT
38671 | 8893 | 5852 | 2908 | 17 | 4187 | 1460 | 1484 | 87
1227 | 59 | 2769 | 28 | 0 | 436 | 74 | 4 | 89
Acknowledgements
• Internet Archive aided in Mobile Finder work
• Funded by NSF grant 1008492

Editor's Notes

  1. iPhone introduced in the United States on June 29, 2007