2. “...a mysterious El Dorado of content
wealth, rich beyond measure when
compared to ordinary fare delivered by
search engines such as Google.”
Spencer, B. (2007). Harnessing the deep web: A practical plan for locating free specialty databases on the web. Reference Services Review, 35(1), 71-83. Retrieved from www.scopus.com
3. Search Engines
● Web Crawlers
● Linked Data
● Indexing
What search engines do you use?
What are they good at finding?
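Indexing is what lets a search engine answer a query instantly instead of rereading every page it has crawled. The sketch below illustrates one common structure, an inverted index that maps each term to the pages containing it. It assumes a tiny, hypothetical set of already-crawled pages (the example.org URLs are made up) and a naive tokenizer; real engines add ranking, stemming, and far larger data structures.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map every term to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical crawled pages; the URLs are illustrative only.
pages = {
    "https://example.org/metadata": "Metadata is data about data",
    "https://example.org/databases": "Specialty databases hold structured data",
}
index = build_index(pages)
print(sorted(search(index, "data")))             # both pages
print(sorted(search(index, "structured data")))  # only the databases page
```

Only pages the crawler has already fetched can ever appear in the index, which is why the slides that follow focus on pages crawlers fail to reach.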
4. Static vs. Dynamic Web Pages
● Dimensionality
○ One dimensional (static)
■ https://en.wikipedia.org/wiki/Metadata
○ Two dimensional (dynamic)
■ Ideas?
As of 2013, effectively indexing dynamic pages
remained an unsolved problem.
5. Problems with Search Engines
● Obstacles to effective indexing of dynamic pages:
○ Dimensionality
○ No links
○ Paywalls
○ Scripting
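The "No links" obstacle can be illustrated with a toy crawler. The sketch below is a minimal breadth-first crawler that discovers pages only by following hyperlinks. The link graph, the `/search` form, and the query-result URLs are all hypothetical: the result pages exist only when someone submits a query, so nothing links to them and the crawler never finds them.

```python
from collections import deque

# Hypothetical toy site: each URL maps to the URLs it links to.
LINKS = {
    "/home": ["/about", "/search"],
    "/about": ["/home"],
    "/search": [],  # a query form; its result pages are generated on demand
}

# Hypothetical result pages the site's database would generate for queries.
# No page links to them.
DYNAMIC_PAGES = ["/search?q=maps", "/search?q=census"]

def crawl(seed: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl that discovers pages only by following hyperlinks."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

discovered = crawl("/home", LINKS)
missed = [url for url in DYNAMIC_PAGES if url not in discovered]
print(sorted(discovered))  # ['/about', '/home', '/search']
print(missed)              # ['/search?q=maps', '/search?q=census']
```

The crawler reaches the search form itself but none of the content behind it, which is the gap deep-web crawling research tries to close by generating form queries.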
6. Why care about the Deep Web?
“Not only its estimated size is hundreds of
times larger than that of the surface web, but
also it provides users with high quality
information.”
Zheng, Q., Wu, Z., Cheng, X., Jiang, L., & Liu, J. (2013). Learning to crawl deep web. Information Systems, 38(6), 801-819. doi:10.1016/j.is.2013.02.001
How much larger is the Deep Web than the
Surface Web?
7. Surface Web = 19 terabytes
Deep Web = 7,500 terabytes
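Taking the two figures above at face value (the slide does not state their source), a quick ratio answers the question on the previous slide:

```latex
\frac{7500\ \text{TB}}{19\ \text{TB}} \approx 395
```

By this estimate the Deep Web is roughly 400 times larger than the Surface Web, in line with the "hundreds of times larger" claim quoted earlier.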