This document discusses understanding websites and their structure. It explains that a website is hosted on a web server and accessible online through a URL. It notes the importance of websites for internet marketing, business presence, and local search. The document then describes website structure including layout templates, URL patterns, and linkage between pages. It provides examples of template types and how URL patterns are related to templates. Finally, it discusses analyzing a website's structure through random page sampling and modeling the templates, patterns, and links between pages.
2. Trainings by Vidya Bhagwat
• Websites :
A website is hosted on at least one web server,
accessible via a network such as the Internet or a private local
area network through an Internet address known as a
Uniform resource locator. All publicly accessible websites
collectively constitute the World Wide Web.
3. Importance Of Websites:
• Internet marketing comes
of age
• Internet marketing is now a
major, multi-billion dollar
industry.
• Despite some concerns,
many consumers now have
the skills and the
confidence to transact
purchases using the web.
4. • An Internet "presence" has now become essential.
• A modern, well presented website is now expected for most
businesses and organizations.
• A website should explain the products and services offered. It
should also provide background and general contact
information.
5. • Local business is affected as well
• Many small business operators have been disappointed with the
results achieved by their websites.
• Sites have been created but little if any business has resulted.
• There are a number of reasons:
• unrealistic expectations;
• poor website construction (not search engine friendly);
• poor targeting.
• Local search is growing in importance. Local search is the ability to
search for and find businesses and organizations in the local area, that
is, in close proximity geographically.
• This will vary from business to business.
6. Website Structure Understanding :
• Website Structure Understanding and its Applications.
• Website structure understanding can be treated as a reverse
engineering for the purpose of automatically discovering the
layout templates and URL patterns of a website, and
understanding how these templates and patterns are integrated
to organize the website. The study of this problem has had a
great impact on many applications that can leverage such site-level
knowledge to help web search and data mining.
8. • What’s Website Structure?
In this project, the website structure consists of
three components: layout templates, URL patterns, and
linkage structure.
• Layout Template:
Most web pages consist of HTML elements like
table, menu, button, image, and input box. The layout of a
web page describes what HTML elements are included in the
page, as well as how these elements are visually distributed in
page rendering. Essentially, a page layout is represented by a
so-called DOM (Document Object Model) tree. In this project,
a layout template is considered as a group of pages which
have very similar layouts (DOM trees).
9. • In a website, pages are generated from distinguishable
templates according to their functions. That is to say, visually
similar pages usually have the same function. In this way, a user can
easily identify a page’s function at a glance. Following are
several typical layout templates identified from the ASP.NET
Forums. Their functions are to show a) a list of discussion
threads, b) a list of thread posts, and c) a user profile,
respectively.
10. • It is noticed that one layout template can have more than
one related URL pattern. For example, a bookseller website
usually designs one template to show a list of
books, and provides different query parameters to generate
such a list. Various query parameters in this scenario will lead
to different URL patterns, but the search results are shown
with the same template. Another common case is duplicate
pages, i.e., pages with the same content (and very likely the
same layout) but different URLs.
11. • Link Structure :
Based on the layout templates and URL patterns,
we can construct a directed graph to represent the website
organization structure. That is, each layout template is
considered as a node in a graph, and two nodes are linked if
there are hyperlinks between the pages belonging to the two
nodes. The direction of each link is the same as that of the related
hyperlinks, and each link is labeled with the URL pattern
of the corresponding hyperlink URLs. Again, it should be
noticed that there can be multiple links from one node to
another if the corresponding hyperlinks have more than one
URL pattern.
• Fig. 2 gives an illustrative example of the sub-graph
constructed based on the layout templates and URL patterns
above.
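• The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_template_graph, the template labels, and the pattern_of helper are hypothetical and do not come from the project. Pages are assumed to be already assigned to layout templates.

```python
from collections import defaultdict


def build_template_graph(hyperlinks, template_of, pattern_of):
    """Build a directed graph whose nodes are layout templates.

    hyperlinks:  iterable of (source_url, target_url) pairs
    template_of: dict mapping a page URL to its layout template label
    pattern_of:  function mapping a target URL to a URL pattern string
    Returns a dict {(source_template, target_template): set of URL patterns}.
    """
    graph = defaultdict(set)
    for src, dst in hyperlinks:
        # Skip links whose endpoints were not assigned to any template.
        if src in template_of and dst in template_of:
            edge = (template_of[src], template_of[dst])
            # One edge may carry several URL patterns (multiple links).
            graph[edge].add(pattern_of(dst))
    return dict(graph)
```

Each key of the returned dictionary is one directed edge between two templates, and the set stored under it holds the URL patterns that characterize that edge, so several patterns on one key correspond to multiple links between the same two nodes.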
12. • Random Sampling :
The goal of random sampling is to provide a
snapshot of a website by downloading only a relatively small
number of pages. The sampling quality is the foundation of
the whole mining process. To keep the downloaded pages as
diverse as possible, in practice the sampling process adopts a
strategy that combines breadth-first and depth-first traversal, so
that it can quickly retrieve pages at deep levels within a few steps.
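• A minimal sketch of such a mixed strategy is given below. It is not the project's actual sampler: the function name sample_pages, the depth_bias parameter, and the in-memory links dictionary (standing in for real page downloads) are assumptions made for illustration.

```python
import random
from collections import deque


def sample_pages(links, start, budget, depth_bias=0.5, seed=0):
    """Sample up to `budget` pages from a site's link graph.

    links: dict mapping a page to the list of pages it links to
           (a stand-in for downloading the page and extracting links).
    With probability `depth_bias` the newest discovered page is visited
    next (depth-first step); otherwise the oldest one is visited
    (breadth-first step).
    """
    rng = random.Random(seed)
    frontier = deque([start])
    seen = {start}
    sampled = []
    while frontier and len(sampled) < budget:
        if rng.random() < depth_bias:
            page = frontier.pop()       # depth-first: most recently found
        else:
            page = frontier.popleft()   # breadth-first: earliest found
        sampled.append(page)
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sampled
```

Setting depth_bias to 0 gives a pure breadth-first crawl and 1 gives a pure depth-first crawl; values in between trade coverage of top-level pages against reaching deep pages early.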
13. • Inspired by this observation, in this project, DOM path is
utilized to characterize the layout of a webpage. As shown in
Fig. 5, a DOM path is a path from a leaf node to the root of
the DOM tree. The leaf node indicates the component type,
and the path-to-root approximately describes the visual
location of that component in page rendering.
• Given a set of HTML pages, all unique DOM paths are
extracted to form a feature space. Each page is represented
by a point in the feature space, and the layout similarity of
two pages can be estimated. A bottom-up strategy is then
utilized to group similar pages, and each cluster is considered
as a layout template.
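• The DOM path idea can be sketched with Python's standard html.parser module. The class and function names below are hypothetical, and the Jaccard overlap used as the similarity measure is an assumption for illustration; the slides do not state which measure the project uses. Elements left unclosed at the end of the input are ignored in this sketch.

```python
from html.parser import HTMLParser

# Elements that never have children or end tags in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class DomPathCollector(HTMLParser):
    """Collects leaf-to-root tag paths such as 'img/td/tr/table/body/html'."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements as [tag, has_child_element]
        self.paths = set()

    def _record(self, leaf):
        ancestors = [tag for tag, _ in reversed(self.stack)]
        self.paths.add("/".join([leaf] + ancestors))

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.stack[-1][1] = True        # parent now has a child
        if tag in VOID:
            self._record(tag)               # void elements are leaves
        else:
            self.stack.append([tag, False])

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Find the matching open element and close everything above it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                while len(self.stack) > i:
                    closed, has_child = self.stack.pop()
                    if not has_child:
                        self._record(closed)
                break


def dom_paths(html):
    collector = DomPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def layout_similarity(html_a, html_b):
    """Jaccard overlap of the two pages' DOM path sets (assumed measure)."""
    a, b = dom_paths(html_a), dom_paths(html_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Pages whose pairwise similarity is high could then be merged bottom-up into clusters, each cluster playing the role of one layout template.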
14. • URL Pattern Discovery :
A URL is not an ordinary string but has a syntax
structure strictly defined by Internet standards (the generic
URI syntax is specified in RFC 3986). Based on
this syntax structure, a URL string can be represented by a group
of key-value pairs. Fig. 6 gives an example URL, its syntax
structure, and the corresponding key-value pairs.
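As a rough illustration of this representation, the sketch below splits a URL into key-value pairs with Python's standard urllib.parse module. The key names (scheme, host, path_0, query_id, and so on) and the function name url_to_pairs are assumptions chosen for the example, not the naming used in Fig. 6.

```python
from urllib.parse import urlsplit, parse_qsl


def url_to_pairs(url):
    """Represent a URL as a dict of key-value pairs (hypothetical key names)."""
    parts = urlsplit(url)
    pairs = {"scheme": parts.scheme, "host": parts.netloc}
    # Each non-empty path segment becomes its own positional key.
    segments = [seg for seg in parts.path.split("/") if seg]
    for index, segment in enumerate(segments):
        pairs[f"path_{index}"] = segment
    # Each query parameter becomes a key prefixed with 'query_'.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        pairs[f"query_{key}"] = value
    return pairs


example = url_to_pairs("http://www.example.net/forums/thread.aspx?id=42&page=2")
```

Here the path segments and the query parameters end up as separate keys, which is what allows each key to be kept or generalized independently when building a URL pattern.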
It is noticed that different URL components (or
keys) usually have different functions and play different roles
in a website. In general, keys denoting directories, functions,
and document types take only a few values, which should
be explicitly recorded in a URL pattern. By contrast, keys
denoting parameters such as user names take quite
diverse values, which should be generalized in the pattern.
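The distinction between keys with few values and keys with diverse values can be illustrated with a simple flat sketch. It is a simplification for illustration only: the function name generalize, the max_values threshold, and the "*" wildcard are assumptions, and the URLs are assumed to be already converted into key-value dictionaries.

```python
from collections import defaultdict


def generalize(url_pairs, max_values=2):
    """Derive URL patterns from a list of key-value dicts.

    Keys with at most `max_values` distinct values keep their value;
    keys with more distinct values are replaced by the wildcard '*'.
    Returns a set of patterns, each a tuple of (key, value) sorted by key.
    """
    distinct = defaultdict(set)
    for pairs in url_pairs:
        for key, value in pairs.items():
            distinct[key].add(value)

    patterns = set()
    for pairs in url_pairs:
        pattern = tuple(sorted(
            (key, value if len(distinct[key]) <= max_values else "*")
            for key, value in pairs.items()
        ))
        patterns.add(pattern)
    return patterns
```

With three profile URLs that differ only in the user name, the directory and document keys are recorded explicitly while the name key collapses to a wildcard, so all three URLs fall under one pattern.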
15. • Based on this observation, a top-down recursive split
process is proposed in this project to construct a pattern tree
to characterize a set of URLs. Fig. 7 gives an example pattern
tree based on URLs from www.wretch.cc. For algorithm details,
please refer to.
16. • Website Designing India has assisted hundreds of businesses
to build or update a website customized to their requirements.
You get more than just a website with our Website Designing
Services. You can update your website content easily, take
credit card payments online, and use lots of tools like poll
managers, news managers, photo galleries, and form builders.
Whether you're looking for an ecommerce web design
company or a web development company that showcases
your business, our website designing & development services
give you control over your site with no technical skills needed.
17. Domain Name :
• A domain name is a unique name that identifies a website. It
is an identification string that defines a realm of
administrative autonomy, authority or control on the Internet.
Domain names are formed by the rules and procedures of the Domain
Name System (DNS). Any name registered in the DNS is a domain name.
18. • Domain names are used in various networking contexts and
application-specific naming and addressing purposes. In
general, a domain name represents an Internet Protocol (IP)
resource, such as a personal computer used to access the
Internet, a server computer hosting a web site, or the web
site itself or any other service communicated via the Internet.
In 2010, the number of active domains reached 196 million.
19. Use In Web Site Hosting
• The domain name is a component of a Uniform Resource
Locator (URL) used to access web sites, for example:
• URL: http://www.example.net/index.html
• Top-level domain name: net
• Second-level domain name: example.net
20. • Host name: www.example.net
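• The breakdown in the example above can be reproduced with a short sketch using Python's standard urllib.parse module. The function name domain_parts and the returned key names are hypothetical, and the label counting is deliberately naive: it assumes a single-label top-level domain and would misreport names under suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def domain_parts(url):
    """Split a URL's host name into the parts shown in the example (naive)."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return {
        "host_name": host,
        "top_level": labels[-1],                 # last label, e.g. 'net'
        "second_level": ".".join(labels[-2:]),   # last two labels
    }


parts = domain_parts("http://www.example.net/index.html")
```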
• A domain name may point to multiple IP addresses in order to
provide server redundancy for the services being
delivered; such multi-address capability is used to manage the
traffic of large, popular web sites. More commonly, however,
one server computer, at a given IP address, hosts web
sites in several different domains. Such address overloading enables
virtual web hosting, commonly used by large web hosting
services to conserve IP address space. IP-address overloading
is possible because HTTP version 1.1 requires that a
request identify the domain name being referred to (in the Host
header); HTTP version 1.0 has no such requirement.
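• To illustrate how one server can tell apart sites that share an IP address, the sketch below picks a site from the Host header of a raw HTTP request. It is a simplified illustration, not a real web server: the function name select_site, the site labels, and the default fallback are assumptions, and IPv6 host literals are not handled.

```python
def select_site(raw_request, sites, default="default-site"):
    """Choose which hosted site should answer a raw HTTP request.

    sites: dict mapping a lower-case host name to a site label.
    Requests without a Host header (allowed in HTTP/1.0) fall back
    to `default`, because the server cannot tell the sites apart.
    """
    head = raw_request.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:          # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]   # drop any port
            return sites.get(host, default)
    return default


sites = {"www.example.net": "site-a", "www.example.org": "site-b"}
request = "GET /index.html HTTP/1.1\r\nHost: www.example.net\r\n\r\n"
chosen = select_site(request, sites)
```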
21. Contact Information
• To obtain further information about any of our databases,
services, or programs, contact NCBI:
PubMed Customer Service:
• Send an email for help with technical issues, searching, or
content assistance
• Call 1-888-FIND-NLM (1-888-346-3656) for help with
searching or content assistance only
• General Information: info@ncbi.nlm.nih.gov
• Questions about and technical support for NCBI and its
programs and services
• BLAST: blast-help@ncbi.nlm.nih.gov
• Technical questions on running or interpreting BLAST
sequence comparison searches