SlideShare ist ein Scribd-Unternehmen logo
1 von 20
BY
K.RAJASEKHAR REDDY
        (08Q61A0528)
Contents:
 Introduction
 Wrappers
 Clustering
 System Description
 Working
 Types
 Advantages and Disadvantages
 Conclusion
Introduction:
STAVIES is a system for
 Information Extraction through
 Automatic Web Wrapper Using
 clustering Techniques.
STAVIES is used in:
 Automatic Information Discovery.


 Extraction of structured web data.
WRAPPERS
 Piece of software to extract the
 useful information from web data
 sources.

 Data extracted is referred as Structural
 Tokens.
Categories of Wrappers:
 Site Specific:
    Extracts information from a web
 pages
    or family of web pages.
 Generic wrappers:
   Can be applied to almost any page
   regardless of the structures.
CLUSTERING
Process of recognizing input data
 set in such a way that data points in
 same cluster are similar other than
 in different clusters.
Quality Evaluation Measures:
 Cluster Compactness:
 Evaluates how the subsets of input are redistributed
 by clustering system, compared with whole input set.
 Cluster Separation:
 Indicates overall dissimilarity among the output
 clusters.
System Description
 Two modules


     1.Transformation module

     2.Extraction module
Phases:
 Preparation Phase:
   1.Validation correction and XHTML
   generation.

    2.Tree transformation and Terminal
      node selecton
• Segmentation Phase:
   1. Nodes Comparison.

   2. Hierarchical clustering.

   3. Cluster Evaluation and Target area
      Discover.

   4. Boundary selection.
• Information Retrieval Phase:

    1. Information Extraction component.
Working:
Experimental Results:
Types:
 OMINI



 MDR
Advantages:
 Executes in less than 0.4 sec.


 No human assistance is required.


 High performance.
Disadvantage:
 Hard to implement in free texts and
 non-template pages.
Conclusion
 STAVIES saves precious time and effort.
 Tested successfully in more than 63,000
  HTML pages from 50 different web
 data sources.
THANK YOU.
Queries????

Weitere ähnliche Inhalte

Ähnlich wie stavies

Web Services: Encapsulation, Reusability, and Simplicity
Web Services: Encapsulation, Reusability, and SimplicityWeb Services: Encapsulation, Reusability, and Simplicity
Web Services: Encapsulation, Reusability, and Simplicityhannonhill
 
Cloud data management
Cloud data managementCloud data management
Cloud data managementambitlick
 
Web clustering engines
Web clustering enginesWeb clustering engines
Web clustering enginesYash Darak
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using wekaPrashant Menon
 
Automatic Analyzing System for Packet Testing and Fault Mapping
Automatic Analyzing System for Packet Testing and Fault MappingAutomatic Analyzing System for Packet Testing and Fault Mapping
Automatic Analyzing System for Packet Testing and Fault MappingIRJET Journal
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataeXascale Infolab
 
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICESKEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICESMykola Novik
 
Cluster computing ppt
Cluster computing pptCluster computing ppt
Cluster computing pptDC Graphics
 
clustering_classification.ppt
clustering_classification.pptclustering_classification.ppt
clustering_classification.pptHODECE21
 
Cassandra Applications Benchmarking
Cassandra Applications BenchmarkingCassandra Applications Benchmarking
Cassandra Applications Benchmarkingniallmilton
 
Database Analysis, OLAP, Aggregate Functions
Database Analysis, OLAP, Aggregate FunctionsDatabase Analysis, OLAP, Aggregate Functions
Database Analysis, OLAP, Aggregate FunctionsSaifur Rahman
 
Data Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsData Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsDenodo
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data gridBogdan Dina
 

Ähnlich wie stavies (20)

Web Services: Encapsulation, Reusability, and Simplicity
Web Services: Encapsulation, Reusability, and SimplicityWeb Services: Encapsulation, Reusability, and Simplicity
Web Services: Encapsulation, Reusability, and Simplicity
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
Cloud data management
Cloud data managementCloud data management
Cloud data management
 
Clusters
ClustersClusters
Clusters
 
Web clustering engines
Web clustering enginesWeb clustering engines
Web clustering engines
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using weka
 
Automatic Analyzing System for Packet Testing and Fault Mapping
Automatic Analyzing System for Packet Testing and Fault MappingAutomatic Analyzing System for Packet Testing and Fault Mapping
Automatic Analyzing System for Packet Testing and Fault Mapping
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICESKEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
 
Cluster computing ppt
Cluster computing pptCluster computing ppt
Cluster computing ppt
 
clustering_classification.ppt
clustering_classification.pptclustering_classification.ppt
clustering_classification.ppt
 
50120140504006
5012014050400650120140504006
50120140504006
 
Cassandra Applications Benchmarking
Cassandra Applications BenchmarkingCassandra Applications Benchmarking
Cassandra Applications Benchmarking
 
Database Analysis, OLAP, Aggregate Functions
Database Analysis, OLAP, Aggregate FunctionsDatabase Analysis, OLAP, Aggregate Functions
Database Analysis, OLAP, Aggregate Functions
 
Data Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsData Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large Deployments
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data grid
 
cluster computing
cluster computingcluster computing
cluster computing
 
Intro to Databases
Intro to DatabasesIntro to Databases
Intro to Databases
 
Promostat original
Promostat originalPromostat original
Promostat original
 

Mehr von Akhil Kumar

Edp section of solids
Edp  section of solidsEdp  section of solids
Edp section of solidsAkhil Kumar
 
Edp projection of solids
Edp  projection of solidsEdp  projection of solids
Edp projection of solidsAkhil Kumar
 
Edp projection of planes
Edp  projection of planesEdp  projection of planes
Edp projection of planesAkhil Kumar
 
Edp projection of lines
Edp  projection of linesEdp  projection of lines
Edp projection of linesAkhil Kumar
 
Edp ortographic projection
Edp  ortographic projectionEdp  ortographic projection
Edp ortographic projectionAkhil Kumar
 
Edp intersection
Edp  intersectionEdp  intersection
Edp intersectionAkhil Kumar
 
Edp ellipse by gen method
Edp  ellipse by gen methodEdp  ellipse by gen method
Edp ellipse by gen methodAkhil Kumar
 
Edp development of surfaces of solids
Edp  development of surfaces of solidsEdp  development of surfaces of solids
Edp development of surfaces of solidsAkhil Kumar
 
Edp typical problem
Edp  typical problemEdp  typical problem
Edp typical problemAkhil Kumar
 
Edp st line(new)
Edp  st line(new)Edp  st line(new)
Edp st line(new)Akhil Kumar
 
graphical password authentication
graphical password authenticationgraphical password authentication
graphical password authenticationAkhil Kumar
 

Mehr von Akhil Kumar (20)

Edp section of solids
Edp  section of solidsEdp  section of solids
Edp section of solids
 
Edp scales
Edp  scalesEdp  scales
Edp scales
 
Edp projection of solids
Edp  projection of solidsEdp  projection of solids
Edp projection of solids
 
Edp projection of planes
Edp  projection of planesEdp  projection of planes
Edp projection of planes
 
Edp projection of lines
Edp  projection of linesEdp  projection of lines
Edp projection of lines
 
Edp ortographic projection
Edp  ortographic projectionEdp  ortographic projection
Edp ortographic projection
 
Edp isometric
Edp  isometricEdp  isometric
Edp isometric
 
Edp intersection
Edp  intersectionEdp  intersection
Edp intersection
 
Edp excerciseeg
Edp  excerciseegEdp  excerciseeg
Edp excerciseeg
 
Edp ellipse by gen method
Edp  ellipse by gen methodEdp  ellipse by gen method
Edp ellipse by gen method
 
Edp development of surfaces of solids
Edp  development of surfaces of solidsEdp  development of surfaces of solids
Edp development of surfaces of solids
 
Edp curves2
Edp  curves2Edp  curves2
Edp curves2
 
Edp curve1
Edp  curve1Edp  curve1
Edp curve1
 
Edp typical problem
Edp  typical problemEdp  typical problem
Edp typical problem
 
Edp st line(new)
Edp  st line(new)Edp  st line(new)
Edp st line(new)
 
graphical password authentication
graphical password authenticationgraphical password authentication
graphical password authentication
 
yii framework
yii frameworkyii framework
yii framework
 
cloud computing
cloud computingcloud computing
cloud computing
 
WORDPRESS
WORDPRESSWORDPRESS
WORDPRESS
 
AJAX
AJAXAJAX
AJAX
 

Kürzlich hochgeladen

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

stavies