Carlos Oliveira / May 31, 2012
Agenda
 Oracle Text Overview
    Introduction
    Oracle Text Overview
    Types of Index
    Text Query Application
    Document Presentation and Highlighting
    Document Samples
    Oracle Text Indexing Process
    Indexing Classes
    Examples
    Contains Operators
    POC
    Training & Reference
    Questions
Introduction
I am a forward-looking Information Systems Architect with a
solid Oracle DBA background comprising the daily
infrastructure tasks of the DBA, several projects as a Data
Modeler, and performance management projects.

I started in the mainframe business and soon took a deep dive
into application development for Oracle databases. After
acquiring an Oracle certification, I worked on performance
enhancement for applications using Oracle databases and then
spent several years as an infrastructure DBA. Later I worked
on data modeling projects and, more recently, on a performance
management project covering both the application and database layers.
“The limits of my language
mean the limits of my world.”



Ludwig Wittgenstein
What is Oracle Text
•An option of the database that extends text indexing
•It is a free option for Oracle DB (EE, SE, and PE)
•Has cataloging, referencing and classification features
•Deals with tags, such as HTML or XML
•Extends indexing for:
     •Documents stored in tables or referenced
     •PDF, MS Word, XML, text, ...
     •Data types such as BLOB, BFILE, CLOB, LONG, ...
     •Even web pages, stored or referenced
Oracle Text Overview
Types of Index
Index type   Query operator   Description
CONTEXT      CONTAINS         Use this index to build a text retrieval application when your text
                              consists of large coherent documents. You can index documents of
                              different formats such as Microsoft Word, HTML, XML, or plain text.
                              You can customize your index in a variety of ways.
CTXCAT       CATSEARCH        Use this index type to improve mixed query performance. Suitable for
                              querying small text fragments with structured criteria like dates,
                              item names, and prices that are stored across columns.
CTXRULE      MATCHES          Use to build a document classification application. You create this
                              index on a table of queries, where each query has a classification.
                              Single documents (plain text, HTML, or XML) can be classified by
                              using the MATCHES operator.
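As a rough illustration of the two less common index types, the sketch below follows the general pattern of Oracle's documentation examples. The tables `auction` and `queries`, their columns, and the names `auction_iset`, `auction_titlex`, and `queriesx` are hypothetical and do not come from the deck.

```sql
-- Hypothetical table: short text fragments plus a structured column
CREATE TABLE auction (
  item_id NUMBER PRIMARY KEY,
  title   VARCHAR2(100),
  price   NUMBER
);

-- CTXCAT: the index set names the structured columns used in mixed queries
BEGIN
  ctx_ddl.create_index_set('auction_iset');
  ctx_ddl.add_index('auction_iset', 'price');
END;
/

CREATE INDEX auction_titlex ON auction(title)
  INDEXTYPE IS CTXSYS.CTXCAT
  PARAMETERS ('index set auction_iset');

-- Text criterion and structured criterion in one CATSEARCH call
SELECT title, price
FROM   auction
WHERE  CATSEARCH(title, 'camera', 'price < 200') > 0;

-- Hypothetical table of classification rules for CTXRULE
CREATE TABLE queries (
  query_id       NUMBER PRIMARY KEY,
  classification VARCHAR2(50),
  query          VARCHAR2(2000)
);

INSERT INTO queries VALUES (1, 'Storage', 'tablespace OR datafile');
COMMIT;

CREATE INDEX queriesx ON queries(query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- MATCHES classifies a single document against the stored queries
SELECT classification
FROM   queries
WHERE  MATCHES(query, 'How to move an index to another tablespace') > 0;
```

Note the reversed roles: with CTXRULE the index is built on the queries, and the document is what gets passed in at query time.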
Text Query Application
Document Presentation and Highlighting
Output                                                Procedure
Plain text version, no highlights                     CTX_DOC.FILTER
HTML version of document, no highlights               CTX_DOC.FILTER
Highlighted document, plain text version              CTX_DOC.MARKUP
Highlighted document, HTML version                    CTX_DOC.MARKUP
Highlight offset information for plain text version   CTX_DOC.HIGHLIGHT
Highlight offset information for HTML version         CTX_DOC.HIGHLIGHT
Theme summaries and gist of document.                 CTX_DOC.GIST
List of themes in document.                           CTX_DOC.THEMES
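A minimal sketch of two of these procedures, using the in-memory (CLOB) variants. It assumes a CONTEXT index named `docs_idx` on a table with a primary key, and a document whose key is 1; both are hypothetical. By default the `textkey` argument is the primary key value of the row.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_plain  CLOB;  -- filtered plain-text version of the document
  l_markup CLOB;  -- plain-text version with query terms highlighted
BEGIN
  -- Plain text version, no highlights
  CTX_DOC.FILTER(
    index_name => 'docs_idx',
    textkey    => '1',
    restab     => l_plain,
    plaintext  => TRUE);

  -- Highlighted plain text version; matches are wrapped in << and >>
  CTX_DOC.MARKUP(
    index_name => 'docs_idx',
    textkey    => '1',
    text_query => 'oracle & text',
    restab     => l_markup,
    plaintext  => TRUE,
    starttag   => '<<',
    endtag     => '>>');

  DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(l_markup, 2000, 1));

  -- The procedures allocate temporary LOBs that the caller should free
  DBMS_LOB.FREETEMPORARY(l_plain);
  DBMS_LOB.FREETEMPORARY(l_markup);
END;
/
```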
Document Samples
Oracle Text Indexing Process
Indexing Classes
Class         Description
Datastore     How are your documents stored?
Filter        How can the documents be converted to plain text?
Lexer         What language is being indexed?
Wordlist      How should stem and fuzzy queries be expanded?
Storage       How should the index data be stored?
Stop List     What words or themes are not to be indexed?
Section Group How are document sections defined?
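The classes not covered by the deck's own example (datastore, filter, and section group) could be set up along the following lines. This is only a sketch: the preference names `doc_ds`, `doc_ft`, `doc_sg`, the index `docs_idx`, and the table `docs(body)` are hypothetical, and it assumes the column holds HTML stored directly in the table.

```sql
BEGIN
  -- Datastore: the text is stored directly in the indexed column
  ctx_ddl.create_preference('doc_ds', 'DIRECT_DATASTORE');

  -- Filter: documents are already plain text or HTML, so no conversion
  ctx_ddl.create_preference('doc_ft', 'NULL_FILTER');

  -- Section group: recognise HTML tags and expose <TITLE> as a zone section
  ctx_ddl.create_section_group('doc_sg', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('doc_sg', 'title', 'TITLE');
END;
/

-- Each class is referenced by keyword in the PARAMETERS string
CREATE INDEX docs_idx ON docs(body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE doc_ds FILTER doc_ft SECTION GROUP doc_sg');
```

With a zone section defined, a query such as `CONTAINS(body, 'oracle WITHIN title') > 0` can restrict the match to that part of the document.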
Example Parameters
-- Drop any existing preferences and stoplist
EXEC ctx_ddl.drop_preference('address_lx');
EXEC ctx_ddl.drop_preference('address_wl');
EXEC ctx_ddl.drop_preference('address_st');
EXEC ctx_ddl.drop_stoplist('address_sl');

begin
  ctx_ddl.create_preference('address_lx','BASIC_LEXER');
  -- removes diacritics
  ctx_ddl.set_attribute('address_lx','base_letter','YES');
  ctx_ddl.create_preference('address_wl','BASIC_WORDLIST');
  ctx_ddl.create_stoplist('address_sl', 'BASIC_STOPLIST');
  ctx_ddl.add_stopclass('address_sl', 'NUMBERS');
  ctx_ddl.add_stopword('address_sl', 'a');
  ...
  ctx_ddl.add_stopword('address_sl', 'vocês');
  ctx_ddl.create_preference('address_st', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('address_st','i_table_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','k_table_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','r_table_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','n_table_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','i_index_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','p_table_clause','TABLESPACE IDM2');
end;
/
Example DDL and Query
DDL and statistics:

DROP INDEX dbapp.IDX_st_address_3;

CREATE INDEX dbapp.IDX_st_address_3 ON
dbapp.st_address(NOM_st_address)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('LEXER address_lx
WORDLIST address_wl
STOPLIST address_sl
STORAGE address_st')
PARALLEL 8;

COMMIT;

BEGIN
  SYS.DBMS_STATS.GATHER_TABLE_STATS (
   OwnName           => 'dbapp'
  ,TabName           => 'st_address'
  ,Estimate_Percent  => NULL
  ,Method_Opt        => 'FOR ALL INDEXED COLUMNS SIZE AUTO'
  ,Degree            => 8
  ,Cascade           => TRUE
  ,No_Invalidate     => FALSE);
END;
/

Queries:

SET DEFINE OFF;
SELECT NOM_st_address
FROM dbapp.st_address
WHERE CONTAINS (NOM_st_address, 'ST&MAJOR&OSCAR&STONE', 1) > 0;

Plan
SELECT STATEMENT CHOOSE Cost: 18 Bytes: 118 Cardinality: 1
    2 TABLE ACCESS BY INDEX ROWID dbapp.st_address_3 Cost: 18 Bytes: 118 Cardinality: 1
        1 DOMAIN INDEX dbapp.st_address_3 Cost: 15

NOM_st_address
--------------------------------------------------
OSCAR STONE MAJOR

SET DEFINE OFF;
SELECT NOM_st_address
FROM dbapp.st_address
WHERE CONTAINS (NOM_st_address, 'ST&OSCAR&STONE', 1) > 0;

NOM_st_address
--------------------------------------------------
JOSE OSCAR STONE
........
OSCAR STONE MAJOR
OSCAR WEBBER STONE

28 rows selected.
CONTAINS Operators
Operators:
•EQUIValence (=)
•NEAR (;)
•weight (*), threshold (>)
•MINUS (-)
•NOT (~)
•WITHIN
•AND (&)
•OR (|)
•ACCUMulate ( , )
•Wildcard Characters (%, _)
•ABOUT
•stem ($)
•Fuzzy (?)
•soundex (!)

Operator precedence examples:
Query Expression            Order of Evaluation
w1 | w2 & w3                (w1) | (w2 & w3)
w1 & w2 | w3                (w1 & w2) | w3
?w1, w2 | w3 & w4           (?w1), (w2 | (w3 & w4))
abc = def ghi & jkl = mno   ((abc = def) ghi) & (jkl=mno)
dog and cat WITHIN body     dog and (cat WITHIN body)
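A few of these operators can be tried against the deck's own `dbapp.st_address` table and `NOM_st_address` column. The search terms below are illustrative only, and no result sets are shown because they depend on the data.

```sql
SET DEFINE OFF;  -- stop SQL*Plus treating & as a substitution variable

-- AND binds tighter than OR: evaluated as OSCAR | (STONE & MAJOR)
SELECT NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, 'OSCAR | STONE & MAJOR', 1) > 0;

-- Parentheses override the default order of evaluation
SELECT NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, '(OSCAR | JOSE) & STONE', 1) > 0;

-- NEAR: both terms must occur within a span of 3 words
SELECT NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, 'NEAR((OSCAR, STONE), 3)', 1) > 0;

-- NOT: rows that contain STONE but not MAJOR
SELECT NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, 'STONE ~ MAJOR', 1) > 0;

-- Fuzzy match, ranked by relevance; the label 1 ties SCORE to CONTAINS
SELECT SCORE(1) AS relevance, NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, '?STONE', 1) > 0
ORDER  BY SCORE(1) DESC;
```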
Training
Resources on the Oracle website

• Text Application Developer's Guide
http://docs.oracle.com/cd/B10501_01/text.920/a96517/toc.htm

• Text Reference
http://docs.oracle.com/cd/B10501_01/text.920/a96518/toc.htm
Thank you





Oracle Text Public
