Patent Search: An important new test bed for IR

                    J. Tait, M. Lupu (1)
     H. Berger, G. Roda, M. Dittenbach, A. Pesenhofer (2)
                 E. Graf, K. van Rijsbergen (3)

       (1) Information Retrieval Facility, Vienna, Austria
       (2) Matrixware, Vienna, Austria
       (3) University of Glasgow, Dept. of Computing Science, Glasgow, UK


               DIR 2009 / Feb. 2-3, 2009
Patent Search.




  Patent search is a highly specialized form of information search.
  It is characterized by its
      target data
      type of information needs
      legal and economic implications
Target data


  Data for patent retrieval comes mainly from:
      patent databases from patent authorities (EPO, USPTO,
      JPO, SIPO, WIPO, etc.)
      scientific publications
      prior art databases (IP.com)




  A new acronym
  SIPO: State Intellectual Property Office of the People's Republic of
  China
Target data


  Characteristics of patent documents
      multilingual and written in "legalese"
      non-uniform formats
      some are OCR’d
      figures, images, chemical formulas, DNA sequences
      include references to patent and non-patent literature




  A new acronym
  NPL: Non-Patent Literature
Information Needs.


  K.H. Atkinson, Towards a more rational patent search paradigm:



  depending on what group is doing the asking, the types of patent
  search requested may include simple patentability, clearance to
  market a product, validity, opposition to a patent being sought by
  another, infringement watch, creating IP landscapes for business
  development or R&D, infringement defense, litigation, prosecution
  support, and creation of portfolios for assignments, investments,
  mergers and acquisitions [ . . . ]
Legal and economic implications.


      patents are legal documents
      patent portfolios are assets for enterprises
      a single patent search can be worth several days of work




  High recall searches
  Missing even a single relevant document can have severe financial
  and economic impact. For example, a granted patent can be
  invalidated because of a document that was missed at application
  time.
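
  The emphasis on recall can be made concrete with a minimal sketch.
  The document identifiers below are hypothetical and serve only to
  illustrate how recall is computed for a single search.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that appear in the retrieved set."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined without relevant documents")
    return len(relevant & set(retrieved)) / len(relevant)


# Hypothetical identifiers, for illustration only.
relevant_docs = {"D1", "D2", "D3", "D4"}
search_result = ["D1", "D7", "D3", "D4", "D9"]

# Three of the four relevant documents were found; D2 was missed.
print(recall(search_result, relevant_docs))
```

  In a patent setting the missed document (D2 here) is exactly the
  kind of omission that could later invalidate a granted patent.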
Introduction
                             Patent Search
                     A modern IR test bed
              Promoting take up of research
                                Conclusion




We have characterized the patent search problem by describing its
target data, types of information needs, and legal and economic
implications.


Next:
    evaluating IR techniques in the patent domain
         previous initiatives in the area of patent retrieval
         the CLEF-IP and TREC-Chem initiatives
    promoting take-up of research




                                 Tait et al.   Patent Search: An important new test bed for IR
Test collections


  Test collections in Information Retrieval play a pivotal role in the
  evaluation of retrieval models.



  Domain-specific test collections already exist for:
       Web pages
       news stories
       legal documents
       blogs
       genomics
       patents
Pioneering work in patent retrieval.

  Patent retrieval task at the NTCIR Workshop [1] since 2001.
         produced test collections, primarily targeting Japanese patents
         retrieval tasks:
             ad-hoc (goal: find patents on a given topic)
             invalidity search (goal: find patents invalidating a given claim)
             patent classification according to the F-term system



  Two new acronyms
  F-term (abbreviation of File-forming term) is the classification
  system used in Japan as a complement to IPC (International
  Patent Classification)



    [1] http://research.nii.ac.jp/ntcir
Evaluation tracks.




  The IRF has engaged in two pilot evaluation tracks on patent
  retrieval
      CLEF-IP
      www.ir-facility.org/the_irf/clef-ip09-track
      TREC-Chem
      www.ir-facility.org/the_irf/trec_chem.htm
CLEF-Intellectual Property Initiative.

  CLEF-IP
         coordinated by the IRF
         part of the Cross-Language Evaluation Forum [2]
         will focus on the task of prior art search
         European patents as target data
         automatic extraction of relevance assessments



  Prior art search
  Prior art search consists in identifying all information (including
  NPL) that might be relevant to a patent’s claim of novelty.



    [2] http://www.clef-campaign.org
Prior art search.


  The most common type of patent search. Performed at various
  stages of the patent life-cycle and with different intentions:
      before filing an application (novelty search or patentability
      search) to determine whether the invention fulfills the
      requirements of
           novelty
           inventive step
      before grant: results go into a search report attached to
      the patent
      invalidity search: post-grant search used to unveil prior art
      that invalidates a patent’s claims of originality
Target data.



  The CLEF-IP evaluation track will restrict target data to patents.

  Target data:
      comprising 16 years (filing date between 1985 and 2000) of
      EPO patents
      1.9 million patent documents corresponding to 1 million
      patents
      75 GB, in XML format
      documents are in English, German, and French
Automatic extraction of relevance assessments.



  The data resulting from prior art searches is saved in the EPO or
  USPTO databases as:
      citations in patent applications
      citations in search reports
      citations in the legal files of opposition procedures

  The CLEF-IP track is going to extract this information (as much
  as possible) automatically in order to form a large set of topics.
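
  A minimal sketch of the idea, assuming the citations have already
  been extracted into simple records. The record layout, the source
  names, and the patent identifiers are hypothetical and are not the
  track's actual data format. Each citing patent is treated as a
  topic and each document it cites as relevant to that topic.

```python
from collections import defaultdict

# Hypothetical citation records: (citing patent, cited document, source).
citations = [
    ("EP-A", "EP-X", "application"),
    ("EP-A", "EP-Y", "search_report"),
    ("EP-A", "EP-Y", "opposition"),
    ("EP-B", "EP-Z", "search_report"),
]


def build_qrels(records):
    """Map each citing patent (the topic) to the set of documents it cites."""
    qrels = defaultdict(set)
    for topic, cited, _source in records:
        # A document cited from several sources is counted once per topic.
        qrels[topic].add(cited)
    return dict(qrels)


qrels = build_qrels(citations)
for topic in sorted(qrels):
    print(topic, sorted(qrels[topic]))
```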
Prior art from opposition procedures.




      Under European patent law, a granted patent may
      be opposed.
      The opponent often provides new prior art that
      invalidates the invention's claim of originality.
      Patents cited in opposition procedures are very relevant prior
      art documents.
      They are the results of a very thorough invalidity search.
Crowdsourcing extraction of relevance assessments.



         Need to extract citations from documents arising from
         opposition procedures
         These documents are only available as scanned images [3]
         Crowdsourcing will be used to extract these citations.




  A new word from business jargon
  Crowdsourcing.




    [3] at http://www.epoline.org
Relevance and evaluation measures.


  Labels used in search reports:

    label   means that cited document is
      X     relevant when taken alone
      Y     relevant in combination with other documents
      A     relevant but not prejudicial to novelty or inventive step




  How to use these labels for defining new evaluation measures?
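
  One possible direction, offered only as an illustration and not as
  the measure adopted by the track, is to map the labels to graded
  gains and compute nDCG. The gain values below (X > Y > A) are an
  assumption made for this sketch, and the document identifiers are
  hypothetical.

```python
import math

# Assumed gains: the ordering X > Y > A is an illustrative choice,
# not something prescribed by the search-report labels themselves.
GAIN = {"X": 3, "Y": 2, "A": 1}


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking, labels):
    """nDCG of a ranked list, given a dict of document -> search-report label."""
    # Documents without a label contribute zero gain.
    gains = [GAIN.get(labels.get(doc), 0) for doc in ranking]
    ideal = sorted((GAIN[lab] for lab in labels.values()), reverse=True)
    ideal_dcg = dcg(ideal[:len(ranking)])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical labelled citations for one topic.
labels = {"D1": "X", "D2": "A", "D3": "Y"}
best_ranking = ["D1", "D3", "D2"]   # X first, then Y, then A
worse_ranking = ["D2", "D3", "D1"]  # the X document is ranked last

print(ndcg(best_ranking, labels), ndcg(worse_ranking, labels))
```

  Under these assumed gains a system is rewarded for ranking X
  documents above Y and A documents. Whether Y citations, which are
  only relevant in combination, should be scored independently
  remains an open question.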
Challenges.




  As a result of the CLEF-IP track we expect to obtain new insights
  into:
      how to represent the information need expressed by a patent
      query reformulation
      evaluation metrics for patent retrieval
      using machine translation for improving retrieval effectiveness
TREC Chemistry track.



     Ad-hoc search
     Target data:
         academic papers (Royal Society of Chemistry)
         chemical patent documents (class C in the IPC)
     Will use automatic extraction of citations for relevance
     assessments
     Challenges:
         chemical names and structures
         chemical interactions, relations, transformations, properties




The IRF is contributing to the creation of new patent test
collections by organizing two tracks within the CLEF and
TREC evaluation campaigns.

In addition to the TREC and CLEF contributions, the IRF,
together with Matrixware, is promoting several initiatives
aimed at facilitating and improving the patent retrieval
process.







Promoting take up of research


  Next:
      presentation of the IRF and Matrixware
      promoting take up of research
          the IRF symposium
          the PaIR workshop
      providing the tools
      funding research in the area of patent retrieval




IRF: the Information Retrieval Facility.




    New international not-for-profit
    foundation, based in Vienna.
    Its mission:
        to bridge the gap between the needs of
        industry and academic know-how
        to promote and facilitate research in
        large-scale information retrieval
        to maintain a facility that enables
        large-scale information retrieval and
        in-depth data processing
Matrixware.




    Founded 2005 in Vienna
    80 Employees
    > 15 Academic Partners Worldwide
    Implements solutions for access to patent
    information
Promoting research.



  Matrixware and the IRF have engaged in several initiatives aimed
  at promoting research and raising awareness in the area of patent
  retrieval.
      the Information Retrieval Facility Symposium
      an annual symposium held in Vienna to foster knowledge
      exchange between IR experts and IP professionals
      the PaIR workshop
      a workshop on Patent Information Retrieval hosted by the
      CIKM conference
Providing the tools.




  Successful IR research conventionally depends on three elements:
    1   the availability of test collections
    2   access to suitable software systems on which to run
        experiments
    3   access to sufficiently powerful hardware


  The IRF, supported by Matrixware, is providing all three of these.
Current University Projects.




      Accessibility of Information (Glasgow)
      Large Scale Logical Retrieval (Glasgow)
      Semantic Analysis of Patent Data (Sheffield and Nijmegen)
      Language Modeling for Patent Retrieval (UMass Amherst)
      OCR for patents (UMass Amherst)
Concluding remarks




     Patent retrieval is an interesting and important open
     challenge for IR researchers.
     The IRF and Matrixware have engaged in several projects
     aimed at promoting research in this area.



Invitation.



  You are invited to:
      join one of the evaluation tracks
           CLEF-IP
           TREC-Chem
      participate in the PaIR workshop
      participate in the Information Retrieval Facility Symposium




Thank you for your attention.

 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Patent Search: An important new test bed for IR

  • 1. Patent Search: An important new test bed for IR. J. Tait, M. Lupu [1]; H. Berger, G. Roda, M. Dittenbach, A. Pesenhofer [2]; E. Graf, K. van Rijsbergen [3]. [1] Information Retrieval Facility, Vienna, Austria; [2] Matrixware, Vienna, Austria; [3] University of Glasgow, Dept. of Computing Science, Glasgow, UK. DIR 2009 / Feb. 2-3, 2009
  • 2. Patent Search. Patent search is a highly specialized form of information search. It is characterized by its: target data; type of information needs; legal and economic implications.
  • 3. Target data. Data for patent retrieval comes mainly from: patent databases from patent authorities (EPO, USPTO, JPO, SIPO, WIPO, etc.); scientific publications; prior art databases (IP.com). A new acronym: SIPO, the State Intellectual Property Office of the People's Republic of China.
  • 4. Target data. Characteristics of patent documents: multilingual and written in 'legalese'; non-uniform formats; some are OCR'd; contain figures, images, chemical formulas, DNA sequences; include references to patent and non-patent literature. A new acronym: NPL, Non-Patent Literature.
  • 5. Information Needs. K.H. Atkinson, "Towards a more rational patent search paradigm": "depending on what group is doing the asking, the types of patent search requested may include simple patentability, clearance to market a product, validity, opposition to a patent being sought by another, infringement watch, creating IP landscapes for business development or R&D, infringement defense, litigation, prosecution support, and creation of portfolios for assignments, investments, mergers and acquisitions [...]"
  • 6. Legal and economic implications. Patents are legal documents; patent portfolios are assets for enterprises; a single patent search can be worth several days of work. High recall searches: missing even a single relevant document can have severe financial and economic impact, for example when a granted patent is invalidated because of a document omitted at application time.
  • 7. Outline: Introduction; Patent Search; A modern IR test bed; Promoting take up of research; Conclusion. We have characterized the patent search problem by describing its target data, types of information needs, and legal and economic implications. Next: evaluating IR techniques in the patent domain; previous initiatives in the area of patent retrieval; the CLEF-IP and TREC-Chem initiatives; promoting take-up of research. Tait et al., Patent Search: An important new test bed for IR.
  • 8. Test collections. Test collections in Information Retrieval play a pivotal role in the evaluation of retrieval models. Domain-specific test collections already exist for: Web pages; news stories; legal documents; blogs; genomics; patents.
  • 9. Pioneering work in patent retrieval. Patent retrieval task at the NTCIR Workshop [1] since 2001: produced test collections primarily targeting Japanese patents. Retrieval tasks: ad-hoc (goal: find patents on a given topic); invalidity search (goal: find patents invalidating a given claim); patent classification according to the F-term system. Two new acronyms: F-term (abbreviation of File-forming term) is the classification system used in Japan as a complement to IPC (International Patent Classification). [1] http://research.nii.ac.jp/ntcir
  • 10. Evaluation tracks. The IRF has engaged in two pilot evaluation tracks on patent retrieval: CLEF-IP (www.ir-facility.org/the_irf/clef-ip09-track) and TREC-Chem (www.ir-facility.org/the_irf/trec_chem.htm).
  • 11. CLEF-Intellectual Property Initiative. CLEF-IP: coordinated by the IRF; part of the Cross-Language Evaluation Forum [2]; will focus on the task of prior art search; European patents as target data; automatic extraction of relevance assessments. Prior art search: identifying all information (including NPL) that might be relevant to a patent's claim of novelty. [2] http://www.clef-campaign.org
  • 12. Prior art search. The most common type of patent search. Performed at various stages of the patent life-cycle and with different intentions: before filing an application (novelty search or patentability search), to determine whether the invention fulfills the requirements of novelty and inventive step; before grant, when the results go into a search report attached to the patent; invalidity search, a post-grant search used to unveil prior art that invalidates a patent's claims of originality.
  • 13. Target data. The CLEF-IP evaluation track will restrict target data to patents. Target data: 16 years (filing date between 1985 and 2000) of EPO patents; 1.9 million patent documents corresponding to 1 million patents; 75 GB, in XML format; documents are in English, German, and French.
  • 14. Automatic extraction of relevance assessments. The data resulting from prior art searches is saved in the EPO or USPTO databases as: citations in patent applications; citations in search reports; citations in the legal files of opposition procedures. The CLEF-IP track is going to extract this information (as much as possible) automatically in order to form a large set of topics.
  • 15. Prior art from opposition procedures. According to European patent law, a granted patent may be opposed. It is often the case that the opponent provides new prior art that invalidates the claim of originality of the invention. Patents cited in opposition procedures are very relevant prior art documents: they are the results of a very thorough invalidity search.
  • 16. Crowdsourcing extraction of relevance assessments. Need to extract citations from documents arising from opposition procedures. These documents are only available as scanned images [3]. Will be using crowdsourcing for extracting these citations. A new word from business jargon: crowdsourcing. [3] at http://www.epoline.org
  • 17. Relevance and evaluation measures. Labels used in search reports (label: what it means for the cited document): X: relevant when taken alone; Y: relevant in combination with other documents; A: relevant but not prejudicial to novelty or inventive step. How to use these labels for defining new evaluation measures?
  • 18. Challenges. As a result of the CLEF-IP track we expect to obtain new insights on: how to represent the information need given by a patent; query reformulation; evaluation metrics for patent retrieval; using machine translation for improving retrieval effectiveness.
  • 19. TREC Chemistry track. Ad-hoc search. Target data: academic papers (Royal Society of Chemistry); chemical patent documents (class C in the IPC). Will use automatic extraction of citations for relevance assessments. Challenges: chemical names and structures; chemical interactions, relations, transformations, properties.
  • 20. The IRF is contributing to the creation of new patent test collections by organizing two tracks within the CLEF and TREC evaluation campaigns. In addition to the TREC and CLEF contributions, the IRF, together with Matrixware, is promoting several initiatives aimed at facilitating and improving the patent retrieval process.
  • 21. Promoting take up of research. Next: presentation of the IRF and Matrixware; promoting take up of research; the IRF symposium; the PaIR workshop; providing the tools; funding research in the area of patent retrieval.
  • 22. IRF: the Information Retrieval Facility. New international not-for-profit foundation, based in Vienna. Its mission: to bridge the gap between the needs of the industry and the academic know-how; to promote and facilitate research in large scale information retrieval; to maintain a facility that enables large scale information retrieval and in-depth data processing.
  • 23. Matrixware. Founded 2005 in Vienna; 80 employees; more than 15 academic partners worldwide; implements solutions for access to patent information.
  • 24. Promoting research. Matrixware and the IRF have engaged in several initiatives aimed at promoting research and raising awareness in the area of patent retrieval: the Information Retrieval Facility Symposium, an annual symposium held in Vienna to foster knowledge exchange between IR experts and IP professionals; the PaIR workshop, a workshop on Patent Information Retrieval hosted by the CIKM conference.
  • 25. Providing the tools. Successful IR research conventionally depends on three elements: (1) the availability of test collections; (2) access to suitable software systems on which to run experiments; (3) access to sufficiently powerful hardware. The IRF, supported by Matrixware, is providing all three of these.
  • 26. Current University Projects. Accessibility of Information (Glasgow); Large Scale Logical Retrieval (Glasgow); Semantic Analysis of Patent Data (Sheffield and Nijmegen); Language Modeling for Patent Retrieval (UMass Amherst); OCR for patents (UMass Amherst).
  • 27. Concluding remarks. Patent retrieval is an interesting and important open challenge for IR researchers. The IRF and Matrixware have engaged in several projects aimed at promoting research in this area.
  • 28. Invitation. You are invited to: join one of the evaluation tracks (CLEF-IP, TREC-Chem); participate in the PaIR workshop; participate in the Information Retrieval Facility Symposium.
  • 29. Thank you for your attention.
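
Slides 14 and 17 describe building relevance assessments from citations and ask how the X/Y/A search-report labels might feed an evaluation measure. The sketch below is one possible reading, not the method used by CLEF-IP: it turns citation records into graded judgments and scores a ranked list with recall (the high-recall concern of slide 6) and a standard nDCG. The gain values (X=3, Y=2, A=1), the function names, and the document identifiers are all assumptions made for illustration.

```python
from math import log2

# Assumed gain per search-report label; the slides leave this mapping open.
GAIN = {"X": 3, "Y": 2, "A": 1}


def build_qrels(citations):
    """Turn (topic_patent, cited_doc, label) triples into graded judgments."""
    qrels = {}
    for topic, cited, label in citations:
        gains = qrels.setdefault(topic, {})
        # Keep the strongest label if a document is cited more than once.
        gains[cited] = max(gains.get(cited, 0), GAIN[label])
    return qrels


def recall_at_k(ranking, judged, k):
    """Fraction of the cited documents that appear in the top k results."""
    if not judged:
        return 0.0
    hits = sum(1 for doc in ranking[:k] if doc in judged)
    return hits / len(judged)


def ndcg_at_k(ranking, judged, k):
    """Graded measure: label-weighted gain, discounted by rank, normalized."""
    dcg = sum(judged.get(doc, 0) / log2(rank + 2)
              for rank, doc in enumerate(ranking[:k]))
    ideal = sorted(judged.values(), reverse=True)[:k]
    idcg = sum(gain / log2(rank + 2) for rank, gain in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


# Hypothetical citation records for one topic patent, and a hypothetical run.
citations = [
    ("EP-TOPIC-1", "EP-A", "X"),
    ("EP-TOPIC-1", "EP-B", "Y"),
    ("EP-TOPIC-1", "EP-C", "A"),
]
qrels = build_qrels(citations)
run = ["EP-B", "EP-Z", "EP-A", "EP-Q"]

recall = recall_at_k(run, qrels["EP-TOPIC-1"], 4)
ndcg = ndcg_at_k(run, qrels["EP-TOPIC-1"], 4)
print(f"recall@4 = {recall:.3f}, nDCG@4 = {ndcg:.3f}")
```

Other mappings are equally defensible. Because a Y citation is relevant only in combination with other documents, a measure could reasonably score groups of documents rather than assigning Y a fixed per-document gain.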