Presentation by Georg Schiwy, Documentation Information Manager at the European Patent Office.
The EPO holds one of the largest digital repositories of public knowledge in the world. This vast store is accessed daily by thousands of users, and its usage is constantly increasing. Each year about 40 terabytes, the equivalent of 40 million books, are downloaded from the EPO search collection by internal and external users. This figure illustrates the EPO's unique contribution to the knowledge economy. The presentation gives an overview of the patent and non-patent collections that examiners use for prior-art search. The second part outlines the move from a paper documentation collection to an electronic one and the particular challenges of that process.
The EPO document collection: A technical treasure chest
1. The EPO document collection: A technical treasure chest. Georg Schiwy, Directorate Information Acquisition, 06 June 2008
3. Documentation at EPO - Blessing or Curse?
> 371 million records in 117 databases
> 78.8 million Patent and NPL facsimiles
> 66 million unique Patent documents
> 57 million Patent abstracts
> 21.7 million full text Patent documents
> 3.5 million full text NPL documents
> 6,000 NPL titles and growing daily...
The EPO document collection Patents and documentation
4. Patent acquisition approach
> Bibliographic data: extended bibliographic data
> Full text: searchable full text, one patent document in an official language
> Title: searchable
> Drawing
> Abstract: original language and English language
> Image: facsimile
The EPO document collection Documentation overview
6. Patent data quality requirements
> Data quality: timeliness, correctness, completeness
> "Global Patent Data Coverage", on the Internet at the following address: http://www.epo.org/gpdc
The EPO document collection Managing and maintaining our data
8. EPO Non-Patent Literature (NPL) Resources
> Databases of secondary publishers: INSPEC, COMPDX, BIOSIS, MEDLINE, IHS...
> Standards
> Books, theses, technical reports, monographs
> Journals
> Conference proceedings
> Company disclosures
> Encyclopaedias, dictionaries
11. Patent Classification
> C: Section (8 sections, A…H)
> C12: Class
> C12N: Subclass
> C12N15: Main group
> C12N15/82: Subgroup (71 000 subgroups in IPC)
> C12N15/82 A: further divided into 30 fields in ECLA
The EPO document collection Classification
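The hierarchy on this slide can be illustrated with a short sketch that splits an IPC symbol such as C12N15/82 into its levels. This is only an illustration, not an EPO tool: the names `IpcSymbol` and `parse_ipc` are hypothetical, and the pattern assumes the compact notation used on the slide (section letter A to H, two class digits, subclass letter, main group number, slash, subgroup number). ECLA extensions such as the trailing "A" are not handled.

```python
import re
from typing import NamedTuple


class IpcSymbol(NamedTuple):
    """Hierarchy levels of one IPC symbol (hypothetical helper type)."""
    section: str     # e.g. "C"
    ipc_class: str   # e.g. "C12"
    subclass: str    # e.g. "C12N"
    main_group: str  # e.g. "C12N15"
    subgroup: str    # e.g. "C12N15/82"


# Assumed compact notation: section, 2 class digits, subclass letter,
# main group number, "/", subgroup number.
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_ipc(symbol: str) -> IpcSymbol:
    """Split a symbol like 'C12N15/82' into its hierarchy levels."""
    compact = symbol.replace(" ", "").upper()
    match = _IPC_PATTERN.match(compact)
    if match is None:
        raise ValueError(f"not a recognised IPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, group, subgroup = match.groups()
    ipc_class = section + class_digits
    subclass = ipc_class + subclass_letter
    main_group = subclass + group
    return IpcSymbol(section, ipc_class, subclass, main_group,
                     f"{main_group}/{subgroup}")


if __name__ == "__main__":
    # Print each level of the slide's example, from section to subgroup.
    for level, value in parse_ipc("C12N15/82")._asdict().items():
        print(f"{level}: {value}")
```

Each level is a prefix of the next, which is why a search can be broadened simply by truncating the symbol.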
12. Patent classification - Example: Parking space problem
15. Access and Tools: The Patent Granting Workbench Yesterday The EPO document collection The patent granting workbench
16. Access and Tools: The Patent Granting Workbench Today: SEA
17. Access and Tools: The Patent Granting Workbench Today: Chemical formulas
18. Access and Tools: The Patent Granting Workbench Today: Sequence data capture
19. Access and Tools: The Patent Granting Workbench Today: Early OCR
> Input: new scanned applications
> OCR conversion, quality control, storage
> Output: structured text (XML / PDF)
> EPOQUE, machine translation, pre-classification, examination
20. Access and Tools: The Patent Granting Workbench Today: Machine Translation German - English - French - Spanish
22. Added value: "intelligent" patent document collection
> Bibliographic data, original title, original abstract, title EN, abstract EN, classification, citations
> Facsimile images, full text, sequences, chemical formulae
> Numerical values extraction, machine translation, text summariser, flow chart searches, targeted routing
24. How to get from A to B? The EPO document collection The Paperless project
29. Project progress: 22 million documents in 6 years
30. Status - Individual scanned documents
33. Corrections - Example: GB (should be GB191222653)
35. "The EPO is promoting a knowledge-based society in Europe as one of the world’s leading providers of technical information." Thank you for your attention! Any questions? Georg Schiwy Information Acquisition [email_address]