current

  BHL



  ubio.org
future
process

identify -> disambiguate / reconcile -> ID lookup
process

identify -> disambiguate / reconcile -> ID lookup


Mature & scalable, well defined and standardized
process

identify -> disambiguate / reconcile -> ID lookup


           in progress, needs API & standard
process

identify -> disambiguate / reconcile -> ID lookup


           GNI has API, needs standards
current response
<entity>
  <nameString>Abietineae</nameString>
  <namebankID>8401003</namebankID>
  <weblinks>
    <website>
      <title>Tropicos</title>
      <link>http://mobot.mobot.org/W3T/Search/vast.html</link>
      <logo>http://names.ubio.org/tools/image/tropicos.png</logo>
      <links>
        <link nameString="Abietineae Eichler">http://mobot.mobot.org/cgi-bin/search_vast?onda=N50205444</link>
      </links>
    </website>
  </weblinks>
</entity>
issues
the taxonfinder (TF) API is doing jobs it shouldn’t do: it returns IDs and web links on top of the names it finds

Namebank is a large but outdated dataset

“taxonfinder” has no idea what a Namebank ID actually is; it only knows strings

current code is completely dependent on www.ubio.org and is not scalable
why change?
scaling - we can run 10,000 taxonfinding processes using any algorithm
that supports the standard. Super fast indexing of BHL

future-proofing for devs - any new namefinding tool can take advantage
of the API and doesn’t need to write a webservice or API of its own

future-proofing for BHL - any new namefinding tool can be added with
one parameter
(&client=taxonfinder | &client=neti)

reliability - existing TF API goes down when Rod runs a screen scraping
tool on ubio.org.
new API spec
API specs
Request
input (string)
type (text, url)
format (xml=default, json)
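
A minimal sketch of how a client might assemble such a request. The parameter names (input, type, format, client) come from the slides; the endpoint URL and the function name are placeholders, since the slides give no address for the service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not say where the service lives.
BASE_URL = "http://example.org/namefinder"


def build_request_url(text_or_url, input_type="text", fmt="xml", client="taxonfinder"):
    """Build a GET URL from the request parameters listed on the slide."""
    if input_type not in ("text", "url"):
        raise ValueError("type must be 'text' or 'url'")
    if fmt not in ("xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    params = {
        "input": text_or_url,
        "type": input_type,
        "format": fmt,      # xml is the default per the spec
        "client": client,   # e.g. taxonfinder or neti
    }
    return BASE_URL + "?" + urlencode(params)


# Switching the namefinding engine is a one-parameter change.
url = build_request_url("A book about Mus musculus", client="neti")
```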
Response
XML Response
A response example that corresponds to the xml schema:
<names xmlns="http://globalnames.org/namefinder" xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <name>
    <verbatim>T. rotundata</verbatim>
    <dwc:scientificName>Tillandsia rotundata</dwc:scientificName>
    <!--   0-100   -->
    <score>100</score>
    <offset start="4550" end="4573" />
  </name>
</names>
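
For illustration, a response like the one above could be read with Python's standard library as follows. This is only a sketch: the function name and the dictionary keys are made up here, and the sample is the slide's example with the comment dropped.

```python
import xml.etree.ElementTree as ET

# Namespaces as declared in the sample response.
NS = {
    "nf": "http://globalnames.org/namefinder",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}

SAMPLE = """<names xmlns="http://globalnames.org/namefinder" xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <name>
    <verbatim>T. rotundata</verbatim>
    <dwc:scientificName>Tillandsia rotundata</dwc:scientificName>
    <score>100</score>
    <offset start="4550" end="4573" />
  </name>
</names>"""


def parse_names(xml_text):
    """Return one dict per <name> element: strings, score and offsets."""
    root = ET.fromstring(xml_text)
    found = []
    for name in root.findall("nf:name", NS):
        offset = name.find("nf:offset", NS)
        found.append({
            "verbatim": name.findtext("nf:verbatim", namespaces=NS),
            "scientific_name": name.findtext("dwc:scientificName", namespaces=NS),
            "score": int(name.findtext("nf:score", namespaces=NS)),
            "start": int(offset.get("start")),
            "end": int(offset.get("end")),
        })
    return found


names = parse_names(SAMPLE)
```

Note that nothing in the result is an ID: the response carries only strings, a score and offsets.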
New API
you give us text, we give you strings and offsets. This is the limit of
what a “namefinding” tool can and should do

separately, you also need IDs: NameBank, EOL, Tropicos, gn*, GBIF...

once you know Mus musculus is EOL ID “9872332” you don’t need to know
that again. If a book on mice has 40,000 instances of Mus musculus, you
need to know where they are, but not the NameBank ID 40,000 times..
(this is a scaling problem..)



Where do we get these? GNI has 19.3m names & IDs.
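
The scaling point can be sketched in a few lines: keep every offset, but resolve each distinct name only once. The function names are hypothetical, and the lookup is a stand-in for whatever ID service (GNI, for example) ends up being used; the EOL ID is the one quoted on the slide.

```python
from collections import defaultdict


def index_occurrences(occurrences):
    """Group (name, start, end) tuples by name, keeping every offset."""
    positions = defaultdict(list)
    for name, start, end in occurrences:
        positions[name].append((start, end))
    return dict(positions)


def resolve_ids(names, lookup):
    """Call lookup once per distinct name, not once per occurrence."""
    return {name: lookup(name) for name in set(names)}


# Stand-in for a real ID service; the table is illustrative only.
FAKE_IDS = {"Mus musculus": "9872332"}


def fake_lookup(name):
    return FAKE_IDS.get(name)


occurrences = [
    ("Mus musculus", 10, 22),
    ("Mus musculus", 340, 352),
    ("Rattus rattus", 90, 103),
]
positions = index_occurrences(occurrences)
ids = resolve_ids(positions, fake_lookup)  # two lookups for three occurrences
```

With 40,000 occurrences of one name, the offsets list grows to 40,000 entries while the ID lookup still runs once.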
issues

misspellings etc need to be “reconciled”

this definitely isn’t the job of a name finding tool
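
To make "reconcile" concrete, here is a rough sketch of a separate step that maps a misspelled string to a known name. It uses Python's difflib as a stand-in for a real reconciliation service; the slides do not describe a matching method, and the function name, name list and cutoff are assumptions.

```python
import difflib


def reconcile(found_name, known_names, cutoff=0.85):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(found_name, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Illustrative list of known names; a real service would hold millions.
KNOWN = ["Mus musculus", "Rattus rattus"]

fixed = reconcile("Mus muscullus", KNOWN)    # misspelling of a known name
unknown = reconcile("Homo sapiens", KNOWN)   # nothing close in the list
```

This runs after namefinding and before ID lookup, so the namefinding tool itself never needs to know about it.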
next?
we could make a tool that hacks together IDs and names...
... but that’s not dev time well spent

we could participate in a process to check off the latter two categories
of the name finding -> ID resolution process
... yes we can

Let’s make a spec, build some APIs.

silver lining - we can start now

Scaling Namefinding
