The document describes biothings.api, a Python framework for building biomedical APIs backed by Elasticsearch. It generalizes code from existing APIs such as MyGene.info and MyVariant.info. The framework includes handlers for common HTTP requests, classes for constructing Elasticsearch queries, and a project template for quickly setting up new API projects. Its utility is demonstrated by rebuilding the MyVariant.info API with it, which required only about 100 lines of additional code.
2. Motivation
• Isolate the common aspects of the MyGene and MyVariant codebases and make them available in a separate framework: biothings.api
• Allow easier development of additional biothings APIs (Disease, Drug/Chemical, GO, Species…), i.e. any data that can be represented as JSON documents aggregated on a single field
• Allow easier maintenance and development of the current biothings APIs (gene, variant)
3. System Overview
• The Tornado HTTP server consists of handlers that contain the code to run when a particular URL pattern is matched, e.g. /variant/ or /metadata
• The biothings codebase essentially provides the connection between the appropriate Tornado HTTP request handler for a request and the Elasticsearch query that executes that request
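The request flow above can be sketched in a few lines of plain Python. This is a hypothetical illustration, not biothings or Tornado code: `ROUTES` mimics the list of (URL pattern, handler) pairs a Tornado application is configured with, and `FakeESQuery`, `variant_handler` and `query_handler` are made-up stand-ins for the real handler classes and query object.

```python
import re
from typing import Callable, Dict, List, Tuple


class FakeESQuery:
    """Hypothetical stand-in for the object that builds and runs Elasticsearch queries."""

    def get_biothing(self, bid: str) -> Dict:
        return {"_id": bid}

    def query(self, q: str) -> Dict:
        return {"q": q, "hits": []}


esq = FakeESQuery()


def variant_handler(match: re.Match, params: Dict[str, str]) -> Dict:
    # Annotation endpoint: the biothing ID is captured from the URL path.
    return esq.get_biothing(match.group(1))


def query_handler(match: re.Match, params: Dict[str, str]) -> Dict:
    # Query endpoint: the query string comes from the request parameters.
    return esq.query(params.get("q", ""))


# (URL pattern, handler) pairs, checked in order; the first full match wins.
ROUTES: List[Tuple[str, Callable]] = [
    (r"/variant/(.+)", variant_handler),
    (r"/query/?", query_handler),
]


def dispatch(path: str, params: Dict[str, str]) -> Dict:
    """Find the handler whose pattern matches the whole path and run it."""
    for pattern, handler in ROUTES:
        m = re.fullmatch(pattern, path)
        if m:
            return handler(m, params)
    raise LookupError(f"no handler for {path}")
```

For example, `dispatch("/variant/chr6:g.152708291G>A", {})` reaches `get_biothing` with the ID taken from the path, while `dispatch("/query", {"q": "_exists_:dbsnp"})` reaches `query`.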
4. Biothings – HTTP Handling
• tornado.web.RequestHandler: base Tornado class for HTTP request handling. Important methods: get/post, get_arguments, write
• biothings.www.helper.BaseHandler: contains methods common to all biothings RequestHandlers. Important methods: get_query_params, return_json
• biothings.www.api.handlers.QueryHandler: contains methods to implement the biothings query endpoint. Important methods: get, post, _examine_kwargs
• biothings.www.api.handlers.BiothingHandler: contains methods to implement the biothings annotation endpoint. Important methods: get, post, _examine_kwargs
• biothings.www.api.handlers.MetaDataHandler: contains methods to implement the metadata endpoint
• biothings.www.api.handlers.StatusHandler: contains methods to implement a status endpoint for AWS ELB (Elastic Load Balancer)
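Tornado stores parsed request arguments in `self.request.arguments` as a dict mapping each name to a list of byte strings. BaseHandler helpers such as `get_query_params` and `return_json` plausibly bridge that format and the query layer; the functions below are a hypothetical, framework-free sketch of that idea, and the real biothings signatures may differ.

```python
import json
from typing import Any, Dict, List


def get_query_params(arguments: Dict[str, List[bytes]]) -> Dict[str, str]:
    """Flatten Tornado-style request arguments into plain str -> str kwargs.

    Hypothetical helper: when a name is repeated, the last value wins,
    as with tornado.web.RequestHandler.get_argument. Empty lists are dropped.
    """
    return {name: values[-1].decode("utf-8") for name, values in arguments.items() if values}


def return_json(data: Any) -> str:
    """Serialize a result for the response body (hypothetical helper).

    Inside a real Tornado handler, passing a dict to self.write() also
    serializes it as JSON and sets the JSON content type.
    """
    return json.dumps(data, sort_keys=True)
```

Keeping this translation in one base class means every endpoint handler receives the same plain keyword arguments regardless of how the request was encoded.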
5. Biothings – HTTP Handling
• biothings.www.api.handlers.BiothingHandler:
– GET request (e.g. /variant/chr6:g.152708291G>A)
– POST request (e.g. /variant/)
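The two request types correspond to the two annotation methods on ESQuery described later: a GET carries one ID in the URL path, a POST carries a batch. Below is a minimal sketch of that split, assuming the POST body is form-encoded with a comma-separated `ids` field (an assumption; the slide does not specify the body format). `RecordingESQuery` and `handle_annotation` are hypothetical names.

```python
from typing import Dict, List
from urllib.parse import parse_qs


class RecordingESQuery:
    """Hypothetical stub exposing the ESQuery method names used by the annotation endpoint."""

    def get_biothing(self, bid: str, **kwargs) -> Dict:
        return {"_id": bid}

    def mget_biothings(self, bid_list: List[str], **kwargs) -> List[Dict]:
        return [{"_id": bid} for bid in bid_list]


def handle_annotation(method: str, path_id: str, body: str, esq: RecordingESQuery):
    if method == "GET":
        # e.g. GET /variant/chr6:g.152708291G>A -> a single document
        return esq.get_biothing(path_id)
    if method == "POST":
        # e.g. POST /variant/ with a form body "ids=id1,id2" -> a list of documents
        form = parse_qs(body)
        raw = form.get("ids", [""])[0]
        ids = [i.strip() for i in raw.split(",") if i.strip()]
        return esq.mget_biothings(ids)
    raise ValueError(f"unsupported method: {method}")
```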
6. Biothings – HTTP Handling
• biothings.www.api.handlers.QueryHandler:
– GET request (e.g. /query?q=_exists_:dbsnp)
– POST request (e.g. /query/)
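The `q` value in the GET example (`_exists_:dbsnp`) is Lucene query-string syntax, so one plausible translation is an Elasticsearch `query_string` query. The sketch below builds such a search body; `query`, `size`, `from` and `_source` are standard Elasticsearch search-body keys, while the Python parameter names `fields`, `size` and `from_` and the function name are assumptions for illustration.

```python
from typing import Dict, List, Optional


def build_query_body(q: str, fields: Optional[List[str]] = None,
                     size: int = 10, from_: int = 0) -> Dict:
    """Translate /query parameters into an Elasticsearch search body (hypothetical)."""
    body: Dict = {
        # Lucene syntax such as "_exists_:dbsnp" is parsed by a query_string query.
        "query": {"query_string": {"query": q}},
        # Paging: return `size` hits starting at offset `from`.
        "size": size,
        "from": from_,
    }
    if fields:
        # Source filtering: return only the requested parts of each document.
        body["_source"] = fields
    return body
```

The handler would pass this body to the Elasticsearch search call; in the framework that step belongs to ESQuery.query.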
7. Biothings – Elasticsearch query
• biothings.www.api.es.ESQuery: contains the Python code for constructing the Elasticsearch query and formatting the resulting data
– query(q, **kwargs): contains the Elasticsearch query to run with data obtained from a GET or POST to the /query/ endpoint
– get_biothing(bid, **kwargs): contains the Elasticsearch query to run with data obtained from a GET to the /annotation/ endpoint
– mget_biothings(bid_list, **kwargs): contains the Elasticsearch query to run with data obtained from a POST to the /annotation/ endpoint
– _cleaned_res(res): formats the return object for get_biothing and mget_biothings
– _cleaned_res2(res): formats the return object for query
– _get_biothingdoc(hit): formats a single biothing object from any Elasticsearch query; called by _cleaned_res and _cleaned_res2
– _modify_biothingdoc(doc): modifies a biothing doc; called in _get_biothingdoc. Currently empty, intended to be overridden in subclasses
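The call chain above (_cleaned_res / _cleaned_res2 → _get_biothingdoc → _modify_biothingdoc) is a template-method pattern: a project overrides only the empty hook. Below is a minimal, self-contained sketch of that structure, not the biothings implementation. Assumptions: the Elasticsearch client is replaced by an injected `search` callable, hits have the usual Elasticsearch `_id`/`_source` shape, the `ids` query is used for lookups by ID, and copying `_id` into each document is an illustrative choice. `ESQuerySketch` and `VariantESQuery` are hypothetical names.

```python
from typing import Callable, Dict, List


class ESQuerySketch:
    """Illustrative ESQuery-like class; `search` stands in for an Elasticsearch client call."""

    def __init__(self, search: Callable[[Dict], Dict]):
        # search(body) must return {"hits": {"total": ..., "hits": [...]}}
        self._search = search

    # Public methods called by the handlers.
    def query(self, q: str, **kwargs) -> Dict:
        res = self._search({"query": {"query_string": {"query": q}}})
        return self._cleaned_res2(res)

    def get_biothing(self, bid: str, **kwargs) -> Dict:
        docs = self._cleaned_res(self._search({"query": {"ids": {"values": [bid]}}}))
        return docs[0] if docs else {}

    def mget_biothings(self, bid_list: List[str], **kwargs) -> List[Dict]:
        return self._cleaned_res(self._search({"query": {"ids": {"values": bid_list}}}))

    # Formatting helpers.
    def _cleaned_res(self, res: Dict) -> List[Dict]:
        # Annotation endpoints: a plain list of documents.
        return [self._get_biothingdoc(hit) for hit in res["hits"]["hits"]]

    def _cleaned_res2(self, res: Dict) -> Dict:
        # Query endpoint: keep the hit count (its shape varies across Elasticsearch versions).
        return {
            "total": res["hits"]["total"],
            "hits": [self._get_biothingdoc(hit) for hit in res["hits"]["hits"]],
        }

    def _get_biothingdoc(self, hit: Dict) -> Dict:
        # Copy the stored document and expose its ID alongside the data.
        doc = dict(hit.get("_source", {}))
        doc["_id"] = hit["_id"]
        return self._modify_biothingdoc(doc)

    def _modify_biothingdoc(self, doc: Dict) -> Dict:
        # Empty hook: subclasses override this to post-process every document.
        return doc


class VariantESQuery(ESQuerySketch):
    """Hypothetical project subclass: only the hook is overridden."""

    def _modify_biothingdoc(self, doc: Dict) -> Dict:
        doc["biothing_type"] = "variant"  # illustrative post-processing
        return doc
```

Because every endpoint's output passes through the same hook, a project can change document formatting for all three public methods by overriding a single method, without touching query construction.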
8. Biothings - Settings
• Problem: so far we have not addressed how to refer to things that MUST be project specific (e.g., the name of the Elasticsearch index to search, the document type, etc.). How do we do this?
• Solution: a settings module in biothings that all code within biothings refers to. That module reads an environment variable called BIOTHING_SETTINGS containing the name of an importable module that sets the project-specific variables.
– export BIOTHING_SETTINGS='biothings.config'
• Similar to Django's DJANGO_SETTINGS_MODULE
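A minimal sketch of such a settings module using only the standard library. It imports the module named by BIOTHING_SETTINGS on first attribute access, similar in spirit to Django's lazy settings object. The class name `LazySettings` and the setting names mentioned in the comments are hypothetical, not the actual biothings code.

```python
import importlib
import os
from types import ModuleType
from typing import Optional


class LazySettings:
    """Proxy that forwards attribute lookups to the module named in an environment variable."""

    def __init__(self, env_var: str = "BIOTHING_SETTINGS"):
        self._env_var = env_var
        self._module: Optional[ModuleType] = None  # imported on first use, not at import time

    def _load(self) -> ModuleType:
        if self._module is None:
            module_name = os.environ.get(self._env_var)
            if not module_name:
                raise RuntimeError(f"{self._env_var} is not set")
            self._module = importlib.import_module(module_name)
        return self._module

    def __getattr__(self, name: str):
        # Only called for attributes not found on the proxy itself,
        # e.g. settings.ES_INDEX -> <project config module>.ES_INDEX (hypothetical name).
        return getattr(self._load(), name)


# Code inside the framework imports this one object and reads settings.<NAME> from it.
settings = LazySettings()
```

Loading lazily means the environment variable only has to be set before the first setting is read, not before the framework itself is imported.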
10. Biothings – Project template
• At this point, we have the tools necessary to easily create and subclass four types of biothings handlers (BiothingHandler, QueryHandler, MetaDataHandler, StatusHandler) and the Elasticsearch query class (ESQuery)
• We could stop here and still have a useful tool, but we wanted to make it even easier to create a new project (this also enforces a uniform project structure across all biothings APIs)
• To do this, we have a project template folder containing the project directory structure and some skeleton code:
– config.py
– the connection from URL patterns to handlers
– the connection from handlers to ESQuery
11. Biothings - Project template
• To create the actual project directory from the template, we wrote a small script: start-project.py
– Usage: python start-project.py <path-to-project-directory> <biothing-object-name>
– Example: python start-project.py ~ variant
• Every folder and file in the template directory is created in the project directory. The contents of each file are passed through Python's string.Template substitution before being written to the project directory.
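The behaviour described for start-project.py (mirror the template tree, run each file's contents through string.Template) can be sketched with the standard library alone. The function name `start_project` and the placeholder `${biothing_object}` are assumptions for illustration; the real script's variable names are not given here. This sketch writes directly into the given directory, and whether the real script creates a subfolder is not stated. `safe_substitute` is chosen so that unrelated `$` characters in template files are left alone rather than raising KeyError.

```python
import os
import string


def start_project(template_dir: str, project_dir: str, biothing_object: str) -> None:
    """Copy template_dir into project_dir, substituting ${biothing_object} in file contents."""
    mapping = {"biothing_object": biothing_object}
    for root, _dirs, files in os.walk(template_dir):
        # Recreate each template folder at the same relative position in the project.
        rel = os.path.relpath(root, template_dir)
        target_root = os.path.normpath(os.path.join(project_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            with open(os.path.join(root, name), encoding="utf-8") as src:
                # Fill placeholders; unknown or invalid $-expressions are kept verbatim.
                text = string.Template(src.read()).safe_substitute(mapping)
            with open(os.path.join(target_root, name), "w", encoding="utf-8") as dst:
                dst.write(text)
```

A command-line wrapper would then call `start_project(<template folder>, sys.argv[1], sys.argv[2])`, matching the usage line above.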
18. Recreating MyVariant.info using biothings.api
• Recreated the current MyVariant.info service using the biothings.api framework
– Very little extra code required (~100 lines)
– Less than a day to create the web front end from scratch
– https://github.com/cyrus0824/myvariant.info_new
• Caveat: it seems disingenuous to gauge the utility of a tool by recreating the codebase that the tool was itself extracted from
=> We should try implementing other APIs, especially MyGene.info (which has more varied gene-specific query options), and modify biothings as needed
19. Future work
• Integrate data loading and data indexing functions into biothings
• Documentation! Projects like this need very good documentation to be of any use to an API developer (on the level of Tornado's excellent documentation: http://www.tornadoweb.org/en/stable/web.html)
• Auto-generate clients (Python client, R client)
• Auto-generate an Ansible playbook to create cluster hardware on AWS
• One-click API…