David Wobrock presented on Botify's custom 'Big Data' JSON DSL API. The API allows customers to query, join, and aggregate multiple datasets containing SEO data through a custom JSON query language. This unified API approach enables growth by giving customers access to Botify's full dataset to address business needs. Key lessons learned include the steep learning curve for the custom DSL, the importance of monitoring API usage, and improving developer relations to support API integration and adoption.
2. Who am I?
- French/German living in Paris
- Senior Lead API Engineer @ Botify
- French Tech Start-up in Paris
- Worked a bit around APIs
https://twitter.com/davidwobrock
https://github.com/David-Wobrock
https://www.linkedin.com/in/david-wobrock/
3. Plan
1. What’s a “custom ‘Big Data’ JSON DSL API”?
2. Why this choice? How it enables growth
3. Architecture
4. Learned lessons
4. Definition: custom ‘Big Data’ JSON DSL API
● custom => in-house implementation
● ‘Big Data’ => query, join and aggregate multiple datasets, each containing millions, or even billions, of rows
● JSON DSL => the API's payload is a JSON that must contain certain keys we decided on - this is our Domain-Specific Language (DSL), modestly named Botify Query Language (BQL)
● API => a REST-like server endpoint
5. Context about Botify
To understand the example: Botify is a technical SEO (Search Engine Optimization) platform.
We ingest a lot of SEO data, aggregate it, and provide insights and automation around it.
6. What does this DSL look like?
It works in the manner of an analytics tool:
- What data sources do I need?
- On what timeframe?
- Define aggregated metrics by a set of dimensions.
- Allow filtering and sorting.
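To make the four questions above concrete, here is a hypothetical BQL-style payload built in Python. All field names and values are illustrative assumptions, not the actual BQL schema:

```python
import json

# Illustrative BQL-style query (field names are assumptions, not real BQL).
query = {
    # What data sources do I need?
    "collections": ["crawl.20240115", "search_console"],
    # On what timeframe?
    "periods": [["2024-01-01", "2024-01-31"]],
    "query": {
        # Aggregated metrics, broken down by a set of dimensions.
        "dimensions": ["segments.pagetype"],
        "metrics": [{"sum": "search_console.count_clicks"}],
        # Filtering and sorting.
        "filters": {"field": "crawl.20240115.http_code", "predicate": "eq", "value": 200},
        "sort": [{"index": 0, "order": "desc"}],
    },
}

# The payload is plain JSON, POSTed to a REST-like endpoint.
payload = json.dumps(query)
print(sorted(query["query"].keys()))  # prints "['dimensions', 'filters', 'metrics', 'sort']"
```

The point is the shape, not the exact keys: one JSON document declares sources, timeframe, dimensions, metrics, filters and sorting, and the server does the rest.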
8. Interesting design, but why do this?
SEO data is complex and large:
- Hundreds of millions of crawled and analyzed URLs for one website
- Billions of HTTP log lines
- Millions of 3rd-party events (Google Analytics, Google Search Console, …)
=> complex data model (> 1000 metrics)
=> rather large tables
What are the business requirements that go with that large data?
9. Our added value
What enables SEO growth for our customers is the data.
We need to give the data back to the customers. This is done through an API.
In raw form, but also an enhanced version: joining different datasets to get more value.
Our goals:
- make the entire raw data model available
- allow crossing the different datasets
- be able to serve data in a reasonable time
10. Enabling growth for our customers
One API through which they can get all the datasets they could need.
Allow expressing any business use case.
=> a unique and powerful tool that does the job
11. Crawl & analyze 50 million pages.
Ingest 250 million rows of HTTP logs from Apache servers.
Fetch 70 million keywords used to access the website.
Retrieve 100 million visits from the Analytics integration.

Example questions this answers:
- What product pages gained the most traffic since my website migration?
- What pages generate the most traffic but are badly linked in my site structure?
- What is the average loading time of the pages that were accessed through brand keywords on Google?
13. Tackling the technical challenge
One API to rule them all 💍👁 (at Botify, not all APIs)
- One unified and abstract interface for all SEO data access
=> ability to introspect the query and understand the use case
- Store data in different backends, denormalized and precomputed
=> ability to choose the most efficient database depending on the use case
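The introspection idea above can be sketched as a routing function: inspect the parsed query's features and pick the most suitable backend. The backend names and heuristics below are invented for illustration, not Botify's actual routing logic:

```python
def choose_backend(parsed_query: dict) -> str:
    """Pick a backend from introspected query features (illustrative heuristics)."""
    n_collections = len(parsed_query.get("collections", []))
    has_aggregation = bool(parsed_query.get("metrics"))
    if n_collections > 1:
        # Cross-dataset joins go to the analytical warehouse.
        return "warehouse"
    if has_aggregation:
        # Single-dataset aggregations fit a columnar store.
        return "columnar_store"
    # Simple row lookups can be served from a precomputed key-value store.
    return "kv_store"

print(choose_backend({"collections": ["crawl"], "metrics": [{"count": "url"}]}))
# prints "columnar_store"
```

Because every caller goes through the same DSL, this decision can be made centrally and changed without touching any client.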
14. Optimising queries? Oh yes, please!
Optimisation goes in two directions:
1. Performance => be quick
2. Cost => be cheap (enough)
Best of both worlds!
15. Let’s see how it works
Roll up your sleeves, we are diving deep! 🤿

The query pipeline:
1. BQL JSON payload
2. Auth, permissions...
3. Query parsing (against the schemas) => parsing infos
4. Query introspection
5. Backend transformer(s) => backend-specific query(ies) (SQL?)
6. Backend connection(s) => backend(s)
7. Result transformation
16. Single Point of Truth
Our own application is one large API user.
=> a lot of use cases are covered and the API's functional aspects are battle-tested.
Devs, Support and Customer Service speak the same language.
We are able to grow and focus efforts on one central piece of software:
- increase maintainability and share ownership
- add features and improve performance
17. Cube.js
Open Source Analytical API Platform - https://cube.dev/
- Query language is very similar
- Supports many backends
- Defines schemas
Very similar! Differences?
19. Learning curve
For customers, but also internally, onboarding is no easy task.
A language/DSL and multiple concepts have to be learned to fully exploit BQL.
Mitigation:
- internal and customer trainings, also given by developers
- documentation with many examples
20. Monitoring/tooling is key
When you have one API that everybody calls, monitoring is key.
You must understand how the API is used, and by whom, to avoid breaking it for existing use cases.
Monitor exactly what calls are made, with monitoring specifically tailored to your API.
That way, you get fine-grained reporting about usage, backends and response-time evolution.
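One way to make "don't break existing use cases" actionable is to record which DSL fields each client actually queries. This is an assumed design for illustration, not Botify's actual tooling:

```python
from collections import Counter

# Count (client, field) pairs seen in incoming queries, so that before
# changing a field we can list exactly which clients would be affected.
field_usage = Counter()

def record_query(client: str, query: dict) -> None:
    for field in query.get("dimensions", []) + query.get("metrics", []):
        field_usage[(client, field)] += 1

record_query("customer-a", {"dimensions": ["http_code"], "metrics": ["count_urls"]})
record_query("customer-b", {"dimensions": ["http_code"], "metrics": []})

# Who would break if we changed "http_code"?
affected = sorted({client for (client, field) in field_usage if field == "http_code"})
print(affected)  # prints "['customer-a', 'customer-b']"
```

In production this counter would live in a metrics store rather than in memory, but the principle is the same: because every query goes through one DSL, usage can be measured at field granularity.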
21. Developer Relations
Often, our customers want to connect to this API directly.
Given the steep learning curve, we need to provide support and improve Developer Relations.
Developers will integrate our API into the customers' workflows and BI/analytics tools.
Creating a synergy and exchanges around the API is crucial for adoption.
22. Summary
Our non-trivial approach to an API has our customers use a custom DSL to:
● match business needs
● match technical needs
We allow our clients to get SEO insights into their business, and we try to smooth the learning curve so that they get value as fast as possible.