As a Node.js developer, chances are you’ve had to tackle the issue of how to accept filter criteria from user input in your web APIs. This seemingly mundane problem can quickly become quite complicated. How do you ingest this input safely? How do you ensure this does not become a leaky abstraction? How do you handle complex filter graphs while minimizing the amount of boilerplate you have to write and maintain?
In this talk, we’ll explore several options for accepting filter criteria in web APIs, discuss their pros and cons, and present a new tool for solving this issue.
6. Agenda
• Web API Filtering
• Common Approaches
• Challenges
• A New Tool
7. Introducing spleen
A dynamic filter expression dialect, library,
and toolset.
(...because finding available names on NPM is an exercise in futility)
15. Common Approaches
Query String Parameters with Custom Operators
GET api.somehrms.com/v1/employees?managerId=eq:2
Equal To
GET api.somehrms.com/v1/employees?title=neq:Physicist
Not Equal To
GET api.somehrms.com/v1/employees?salary=gt:30000
Greater Than
GET api.somehrms.com/v1/employees?age=lte:40
Less Than or Equal To
GET api.somehrms.com/v1/employees?name=like:E*
Like Pattern
16. Common Approaches
Query String Parameters with Custom Operators
What about conjunctions?
managerId == 2 AND salary >= 30000 OR name like “E*”
19. Common Approaches
Query String Parameters with Custom Operators
GET api.somehrms.com/v1/employees
?managerId=eq:2
&salary=and:gte:30000
&name=or:like:E*
managerId == 2 AND salary >= 30000 OR name like “E*”
salary >= 30000 AND managerId == 2 OR name like “E*”
name like “E*” OR managerId == 2 AND salary >= 30000
managerId == 2 OR name like “E*” AND salary >= 30000
20. Common Approaches
Query String Parameter with SQL Query
GET api.somehrms.com/v1/employees
?filter=managerId=2+AND+salary>=30000+OR+name+like+"E%25"
21. Common Approaches
Query String Parameter with SQL Query
GET api.somehrms.com/v1/employees
?filter=managerId=2+AND+salary>=30000+OR+name+like+"E*"
• Leaks implementation details
• Unsafe
26. Common Approaches
Off-the-Shelf Architectures
• GraphQL
• Falcor
• OData
----------------------------------------------------------------------------------------
• A LOT more than just filtering collections!
• Legacy systems?
• Opinionated
• Non-trivial to implement
38. Challenges
• Robustness
Different comparison operators
Conjunctive (AND) and disjunctive (OR) logical operators
Logical groups
• Proper abstraction
• Idiomatic
• Opinions
• Validation
• Vector for SQL injection attack?
• Vector for DoS’ing the database?
Lots of expensive comparisons against non-indexed fields
Inefficient ordering of clauses
• Complexity
39. Agenda
• Web API Filtering
• Common Approaches
• Challenges
• A New Tool
40. Introducing spleen
A dynamic filter expression dialect, library,
and toolset.
(...because finding available names on NPM is an exercise in futility)
48. Goals for the spleen Dialect
• Human readable
• Terse
• Reference complex structures (nested JSON objects)
• Support for a variety of common comparisons
• Conjunctive and disjunctive logical operators
• Logical grouping
• Works in a query string parameter
49. The spleen Dialect
Field references are JSON pointers (RFC 6901)
/foo/bar/0
{
  foo: {
    bar: ['a', 'b', 'c']
  }
}
Result: 'a'
50. The spleen Dialect
Comparison operators:
eq: equal to
neq: not equal to
gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
between: value is greater than or equal to x and less than or equal to y
nbetween: value is less than x or greater than y
in: value is in an array of values
nin: value is not in an array of values
like: string value is like the given pattern
nlike: string value is not like the given pattern
51. The spleen Dialect
Logical operators:
and: conjunctive logical operator
or: disjunctive logical operator
(: open logical group
): close logical group
52. The spleen Dialect Examples
/foo eq 42
/foo/bar gt 42
/foo eq 42 and /bar/baz between 0,500
/foo eq 42
and (/bar/baz nbetween 0,500 or /qux like "_abc*")
and (/quux in [1,2.3] or /corge gte 312)
53. Introducing spleen
A dynamic filter expression dialect, library,
and toolset.
(...because finding available names on NPM is an exercise in futility)
60. The spleen Library
• Not a framework.
• Available on NPM (npm install spleen -S)
• Parses spleen expressions
• Builds spleen expressions
• Instances of spleen.Filter serve as an abstraction
• Matches objects
• Prioritizes filter clauses
64. Introducing spleen
A dynamic filter expression dialect, library,
and toolset.
(...because finding available names on NPM is an exercise in futility)
73. Database Query Conversion Plugins
• Whitelist or blacklist queryable fields
• Require fields to be present in the filter
• Specify an identifier
• Parameterize (prevent SQL injection)
• Map fields in a JSON object to columns in a database table
I’m here to talk to you about a fairly common problem that we all, as Node.js engineers, have likely had to tackle at some point: how do we accept filter criteria in web API endpoints?
We’ll examine some common approaches to these challenges, and analyze their pros and cons.
While this sounds like a fairly mundane problem, there are some potential technical and security-related challenges involved.
And this will segue into a discussion on a tool I built that will hopefully help you tackle this problem. It’s a tool I call...
So, let’s talk briefly about what I mean by Web API filtering, just so we’re on the same page.
Say you have a REST API with a resource called “employees.” In REST, the endpoint shown here functions as a collection of employees. As you see here, we have a paged result of 10 employees from a total of 130,042.
Now let’s say we need to filter that result, to work with a particular subset.
Let’s say we want to get all of the people who directly report to General Leslie Groves. So, we need to filter on managerId=1.
A typical approach to solve this use case would be to add support for a query string parameter that allows us to filter on managerId.
Okay, so let’s walk through a couple of approaches.
We’ve already seen one approach, and I would conjecture it is the most common. That is to simply add support for filtering on various data points via query string parameters.
We can just continue adding support quite easily this way.
Now let’s expand upon this a bit, and say we want to perform a comparison that is not an “equals” operation. Query strings don’t support different operators. So, we’ll have to come up with something ourselves.
One way of solving this is to require that all filter parameters specify a comparison operator. As seen here, we’re prefixing our filter value with “neq,” and then delimiting the operator and value with a colon.
Internally, we’d have to write some code to parse out the operator from the filter value, and use this information to construct our database queries in the persistence layer of our application.
And we could easily use this pattern to support a variety of operators.
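The operator parsing described above might be sketched as follows. This is only an illustration: the function name `parseFilterParam` and the set of accepted operators are assumptions made for the example, not code from any real API.

```javascript
// Hypothetical helper (not from any library): splits a raw query string
// value such as "neq:Physicist" into an operator and a value.
const OPERATORS = new Set(['eq', 'neq', 'gt', 'gte', 'lt', 'lte', 'like']);

function parseFilterParam(raw) {
  const index = raw.indexOf(':');
  if (index === -1) {
    // Every filter parameter is required to specify an operator.
    throw new Error(`Missing operator in filter value: ${raw}`);
  }
  const operator = raw.slice(0, index);
  if (!OPERATORS.has(operator)) {
    throw new Error(`Unsupported operator: ${operator}`);
  }
  // Only the first colon is a delimiter; the value may contain colons.
  return { operator, value: raw.slice(index + 1) };
}
```

Calling `parseFilterParam('neq:Physicist')` would give `{ operator: 'neq', value: 'Physicist' }`, which the persistence layer could then translate into a query.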
Say the complexity of our requirements is expanding, and we need to support disjunctive Boolean operators as well as conjunctive ones. In other words, a mix of AND and OR operators.
This is where our approach up to this point begins to fall over. Eventually, our code has to reassemble these clauses into something usable by a database.
And we can’t guarantee order. The examples here should work.
But since we cannot guarantee order, we will inevitably run into a situation where reassembling clauses results in a statement that is logically different from what was intended.
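To make the ordering problem concrete, here is a small sketch that evaluates two orderings of the same three clauses against one record. The sample employee and the function name `compareOrderings` are made up for illustration.

```javascript
// Illustrative only: the same three clauses, reassembled in two orders.
function compareOrderings(employee) {
  const managedBy2 = employee.managerId === 2;
  const salaryAtLeast30k = employee.salary >= 30000;
  const nameStartsWithE = employee.name.startsWith('E');

  // JavaScript, like SQL, evaluates AND (&&) before OR (||).
  // managerId == 2 AND salary >= 30000 OR name like "E*"
  const intended = managedBy2 && salaryAtLeast30k || nameStartsWithE;
  // managerId == 2 OR name like "E*" AND salary >= 30000
  const reordered = managedBy2 || nameStartsWithE && salaryAtLeast30k;

  return { intended, reordered };
}

const result = compareOrderings({ managerId: 3, salary: 20000, name: 'Enrico' });
console.log(result); // the two orderings disagree for this employee
```

For this employee the intended expression matches (the name clause alone satisfies the OR), while the reordered one does not, because the name clause is now bound to the salary clause by AND.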
One method I’ve seen developers try is to simply take something that looks like a SQL WHERE clause, or even a MongoDB find statement, in a “filter” query parameter, and just pass that on through to the persistence layer of their application.
PLEASE PLEASE PLEASE DO NOT DO THIS!
It leaks the underlying database technology you’re using. So, now you’ve coupled API clients to your database technology.
And, obviously, it’s extremely difficult to secure.
What seems to be en vogue these days is to utilize an off-the-shelf architecture like GraphQL, Falcor, or, if you’re feeling especially masochistic, OData.
Personally, I’ve really enjoyed working with GraphQL and Falcor, and I encourage you to explore these concepts.
That said, there are some things to consider before you jump on the GraphQL bandwagon...
These are, on their own, API design concepts. They include tools for:
Defining your model
Allowing clients to create views in an ad hoc manner
Batch mutations
Etc
If you have an existing system that you’re maintaining and expanding upon, then introducing something like Falcor or GraphQL would probably require a paradigm shift in your architecture.
And that’s because these concepts are opinionated. And those opinions can have deeper ramifications on the underlying system design and technology choices.
And depending on your technology choices, these things can be fairly non-trivial to implement.
Just to be clear, my intent is not to discourage you from using these technologies. These are merely points of consideration. If you find Falcor or GraphQL or, even, OData solves your problems then awesome.
For those of us for whom these off-the-shelf tools are not an option, we continue our journey.
So, to solve this problem, we need to develop a bit more sophisticated structure with which to serialize our filters. One way to do this is to represent our filters as JSON.
In this example, we’re creating an array of objects that represent a clause in the filter. All clauses can then specify a conjunction operator.
This gives us a structure that allows us to guarantee order such that we can assemble a database query that logically matches the intention of our API user.
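A JSON structure along these lines might look like the sketch below. The property names (`conjunction`, `field`, `operator`, `value`) and the helper `describeFilter` are assumptions for illustration; the point is only that an array preserves clause order.

```javascript
// Hypothetical request body: an ordered array of clauses. Every clause
// after the first names the conjunction that joins it to what came before.
const filter = [
  { field: 'managerId', operator: 'eq', value: 2 },
  { conjunction: 'and', field: 'salary', operator: 'gte', value: 30000 },
  { conjunction: 'or', field: 'name', operator: 'like', value: 'E*' },
];

const CONJUNCTIONS = new Set(['and', 'or']);

// Reassembles the clauses in exactly the order the client sent them.
function describeFilter(clauses) {
  return clauses
    .map((clause, index) => {
      const text = `${clause.field} ${clause.operator} ${JSON.stringify(clause.value)}`;
      if (index === 0) return text;
      if (!CONJUNCTIONS.has(clause.conjunction)) {
        throw new Error(`Clause ${index} needs a conjunction of "and" or "or"`);
      }
      return `${clause.conjunction.toUpperCase()} ${text}`;
    })
    .join(' ');
}

console.log(describeFilter(filter));
// managerId eq 2 AND salary gte 30000 OR name like "E*"
```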
It is also worth noting that at this point our code is probably becoming complex enough to warrant breaking this logic off into a different code path. Here, we are creating a sub-resource of “employees” called “searches.” So, the REST semantic would be to POST to this resource.
And we can begin to expand on this structure, and do things like logical grouping.
This is starting to get complicated.
We’ve covered a number of different options, and they all require varying levels of effort to implement. We’ve talked about a few issues that may come up, so let’s review them, and expand a bit on our list.
Your solution, obviously, has to be robust enough to suit the functional requirements of your system.
What kind of comparisons do you need in your filter?
Do you need support for conjunctive and disjunctive Boolean logic, or a mix of the two?
Do you need to be able to logically group clauses together?
You don’t want to leak the technologies, such as the database you’re using, to the client.
This is something that can be said about virtually any system you design, but consistency is a good thing. It makes it easier for users to learn your system, and to infer how something works.
For example, if you’re going to implement things like sub-resources for “searches,” then do so across the board. You do not want to leave your users guessing whether or not they should be POST’ing searches, or GET’ing from a collection with a bunch of query parameters.
What is the impact of your solution on the underlying architecture?
If, for example, your system is based on event sourcing with CQRS, and is composed of dozens of microservices pulling from disparate databases using a multitude of technologies, then GraphQL may not be a practical solution.
Any solution you implement will require input sanitizing. In the event you have a complex dialect or JSON graph, this can become non-trivial.
This is an obvious one, but, amazingly, is still a problem for a lot of companies.
Personally, I like the idea of having a library that handles filtering like this for me, as it reduces the chance of developer mistakes resulting in security holes.
This one is less obvious, and is even a potential issue with GraphQL, Falcor, and OData.
Let’s say a client application supplies your API with a filter that is doing something computationally expensive, such as a LIKE comparison on a field on a table with a million rows. Then let’s say that field is not indexed. All of a sudden, you’re receiving several hundred of these queries per second, your database’s CPU spikes, and everything grinds to a halt.
You have some options to fix this. You could...
a. Index that field.
b. Not allow non-indexed fields to be queried.
c. Require certain indexed fields to appear in any filter, to minimize the resources that filtering on non-indexed fields consumes.
Option “c” may only get you so far. Some database engines rely on the order of clauses in a WHERE statement to understand what indexes to use and when. So, if you have that expensive LIKE comparison on a non-indexed field appearing before the simple equality comparison on an indexed field, then you haven’t solved the problem.
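Restricting which fields may be queried, and requiring certain fields to be present, could be checked with something like the sketch below. The function name `validateFilterFields`, its option names, and the clause shape are all hypothetical.

```javascript
// Hypothetical validator: rejects clauses on fields that are not queryable
// and reports required (for example, indexed) fields that are missing.
function validateFilterFields(clauses, { queryable, required = [] }) {
  const used = new Set(clauses.map((clause) => clause.field));
  const errors = [];

  for (const field of used) {
    if (!queryable.includes(field)) {
      errors.push(`Field "${field}" cannot be queried`);
    }
  }
  for (const field of required) {
    if (!used.has(field)) {
      errors.push(`Field "${field}" must appear in the filter`);
    }
  }
  return errors;
}
```

An empty array means the filter passed both checks. Note that this only checks presence; it does not address clause order or whether the required field is joined with AND or OR, which is the limitation described above.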
As you can see, depending on your needs, complexity can start to explode.
For example, if you’re reordering clauses in a user-provided filter statement based on a priority, this can become quite complicated when you also have to support conjunctive and disjunctive logical operators.
That’s a lot of complicated code to write. There are a lot of edge cases, and that means lots of unit tests.
So, where does that leave us? We’ve discussed some options, but we may be stuck having to write and maintain a great deal of highly-complicated code.
And that was the motivation for writing...
Perhaps first and foremost, Spleen is a dialect for creating filter expressions.
And...
Big JSON graphs are neither human readable nor terse.
A field may be an object with its own set of fields, or it may be an array. We want to be sure that the way we reference fields is flexible.
AND and OR
The AND operator is typically evaluated before OR, so if we need to evaluate OR before AND, then we can group statements together.
Little to no escaping is required.
Uses JSON pointers. Here we’re referencing the first element in an array on the field “bar,” which is nested in an object that is the value of the field “foo.”
JSON has become the preferred data serialization format for the web. So, the use of JSON pointers not only gives us flexibility, it provides another layer of abstraction in our filter expressions.
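For readers unfamiliar with JSON pointers, the lookup can be sketched in a few lines. This is a minimal resolver following RFC 6901's escaping rules, written for illustration; `resolvePointer` is a made-up name and is not how spleen itself is implemented.

```javascript
// Minimal JSON pointer (RFC 6901) lookup, for illustration only.
function resolvePointer(document, pointer) {
  if (pointer === '') return document; // the empty pointer is the whole document
  if (!pointer.startsWith('/')) {
    throw new Error(`Invalid JSON pointer: ${pointer}`);
  }
  return pointer
    .slice(1)
    .split('/')
    // RFC 6901 escapes: "~1" means "/", "~0" means "~" (decoded in that order)
    .map((token) => token.replace(/~1/g, '/').replace(/~0/g, '~'))
    .reduce(
      (value, token) =>
        value != null && Object.prototype.hasOwnProperty.call(value, token)
          ? value[token]
          : undefined,
      document
    );
}

const doc = { foo: { bar: ['a', 'b', 'c'] } };
console.log(resolvePointer(doc, '/foo/bar/0')); // 'a'
```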
Supports the common operators, and some of the more robust operators like range comparisons, array searching, and pattern matching.
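The semantics of the operators listed on the slide can be sketched as a table of plain functions. This is not spleen's implementation. In particular, the `like` wildcard rules here ("*" matches any run of characters, "_" matches exactly one) are an assumption based on the example patterns shown in the talk.

```javascript
// Assumed wildcard rules: "*" = zero or more characters, "_" = exactly one.
function likeToRegExp(pattern) {
  // Escape regex metacharacters other than "*", which is handled below.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*').replace(/_/g, '.')}$`, 's');
}

// One function per operator; `between` and `nbetween` take an [x, y] pair.
const compare = {
  eq: (value, x) => value === x,
  neq: (value, x) => value !== x,
  gt: (value, x) => value > x,
  gte: (value, x) => value >= x,
  lt: (value, x) => value < x,
  lte: (value, x) => value <= x,
  between: (value, [x, y]) => value >= x && value <= y,
  nbetween: (value, [x, y]) => value < x || value > y,
  in: (value, list) => list.includes(value),
  nin: (value, list) => !list.includes(value),
  like: (value, pattern) => typeof value === 'string' && likeToRegExp(pattern).test(value),
  nlike: (value, pattern) => typeof value === 'string' && !likeToRegExp(pattern).test(value),
};
```

For example, `compare.between(250, [0, 500])` is true, and `compare.like('Enrico', 'E*')` is true under the assumed wildcard rules.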
Pretty straightforward.
The project provides a library for working with spleen filter expressions.
Un-opinionated.
Method for parsing a spleen expression into an instance of spleen’s Filter class.
Or build Filter instances directly with no parsing.
Intended to be the transport between the various layers in your application.
Match method.
Provides a method to reorganize clauses in an expression based on a given ordered list of fields. This method is pretty intelligent, and will preserve the logical structure of the filter expression.
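The idea of reordering without changing meaning can be sketched in a simplified form. The code below is not spleen's algorithm: it assumes a hypothetical representation in which a filter is an array of groups, clauses inside a group are joined by AND, and the groups are joined by OR. Because AND is commutative, sorting inside a group cannot change the result.

```javascript
// Simplified sketch: `groups` is [[c1, c2], [c3]], read as (c1 AND c2) OR (c3).
// Clauses are reordered only within their own AND group.
function prioritize(groups, priorities) {
  const rank = (field) => {
    const index = priorities.indexOf(field);
    return index === -1 ? priorities.length : index; // unlisted fields go last
  };
  return groups.map((group) =>
    group
      .map((clause, position) => ({ clause, position }))
      // Fall back to original position so equal-priority clauses keep their order.
      .sort((a, b) => rank(a.clause.field) - rank(b.clause.field) || a.position - b.position)
      .map((entry) => entry.clause)
  );
}
```

Given priorities such as `['managerId']`, an indexed `managerId` clause would move ahead of an expensive `name` comparison in the same group, while clauses in other OR groups stay where they are.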
Let’s dive into some example code. Here we’re taking a spleen expression as a string, and parsing it into an instance of the Filter class.
We can now take advantage of the Filter class’ features. Here we’re using the filter to match against an object.
We could also pass the Filter instance to different layers in our app for conversion into something else. More on that in a bit.
This is preferable over parsing in many use cases. It’s more performant, and provides a method for application code to easily and dynamically construct Filter instances.
Spleen is also a set of tools.
And that means plugins.
We have our filter instance, so what can we do with it? We’ve already seen we can use it to programmatically match JSON objects.
And we know this is an abstraction that can neatly be passed between layers.
The typical use case is to pass this down into your persistence layer, and convert it into something the database understands.
The spleen ecosystem currently only fully supports N1QL (Couchbase queries), but a number of other database plugins are in the works. First up is PostgreSQL, which will be published towards the end of next week. MySQL and MongoDB will immediately follow.
Also in the works is support for the Joi validation library. The idea here is to validate that filter expressions match the intended resource’s schema. For example, if someone provides a clause referencing “foo,” and “foo” is a string, but the user provided a Boolean, you can detect that and respond to the client with a 400.
Some notes on the functionality you’ll find with all database plugins.
For example, if different fields are coming from different tables via a JOIN, you can specify which identifier to use for what field in the resulting SQL.
Some very lightweight, non-obtrusive ORM functionality.
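The combination of field mapping, identifiers, and parameterization might look roughly like the sketch below. It does not use any real spleen plugin API: the function `toWhereClause`, the clause shape, and the PostgreSQL-style `$1` placeholders are all assumptions chosen for illustration.

```javascript
// Hypothetical conversion of ordered clauses into a parameterized WHERE clause.
const SQL_OPERATORS = { eq: '=', neq: '<>', gt: '>', gte: '>=', lt: '<', lte: '<=' };
const SQL_CONJUNCTIONS = { and: 'AND', or: 'OR' };

// Only known keys are accepted, so nothing user-supplied reaches the SQL text.
function lookup(table, key, label) {
  if (!Object.prototype.hasOwnProperty.call(table, key)) {
    throw new Error(`Unsupported ${label}: ${key}`);
  }
  return table[key];
}

// `columns` maps JSON pointers to qualified column names; it doubles as a whitelist.
function toWhereClause(clauses, columns) {
  const values = [];
  const parts = clauses.map((clause, index) => {
    const column = lookup(columns, clause.field, 'field');
    const operator = lookup(SQL_OPERATORS, clause.operator, 'operator');
    values.push(clause.value); // the value travels as a parameter, never as SQL text
    const comparison = `${column} ${operator} $${values.length}`;
    if (index === 0) return comparison;
    return `${lookup(SQL_CONJUNCTIONS, clause.conjunction, 'conjunction')} ${comparison}`;
  });
  return { text: parts.join(' '), values };
}
```

With a mapping like `{ '/managerId': 'e.manager_id', '/salary': 'e.salary' }`, the `e.` prefix plays the role of the identifier for a joined table, and any field missing from the mapping is rejected before a query is built.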
Very robust, with support for conjunctive and disjunctive logical operators, a wide variety of comparison operators, complex data structures, and so on.
Very easy to implement. Less code you have to write, debug, and maintain.
Prevents SQL injection attacks, and DoS’ing via poorly composed filter expressions.
This is an active and open source project. If you’d like to contribute, please reach out to me. There is a lot of work to be done, and I’m always looking for volunteers to help expand functionality, and port spleen to other languages.