
WunderGraph Query Compiler: Access multiple APIs through a single unified Interface efficiently

Published: 2022-05-23

The WunderGraph Query Compiler makes it possible to interact with multiple APIs and Services as if they were a single API. We'll explain how it works and why it's so fast.

High Level Design

The design of the Query Compiler is similar to the execution engine of a SQL database. The biggest difference, though, is that instead of using SQL to interact with tables, we're using GraphQL to talk to services.

While this description is easy to understand, the reality is a lot more complex. GraphQL is only used as the meta language, there's not much left of GraphQL during the execution.

We're leveraging two essential parts of GraphQL. We use the GraphQL schema to describe the API, and GraphQL Operations to define our Actions. Once all Actions are defined, we remove GraphQL entirely. What's left is a set of functions that, when executed, generate JSON that happens to look like a GraphQL response. Because we're stripping out GraphQL at runtime, we call the Schema "virtual": it doesn't really exist, it's just a virtual abstraction on top of a specific set of APIs, services, databases etc.

At this point, you might start to have some questions, like...

  • What's behind this design?
  • Is the complexity necessary?
  • What are alternatives?
  • Why GraphQL as the intermediate API Layer?
  • What exactly is meant by "Virtual Schema"?
  • How does the execution compare to traditional GraphQL Servers?
  • What additional benefits do we get from this approach?

Alternatives

Let's start with the simplest one, Alternatives, which also serves as a good introduction to the motivations behind this design.

If you wanted to build an abstraction layer, e.g. a backend for frontend (BFF), you'd first choose an API style and then start implementing the API Endpoints, Controllers or Resolvers, depending on which style you would choose.

For simplicity, let's stick with GraphQL, but you could build your BFF using REST or gRPC. So, if you wanted to build an API abstraction on top of your existing infrastructure, you'd create a GraphQL server. This means that for each data source, and for each field, you'd have to build a resolver. You'd choose a programming language, and depending on the data sources, you'd add drivers or adapters to talk to different services.
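To make this concrete, a hand-written resolver layer for such a BFF might look like the following sketch. The service URLs, field names, and response shapes are made up for illustration; they're not part of any real API.

```javascript
// Minimal hand-written resolver map for a BFF. The service URLs and
// response shapes are hypothetical placeholders.
const USERS_SERVICE = "http://users.internal/api";
const ORDERS_SERVICE = "http://orders.internal/api";

const resolvers = {
  Query: {
    // Every field needs its own hand-written resolver that knows
    // which origin to call and how.
    user: async (_parent, { id }) => {
      const res = await fetch(`${USERS_SERVICE}/users/${id}`);
      return res.json();
    },
  },
  User: {
    // Nested fields may fan out to a different service entirely.
    orders: async (user) => {
      const res = await fetch(`${ORDERS_SERVICE}/orders?userId=${user.id}`);
      return res.json();
    },
  },
};
```

Every new data source means more resolvers like these, each of which you have to write, test, and keep in sync with the origin.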

If you're new to this world, you'll probably have to refactor along the way as you learn. It's going to be a lot of code, and this code needs to be maintained and tested. On the other hand, you're going to have full control over the API: as every resolver is hand-written, you're able to fully customize the API.

In terms of maintenance, whenever there's a change to the upstream services (the origins), you have to modify the BFF to reflect it. So this project is not going to be a one-off but rather a continuous workload.

If you don't just have one single team, you'll also need a solid process and ownership model to ensure that the BFF is of good quality and that working on it scales across teams. If it's a shared codebase, maintained by multiple teams, you could run into bottlenecks. You could also split the codebase and turn to a federated microservice architecture, but this adds overhead in terms of communication and deployment strategies.

On top of all that, you have to secure your BFF: protect it against denial-of-service attacks and handle authentication and authorization.

All this is doable, but it's a lot of work and requires experience to get it right.

Observations that led to the design of the Query Compiler

I've been a heavy user of the BFF pattern and have seen it in many companies. I've come to understand that it's important for engineering teams to have good abstractions on top of all the services available to them. Point-to-point integrations don't scale; at the same time, building BFFs is expensive.

I've looked at companies like Twitter and learned about the tools they were using internally. One presentation that stood out was about Strato, Twitter's "virtual" database. Unfortunately, Strato is a proprietary system and probably tightly coupled to Twitter.

Looking at Strato, I realized that every company should have a virtual Database, an abstraction on top of all their API infrastructure, allowing for unified access across all (Micro-)Services, APIs and Databases.

At that time, I already knew about GraphQL, but it took me some more time to put the puzzle together.

GraphQL is to services what SQL is to tables.

There are other, probably more powerful, Query Languages than GraphQL. However, GraphQL has some critical features that make it the perfect fit for this problem.

First, GraphQL enforces a Schema. Other API styles make typing optional; not so with GraphQL. If you write a Query, you know exactly what data comes back, field by field.

Second, GraphQL returns JSON. There are good reasons to use other formats, mostly binary ones, to improve performance and reduce payload size. At the same time, all these other formats like Avro, Protobuf, Flatbuffers, etc. make the Developer Experience more complicated.

Third, GraphQL has a fast growing community. A lot of Developers are already familiar with the Query Language, most of the rest are interested in learning it.

GraphQL comes with Queries, Mutations and Subscriptions, so it can easily cover most of the common use cases.

However, I've written extensively about the security vulnerabilities that are inherent to GraphQL. While the Language is great, running it in production comes with some challenges.

It took me a while to realize the missing piece of the puzzle: Persisted Operations, also known as Persisted Queries (though "Queries" is misleading in this context, as they can also be mutations or subscriptions).

Persisted Queries are GraphQL Operations, stored (persisted) on a GraphQL server, that can be executed using variables but not altered at runtime. They have the potential to improve performance and security, as clients are not able to define arbitrary Operations.
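The idea can be sketched as a fixed, server-side map from operation name to a stored GraphQL document: clients send a name plus variables, never a raw document. The operation and names below are illustrative, not a real registry format.

```javascript
// Sketch of a persisted-operations registry. The documents are stored
// at deploy time and are immutable at runtime.
const persistedOperations = {
  UserByID: "query UserByID($id: ID!) { userByID(id: $id) { id name } }",
};

function resolvePersisted(operationName, variables) {
  const document = persistedOperations[operationName];
  if (!document) {
    // Arbitrary, non-persisted operations are rejected outright.
    throw new Error(`unknown operation: ${operationName}`);
  }
  return { document, variables };
}
```

Because the set of documents is closed, an attacker can't craft deeply nested or otherwise abusive queries at runtime.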

The final observation that makes the WunderGraph architecture possible is the ability to somehow describe every service. REST APIs can be described using the OpenAPI Specification (OAS), GraphQL APIs already have a typed schema, SQL databases have meta-data that can be queried to understand the structure and data types of the tables, and streaming services like Kafka or PubSub brokers can be described using AsyncAPI.

All this means we're able to describe the capabilities of all services using a machine-readable format.
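As an illustration, a user-by-ID endpoint of a REST API could be described in an OpenAPI document roughly like this. This is a trimmed, illustrative fragment, not a complete or real specification:

```json
{
  "paths": {
    "/users/{id}": {
      "get": {
        "parameters": [
          { "name": "id", "in": "path", "required": true, "schema": { "type": "string" } }
        ],
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "id": { "type": "string" },
                    "name": { "type": "string" }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

A machine can read this and learn the path, the required parameters, and the exact shape of the response, which is all that's needed to generate a schema and a resolver configuration.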

Porting the design of Twitter's Strato to make it general purpose

I've taken the time to introduce all the ingredients for our general-purpose "virtual Database", so let's quickly recap what we have before explaining the architecture.

  • Schema: GraphQL SDL
  • Introspection of DataSources and description in machine-readable format
  • Defining Actions: GraphQL Operations

With these ingredients, we're able to describe how the "virtual Graph" is being created.

  1. Introspect all DataSources
  2. Generate GraphQL SDL for the introspected DataSources
  3. Describe in machine-readable language, how the fields can be resolved

This might sound a bit abstract, so here's a specific example. Let's say we've got a single REST API that returns a user by ID. This is what the GraphQL SDL could look like, followed by the machine-readable description of how to resolve the field.

type Query {
  userByID(id: ID): User
}
type User {
  id: ID!
  name: String!
}
{
  "typeName": "Query",
  "fieldName": "userByID",
  "resolver": {
    "type": "REST",
    "config": {
      "URL": "example.com/users/:id",
      "arguments": {
        "id": {
          "source": "PATH",
          "sourceName": "id"
        }
      }
    }
  }
}

This description is nowhere near the actual one, but for illustration purposes it should be enough. In reality, there are many more problems and edge cases to deal with; you also have to handle authentication and other aspects.

By the way, you don't have to create this definition manually, although it's possible. WunderGraph provides an SDK to introspect services and generate the configuration.

Now that we have the virtual Graph, let's have a look at the Operations. To keep things consistent, we'll now create an Operation for the Schema above to query a user.

# User.graphql
{
  userByID(id: "1") {
    id
    name
  }
}

We've created a file User.graphql to get the user with the ID 1 and retrieve their id and name.

If we now take the SDL, the DataSource description and the Operation, we're able to generate a function that, in the end, has nothing to do with GraphQL. The function would look similar to this one:

const User = async ({id}) => {
  const res = await fetch(`http://example.com/users/${id}`);
  const data = await res.json();
  return {
    data: {
      userByID: {
        id: data.id,
        name: data.name,
      },
    },
  };
};

Again, it's a simplification, but that's fine for demonstration purposes.

How does the execution compare to traditional GraphQL Servers?

Now that we understand how the virtual Graph, Query Planning, and the Execution work, we can compare this with how a traditional GraphQL Server executes a GraphQL Query.

Let's have a look at what happens when a GraphQL Query hits a traditional GraphQL Server:

  1. Parse JSON Body
  2. Extract GraphQL Operation
  3. Lexing / Tokenizing of Operation
  4. Parsing of Operation
  5. Normalization (cleaning the AST, deduplication of fields)
  6. Validation
  7. Interpretation of GraphQL AST
    1. Recursively visiting all AST Nodes
    2. For each Node, check if a resolver exists
    3. Execute Resolver if any
    4. Resolvers themselves might access the AST to retrieve meta-data
    5. Collection of all resolved fields
    6. Assembly of the Response
  8. Returning the Response
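The interpretation step (7) can be sketched as a recursive walk over the selection set, checking for a resolver at every node. This is a toy model to illustrate the per-field overhead, not a real GraphQL executor:

```javascript
// Toy model of step 7: recursively visit each field, look up a
// resolver for it, and assemble the response object field by field.
async function interpret(selectionSet, fieldResolvers, parent) {
  const result = {};
  for (const field of selectionSet.fields) {
    const resolve = fieldResolvers[field.name];
    // Execute the resolver if one exists; otherwise read the parent object.
    const value = resolve ? await resolve(parent) : parent?.[field.name];
    result[field.name] = field.selectionSet
      ? await interpret(field.selectionSet, fieldResolvers, value)
      : value;
  }
  return result;
}
```

All of this bookkeeping happens on every single request, for every single field.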

Next, let's have a look at how WunderGraph executes a Request.

  1. Parse Variables
  2. Call user function with variables and return Response
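These two steps amount to little more than a map lookup plus a function call. A minimal sketch, with an illustrative inlined operation standing in for the real compiled function:

```javascript
// Compiled at deployment time: one plain function per Operation.
// The function body here is an illustrative stand-in.
const operations = {
  User: async ({ id }) => ({ data: { userByID: { id, name: "Jens" } } }),
};

// The entire runtime hot path: parse variables, call the function.
async function handleRequest(operationName, rawVariables) {
  const variables = JSON.parse(rawVariables); // 1. Parse Variables
  return operations[operationName](variables); // 2. Call user function
}
```

No lexing, parsing, normalization, validation, or AST walking happens at request time; all of that was done before the server ever started.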

The Architecture of WunderGraph is designed to move all heavy computational work out of the hot path. Everything that can be done at compile time will be done at compile time. What's left is calling the origin services and assembling the response.

We've benchmarked this approach against the Apollo Gateway. We knew it was going to be fast, but we didn't expect an almost 300x improvement in throughput on generic benchmarks. These benchmarks were run on localhost with minimal latency, so they probably don't reflect real-world conditions.

That said, we're confident that Performance should never be an issue with WunderGraph.

If you thought that was it, here's one more thing to add!

As described before, the WunderGraph Engine works in multiple steps. It analyzes the Operation at deployment time and executes it at runtime.

We've designed the Engine in such a way that it's possible to add additional "optimization" steps. This way, an Execution Plan can be optimized and rewritten.

One such use-case is our implementation of the Apollo Federation Gateway Specification. Thanks to our architecture, we're able to "rewrite" the Execution Plan to solve the N+1 problem.

Through static analysis, the Query Planner can detect when the N+1 problem would occur and rewrite the Execution Plan in such a way that, at runtime, a DataLoader is used to batch all the Nth requests into a single call.

As the implementation is generic and general purpose, this pattern can be applied to any DataSource that supports batching.
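The rewritten plan behaves roughly like the following DataLoader sketch: load calls made within the same tick are collected and resolved by one batched origin call. The batch function is a stand-in for whatever batching the origin supports:

```javascript
// Minimal DataLoader: load() calls made in the same tick are queued
// and resolved by a single batched call to the origin.
class DataLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];
  }
  load(key) {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      // Schedule exactly one flush per batch window.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.flush());
      }
    });
  }
  async flush() {
    const batch = this.queue;
    this.queue = [];
    // All queued keys go out as one request instead of N.
    const results = await this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(results[i]));
  }
}
```

With this in place, resolving N child entities costs one origin round trip instead of N.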

What additional benefits do we get from this approach?

Before concluding, let's talk about additional benefits from doing "GraphQL" like this.

Just calling predefined functions instead of doing all the steps outlined above is not just extremely fast; it also makes the execution a lot more secure. Most client applications get deployed with a build step. Once this build step is done, the Operations of the application will not change anymore; only the variables will.

If Operations don't change at runtime, why deal with the complexity of "securing" a GraphQL API when all you need is a set of pre-defined functions?

Additionally, having pre-defined functions makes analytics and logging a lot easier.

Another benefit is that, through the "intermediate function", we gain another layer of abstraction, decoupling client and server. This abstraction can be used to implement versionless APIs, making it easier to keep APIs backwards compatible.

Finally, pre-defining functions has another great benefit. As described above, GraphQL Responses are in JSON format. The variables of a GraphQL Operation can also be encoded as JSON. What's great about this is that we're able to generate a JSON-Schema for both the inputs and the response of an Operation. This way, we're able to use JSON-Schema Validation to validate the inputs of an Operation.
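For instance, the variables of a user-by-ID operation could be checked against a generated JSON-Schema before the function ever runs. The tiny validator below is a hand-rolled stand-in for a real JSON-Schema library and only handles required fields and primitive types:

```javascript
// Generated JSON-Schema for the operation's variables (illustrative).
const variablesSchema = {
  type: "object",
  required: ["id"],
  properties: { id: { type: "string" } },
};

// Tiny stand-in for a real JSON-Schema validator: checks required
// fields and primitive types only, returning a list of errors.
function validate(schema, input) {
  const errors = [];
  for (const key of schema.required ?? []) {
    if (!(key in input)) errors.push(`missing required field: ${key}`);
  }
  for (const [key, prop] of Object.entries(schema.properties ?? {})) {
    if (key in input && typeof input[key] !== prop.type) {
      errors.push(`field ${key} must be of type ${prop.type}`);
    }
  }
  return errors;
}
```

Invalid inputs are rejected at the edge, before any origin service is called.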

Conclusion

WunderGraph introspects all your APIs, Services and Databases and merges all descriptions into a unified virtual Graph with a GraphQL SDL. Next, you'll write a bunch of GraphQL Operations to define the Actions you'd like to run against the Graph.

These Operations define the actual API surface. The Schema is not the API, the Operations are, hence the "virtual Graph".

Instead of writing another Backend for Frontend, all you have to do is introspect the services you need and define your Operations.

Finally, as we can describe the inputs and outputs of all Operations using JSON-Schema, we're able to generate type safe clients from these descriptions. It's like Firebase, but type safe, and using your own services.

Sounds like something you'd like to try out yourself? Have a look at the Quickstart Guide.

