The case against normalized caching in GraphQL
Jens Neuse, CEO & Founder of WunderGraph
In this post we'll compare rich GraphQL clients that ship with a normalized cache implementation to the generated WunderGraph clients that rely on HTTP caching.
As you might have already found out, WunderGraph uses persisted queries by default.
With the WunderGraph code generator (https://www.npmjs.com/package/@wundergraph/sdk) you can generate a client that knows exactly how to invoke your previously registered operations.
With the @cache directive you can specify that the response of an operation should be cached by both the server and the client.
Cache-Control headers will be set accordingly, including ETags. This mechanism is fully compatible with all major browsers and CDNs that implement caching according to the HTTP caching RFC.
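For illustration, here's a minimal Node.js sketch of the kind of headers such a response could carry; the concrete values (max-age, ETag) are assumptions, not what WunderGraph actually emits:

```typescript
// Minimal sketch of an HTTP-cacheable endpoint. The header values are
// illustrative assumptions, not WunderGraph's actual output.
import { createServer } from "http";

createServer((req, res) => {
  if (req.headers["if-none-match"] === '"v1"') {
    res.writeHead(304); // Not Modified: the client can reuse its cached copy
    return res.end();
  }
  res.setHeader("Cache-Control", "public, max-age=5");
  res.setHeader("ETag", '"v1"');
  res.end(JSON.stringify({ data: { friends: [] } }));
}).listen(3000);
```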
To illustrate this a bit better I'd like to introduce two example queries. The first one fetches a list of friends. The second one fetches some details about those friends.
Let's consider we want to show a list of friends:
```graphql
query Friends {
  friends {
    id
    name
    age
    avatarURL
  }
}
```
For each friend we'd like to be able to click on the friend in the list and open up a detail page:
```graphql
query FriendByID {
  friend(id: 123) {
    id
    name
    age
    avatarURL
  }
}
```
You'll notice that we already have all the data for the detail page, so in an ideal scenario the client won't have to make another request. This is possible thanks to cache normalization.
A smart normalized cache will identify the Friend entity and will recognize that the "FriendByID" query can be fulfilled using the data from the "Friends" query which we already ran.
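To make normalization concrete, here's a rough sketch of what such a store could look like after the Friends query ran, assuming Apollo-style "Typename:id" keys (the key scheme and sample values are illustrative):

```typescript
// Illustrative, Apollo-style normalized store after the Friends query ran.
// Key scheme and sample values are assumptions.
const normalizedStore = {
  "Friend:123": {
    __typename: "Friend",
    id: 123,
    name: "Alice",
    age: 30,
    avatarURL: "https://example.com/alice.png",
  },
  ROOT_QUERY: {
    // The list stores references instead of entity copies; a smart cache can
    // resolve friend(id: 123) to the same "Friend:123" reference via the id
    // argument, without a network request.
    friends: [{ __ref: "Friend:123" }],
  },
};
```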
What are the pros of this concept?
- Navigating to a friend detail page will be instant because there is no network request required
- The client will save bandwidth, and the user experience will feel smoother
- If we navigate back we can also immediately pull out the list of friends from the normalized cache
How can this situation become hairy? Let's add a third operation. While on a user detail page we'd like to unfriend one of our peers:
```graphql
mutation Unfriend {
  unfriend(id: 123) {
    id
  }
}
```
How does the unfriend mutation make the situation complex?
In your normalized cache you have to invalidate or update the "Friends" and "Friend" entities.
In your friends list you have to remove the user with id 123.
For the FriendByID query you have to make sure that no friend is returned for id 123 anymore.
How does your normalized cache draw the lines between the unfriend mutation and the friend and friends queries?
You as the frontend developer have to program the cache to do so.
After the mutation you must inform the cache about these changes.
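To make this concrete, here's a minimal sketch of such cache programming, assuming Apollo Client as the normalized-cache client (the operation shapes follow the examples above):

```typescript
// Sketch of the cache bookkeeping the Unfriend mutation requires,
// assuming Apollo Client's normalized cache.
import { gql, useMutation } from "@apollo/client";

const UNFRIEND = gql`
  mutation Unfriend($id: ID!) {
    unfriend(id: $id) {
      id
    }
  }
`;

function useUnfriend() {
  return useMutation(UNFRIEND, {
    update(cache, { data }) {
      if (!data?.unfriend) return;
      // 1. Remove the friend from every cached friends list.
      cache.modify({
        fields: {
          friends(existingRefs = [], { readField }) {
            return existingRefs.filter(
              (ref: any) => readField("id", ref) !== data.unfriend.id
            );
          },
        },
      });
      // 2. Evict the entity itself so the friend query no longer
      //    resolves from the cache.
      cache.evict({ id: cache.identify(data.unfriend) });
      cache.gc();
    },
  });
}
```

Every mutation that touches the Friend entity needs bookkeeping like this, and each handler encodes domain knowledge that now lives in the frontend.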
With that let's talk about the cons of a normalized cache:
- a rich GraphQL client with a normalized cache is complex to build and maintain
- the cache runs in the JavaScript VM of your browser and is therefore a lot less efficient than the browser's native cache
- the logic to keep the cache state correct can become quite hairy
- the frontend developer must understand the domain and program the cache correctly to avoid unwanted behaviour
- the frontend developer must implement custom rules for cache eviction
One thing that I want to explicitly mention outside of the list:
There's no single source of truth for the business object in this scenario.
The frontend developer might accidentally allow the UI to show a friend in the friends list even though the user has previously unfriended that person. Errors like these are very hard to spot. I think we're giving the frontend developer a lot of responsibility here to get caching right.
Should it really be a concern of a frontend developer if data is stale? Shouldn't a frontend developer focus on the UI and trust the data layer? Does it really have to be that complicated to build rich apps with good performance?
I believe there are applications where it's definitely worth taking on such complexity. On the other hand, I see many use cases, e.g. a news website, where relying on the HTTP caching RFC is a lot simpler and more efficient.
Enter WunderGraph caching
With WunderGraph every registered Query becomes an endpoint to which you can apply caching rules individually.
Let's revisit the example from above:
```graphql
query FriendByID @cache(maxAge: 5) {
  friend(id: 123) {
    id
    name
    age
    avatarURL
  }
}
```
This Query becomes the following endpoint on your WunderGraph Node:
/application-id/FriendByID?variables={"id":123}
In this scenario we decided to cache a friend object for 5 seconds using the @cache directive.
After 5 seconds the client will re-request the user and send an If-None-Match header with the request.
If the previous response is still valid, the server will respond with a 304 (Not Modified) HTTP status code.
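The browser performs this conditional request automatically once the Cache-Control and ETag headers are set; the following sketch just makes the mechanic explicit (the URL shape is taken from the example above, and the manual header handling is for illustration only):

```typescript
// Explicit version of what the browser does for us: remember the ETag and
// send it back as If-None-Match on the next request.
const url =
  "/application-id/FriendByID?variables=" +
  encodeURIComponent(JSON.stringify({ id: 123 }));

let etag: string | null = null;
let lastBody: unknown = null;

async function fetchFriend(): Promise<unknown> {
  const headers: Record<string, string> = {};
  if (etag) headers["If-None-Match"] = etag;

  const res = await fetch(url, { headers });
  if (res.status === 304) {
    // Not Modified: the cached body is still valid.
    return lastBody;
  }
  etag = res.headers.get("ETag");
  lastBody = await res.json();
  return lastBody;
}
```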
The same logic can be applied to the Friends query.
All you have to do is define the desired behaviour using directives on the Operations.
What are the pros of this approach?
- there's a single source of truth - the Operation Definition
- caching is handled automatically by the browser which is easier to use and understand
- no complex tooling is required to understand why a request is cached, browsers have excellent debuggers for this
- no javascript code has to be written to keep the cache state in sync
- with a service worker you can easily build offline apps using standard caching techniques (see the sketch after this list)
- less javascript code to be run by the browser
- the frontend developer gets to focus on the UI and has to worry less about data fetching and caching logic
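Because every operation is a plain HTTP endpoint, a service worker can cache it with the standard Cache API. Here's a minimal sketch, assuming the TypeScript "webworker" lib and the URL prefix from the endpoint example above; the cache name is made up:

```typescript
// Minimal offline-fallback service worker for operation endpoints.
declare const self: ServiceWorkerGlobalScope;

const CACHE = "wundergraph-operations"; // cache name is an assumption

self.addEventListener("fetch", (event) => {
  const { request } = event;
  // Only handle operation endpoints (URL prefix from the example above).
  if (!request.url.includes("/application-id/")) return;

  event.respondWith(
    (async () => {
      const cache = await caches.open(CACHE);
      try {
        const response = await fetch(request);
        // Online: store a copy so it can be served while offline.
        await cache.put(request, response.clone());
        return response;
      } catch {
        // Offline: fall back to the last cached response, if any.
        const cached = await cache.match(request);
        return cached ?? Response.error();
      }
    })()
  );
});
```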
What are the cons of HTTP caching for GraphQL queries?
- the client has to make more requests than with a normalized cache
- more requests lead to more bandwidth usage
- there's no easy way to invalidate the cache immediately
Does a normalized cache prevent us from making more requests?
Let's make this scenario a bit more realistic. On the friend's detail page we'd like to see the friend's bio too:
```graphql
query FriendByID {
  friend(id: 123) {
    id
    name
    age
    avatarURL
    bio
  }
}
```
With this one field added, even with normalized caching we have to refetch each individual friend, even though we already have most of the data. At the end of the day, a normalized cache might introduce a lot of complexity to your app while the benefits are smaller than you'd expect. In this last example, you could have transferred fewer fields for each user detail page, at the expense of a complex GraphQL client that understands which fields are missing for an entity.
Cache invalidation
As mentioned previously, a normalized cache can easily be invalidated. This comes at the cost of implementing and maintaining the cache code, plus defining the logic for when to invalidate which objects.
With HTTP caching it's not that easy. You could add a dynamic parameter, e.g. a timestamp, to the Query. This would allow for easy cache invalidation but also reduces possible cache hits.
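A hypothetical sketch of that approach; the extra v parameter and the version counter are made up for this example:

```typescript
// Hypothetical cache busting: bump "version" whenever the client knows the
// data changed (e.g. after a mutation). Each bump changes the URL, so every
// cache along the way is bypassed - which also means far fewer cache hits.
let version = 0;

function friendByIdUrl(id: number): string {
  const variables = encodeURIComponent(JSON.stringify({ id }));
  return `/application-id/FriendByID?variables=${variables}&v=${version}`;
}

// After the Unfriend mutation succeeds:
version++; // all previously cached FriendByID responses are now bypassed
```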
Users can have multiple clients
Is it possible for your users to open your application in multiple tabs, native applications, etc.? If that's the case, what happens if you unfriend someone in one tab while another tab is open? At that point your normalized cache has no way of figuring out whether its data is stale; it has to make a network call when you switch tabs.
Should you cache at all?
Are we actually solving the problem at the right layer or creating a new, even more complex, problem?
If data like in this example can change at any time with any click (add friend/unfriend), should we really cache it at the client or transport level at all?
Why not use an application cache, e.g. Redis or Memcached, in the backend if hitting the database directly is a performance bottleneck? In this scenario, neither transport-level caching nor a normalized client cache is the proper solution.
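For illustration, here's a minimal sketch of such an application-level cache, assuming ioredis; the key scheme, the TTL and getFriendFromDb are made up for this example:

```typescript
// Hypothetical application-level cache in front of the database.
import Redis from "ioredis";

type Friend = { id: number; name: string; age: number; avatarURL: string };

const redis = new Redis();

// Hypothetical stand-in for the real database accessor.
declare function getFriendFromDb(id: number): Promise<Friend>;

async function getFriend(id: number): Promise<Friend> {
  const key = `friend:${id}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached) as Friend;

  const friend = await getFriendFromDb(id);
  // Short TTL: user-driven changes become visible again after 5 seconds,
  // and a mutation could also delete the key directly (redis.del(key)).
  await redis.set(key, JSON.stringify(friend), "EX", 5);
  return friend;
}
```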
When to cache#
Caching makes sense for publicly available data that doesn't change frequently.
E.g. on a news website it's totally fine to cache the content of each article for a few seconds (e.g. 5). This would reduce the number of requests from thousands to one per resource per 5 seconds.
In case data can change at high frequencies, especially after user interactions with the application, caching should happen at the application layer.
Summary
When you think you have to use a normalized cache in the client you should consider an application level cache first.
A normalized cache introduces a second source of truth in the frontend which needs to be maintained. Optimistic updates can get that last bit of performance out of an app, taking the user experience from 95% to 98% at the cost of extra complexity.
Most of the time you don't need this complexity and should avoid it. Keep it simple, solve a business problem, don't introduce technical debt.
WunderGraph gives you a simple and powerful way to use transport-based caching. For 99% of the other use cases, you should consider adding an application-level cache if performance is an issue.
What to read next
This is a curated list of articles that I think you'll find interesting.
- In the WunderHub Announcement, I talk about how WunderHub will change the way we share and collaborate on APIs. It allows you to share APIs like npm packages.
- How automating API integrations benefits your business is dedicated to C-level executives who want to learn more about the business benefits of automating API integrations.
- Another interesting topic is how to JOIN APIs without Schema Stitching or Federation, just by using a single GraphQL Operation.
- For those interested in the most common GraphQL security vulnerabilities, I suggest reading about them and how WunderGraph helps you to avoid them.
- A classic but still relevant post is I believe that GraphQL is not meant to be exposed over the Internet. It's a controversial topic and many misunderstand it. But think about it: why is HTTP not mentioned a single time in the GraphQL specification?
- One very common problem of using GraphQL is the Double Declaration Problem, the problem of declaring your types over and over again. This post explains that it's even more complicated than just double declaration and how we can solve it.
- The Fusion of GraphQL REST and HTTP/2 is a very long post, probably too long for a blog post. But if you're interested in a deep dive on the motivations behind creating WunderGraph, this is the post for you.
About the Author
Jens Neuse, CEO & Founder of WunderGraph
Jens has experience building native apps for iOS and Android, building hybrid apps with Xamarin, React Native and Flutter, and working on backends using PHP, Java and Go. He's been in roles ranging from development to architecture and has led both smaller and larger engineering teams.
Throughout his whole career he realized that working with APIs is way too complicated, repetitive and needs a lot more standardization and automation. That's why he started WunderGraph, to make usage of APIs and collaboration through APIs easier.
He believes that businesses of the future will be built on top of collaborative systems that are connected through APIs. Making usage, exploration, sharing and collaboration with and through APIs easier is key to achieving this goal.
Follow and connect with Jens to exchange ideas or simply participate in his feed of thoughts.