Static analysis of GraphQL queries

July 14, 2019

If you just want to analyze your query, go to www.graphql-analyzer.com.

Sometimes it is useful to analyze a GraphQL query without executing it. Use cases include predicting how expensive a query is, planning its execution, or simply understanding the query better.

Each GraphQL API has a schema that defines which fields can be queried and what type each field has. Even with the schema, it is not straightforward to analyze an arbitrary query, or to decide what the output of the analysis should be.

The dependency of a field

Each GraphQL field depends on its parent field, because the value of the parent field is the source argument of the field's resolver. This dependency is somewhat implicit: the GraphQL spec never states it directly, but it follows from the execution algorithm the spec describes.

Let's assume we have the following schema:

type Query {
    pets: [Pet]
}
interface Pet {
    name: String
}
type Dog implements Pet {
    name: String
    shedding: Boolean
}
type Cat implements Pet {
    name: String
    indoor: Boolean
}

A simple query looks like this:

{
    pets {
        name
    }
}

This query contains two fields: Query.pets and Pet.name, where Pet.name depends on Query.pets.
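To make the dependency concrete, here is a minimal Python sketch of how an executor hands the parent value to the child resolver as its source. The resolver functions and the sample data are hypothetical, and the sketch is a strong simplification of the execution algorithm, not a real GraphQL engine.

```python
# Hypothetical sample data and resolvers; not part of any real GraphQL library.
PETS = [{"name": "Rex"}, {"name": "Tom"}]

def resolve_query_pets(source):
    # Root field: the source is the root value (unused here).
    return PETS

def resolve_pet_name(source):
    # Child field: the source is one value produced by Query.pets.
    return source["name"]

def execute_pets_name():
    # Query.pets has to be resolved first; every pet it returns then
    # becomes the source argument of the name resolver.
    pets = resolve_query_pets(source=None)
    return {"pets": [{"name": resolve_pet_name(pet)} for pet in pets]}

result = execute_pets_name()
```

The name resolver cannot run before Query.pets has produced its values, which is exactly the dependency between the two fields.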

We make this dependency explicit by connecting the two fields with an arrow (edge) from Pet.name to Query.pets.

(Note: this is not the dependency graph we will actually end up with for this query. The final graph looks like this:

The reason is explained later.)

While this example is not complicated, analyzing all possible queries requires us to deal with two distinct challenges.

Challenge 1: Merged Fields

GraphQL allows you to declare a field multiple times. The simplest example is:

{ 
    pets {
        name
        name
    }
}

Another is

{ 
    pets {
        name
        name
    }
    pets {
        name
    }
}

And an even more extreme example is:

{ 
    pets {
        name
        name
    }
    pets {
        name
        ...{
            name
        }
    }
    ... on Query {
        pets {
            ... on Pet {
                name
            }
        }
    }
}

All of these queries are executed in the same way as {pets{name}}. If we want our dependency graph to be as accurate as possible, we have to take fields that are declared multiple times into account.

Fields that are declared multiple times but are actually one field are called “Merged Fields”; you can read more about them here.

The strategy for dealing with them is not surprising: every merged field is represented as a single node in the dependency graph.
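The merging step can be sketched in a few lines of Python. The tuple-based query representation and the helper names (field, inline, merge) are assumptions made for this illustration and are not the API of graphql-analyzer. The sketch also ignores aliases, arguments and the type conditions themselves, which a real implementation has to consider.

```python
def field(name, *children):
    # A field selection with optional sub-selections.
    return ("field", name, list(children))

def inline(type_condition, *children):
    # An inline fragment; type_condition is None for a bare "... { }".
    return ("inline", type_condition, list(children))

def collect(selections, grouped):
    # Group the sub-selections of all fields that share a name.
    for kind, name, children in selections:
        if kind == "inline":
            collect(children, grouped)  # a fragment only wraps selections
        else:
            grouped.setdefault(name, []).extend(children)
    return grouped

def merge(selections):
    # One entry per merged field, applied recursively to the children.
    grouped = collect(selections, {})
    return {name: merge(children) for name, children in grouped.items()}

simple = [field("pets", field("name"))]
extreme = [
    field("pets", field("name"), field("name")),
    field("pets", field("name"), inline(None, field("name"))),
    inline("Query", field("pets", inline("Pet", field("name")))),
]
merged_simple = merge(simple)
merged_extreme = merge(extreme)
```

Both queries collapse to the same structure, one pets entry containing one name entry, so each merged field would become exactly one node.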

Challenge 2: Type conditions and abstract types

The other GraphQL feature that makes analyzing queries challenging is type conditions, via fragments, in combination with abstract types (the abstract types in GraphQL are Interfaces and Unions).

A fragment has a type condition, which lets you query fields that are specific to that type. For example, if we want to know not only the name of each pet but also, for each dog, whether it is shedding, we can write:

{
    pets {
        name
        ... on Dog {
            shedding
        }
    }
}

Our dependency graph for this query could look like this:

One not so obvious aspect of this graph is that the Dog.shedding node is only executed if one of the Query.pets objects is of type Dog. To make this clearer, we color the arrow from Dog.shedding to Query.pets red:

This is not the final version of the graph we want; see the next section.

Only object types really count

Abstract types in GraphQL (Interfaces and Unions) are not really relevant for executing a query: resolvers are defined on Object types, not on Interfaces or Unions. The schema also defines statically which Object types implement an Interface and which Object types make up a Union.

For example all of the following queries are executed in the same way:

{
    pets {
        name
    }
}

{
    pets {
        ... on Dog {
            name
        }
        ... on Cat {
            name
        }
    }
}

{
    pets {
        name
        ... on Cat {
            name
        }
    }
}

{
    pets {
        name
        ... on Dog {
            name
        }
    }
}

This means our dependency graph should look the same for all of these queries. We choose to make explicit the fact that only Object fields count, and represent all of the above queries with the following graph:

While there are other ways to design this dependency graph (you could choose to include Interface fields, for example), eliminating Interfaces and Unions reflects the execution most accurately and is arguably easier to understand: you don’t need to worry about Interfaces or Unions at all.
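As a rough illustration of this design choice, the following Python sketch expands every field to the Object types that can actually occur and records an edge from each resulting node to its parent node. The schema tables are written by hand for the pets schema, and all names (POSSIBLE_TYPES, FIELDS, edges_for, analyze) are assumptions of this sketch rather than the actual graphql-analyzer implementation. Nodes are identified only by "Type.field", which is a simplification.

```python
# Hand-written facts about the pets schema (an assumption of this sketch;
# a real tool would derive them from the schema).
POSSIBLE_TYPES = {
    "Query": {"Query"},
    "Pet": {"Dog", "Cat"},
    "Dog": {"Dog"},
    "Cat": {"Cat"},
}
FIELDS = {
    "Query": {"pets": "Pet"},
    "Dog": {"name": "String", "shedding": "Boolean"},
    "Cat": {"name": "String", "indoor": "Boolean"},
}

def field(name, *children):
    return ("field", name, list(children))

def on(type_condition, *children):
    return ("fragment", type_condition, list(children))

def edges_for(selections, parent, candidates, edges):
    # candidates: the Object types the current object may have.
    for kind, name, children in selections:
        if kind == "fragment":
            # A type condition narrows the candidate Object types.
            narrowed = candidates & POSSIBLE_TYPES[name]
            edges_for(children, parent, narrowed, edges)
            continue
        for object_type in candidates:
            if name in FIELDS[object_type]:
                node = f"{object_type}.{name}"
                edges.add((node, parent))  # the set merges duplicates
                returned = POSSIBLE_TYPES.get(FIELDS[object_type][name], set())
                edges_for(children, node, returned, edges)
    return edges

def analyze(query):
    return edges_for(query, None, {"Query"}, set())

plain = analyze([field("pets", field("name"))])
both = analyze([field("pets", on("Dog", field("name")), on("Cat", field("name")))])
mixed = analyze([field("pets", field("name"), on("Cat", field("name")))])
```

All three queries produce the same edges: Dog.name and Cat.name each point to Query.pets, and no Pet.name node appears.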

Examples

Here are a few more examples of what the dependency graph looks like for different queries.

Just querying Dog.shedding with a named Fragment:

{
    pets {
        ...Shedding
    }
}
fragment Shedding on Dog {
    shedding
}

This is exactly the same as querying with an inline fragment:

{
    pets {
        ... on Dog {
            shedding
        }
    }
}

The dependency graph looks like this:

Let's assume we add a new top-level field dogs: [Dog] and want to query whether the dogs are shedding, still using the same fragment:

{
    dogs {
        ...Shedding
    }
}
fragment Shedding on Dog {
    shedding
}

This results in a similar graph, but this time with a black arrow, because the type returned by dogs matches the type that declares Dog.shedding.
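One possible way to decide the arrow color is sketched below in Python. The rule used here (an edge is conditional when the field's Object type is not the only Object type the parent field can return) is an assumption derived from the two examples above and is not confirmed to be how graphql-analyzer decides it. The table POSSIBLE_TYPES and the function is_conditional are hypothetical names.

```python
# Object types each output type can resolve to (hand-written for this sketch).
POSSIBLE_TYPES = {
    "Pet": {"Dog", "Cat"},
    "Dog": {"Dog"},
    "Cat": {"Cat"},
}

def is_conditional(parent_return_type, child_object_type):
    # Red arrow: the parent may also return objects of another type,
    # so the child field is only executed for some of them.
    return POSSIBLE_TYPES[parent_return_type] != {child_object_type}

pets_edge = is_conditional("Pet", "Dog")   # pets: [Pet] -> Dog.shedding
dogs_edge = is_conditional("Dog", "Dog")   # dogs: [Dog] -> Dog.shedding
```

Under this rule the edge from Dog.shedding to Query.pets is conditional, while the edge from Dog.shedding to Query.dogs is not.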

Let's assume we add a third object type implementing Pet: Bird. The graph for {pets{name}} then has a third node pointing to Query.pets:

For the last example, let's change our schema:

type Query {
    pets: [CatOrDog]
}
union CatOrDog = Cat | Dog
type Human {
    lastName: String
}
interface Pet {
    name: String
}
type Dog implements Pet {
    name: String
    bestFriend: Pet
    owner: Human
}
type Cat implements Pet {
    name: String
    indoor: Boolean
}

The following query:

{
    pets {
        ... on Dog {
            bestFriend {
                ... on Cat {
                    name
                }
            }
            owner {
                lastName
            }
        }
        ... on Cat {
            indoor
        }
    }
}

results in:

Impossible type conditions

One interesting result is that this analysis detects impossible type conditions, which are allowed by the GraphQL specification.

The section “Abstract Spreads in Object Scope” of the GraphQL specification gives the following example (slightly modified here):

Schema:

type Query {
    pets: [CatOrDog]
}
union CatOrDog = Cat | Dog
interface Pet {
    name: String
}
type Dog implements Pet {
    name: String
}
type Cat implements Pet {
    name: String
    indoor: Boolean
}

Query:

{
    pets {
        ...UnionWithObjectFragment
    }
}
fragment CatOrDogNameFragment on CatOrDog {
  ... on Cat {
      indoor
  }
}

fragment UnionWithObjectFragment on Dog {
  ...CatOrDogNameFragment
}

The dependency graph for this query is just one node, because the combined type condition can never be true: an object would have to be a Dog (UnionWithObjectFragment) and, inside the CatOrDog fragment, a Cat at the same time:
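The detection can be thought of as intersecting the possible Object types along the chain of type conditions. The Python sketch below does this for the example; POSSIBLE_TYPES and reachable_types are hypothetical names for this illustration, and the table is written by hand for the schema above.

```python
# Object types each type can resolve to, hand-written for the schema above.
POSSIBLE_TYPES = {
    "CatOrDog": {"Cat", "Dog"},
    "Pet": {"Cat", "Dog"},
    "Dog": {"Dog"},
    "Cat": {"Cat"},
}

def reachable_types(field_type, type_conditions):
    # Start with everything the field can return, then narrow the set
    # with each nested type condition, from outermost to innermost.
    candidates = set(POSSIBLE_TYPES[field_type])
    for condition in type_conditions:
        candidates &= POSSIBLE_TYPES[condition]
    return candidates

# pets -> ...UnionWithObjectFragment (on Dog)
#      -> ...CatOrDogNameFragment (on CatOrDog) -> ... on Cat { indoor }
impossible = reachable_types("CatOrDog", ["Dog", "CatOrDog", "Cat"])
possible = reachable_types("CatOrDog", ["CatOrDog", "Cat"])
```

An empty result means that no Object type can satisfy all conditions, so the indoor field never produces a node.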

Available as library and interactively

The described algorithm is available as a JS module, graphql-analyzer, and will also be available in GraphQL Java.

If you want to explore interactively what the dependency graph of your query looks like, go to www.graphql-analyzer.com.

Any feedback? Ping me on twitter: @andimarek

Written by Andi Marek