Faster Retrieval of Shopify Metafields with GraphQL

If you’re an app developer working on the Shopify platform, you’re probably aware that many merchants keep additional product data in Shopify metafields on their store. This is challenging for developers, as retrieving data from metafields is notoriously difficult via the Shopify REST API and can cause slow performance when syncing a client’s store data.

Our app, Klevu Search, retrieves a merchant’s product data to index it for search. We initially struggled with metafields, but found that, with some careful query creation and resource management, GraphQL can fetch product metafield data much faster than REST. However, there is also a cut-off point where GraphQL becomes slower than REST due to a combination of query cost and throttling.

But the payoff is still real: for some test cases we were able to reduce the sync time of product metafields from four minutes to just 10 seconds. In this article, we’d like to share how we did it, and the useful information we learned along the way.

Fetching Shopify metafields via REST API

The problem with using the REST API to retrieve metafield data stored against your client’s products is that there is no way to retrieve them in bulk. There are a few promising leads to be found in the Shopify documentation, such as /metafields.json?metafield[owner_resource]=product. However, in practice, these methods do not return the data we hope for.

As a result, we must fetch metafield data one product at a time, or one product variant at a time. This means that if your client has a store with 100 products and 400 variants, you need to make 500 API calls to get all of that metafield data. We found that this can take around four minutes, which is far too long.

Fetching Shopify metafields via GraphQL

Fortunately, Shopify also has a GraphQL API which allows a little more flexibility in terms of retrieving data in bulk and can be used to retrieve product metafields more efficiently.

That being said, it is not just a case of replacing your REST calls with a corresponding GraphQL call. Some careful query creation and resource management is required, which we will take a deep dive into in the sections below.

Query cost

The most important thing to understand is the query cost aspect of GraphQL. More information can be found in the Shopify documentation, but we will go through what you need to know in this article.

A GraphQL query to get the name and description of 50 products looks something like this:
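As a rough sketch, a query of this shape could be written as follows. The field names assume the Admin API's `products` connection with `title` and `description` fields, and the query is held in a Python string so it can be sent as a JSON payload. The `build_payload` helper is hypothetical and only prepares the request body; sending it is left out.

```python
import json

# Sketch only: field names assume the Admin GraphQL API's `products` connection.
PRODUCTS_QUERY = """
query ($pageSize: Int!) {
  products(first: $pageSize) {
    edges {
      node {
        title
        description
      }
    }
  }
}
"""


def build_payload(page_size: int = 50) -> str:
    """Return the JSON body for a GraphQL POST request (hypothetical helper)."""
    return json.dumps({"query": PRODUCTS_QUERY, "variables": {"pageSize": page_size}})
```

Passing the page size as a variable makes it easy to experiment with larger or smaller pages without editing the query text.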

There is a resource cost associated with each element of this query result. Increasing the number of products from 50 to 100 would cause an increase in the overall query cost, whereas decreasing the product count to 10 would decrease it.

From the Shopify documentation:

“Every field in the schema has an integer cost value assigned to it. The cost of a query is the sum of the costs of each field. Connection fields have a multiplying effect on the cost of their sub-selections based on the first or last arguments.”

Let’s look at a best case query to retrieve as many products’ metafields as we can in one go, given that the generally accepted maximum number of records to retrieve via the Shopify APIs is 250:

In the above query, we are requesting 250 products, and for each of those products, we are also requesting 250 associated metafield values. If this single API call succeeded, we could retrieve all of the data we need 250 products at a time, which would be amazing compared to the REST approach of one product at a time.

The problem comes when we run this query, and get the following API response:

Error: Query has a cost of 63252, which exceeds the max cost of 1000

What’s happening? From the Shopify documentation:

“An app is given a bucket of 1,000 cost points. This means that the total cost of your queries cannot exceed 1,000 points.”

Our query costs roughly 63 times more than Shopify allows for a single query. We need to simplify it.

In other words, we need to reduce the number of records we are requesting:

  • 250 products with 50 metafields = 13,252 query cost, still too high.
  • 50 products with 50 metafields = 2,652 query cost, getting closer.
  • 30 products with 20 metafields = 692 query cost. Bingo!

Provided our store has 20 metafields or fewer, we can efficiently retrieve those values 30 products at a time, whereas with the REST API, we could only retrieve one product at a time.
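The costs quoted above follow a simple pattern that can be captured in a few lines. The sketch below is an assumption fitted to the figures in this article (63,252, 13,252, and 2,652); it is not Shopify's official cost calculation, and the function names are hypothetical.

```python
def estimate_cost(products: int, metafields: int) -> int:
    """Estimate the cost of a products query with nested metafields.

    Assumption: this formula is fitted to the figures quoted in this
    article; it is not Shopify's official cost calculation.
    """
    return 2 + products * (metafields + 3)


def max_page_size(metafields: int, budget: int = 1000) -> int:
    """Largest product page size whose estimated cost fits within `budget`."""
    return max((budget - 2) // (metafields + 3), 0)


print(estimate_cost(250, 250))  # 63252
print(estimate_cost(250, 50))   # 13252
print(estimate_cost(50, 50))    # 2652
print(estimate_cost(30, 20))    # 692
print(max_page_size(50))        # 18
```

A helper like `max_page_size` can be useful for picking a page size up front instead of discovering the limit through rejected queries.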


Throttling

Based on this, you might think a data sync with GraphQL is going to be 30 times faster than REST, right?

Unfortunately, it’s not quite that simple. When we run the above query twice in quick succession, for example with getting the first 30 products followed by the next 30 products, the API response is something like this:

Error: Throttled

This is because we’ve used up about 700 points of our 1,000 point bucket, and we must wait until there are enough points available to make our next request, at a refill rate of 50 points per second.

From the Shopify documentation:

“An app is given a bucket of 1,000 cost points, with a leak rate of 50 cost points per second. This means that the total cost of your queries cannot exceed 1,000 points at any given time, and that room is created in the app's bucket at a rate of 50 points per second.”

If we remember that it takes four minutes to retrieve the metafields of 100 products and 400 variants using the REST API, let’s compare how long this would take using the above GraphQL query:

As we can see, despite the ability to retrieve metafields in bulk with GraphQL, it has still taken 38 seconds just to retrieve the 100 parent products’ metafields, and we still need to get the product variants.

Based on the above, we can expect 14.5 seconds for each GraphQL request of 30 results, and 14 GraphQL requests needed to get all 400 variants, giving us an additional 203 seconds. 

That means that even using this approach, it still takes around four minutes to retrieve the metafields of all 500 records, which is disappointingly similar to the time taken by the REST API.
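The estimates above can be approximated with a simple leaky-bucket model. The sketch below assumes the bucket starts full, each request is sent as soon as enough points are available, and network time is ignored, so it comes out slightly below the measured times. The function name and parameters are hypothetical.

```python
def simulate_sync(requests: int, cost: float, bucket: float = 1000.0,
                  restore_rate: float = 50.0) -> float:
    """Return the seconds spent waiting for the bucket across `requests` calls.

    Assumptions: the bucket starts full, requests are instantaneous, and each
    request is sent the moment enough points are available.
    """
    if cost > bucket:
        raise ValueError("a single query may not cost more than the bucket size")
    available = bucket
    waited = 0.0
    for _ in range(requests):
        if available < cost:
            # Wait until the bucket has refilled just enough for this request.
            waited += (cost - available) / restore_rate
            available = cost
        available -= cost
    return waited


# 4 requests of 30 products, then 14 more requests of 30 variants, at cost 692.
print(round(simulate_sync(4, 692), 2))   # 35.36
print(round(simulate_sync(18, 692), 2))  # 229.12
```

Under these assumptions, the 18 requests spend roughly 229 seconds waiting on the bucket alone, which is in line with the four-minute total once request time is added.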


Our approach

For our app, we know which metafields we need for a given Shopify store, and we have the namespace and key available so we can be quite specific in our approach.

The most efficient GraphQL query we found was to only request these specific metafields and ensure that we are only retrieving data we absolutely need. It looks something like this:

The pageInfo{hasNextPage} field tells us whether there are more records to fetch from the next page, in which case we pass the cursor field of the last record to the next request. legacyResourceId is the ID of the product variant, and product{legacyResourceId} is needed to associate the product variant with its parent, i.e., the Parent ID.

The three metafields are the specific namespace:key values we require from this store. This part is dynamic per store and can increase or decrease in number depending on how many metafields are required.

With this query, we can fetch the three specific metafields we need, 50 products at a time.
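Because the list of metafields changes per store, a query like this is typically generated rather than hard-coded. The sketch below is one possible way to do that in Python, assuming the Admin API's `productVariants` connection and its `metafield(namespace:, key:)` field. The function name, the `mf0`, `mf1`, ... aliases, and the example namespace and key values are all hypothetical.

```python
import json


def build_variant_query(metafields: list[tuple[str, str]], page_size: int = 50) -> str:
    """Build a productVariants query requesting only the given (namespace, key) pairs."""
    # One aliased `metafield` field per pair; json.dumps quotes and escapes the strings.
    fields = "\n".join(
        f"        mf{i}: metafield(namespace: {json.dumps(ns)}, key: {json.dumps(key)}) {{ value }}"
        for i, (ns, key) in enumerate(metafields)
    )
    return f"""
query ($cursor: String) {{
  productVariants(first: {page_size}, after: $cursor) {{
    pageInfo {{ hasNextPage }}
    edges {{
      cursor
      node {{
        legacyResourceId
        product {{ legacyResourceId }}
{fields}
      }}
    }}
  }}
}}
"""


# Hypothetical namespace/key pairs for illustration only.
print(build_variant_query([("custom", "material"), ("custom", "color")], page_size=25))
```

Aliases are needed because the same `metafield` field is requested several times with different arguments within one selection.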

Large page size vs. small page size

Due to query cost and throttling, we found that the common-sense assumption that “more products in one go will be faster” is not necessarily correct.

If the page size is low, the query cost is low, so you can make more requests without being throttled. However, you get less data returned with each request. If the page size is high, you get more data with each request. However, the query cost is also high, so you get throttled even more.

From testing the retrieval of a single metafield from 100 products and 400 variants, we found:

  • One product per page took 216 seconds, throttled zero times
  • Five products per page took 44 seconds, throttled zero times
  • 10 products per page took 21 seconds, throttled zero times
  • 25 products per page took 10 seconds, throttled zero times
  • 50 products per page took 10 seconds, throttled three times
  • 75 products per page took 12 seconds, throttled six times
  • 100 products per page took 10 seconds, throttled four times
  • 150 products per page took 16 seconds, throttled seven times
  • 200 products per page took 10 seconds, throttled five times
  • 250 products per page took 18 seconds, throttled seven times

As you can see, the timings and the throttle counts are not entirely uniform. The total time taken is based on a combination of three factors:

  1. The number of metafields being retrieved
  2. The optimal pagination count
  3. The throttle count

For this example, the best times came from page sizes that minimize the number of API requests, since 100 products and 400 variants divide evenly into pages of 25, 50, 100, and 200. Page sizes of one, five, and 10 also divide evenly, but they were slower because they fall below the optimal number of products to retrieve with each request.

Importantly, we can see the timing is the same whether we’re fetching 25, 50, 100, or 200 products per page, which is due to the extra throttles as the page size increases.

As a result, we opted to always request 25 products per page.
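A pagination loop with a fixed page size and a simple back-off on throttling might look like the sketch below. `run_query` is a hypothetical callable that sends the variables to the API and returns the decoded JSON response. The response shape, including an `errors` entry with the message `Throttled`, is an assumption based on the error shown earlier, and the one-second pause is an arbitrary choice.

```python
import time
from typing import Callable, Iterator, Optional


def fetch_all(run_query: Callable[[dict], dict], page_size: int = 25,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield every variant node, following cursors and pausing when throttled."""
    cursor: Optional[str] = None
    while True:
        response = run_query({"pageSize": page_size, "cursor": cursor})
        errors = response.get("errors") or []
        if any(error.get("message") == "Throttled" for error in errors):
            sleep(1.0)  # let the bucket refill, then retry the same page
            continue
        connection = response["data"]["productVariants"]
        for edge in connection["edges"]:
            yield edge["node"]
            cursor = edge["cursor"]  # the last cursor seen starts the next page
        if not connection["pageInfo"]["hasNextPage"]:
            return
```

Injecting `run_query` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.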

The results

In summary, we found an enormous improvement when fetching data for stores where we only needed to retrieve a small number of product metafields, since GraphQL really shines here with a low query cost and a very efficient bulk retrieval.

For a Shopify store with 100 products and 400 variants, retrieving 25 results per page, we found the following data sync times versus the REST API approach, which, if you remember, took around four minutes (or 240 seconds):

  • One metafield: ~10 seconds
  • Two metafields: ~20 seconds
  • Five metafields: ~50 seconds
  • 10 metafields: ~100 seconds
  • 20 metafields: ~200 seconds

We can see a clear pattern of {metafield count * 10} seconds emerging.

  • 24 metafields: ~240 seconds (the same as the REST API)
  • 30 metafields: ~300 seconds (slower than the REST API)

For stores where we needed to retrieve more than 24 metafields, the benefit of GraphQL was lost due to throttling, and in fact, the REST API was quicker. As a result, we would select the correct method between REST and GraphQL based on the number of metafields needed for a particular store.
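The selection rule can be expressed as a small function. This is a minimal sketch that hard-codes the timings measured for the 100-product, 400-variant test store. The constants and the function name are hypothetical, and a real implementation would need figures measured for each store size.

```python
REST_SYNC_SECONDS = 240             # measured REST sync time for the test store
GRAPHQL_SECONDS_PER_METAFIELD = 10  # observed pattern at 25 results per page


def choose_api(metafield_count: int) -> str:
    """Pick the API expected to sync faster for the given number of metafields."""
    graphql_estimate = metafield_count * GRAPHQL_SECONDS_PER_METAFIELD
    # On a tie (24 metafields) GraphQL is kept; REST wins only above that.
    return "graphql" if graphql_estimate <= REST_SYNC_SECONDS else "rest"


print(choose_api(5))   # graphql
print(choose_api(30))  # rest
```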

Taking a couple of real customer stores with over 10,000 products and more than 10 metafields, we've reduced one Shopify Standard store’s full data sync time from three hours to just one hour, and a similar Shopify Plus store from one hour and 45 minutes, to just 35 minutes using this GraphQL approach.

Using GraphQL bulk operations to speed up the retrieval of Shopify metafields

The improvements we found with GraphQL are already a great start, and we're still diving deeper into the other options Shopify has available for data retrieval, such as the GraphQL bulk operations API.

This requires a slightly different approach. You still submit a GraphQL query, but rather than receiving the results directly in the response, you receive a reference ID that you can use to periodically check whether Shopify has finished preparing your data. Once the task has completed, you are given a URL to download the results in JSONL format.

From our initial testing, this approach is showing even better performance gains. By running a bulk query operation (started with the bulkOperationRunQuery mutation), we have been able to retrieve the following data with a single API call:

  • All products
  • All product variants
  • All product metafields
  • All product variant metafields

This has a query cost of 10 and only took 30 seconds to complete. With this approach, there is also no need to worry about rate limits, pagination, or throttling, making it a much more efficient and performant option than REST and even standard GraphQL calls.
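The submit, poll, and download flow could be sketched roughly as follows. It assumes the `bulkOperationRunQuery` mutation and the `currentBulkOperation` query from the Admin API, and that child rows in the JSONL output carry a `__parentId` field. `run_graphql` is a hypothetical callable that sends a query string and returns the decoded JSON response, and the helper names and five-second polling interval are illustrative choices.

```python
import json
import time
from typing import Callable, Iterable, Optional

POLL_QUERY = "{ currentBulkOperation { id status errorCode url } }"


def build_start_mutation(bulk_query: str) -> str:
    """Wrap a query in the mutation that starts a bulk operation."""
    return (
        'mutation { bulkOperationRunQuery(query: """' + bulk_query + '""") '
        "{ bulkOperation { id status } userErrors { field message } } }"
    )


def wait_for_bulk_result(run_graphql: Callable[[str], dict], poll_seconds: float = 5.0,
                         sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll until the bulk operation finishes and return its download URL."""
    while True:
        operation = run_graphql(POLL_QUERY)["data"]["currentBulkOperation"]
        if operation["status"] == "COMPLETED":
            return operation["url"]  # may be None if there were no results
        if operation["status"] in {"FAILED", "CANCELED", "EXPIRED"}:
            raise RuntimeError(f"bulk operation ended as {operation['status']}")
        sleep(poll_seconds)


def group_children(lines: Iterable[str]) -> dict:
    """Group JSONL rows that have a `__parentId` under that parent's ID."""
    children: dict = {}
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        parent_id = row.get("__parentId")
        if parent_id is not None:
            children.setdefault(parent_id, []).append(row)
    return children
```

Because the results arrive as one flat JSONL file, regrouping rows by parent is what reconnects metafields to their products and variants.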
