Google Data Studio: How to extract data from a JSON string with REGEXP_EXTRACT

In some cases it happens that you need to collect several data points into one single field. In this article I’ll share with you how to extract such data from a JSON string with Google Data Studio. Key benefit is that you have maximum flexibility to access all data points contained in such a string. To achieve this, In the following how to, I’ll show how to extract data with the Google Data Studio regular expression function called “REGEXP_EXTRACT”.

Contents

  1. Problem & challenge
  2. Solution & outcome
  3. How to extract data from a JSON string with regexp_extract
  4. Some final words

Problem & challenge

For one of my clients, we are collecting a huge amount of data within a single event in Google Universal Analytics. It’s a results data set of a configurator tool. For several reasons, we decided to send a complete JSON string with all configurator results into the Universal Analytics event action data point.

Example of a JSON string from which single values will be extracted for use in Google Data Studio

Example of a JSON string from which single values will be extracted for use in Google Data Studio

As a next step, we had the challenge to unravel that data from that JSON string again.
From this, we came up with the solution introduced below.

Here’s how we did it before developing this new approach. We worked with the function “CONTAINS_TEXT” and the following data studio resource custom field construct:

Data source field with CONTAINS_TEXT function

Data source field with CONTAINS_TEXT function

But there are some constraints coming with this solution:

  • Only a few parameter values can be added the manual way. In case of above shown example of cities’ names, one can imagine there are way more cities that users can come from.
  • Use the data as dimension, not as metric. Thus, this is not so flexible.
  • Couldn’t extract values with many different values. (Show example of 0.12345 value)

Solution & outcome

A great way to accomplish this is Google Data Studios Regular Expressions function: REGEXP_EXTRACT.
You can find Google’s documentation of the function here.

The following outcomes come with that approach:

  • More flexibility
  • collect *all* values contained in the JSON string
  • Do calculations with numbers coming from a JSON string

Two more annotations before we start:

    • This also works with any other format in which data is packed into a single field. For example comma separated values (CSV).
    • This article is not about how to connect a JSON data source to Google Data Studio. To this topic others have already delivered solutions like Garrett Vorce: “Google Data Studio Tutorial with JSON” on Medium.

So, let’s now have a look into the details, how to achieve this:

How to extract data from a JSON string with the Regular Expression function “REGEXP_EXTRACT”

(Short summary of how to do it)

1. Pre-Requesites: Setup a Google Data Studio dashboard and connect your data source

You can read here at XYZ how to accomplish that.

2. Add a new data source field

  1. In Google Data Studio, click in the menu on “Resource”
  2. Then click on “Manage added data sources”
  3. Find your data source
  4. Click on “Edit”
  5. Click on “(+) Add a field”

3. Use regexp_extract to extract the data out of the JSON string

3.1 Extract a text string from JSON – with a Regular Expression

  1. Now, in the “Add a field” dialogue, first give your new field a name.
    In our example, we want to extract the information “city” from the JSON string. So we can call the field i.e. “City (JSON)”.
  2. Then it’s time to enter the formula:REGEXP_EXTRACT(Ereignisaktion,'"city":"((\\"|[^"!,])*)"')

    Data source field with REGEXP_EXTRACT function

    Data source field with REGEXP_EXTRACT function

  3. So now, what does this formula do in detail? We’ll have a look:
    1. With  REGEXP_EXTRACT the function gets called
    2. Ereignisaktion is the (german) name of the Universal Analytics event tracking data field, in which the JSON string is stored. (In english it would be Event action.)
    3. The key part now is the regular expression (also called “regex” or “regexp”): '"city":"((\\"|[^"!,])*)"'. It does the following:
      • The two '  signs mark beginning and end of the “regex”.
      • "city":" are the signs that are directly before the searched string (here the cities’ names) that we want to extract.
      • With the part ((\\"|[^"!,])*)" the searched string (the city name) gets collected into our newly added field.

There are different kinds of data that could be extracted:

3.1 Extract numbers from JSON

You want to extract numbers from JSON? This is also possible.

  1. You can do this with following regular expression (regex) formula. Example here is a Longitude definition named lon:
    REGEXP_EXTRACT(Ereignisaktion,'"lon":(([^"!,])*)')
    With this regex formula we do not search for a string in brackets, but for a string without brackets. This is like numbers are defined in JSON.
  2. To be able to calculate with this new metric also, you will need to change the data fields type to “number”:

    Change data source field to type "number"

    Change data source field to type “number”

4. Add a new chart to your Data Studio Dashboard with the new data source field

Now you can add the new data to your Google Data Studio dashboard:

Data visualization with JSON extracted data via REGEXP_EXTRACT function

Data visualization with JSON extracted data via REGEXP_EXTRACT function

Some final words

In case you have any questions or annotations, please leave a comment.

About me: My name is Thomas Gerstmann. I help organizations to get their data driven marketing off ground. My focus is more on data-driven strategy and analytics & research as well as Conversion Optimization. But I also work sometimes – like documented here in this article – on technical questions.