In some cases it happens that you need to collect several data points into one single field. In this article I’ll share with you how to extract such data from a JSON string with Google Data Studio. Key benefit is that you have maximum flexibility to access all data points contained in such a string. To achieve this, In the following how to, I’ll show how to extract data with the Google Data Studio regular expression function called “REGEXP_EXTRACT”.
- Problem & challenge
- Solution & outcome
- How to extract data from a JSON string with regexp_extract
- Some final words
For one of my clients, we are collecting a huge amount of data within a single event in Google Universal Analytics. It’s a results data set of a configurator tool. For several reasons, we decided to send a complete JSON string with all configurator results into the Universal Analytics event action data point.
As a next step, we had the challenge to unravel that data from that JSON string again.
From this, we came up with the solution introduced below.
Here’s how we did it before developing this new approach. We worked with the function “CONTAINS_TEXT” and the following data studio resource custom field construct:
But there are some constraints coming with this solution:
- Only a few parameter values can be added the manual way. In case of above shown example of cities’ names, one can imagine there are way more cities that users can come from.
- Use the data as dimension, not as metric. Thus, this is not so flexible.
- Couldn’t extract values with many different values. (Show example of 0.12345 value)
A great way to accomplish this is Google Data Studios Regular Expressions function: REGEXP_EXTRACT.
You can find Google’s documentation of the function here.
The following outcomes come with that approach:
- More flexibility
- collect *all* values contained in the JSON string
- Do calculations with numbers coming from a JSON string
Two more annotations before we start:
- This also works with any other format in which data is packed into a single field. For example comma separated values (CSV).
- This article is not about how to connect a JSON data source to Google Data Studio. To this topic others have already delivered solutions like Garrett Vorce: “Google Data Studio Tutorial with JSON” on Medium.
So, let’s now have a look into the details, how to achieve this:
How to extract data from a JSON string with the Regular Expression function “REGEXP_EXTRACT”
(Short summary of how to do it)
1. Pre-Requesites: Setup a Google Data Studio dashboard and connect your data source
You can read here at XYZ how to accomplish that.
2. Add a new data source field
- In Google Data Studio, click in the menu on “Resource”
- Then click on “Manage added data sources”
- Find your data source
- Click on “Edit”
- Click on “(+) Add a field”
3. Use regexp_extract to extract the data out of the JSON string
3.1 Extract a text string from JSON – with a Regular Expression
- Now, in the “Add a field” dialogue, first give your new field a name.
In our example, we want to extract the information “city” from the JSON string. So we can call the field i.e. “City (JSON)”.
- Then it’s time to enter the formula:
- So now, what does this formula do in detail? We’ll have a look:
REGEXP_EXTRACTthe function gets called
Ereignisaktionis the (german) name of the Universal Analytics event tracking data field, in which the JSON string is stored. (In english it would be
- The key part now is the regular expression (also called “regex” or “regexp”):
'"city":"((\\"|[^"!,])*)"'. It does the following:
- The two
'signs mark beginning and end of the “regex”.
"city":"are the signs that are directly before the searched string (here the cities’ names) that we want to extract.
- With the part
((\\"|[^"!,])*)"the searched string (the city name) gets collected into our newly added field.
- The two
There are different kinds of data that could be extracted:
3.1 Extract numbers from JSON
You want to extract numbers from JSON? This is also possible.
- You can do this with following regular expression (regex) formula. Example here is a Longitude definition named lon:
With this regex formula we do not search for a string in brackets, but for a string without brackets. This is like numbers are defined in JSON.
- To be able to calculate with this new metric also, you will need to change the data fields type to “number”:
4. Add a new chart to your Data Studio Dashboard with the new data source field
Now you can add the new data to your Google Data Studio dashboard:
In case you have any questions or annotations, please leave a comment.
About me: My name is Thomas Gerstmann. I help organizations to get their data driven marketing off ground. My focus is more on data-driven strategy and analytics & research as well as Conversion Optimization. But I also work sometimes – like documented here in this article – on technical questions.