Flattening this nested JSON structure?

Hello there!

I am attempting to flatten this JSON output so that I have this end result table (OK to have extraneous columns).

keywords|values (these are the column names)
rugby|10
cricket|60

However, I have only successfully achieved these two unwanted results:

keywords|values (these are the column names)
rugby|10
rugby|60
cricket|10
cricket|60
This example shows incorrect combinations of keywords & values

keywords|values (these are the column names)
rugby,cricket|10
rugby,cricket|60
This example shows failure to denote the separate keywords in the result

Below is the JSON output I am working with. I think the problem might stem from the fact that the ‘keyword’ and ‘values’ are at different “levels of the hierarchy.”

 "result": [
        {
          "keywords": [
            "rugby",
            "cricket"
          ],
          "type": "trends",
          "location_code": 2840,
          "language_code": "en-US",
          "check_url": "https://trends.google.com/trends/explore?hl=en-US&cat=3&gprop=youtube&geo=US&date=today%2012-m&q=rugby%2Ccricket",
          "datetime": "2020-03-25 11:44:16 +00:00",
          "items_count": 6,
          "items": [
            {
              "position": 1,
              "type": "google_trends_graph",
              "title": "Interest over time",
              "keywords": [
                "rugby",
                "cricket"
              ],
              "data": [
                {
                  "date_from": "2019-03-31",
                  "date_to": "2019-04-06",
                  "timestamp": 1553990400,
                  "values": [
                    10,
                    60
                  ]
                },
                {
                  "date_from": "2019-04-07",
                  "date_to": "2019-04-13",
                  "timestamp": 1554595200,
                  "values": [
                    16,
                    58
                  ]
                },
                {
                  "date_from": "2019-04-14",
                  "date_to": "2019-04-20",
                  "timestamp": 1555200000,
                  "values": [
                    11,
                    55
                  ]
                }
            ],

I have tried things like this:

  • Set JSON Flattener to flatten the ‘result’ column into ‘api.result > items > data’. However, this just gives me the separated data values but fails to show me the separated keywords:

keywords|values
rugby,cricket|10
rugby,cricket|60

Do you happen to see where I am going wrong? I would greatly appreciate any solution I may be overlooking. Thank you!

This is a tricky one!

As you pointed out, there are two levels of hierarchy here, and the association that you are trying to enforce is only represented by the order in which things appear in those two arrays you are trying to match up.

So knowing that, we need to process this in two branches and join it back up using the order as the key.

Assuming you are starting with the results array expanded:

  1. Use a JSON Flattener to flatten and then expand the items array
  2. Send the output from that JSON Flattener into two more JSON Flatteners, in parallel branches.
  3. In the 2nd JSON Flattener, flatten and expand the items.keywords array
  4. In the 3rd JSON Flattener, flatten and expand the items.data.values array
  5. After each JSON Flattener (2 and 3), add a Row Numbers Step
  6. Join the two branches together from the Row Numbers steps, using the Row Numbers columns as the keys. Use all default settings in the Join.

That should do it!

2 Likes

Thanks Brian, this is a brilliant solution. It worked great.

I really appreciate your incredibly fast and thoughtful reply!

3 Likes