Hey John! I certainly can help you with this. It's important for context that everyone has the same starting point for this question.
Incoming wall of text!
When you pull in data via an API Import, most likely you are pulling in a list of things. Let’s say a list of products.
That initial pull of data is almost guaranteed to be held in an array. Arrays in JSON (the type of notation used by most APIs that you'd use in Parabola) are wrapped in [square brackets] and simply say, "I am a list of things."
The API Import, sometimes on its own, and sometimes with a little prodding, will unpack that array into a friendly table. Many times, that is all automatic and no one has to know it happened. You see a table of products and you are satisfied!
But other times, like in this case, each product may have nested arrays within it. What if every product contains a list of all of its variants? That's a common way that product data is structured. I have a product called `hat` and it has a few variants, such as: red hat, white hat, blue hat. That data is held in another array. Each product has a column called Variant, where there is another array in each of the cells - you'd see something like `[{"name":"red hat"},{"name":"white hat"},{"name":"blue hat"}]`.
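To make that shape concrete, here's a small Python sketch of what a response like that looks like before any flattening (the field names are just placeholders matching the example above, not your actual API):

```python
import json

# A hypothetical API response: a JSON array of products,
# where each product carries a nested array of variants.
response_body = """
[
  {"name": "hat",
   "variants": [{"name": "red hat"},
                {"name": "white hat"},
                {"name": "blue hat"}]}
]
"""

products = json.loads(response_body)

# The outer array unpacks into a list of products (one "row" each)...
print(len(products))                  # 1 product
# ...but each product still holds its own nested variant array.
print(len(products[0]["variants"]))   # 3 variants
```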
We want to get that data out of that JSON array and into something more friendly. So we look to the JSON Flattener for this! You pick that column to flatten, and then, once you update, you see a new option to Expand List into New Rows. When you use that option, all of a sudden, you have duplicate rows.
Specifically, you will see that the `hat` row is now 3 rows - 1 for each variant. What that JSON Flattener did was change your frame of reference. Instead of 1 row representing 1 product, your data now has 1 row representing 1 product+variant. `red hat`, `white hat`, and `blue hat` are each on their own row.
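Roughly, that expansion works like this (a sketch in Python, assuming the simple product/variant shape from earlier - the field names are illustrative, not Parabola's internals):

```python
# Expand a nested variant array into one row per product+variant.
products = [
    {"name": "hat",
     "variants": [{"name": "red hat"},
                  {"name": "white hat"},
                  {"name": "blue hat"}]},
]

rows = []
for product in products:
    for variant in product["variants"]:
        # Product-level values are duplicated onto every variant row.
        rows.append({"product": product["name"],
                     "variant": variant["name"]})

for row in rows:
    print(row)
```

One product with 3 variants becomes 3 rows, each repeating the product's own columns.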
Why is this done? It's because arrays can have any number of items in them. They are lists, and a list can have any number of things in it. Let's say that instead of making new rows, the expansion of an array created more columns. Each hat variant lived in its own column. So the hat row remains as 1 row, with 3 additional columns. Maybe they are called `Name 1`, `Name 2`, and `Name 3`. That totally works.
But remember, we don't just have the hat row - we have a full list of products that we are applying this to. So the next row is called `sticker` and it has 30 variants! This new process now generates 30 more columns. To keep the table "square" (well, more like a rectangle), every row now has those columns. So we have `Name 1` through `Name 30`, even for products like the hat, which only has data for 3 of them.
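Here's a sketch of that hypothetical column-wise expansion, just to show why it gets messy (this is the alternative we're ruling out, not what Parabola actually does):

```python
# Hypothetical: expand variants into columns instead of rows.
products = [
    {"name": "hat",     "variants": ["red hat", "white hat", "blue hat"]},
    {"name": "sticker", "variants": [f"sticker {i}" for i in range(1, 31)]},
]

# The widest product dictates how many Name columns EVERY row gets.
width = max(len(p["variants"]) for p in products)

rows = []
for p in products:
    row = {"product": p["name"]}
    for i in range(width):
        variants = p["variants"]
        # Products with fewer variants get empty padding columns.
        row[f"Name {i + 1}"] = variants[i] if i < len(variants) else ""
    rows.append(row)

# The hat row now carries 30 Name columns, 27 of them empty.
print(len(rows[0]) - 1)
```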
Okay, that still kind of works, it's just a little messy. So you then use a column filter and say, "I just want to see the column called `Name 10`, because it has a piece of data that I want." The next time this flow runs, and your data has been updated, you may only have products with 2 variants each. Now that column filter cannot find `Name 10`, and it throws an error and stops your flow. That's not good. Additionally, the order of items in an array is not guaranteed by APIs, so `Name 10` might be the purple sticker one day and the blue sticker another day. That kind of inconsistency is not great.
So in order to avoid the mess from above, we move array items into new rows! It means your column names are always guaranteed, and the table still represents the same overall structure.
That was a lot, I know.
So what do we do about this? To take an example from your actual data, John, you have a list of products to start, and each product has a list of measurements. Most items have 1 set of measurements in that list, so those rows stay as a single row. But some products are actually multiple items, each with their own measurements. Let's say you sell a table that comes with a chair. It's 1 product, but there is a set of measurements for the table and for the chair. So the JSON Flattener puts that on two rows.
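The same row-expansion idea applies to your measurements case - a product whose measurements array has only 1 item simply stays as 1 row (the field names here are made up, since I don't know your actual schema):

```python
# One row per product+measurement; single-measurement products stay single rows.
products = [
    {"name": "lamp",        "measurements": [{"height_cm": 40}]},
    {"name": "table+chair", "measurements": [{"height_cm": 75},
                                             {"height_cm": 90}]},
]

rows = [
    {"product": p["name"], **m}
    for p in products
    for m in p["measurements"]
]

# The lamp stays on 1 row; the table+chair product becomes 2 rows.
print([r["product"] for r in rows])
```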
The question then is not how to stop it from making new rows, but more of what do you want to do with those rows - how is your destination system set up to store this information?
In other words, can you describe the ideal shape of your data at the end of all of this?