Newbie joining tables question

Hello, new user is here. Hope someone can help. I have two CSV files: one is orders report with order ID in each row and one with order line items report, where each row has item ID and order ID it belongs to. For one order ID there could be several rows in second CSV file, each would have different item ID.
Now the goal is to add columns Item1, Item2 and Item3 to first CSV file (order report), which will be fed from second CSV file.
I hope I make sense with my explanation, I would have attached two CSV files as an example, but looks like it is impossible to attach anything but images.
Thank you in advance.

Hi Gregory,

If I understand your problem correctly, there are multiple ways you can solve it with Parabola. In my opinion, the easiest might be to start by turning the item ID rows into item columns and then joining the two files together based on order ID.

I’m just guessing what the data in your second file looks like, but I mocked up an example using these three steps:

Column Filter settings:

Merge Values settings:

Column Split settings:

Once you’ve gotten to this point, you should be able to add a Join step where the primary table is your first file and you join this cleaned up version of the second file onto it.

Let me know if this makes sense and if I’m understanding your data correctly!

Thank you so much for the very thorough answer. I have some additional questions:
The first table already has Item1, Item2, Item3 until Item10 columns. Is there way during JOIN to copy values from second table instead of just appending new columns?
Also is there any way to address the variable number of items? In some source files there will be only 1 item per order, next day there may be orders with 5 different items per order. I do not want to modify the flow every day…
Thank you again for answering my questions.

For your first table that already has the columns, Item 1, Item 2, Item 3, etc, do they already have values in them? Or is it just the column headers? If you can attach a screenshot of the current structure of your two tables, that’ll help us suggest the best solution for joining. (Feel free to black out any private information).

Not sure how to make such a screenshot, there are many columns unrelated to the discussion and it does not fit into one screen :slight_smile:
In general, table 1 contains columns OrderID with actual unique order numbers and empty columns Item1, Item1Quantity, Item2, Item2Quantity, Item3, Item3Quantity etc up to Item10, Item10Quantity.
Table 2 contains non-empty columns OrderID (non-unique), Item, ItemQuantity.
The goal is to fill ItemX and ItemXQuantity in table 1 with actual data from table 2.
Thank you for your help.

Thanks for the additional info. The matching up of the columns across Table 1 and Table 2 are difficult because of the variable changes this Parabola flow will need to support every day. I created a dummy flow to demonstrate how you can build something that’s close to what you’re looking for (just the column names are different).

I know that flow looks complicated, so I’ll explain step-by-step.
I’m envisioning your Table 1 looks something like this:

I’m envisioning your Table 2 looks something like this:

So with those 2 sources as my initial imports (represented by the two Google Sheet Import steps):

  1. Use a Column Filter step to only keep the Order ID column. This is what I was mentioning at the beginning about not being able to match up your existing column names.
  2. Use a Join step to match up the tables by Order ID. Your results should look like this:
  3. Use the Merge Values step to merge Item and Item Quantity columns:
  4. Use a Column Rename step to rename Item (concat) to item and Item Quantity (concat) to item quantity.
  5. Use a Column Split step to split the item column into multiple columns:
  6. Use another Column Split step to split the item quantity column into multiple columns. The New Column Names should be item quantity instead.
  7. Use the Unpivot step and select the column Order ID as the Unique Identifier column.
    Now, this next section is how we’re going to reorder the columns correctly
  8. Use the Row Filter step and filter to include rows in Type if they do not contain quantity.
  9. Use a Row Number step to add a column of row numbers
  10. Use a Math step to create a new column name called Sort Order.
  11. Use a Column Filter step to remove the Row Number column.
  12. Repeat steps 8-11 for the rows where Type does contain quantity. You should be able to duplicate all the other steps.
  13. Then use a Table Merge step to stack the two branches back together
  14. Use a Sort step and choose to sort by the column Sort Order where order is ASC.
  15. Use a Column Filter step and remove the column Sort Order.
  16. Use a Pivot step to pivot the table back. You’ll see the the columns for item and item quantity are now in the desired order.

That’s it, just choose the destination step that makes sense to you depending on where you want the cleaned data to go. As you can see in the screenshot above, the column structure is similar to the one that you requested, but the column names do not match your original naming structure. You could always add a Column Rename step to fix it, but this is where the manual updates would be required from you. If you’re not particular about the column names, this suggested solution should handle itself nicely when new csvs are uploaded.

Phew! I know that was a lot of information so let me know if I can help clarify anything!

Thank you very much! I was able to replicate your line of thought with several modifications.

  1. Since table 1 contained more information per order (name, address etc) on Step 3 while Merging Values I had add these extra columns to the list “Create a new row for each unique value in the following columns…” section. Otherwise it just threw them all away.
  2. In step 5 I used “New Column Names (Optional)” section to give names Item1,Item2,Item3 etc until Item10. This way it created names exactly like I need column names to be! Similar workaround I did for step 6. For Quantities.
  3. Since I had more columns than needed, on Step 7 during Unpivoting I had to add them to “Pivot all columns Except” list.
  4. Since Order ID can contain letters, on Step 10 instead of Math I had to use Text Merge with text “{Row Number}_{Order - Number}” in order to create Sort Order column.
    Now I have another question :slight_smile:
    Let’s consider table 2 as before: Order, Item, Item Quantity.
    I would like to “cross” it with reference table 3, which contains the following information:
    Item, Item Category, Country of Origin, Cost
    What I need to do is take table 2, replace data in Item column with Item Category from table 3, and add column with Country of Origin and column with Cost (also taken from appropriate row in table 3).
    Is it possible?

Thanks for detailing the modifications. Glad you were able to get it to work with your own data!

One note I want to make about your modifications. For #2, the column names should work for now, but if you ever have orders that exceed the column names that you’ve mapped out (basically anything with more than 10 items) the column names will revert to old naming. Giving you a heads-up in case you encounter this issue in the future.

Let me make sure I understand the final output you’re looking for after cross-referencing Table 3:

Your final column headers would look something like Order ID, Item Category 1, Item Quantity 1, County of Origin 1, Cost 1, Item Category 2, Item Quantity 2, County of Origin 2, Cost 2.

Am I understanding that correctly? If so, generally, I would suggest joining that third table with the first Join step.

Not exactly. I would like to keep Item1 column header, just put there Category data instead. The rationale is that consider the final output kind of a manifest. In manifest it does not matter whether it is t-shirt, undershirt or a sweatshirt. Just shirt is enough.
I think If I had only one Item column, I could have joined, delete Item column, rename Item Category column to Item column and be done, but I have variable number of Item columns. How can I overcome this problem? Maybe I should do this step somewhere in the middle of your previous steps? Maybe just after step 2? Not sure how it will affect all subsequent operations…

Got it. I would suggest importing Table 3 and then joining with Table 2 first using a separate join step where you match Item from Table 2 and Item from Table 3. That Join step should append the columns Item Category, County of Origin, and Cost to each of your rows.

Then proceed with the Join step you’ve already configured to join Table 1 and 2 by Order Id. Once you get there, you’ll need to make adjustments to the rest of the flow to account for these additional columns (County of Origin and Cost), but your instinct is correct to just rename Item Category column to Item and remove your original Item column.

Looks like this will do the job :scream: . I did not expect my first project to be something like this :slight_smile: