Trying to Combine Tables but getting settings error

Hi, I’m trying to combine tables, but I get a settings error saying joins are limited to 5 million rows, even though there are only 24k rows. With a different file (4k rows) it did work. A calculation error popped up once as well.

I’m splitting up a csv file into 23 steps then combining again. So even if it’s 24k x 23 it’s still within all limits. I also tried combining just 2 of the steps as a test and that failed too. When I try it with a smaller file (4k rows) it works.

Something isn’t right, I’ve looked at all the hard limits, the data is within all those limits, looks like a bug?

I’ve attached a screenshot. Please note the Combine Tables step is unhooked at the moment, and there are 23 sets of calculations that you see going up vertically before they are re-combined. Though like I said, even trying to combine data from just 2 sets fails.

Thanks

Hi @Vanja - Not sure about the 5 million row limit error, but the calculation error most likely means that the step timed out when trying to complete the combine. This would depend on what criteria you were using for the matching in the combine step.

It sounds like for each of the 23 branches you’ve created, you’re adding new columns based on a set of rules. I imagine some of the columns in your original CSV stay the same in every single branch, and when you combine the data back together, you want those new columns to be added to the right of your original CSV file.

Here’s what I recommend to make the Combine Tables step work a lot better for you:

  1. In the top branch, keep everything as is. This will become your “primary table” for the Combine Tables step.
  2. In all of your subsequent branches, add a Remove Columns step and select to keep, not remove, the unique ID column you’ll use to join the tables back together, plus just the new columns you’ve created in that branch.
  3. Then, in your Combine Tables step, make sure to connect the top branch to that step first so that it becomes your primary table. Connect all the other branches and use the unique ID column as the matching rule for all the tables. Make sure to set the matching rule to keep all rows in the primary table and only matches from all the other tables.

This will bring the tables back together and append the new columns you’ve created to the right of your original CSV file.
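For anyone who finds it easier to reason about this in code, here is a minimal Python sketch of what that matching rule amounts to: a left join that keeps every row of the primary table and appends each branch's new columns by unique ID. This is only an illustration, not how Parabola implements the step; the function name `combine_tables`, the `id` column, and the sample rows are all made up for the example. It also assumes IDs are unique within each branch (with duplicates, this sketch keeps only the last one).

```python
def combine_tables(primary, branches, key):
    """Keep all rows of `primary`; append each branch's columns where `key` matches."""
    combined = [dict(row) for row in primary]  # copy so the input is left untouched
    for branch in branches:
        # Index the branch by its unique ID (assumes no duplicate IDs per branch).
        lookup = {row[key]: row for row in branch}
        # Assumes every row in a branch has the same columns as its first row.
        new_columns = [c for c in (branch[0] if branch else {}) if c != key]
        for row in combined:
            match = lookup.get(row[key], {})
            for col in new_columns:
                row[col] = match.get(col)  # None when the branch has no matching row
    return combined


# Hypothetical data: one primary table and two slimmed-down branches
# that carry only the ID column plus the column they created.
primary = [
    {"id": 1, "title": "Shirt", "price": 20.0},
    {"id": 2, "title": "Hat", "price": 10.0},
    {"id": 3, "title": "Socks", "price": 5.0},
]
branch_a = [{"id": 1, "sale_price": 18.0}, {"id": 2, "sale_price": 9.0}, {"id": 3, "sale_price": 4.5}]
branch_b = [{"id": 1, "margin": 0.4}, {"id": 2, "margin": 0.5}, {"id": 3, "margin": 0.3}]

result = combine_tables(primary, [branch_a, branch_b], key="id")
for row in result:
    print(row)
```

The row count of the result always equals the row count of the primary table, and the new columns land to the right of the original ones.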

Let me know how that goes!


@sachi thanks for your input. I tried doing as you said, and tested 5 branches but still get the same error. Here’s a screenshot

Thanks

Hmm that error message does seem weird. I believe you might be timing out (calculation error), but not hitting the 5 million rows limit. I’ve reported this as a bug because I think we’re displaying the wrong error message in this case.

I think I figured out a way to help make your flow more efficient. The main thing to change is to remove the blank rows that do not need to be “recombined” in the Combine Tables step.

Here are my suggestions:

  1. After your CSV step, add a Remove Columns step to filter for just the columns you’ll need for your 23 branches. Reducing the number of columns passing through each branch will help your flow run more efficiently.
  2. Then, replace the If/Else step where you’re creating your separate branches with a Remove Rows step. This will reduce each branch to only the rows that previously matched your If/Else rule.
  3. Then, keep your math operation and make sure to remove any duplicate columns at the end of each branch.
  4. When you’re ready to combine the tables back together, you’ll actually use your original CSV file as the primary table now. Then, add all the branches back in by matching Variant SKU. This way, only one table (your CSV file) will have the total rows and all your branches will only be joining back the rows that are affected by your rules.
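If it helps to see the idea outside of the flow builder, here is a rough Python sketch of the steps above: each branch filters down to only the rows its rule applies to, keeps just Variant SKU plus its new column, and then gets joined back onto the original table. Everything except the Variant SKU column name is an assumption made for illustration (the `Vendor` and `Price` columns, the rules, the math, and the helper names `run_branch` and `join_back`), and it is not a description of how Parabola works internally.

```python
# Hypothetical stand-in for the original CSV.
original = [
    {"Variant SKU": "A-1", "Price": 20.0, "Vendor": "Acme"},
    {"Variant SKU": "B-2", "Price": 10.0, "Vendor": "Bolt"},
    {"Variant SKU": "C-3", "Price": 5.0, "Vendor": "Acme"},
]


def run_branch(rows, keep_row, new_column, compute):
    """Filter to the rows the rule applies to, then return only SKU + the new column."""
    return [
        {"Variant SKU": r["Variant SKU"], new_column: compute(r)}
        for r in rows
        if keep_row(r)
    ]


def join_back(primary, branches, key="Variant SKU"):
    """Keep every primary row; fill in branch columns only where the SKU matches."""
    combined = [dict(r) for r in primary]
    for branch in branches:
        lookup = {r[key]: r for r in branch}  # assumes one row per SKU in each branch
        cols = sorted({c for r in branch for c in r if c != key})
        for row in combined:
            match = lookup.get(row[key], {})
            for c in cols:
                row[c] = match.get(c)  # None for rows the branch's rule did not touch
    return combined


# Two made-up branches standing in for the 23 in the real flow.
acme_branch = run_branch(original, lambda r: r["Vendor"] == "Acme", "Acme Price", lambda r: r["Price"] * 2)
bolt_branch = run_branch(original, lambda r: r["Vendor"] == "Bolt", "Bolt Price", lambda r: r["Price"] + 1)

combined = join_back(original, [acme_branch, bolt_branch])
branch_sizes = [len(acme_branch), len(bolt_branch)]
print(branch_sizes)
for row in combined:
    print(row)
```

Only the primary table carries the full row count here; each branch hands the join just the rows it actually changed, which is the point of swapping If/Else for Remove Rows.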

This should help make the flow more efficient. If you need more detailed instructions on how to set this up for your specific flow, shoot us an email at help@parabola.io!