Introducing the Find Overlap step

rosemary · July 15, 2020, 10:15pm

Hey everybody,

I’m Liam, a software engineer at Parabola. Today I’m pleased to introduce you to the Find Overlap step, which is a major upgrade from the powerful List Contains step you may be familiar with.

Find Overlap allows you to compare one table to another to see only the rows that match according to criteria you define. These criteria can be complex, so the step is configurable in plain English so that you can easily describe what data you’re looking for:

It’s live as of this afternoon, so please take it for a spin and let us know what you think. We think it’ll come in handy when you’re trying to reconcile large sets of data.

temirlan · August 3, 2020, 10:36pm

So how do we do the opposite of “find overlap”? I know that we could have done this with !list contains option previously.

brian · August 3, 2020, 10:49pm

Hey Temirlan!

Basically you invert the rules in your Find Overlap.

If you have 1 rule, you can change it to not match. If you have many rules, do the same thing, and switch to between using ALL or ANY of the rules - whichever you are using, do the opposite!

That should find you all of the rows that were removed by the initial settings.

THOMAS_A_HUSMANN · August 6, 2020, 4:19pm

Hi, I love the new feature. Been using it for so many things and can’t thank you enough for making it happen.

I may have encountered a bug though and was hoping to get some feedback or help with flow 68737. The goal is to identify the most recent purchase of any given item as it occurs in a list of line items from multiple invoices. THe list contains 4300 +/- lines with 50 columns. On one fork I dedupe based on UPC and combine the invoicedDate items into a single column separated by a comma, it produces around 1200 unique results. Then I break the comma separated values out into columns and flip the table and find the max value in each unique UPC’s column. Then find/replace " (max)" with nothing and flip the table back. Now I have UPC’s in column 1 (Group) and the highest invoicedDate value in column 2 (Total). The next step is where I have a problem.

When the find overlap tool is used to compare against the original list of invoice items using two parameters and “if all of the following are true” I only find 50 matches. The first rule says its a match if the UPC and Group match, and second says it’s a match if the invoicedDate and Total match. Considering all of the matches came from same dataset, I am expecting there to be 1200+ matches not 50. Is this a bug?

I did a few spot checks to see if perhaps the data got slightly offset, and so far every match from the deduped list of 1200 UPC’s matches up to a row in the original data.

I imagine that if this were a list of people and we were looking for marketing matches, some would be dismayed to find it left out so many potential candidates. What’s happening?

EDIT The Join feature appears to do the job I need. Can you explain the difference?

brian · August 6, 2020, 8:21pm

Hey Thomas!

That is a really fascinating path that you’ve described and built in that flow. If I am not mistaken, it is finding the most recent invoice for each UPC - is that correct? Using the largest numerical size of the invoice date per UPC to accomplish that.

If that is the goal, then take your list that represents 1 UPC and Timestamp per row, sort it by both Timestamp and UPC (both descending). The order you add the rules makes a difference. This should result in all of the same UPC’s grouped together, with the timestamps sorted in each “group”.

Then, you can use a dedupe on the UPC column to keep the row per UPC that has the largest timestamp value!

That should be a lot easier to maintain and read!

From there, your join method is probably better in this instance, since you want the information from both tables, but the Find Overlap and the Combine Tables (Join) should give you the same number of rows if they are set up in the same way.

Could you add the Find Overlap back next to your join so that I can see it producing different results?

THOMAS_A_HUSMANN · August 6, 2020, 10:03pm

Thank you for the response. I added it back in and labeled it as “For BRIAN” it has 52 matches and should have over 1200.

I like your solution and good timing too. The end result will be a report that shows each UPC, the most recent purchase date, the most recent purchase quantity, how many sold since that date, minimum order quantity, and the average sold per week. Using those variables I will draft a formula that decides whether it’s time to reorder then create the order with our vendor and add those items to it. I didn’t realize that the dedupe step takes the first value of the group when merge isn’t enabled. That’s good to know.

brian · August 7, 2020, 5:40pm

Hey Thomas,

Thanks for adding that in. Looking through it right now, and it does seem like something is amiss. I’m going to sync with Liam, who worked on this last, and see if we can’t figure out what is going on. I’ll keep you posted in this thread.

Glad you like the solution! Sorting and then deduping is a really powerful pattern to use when you need to reduce the number of rows in an arbitrary group.

Topic		Replies	Views
The Find Overlap step has been updated and simplified! Announcements	0	455	January 21, 2021
JOIN two tables and keep results that don't match Ask a question	3	642	November 28, 2020
Unable to find 'List Contains' function - is it deleted? Ask a question Building-Flows	3	468	October 29, 2020
Remove overlap from list Ask a question	2	403	December 11, 2020
Find rows not equal between two tables Ask a question Building-Flows	15	985	November 14, 2020

Introducing the Find Overlap step

Related topics