Combine tables > Match with regex

I need to combine two tables, matching Product Titles. Spreadsheet A has the official Product Titles, and B has product titles that are kinda messy, e.g.:

Sheet A: Gridiron
Sheet B: GRIDIRON LQC 019

Sheet B is all caps, and has extra characters after the product name.

Even when there are no extra characters, the cells don’t match because it’s case sensitive.

Some of the product titles are actually in all caps, so if possible I’d like to avoid changing Sheet B to Title Case.

Is this possible with Parabola?

Hey Nathan,

At this moment, your most stable path forward will be to clean those messy names. We have a step that can change the casing of a column, which will help, and the other stuff will probably need to be done with the Find and Replace step (or the Regex step). It will be a little tedious to clean up all possible options, but the flow will have a stable way in which it is cleaning and then matching your product names.

Hope that helps!

Hi Brian. I’m okay with putting everything into Title Case.

But even after that, there are 250+ unclean Product Names that I would have to put into a Find+Replace. As new products are coming out all the time, in order to make it more future-proof I’d need 2 transforms:

  1. a regex find/replace to “replace everything after [word] including [word]”. I’ve tried a few things but regex isn’t my strong suit.

  2. for the rest I can easily copy/paste the long list of messy names and type the proper name, but want the best UI for the job. maybe can I make something like a Find-Replace table with two columns, first column is Before and 2nd column is After, and then find/replace the Product Titles column using that sheet? or something like that.

Hey Nathan,

Can you give me a few examples of “replace everything after [word] including [word]”? I can try and form some regex for you.

The “text extract” step can be a useful tool as well if there are certain characters or word counts that you can rely on to trim down the name in certain cases.

Another strategy is to use the find and replace or regex to remove the unwanted words. Sometimes those are more likely to repeat than the words that you do want.

Hey Brian, thanks for your time on this.

  1. here are some examples:

Block Party ← correct
Block Party Insd 100 ← unclean, has common word “insd” with random characters after it

Colossus ← correct
Colossus Lqc 028 ← unclean, has common word “lqc” with random characters after it

Then I’ll run Clean data after it to remove trailing spaces, unless you can somehow include that into the regex.

  1. here is a small example of why I need a more efficient Find/Replace:
Add Distribution for Polsat 20
Add Distribution for Public Pe
Add Distribution for Radio Lub
Add Distribution for Radio Reg
Add Distribution for Radio Szc
Add Distribution for Radio Wro
Add Distribution for Tv Polsat
Add Distribution for Tv Puls 2
Add Distribution for Tv Tvn Tu
Add Distribution for Tv Tvp 20
Add Distribution for Tvn Turbo
Add Distribution for Tvn24 201
Add Distribution for Tvp 2013
Add Distribution for Tvp 2016
Add Distribution for Tvp 2017
Add Distribution for Tvp Sa 20
Add Distribution Public Perfor
Add Distribution Puls Tv 2014
Add Distribution Radio Reg Gda
Add Distribution Region Rozgl
Add Distribution Tv Iti Neovis
Add Distribution Tv Polsat 201
Add Distribution Tv Puls 2018
Add Distribution Tv Tvn Turbo
Add Distribution Tv Tvn24 2018
Add Distribution Tv Tvp 2018
Add Distribution Tvp Sa 2018
Additiona Distribution Tvp S A
Additional Distribution Iti Ne
Additional Distribution Polsat
Additional Distribution Public
Additional Distribution Puls 2
Additional Distribution Radio
Additional Distribution Tv Pol
Additional Distribution Tv Pul
Additional Distribution Tvp 20

I need these to all be “Additional Distribution”. Can I do this in one Find/Replace rule by having multiple search terms and one replace term? (Is boolean OR supported? I tried a few things with no luck.) There are about 100 of these, so while one rule for each is possible, it’s messy and will get even messier over the years.

For the regex, try:
((Insd|Lqc) .+)
You can add more common words next to the existing ones in the expression - just separate each one with a | character. In the replace field, leave it blank.

For your second question, the current version of the Find & Replace step allows you to add as many search terms as you want to be replaced by a single term! So 1 rule would do all of those, which I think will help a lot.

You can also click the … menu in those rules to search for the entire cell or just part of the cell, and same with the replace - replace the entire cell (as you probably want) or just replace the piece that was found.

1 Like

Thank you! Worked great (just had to remove the space) and I just realized I can use regex to remove everything after “Additional” in the example above, after a Find/Replace that changes “Add Distribution” or “Additiona” to “Additional”.

I love Parabola!

3 Likes

How do I get the regex to support the number 30 for example?

(I know this isn’t a regex forum but it may help someone :slight_smile:)

Also I found a bug in Regex transfer. To recreate:

Use (?<= in the Expression, close the window and come back, the < changes to &lt; and for some reason sometimes it breaks too, even after changing it back to <… I think if you Update Live then come back it breaks, not sure.