You can remove URLs that contain a blacklisted word by using the Find Overlap step. You’ll want to keep rows from the CSV file that “match” your imported killwords.
To decide what counts as a match, the Find Overlap step searches your URL column for the values in the Killwords column. These sidebar settings should do the trick:
The resulting URLs shouldn’t contain any killwords. Depending on your dataset, you may have to adjust the approximation percentage, but I found 20% worked well.
Thanks @daniel. This kinda works, but it’s very fuzzy and I’m not really sure about the quality of the results. I tested it with a file of roughly 10k rows, and just changing the percentage from 20% to 19% produces massively different results without me actually knowing what’s going on. Ultimately, what I need to know is whether the exact string exists in a URL.
My guess is that a 20% match could also mean that I get a positive with “domain.com/blog” even though my killword is “/blog/”, or am I mistaken?
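To make the "exact string" requirement concrete, here is a minimal Python sketch of the check being asked for, outside of any flow tool. The helper name `contains_killword` and the sample URLs are made up for illustration; only the killword “/blog/” and the URL “domain.com/blog” come from the question above.

```python
def contains_killword(url, killwords):
    """Return True if any killword appears verbatim (exact substring) in the URL."""
    return any(word in url for word in killwords)


# Hypothetical sample data; "/blog/" is the killword from the question.
killwords = ["/blog/"]
urls = ["domain.com/blog", "domain.com/blog/post-1", "domain.com/about"]

# Keep only the URLs that contain none of the killwords.
kept = [url for url in urls if not contains_killword(url, killwords)]
print(kept)
```

With an exact substring test, “domain.com/blog” is kept because it has no trailing slash and so does not contain “/blog/”. A fuzzy match could plausibly flag it anyway, which is the concern raised here.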
The results can definitely be fuzzy, so thanks for trying that out. @anon36387498 came up with a great solution that should be accurate. It’s not too different from the first flow.
Import your data tables and add an Insert Column step after each import. Let’s create a new column called Join and add a text value of 1. This will help us merge our tables.
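For anyone who wants to see the idea behind the constant Join column, here is a rough Python sketch. It assumes the two tables are then joined on Join, so that every URL gets paired with every killword, and that the pairs are filtered with an exact "contains" test. Those later steps are my assumption rather than a description of the actual flow, and the helper names and sample rows are hypothetical.

```python
# Hypothetical sample tables, one dict per row.
urls = [{"URL": "domain.com/blog/post-1"}, {"URL": "domain.com/about"}]
killwords = [{"Killwords": "/blog/"}, {"Killwords": "/tag/"}]


def insert_column(rows, name, value):
    """Return a copy of the table with a constant-valued column added."""
    return [{**row, name: value} for row in rows]


def join_on(left, right, key):
    """Pair up rows whose key values are equal (a simple inner join)."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Every row gets Join = 1, so the join pairs each URL with each killword.
pairs = join_on(
    insert_column(urls, "Join", 1),
    insert_column(killwords, "Join", 1),
    "Join",
)

# Assumed follow-up: flag URLs that contain a killword exactly, then drop them.
flagged = {p["URL"] for p in pairs if p["Killwords"] in p["URL"]}
clean = [row["URL"] for row in urls if row["URL"] not in flagged]
print(len(pairs), clean)
```

Because the join produces rows-times-rows pairs (2 x 2 = 4 here), a 10k-row URL list with a long killword list can get large, so it may be worth testing on a sample first.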