How do I find common keywords

I have a list of product titles (rows) I want to analyze. Is there a way to find common keywords between these rows?

Here is an example:
Row 1: iPhone XS Cover
Row 2: Cover for iPhone S
Row 3: Android Mobile Phone Cover

Is there a way I can find the most common keywords among all rows? In this case the result would be Cover - Count 3, iPhone - Count 2

Hey Saad!

How many rows do you have? If it's not too many, you can use a Column Split, set to split into new ROWS, and set the delimiter to a space.

That will make 1 row per word.

Then you can use the Count Values to count the column of words, and it should give you a count of each word present!

But if you have a lot of rows to start, it will quickly become too many rows.
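
If it helps to see the same idea outside Parabola, here is a minimal Python sketch of the split-then-count steps above. It is not what the Column Split or Count Values steps do internally; the `titles` list and the `count_words` name are just illustrative assumptions based on the example rows in the question.

```python
from collections import Counter

# Example rows taken from the question above.
titles = [
    "iPhone XS Cover",
    "Cover for iPhone S",
    "Android Mobile Phone Cover",
]

def count_words(rows):
    """Split each row on whitespace (one word per 'row') and count every word."""
    counts = Counter()
    for row in rows:
        counts.update(row.split())
    return counts

word_counts = count_words(titles)
print(word_counts.most_common(2))  # [('Cover', 3), ('iPhone', 2)]
```

Note that the split is case-sensitive, so "cover" and "Cover" would be counted separately unless you lowercase the titles first.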

Brian, I just tried it and it works like it should, but I’m not getting the results I was hoping for. I’m trying to analyze Product Titles on multiple stores and find products which are on multiple stores. When I break each product title into multiple words, the data is gibberish at best. Is there any way to process this to find keywords instead of just words?

Hi Saad - Brian may have a better answer since I’m just a Parabola user, but I wanted to make sure you are aware of the concept of “n-gram” analysis, as it can be helpful when it comes to grouping words together for analysis.

I briefly searched and found this API endpoint, which you may be able to use to turn your strings (e.g. “Android Mobile Phone Cover”) into n-grams (e.g. “Android Mobile”, “Mobile Phone”, “Phone Cover”, etc.). You could then use a process similar to the one Brian outlined to find the n-grams with the highest count.
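
For anyone curious what the n-gram step looks like, here is a rough Python sketch. It does not call the API endpoint mentioned above; the `ngrams` helper is a hypothetical name and simply slides a window of `n` words across each title.

```python
from collections import Counter

def ngrams(title, n=2):
    """Return every run of n consecutive words in the title, joined by spaces."""
    words = title.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Example rows taken from the original question.
titles = [
    "iPhone XS Cover",
    "Cover for iPhone S",
    "Android Mobile Phone Cover",
]

# Count each two-word n-gram (bigram) across all titles.
bigram_counts = Counter()
for title in titles:
    bigram_counts.update(ngrams(title, n=2))

print(ngrams("Android Mobile Phone Cover"))
# ['Android Mobile', 'Mobile Phone', 'Phone Cover']
```

With only these three sample titles no bigram repeats, so every count is 1. On a larger list of product titles the repeated phrases should start to stand out.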

@John_Doe is right - if you want to distinguish between “filler” words and “keywords”, n-grams are pretty much the route to take. You can use an API, or you can try to define rules in Parabola to omit some of the more common filler words.

It will be a tough process, though, and is certainly prone to inaccuracies and subjective results, so keep that in mind!
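
As a rough illustration of the "omit common filler words" idea, here is a small Python sketch. The `FILLER_WORDS` set is a made-up starting list, not anything built into Parabola, and you would need to tune it by hand for your own product titles.

```python
from collections import Counter

# Hypothetical filler-word list; extend it to suit your own data.
FILLER_WORDS = {"for", "the", "a", "an", "and", "with", "of"}

def keyword_counts(rows, filler=FILLER_WORDS):
    """Lowercase each row, drop filler words, and count what remains."""
    counts = Counter()
    for row in rows:
        words = [w for w in row.lower().split() if w not in filler]
        counts.update(words)
    return counts

# Example rows taken from the original question.
titles = [
    "iPhone XS Cover",
    "Cover for iPhone S",
    "Android Mobile Phone Cover",
]

counts = keyword_counts(titles)
print(counts.most_common(2))  # [('cover', 3), ('iphone', 2)]
```

Which words count as filler is a judgment call, which is where the subjectivity mentioned above comes in.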
