Regex: Extracting data from string

I am trying to extract data from a string using regex and have that extracted data become the value of the new column.

I have read through https://parabola.io/transform/use-regex and many Q&A from the community, but cant quite find the answer to my problem.

The string is a URL
Example: https://www.test.com/item-name-t3-2?source=twi&medium=twi%6tFeed&campaign=WM+-+6%36+YAY+Targeting+4.3+%e6%8b&content=93746788475070651&ad_id=6384778084577065&twiid=PBCa8s4MYMdOFo1zYcHMKUXRaXFh6Zkhq%5fBuvlHPLnZbfI6Y%5em

My expression works on the site: https://regexr.com/
My Expression: (?<=ad_id=)\d*|(?=&)
Which would result in capturing the ad_id 6384778084577065

The Problem:
When I make my regex call it is just removing the ad_id from the url and inserting the new url as the value for the new column.

URL Column= https://www.test.com/item-name-t3-2?source=twi&medium=twi%6tFeed&campaign=WM+-+6%36+YAY+Targeting+4.3+%e6%8b&content=93746788475070651&ad_id=6384778084577065&twiid=PBCa8s4MYMdOFo1zYcHMKUXRaXFh6Zkhq%5fBuvlHPLnZbfI6Y%5em

NEW Column = https://www.test.com/item-name-t3-2?source=twi&medium=twi%6tFeed&campaign=WM+-+6%36+YAY+Targeting+4.3+%e6%8b&content=93746788475070651&ad_id=&twiid=PBCa8s4MYMdOFo1zYcHMKUXRaXFh6Zkhq%5fBuvlHPLnZbfI6Y%5em

I am wanting the NEW column to be: 6384778084577065 and if no ad_id is found leave column blank

Hi @Noah_Orr,

Welcome to the community :wave:

The easiest way to go about this is to use the “Use regex” step and the “Extract text from column” step. This should work nicely if your URL parameters are separated by &.

Start by using the regex expression (.*)ad_id=. This will capture everything up until the ad_id value.

Next, we can extract all characters before the first instance of & into a new column.

This should work for all URLs following that format. If a URL does not contain an ad_id, the final column will remain blank.

Let me know if that helps!

Got it! Thank you for the help!

It looks like you don’t need to use regex, you can simply use “Extract text from column” twice.

On the first extract to remove the first part of the URL (Everything before the ad_id).
image
Which will return: 6384778084577065&twiid=PBCa8s4MYMdOFo1zYcHMKUXRaXFh6Zkhq%5fBuvlHPLnZbfI6Y%5em

and on the second extract you remove everything after the ad_id.
image

Which returns: 6384778084577065

I am not sure if that will affect performance, but for someone reading this solution and doesnt understand regex very well. This is an easier solution.

1 Like