Extract word from column of numbers

I have imported an excel sheet with multiple columns, which I have reduced down using different actions. I now have 3 columns, one of which is almost all numbers, with the exception of one cell that is a word. I am wanting to extract that word and put it in another column and the fill that column with that word. I am having trouble working out the exact regex formula. I have used the \w with [A-z] which shoudl only extract letters and not numbers, but it keeps extracting the numbers as well. I can only think that because in subsequent rows there are no letters that it is extracting/coping the cell over to the new column. I am wanting to stop that, any help would be appreciated.

Hi Justin :wave:

Happy to help! It seems like Regex might not be totally necessary for this extraction. I recreated your data to work through your use case here:

After importing my data, I used an Insert Text Column step to copy my original values into a new column.

I next used the Clean Data step to remove all numbers in that column, leaving me with just the word we hoped to extract.

Finally, I used the Fill in Blanks step to fill blank values in the “Test Values” column with values from both above and below, which populates all cells in that column with your desired word.

Hope this helps! Let me know if this solution works for you.

2 Likes

Thanks Adam_R, That worked a treat. I have another issue you maybe able to help with. It is in the same data. I also have a cell that has a number and letter combination, in the format of SM12345, which I also need to put into its own column and copy down the same as you did with the test values. The fix you supplied me with works when it is just cells of numbers or letters but this is both. Would you have a way to achieve this?
thanks in advance

Hi Justin,

Stepping in for Adam here. There may be a better regex expression, but here’s a workaround that could work.

This starts after using the initial Insert Text Column step mentioned above. We’ll start by using the Use regex step and target any values that do not start with SM and replace it with *

Next, we’ll use an Insert If/else column step. If the example column does not contain *, extract that value into a new column.

Finally, we can use the Fill in blanks step to fill in the cells using values from above and below.

Feel free to remove or rename any of the columns to further clean your data!

Thank you, this worked a treat on two of my flows. I am now having issues with it, unsure why, I assume there is a space or something before the letters.
Is there a regex expression I can use to isolate a reference number that is 2 letters and 4 numbers long, in the format of LL1234.
The expression above works well if the 2 letters at the start are the same, but if the letters at the front are different then it obviously doesn’t find them.
For example the reference numbers could be LF1234 or LF3452 and then in the same file there could be reference numbers that are QG2341 or QG3243 or ST6756 etc etc.
I have looked at regex sites to try and work out the formulas, but anything I come up with doesn’t work in parabola.

Hey Justin,

This should be the regex expression that you are looking for:

[a-zA-Z]{2}\d{4}

If your data includes only uppercase letters, you can instead use this expression:

[A-Z]{2}\d{4}

The beginning of the expression, [a-zA-Z]{2}, targets two letters at the start of your string. \d{4} identifies 4 digits after the characters. Let us know if this works for your dataset, and if you have any additional questions!

1 Like

Thanks Adam,

That mostly worked. I had to change the expression slightly to ^[^A-Z]{2} and remove the \d{3} for it to work for what i needed. In the column I am using the expression on it has 4 different types of information. The majority of the cells are dollar amounts in the format of $11.11 or if there is no $ value it shows up as $- and then there is one cell of all letters as a name like CHICKEN and the cell I am trying to keep will most likely be a cell of 2 letters and 4 numbers like mentioned in the last post NP2345.
When I use the expression as you sent it I get the replacement value of a * in the letter number cell as in the screen shot below
[A-Z]{2}\d{3}
Screen Shot 2021-05-11 at 06.18.23

If I use the adjusted formula I get more of what I want, although as you can see the word Beef is still there as well. Not that it matters to much because later in the flow I fill the column from above and the number letter combination fills down the column
^[^A-Z]{2}
Screen Shot 2021-05-11 at 06.23.52

So I am unsure if I need to it to recognise the 4 numbers after the two letters, but in case I do is there a way for it to leave a combination of letters and numbers?
Thanks for your help

Hi @Justin_Saunders,

If I’m understanding correctly, you want to fill the entire column using the value of NP2244. To do this, you can set up a separate regex step with 2 rules that create two columns:

Column 1

Find:
[^A-Z]{2}

Replace:
*

Column 2
Find:
[A-Z]{4}

Replace:
*

From there, we can use an if/else to create a new column called Do not target. If the values in Column 1, contain ฿, set the value to Yes. If the values in Column 2 contain *, is blank, or contain -, also set the value to Yes

We can use one more if/else step to target the values in your Do not target column. If the value is blank, set it to the value in {Column 2}. That should create a new column that only has the value NP2244.

Lastly, use another step to fill in blank values from both above and below. The resulting column should show just that single value throughout the column.

If you have any additional comments about this flow, feel free to reach out to us at help@parabola.io and we can work through this together.

Thanks Daniel,

thank all seems to work as I need it to.