New Parabola user here. Is there an easy way to strip HTML from a column and turn it into plain text?
Many thanks in advance…
Hey Brian,
Your best bet is to use a Regular Expression in the RegEx step.
I looked at the community posted expressions on https://regexr.com/ and this one seems best:
(<script(\s|\S)*?<\/script>)|(<style(\s|\S)*?<\/style>)|(<!--(\s|\S)*?-->)|(<\/?(\s|\S)*?>)
Try that out in the expression box. Set it to replace with nothing.
If you have non-tag things you need to remove, like non-breaking spaces, then you will need to take care of those separately.
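For anyone who wants to try the expression outside Parabola first, here is a rough Python sketch of the same idea. It is not how the RegEx step works internally; the function name strip_html is made up for illustration, and using html.unescape to handle non-breaking spaces is my own assumption about one way to do the separate cleanup.

```python
import html
import re

# The expression from the post above, copied unchanged.
TAG_PATTERN = re.compile(
    r"(<script(\s|\S)*?<\/script>)|(<style(\s|\S)*?<\/style>)"
    r"|(<!--(\s|\S)*?-->)|(<\/?(\s|\S)*?>)"
)


def strip_html(value: str) -> str:
    """Hypothetical helper: remove tags, then clean up leftover entities."""
    # Step 1: replace every match with nothing, as suggested above.
    without_tags = TAG_PATTERN.sub("", value)
    # Step 2 (assumption): decode entities such as &nbsp; into characters.
    decoded = html.unescape(without_tags)
    # Step 3: collapse whitespace runs, including non-breaking spaces,
    # into single ordinary spaces.
    return " ".join(decoded.split())


cell = "<p>Hello&nbsp;<b>world</b></p><script>alert(1)</script>"
print(strip_html(cell))  # Hello world
```

The tag removal and the entity cleanup are kept as separate steps on purpose, since the regex only targets tags, comments, and script/style blocks.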
If you want to extract the text content of a full webpage, using @brian’s regex is pretty safe.
If your columns just contain very simple HTML snippets, and you know there aren’t any <script> or <style> tags, you might be able to use a simpler expression:
<.*?>
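A quick way to see where the simpler expression is enough and where it falls short is to run it on two sample values. This is only a Python sketch with made-up sample strings, and other regex engines may differ in details.

```python
import re

# The simpler expression from above: any "<", then as little as possible, then ">".
SIMPLE_TAG = re.compile(r"<.*?>")

# A plain snippet: every tag is removed and the text is kept.
snippet = "<p>Ships in <b>2 days</b></p>"
print(SIMPLE_TAG.sub("", snippet))  # Ships in 2 days

# With a <script> block, only the tags go; the script body stays behind.
with_script = "<script>var x = 1;</script>Hello"
print(SIMPLE_TAG.sub("", with_script))  # var x = 1;Hello
```

The second result is why the longer expression is the safer choice for full webpages.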
Brian & Zach - thanks so much! Both approaches seem to work.