I have a collection of image URLs obtained through web scraping. Occasionally, the scraping tool duplicates certain image URLs. After organizing these URLs into columns labeled Image 1, Image 2, and Image 3, I created a new column named “Images,” where I consolidated them using a delimiter. However, I am having trouble eliminating the duplicate entries within each row of that column. I know it’s probably something simple, but I’ve tried every remove-duplicates option, even AI, and can’t seem to remove the duplicate image URLs.
Hi @mack - Happy to help! Are the duplicate URLs across the different columns and rows before they’re combined?
Yes, they were in individual columns but in the same row. I combined them into one single column using a delimiter.
Hi @mack,
If I understand correctly, duplicate image URLs appear in your final Images column. To remove duplicates, I recommend using the following steps after your “Edit columns” step:
- Remove duplicates (based on the Page_URL)
- Unpivot columns (Page_URL is the unique identifier. Pivot the individual Image_URL columns)
- Merge duplicate (merge Value based on unique Page_URL)
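For anyone who wants to see the logic of those three steps outside of Parabola, here is a rough Python sketch. It is not how Parabola implements the steps, only an illustration of the same idea: group by Page_URL, gather the image values, keep each one once, and join them back together. The function name, the comma delimiter, and the sample URLs are assumptions made up for this example; the Image 1, Image 2, and Image 3 column names come from the original question.

```python
def merge_unique_images(rows, key="Page_URL",
                        image_cols=("Image 1", "Image 2", "Image 3"),
                        delimiter=","):
    """Collapse rows to one per page, with each image URL listed once.

    Hypothetical helper for illustration only.
    """
    merged = {}  # page URL -> list of unique image URLs, in first-seen order
    for row in rows:
        urls = merged.setdefault(row[key], [])
        # "Unpivot": walk the image columns as a single stream of values.
        for col in image_cols:
            url = (row.get(col) or "").strip()
            # "Merge" with a uniqueness check: skip blanks and repeats.
            if url and url not in urls:
                urls.append(url)
    return [{key: page, "Images": delimiter.join(urls)}
            for page, urls in merged.items()]


# Made-up sample data: the scraper repeated a.jpg within a row and
# also returned the first page twice.
sample_rows = [
    {"Page_URL": "https://example.com/p1",
     "Image 1": "a.jpg", "Image 2": "a.jpg", "Image 3": "b.jpg"},
    {"Page_URL": "https://example.com/p1",
     "Image 1": "a.jpg", "Image 2": "a.jpg", "Image 3": "b.jpg"},
    {"Page_URL": "https://example.com/p2",
     "Image 1": "c.jpg", "Image 2": "", "Image 3": "c.jpg"},
]

result = merge_unique_images(sample_rows)
for row in result:
    print(row)
```

Because the uniqueness check happens while the values are being merged, duplicates are dropped whether they come from repeated rows or from repeated columns within one row.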
Note: Make sure these values are checked to guarantee duplicate Image URLs are removed:
Feel free to copy/paste the snippet below to duplicate the step sequence and configuration listed above:
parabola:cb:19429321-5b70-4ff9-bc3f-2b4cbf25f89d
Let me know if this helps!
Thanks so much for helping me out!