Merge Columns and remove duplicates

I have a collection of image URLs obtained through web scraping. Occasionally, the scraping tool duplicates certain image URLs. After organizing these URLs into columns labeled Image 1, Image 2, and Image 3, I created a new column named “Images,” where I consolidated them using a delimiter. However, I am having trouble eliminating the duplicate entries within that column. I know it’s probably something simple, but I’ve tried all the remove-duplicates options, even AI, and can’t seem to remove the duplicate image URLs.

Hi @mack - Happy to help! Are the duplicate URLs across the different columns and rows before they’re combined?

Yes, they were in individual columns but in the same row. I split them into one single column using a delimiter.

Hi @mack,

If I understand correctly, duplicate image URLs appear in your final Images column. To remove duplicates, I recommend using the following steps after your “Edit columns” step:

  1. Remove duplicates (based on the Page_URL)
  2. Unpivot columns (Page_URL is the unique identifier; unpivot the individual Image_URL columns)
  3. Merge duplicate (merge Value based on unique Page_URL)

Note: Make sure these values are checked to guarantee duplicate Image URLs are removed:

[image: screenshot of the settings to check]

Feel free to copy/paste the snippet below to duplicate the step sequence and configuration listed above:
parabola:cb:19429321-5b70-4ff9-bc3f-2b4cbf25f89d
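
For anyone who wants to see the same idea outside Parabola, here is a minimal Python sketch of the unpivot, de-duplicate, and merge logic. It is only an approximation of the steps above, not what Parabola runs internally. The function name `merge_unique_images`, the comma delimiter, and the example.com URLs are assumptions made up for illustration; the column names Page_URL and Image 1 to Image 3 come from this thread.

```python
def merge_unique_images(rows, key="Page_URL",
                        image_cols=("Image 1", "Image 2", "Image 3"),
                        delimiter=","):
    """Return one row per page with a de-duplicated, joined Images value."""
    merged = {}  # page URL -> ordered list of unique image URLs
    for row in rows:
        urls = merged.setdefault(row[key], [])
        # "Unpivot": treat each image column as its own value for this page.
        for col in image_cols:
            url = (row.get(col) or "").strip()
            # Skip blanks and any URL already collected for this page.
            if url and url not in urls:
                urls.append(url)
    # "Merge": join each page's unique URLs back into a single Images value.
    return [{key: page, "Images": delimiter.join(urls)}
            for page, urls in merged.items()]


# Hypothetical sample data: the scraper repeated a.jpg on the first page.
rows = [
    {"Page_URL": "https://example.com/p1",
     "Image 1": "https://example.com/a.jpg",
     "Image 2": "https://example.com/a.jpg",
     "Image 3": "https://example.com/b.jpg"},
    {"Page_URL": "https://example.com/p2",
     "Image 1": "https://example.com/c.jpg",
     "Image 2": "",
     "Image 3": "https://example.com/c.jpg"},
]

for merged_row in merge_unique_images(rows):
    print(merged_row)
```

The important detail is that duplicates are dropped while the URLs are still separate values. Once they have been joined into one delimited string, a remove-duplicates step only compares whole strings, so it cannot see repeats inside a cell.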

Let me know if this helps!

Thanks so much for helping me out!