Using Parabola's description literally: "Remove duplicate rows based on a key column"
The base functionality of this step is easy to grasp: remove duplicates. To do that, you need to specify the column that should be used to determine whether any rows are duplicates of each other.
I am trying to remove duplicate phone numbers from records. To reduce the problem to its simplest form, I've split the phones column into rows, so each row contains the name, the address and one phone number as column values.
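For anyone reproducing this outside Parabola, here is a minimal Python sketch of the "split into rows" step, assuming the original phones are stored in a single field named `phones` and separated by commas. The field names and the helper `split_phones_into_rows` are hypothetical; the real delimiter in the source data may differ.

```python
# Hypothetical input: one record per person, with all phone numbers in a single
# "phones" field separated by commas (an assumption; the real delimiter may differ).
records = [
    {"name": "John Doe", "address": "123 Main St",
     "phones": "704-555-1212, 704-555-1234, 704-555-1212"},
]


def split_phones_into_rows(rows, column="phones", sep=","):
    """Return one output row per phone number, copying the other columns."""
    out = []
    for row in rows:
        for phone in row[column].split(sep):
            phone = phone.strip()  # trim spaces left around the delimiter
            if not phone:
                continue  # skip empty entries, e.g. from a trailing comma
            new_row = {k: v for k, v in row.items() if k != column}
            new_row["phone"] = phone
            out.append(new_row)
    return out


for r in split_phones_into_rows(records):
    print(r["name"], r["address"], r["phone"])
```

Stripping whitespace here matters for the later dedupe: without it, " 704-555-1212" and "704-555-1212" would be different values.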
As an example:
John Doe 123 Main St 704-555-1212
John Doe 123 Main St 704-555-1234
John Doe 123 Main St 704-555-1212
I then set the step to dedupe on the phone column, which should eliminate any row whose phone number duplicates an earlier one, but it isn't doing that. As I understand it, the example above should leave 2 rows (the second 704-555-1212 row should be removed). Is that correct?
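The expected result can be checked with a minimal Python sketch of "remove duplicate rows based on a key column", assuming the step keeps the first row for each distinct key value. This is not Parabola's implementation (which may differ, e.g. in which duplicate it keeps); the function name `remove_duplicates` is hypothetical.

```python
# Rows from the example above: (name, address, phone).
rows = [
    ("John Doe", "123 Main St", "704-555-1212"),
    ("John Doe", "123 Main St", "704-555-1234"),
    ("John Doe", "123 Main St", "704-555-1212"),
]


def remove_duplicates(rows, key_index):
    """Keep the first row for each distinct value in the key column, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        key = row[key_index]
        if key in seen:
            continue  # a row with this key value was already kept
        seen.add(key)
        kept.append(row)
    return kept


deduped = remove_duplicates(rows, key_index=2)  # column 2 is the phone number
for row in deduped:
    print(*row)
# John Doe 123 Main St 704-555-1212
# John Doe 123 Main St 704-555-1234
```

Note that this sketch compares key values exactly, so two phones that look the same but differ in hidden whitespace or formatting (for example "704-555-1212" and "704-555-1212 ") would both be kept.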
I would appreciate any clarification on this logic. Thanks!