Friday, May 7, 2010

Remove duplicate records/objects uniquely identified by multiple attributes

Programmer Question


I have a model called HeroStatus with the following attributes:




  • id

  • user_id

  • recordable_type

  • hero_type (can be NULL!)

  • recordable_id

  • created_at



There are over 100 hero_statuses, and a user can have many hero_statuses, but can't have the same hero_status more than once.



A user's hero_status is uniquely identified by the combination of recordable_type + hero_type + recordable_id. What I'm trying to say essentially is that there can't be a duplicate hero_status for a specific user.



Unfortunately, I didn't have a validation in place to assure this, so I got some duplicate hero_statuses for users after I made some code changes. For example:



user_id = 18
recordable_type = 'Evil'
hero_type = 'Halitosis'
recordable_id = 1
created_at = '2010-05-03 18:30:30'

user_id = 18
recordable_type = 'Evil'
hero_type = 'Halitosis'
recordable_id = 1
created_at = '2009-03-03 15:30:00'

user_id = 18
recordable_type = 'Good'
hero_type = 'Hugs'
recordable_id = 1
created_at = '2009-02-03 12:30:00'

user_id = 18
recordable_type = 'Good'
hero_type = NULL
recordable_id = 2
created_at = '2009-012-03 08:30:00'


(Last two are not a dups obviously. First two are.) So what I want to do is get rid of the duplicate hero_status. Which one? The one with the most-recent date.



I have three questions:




  1. How do I remove the duplicates using a SQL-only approach?


  2. How do I remove the duplicates using a pure Ruby solution? Something similar to this: http://stackoverflow.com/questions/2790004/removing-duplicate-objects.


  3. How do I put a validation in place to prevent duplicate entries in the future?






Find the answer here

No comments:

Post a Comment

LinkWithin

Related Posts with Thumbnails