Sunday, September 12, 2010

grouping strings by similarity

Programmer Question

I have an array of strings, not many (maybe a few hundred) but often long (a few hundred chars).



Those strings are generally nonsense and differ from one another, but within one group of them, maybe 5 out of 300, there is great similarity. In fact, they are the same string; what differs is the formatting, the punctuation, and a few words.



How can I identify that group of strings?



By the way, I'm writing in Ruby, but failing that, an algorithm in pseudocode would be fine.



Thanks
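
One possible way to approach this, offered only as a rough sketch and not as the accepted answer: normalize each string (lowercase, strip punctuation), turn it into a set of words, and link any two strings whose Jaccard similarity (shared words divided by total distinct words) clears a threshold. The names `word_set`, `jaccard`, and `similar_groups`, and the `0.6` threshold, are all assumptions made for illustration; the threshold in particular would need tuning on real data. The normalization here only keeps ASCII letters and digits, which is also an assumption about the input.

```ruby
require 'set'

# Lowercase the string, replace everything except ASCII letters, digits
# and whitespace with a space, then return the set of remaining words.
def word_set(str)
  str.downcase.gsub(/[^a-z0-9\s]/, ' ').split.to_set
end

# Jaccard similarity of two sets: size of intersection over size of union.
def jaccard(a, b)
  union_size = (a | b).size
  return 0.0 if union_size.zero?
  (a & b).size.to_f / union_size
end

# Returns groups (arrays of the original strings) whose members are linked,
# directly or transitively, by a similarity of at least `threshold`.
# The default threshold is a hypothetical value, not a recommendation.
def similar_groups(strings, threshold: 0.6)
  sets = strings.map { |s| word_set(s) }
  parent = (0...strings.size).to_a # simple union-find forest

  # Follow parent links until reaching the root of index i.
  find = lambda do |i|
    i = parent[i] while parent[i] != i
    i
  end

  # Compare every pair once; merge the two groups when similar enough.
  (0...strings.size).to_a.combination(2) do |i, j|
    next if jaccard(sets[i], sets[j]) < threshold
    parent[find.call(i)] = find.call(j)
  end

  groups = (0...strings.size).group_by { |i| find.call(i) }.values
  groups.select { |g| g.size > 1 }.map { |g| g.map { |i| strings[i] } }
end
```

This compares every pair, so the work grows with the square of the number of strings; for a few hundred strings that is tens of thousands of set comparisons, which should be manageable. If word order or small spelling differences matter, an edit-distance measure could be swapped in for `jaccard` without changing the grouping step.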



