Tuesday, September 21, 2010

What does sorting mean in double-byte languages?

Programmer Question

I have some code that sorts table columns by object properties. It occurred to me that in Japanese or Chinese (non-alphabetical languages), the strings passed to the sort function would be compared the same way as strings in an alphabetical language.



Take for example a list of Japanese surnames:



??
??
??
??
??


In English, these would be Suzuki, Matsuzaka, Matsui, Yamada, Fujimoto.



When I sort the above list via JavaScript, the result is:



??
??
??
??
??


(Suzuki, Yamada, Matsui, Matsuzaka, Fujimoto). This differs from the order of the Japanese syllabary, which would sort the list phonetically as Suzuki, Fujimoto, Matsui, Matsuzaka, Yamada.
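
For illustration, here is a minimal sketch of what a comparator-less JavaScript sort does with strings: it compares them one UTF-16 code unit at a time, by numeric value. The helper `compareByCodeUnits` and the hiragana readings are assumptions made for this example; they are not part of the original list. Kana happen to be laid out in the character set roughly in syllabary order, so the readings come out phonetically sorted, whereas the code values of kanji carry no phonetic information.

```javascript
// Hypothetical comparator that mirrors the default string ordering:
// compare code unit by code unit; if one string is a prefix of the
// other, the shorter one sorts first.
function compareByCodeUnits(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const diff = a.charCodeAt(i) - b.charCodeAt(i);
    if (diff !== 0) return diff;
  }
  return a.length - b.length;
}

// Assumed hiragana readings for Suzuki, Matsuzaka, Matsui, Yamada, Fujimoto.
const readings = ['すずき', 'まつざか', 'まつい', 'やまだ', 'ふじもと'];

// Sort once with no comparator and once with the explicit one.
const defaultSorted = readings.slice().sort();
const explicitSorted = readings.slice().sort(compareByCodeUnits);

console.log(defaultSorted.join(', '));
console.log(explicitSorted.join(', '));
```

Both calls should print the readings in the same order, which suggests the "alphabetical" comparison is really a numeric comparison of character codes.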



What I want to know is:




  1. Does one double-byte character really get compared against the other in a sort function?

  2. What really goes on in such a sort?

  3. (Extra credit) Does the result of such a sort mean anything at all? Does the concept of sorting really work in Asian (and other) languages? If so, what does it mean and what should one strive for in creating a compare function for those languages?
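
One possible direction for the compare function in question 3, sketched under stated assumptions: sort on a phonetic key rather than on the displayed string. The `reading` field, the `people` records, and `compareByReading` are hypothetical names introduced here. `Intl.Collator` is the standard ECMAScript internationalization API, and `'ja'` requests Japanese collation rules. This is only a sketch of the idea, not a definitive answer.

```javascript
// Hypothetical records: `reading` is an assumed hiragana field stored
// alongside the name that is actually displayed in the table.
const people = [
  { romaji: 'Suzuki', reading: 'すずき' },
  { romaji: 'Matsuzaka', reading: 'まつざか' },
  { romaji: 'Matsui', reading: 'まつい' },
  { romaji: 'Yamada', reading: 'やまだ' },
  { romaji: 'Fujimoto', reading: 'ふじもと' },
];

// Locale-aware comparison using Japanese collation rules.
const collator = new Intl.Collator('ja');

// Compare two records by their phonetic reading, not their display text.
function compareByReading(a, b) {
  return collator.compare(a.reading, b.reading);
}

const sortedNames = people
  .slice()
  .sort(compareByReading)
  .map((p) => p.romaji);

console.log(sortedNames.join(', '));
```

With these assumed readings, the output should match the phonetic order quoted above (Suzuki, Fujimoto, Matsui, Matsuzaka, Yamada). The design choice is that the reading has to be stored explicitly, since a kanji spelling alone does not determine how a name is pronounced.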



