[gull] character set on MAC OS X

Daniel Cordey dc at mjt.ch
Fri Feb 22 14:37:07 CET 2008


On Friday 22 February 2008, Félix Hauri wrote:

> Have you tried mounting it by hand, with ``-t hfs''?

Only the file and directory names are giving me trouble. The rest is 
mounted correctly as ISO. But you should know that Apple also has some 
questionable habits, such as 
(http://en.wikipedia.org/wiki/UTF-8#Mac_OS_X):

"...The Mac OS X Operating System uses canonically decomposed Unicode, encoded 
using UTF-8 for file names in the filesystem. This is sometimes referred to 
as UTF-8-MAC. In canonically decomposed Unicode, the use of precomposed 
characters is forbidden and combining diacritics must be used to replace 
them.

A common argument is that this makes sorting far simpler, but this argument is 
easily refuted: for one, sorting is language dependent (in German, the ä 
character sorts just after the a character, while in Scandinavian languages ä 
sorts after z). Therefore, it can be confusing for software built around the 
assumption that precomposed characters are the norm and combining diacritics 
are only used to form unusual combinations. This is an example of the NFD 
variant of Unicode normalization—most other platforms, including Windows and 
Linux, use the NFC form of Unicode normalization, which is also used by W3C 
standards, so NFD data must typically be converted to NFC for use on other 
platforms or the Web..."
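
To make the NFC/NFD difference concrete, here is a minimal Python sketch 
(my own illustration, not taken from the article quoted above). It only 
uses the standard unicodedata module and shows the same visible character 
in both forms:

```python
import unicodedata

# "é" as one precomposed code point (the NFC form most platforms expect).
nfc = "\u00e9"

# The same character canonically decomposed (NFD): "e" followed by
# U+0301 COMBINING ACUTE ACCENT.
nfd = unicodedata.normalize("NFD", nfc)

print([hex(ord(c)) for c in nfc])  # ['0xe9']
print([hex(ord(c)) for c in nfd])  # ['0x65', '0x301']

# The two strings look identical on screen but do not compare equal,
# and their UTF-8 byte sequences differ as well.
print(nfc == nfd)  # False
print(nfc.encode("utf-8"), nfd.encode("utf-8"))

# Normalizing back to NFC recovers the precomposed form.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

This is why a name that looks right can still fail to match when it is 
compared byte for byte against a name typed on another system.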

As well as:

http://www.j3e.de/linux/convmv/man/

"...Apple has modified UFS in a way, which makes it impossible to create 
filenames in UTF-8 NFC, they will always be NFD. Also creating filenames in 
other (non UTF-8) encodings is not possible. This hacks on UFS makes Darwin a 
real crappy Unix..."

Great!!!

I tried lots of things with *MAC*, *adobe*, to no avail. Naturally, the 
author of the CD doesn't understand my question and tells me he did all 
of this with 'Adobe Lightroom', etc. 

In the end, I managed to use the following command:

	convmv -f MacRoman -t utf8 --nfc /media/cdrom/*

AND it is the only form of "encoding" that gets accepted. It simply maps 
a character coded '248' in Mac Roman to character 248 in UTF-8...
Yeah... :-(
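
For a single name, a conversion of this kind can be sketched in Python as 
follows. This is only an assumption about what the command amounts to 
conceptually, not convmv's actual implementation, and the function name 
mac_roman_name_to_utf8_nfc is hypothetical. It decodes the raw bytes as 
Mac Roman, normalizes to NFC, and re-encodes as UTF-8:

```python
import unicodedata


def mac_roman_name_to_utf8_nfc(raw: bytes) -> bytes:
    """Reinterpret a file name stored as Mac Roman bytes as UTF-8 in NFC.

    Hypothetical helper for illustration only.
    """
    # Interpret the raw bytes with Python's built-in "mac_roman" codec.
    text = raw.decode("mac_roman")
    # Compose base letter + combining mark pairs into precomposed characters.
    text = unicodedata.normalize("NFC", text)
    # Emit the UTF-8 bytes a Linux system would normally expect.
    return text.encode("utf-8")


# In Mac Roman, byte 0x8E is "é".
raw_name = b"r\x8epertoire"
converted = mac_roman_name_to_utf8_nfc(raw_name)
print(converted.decode("utf-8"))
```

On a read-only medium such as a CD the names cannot be rewritten in place, 
so a conversion like this only helps once the files are copied elsewhere.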

dc




