[NCLUG] charset/encoding challenges & filesystems

Bob Proulx bob at proulx.com
Sun Nov 25 23:28:32 MST 2012


Luke Jones wrote:
> Blue Öyster Cult/Agents of Fortune/03 (Don't Fear) The Reaper.mp3
> 
> look the same on Linux (...ue Öyster ...)

Seems like a good sign.

> but the encoding appears to be different, since rsync keeps trying
> to fix it.

What do you mean?  Please say *exactly* what rsync is doing.  Are you
sure it isn't just trying to update timestamps or user:group?
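
One way to get an exact answer is to run rsync with -i
(--itemize-changes) and -n (dry run), for example "rsync -ain src/
dest/", which prints a short code per file saying which attributes
differ.  Below is a minimal Python sketch for decoding those codes.
The name explain_itemized is made up for illustration, and the letter
positions assume the YXcstpoguax layout described in the rsync man
page, so check them against your rsync version.

```python
# Attribute slots 3..11 of rsync's itemized code "YXcstpoguax"
# (layout assumed from the rsync man page).
ATTRS = ["checksum", "size", "mtime", "perms", "owner", "group",
         "reserved", "acl", "xattr"]

def explain_itemized(line):
    """Return (path, attributes rsync reports as differing) for one -i line."""
    if line.startswith("*"):
        # Lines such as "*deleting   path" carry a message, not a code.
        message, path = line.split(None, 1)
        return path, [message.lstrip("*")]
    code, path = line[:11], line[12:]
    flags = code[2:]
    if flags and set(flags) == {"+"}:
        # All '+' marks a newly created item.
        return path, ["new"]
    # '.' and ' ' both mean "this attribute is unchanged".
    changed = [name for name, ch in zip(ATTRS, flags) if ch not in ". "]
    return path, changed

print(explain_itemized(".f..t...... music/a.mp3"))
```

If only "mtime", "owner" or "group" show up, rsync is adjusting
metadata rather than re-encoding or re-copying the file name.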

Perhaps just let it copy the data and be done with it and then not
worry about what it was trying to do?

> Now rsync has an --iconv=CHARSET option I never noticed before, so maybe I
> could use that.

Probably not needed if your locale is set correctly on both ends.

> Alternatively, I see that ext4 will accept as filenames pretty much any
> sequence of bytes I throw at it.

Unix filenames have, since the beginning, accepted any byte as valid
except for '/' and the C terminating zero byte '\0'.  All other bytes
are valid in file names.  What your terminal or other program does
when displaying that data is up to it.  But there is very little
restriction on what you can put in a file name.
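
To see this in practice, here is a small Python sketch that creates
files whose names are raw byte strings and reads the directory back
as bytes.  It assumes a typical Linux filesystem such as ext4 or
tmpfs; other systems may reject or rewrite some of these names.  The
helper name create_and_list is hypothetical.

```python
import os
import tempfile

def create_and_list(names):
    """Create empty files with the given raw byte names and return the
    directory listing, as bytes, sorted."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)
        for name in names:
            with open(os.path.join(tmp_b, name), "wb"):
                pass
        return sorted(os.listdir(tmp_b))

names = [
    b"caf\xe9.mp3",        # Latin-1 bytes, not valid UTF-8
    b"caf\xc3\xa9.mp3",    # the same word encoded as UTF-8
    b"tab\there\n.mp3",    # control characters are accepted too
]
print(create_and_list(names))
```

The kernel stores all three names as given; only '/' and '\0' would be
refused.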

I recommend using UTF-8 everywhere.

What encoding does the Mac use?  (Knowing Apple the character encoding
is probably "Apple".  :-)
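
For what it is worth, HFS+ on Mac OS X is documented as storing file
names as UTF-8 in a decomposed Unicode form (close to NFD), while
Linux tools usually produce the precomposed NFC form.  If that is what
is happening here, the two names would render identically and still
differ as bytes.  A short Python sketch of that assumption:

```python
import unicodedata

name = "Blue \u00d6yster Cult"               # precomposed O-umlaut
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)     # 'O' + combining diaeresis

print(nfc.encode("utf-8"))   # b'Blue \xc3\x96yster Cult'
print(nfd.encode("utf-8"))   # b'Blue O\xcc\x88yster Cult'
print(nfc == nfd)            # False
```

Both are valid UTF-8, so a correct locale on both ends would not by
itself make the byte strings equal.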

> So maybe I just need to make sure all my user interfaces (Gnome, the
> Terminal, etc.) all have the right locale specified.

Definitely true regardless.  Normally one sets LANG and optionally
sets LC_COLLATE.  I personally have:

  export LANG=en_US.UTF-8
  export LC_COLLATE=C
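
With those two settings, collation follows "C" and everything else
follows en_US.UTF-8.  The sketch below models the usual POSIX
precedence (LC_ALL, then the specific LC_* variable, then LANG) in
Python.  The function effective_locale is a made-up helper, not part
of any library, and it ignores extras such as glibc's LANGUAGE.

```python
def effective_locale(category, env):
    """Return the locale in effect for a category such as 'LC_COLLATE'.

    Precedence: LC_ALL, then the category's own variable, then LANG,
    then the "C" default.  Empty values count as unset.
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"

env = {"LANG": "en_US.UTF-8", "LC_COLLATE": "C"}
print(effective_locale("LC_COLLATE", env))   # C
print(effective_locale("LC_CTYPE", env))     # en_US.UTF-8
```

Passing os.environ instead of the sample dict shows what the current
shell would use.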

> 1) do you tell rsync to use iconv?

I have never used --iconv but I also have never had a Mac to worry
about.  I would think you should be able to ignore the option and
have rsync auto-detect it.  But perhaps your LANG isn't set right on
one end or the other and therefore it isn't auto-detecting it
correctly?
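
If it does turn out to be needed, the rsync man page gives the form
as --iconv=LOCAL,REMOTE, for example --iconv=utf8,iso88591.  A
minimal Python sketch that builds such a command line follows.  The
helper rsync_command is hypothetical, and the charset names you pass
are an assumption to verify: "UTF-8-MAC" is the name commonly quoted
for the Mac side, but it depends on the iconv library installed
there.

```python
def rsync_command(src, dest, local_charset=None, remote_charset=None):
    """Build an rsync argument list, optionally with --iconv=LOCAL,REMOTE."""
    cmd = ["rsync", "-a", "-i"]
    if local_charset and remote_charset:
        cmd.append("--iconv={},{}".format(local_charset, remote_charset))
    elif local_charset or remote_charset:
        raise ValueError("give both charsets or neither")
    cmd += [src, dest]
    return cmd

# Example taken from the man page's charset pair.
print(rsync_command("music/", "host:music/", "utf8", "iso88591"))
```

The list can be handed to subprocess.run() once the charsets have
been checked with "iconv -l" on each machine.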

> 2) do you set mount options for your ext4 filesystems?

No.  Definitely not.  The Unix filesystem doesn't care.  It is your
environment that cares how to display the data in the name.
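
A quick illustration in Python: the same stored bytes come out as
different text depending on which encoding the displaying program
assumes.  The two encodings here are just examples.

```python
raw = b"Blue \xc3\x96yster Cult"     # bytes as stored in the directory entry

as_utf8 = raw.decode("utf-8")        # what a UTF-8 environment shows
as_latin1 = raw.decode("latin-1")    # what an ISO-8859-1 environment shows

print(as_utf8)            # Blue Öyster Cult
print(repr(as_latin1))    # 'Blue Ã\x96yster Cult'
```

The filesystem holds the same bytes in both cases; only the
interpretation changes.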

> 3) do you say screw it all and downmode to 7bit ASCII?

I don't normally _type in_ anything but 7-bit ASCII.  But if the
filename is there then the shell will autocomplete it for me with a
TAB, with all of the characters that need it escaped automatically.

But I don't depend upon filenames for songs.  Instead I depend upon
the songs having tags, either mp3 or ogg, and my players prefer the
embedded audio tags.  The filename is mostly not relevant to them.

Bob


