[NCLUG] charset/encoding challenges & filesystems

Luke Jones slukejones at gmail.com
Mon Nov 26 01:02:20 MST 2012


Bob,

Thanks for the tips. I copied a few problematic files into a test directory
-- trying to narrow the problem down to something smaller than "I just sent
10K files and it didn't work" -- and when I rsync'ed it both ways everything
appeared to work correctly. I got messages from rsync that looked like this:

opening connection using ssh craters.local rsync --server -vvnlogDtpr .
Desktop/
luke at craters.local's password:
building file list ...
done
delta-transmission enabled
test/02 Wagner ?\#200\#224 Die Walku?\#210re ?\#200\#224 Ride Of The
Valkyries.mp3 is uptodate
test/05 Se A Vida E?\#201 (That's The Way Life Is).mp3 is uptodate
test/08 Blue O?\#210yster Cult.mp3 is uptodate
test/08 Sla?\#201inte Bhreagh Hiu?\#201lit (Hewlett).mp3 is uptodate
total: matches=0  hash_hits=0  false_alarms=0 data=0

That has the advantage of showing what the actual bytes are in the
filename, and showing that they're matching their counterparts on the far
side and correctly determining what is and is not up to date.
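
One way to look at the same bytes outside of rsync is a short Python sketch
like the one below. It assumes the two sides might differ only in composed
versus decomposed Unicode (precomposed "Ö" versus "O" plus a combining
diaeresis), which is a guess and not something confirmed in this thread. The
helper names describe_name and same_after_normalizing are made up for
illustration, and the sample names are typed in by hand rather than read from
either machine.

```python
import unicodedata


def describe_name(name: str) -> dict:
    """Report the UTF-8 bytes of a name and which Unicode form it is in."""
    raw = name.encode("utf-8")
    return {
        "bytes": raw.hex(" "),
        "is_nfc": unicodedata.normalize("NFC", name) == name,
        "is_nfd": unicodedata.normalize("NFD", name) == name,
    }


def same_after_normalizing(a: str, b: str) -> bool:
    """True when two names differ at most in composed vs. decomposed form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical sample names; both display as "Blue Öyster Cult.mp3".
composed = "Blue \u00d6yster Cult.mp3"     # Ö as a single code point
decomposed = "Blue O\u0308yster Cult.mp3"  # O followed by combining diaeresis

for label, name in (("composed", composed), ("decomposed", decomposed)):
    print(label, describe_name(name))
print("equal as-is:", composed == decomposed)
print("equal after NFC:", same_after_normalizing(composed, decomposed))
```

If two names compare unequal as-is but equal after normalizing, they look
identical on screen while being different byte sequences as far as the
filesystem is concerned.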

So. I still don't know what is wrong, but you were helpful if only in
asking that I "sit down calmly, take a deep breath, and think things over"
as it were. I'll put some more time into debugging this tomorrow and
hopefully figure out what's breaking with the 10K music files.




On Sun, Nov 25, 2012 at 9:28 PM, Bob Proulx <bob at proulx.com> wrote:

> Luke Jones wrote:
> > Blue Öyster Cult/Agents of Fortune/03 (Don't Fear) The Reaper.mp3
> >
> > look the same on Linux (...ue Öyster ...)
>
> Seems like a good sign.
>
> > but the encoding appears to be different, since rsync keeps trying
> > to fix it.
>
> What do you mean?  Please say *exactly* what rsync is doing.  Are you
> sure it isn't just trying to update timestamps or user:group?
>
> Perhaps just let it copy the data and be done with it and then not
> worry about what it was trying to do?
>
> > Now rsync has an --iconv=CHARSET option I never noticed before, so maybe
> > I could use that.
>
> Probably not needed if your locale is set correctly on both ends.
>
> > Alternatively, I see that ext4 will accept as filenames pretty much any
> > sequence of bytes I throw at it.
>
> Unix filenames since the beginning have always accepted any character
> as valid except for the '/' and the C terminating zero byte '\0'.  All
> other characters are valid in file names.  What your terminal or other
> program does when displaying that data is up to it.  But there is very
> little restriction in what you can put in a file name.
>
> I recommend using UTF-8 everywhere.
>
> What encoding does the Mac use?  (Knowing Apple the character encoding
> is probably "Apple".  :-)
>
> > So maybe I just need to make sure all my user interfaces (Gnome, the
> > Terminal, etc.) all have the right locale specified.
>
> Definitely true regardless.  Normally one sets LANG and optionally
> sets LC_COLLATE.  I personally have:
>
>   export LANG=en_US.UTF-8
>   export LC_COLLATE=C
>
> > 1) do you tell rsync to use iconv?
>
> I have never used --iconv but I also have never had a Mac to worry
> about.  I would think you should be able to ignore the option and
> have rsync auto-detect it.  But perhaps your LANG isn't set right on
> one end or the other and therefore it isn't autodetecting it
> correctly?
>
> > 2) do you set mount options for your ext4 filesystems?
>
> No.  Definitely not.  The Unix filesystem doesn't care.  It is your
> environment that cares how to display the data in the name.
>
> > 3) do you say screw it all and downmode to 7bit ASCII?
>
> I don't normally _type in_ anything but 7-bit ascii.  But if the
> filename is there then the shell will autocomplete it for me with a
> TAB and all of the characters that need it escaped automatically.
>
> But I don't depend upon filenames for songs.  Instead I depend upon
> the songs having tags, either mp3 or ogg, and my players prefer the
> embedded audio tags.  The filename is mostly not relevant to them.
>
> Bob
> _______________________________________________
> NCLUG mailing list       NCLUG at lists.nclug.org
>
> To unsubscribe, subscribe, or modify
> your settings, go to:
> http://lists.nclug.org/mailman/listinfo/nclug




-- 
Luke Jones  slukejones at gmail.com (907) 229-2699


