[nfb-talk] Character Conversion Question

Sherri flmom2006 at gmail.com
Sun Dec 2 23:28:54 CST 2007


I don't see that stuff in my messages from Mike. Just FYI.
----- Original Message ----- 
From: "T. Joseph Carter" <tjcarter at bluecherry.net>
To: "NFB Talk Mailing List" <nfb-talk at nfbnet.org>
Sent: Thursday, November 29, 2007 2:23 AM
Subject: Re: [nfb-talk] Character Conversion Question


Actually Mike, your messages look like this:

1. multipart/alternative
1.1. text/plain, 7-bit, iso-8859-1 character set
1.2. text/plain, quoted-printable, iso-8859-1 character set
2. text/plain, 7-bit, us-ascii character set

Part two of the message is the little mailing list blurb, and not really
important.  (It does mean that any message sent to the lists becomes MIME
encoded before it goes out to subscribers, but again, that's not important
right now.)
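
If you want to see that outline for yourself, here is a minimal Python
sketch that prints it using the standard library's email package.  The
RAW message, its boundary names, and the outline() helper are all made up
for illustration; this is not a copy of one of Mike's messages and not how
mutt produces its listing.

```python
from email import message_from_string, policy

# Hypothetical message with the same shape as the outline above.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Hello

--inner
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hello=A0

--inner--
--outer
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

list footer

--outer--
"""


def outline(part, prefix=""):
    """Return numbered lines describing the MIME parts below `part`."""
    lines = []
    if not part.is_multipart():
        return lines
    for number, child in enumerate(part.iter_parts(), start=1):
        label = f"{prefix}{number}."
        description = child.get_content_type()
        if not child.is_multipart():
            # Leaf parts also report transfer encoding and character set.
            encoding = str(child.get("Content-Transfer-Encoding", "7bit")).lower()
            charset = child.get_content_charset("us-ascii")
            description += f", {encoding}, {charset} character set"
        lines.append(f"{label} {description}")
        lines.extend(outline(child, prefix=label))
    return lines


message = message_from_string(RAW, policy=policy.default)
print("\n".join(outline(message)))
```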

Part one of the message is the part your client actually sends, more or
less.  The default configuration of Outlook is to send as both HTML and
text, so that you can send mail to absolutely anyone while anyone with
HTML and MIME support still gets whatever pretty text decorations you
decide to have in your mail.  To do that, it encodes the two versions as
separate MIME subparts of a multipart/alternative part.
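
As a rough illustration of that layout, this is how a text-plus-HTML
message can be put together with Python's standard email package.  It is
only a sketch of the general technique, not what Outlook does internally,
and build_message() and the sample text are names I made up.

```python
from email.message import EmailMessage


def build_message(text, html):
    """Build a message that carries the same content as text and as HTML."""
    msg = EmailMessage()
    msg["Subject"] = "Example"
    msg.set_content(text)                      # plain text goes in first
    msg.add_alternative(html, subtype="html")  # richer version goes last
    return msg


msg = build_message("Merry Christmas", "<p><b>Merry</b> Christmas</p>")
print(msg.get_content_type())
print([part.get_content_type() for part in msg.iter_parts()])
```

Adding the alternative is what turns the message into a
multipart/alternative container with the text part first and the HTML part
second.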

The first part is normally text/plain.  It doesn't have to be, but it's
considered best practice.  Someone could be using a vintage 1985 email
program that doesn't support MIME, so they'll get a couple of lines of
MIME junk followed by your readable message, followed by more junk and
HTML they can just ignore.

NFBnet's list setup doesn't let HTML messages get through like that
though.  It ignores the fact that there's a perfectly good text/plain part
there already and attempts to convert the text/html part into text/plain.
It manages to lose track of what character set the HTML portion was
actually written in as part of the process, and the label of iso-8859-1 is
slapped onto it.  Now, iso-8859-1 and Windows Latin 1 (windows-1252) are
similar, but they aren't the same: curly quotes and long dashes, for
example, exist only in the Windows set.  And a non-breaking space that was
written as utf-8 turns into a stray "Â" when it is read as iso-8859-1
(which is what was showing up in your messages.)  Once the character set
information is lost, it's non-trivial for a computer to figure out what
the character set ought to be.
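
To make the mislabeling concrete, here is a small Python sketch.  A
non-breaking space is the same single byte in iso-8859-1 and windows-1252,
so the sketch assumes the original text was utf-8.  That is my guess from
the stray "Â", not something the message headers confirm.

```python
NBSP = "\u00a0"  # non-breaking space

# Assumption: the sender's text was encoded as utf-8.
sent = ("Merry" + NBSP + "Christmas").encode("utf-8")

# The list software labels the bytes iso-8859-1, so readers decode them so.
mislabeled = sent.decode("iso-8859-1")
print(repr(mislabeled))  # 'MerryÂ\xa0Christmas'

# The same label also breaks windows-1252 curly quotes: bytes 0x93 and
# 0x94 are invisible control codes in iso-8859-1.
quoted = "\u201cquoted\u201d".encode("cp1252")
as_latin1 = quoted.decode("iso-8859-1")
print(repr(as_latin1))  # '\x93quoted\x94'
```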

Additionally, some HTML messages are not properly stripped.  You get
pieces of the HTML left behind--specifically, HTML comments are not
stripped, and certain versions of Outlook use this to include text style
information.  (Your messages don't have that, but several others do.)

There are two separate problems here.  First, it's a mistake to have two
text/plain parts in a multipart/alternative block.  The point of
multipart/alternative is to allow your client to choose the format it
prefers to display, based solely on its format.  With two parts of the
same type that choice means nothing.  RFC 2046 suggests showing the last
part a client can display, and mutt unfortunately chooses the latter.
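
For what it's worth, RFC 2046 recommends that a client show the last
alternative it is able to display.  A sketch of that rule in Python is
below.  The choose_part() helper and the sample message are hypothetical,
and this is not mutt's actual code.

```python
from email import message_from_string, policy

# Hypothetical message: two text/plain alternatives, as the list sends them.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="alt"

--alt
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

clean copy
--alt
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

converted copy
--alt--
"""


def choose_part(alternative, displayable=("text/plain",)):
    """Return the last subpart whose content type the client can display."""
    chosen = None
    for part in alternative.iter_parts():
        if part.get_content_type() in displayable:
            chosen = part  # later parts override earlier ones
    return chosen


message = message_from_string(RAW, policy=policy.default)
print(choose_part(message).get_content().strip())
```

With two parts of the same type the rule still picks one, but the sender
has no way to say which copy is the better one.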

The second problem is that the HTML conversion could be better.  Ideally
it would convert the character set to 8859-1 or utf-8, rather than just
assuming it is 8859-1 as it does now, and it would strip out things like
style information in HTML comments too.
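
Here is a rough sketch of what a better conversion might look like, using
only Python's standard library.  It is not the converter the list
actually runs.  TextExtractor and html_to_text() are names I made up, and
a real converter would also need to handle line breaks, links and so on.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text; drop comments, style blocks and scripts."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)

    # handle_comment is left at its default, which discards comments.


def html_to_text(raw, declared_charset):
    """Decode with the charset the sender declared, then emit utf-8 text."""
    html = raw.decode(declared_charset, errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())  # collapse whitespace
    return text.encode("utf-8")


raw = (
    "<html><head><style><!-- p {color: red} --></style></head>"
    "<body><!-- Outlook style info --><p>\u201cMerry Christmas\u201d</p>"
    "</body></html>"
).encode("cp1252")
print(html_to_text(raw, "cp1252").decode("utf-8"))
```

The important part is that the charset is carried through from the
original HTML part and converted, instead of being guessed afterwards.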

Of course, as I have already pointed out, the standard mail program on
this list seems to be Outlook.  No modern email program ships without the
ability to display HTML messages, and that's been the case for some ten
years now.  Even a text-based client like mutt has the option if you
enable it.

The only reason to avoid HTML email is the possibility that such messages
may contain scripts or web bugs that tell the message sender that you
opened their message.  Spammers use this feature.  But the feature can be
easily disabled by a defanging process.
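
To give an idea of what defanging means, here is a deliberately small
Python sketch.  It drops scripts, inline event handlers and remote image
sources, and passes everything else through.  The Defanger class is
hypothetical and nowhere near a complete sanitizer; it is only meant to
show the idea.

```python
from html import escape
from html.parser import HTMLParser

REMOTE_PREFIXES = ("http:", "https:", "//")


class Defanger(HTMLParser):
    """Re-emit HTML without scripts, on* handlers or remote images."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            return
        kept = []
        for name, value in attrs:
            if name.startswith("on"):
                continue  # inline event handler such as onclick
            if (tag == "img" and name == "src"
                    and (value or "").lower().startswith(REMOTE_PREFIXES)):
                continue  # remote image, a possible web bug
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join([tag] + kept) + ">")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> style tags as plain start tags.
        if tag != "script":
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(escape(data, quote=False))


def defang(html):
    parser = Defanger()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('<p onclick="track()">Hi<script>alert(1)</script>'
          '<img src="http://example.com/bug.gif" alt="x"></p>')
print(defang(sample))
```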

Hopefully that sheds some light on it.


On Wed, Nov 28, 2007 at 08:38:16PM -0800, Mike Freeman wrote:
> Insofar as I am aware, I am not MIME-encoding using an unusual character 
> set or "quoted printable" or any of those other oddities; I am sending 
> HTML email but I'm using whatever standard character set Windows XP uses.
> Â
> Mike
> Â
> ----- Original Message -----
> From: David Andrews <dandrews at visi.com>
> To: NFB Talk Mailing List <nfb-talk at nfbnet.org>
> Sent: Wednesday, November 28, 2007 12:38 PM
> Subject: Re: [nfb-talk] Character Conversion Question
> Joseph:
> First, I will say that I am not an expert in mime encoding, html
> messages etc. So, there could be settings I could do to change
> things. However, I will also say that we are not removing
> attachments or doing any kind of nonstandard scrubbing. We use
> Mailman and sendmail, with mostly standard settings, as do thousands
> of other lists.
> I believe you use a somewhat non-standard setup, and I don't see a
> lot of conversion problems or complaints.
> Different mail programs handle things in different ways, and users
> can fiddle with things, so it may well not be us.
> At 02:08 PM 11/28/2007, you wrote:
> >Steve,
> >
> >It's part of how NFB listservs scrub messages to remove attachments,
> >un-HTML messages, etc. I've been urging David for years now to either
> >fix it (he's not sure exactly where the problem is), or just stop trying
> >to strip out parts of messages entirely.
> >
> >The reason not to do it is because some of us use text-based email
> >software like pine and mutt for UNIX systems which can't handle HTML
> >correctly. However, pine has dealt with HTML email for something close
> >to a decade now, and I can deal with HTML in mutt much more effectively
> >than I can deal with a mislabeled character set.
> >
> >There's also that the HTML stripper leaves behind HTML comments, which
> >results in lots of extra crap in messages. Unfortunately, the way the
> >messages are mangled does not lend itself to undoing the mangling.
> >
> >So my Christmas wish is that a new HTML defang script will replace the
> >HTML stripper we have now. ;)
> >
> >
> >On Wed, Nov 28, 2007 at 12:11:21PM -0600, Steve Jacobson wrote:
> > > Mike,
> > >
> > > I have started seeing some unusual characters in your notes and
> > > I'm trying to figure out if it is my screen reader, my mail
> > > program, or something else. For example, there is one such
> > > character just above and just below "Merry Christmas" in your note
> > > below. Do you remember if you typed a particular character that is
> > > being converted somehow? Thanks for humoring me.
> >
> >_______________________________________________
> >nfb-talk mailing list
> >nfb-talk at nfbnet.org
> >http://www.nfbnet.org/mailman/listinfo/nfb-talk
> David Andrews and white cane Harry.
