Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #36523 > unrolled thread
| Started by | roy@panix.com (Roy Smith) |
|---|---|
| First post | 2013-01-09 17:54 -0500 |
| Last post | 2013-01-09 20:17 -0500 |
| Articles | 10 — 5 participants |
Back to article view | Back to comp.lang.python
Why BOM in logging message? roy@panix.com (Roy Smith) - 2013-01-09 17:54 -0500
Re: Why BOM in logging message? Chris Angelico <rosuav@gmail.com> - 2013-01-10 10:07 +1100
Re: Why BOM in logging message? Roy Smith <roy@panix.com> - 2013-01-09 20:17 -0500
Re: Why BOM in logging message? roy@panix.com (Roy Smith) - 2013-01-10 11:06 -0500
Re: Why BOM in logging message? Chris Angelico <rosuav@gmail.com> - 2013-01-11 03:16 +1100
Re: Why BOM in logging message? roy@panix.com (Roy Smith) - 2013-01-10 11:40 -0500
Re: Why BOM in logging message? Chris Angelico <rosuav@gmail.com> - 2013-01-11 03:52 +1100
Re: Why BOM in logging message? Terry Reedy <tjreedy@udel.edu> - 2013-01-10 18:45 -0500
Re: Why BOM in logging message? John Gordon <gordon@panix.com> - 2013-01-09 23:19 +0000
Re: Why BOM in logging message? Roy Smith <roy@panix.com> - 2013-01-09 20:17 -0500
| From | roy@panix.com (Roy Smith) |
|---|---|
| Date | 2013-01-09 17:54 -0500 |
| Subject | Why BOM in logging message? |
| Message-ID | <kcksg2$bkp$1@panix2.panix.com> |
We've got 10 (supposedly) identical servers, all running Ubuntu 12.04,
Python 2.7, Django 1.3. We log to syslog using the logging module and
a custom fomatter.
'formatters': {
'verbose': {
'format': '%(asctime)s [%(process)d]: %(program)s %(session_id)s %(request_id)s %(request_id_digest)s %(remote_addr)s %(name)s %(level\
name)s %(funcName)s() %(message)s',
'()': 'songza.logging.ContextFormatter',
},
},
There's nothing particularly exciting in the formatter code:
class ContextFormatter(logging.Formatter):
def format(self, record):
record.program = context.get_program()
record.request_id = context.get_request_id()
record.request_id_digest = context.get_request_id_digest()
record.session_id = context.get_session_id() or '-'
record.remote_addr = context.get_remote_addr() or '-'
return logging.Formatter.format(self, record)
What's weird is that two of the servers, and only those two, stick a
BOM (Byte Order Mark) in front of the message they log. It shows up
in syslog as:
2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754 [18979]: [etc...]
The other machines, never put the BOM in. Given that all 10 machines
are ostensibly clones of each other, we're scratching our heads what
might be different about those two which cause the BOMs to appear.
Any ideas?
I suppose it's possible it's a syslog config problem and not a Python
problem, but I figured I'd start at the beginning of the chain.
[toc] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-10 10:07 +1100 |
| Message-ID | <mailman.344.1357772847.2939.python-list@python.org> |
| In reply to | #36523 |
On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <roy@panix.com> wrote: > What's weird is that two of the servers, and only those two, stick a > BOM (Byte Order Mark) in front of the message they log. Could it be this issue you're looking at? http://bugs.python.org/issue14452 What are the exact Python versions in use? Are the two different servers running an older revision of Py 2.7? ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-01-09 20:17 -0500 |
| Message-ID | <roy-10B053.20174309012013@news.panix.com> |
| In reply to | #36525 |
In article <mailman.344.1357772847.2939.python-list@python.org>, Chris Angelico <rosuav@gmail.com> wrote: > On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <roy@panix.com> wrote: > > What's weird is that two of the servers, and only those two, stick a > > BOM (Byte Order Mark) in front of the message they log. > > Could it be this issue you're looking at? > > http://bugs.python.org/issue14452 > > What are the exact Python versions in use? Are the two different > servers running an older revision of Py 2.7? > > ChrisA It sounds like it might be it, but we're running 2.7.3 on all machines.
[toc] | [prev] | [next] | [standalone]
| From | roy@panix.com (Roy Smith) |
|---|---|
| Date | 2013-01-10 11:06 -0500 |
| Message-ID | <kcmoug$rk4$1@panix2.panix.com> |
| In reply to | #36542 |
In article <roy-10B053.20174309012013@news.panix.com>,
Roy Smith <roy@panix.com> wrote:
>In article <mailman.344.1357772847.2939.python-list@python.org>,
> Chris Angelico <rosuav@gmail.com> wrote:
>
>> On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <roy@panix.com> wrote:
>> > What's weird is that two of the servers, and only those two, stick a
>> > BOM (Byte Order Mark) in front of the message they log.
>>
>> Could it be this issue you're looking at?
>>
>> http://bugs.python.org/issue14452
>>
>> What are the exact Python versions in use? Are the two different
>> servers running an older revision of Py 2.7?
>>
>> ChrisA
>
>It sounds like it might be it, but we're running 2.7.3 on all machines.
Well, this is fascinating. It turns out that while all of our
machines report that they're running 2.7.3, they have two different
versions of /usr/lib/python2.7/logging/handlers.py!
-rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
-rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py
The April 24th version has the BOM code in SysLogHander.emit():
| # Message is a string. Convert to bytes as required by RFC 5424
| if type(msg) is unicode:
| msg = msg.encode('utf-8')
| if codecs:
| msg = codecs.BOM_UTF8 + msg
| msg = prio + msg
and the August 1st version doesn't:
| # Message is a string. Convert to bytes as required by RFC 5424
| if type(msg) is unicode:
| msg = msg.encode('utf-8')
| msg = prio + msg
Is it possible there's two different 2.7.3 builds that somehow got
packaged by Ubuntu at different times? The pattern of which machines
have the August code and which have the April code correlates with
when we rolled out each server instance.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-11 03:16 +1100 |
| Message-ID | <mailman.372.1357834577.2939.python-list@python.org> |
| In reply to | #36573 |
On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <roy@panix.com> wrote: > Well, this is fascinating. It turns out that while all of our > machines report that they're running 2.7.3, they have two different > versions of /usr/lib/python2.7/logging/handlers.py! > > -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py > -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py Ha, that would do it! I don't have a corresponding system to compare against, but what package is that file managed by? $ dpkg -S /usr/lib/python2.7/logging/handlers.py See if both systems show it as part of the same package, and if so, if the package is at the same version on each. On my Maverick desktop, I have 2.6, and the package is python2.6-minimal. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | roy@panix.com (Roy Smith) |
|---|---|
| Date | 2013-01-10 11:40 -0500 |
| Message-ID | <kcmqu8$6gc$1@panix2.panix.com> |
| In reply to | #36574 |
>On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <roy@panix.com> wrote: >> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py >> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py Chris Angelico <rosuav@gmail.com> wrote: >$ dpkg -S /usr/lib/python2.7/logging/handlers.py Yup, on some machines we've got 2.7.3-0ubuntu3, and on others, 2.7.3-0ubuntu3.1 of python2.7-minimal. Details at: https://launchpad.net/ubuntu/+source/python2.7/2.7.3-0ubuntu3.1 Well, I guess today is dist-upgrade day :-)
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-11 03:52 +1100 |
| Message-ID | <mailman.373.1357836747.2939.python-list@python.org> |
| In reply to | #36576 |
On Fri, Jan 11, 2013 at 3:40 AM, Roy Smith <roy@panix.com> wrote: >>On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <roy@panix.com> wrote: >>> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py >>> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py > > Chris Angelico <rosuav@gmail.com> wrote: >>$ dpkg -S /usr/lib/python2.7/logging/handlers.py > > Yup, on some machines we've got 2.7.3-0ubuntu3, and on others, > 2.7.3-0ubuntu3.1 of python2.7-minimal. Details at: > > https://launchpad.net/ubuntu/+source/python2.7/2.7.3-0ubuntu3.1 I love it when everything adds up! The message even cites the specific change. It's like a cryptic crossword - once you get the answer, you KNOW it's the answer because suddenly it all makes sense :) Thanks for bringing a fun problem to solve! It's (unfortunately) a refreshing change to read a post from someone who knows how to ask smart questions. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-01-10 18:45 -0500 |
| Message-ID | <mailman.383.1357861568.2939.python-list@python.org> |
| In reply to | #36573 |
On 1/10/2013 11:06 AM, Roy Smith wrote:
> Well, this is fascinating. It turns out that while all of our
> machines report that they're running 2.7.3, they have two different
> versions of /usr/lib/python2.7/logging/handlers.py!
>
> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py
>
> The April 24th version has the BOM code in SysLogHander.emit():
The 'BOM' you are discussing here is not a true 2 or 4 byte
byte-order-mark, but the pseudo-BOM (pseudo because a stream of single
bytes has no byte order within the single bytes) that Micro$tupid adds
(sometimes) to utf-8 encoded bytes to mark their encoding as utf-8
rather than anything else. In otherwords, it is a non-(anti-)standard
U(nicode)E(ncoding)M(ark). It is actually the utf-8 encoding of the
2-byte BOM, and hence not a single mark! It is really confusing when
people use 'BOM' to refer to a UEM sequence instead of a BOM.
> | # Message is a string. Convert to bytes as required by RFC 5424
> | if type(msg) is unicode:
> | msg = msg.encode('utf-8')
> | if codecs:
> | msg = codecs.BOM_UTF8 + msg
> | msg = prio + msg
2.7.3 was released in April.
> and the August 1st version doesn't:
>
> | # Message is a string. Convert to bytes as required by RFC 5424
> | if type(msg) is unicode:
> | msg = msg.encode('utf-8')
> | msg = prio + msg
The issue referenced in an earlier messaged was to remove the UEM where
it did not belong.
> Is it possible there's two different 2.7.3 builds that somehow got
> packaged by Ubuntu at different times?
As you discovered, Ubuntu sometimes updates a release with bug patches
before we release a new official bugfix release. 2.7.4, with many
bugfixes, is still to see the light of day.
> The pattern of which machines
> have the August code and which have the April code correlates with
> when we rolled out each server instance.
Great detective work ;-).
--
Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2013-01-09 23:19 +0000 |
| Message-ID | <kcktti$sbg$1@reader1.panix.com> |
| In reply to | #36523 |
In <kcksg2$bkp$1@panix2.panix.com> roy@panix.com (Roy Smith) writes:
> What's weird is that two of the servers, and only those two, stick a
> BOM (Byte Order Mark) in front of the message they log. It shows up
> in syslog as:
> 2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754 [18979]: [etc...]
I worked on an application that would insert a BOM in syslog messages if
the logged message contained unicode, but not if it was plain ascii.
Not sure if this relates to your issue, but it's similar enough that it
seemed worth mentioning.
--
John Gordon A is for Amy, who fell down the stairs
gordon@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-01-09 20:17 -0500 |
| Message-ID | <roy-A2F810.20170209012013@news.panix.com> |
| In reply to | #36526 |
In article <kcktti$sbg$1@reader1.panix.com>,
John Gordon <gordon@panix.com> wrote:
> In <kcksg2$bkp$1@panix2.panix.com> roy@panix.com (Roy Smith) writes:
>
> > What's weird is that two of the servers, and only those two, stick a
> > BOM (Byte Order Mark) in front of the message they log. It shows up
> > in syslog as:
>
> > 2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754
> > [18979]: [etc...]
>
> I worked on an application that would insert a BOM in syslog messages if
> the logged message contained unicode, but not if it was plain ascii.
>
> Not sure if this relates to your issue, but it's similar enough that it
> seemed worth mentioning.
That doesn't seem to be it. All messages from web{2,5} have BOMs, no
message from web{1,3,4,6,7,8,9,10} ever does.
I even tried looking at the output of socket.gethostname() on the
various machines to see if maybe the hostname had some unicode character
in it. No joy.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web