| Date | 2014-01-16 19:40 -0600 |
|---|---|
| From | Tim Chase <tim@thechases.com> |
| Subject | Re: Guessing the encoding from a BOM |
| References | <CAPTjJmqyO0UHrq31510iNeoQ2TcrJnosV0A6oHQOt5i-gz3njA@mail.gmail.com> <1389901049.40172.YahooMailBasic@web163804.mail.gq1.yahoo.com> <CAPTjJmqNhokKF8X3jHNZrW0iEt8foTaMM+26a3+2O9FG4rMPpw@mail.gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.5618.1389922759.18130.python-list@python.org> |

On 2014-01-17 11:14, Chris Angelico wrote:
> UTF-8 specifies the byte order
> as part of the protocol, so you don't need to mark it.

You don't need to mark it when writing, but some idiots use it anyway. If you're sniffing a file for purposes of reading, you need to look for it and remove it from the actual data that gets returned from the file--otherwise, your data can see it as corruption.

I end up with lots of CSV files from customers who have polluted it with Notepad or had Excel insert some UTF-8 BOM when exporting. This means my first column-name gets the BOM prefixed onto it when the file is passed to csv.DictReader, grr.

-tkc
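
For anyone hitting the same csv.DictReader problem, here is a minimal sketch of one way around it, assuming Python 3 and its standard "utf-8-sig" codec. The sample bytes and the read_rows helper are made up for illustration; they are not from the post above.

```python
import csv
import io

# Hypothetical sample: a CSV exported with a leading UTF-8 BOM (EF BB BF).
raw = b"\xef\xbb\xbfname,qty\r\nwidget,3\r\n"


def read_rows(data, encoding):
    """Decode bytes with the given codec and parse them with csv.DictReader."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


# Plain "utf-8" keeps the BOM as U+FEFF, so it sticks to the first column name.
polluted = read_rows(raw, "utf-8")

# "utf-8-sig" drops a leading BOM if present and is harmless if it is absent.
clean = read_rows(raw, "utf-8-sig")

print(list(polluted[0]))  # ['\ufeffname', 'qty']
print(list(clean[0]))     # ['name', 'qty']
```

For a file on disk, the same idea would be open(path, newline="", encoding="utf-8-sig") before handing the file object to csv.DictReader. This only covers the UTF-8 BOM; files carrying a UTF-16 or UTF-32 BOM would need a different codec.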