Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #19955
| From | BGB <cr88192@hotmail.com> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: A proposal to handle file encodings |
| Date | 2012-11-25 14:36 -0600 |
| Organization | albasani.net |
| Message-ID | <k8tvkc$h9g$1@news.albasani.net> (permalink) |
| References | <lb6ta81u9imfdtlpuesoc8slncju0ehsnm@4ax.com> <k8o50f$1q6$1@news.albasani.net> <9kava8lk1ignppq7rso7gmcb541gnerf8q@4ax.com> |
On 11/23/2012 11:02 AM, Roedy Green wrote: > On Fri, 23 Nov 2012 16:33:40 +0100, Jan Burse <janburse@fastmail.fm> > wrote, quoted or indirectly quoted someone who said : > >> >> Would this not cover your requirements? > > The problem is primarily raw text files with no indication of the > encoding. > > The HTML encoding is incompetent. You can't read it without knowing > the encoding. It is just a confirmation. Thankfully the encoding comes > in the HTTP header -- a case where meta information is available. > it works as far as most usable encodings have ASCII as a subset, so whether it is UTF-8 or 8859-1 or similar doesn't matter, as the header can still be parsed. for UTF-16, there is typically the BOM, so if a BOM is seen, assume UTF-16. with some cleverness, it could probably also be extended to support EBCEDIC, basically just try reading as EBCEDIC and see if it "makes sense". > I feel angry about this. What asshole dreamed up the idea of > exchanging files in various encodings without any labelling of the > encoding? That there is no universal way of identifying the format of > a file is astounding. Parents who thought this way would send their > kids out into the world not knowing their names, addresses, or > genders. > > It sounds like something one of those people who live on beer and > pizza, with a roomful of old pizza boxes lying around would have come > up with. I wish Martha Stewart had gone into programming. > this is overdramatizing the issue. at first I thought it was about binary formats, which can often be identified if-needed by checking for magic values (sometimes augmented with things like header-checksums, ... which can reduce likelihood of false-positives). OTOH, one can get into the whole thing of container formats, where a glob of opaque binary data is often wrapped up in such a format with some identification of what it is. a typical example of such a container format are things like video-formats (AVI, MKV, MP4, OGG/OGM, ...), which may contain frames using any number of codecs, and may sometimes add additional capabilities, such as the ability to multiplex or interleave data chunks, ... for some of my own stuff, I am using informal container formats loosely based on the JPEG file format (itself mostly based on a system of "markers"). it works... or such...
Back to comp.lang.java.programmer | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-22 13:36 -0800
Re: A proposal to handle file encodings Joerg Meier <joergmmeier@arcor.de> - 2012-11-22 23:36 +0100
Re: A proposal to handle file encodings markspace <-@.> - 2012-11-22 17:20 -0800
Re: A proposal to handle file encodings Arne Vajhøj <arne@vajhoej.dk> - 2012-11-22 20:25 -0500
Re: A proposal to handle file encodings markspace <-@.> - 2012-11-22 19:47 -0800
Re: A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-22 21:28 -0800
Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-24 15:51 +0000
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-25 10:18 +0100
Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 18:05 +0000
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-27 19:51 +0100
Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-29 02:22 +0000
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-12-02 13:02 +0100
Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-12-02 19:36 +0000
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-12-02 23:52 +0100
Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-12-02 23:08 +0000
Re: A proposal to handle file encodings Sven Köhler <remove-sven.koehler@gmail.com> - 2012-11-25 13:13 +0100
Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 18:07 +0000
Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-23 16:33 +0100
Re: A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-23 09:02 -0800
Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-23 19:21 +0100
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-24 00:11 +0100
Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-24 00:53 +0100
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-24 09:13 +0100
Re: A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 06:50 -0800
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-25 10:07 +0100
Re: A proposal to handle file encodings Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 11:06 -0600
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-27 19:28 +0100
Re: A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 06:42 -0800
Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-25 09:57 +0100
Re: A proposal to handle file encodings Sven Köhler <remove-sven.koehler@gmail.com> - 2012-11-25 15:09 +0100
Re: A proposal to handle file encodings Sven Köhler <remove-sven.koehler@gmail.com> - 2012-11-25 15:06 +0100
Re: A proposal to handle file encodings Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-23 16:43 -0600
Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-24 01:02 +0100
Re: A proposal to handle file encodings BGB <cr88192@hotmail.com> - 2012-11-25 14:36 -0600
Re: A proposal to handle file encodings Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 16:51 -0600
Re: A proposal to handle file encodings BGB <cr88192@hotmail.com> - 2012-11-25 17:54 -0600
Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-26 02:03 +0100
Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-26 02:20 +0100
Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-26 02:46 +0000
csiph-web