Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #19955

Re: A proposal to handle file encodings

From BGB <cr88192@hotmail.com>
Newsgroups comp.lang.java.programmer
Subject Re: A proposal to handle file encodings
Date 2012-11-25 14:36 -0600
Organization albasani.net
Message-ID <k8tvkc$h9g$1@news.albasani.net> (permalink)
References <lb6ta81u9imfdtlpuesoc8slncju0ehsnm@4ax.com> <k8o50f$1q6$1@news.albasani.net> <9kava8lk1ignppq7rso7gmcb541gnerf8q@4ax.com>

Show all headers | View raw


On 11/23/2012 11:02 AM, Roedy Green wrote:
> On Fri, 23 Nov 2012 16:33:40 +0100, Jan Burse <janburse@fastmail.fm>
> wrote, quoted or indirectly quoted someone who said :
>
>>
>> Would this not cover your requirements?
>
> The problem is primarily raw text files with no indication of the
> encoding.
>
> The HTML encoding is incompetent. You can't read it without knowing
> the encoding. It is just a confirmation. Thankfully the encoding comes
> in the HTTP header -- a case where meta information is available.
>

it works as far as most usable encodings have ASCII as a subset, so 
whether it is UTF-8 or 8859-1 or similar doesn't matter, as the header 
can still be parsed.

for UTF-16, there is typically the BOM, so if a BOM is seen, assume UTF-16.

with some cleverness, it could probably also be extended to support 
EBCEDIC, basically just try reading as EBCEDIC and see if it "makes sense".


> I feel angry about this. What asshole dreamed up the idea of
> exchanging files in various encodings without any labelling of the
> encoding? That there is no universal way of identifying the format of
> a file is astounding.  Parents who thought this way would send their
> kids out into the world not knowing their names, addresses, or
> genders.
>
> It sounds like something one of those people who live on beer and
> pizza, with a roomful of old pizza boxes lying around would have come
> up with.  I wish Martha Stewart had gone into programming.
>

this is overdramatizing the issue.


at first I thought it was about binary formats, which can often be 
identified if-needed by checking for magic values (sometimes augmented 
with things like header-checksums, ... which can reduce likelihood of 
false-positives).

OTOH, one can get into the whole thing of container formats, where a 
glob of opaque binary data is often wrapped up in such a format with 
some identification of what it is. a typical example of such a container 
format are things like video-formats (AVI, MKV, MP4, OGG/OGM, ...), 
which may contain frames using any number of codecs, and may sometimes 
add additional capabilities, such as the ability to multiplex or 
interleave data chunks, ...

for some of my own stuff, I am using informal container formats loosely 
based on the JPEG file format (itself mostly based on a system of 
"markers"). it works...


or such...

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-22 13:36 -0800
  Re: A proposal to handle file encodings Joerg Meier <joergmmeier@arcor.de> - 2012-11-22 23:36 +0100
  Re: A proposal to handle file encodings markspace <-@.> - 2012-11-22 17:20 -0800
  Re: A proposal to handle file encodings Arne Vajhøj <arne@vajhoej.dk> - 2012-11-22 20:25 -0500
    Re: A proposal to handle file encodings markspace <-@.> - 2012-11-22 19:47 -0800
      Re: A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-22 21:28 -0800
        Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-24 15:51 +0000
          Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-25 10:18 +0100
            Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 18:05 +0000
              Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-27 19:51 +0100
                Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-29 02:22 +0000
                Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-12-02 13:02 +0100
                Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-12-02 19:36 +0000
                Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-12-02 23:52 +0100
                Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-12-02 23:08 +0000
    Re: A proposal to handle file encodings Sven Köhler <remove-sven.koehler@gmail.com> - 2012-11-25 13:13 +0100
      Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 18:07 +0000
  Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-23 16:33 +0100
    Re: A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-23 09:02 -0800
      Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-23 19:21 +0100
        Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-24 00:11 +0100
          Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-24 00:53 +0100
            Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-24 09:13 +0100
            Re: A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 06:50 -0800
              Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-25 10:07 +0100
                Re: A proposal to handle file encodings Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 11:06 -0600
                Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-27 19:28 +0100
          Re: A proposal to handle file encodings Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 06:42 -0800
            Re: A proposal to handle file encodings "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-25 09:57 +0100
          Re: A proposal to handle file encodings Sven Köhler <remove-sven.koehler@gmail.com> - 2012-11-25 15:09 +0100
        Re: A proposal to handle file encodings Sven Köhler <remove-sven.koehler@gmail.com> - 2012-11-25 15:06 +0100
      Re: A proposal to handle file encodings Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-23 16:43 -0600
        Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-24 01:02 +0100
      Re: A proposal to handle file encodings BGB <cr88192@hotmail.com> - 2012-11-25 14:36 -0600
        Re: A proposal to handle file encodings Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 16:51 -0600
          Re: A proposal to handle file encodings BGB <cr88192@hotmail.com> - 2012-11-25 17:54 -0600
          Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-26 02:03 +0100
            Re: A proposal to handle file encodings Jan Burse <janburse@fastmail.fm> - 2012-11-26 02:20 +0100
              Re: A proposal to handle file encodings Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-26 02:46 +0000

csiph-web