
Re: hex dump w/ or w/out utf-8 chars

MIME-Version 1.0
In-Reply-To <7b6fc645-8bf3-4681-821c-38fb1fa1d191@googlegroups.com>
References <a35609c1-e56f-4180-8176-4405264da0a2@googlegroups.com> <7b6fc645-8bf3-4681-821c-38fb1fa1d191@googlegroups.com>
Date Tue, 9 Jul 2013 04:07:22 +1000
Subject Re: hex dump w/ or w/out utf-8 chars
From Chris Angelico <rosuav@gmail.com>
To python-list@python.org
Content-Type text/plain; charset=UTF-8
Content-Transfer-Encoding quoted-printable
Newsgroups comp.lang.python
Message-ID <mailman.4393.1373306845.3114.python-list@python.org>
Lines 23
Xref csiph.com comp.lang.python:50167



On Tue, Jul 9, 2013 at 3:53 AM, <ferdy.blatsco@gmail.com> wrote:
>>> All characters are UTF-8 characters. "a" is a UTF-8 character. So is "ă".
> Not using Python 3, I (a programmer who was present at the beginning of
> computer science, badly interacting with many languages from assembler to
> Fortran and from C to Pascal and so on) found it a hard job to cope with the
> abrupt transition from characters that were simply bytes to some special
> characters defined with 2, 3 or even more bytes.

Even back then, bytes and characters were different. 'A' is a
character; 0x41 is a byte. They correspond 1:1 only if you know
that your characters are represented in ASCII (or an ASCII-compatible
encoding). Other encodings (e.g. EBCDIC, where 'A' is 0xC1) mapped
things differently. The only difference now is that more people are
becoming aware that there are more than 256 characters in the world.
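To make the point concrete, here is a minimal Python 3 sketch (not part of the original post) that encodes the same characters with three codecs: ASCII, cp500 (one of the EBCDIC code pages shipped with Python's standard codecs) and UTF-8. The helper names encodings_of and hexbytes are illustrative only.

```python
def encodings_of(text, codecs=("ascii", "cp500", "utf-8")):
    """Map each codec name to the bytes for `text`, or None if it can't be encoded."""
    result = {}
    for codec in codecs:
        try:
            result[codec] = text.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None  # character has no representation in this codec
    return result


def hexbytes(data):
    """Render a bytes object as space-separated two-digit hex values."""
    return " ".join("%02x" % b for b in data)


# Same characters (code points), different bytes depending on the encoding.
for ch in ["A", "a", "ă", "€"]:
    print("%r U+%04X" % (ch, ord(ch)))
    for codec, data in encodings_of(ch).items():
        shown = hexbytes(data) if data is not None else "(not representable)"
        print("    %-6s %s" % (codec, shown))
```

'A' comes out as 41 in ASCII and UTF-8 but c1 in cp500, while 'ă' has no ASCII or cp500 form at all and takes two bytes (c4 83) in UTF-8; '€' takes three.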

Like Magic 2014 and its treatment of Slivers, at some point you're
going to have to master the difference between bytes and characters,
or else be eternally hacking around stuff in your code, so now is as
good a time as any.
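For the thread's original question, one way to keep the two layers apart is to hex-dump strictly at the bytes level and look at characters only after an explicit decode. This is a minimal sketch assuming Python 3 and UTF-8 input; hex_dump and char_dump are hypothetical helpers, not a standard API.

```python
def hex_dump(data, width=16):
    """Classic hex dump of a bytes object: offset, hex bytes, printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Bytes outside printable ASCII are shown as '.'; no guessing at characters.
        text_part = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk)
        lines.append("%08x  %-*s  %s" % (offset, width * 3 - 1, hex_part, text_part))
    return "\n".join(lines)


def char_dump(text, encoding="utf-8"):
    """Per-character view of a str: the character, its code point, its encoded bytes."""
    return "\n".join(
        "%r U+%04X -> %s" % (ch, ord(ch), " ".join("%02x" % b for b in ch.encode(encoding)))
        for ch in text
    )


raw = "naïve ă €".encode("utf-8")  # bytes, e.g. as read from a file opened in "rb" mode
print(hex_dump(raw))
print(char_dump(raw.decode("utf-8")))  # characters exist only after an explicit decode
```

The hex dump never pretends a byte is a character, and the per-character view appears only after raw.decode("utf-8"), so the choice of encoding is an explicit decision rather than something to hack around later.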

ChrisA



Thread

hex dump w/ or w/out utf-8 chars blatt <ferdy.blatsco@gmail.com> - 2013-07-07 17:22 -0700
  Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-08 11:17 +1000
  Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-08 05:48 +0000
  Re: hex dump w/ or w/out utf-8 chars ferdy.blatsco@gmail.com - 2013-07-08 10:31 -0700
    Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-09 03:52 +1000
      Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-11 06:18 -0700
        Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-11 23:32 +1000
          Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-11 11:42 -0700
            Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-11 11:44 -0700
            Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-12 03:18 +0000
              Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-12 14:42 -0700
            Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-12 12:16 +1000
              Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-13 00:56 -0700
                Re: hex dump w/ or w/out utf-8 chars Lele Gaifax <lele@metapensiero.it> - 2013-07-13 10:24 +0200
                Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 09:36 +0000
                Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-13 19:46 +1000
                Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 09:49 +0000
                Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-13 20:09 +1000
                Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-13 07:37 -0700
                Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-13 15:02 -0400
                Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-14 01:20 -0700
                Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-14 10:44 +0000
                Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-14 06:44 -0700
                Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-24 06:28 -0700
                Re: hex dump w/ or w/out utf-8 chars Neil Hodgson <nhodgson@iinet.net.au> - 2013-07-14 09:17 +1000
  Re: hex dump w/ or w/out utf-8 chars ferdy.blatsco@gmail.com - 2013-07-08 10:53 -0700
    Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-09 04:07 +1000
    Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-08 16:56 -0400
      Re: hex dump w/ or w/out utf-8 chars Neil Cerutti <neilc@norwich.edu> - 2013-07-09 12:22 +0000
        Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-09 08:54 -0400
          Re: hex dump w/ or w/out utf-8 chars Neil Cerutti <neilc@norwich.edu> - 2013-07-09 13:00 +0000
            Re: hex dump w/ or w/out utf-8 chars Skip Montanaro <skip@pobox.com> - 2013-07-09 08:18 -0500
            Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-09 09:23 -0400
    Re: hex dump w/ or w/out utf-8 chars MRAB <python@mrabarnett.plus.com> - 2013-07-08 22:38 +0100
    Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-09 07:49 +1000
      Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-09 06:53 +0000
    Re: hex dump w/ or w/out utf-8 chars Joshua Landau <joshua.landau.ws@gmail.com> - 2013-07-08 23:02 +0100
    Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-08 18:45 -0400
    Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-09 08:51 +1000
    Re: hex dump w/ or w/out utf-8 chars MRAB <python@mrabarnett.plus.com> - 2013-07-09 00:32 +0100
      Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-09 06:46 +0000
    Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-09 07:00 +0000
      Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-09 02:34 -0700
        Re: hex dump w/ or w/out utf-8 chars Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-07-09 12:15 +0200
          Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-09 16:32 +0000
            Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-10 01:52 -0700
        Re: hex dump w/ or w/out utf-8 chars Joshua Landau <joshua@landau.ws> - 2013-07-12 23:01 +0100
          Re: hex dump w/ or w/out utf-8 chars Tim Roberts <timr@probo.com> - 2013-07-12 20:42 -0700
          Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 04:51 +0000
