| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: How to waste computer memory? |
| Date | 2016-03-18 23:37 +1100 |
| Message-ID | <mailman.314.1458304644.12893.python-list@python.org> |
| References | <a2639027-c69c-46df-a7a5-45a677b9e01d@googlegroups.com> <265377f4-741d-4aa2-9338-239f56f8bc57@googlegroups.com> <mailman.302.1458284448.12893.python-list@python.org> <56ebea83$0$1599$c3e8da3$5496439d@news.astraweb.com> |
On Fri, Mar 18, 2016 at 10:46 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Fri, 18 Mar 2016 06:00 pm, Ian Kelly wrote:
>
>> On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson
>> <rantingrickjohnson@gmail.com> wrote:
>>> In the event that i change my mind about Unicode, and/or for
>>> the sake of others, who may want to know, please provide a
>>> list of languages that *YOU* think handle Unicode better than
>>> Python, starting with the best first. Thanks.
>
> Better than Python? Easy-peasy:
>
> List of languages with Unicode handling which is better than Python = []
>
> I'm not aware of any language with better or more complete Unicode
> functionality than Python's. (That doesn't necessarily mean that they don't
> exist.)
And this also doesn't preclude languages whose handling is *as good*
as Python's; I know of one off-hand, and there may be any
number. (Trivial case: Take Python 3.5, change the definition of a
block to be { } instead of indentation, and release it as Bracethon
1.0. Voila, a distinct-yet-related language whose Unicode handling is
exactly as good as Python's.)
>> jmf has been asked this before, and as I recall he seems to feel that
>> UTF-8 should be used for all purposes, ignoring the limitations of
>> that encoding, such as indexing becoming an O(n) operation.
>
> Technically, UTF-8 doesn't *necessarily* imply indexing is O(n). For
> instance, your UTF-8 string might consist of an array of bytes containing
> the string, plus an array of indexes to the start of each code point. For
> example, the string:
>
> “abcπßЊ•𒀁”
>
> (including the quote marks) is 10 code points in length and 22 bytes as
> UTF-8. Grouping the (hex) bytes for each code point, we have:
>
> e2809c 61 62 63 cf80 c39f d08a e280a2 f0928081 e2809d
>
> so we could get an O(1) UTF-8 string by recording the bytes (in hex) plus the
> indexes (in decimal) in which each code point starts:
>
> e2809c616263cf80c39fd08ae280a2f0928081e2809d
>
> 0 3 4 5 6 8 10 12 15 19
>
> but (assuming each index needs 2 bytes, which supports strings up to 65535
> characters in length), that's actually LESS memory efficient than UTF-32:
> 42 bytes versus 40.
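The index-array layout quoted above can be sketched roughly as follows. This is only an illustration, not anything Python itself does; the names `build_index` and `char_at` are made up for the example, and the two-bytes-per-index cost is the assumption stated in the quote.

```python
# Sketch of a UTF-8 string paired with an array of code point start positions.
# build_index and char_at are hypothetical helper names, not a real API.

def build_index(text):
    """Return (utf8_bytes, starts); starts[i] is the byte offset of code point i."""
    data = bytearray()
    starts = []
    for ch in text:
        starts.append(len(data))      # where this code point begins
        data += ch.encode("utf-8")
    return bytes(data), starts

def char_at(data, starts, i):
    """O(1) lookup of code point i using the start table."""
    begin = starts[i]
    end = starts[i + 1] if i + 1 < len(starts) else len(data)
    return data[begin:end].decode("utf-8")

# The example string from the quote, written with escapes, quote marks included.
TEXT = "\u201cabc\u03c0\u00df\u040a\u2022\U00012001\u201d"
DATA, STARTS = build_index(TEXT)

# Memory estimate, assuming 2 bytes per stored index as in the quote.
INDEXED_SIZE = len(DATA) + 2 * len(STARTS)
UTF32_SIZE = 4 * len(TEXT)

print(DATA.hex())
print(STARTS)
print(INDEXED_SIZE, "bytes versus", UTF32_SIZE)
```

With this string the table comes out as the ten start positions listed in the quote, and the size comparison is 42 bytes against 40.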
A lot of strings will have no more than 255 non-ASCII characters in
them. (For example, all strings with no more than 255 total
characters.) You could store, instead of the indexes themselves, a
series of one-byte offsets:
e2809c616263cf80c39fd08ae280a2f0928081e2809d
0 2 2 2 2 3 4 5 7 10
Locating a byte based on its character position is still O(1); you
look up that position in the offset table, add that to your original
character position, and you have the byte location. For strings with
too many non-ASCII codepoints, you'd need some other representation,
but at that point, it might be worth just switching to UTF-32.
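Here is a rough sketch of that offset table, assuming each offset is simply the byte position minus the character position. The names `build_offsets` and `byte_position` are invented for the example. One caveat the sketch makes explicit: an offset counts the extra bytes accumulated so far, so it only fits in one byte while that total stays at or below 255, and characters needing three or four bytes use up the budget faster.

```python
# Sketch of a one-byte offset table for a UTF-8 string.
# build_offsets and byte_position are hypothetical helper names.

def build_offsets(text):
    """Return (utf8_bytes, offsets); offsets[i] = byte position - character position."""
    data = text.encode("utf-8")
    offsets = bytearray()
    byte_pos = 0
    for char_pos, ch in enumerate(text):
        extra = byte_pos - char_pos   # extra bytes contributed by earlier characters
        if extra > 255:
            # This is where some other representation would be needed.
            raise ValueError("offset does not fit in one byte")
        offsets.append(extra)
        byte_pos += len(ch.encode("utf-8"))
    return data, bytes(offsets)

def byte_position(offsets, i):
    """O(1): character position plus its stored offset gives the byte location."""
    return i + offsets[i]

# The example string, written with escapes, quote marks included.
TEXT = "\u201cabc\u03c0\u00df\u040a\u2022\U00012001\u201d"
DATA, OFFSETS = build_offsets(TEXT)

print(list(OFFSETS))
print([byte_position(OFFSETS, i) for i in range(len(TEXT))])
print(len(DATA) + len(OFFSETS), "bytes in total")
```

For this string the total is 22 bytes of text plus 10 bytes of offsets, so 32 bytes, compared with 40 for UTF-32.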
Of course, O(1) isn't the ultimate goal to the exclusion of all else.
For a simple sequential parser, indexing might be such a rare
operation that it's okay for it to be O(N), as you're never going to
index more than a few characters from a known position. Or if you're
trying to search a few gig of text, it's entirely possible that
transcoding into an indexable format is a complete waste of time, and
it's better to just work with a stream of bytes straight off the disk.
But for a general string type in a high level language, I'm normally
going to assume that indexing is fairly cheap.
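For comparison, the O(N) approach needs no table at all. A minimal sketch, assuming valid UTF-8 input and using a made-up function name: walk the bytes and count the ones that are not continuation bytes (continuation bytes have the form 10xxxxxx).

```python
# Sketch of O(N) indexing into raw UTF-8 by scanning from the start.
# byte_offset_by_scan is a hypothetical name; input is assumed to be valid UTF-8.

def byte_offset_by_scan(data, index):
    """Return the byte offset at which code point number `index` starts."""
    seen = -1
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:       # not a continuation byte, so a new code point
            seen += 1
            if seen == index:
                return pos
    raise IndexError("code point index out of range")

# The example string, written with escapes, quote marks included.
DATA = "\u201cabc\u03c0\u00df\u040a\u2022\U00012001\u201d".encode("utf-8")

print([byte_offset_by_scan(DATA, i) for i in range(10)])
```

A parser that only ever steps a few characters forward from a known byte position would start the scan there instead of at zero, which keeps each step cheap.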
ChrisA