| Started by | Chris Angelico <rosuav@gmail.com> |
|---|---|
| First post | 2013-03-26 08:51 +1100 |
| Last post | 2013-03-28 12:39 +0000 |
| Articles | 20 on this page of 206 — 30 participants |
Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-03-26 08:51 +1100
Re: Performance of int/long in Python 3 Cousin Stanley <cousinstanley@gmail.com> - 2013-03-25 23:35 +0000
Re: Performance of int/long in Python 3 Dan Stromberg <drsalists@gmail.com> - 2013-03-25 17:12 -0700
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-03-26 17:26 +1100
Re: Performance of int/long in Python 3 Cousin Stanley <cousinstanley@gmail.com> - 2013-03-26 13:38 +0000
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-03-27 01:08 +1100
Re: Performance of int/long in Python 3 Cousin Stanley <cousinstanley@gmail.com> - 2013-03-26 16:41 +0000
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-03-27 03:54 +1100
Re: Performance of int/long in Python 3 Terry Reedy <tjreedy@udel.edu> - 2013-03-26 14:24 -0400
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-03-26 11:50 -0700
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-03-27 06:03 +1100
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-03-26 13:44 -0700
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-26 20:50 +0000
Re: Performance of int/long in Python 3 Grant Edwards <invalid@invalid.invalid> - 2013-03-26 21:08 +0000
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-03-27 08:14 +1100
Re: Performance of int/long in Python 3 Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-03-27 12:10 +1300
Re: Performance of int/long in Python 3 Dave Angel <davea@davea.name> - 2013-03-26 19:19 -0400
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-26 21:26 +0000
Re: Performance of int/long in Python 3 Dave Angel <davea@davea.name> - 2013-03-26 17:28 -0400
Re: Performance of int/long in Python 3 Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-03-26 23:14 -0400
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-03-27 13:30 -0700
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-03-27 07:52 +1100
Re: Performance of int/long in Python 3 Ned Deily <nad@acm.org> - 2013-03-26 17:00 -0700
Re: Performance of int/long in Python 3 rurpy@yahoo.com - 2013-03-26 21:31 -0700
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-27 00:20 +0000
Re: Performance of int/long in Python 3 Ned Deily <nad@acm.org> - 2013-03-26 18:31 -0700
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-27 11:51 +0000
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-28 01:47 +0000
flaming vs accuracy [was Re: Performance of int/long in Python 3] Ethan Furman <ethan@stoneleaf.us> - 2013-03-27 20:18 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] rusi <rustompmody@gmail.com> - 2013-03-27 20:49 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-28 05:20 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] rusi <rustompmody@gmail.com> - 2013-03-27 22:42 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-28 07:48 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] rurpy@yahoo.com - 2013-03-28 12:54 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ethan Furman <ethan@stoneleaf.us> - 2013-03-28 13:31 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Grant Edwards <invalid@invalid.invalid> - 2013-03-29 14:52 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ethan Furman <ethan@stoneleaf.us> - 2013-03-29 08:51 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Grant Edwards <invalid@invalid.invalid> - 2013-03-29 16:50 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] rurpy@yahoo.com - 2013-03-29 14:26 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ethan Furman <ethan@stoneleaf.us> - 2013-03-29 16:07 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-31 00:35 -0700
ASCII versus non-ASCII [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-31 08:22 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-31 13:55 +0100
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-03-31 22:33 -0700
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-31 23:52 -0600
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-01 16:57 +1100
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-01 08:14 +0000
Re: Performance of int/long in Python 3 Roy Smith <roy@panix.com> - 2013-04-01 08:15 -0400
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-01 06:11 -0700
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-01 17:02 +0000
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-01 17:07 +0000
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-02 04:20 +1100
Re: Performance of int/long in Python 3 MRAB <python@mrabarnett.plus.com> - 2013-04-01 18:53 +0100
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-04-01 12:15 -0700
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-02 06:28 +1100
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-04-01 13:28 -0700
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-02 07:35 +1100
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-01 22:38 +0100
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-01 22:43 +0100
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-02 10:43 +1100
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-04-02 00:24 -0700
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-02 19:03 +1100
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-02 08:35 +0000
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-04-02 02:24 -0700
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-02 10:43 +0100
Re: Performance of int/long in Python 3 Steve Simmons <square.steve@gmail.com> - 2013-04-02 11:58 +0100
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-02 06:42 -0700
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-02 14:03 +0000
Re: Performance of int/long in Python 3 Steve Simmons <square.steve@gmail.com> - 2013-04-02 15:39 +0100
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-02 16:02 +0100
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-04-02 08:12 -0700
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-02 16:43 +0100
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-02 10:08 -0700
Re: Performance of int/long in Python 3 Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-02 17:33 -0400
Re: Performance of int/long in Python 3 Joshua Landau <joshua.landau.ws@gmail.com> - 2013-04-02 23:40 +0100
Re: Performance of int/long in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2013-04-02 08:09 -0700
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-02 15:12 +0100
Re: Performance of int/long in Python 3 Steve Simmons <square.steve@gmail.com> - 2013-04-02 16:03 +0100
Re: Performance of int/long in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2013-04-02 08:17 -0700
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-02 09:57 -0700
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-04-02 11:22 -0700
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-02 11:50 -0700
Re: Performance of int/long in Python 3 Lele Gaifax <lele@metapensiero.it> - 2013-04-03 00:52 +0200
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-04-02 02:20 -0700
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-02 13:44 -0600
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-03 14:31 +1100
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-02 20:53 -0700
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-03 15:03 +1100
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-02 22:11 -0700
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-03 17:22 +1100
Re: Performance of int/long in Python 3 Roy Smith <roy@panix.com> - 2013-04-03 09:28 -0400
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-04 00:38 +1100
Re: Performance of int/long in Python 3 Roy Smith <roy@panix.com> - 2013-04-03 00:10 -0400
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-03 19:15 +1100
Re: Performance of int/long in Python 3 Roy Smith <roy@panix.com> - 2013-04-03 09:25 -0400
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-04 00:34 +1100
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-03 05:32 +0000
Re: Performance of int/long in Python 3 Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-03 02:19 -0400
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-03 17:27 +1100
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-03 17:25 +1100
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-03 17:29 +1100
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-03 17:52 +1100
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-03 01:06 -0600
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-03 18:24 +1100
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-03 18:37 +1100
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-03 01:07 -0700
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-03 19:22 +1100
Re: Performance of int/long in Python 3 Dave Angel <davea@davea.name> - 2013-04-03 06:20 -0400
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-03 22:05 +1100
Re: Performance of int/long in Python 3 Dave Angel <davea@davea.name> - 2013-04-03 07:52 -0400
Sorting [was Re: Performance of int/long in Python 3] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-03 14:43 +0000
Re: Sorting [was Re: Performance of int/long in Python 3] Roy Smith <roy@panix.com> - 2013-04-03 11:00 -0400
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-03 10:30 -0600
Re: Performance of int/long in Python 3 Dave Angel <davea@davea.name> - 2013-04-03 13:51 -0400
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-04 09:58 +1100
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-03 07:53 +0000
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-03 19:02 +1100
Re: Performance of int/long in Python 3 jmfauth <wxjmfauth@gmail.com> - 2013-04-03 01:08 -0700
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-03 12:27 +0100
Re: Performance of int/long in Python 3 Roy Smith <roy@panix.com> - 2013-04-03 09:43 -0400
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-04 01:17 +1100
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-03 15:07 +0000
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-04 08:57 +1100
Re: Performance of int/long in Python 3 Serhiy Storchaka <storchaka@gmail.com> - 2013-04-06 12:09 +0300
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-07 07:24 +1000
Re: Performance of int/long in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2013-04-06 14:58 -0700
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-07 01:29 +0000
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-06 19:58 -0600
Re: Performance of int/long in Python 3 Roy Smith <roy@panix.com> - 2013-04-06 22:18 -0400
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-06 23:22 -0600
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-07 08:29 +0000
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-06 20:00 -0600
Re: Performance of int/long in Python 3 Serhiy Storchaka <storchaka@gmail.com> - 2013-04-07 11:02 +0300
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-07 16:14 +0100
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-03 15:02 +0000
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-03 10:38 -0600
Re: Performance of int/long in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-03 17:43 +0000
Re: Performance of int/long in Python 3 Chris Angelico <rosuav@gmail.com> - 2013-04-04 08:55 +1100
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-03 23:39 +0100
Re: Performance of int/long in Python 3 Roy Smith <roy@panix.com> - 2013-04-03 20:49 -0400
Re: Performance of int/long in Python 3 rusi <rustompmody@gmail.com> - 2013-04-03 09:10 -0700
Re: Performance of int/long in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2013-04-03 10:09 -0700
Re: Performance of int/long in Python 3 Roy Smith <roy@panix.com> - 2013-04-03 20:46 -0400
Re: Performance of int/long in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-03 10:53 -0600
Re: Performance of int/long in Python 3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-04-02 20:28 +1100
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-03 14:56 +0100
Re: Performance of int/long in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-01 20:54 +0100
Re: Performance of int/long in Python 3 roy@panix.com (Roy Smith) - 2013-04-01 16:31 -0400
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-29 00:35 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-28 21:22 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ned Deily <nad@acm.org> - 2013-03-28 13:23 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ethan Furman <ethan@stoneleaf.us> - 2013-03-27 23:12 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 02:03 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ian Foote <ian@feete.org> - 2013-03-28 09:36 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Neil Hodgson <nhodgson@iinet.net.au> - 2013-03-28 23:11 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-28 13:01 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 07:12 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 01:38 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 08:14 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 02:21 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 08:45 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Terry Reedy <tjreedy@udel.edu> - 2013-03-28 12:01 -0400
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-28 10:11 -0600
Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-29 00:39 +0000
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Chris Angelico <rosuav@gmail.com> - 2013-03-29 11:54 +1100
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-29 02:37 +0000
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Chris Angelico <rosuav@gmail.com> - 2013-03-29 13:44 +1100
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-29 00:11 -0600
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-29 00:22 -0600
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Terry Reedy <tjreedy@udel.edu> - 2013-03-29 14:06 -0400
Re: Surrogate pairs in new flexible string representation Christian Heimes <christian@python.org> - 2013-03-29 23:05 +0100
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-29 01:03 +0000
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Chris Angelico <rosuav@gmail.com> - 2013-03-29 12:10 +1100
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] MRAB <python@mrabarnett.plus.com> - 2013-03-29 02:00 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 03:16 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-28 10:01 -0600
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Neil Hodgson <nhodgson@iinet.net.au> - 2013-03-29 14:34 +1100
unicode and the FSR [was: Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] Ethan Furman <ethan@stoneleaf.us> - 2013-03-28 21:56 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 16:33 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Neil Hodgson <nhodgson@iinet.net.au> - 2013-03-29 16:46 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] MRAB <python@mrabarnett.plus.com> - 2013-03-28 14:51 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Neil Hodgson <nhodgson@iinet.net.au> - 2013-03-29 14:57 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 02:07 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-03-28 09:47 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-28 21:30 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 06:34 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-28 10:33 -0600
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 09:55 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 04:13 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 10:48 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 04:55 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 13:26 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 08:45 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Terry Reedy <tjreedy@udel.edu> - 2013-03-28 19:12 -0400
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Benjamin Kaplan <benjamin.kaplan@case.edu> - 2013-03-28 13:29 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 14:11 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] jmfauth <wxjmfauth@gmail.com> - 2013-03-28 14:33 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] MRAB <python@mrabarnett.plus.com> - 2013-03-28 21:50 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Benjamin Kaplan <benjamin.kaplan@case.edu> - 2013-03-28 14:52 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-03-28 19:53 -0400
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-29 11:03 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-29 00:15 +0000
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Chris Angelico <rosuav@gmail.com> - 2013-03-28 14:40 +1100
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] 88888 Dihedral <dihedral88888@googlemail.com> - 2013-03-28 16:04 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] 88888 Dihedral <dihedral88888@googlemail.com> - 2013-03-28 16:04 -0700
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-28 12:39 +0000
| From | jmfauth <wxjmfauth@gmail.com> |
|---|---|
| Date | 2013-03-28 08:45 -0700 |
| Subject | Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] |
| Message-ID | <9f3c6d10-46a3-4835-baf5-76a7da4b0c7c@k1g2000yqf.googlegroups.com> |
| In reply to | #42143 |
On 28 mar, 16:14, jmfauth <wxjmfa...@gmail.com> wrote:
> On 28 mar, 15:38, Chris Angelico <ros...@gmail.com> wrote:
>
> > On Fri, Mar 29, 2013 at 1:12 AM, jmfauth <wxjmfa...@gmail.com> wrote:
> > > This flexible string representation is so absurd that not only
> > > "it" does not know you can not write Western European Languages
> > > with latin-1, "it" penalizes you by just attempting to optimize
> > > latin-1. Shown in my multiple examples.
> >
> > PEP393 strings have two optimizations, or kinda three:
> >
> > 1a) ASCII-only strings
> > 1b) Latin1-only strings
> > 2) BMP-only strings
> > 3) Everything else
> >
> > Options 1a and 1b are almost identical - I'm not sure what the detail
> > is, but there's something flagging those strings that fit inside seven
> > bits. (Something to do with optimizing encodings later?) Both are
> > optimized down to a single byte per character.
> >
> > Option 2 is optimized to two bytes per character.
> >
> > Option 3 is stored in UTF-32.
> >
> > Once again, jmf, you are forgetting that option 2 is a safe and
> > bug-free optimization.
> >
> > ChrisA
>
> As long as you are attempting to divide a set of characters in
> chunks and try to handle them separately, it will never work.
>
> Read my previous post about the unicode transformation format.
> I know what pep393 does.
>
> jmf

Addendum.

This is what you correctly perceived in another thread.
You qualified it as a "switch". Now you have to understand
where this "switch" is coming from.

jmf
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-28 12:01 -0400 |
| Subject | Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] |
| Message-ID | <mailman.3895.1364486507.2939.python-list@python.org> |
| In reply to | #42128 |
On 3/28/2013 10:38 AM, Chris Angelico wrote:
> PEP393 strings have two optimizations, or kinda three:
>
> 1a) ASCII-only strings
> 1b) Latin1-only strings
> 2) BMP-only strings
> 3) Everything else
>
> Options 1a and 1b are almost identical - I'm not sure what the detail
> is, but there's something flagging those strings that fit inside seven
> bits. (Something to do with optimizing encodings later?)

Yes. 'Encoding' an ascii-only string to any ascii-compatible encoding
amounts to a simple copy of the internal bytes. I do not know if *all*
the codecs for such encodings are 393-aware, but I do know that the
utf-8 and latin-1 group are. This is one operation that 3.3+ does much
faster than 3.2-.

--
Terry Jan Reedy
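Editorial illustration of the point about ASCII-compatible encodings, seen from the Python side. This is a minimal sketch, assuming CPython 3.3 or later. It only shows that the output bytes are identical across the three codecs for ASCII-only text; it says nothing about how fast the copy is. The helper name `encode_all` and the sample strings are made up for this example.

```python
# Hypothetical helper: encode one string with several ASCII-compatible codecs.
ASCII_COMPATIBLE = ("ascii", "latin-1", "utf-8")


def encode_all(text):
    """Return {codec name: encoded bytes} for each codec listed above."""
    return {name: text.encode(name) for name in ASCII_COMPATIBLE}


ascii_text = "flexible string representation"
encoded = encode_all(ascii_text)

# For ASCII-only text, every codec yields the same bytes, one per character.
same_bytes = len(set(encoded.values())) == 1
one_byte_each = len(encoded["utf-8"]) == len(ascii_text)
print(same_bytes, one_byte_each)

# A single non-ASCII character is enough to make the encodings diverge.
latin1_text = "caf" + chr(0xE9)
latin1_bytes = latin1_text.encode("latin-1")
utf8_bytes = latin1_text.encode("utf-8")
print(latin1_bytes, utf8_bytes)
```

Because the bytes are the same for all three codecs, an implementation that already stores ASCII text one byte per character can, in principle, hand back a copy of its buffer, which is the shortcut described in the post.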
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-03-28 10:11 -0600 |
| Subject | Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] |
| Message-ID | <mailman.3898.1364487167.2939.python-list@python.org> |
| In reply to | #42128 |
On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico <rosuav@gmail.com> wrote:
> PEP393 strings have two optimizations, or kinda three:
>
> 1a) ASCII-only strings
> 1b) Latin1-only strings
> 2) BMP-only strings
> 3) Everything else
>
> Options 1a and 1b are almost identical - I'm not sure what the detail
> is, but there's something flagging those strings that fit inside seven
> bits. (Something to do with optimizing encodings later?) Both are
> optimized down to a single byte per character.

The only difference for ASCII-only strings is that they are kept in a
struct with a smaller header. The smaller header omits the utf8
pointer (which optionally points to an additional UTF-8 representation
of the string) and its associated length variable. These are not
needed for ASCII-only strings because an ASCII string can be directly
interpreted as a UTF-8 string for the same result. The smaller header
also omits the "wstr_length" field which, according to the PEP,
"differs from length only if there are surrogate pairs in the
representation." For an ASCII string, of course there would not be
any surrogate pairs.
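Editorial illustration: the storage classes and the smaller ASCII header discussed above can be observed indirectly with `sys.getsizeof`. This is a rough sketch, assuming CPython 3.3 or later; the absolute sizes depend on the interpreter version and platform, so the code only compares sizes and does not print expected numbers. The helper `bytes_per_char` and the sample characters are chosen for this example.

```python
import sys


def bytes_per_char(ch):
    """Estimate storage per character by growing a run of `ch` by 100."""
    short, longer = ch * 100, ch * 200
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // 100


# One sample character from each of the four categories in the post.
samples = {
    "ascii": "a",
    "latin-1": chr(0xE9),     # e with acute accent
    "bmp": chr(0x20AC),       # euro sign
    "astral": chr(0x1F600),   # outside the Basic Multilingual Plane
}
widths = {name: bytes_per_char(ch) for name, ch in samples.items()}
print(widths)

# Same length and same one-byte width, but the ASCII-only string
# uses the smaller header, so its total size is lower.
ascii_size = sys.getsizeof(samples["ascii"] * 100)
latin1_size = sys.getsizeof(samples["latin-1"] * 100)
print(latin1_size - ascii_size)
```

The difference printed on the last line is the header overhead of the non-ASCII one-byte form on the interpreter being used; which fields make up that difference is described in the post and in PEP 393, not measured here.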
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-03-29 00:39 +0000 |
| Subject | Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <5154e2dd$0$29974$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #42164 |
On Thu, 28 Mar 2013 10:11:59 -0600, Ian Kelly wrote:

> On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico <rosuav@gmail.com>
> wrote:
>> PEP393 strings have two optimizations, or kinda three:
>>
>> 1a) ASCII-only strings
>> 1b) Latin1-only strings
>> 2) BMP-only strings
>> 3) Everything else
>>
>> Options 1a and 1b are almost identical - I'm not sure what the detail
>> is, but there's something flagging those strings that fit inside seven
>> bits. (Something to do with optimizing encodings later?) Both are
>> optimized down to a single byte per character.
>
> The only difference for ASCII-only strings is that they are kept in a
> struct with a smaller header. The smaller header omits the utf8 pointer
> (which optionally points to an additional UTF-8 representation of the
> string) and its associated length variable. These are not needed for
> ASCII-only strings because an ASCII string can be directly interpreted
> as a UTF-8 string for the same result. The smaller header also omits
> the "wstr_length" field which, according to the PEP, "differs from
> length only if there are surrogate pairs in the representation." For an
> ASCII string, of course there would not be any surrogate pairs.

I wonder why they need to care about surrogate pairs?

ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
strings. It's only strings in the SMPs that could need surrogate pairs,
and they don't need them in Python's implementation since it's a full
32-bit implementation. So where do the surrogate pairs come into this?

I also wonder why the implementation bothers keeping a UTF-8
representation. That sounds like premature optimization to me. Surely
you only need it when writing to a file with UTF-8 encoding? For most
strings, that will never happen.

--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-29 11:54 +1100 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3928.1364518484.2939.python-list@python.org> |
| In reply to | #42208 |
On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
> strings. It's only strings in the SMPs that could need surrogate pairs,
> and they don't need them in Python's implementation since it's a full
> 32-bit implementation. So where do the surrogate pairs come into this?

PEP 393 says:
"""
wstr_length, wstr: representation in platform's wchar_t
(null-terminated). If wchar_t is 16-bit, this form may use surrogate
pairs (in which cast wstr_length differs form length).
wstr_length differs from length only if there are surrogate pairs in
the representation.

utf8_length, utf8: UTF-8 representation (null-terminated).

data: shortest-form representation of the unicode string. The string
is null-terminated (in its respective representation).

All three representations are optional, although the data form is
considered the canonical representation which can be absent only
while the string is being created. If the representation is absent,
the pointer is NULL, and the corresponding length field may contain
arbitrary data.
"""

If the string was created from a wchar_t string, that string will be
retained, and presumably can be used to re-output the original for a
clean and fast round-trip. Same with...

> I also wonder why the implementation bothers keeping a UTF-8
> representation. That sounds like premature optimization to me. Surely you
> only need it when writing to a file with UTF-8 encoding? For most
> strings, that will never happen.

... the UTF-8 version. It'll keep it if it has it, and not else. A lot
of content will go out in the same encoding it came in in, so it makes
sense to hang onto it where possible.

Though, from the same quote: The UTF-8 representation is
null-terminated. Does this mean that it can't be used if there might
be a \0 in the string?

Minor nitpick, btw:
> (in which cast wstr_length differs form length)

Should be "in which case" and "from". Who has the power to correct
typos in PEPs?

ChrisA
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-03-29 02:37 +0000 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <5154fe82$0$29974$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #42209 |
On Fri, 29 Mar 2013 11:54:41 +1100, Chris Angelico wrote: > On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano > <steve+comp.lang.python@pearwood.info> wrote: >> ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only >> strings. It's only strings in the SMPs that could need surrogate pairs, >> and they don't need them in Python's implementation since it's a full >> 32- bit implementation. So where do the surrogate pairs come into this? > > PEP 393 says: > """ > wstr_length, wstr: representation in platform's wchar_t > (null-terminated). If wchar_t is 16-bit, this form may use surrogate > pairs (in which cast wstr_length differs form length). wstr_length > differs from length only if there are surrogate pairs in the > representation. > > utf8_length, utf8: UTF-8 representation (null-terminated). > > data: shortest-form representation of the unicode string. The string is > null-terminated (in its respective representation). > > All three representations are optional, although the data form is > considered the canonical representation which can be absent only while > the string is being created. If the representation is absent, the > pointer is NULL, and the corresponding length field may contain > arbitrary data. > """ All the words are in English (well, most of them...) but what does it mean? > If the string was created from a wchar_t string, that string will be > retained, and presumably can be used to re-output the original for a > clean and fast round-trip. Under what circumstances will a string be created from a wchar_t string? How, and why, would such a string be created? Why would Python still support strings containing surrogates when it now has a nice, shiny, surrogate-free flexible representation? >> I also wonder why the implementation bothers keeping a UTF-8 >> representation. That sounds like premature optimization to me. Surely >> you only need it when writing to a file with UTF-8 encoding? For most >> strings, that will never happen. 
> > ... the UTF-8 version. It'll keep it if it has it, and not else. A lot > of content will go out in the same encoding it came in in, so it makes > sense to hang onto it where possible. Not to me. That almost doubles the size of the string, on the off-chance that you'll need the UTF-8 encoding. Which for many uses, you don't, and even if you do, it seems like premature optimization to keep it around just in case. Encoding to UTF-8 will be fast for small N, and for large N, why carry around (potentially) multiple megabytes of duplicated data just in case the encoded version is needed some time? > Though, from the same quote: The UTF-8 representation is > null-terminated. Does this mean that it can't be used if there might be > a \0 in the string? > > Minor nitpick, btw: >> (in which cast wstr_length differs form length) > Should be "in which case" and "from". Who has the power to correct typos > in PEPs? > > ChrisA
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-29 13:44 +1100 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3932.1364525092.2939.python-list@python.org> |
| In reply to | #42213 |
On Fri, Mar 29, 2013 at 1:37 PM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > Under what circumstances will a string be created from a wchar_t string? > How, and why, would such a string be created? Why would Python still > support strings containing surrogates when it now has a nice, shiny, > surrogate-free flexible representation? Strings are created from some form of content. If not from another Python string, then - most likely - it's from a stream of bytes. If from a C API that returns wchar_t, then it'd make sense to have that form around. ChrisA
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-03-29 00:11 -0600 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3939.1364537545.2939.python-list@python.org> |
| In reply to | #42213 |
On Thu, Mar 28, 2013 at 8:37 PM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>>> I also wonder why the implementation bothers keeping a UTF-8
>>> representation. That sounds like premature optimization to me. Surely
>>> you only need it when writing to a file with UTF-8 encoding? For most
>>> strings, that will never happen.
>>
>> ... the UTF-8 version. It'll keep it if it has it, and not else. A lot
>> of content will go out in the same encoding it came in in, so it makes
>> sense to hang onto it where possible.
>
> Not to me. That almost doubles the size of the string, on the off-chance
> that you'll need the UTF-8 encoding. Which for many uses, you don't, and
> even if you do, it seems like premature optimization to keep it around
> just in case. Encoding to UTF-8 will be fast for small N, and for large
> N, why carry around (potentially) multiple megabytes of duplicated data
> just in case the encoded version is needed some time?
From the PEP:
"""
A new function PyUnicode_AsUTF8 is provided to access the UTF-8
representation. It is thus identical to the existing
_PyUnicode_AsString, which is removed. The function will compute the
utf8 representation when first called. Since this representation will
consume memory until the string object is released, applications
should use the existing PyUnicode_AsUTF8String where possible (which
generates a new string object every time). APIs that implicitly
converts a string to a char* (such as the ParseTuple functions) will
use PyUnicode_AsUTF8 to compute a conversion.
"""
So the utf8 representation is not populated when the string is
created, but when a utf8 representation is requested, and only when
requested by the API that returns a char*, not by the API that returns
a bytes object.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-03-29 00:22 -0600 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3940.1364538176.2939.python-list@python.org> |
| In reply to | #42213 |
On Fri, Mar 29, 2013 at 12:11 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> From the PEP:
>
> """
> A new function PyUnicode_AsUTF8 is provided to access the UTF-8
> representation. It is thus identical to the existing
> _PyUnicode_AsString, which is removed. The function will compute the
> utf8 representation when first called. Since this representation will
> consume memory until the string object is released, applications
> should use the existing PyUnicode_AsUTF8String where possible (which
> generates a new string object every time). APIs that implicitly
> converts a string to a char* (such as the ParseTuple functions) will
> use PyUnicode_AsUTF8 to compute a conversion.
> """
>
> So the utf8 representation is not populated when the string is
> created, but when a utf8 representation is requested, and only when
> requested by the API that returns a char*, not by the API that returns
> a bytes object.
Since the PEP specifically mentions ParseTuple string conversion, I am
thinking that this is probably the motivation for caching it. A
string that is passed into a C function (that uses one of the various
UTF-8 char* format specifiers) is perhaps likely to be passed into
that function again at some point, so the UTF-8 representation is kept
around to avoid the need to recompose it on each call.
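As a rough illustration of the caching described above, the sketch below calls PyUnicode_AsUTF8 through ctypes and compares sys.getsizeof before and after. It assumes CPython 3.3 or later, where ctypes.pythonapi exposes the C API and the reported size of a str appears to include any cached UTF-8 copy. The variable names are illustrative only and nothing here is taken from the PEP.

```python
import ctypes
import sys

# Assumption: CPython 3.3+, where PyUnicode_AsUTF8 is an exported C function.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]
as_utf8.restype = ctypes.c_char_p

# A non-ASCII Latin-1 string: stored at 1 byte per character, but its
# UTF-8 form needs 2 bytes per character, so it cannot share that buffer.
s = "\xe9" * 1000

size_before = sys.getsizeof(s)
utf8_bytes = as_utf8(s)  # the first call computes the UTF-8 form and keeps it
size_after = sys.getsizeof(s)

print("size before:", size_before, "size after:", size_after)
print("matches encode():", utf8_bytes == s.encode("utf-8"))
```

By contrast, s.encode("utf-8") corresponds to the bytes-returning API and should leave the size of s unchanged, which matches the distinction drawn in the PEP text quoted above.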
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-29 14:06 -0400 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3957.1364580449.2939.python-list@python.org> |
| In reply to | #42213 |
On 3/28/2013 10:37 PM, Steven D'Aprano wrote:
> Under what circumstances will a string be created from a wchar_t string?
> How, and why, would such a string be created? Why would Python still
> support strings containing surrogates when it now has a nice, shiny,
> surrogate-free flexible representation?
I believe because surrogates are legal codepoints and users may put them in strings even though Python does not (except for surrogateescape error handling). I believe some of the internal complexity comes from supporting the old C-api so as to not immediately invalidate existing extensions.
--
Terry Jan Reedy
| From | Christian Heimes <christian@python.org> |
|---|---|
| Date | 2013-03-29 23:05 +0100 |
| Subject | Re: Surrogate pairs in new flexible string representation |
| Message-ID | <mailman.3969.1364594730.2939.python-list@python.org> |
| In reply to | #42213 |
On 29.03.2013 07:22, Ian Kelly wrote:
> Since the PEP specifically mentions ParseTuple string conversion, I am
> thinking that this is probably the motivation for caching it. A
> string that is passed into a C function (that uses one of the various
> UTF-8 char* format specifiers) is perhaps likely to be passed into
> that function again at some point, so the UTF-8 representation is kept
> around to avoid the need to recompose it on each call.
It's not just about caching but also about memory management. The additional utf8 member is required for backward compatibility. The APIs expect a pointer to an existing and shared block of memory. They don't take ownership of the memory block and therefore don't free() it.
Christian
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-03-29 01:03 +0000 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3929.1364519036.2939.python-list@python.org> |
| In reply to | #42208 |
On 29/03/2013 00:54, Chris Angelico wrote: > > Minor nitpick, btw: >> (in which cast wstr_length differs form length) > Should be "in which case" and "from". Who has the power to correct > typos in PEPs? > > ChrisA > Sneak it in here? http://bugs.python.org/issue13604 -- If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython. Mark Lawrence
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-29 12:10 +1100 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3930.1364519457.2939.python-list@python.org> |
| In reply to | #42208 |
On Fri, Mar 29, 2013 at 12:03 PM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote: > On 29/03/2013 00:54, Chris Angelico wrote: >> Minor nitpick, btw: >>> >>> (in which cast wstr_length differs form length) >> >> Should be "in which case" and "from". Who has the power to correct >> typos in PEPs? > > Sneak it in here? http://bugs.python.org/issue13604 Ah! Turns out it's already been fixed; a reword of that section, as shown in the attached files, no longer has the parenthesis, and thus its typos. ChrisA
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-03-29 02:00 +0000 |
| Subject | Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3931.1364522420.2939.python-list@python.org> |
| In reply to | #42208 |
On 29/03/2013 00:54, Chris Angelico wrote: > On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano > <steve+comp.lang.python@pearwood.info> wrote: >> ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only >> strings. It's only strings in the SMPs that could need surrogate pairs, >> and they don't need them in Python's implementation since it's a full 32- >> bit implementation. So where do the surrogate pairs come into this? > > PEP 393 says: > """ > wstr_length, wstr: representation in platform's wchar_t > (null-terminated). If wchar_t is 16-bit, this form may use surrogate > pairs (in which cast wstr_length differs form length). wstr_length > differs from length only if there are surrogate pairs in the > representation. > > utf8_length, utf8: UTF-8 representation (null-terminated). > > data: shortest-form representation of the unicode string. The string > is null-terminated (in its respective representation). > > All three representations are optional, although the data form is > considered the canonical representation which can be absent only while > the string is being created. If the representation is absent, the > pointer is NULL, and the corresponding length field may contain > arbitrary data. > """ > > If the string was created from a wchar_t string, that string will be > retained, and presumably can be used to re-output the original for a > clean and fast round-trip. Same with... > >> I also wonder why the implementation bothers keeping a UTF-8 >> representation. That sounds like premature optimization to me. Surely you >> only need it when writing to a file with UTF-8 encoding? For most >> strings, that will never happen. > > ... the UTF-8 version. It'll keep it if it has it, and not else. A lot > of content will go out in the same encoding it came in in, so it makes > sense to hang onto it where possible. > > Though, from the same quote: The UTF-8 representation is > null-terminated. 
Does this mean that it can't be used if there might > be a \0 in the string? > You could ask the same question about any encoding. It's only an issue if it's passed to a C function which expects a null-terminated string. > Minor nitpick, btw: >> (in which cast wstr_length differs form length) > Should be "in which case" and "from". Who has the power to correct > typos in PEPs? >
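MRAB's point about null termination can be seen from Python with ctypes. This is only a sketch: the sample string is made up, and create_string_buffer stands in for "a C function which expects a null-terminated string". The encoded bytes keep the embedded \0; only the C-style view of the buffer stops at it.

```python
import ctypes

s = "spam\0eggs"
encoded = s.encode("utf-8")

# The UTF-8 bytes carry the embedded NUL like any other character.
print(len(s), len(encoded))

# A C-style buffer holding the same bytes plus a terminating NUL.
buf = ctypes.create_string_buffer(encoded)

# .raw is the whole buffer; .value reads it the way a C string function
# would, stopping at the first NUL byte.
print(buf.raw)
print(buf.value)
```

The data is intact in memory either way. What is lost is only what a consumer that trusts the terminator, rather than the stored length, gets to see.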
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-29 03:16 +1100 |
| Subject | Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] |
| Message-ID | <mailman.3900.1364487418.2939.python-list@python.org> |
| In reply to | #42128 |
On Fri, Mar 29, 2013 at 3:01 AM, Terry Reedy <tjreedy@udel.edu> wrote: > On 3/28/2013 10:38 AM, Chris Angelico wrote: > >> PEP393 strings have two optimizations, or kinda three: >> >> 1a) ASCII-only strings >> 1b) Latin1-only strings >> 2) BMP-only strings >> 3) Everything else >> >> Options 1a and 1b are almost identical - I'm not sure what the detail >> is, but there's something flagging those strings that fit inside seven >> bits. (Something to do with optimizing encodings later?) > > > Yes. 'Encoding' an ascii-only string to any ascii-compatible encoding > amounts to a simple copy of the internal bytes. I do not know if *all* the > codecs for such encodings are 393-aware, but I do know that the utf-8 and > latin-1 group are. This is one operation that 3.3+ does much faster than > 3.2- Thanks Terry. So that's not so much a representation difference as a flag that costs little or nothing to retain, and can improve performance in the encode later on. Sounds like a useful tweak to the basics of flexible string representation, without being particularly germane to jmf's complaints. ChrisA
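The categories listed above can be observed from Python itself. The following is a minimal sketch, assuming CPython 3.3 or later; bytes_per_char is a hypothetical helper, not part of any library. It measures storage per character by comparing the sizes of two strings of different lengths, so that the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character; the object header cancels out."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

samples = {
    "ASCII": "a",
    "Latin-1": "\xe9",          # e with acute accent
    "BMP": "\u20ac",            # euro sign
    "astral": "\U0001f600",     # an emoji outside the BMP
}

for label, ch in samples.items():
    print(label, bytes_per_char(ch))
```

ASCII and Latin-1 text both come out at one byte per character, which fits the description of 1a and 1b as a flag rather than a separate width.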
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-03-28 10:01 -0600 |
| Subject | Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] |
| Message-ID | <mailman.3896.1364486514.2939.python-list@python.org> |
| In reply to | #42123 |
On Thu, Mar 28, 2013 at 7:01 AM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > Any string method that takes a starting offset requires the method to > walk the string byte-by-byte. I've even seen languages put responsibility > for dealing with that onto the programmer: the "start offset" is given in > *bytes*, not characters. I don't remember what language this was... it > might have been Haskell? Whatever it was, it horrified me. Go does this. I remember because it came up in one of these threads, where jmf (or was it Ranting Rick?) was praising Go for just getting Unicode "right".
| From | Neil Hodgson <nhodgson@iinet.net.au> |
|---|---|
| Date | 2013-03-29 14:34 +1100 |
| Subject | Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] |
| Message-ID | <-LGdnWTpyKcdkcjMnZ2dnUVZ_jCdnZ2d@westnet.com.au> |
| In reply to | #42123 |
Steven D'Aprano:
> Some string operations need to inspect every character, e.g. str.upper().
> Even for them, the increased complexity of a variable-width encoding
> costs. It's not sufficient to walk the string inspecting a fixed 1, 2 or
> 4 bytes per character. You have to walk the string grabbing 1 byte at a
> time, and then decide whether you need another 1, 2 or 3 bytes. Even
> though it's still O(N), the added bit-masking and overhead of variable-
> width encoding adds to the overall cost.
It does add to implementation complexity but should only add a small
amount of time.
To compare costs, I am using the text of the web site
http://www.mofa.go.jp/mofaj/ since it has a reasonable amount (10%) of
multi-byte characters. Since the document fits in the BMP, Python
would choose a 2-byte wide implementation so I am emulating that choice
with a very simple 16-bit table-based upper-caser. Real Unicode case
conversion code is more concerned with edge cases like Turkic and
Lithuanian locales and Greek combining characters and also allowing for
measurement/reallocation for the cases where the result is
smaller/larger. See, for example, glib's real_toupper in
https://git.gnome.org/browse/glib/tree/glib/guniprop.c
Here is some simplified example code that implements upper-casing
over 16-bit wide (utf16_up) and UTF-8 (utf8_up) buffers:
http://www.scintilla.org/UTF8Up.cxx
Since I didn't want to spend too much time writing code it only
handles the BMP and doesn't have upper-case table entries outside ASCII
for now. If this was going to be worked on further to be made
maintainable, most of the masking and so forth would be in macros
similar to UTF8_COMPUTE/UTF8_GET in glib.
The UTF-8 case ranges from around 5% slower on average in a 32 bit
release build (VC2012 on an i7 870) to averaging a little faster in a
64-bit build. They're both around a billion characters per-second.
C:\u\hg\UpUTF\UpUTF>..\x64\Release\UpUTF.exe
Time taken for UTF8 of 80449=0.006528
Time taken for UTF16 of 71525=0.006610
Relative time taken UTF8/UTF16 0.987581
> Any string method that takes a starting offset requires the method to
> walk the string byte-by-byte. I've even seen languages put responsibility
> for dealing with that onto the programmer: the "start offset" is given in
> *bytes*, not characters. I don't remember what language this was... it
> might have been Haskell? Whatever it was, it horrified me.
It doesn't horrify me - I've been working this way for over 10 years
and it seems completely natural. You can wrap access in iterators that
hide the byte offsets if you like. This then ensures that all operations
on those iterators are safe only allowing the iterator to point at the
start/end of valid characters.
> Sure. And over a different set of samples, it is less compact. If you
> write a lot of Latin-1, Python will use one byte per character, while
> UTF-8 will use two bytes per character.
I think you mean writing a lot of Latin-1 characters outside ASCII.
However, even people writing texts in, say, French will find that only a
small proportion of their text is outside ASCII and so the cost of UTF-8
is correspondingly small.
The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string.
Neil
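The iterator idea mentioned above, hiding byte offsets so that a position can only ever sit on a character boundary, might look something like this in Python. It is a sketch only: utf8_chars is a hypothetical name, the input is assumed to be UTF-8, and no attempt is made at the speed of the C++ code linked in the post.

```python
def utf8_chars(buf):
    """Yield (byte_offset, character) for each code point in UTF-8 bytes."""
    i = 0
    while i < len(buf):
        lead = buf[i]
        # The lead byte alone says how many bytes the character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("offset %d is not the start of a character" % i)
        yield i, buf[i:i + width].decode("utf-8")
        i += width

data = "a\xe9\u20ac\U0001f600".encode("utf-8")
for offset, ch in utf8_chars(data):
    print(offset, repr(ch))
```

Callers only ever see offsets produced by the generator, so they can store them and slice the buffer later without landing in the middle of a multi-byte sequence.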
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2013-03-28 21:56 -0700 |
| Subject | unicode and the FSR [was: Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]] |
| Message-ID | <mailman.3936.1364533284.2939.python-list@python.org> |
| In reply to | #42216 |
On 03/28/2013 08:34 PM, Neil Hodgson wrote:
> Steven D'Aprano:
>
>> Any string method that takes a starting offset requires the method to
>> walk the string byte-by-byte. I've even seen languages put responsibility
>> for dealing with that onto the programmer: the "start offset" is given in
>> *bytes*, not characters. I don't remember what language this was... it
>> might have been Haskell? Whatever it was, it horrified me.
>
> It doesn't horrify me - I've been working this way for over 10 years and it seems completely natural.
Horrifying or not, I am willing to give up a small amount of speed for correctness. Heck, I'm willing to give up a lot of speed for correctness. Once I have my slow but correct prototype going I can recode in a faster language (if needed) and compare its blazingly fast output with my slowly-generated but known-good output.
> You can wrap
> access in iterators that hide the byte offsets if you like. This then ensures that all operations on those iterators are
> safe only allowing the iterator to point at the start/end of valid characters.
Sure. Or I can let Python handle it for me.
> The counter-problem is that a French document that needs to include one mathematical symbol (or emoji) outside
> Latin-1 will double in size as a Python string.
True. But how often do you have the entire document as a single string? Use readlines() instead of read(). Besides, memory is cheap.
--
~Ethan~
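The doubling under discussion is easy to measure. The sketch below assumes CPython 3.3 or later and uses a made-up French sample. It compares the in-memory size of a Latin-1-only string with the same text plus a single summation sign, and shows the UTF-8 encoded lengths alongside for comparison.

```python
import sys

# 18 characters per repetition, two of them outside ASCII but inside Latin-1.
text = "Le caf\xe9 est pr\xeat. " * 1000
with_symbol = text + "\u2211"   # one summation sign, outside Latin-1

print("str sizes:", sys.getsizeof(text), sys.getsizeof(with_symbol))
print("UTF-8 sizes:", len(text.encode("utf-8")),
      len(with_symbol.encode("utf-8")))
```

Under these assumptions the single extra character widens every character of the string to two bytes, while the UTF-8 form grows by only three bytes.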
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-29 16:33 +1100 |
| Subject | Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] |
| Message-ID | <mailman.3937.1364535241.2939.python-list@python.org> |
| In reply to | #42216 |
On Fri, Mar 29, 2013 at 2:34 PM, Neil Hodgson <nhodgson@iinet.net.au> wrote: > It doesn't horrify me - I've been working this way for over 10 years and > it seems completely natural. You can wrap access in iterators that hide the > byte offsets if you like. This then ensures that all operations on those > iterators are safe only allowing the iterator to point at the start/end of > valid characters. But both this and your example of case conversion are, fundamentally, iterating over the string. What if you aren't doing that? What if you want to parse and process? ChrisA
| From | Neil Hodgson <nhodgson@iinet.net.au> |
|---|---|
| Date | 2013-03-29 16:46 +1100 |
| Subject | Re: flaming vs accuracy [was Re: Performance of int/long in Python 3] |
| Message-ID | <mtGdnT6PbeH5tsjMnZ2dnUVZ_vadnZ2d@westnet.com.au> |
| In reply to | #42224 |
Chris Angelico:
> But both this and your example of case conversion are, fundamentally,
> iterating over the string. What if you aren't doing that? What if you
> want to parse and process?
Parsing is also normally a scanning operation. If you want to
process pieces of the string based on the parse then you remember the
positions (as iterators) at the significant places and extract/process
the data based on those positions.
Neil
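A small sketch of that approach, in Python over a UTF-8 byte buffer: scan for delimiters, remember the byte positions, then extract and decode only the pieces of interest. The key=value;key=value format and the name parse_pairs are invented for the example. Scanning for the ASCII delimiters is safe because ASCII byte values never occur inside a multi-byte UTF-8 sequence.

```python
def parse_pairs(buf):
    """Parse b'key=value;key=value' from UTF-8 bytes using byte offsets."""
    pairs = {}
    pos = 0
    while pos < len(buf):
        # Remember where this field ends (next ';' or end of buffer).
        end = buf.find(b";", pos)
        if end == -1:
            end = len(buf)
        # Remember where the key stops and the value starts.
        eq = buf.find(b"=", pos, end)
        if eq != -1:
            key = buf[pos:eq].decode("utf-8")
            value = buf[eq + 1:end].decode("utf-8")
            pairs[key] = value
        pos = end + 1
    return pairs

record = "nom=caf\xe9;prix=5\u20ac".encode("utf-8")
print(parse_pairs(record))
```

No character index is ever computed here: every position is a byte offset found by scanning, and each offset is used only to slice the buffer.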