Groups > comp.lang.python > #90130 > unrolled thread
| Started by | jonathan.slenders@gmail.com |
|---|---|
| First post | 2015-05-08 02:14 -0700 |
| Last post | 2015-05-11 08:03 +0100 |
| Articles | 9 — 5 participants |
Why is array.array('u') deprecated? jonathan.slenders@gmail.com - 2015-05-08 02:14 -0700
Re: Why is array.array('u') deprecated? wxjmfauth@gmail.com - 2015-05-08 02:37 -0700
Re: Why is array.array('u') deprecated? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-08 20:29 +1000
Re: Why is array.array('u') deprecated? jonathan.slenders@gmail.com - 2015-05-08 04:05 -0700
Re: Why is array.array('u') deprecated? Peter Otten <__peter__@web.de> - 2015-05-08 13:35 +0200
Re: Why is array.array('u') deprecated? jonathan.slenders@gmail.com - 2015-05-08 05:12 -0700
Re: Why is array.array('u') deprecated? Peter Otten <__peter__@web.de> - 2015-05-08 15:11 +0200
Re: Why is array.array('u') deprecated? jonathan.slenders@gmail.com - 2015-05-08 07:40 -0700
Re: Why is array.array('u') deprecated? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-11 08:03 +0100
| From | jonathan.slenders@gmail.com |
|---|---|
| Date | 2015-05-08 02:14 -0700 |
| Subject | Why is array.array('u') deprecated? |
| Message-ID | <80167ebf-c5ee-4699-89ca-ef544fe3decc@googlegroups.com> |
Why is array.array('u') deprecated?
Will we get an alternative for a character array or mutable unicode string?
Thanks!
Jonathan
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2015-05-08 02:37 -0700 |
| Message-ID | <9991c4a3-43e4-4d03-ae44-f556b7b147e5@googlegroups.com> |
| In reply to | #90130 |
On Friday, 8 May 2015 11:15:08 UTC+2, jonathan...@gmail.com wrote:
> Why is array.array('u') deprecated?
>
In order to make Python a non unicode compliant [*]
(Unicode.org) tool.
[*] and buggy!
jmf
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-05-08 20:29 +1000 |
| Message-ID | <554c8ff2$0$12990$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #90130 |
On Fri, 8 May 2015 07:14 pm, jonathan.slenders@gmail.com wrote:
> Why is array.array('u') deprecated?
>
> Will we get an alternative for a character array or mutable unicode
> string?
Good question.
Of the three main encodings for Unicode, two are variable-width:
* UTF-8 uses 1-4 bytes per character
* UTF-16 uses 2 or 4 bytes per character
while UTF-32 is fixed-width (4 bytes per character). So you could try faking
it with a 32-bit array and filling it with string.encode('utf-32').
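A minimal sketch of that suggestion, assuming a platform where one of the unsigned typecodes 'I' or 'L' is exactly 4 bytes wide. The helper names to_codepoints and to_text are made up for illustration. The endian-specific codec names are used because plain 'utf-32' prepends a byte-order mark when encoding.

```python
import array
import sys

# Pick an unsigned typecode that is exactly 4 bytes wide on this platform
# (assumption: 'I' or 'L' qualifies; next() raises StopIteration otherwise).
TYPECODE = next(tc for tc in "IL" if array.array(tc).itemsize == 4)

# Native byte order, so that each array item is one code point, and no BOM.
CODEC = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"


def to_codepoints(text):
    """Return a mutable array holding one 32-bit code point per character."""
    arr = array.array(TYPECODE)
    arr.frombytes(text.encode(CODEC))
    return arr


def to_text(arr):
    """Decode the whole array back into an ordinary str (this copies)."""
    return arr.tobytes().decode(CODEC)


data = to_codepoints("x" * 10)
data[5] = ord("y")        # in-place edit, no new string is built
print(to_text(data))      # xxxxxyxxxx
```

Items are integers rather than one-character strings, so assignments go through ord() and reads through chr().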
--
Steven
| From | jonathan.slenders@gmail.com |
|---|---|
| Date | 2015-05-08 04:05 -0700 |
| Message-ID | <93aaa9cd-5d30-44db-9324-4fffe292e9e4@googlegroups.com> |
| In reply to | #90137 |
On Friday, 8 May 2015 12:29:15 UTC+2, Steven D'Aprano wrote:
> On Fri, 8 May 2015 07:14 pm, jonathan.slenders wrote:
>
> > Why is array.array('u') deprecated?
> >
> > Will we get an alternative for a character array or mutable unicode
> > string?
>
>
> Good question.
>
> Of the three main encodings for Unicode, two are variable-width:
>
> * UTF-8 uses 1-4 bytes per character
> * UTF-16 uses 2 or 4 bytes per character
>
> while UTF-32 is fixed-width (4 bytes per character). So you could try faking
> it with a 32-bit array and filling it with string.encode('utf-32').
I guess that doesn't work. I need to have something that I can pass to the re module for searching through it. Creating new strings all the time is no option. (Think about gigabyte strings.)
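For reference, a rough sketch of how a literal search over such a 32-bit array might look without building a new string. It assumes re accepts the array as a bytes-like object and reports positions in bytes, and that one of the typecodes 'I' or 'L' is 4 bytes wide. The function name find_literal is hypothetical. Only literal needles are handled; a general regular expression would have to be rewritten in 4-byte units, which is not attempted here.

```python
import array
import re
import sys

# Assumption: 'I' or 'L' is exactly 4 bytes wide on this platform.
TYPECODE = next(tc for tc in "IL" if array.array(tc).itemsize == 4)
CODEC = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"


def find_literal(arr, needle, start=0):
    """Return the character index of the first match of `needle`, or -1.

    `arr` holds one 32-bit code point per item; `start` is a character index.
    """
    # Encode the needle the same way the array is laid out in memory.
    pattern = re.compile(re.escape(needle.encode(CODEC)))
    pos = start * 4                      # re works in byte offsets here
    while True:
        match = pattern.search(arr, pos)
        if match is None:
            return -1
        if match.start() % 4 == 0:       # accept only code-point-aligned hits
            return match.start() // 4
        pos = match.start() + 1          # misaligned hit: keep looking


data = array.array(TYPECODE, [ord("x")] * 1000)
data[500] = ord("y")
data[501] = ord("z")
print(find_literal(data, "yz"))          # 500
```

The alignment check matters because a byte pattern can in principle match across the boundary between two code points.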
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-05-08 13:35 +0200 |
| Message-ID | <mailman.237.1431084939.12865.python-list@python.org> |
| In reply to | #90143 |
jonathan.slenders@gmail.com wrote:
> On Friday, 8 May 2015 12:29:15 UTC+2, Steven D'Aprano wrote:
>> On Fri, 8 May 2015 07:14 pm, jonathan.slenders wrote:
>>
>> > Why is array.array('u') deprecated?
>> >
>> > Will we get an alternative for a character array or mutable unicode
>> > string?
>>
>>
>> Good question.
>>
>> Of the three main encodings for Unicode, two are variable-width:
>>
>> * UTF-8 uses 1-4 bytes per character
>> * UTF-16 uses 2 or 4 bytes per character
>>
>> while UTF-32 is fixed-width (4 bytes per character). So you could try
>> faking it with a 32-bit array and filling it with
>> string.encode('utf-32').
>
>
> I guess that doesn't work. I need to have something that I can pass to the
> re module for searching through it. Creating new strings all the time is
> no option. (Think about gigabyte strings.)
Can you expand a bit on how array("u") helps here? Are the matches in the
gigabyte range?
| From | jonathan.slenders@gmail.com |
|---|---|
| Date | 2015-05-08 05:12 -0700 |
| Message-ID | <7c33ef30-5923-49eb-933c-bfb568cb4753@googlegroups.com> |
| In reply to | #90148 |
> Can you expand a bit on how array("u") helps here? Are the matches in the
> gigabyte range?
I have a string of unicode characters, e.g.:
data = array.array('u', u'x' * 1000000000)
Then I need to change some data in the middle of this string, for instance:
data[500000] = 'y'
Then I want to use re to search in this text:
re.search('y', data)
This has to be fast. I really don't want to split and concatenate strings. Re should be able to process it and the expressions can be much more complex than this. (I think it should be anything that implements the buffer protocol).
So, this works perfectly fine and fast. But it scares me that it's deprecated and Python 4 will not support it anymore.
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-05-08 15:11 +0200 |
| Message-ID | <mailman.249.1431090704.12865.python-list@python.org> |
| In reply to | #90156 |
jonathan.slenders@gmail.com wrote:
>> Can you expand a bit on how array("u") helps here? Are the matches in the
>> gigabyte range?
>
> I have a string of unicode characters, e.g.:
>
> data = array.array('u', u'x' * 1000000000)
>
> Then I need to change some data in the middle of this string, for
> instance:
>
> data[500000] = 'y'
>
> Then I want to use re to search in this text:
>
> re.search('y', data)
>
> This has to be fast. I really don't want to split and concatenate strings.
> Re should be able to process it and the expressions can be much more
> complex than this. (I think it should be anything that implements the
> buffer protocol).
>
> So, this works perfectly fine and fast. But it scares me that it's
> deprecated and Python 4 will not support it anymore.
Hm, this doesn't even work with Python 3:
>>> data = array.array("u", u"x"*1000)
>>> data[100] = "y"
>>> re.search("y", data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/re.py", line 166, in search
    return _compile(pattern, flags).search(string)
TypeError: can't use a string pattern on a bytes-like object
You can search for bytes
>>> re.search(b"y", data)
<_sre.SRE_Match object; span=(400, 401), match=b'y'>
>>> data[101] = "z"
>>> re.search(b"y", data)
<_sre.SRE_Match object; span=(400, 401), match=b'y'>
>>> re.search(b"yz", data)
>>> re.search(b"y\0\0\0z", data)
<_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>
but if that is good enough you can use a bytearray in the first place.
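A small sketch of the bytearray route, assuming the text is pure ASCII so that byte offsets and character offsets coincide. With a multi-byte encoding such as UTF-8 the reported spans would be byte offsets, and replacing a character with one of a different encoded length would change the buffer size.

```python
import re

# Mutable buffer of ASCII bytes (assumption: no non-ASCII characters).
data = bytearray(b"x" * 1000)
data[100] = ord("y")      # in-place edits, one byte per character
data[101] = ord("z")

# A bytes pattern can be matched directly against the bytearray.
match = re.search(b"yz", data)
print(match.span())       # (100, 102)
```

Unlike the array("u") session above, the pattern needs no padding bytes between characters, because each item is a single byte.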
| From | jonathan.slenders@gmail.com |
|---|---|
| Date | 2015-05-08 07:40 -0700 |
| Message-ID | <c603bf41-a9e6-4b28-ab71-1d3ac127c3ec@googlegroups.com> |
| In reply to | #90167 |
On Friday, 8 May 2015 15:11:56 UTC+2, Peter Otten wrote:
> > So, this works perfectly fine and fast. But it scares me that it's
> > deprecated and Python 4 will not support it anymore.
>
> Hm, this doesn't even work with Python 3:
My mistake. I should have tested better.
> >>> data = array.array("u", u"x"*1000)
> >>> data[100] = "y"
> >>> re.search("y", data)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python3.4/re.py", line 166, in search
>     return _compile(pattern, flags).search(string)
> TypeError: can't use a string pattern on a bytes-like object
>
> You can search for bytes
>
> >>> re.search(b"y", data)
> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
> >>> data[101] = "z"
> >>> re.search(b"y", data)
> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
> >>> re.search(b"yz", data)
> >>> re.search(b"y\0\0\0z", data)
> <_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>
>
> but if that is good enough you can use a bytearray in the first place.
Maybe I'll try that. Thanks for the suggestions!
Jonathan
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-05-11 08:03 +0100 |
| Message-ID | <mailman.344.1431327837.12865.python-list@python.org> |
| In reply to | #90171 |
On 08/05/2015 15:40, jonathan.slenders@gmail.com wrote:
> On Friday, 8 May 2015 15:11:56 UTC+2, Peter Otten wrote:
>>> So, this works perfectly fine and fast. But it scares me that it's
>>> deprecated and Python 4 will not support it anymore.
>>
>> Hm, this doesn't even work with Python 3:
>
> My mistake. I should have tested better.
>
>> >>> data = array.array("u", u"x"*1000)
>> >>> data[100] = "y"
>> >>> re.search("y", data)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib/python3.4/re.py", line 166, in search
>>     return _compile(pattern, flags).search(string)
>> TypeError: can't use a string pattern on a bytes-like object
>>
>> You can search for bytes
>>
>> >>> re.search(b"y", data)
>> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
>> >>> data[101] = "z"
>> >>> re.search(b"y", data)
>> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
>> >>> re.search(b"yz", data)
>> >>> re.search(b"y\0\0\0z", data)
>> <_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>
>>
>> but if that is good enough you can use a bytearray in the first place.
>
> Maybe I'll try that. Thanks for the suggestions!
>
> Jonathan
>
Is http://sourceforge.net/projects/pyropes/ of any use to you?
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence