

Group: comp.lang.python

Why is array.array('u') deprecated?

Started by: jonathan.slenders@gmail.com
First post: 2015-05-08 02:14 -0700
Last post: 2015-05-11 08:03 +0100
Articles: 9 (5 participants)



Contents

  Why is array.array('u') deprecated? jonathan.slenders@gmail.com - 2015-05-08 02:14 -0700
    Re: Why is array.array('u') deprecated? wxjmfauth@gmail.com - 2015-05-08 02:37 -0700
    Re: Why is array.array('u') deprecated? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-08 20:29 +1000
      Re: Why is array.array('u') deprecated? jonathan.slenders@gmail.com - 2015-05-08 04:05 -0700
        Re: Why is array.array('u') deprecated? Peter Otten <__peter__@web.de> - 2015-05-08 13:35 +0200
          Re: Why is array.array('u') deprecated? jonathan.slenders@gmail.com - 2015-05-08 05:12 -0700
            Re: Why is array.array('u') deprecated? Peter Otten <__peter__@web.de> - 2015-05-08 15:11 +0200
              Re: Why is array.array('u') deprecated? jonathan.slenders@gmail.com - 2015-05-08 07:40 -0700
                Re: Why is array.array('u') deprecated? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-11 08:03 +0100

#90130 — Why is array.array('u') deprecated?

From: jonathan.slenders@gmail.com
Date: 2015-05-08 02:14 -0700
Subject: Why is array.array('u') deprecated?
Message-ID: <80167ebf-c5ee-4699-89ca-ef544fe3decc@googlegroups.com>
Why is array.array('u') deprecated?

Will we get an alternative for a character array or mutable unicode string?

Thanks!
Jonathan



#90131

From: wxjmfauth@gmail.com
Date: 2015-05-08 02:37 -0700
Message-ID: <9991c4a3-43e4-4d03-ae44-f556b7b147e5@googlegroups.com>
In reply to: #90130
On Friday, 8 May 2015 11:15:08 UTC+2, jonathan...@gmail.com wrote:
> Why is array.array('u') deprecated?
> 

In order to make Python a non-Unicode-compliant [*]
(Unicode.org) tool.

[*] and buggy!

jmf



#90137

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-05-08 20:29 +1000
Message-ID: <554c8ff2$0$12990$c3e8da3$5496439d@news.astraweb.com>
In reply to: #90130
On Fri, 8 May 2015 07:14 pm, jonathan.slenders@gmail.com wrote:

> Why is array.array('u') deprecated?
> 
> Will we get an alternative for a character array or mutable unicode
> string?


Good question.

Of the three main encodings for Unicode, two are variable-width: 

* UTF-8 uses 1-4 bytes per character 
* UTF-16 uses 2 or 4 bytes per character

while UTF-32 is fixed-width (4 bytes per character). So you could try faking
it with a 32-bit array and filling it with string.encode('utf-32').
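
The UTF-32 suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names to_code_point_array and to_text are made up here, and the sketch assumes the platform offers an unsigned array typecode that is exactly 4 bytes wide ('I' or 'L'). It encodes in native byte order without a BOM so that each array item is the code point itself.

```python
import array
import sys


def to_code_point_array(text):
    """Pack a str into a mutable array of 32-bit code points (hypothetical helper)."""
    # Assumption: 'I' or 'L' is exactly 4 bytes wide on this platform.
    typecode = next(tc for tc in "IL" if array.array(tc).itemsize == 4)
    # Native byte order and no BOM, so every item equals ord(character).
    codec = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"
    arr = array.array(typecode)
    arr.frombytes(text.encode(codec))
    return arr, codec


def to_text(arr, codec):
    """Decode the array back into a str (this copies the data)."""
    return arr.tobytes().decode(codec)


data, codec = to_code_point_array("hello wörld")
data[0] = ord("J")          # in-place update of a single character
print(to_text(data, codec))
```

Items are assigned as integers (ord(...)), not as one-character strings, and converting back to str copies the whole buffer, which matters for very large texts.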



-- 
Steven



#90143

From: jonathan.slenders@gmail.com
Date: 2015-05-08 04:05 -0700
Message-ID: <93aaa9cd-5d30-44db-9324-4fffe292e9e4@googlegroups.com>
In reply to: #90137
On Friday, 8 May 2015 12:29:15 UTC+2, Steven D'Aprano wrote:
> On Fri, 8 May 2015 07:14 pm, jonathan.slenders wrote:
> 
> > Why is array.array('u') deprecated?
> > 
> > Will we get an alternative for a character array or mutable unicode
> > string?
> 
> 
> Good question.
> 
> Of the three main encodings for Unicode, two are variable-width: 
> 
> * UTF-8 uses 1-4 bytes per character 
> * UTF-16 uses 2 or 4 bytes per character
> 
> while UTF-32 is fixed-width (4 bytes per character). So you could try faking
> it with a 32-bit array and filling it with string.encode('utf-32').


I guess that doesn't work. I need to have something that I can pass to the re module for searching through it. Creating new strings all the time is no option. (Think about gigabyte strings.)



#90148

From: Peter Otten <__peter__@web.de>
Date: 2015-05-08 13:35 +0200
Message-ID: <mailman.237.1431084939.12865.python-list@python.org>
In reply to: #90143
jonathan.slenders@gmail.com wrote:

> On Friday, 8 May 2015 12:29:15 UTC+2, Steven D'Aprano wrote:
>> On Fri, 8 May 2015 07:14 pm, jonathan.slenders wrote:
>> 
>> > Why is array.array('u') deprecated?
>> > 
>> > Will we get an alternative for a character array or mutable unicode
>> > string?
>> 
>> 
>> Good question.
>> 
>> Of the three main encodings for Unicode, two are variable-width:
>> 
>> * UTF-8 uses 1-4 bytes per character
>> * UTF-16 uses 2 or 4 bytes per character
>> 
>> while UTF-32 is fixed-width (4 bytes per character). So you could try
>> faking it with a 32-bit array and filling it with
>> string.encode('utf-32').
> 
> 
> I guess that doesn't work. I need to have something that I can pass to the
> re module for searching through it. Creating new strings all the time is
> no option. (Think about gigabyte strings.)

Can you expand a bit on how array("u") helps here? Are the matches in the 
gigabyte range?



#90156

From: jonathan.slenders@gmail.com
Date: 2015-05-08 05:12 -0700
Message-ID: <7c33ef30-5923-49eb-933c-bfb568cb4753@googlegroups.com>
In reply to: #90148
> Can you expand a bit on how array("u") helps here? Are the matches in the 
> gigabyte range?

I have a string of unicode characters, e.g.:

data = array.array('u', u'x' * 1000000000)

Then I need to change some data in the middle of this string, for instance:

data[500000] = 'y'

Then I want to use re to search in this text:

re.search('y', data)

This has to be fast. I really don't want to split and concatenate strings. Re should be able to process it and the expressions can be much more complex than this. (I think it should be anything that implements the buffer protocol).

So, this works perfectly fine and fast. But it scares me that it's deprecated and Python 4 will not support it anymore.



#90167

From: Peter Otten <__peter__@web.de>
Date: 2015-05-08 15:11 +0200
Message-ID: <mailman.249.1431090704.12865.python-list@python.org>
In reply to: #90156
jonathan.slenders@gmail.com wrote:

>> Can you expand a bit on how array("u") helps here? Are the matches in the
>> gigabyte range?
> 
> I have a string of unicode characters, e.g.:
> 
> data = array.array('u', u'x' * 1000000000)
> 
> Then I need to change some data in the middle of this string, for
> instance:
> 
> data[500000] = 'y'
> 
> Then I want to use re to search in this text:
> 
> re.search('y', data)
> 
> This has to be fast. I really don't want to split and concatenate strings.
> Re should be able to process it and the expressions can be much more
> complex than this. (I think it should be anything that implements the
> buffer protocol).
> 
> So, this works perfectly fine and fast. But it scares me that it's
> deprecated and Python 4 will not support it anymore.

Hm, this doesn't even work with Python 3:

>>> data = array.array("u", u"x"*1000)
>>> data[100] = "y"
>>> re.search("y", data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/re.py", line 166, in search
    return _compile(pattern, flags).search(string)
TypeError: can't use a string pattern on a bytes-like object

You can search for bytes

>>> re.search(b"y", data)
<_sre.SRE_Match object; span=(400, 401), match=b'y'>
>>> data[101] = "z"
>>> re.search(b"y", data)
<_sre.SRE_Match object; span=(400, 401), match=b'y'>
>>> re.search(b"yz", data)
>>> re.search(b"y\0\0\0z", data)
<_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>

but if that is good enough you can use a bytearray in the first place.
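
A minimal sketch of the bytearray approach, assuming the text is stored as UTF-32-LE (a fixed 4 bytes per code point, no BOM). The names CODEC, WIDTH, set_char and find_text are invented for this illustration and do not come from the thread. Because re matches at byte granularity here, a hit can begin in the middle of a code point, so the sketch skips any match whose offset is not a multiple of 4.

```python
import re

CODEC = "utf-32-le"   # assumption: fixed-width storage, 4 bytes per code point
WIDTH = 4


def set_char(buf, index, char):
    """Overwrite one character in place (hypothetical helper)."""
    encoded = char.encode(CODEC)
    if len(encoded) != WIDTH:
        raise ValueError("expected exactly one code point")
    buf[index * WIDTH:(index + 1) * WIDTH] = encoded


def find_text(buf, text, start=0):
    """Return the character index of the first aligned match, or -1."""
    needle = re.compile(re.escape(text.encode(CODEC)))
    pos = start * WIDTH
    while True:
        match = needle.search(buf, pos)
        if match is None:
            return -1
        if match.start() % WIDTH == 0:
            return match.start() // WIDTH
        # Misaligned hit inside a code point: resume one byte later.
        pos = match.start() + 1


data = bytearray(("x" * 1000).encode(CODEC))
set_char(data, 100, "y")
set_char(data, 101, "z")
print(find_text(data, "yz"))   # character index, not byte offset
```

This only covers literal text. A general regular expression (character classes, case-insensitive matching and so on) would not translate to the encoded bytes this simply, so whether this is "good enough" depends on the patterns involved.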



#90171

From: jonathan.slenders@gmail.com
Date: 2015-05-08 07:40 -0700
Message-ID: <c603bf41-a9e6-4b28-ab71-1d3ac127c3ec@googlegroups.com>
In reply to: #90167
On Friday, 8 May 2015 15:11:56 UTC+2, Peter Otten wrote:
> > So, this works perfectly fine and fast. But it scares me that it's
> > deprecated and Python 4 will not support it anymore.
> 
> Hm, this doesn't even work with Python 3:

My mistake. I should have tested better.

> >>> data = array.array("u", u"x"*1000)
> >>> data[100] = "y"
> >>> re.search("y", data)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python3.4/re.py", line 166, in search
>     return _compile(pattern, flags).search(string)
> TypeError: can't use a string pattern on a bytes-like object
> 
> You can search for bytes
> 
> >>> re.search(b"y", data)
> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
> >>> data[101] = "z"
> >>> re.search(b"y", data)
> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
> >>> re.search(b"yz", data)
> >>> re.search(b"y\0\0\0z", data)
> <_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>
> 
> but if that is good enough you can use a bytearray in the first place.

Maybe I'll try that. Thanks for the suggestions!

Jonathan



#90352

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2015-05-11 08:03 +0100
Message-ID: <mailman.344.1431327837.12865.python-list@python.org>
In reply to: #90171
On 08/05/2015 15:40, jonathan.slenders@gmail.com wrote:
> On Friday, 8 May 2015 15:11:56 UTC+2, Peter Otten wrote:
>>> So, this works perfectly fine and fast. But it scares me that it's
>>> deprecated and Python 4 will not support it anymore.
>>
>> Hm, this doesn't even work with Python 3:
>
> My mistake. I should have tested better.
>
>>>>> data = array.array("u", u"x"*1000)
>>>>> data[100] = "y"
>>>>> re.search("y", data)
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>>    File "/usr/lib/python3.4/re.py", line 166, in search
>>      return _compile(pattern, flags).search(string)
>> TypeError: can't use a string pattern on a bytes-like object
>>
>> You can search for bytes
>>
>>>>> re.search(b"y", data)
>> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
>>>>> data[101] = "z"
>>>>> re.search(b"y", data)
>> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
>>>>> re.search(b"yz", data)
>>>>> re.search(b"y\0\0\0z", data)
>> <_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>
>>
>> but if that is good enough you can use a bytearray in the first place.
>
> Maybe I'll try that. Thanks for the suggestions!
>
> Jonathan
>

Is http://sourceforge.net/projects/pyropes/ of any use to you?

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


