| Started by | Chris Angelico <rosuav@gmail.com> |
|---|---|
| First post | 2013-01-24 22:35 +1100 |
| Last post | 2013-01-25 12:07 +1100 |
| Articles | 3 — 2 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: using split for a string : error Chris Angelico <rosuav@gmail.com> - 2013-01-24 22:35 +1100
Re: using split for a string : error Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-25 11:20 +1100
Re: using split for a string : error Chris Angelico <rosuav@gmail.com> - 2013-01-25 12:07 +1100
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-24 22:35 +1100 |
| Subject | Re: using split for a string : error |
| Message-ID | <mailman.969.1359027330.2939.python-list@python.org> |
On Thu, Jan 24, 2013 at 10:16 PM, Tobias M. <tm@tobix.eu> wrote:
> Chris Angelico wrote:
>> The other thing you may want to consider, if the values are supposed
>> to be integers, is to convert them to Python integers before
>> comparing.
>
> I thought of this too and I wonder if there are any major differences
> regarding performance compared to using the strip() method when parsing
> large files.
>
> In addition I guess one should catch the ValueError that might be raised by
> the cast if there is something else than a number in the file.
I'd not consider the performance, but the correctness. If you're
expecting them to be integers, just cast them, and specifically
_don't_ catch ValueError. Any non-integer value will then noisily
abort the script. (It may be worth checking for blank first, though,
depending on the data origin.)
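A minimal sketch of that advice, assuming the input is an iterable of text lines with one value per line (the name `parse_ints` is made up for illustration): blank lines are skipped, and anything else that is not an integer is left to raise ValueError.

```python
def parse_ints(lines):
    """Convert each non-blank line to int; ValueError is deliberately not caught."""
    values = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank line: skip it instead of letting int() fail
        values.append(int(text))  # any non-integer aborts with ValueError
    return values
```

Whether blank lines should be skipped or treated as errors depends on where the data comes from.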
It's usually fine to have int() complain about any non-numerics in the
string, but I must confess, I do sometimes yearn for atoi() semantics:
atoi("123asd") == 123, and atoi("qqq") == 0. I've not seen a
convenient Python function for doing that. Usually it involves
manually getting the digits off the front. All I want is to suppress
the error on finding a non-digit. Oh well.
ChrisA
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-01-25 11:20 +1100 |
| Message-ID | <5101cfdb$0$29980$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #37571 |
Chris Angelico wrote:
> It's usually fine to have int() complain about any non-numerics in the
> string, but I must confess, I do sometimes yearn for atoi() semantics:
> atoi("123asd") == 123, and atoi("qqq") == 0. I've not seen a
> convenient Python function for doing that. Usually it involves
> manually getting the digits off the front. All I want is to suppress
> the error on finding a non-digit. Oh well.
It's easy enough to write your own. All you need do is decide what you
mean by "suppress the error on finding a non-digit".
Should atoi("123xyz456") return 123 or 123456?
Should atoi("xyz123") return 0 or 123?
And here's a good one:
Should atoi("1OOl") return 1, 100, or 1001?
That last is a serious suggestion by the way. There are still many people
who do not distinguish between 1 and l or 0 and O.
Actually I lied. It's not that easy. Consider:
py> s = '໑໒໙'
py> int(s)
129
Actually I lied again. It's not that hard:
    def atoi(s):
        from unicodedata import digit
        i = 0
        for c in s:
            i *= 10
            i += digit(c, 0)
        return i
Variations that stop on the first non-digit, instead of treating them as
zero, are not much more difficult.
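One possible stop-at-the-first-non-digit variation, as a sketch only. The name `atoi_stop` and the -1 sentinel are choices made for this example, not part of the function above.

```python
from unicodedata import digit

def atoi_stop(s):
    """Accumulate leading digits and stop at the first non-digit character."""
    i = 0
    for c in s:
        d = digit(c, -1)  # -1 is an arbitrary sentinel meaning "not a digit"
        if d < 0:
            break  # first non-digit ends the number
        i = i * 10 + d
    return i
```

Note that digit() also accepts characters such as superscript digits, which int() rejects; unicodedata.decimal() is the stricter lookup if that matters.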
--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-01-25 12:07 +1100 |
| Message-ID | <mailman.1025.1359076064.2939.python-list@python.org> |
| In reply to | #37642 |
On Fri, Jan 25, 2013 at 11:20 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Chris Angelico wrote:
>
>> It's usually fine to have int() complain about any non-numerics in the
>> string, but I must confess, I do sometimes yearn for atoi() semantics:
>> atoi("123asd") == 123, and atoi("qqq") == 0. I've not seen a
>> convenient Python function for doing that. Usually it involves
>> manually getting the digits off the front. All I want is to suppress
>> the error on finding a non-digit. Oh well.
>
> It's easy enough to write your own. All you need do is decide what you
> mean by "suppress the error on finding a non-digit".
>
> Should atoi("123xyz456") return 123 or 123456?
>
> Should atoi("xyz123") return 0 or 123?
>
> And here's a good one:
>
> Should atoi("1OOl") return 1, 100, or 1001?
123, 0, and 1. That's standard atoi semantics.
> That last is a serious suggestion by the way. There are still many people
> who do not distinguish between 1 and l or 0 and O.
Sure. But I'm not trying to cater to people who get it wrong; that's a
job for a DWIM.
>     def atoi(s):
>         from unicodedata import digit
>         i = 0
>         for c in s:
>             i *= 10
>             i += digit(c, 0)
>         return i
>
> Variations that stop on the first non-digit, instead of treating them as
> zero, are not much more difficult.
And yes, I'm fully aware that I can roll my own. Here's a shorter
version (ASCII digits only, feel free to expand to Unicode), not
necessarily better:
    def atoi(s):
        # Slice by explicit length: s[:-0] would be empty when s is all digits.
        return int("0" + s[:len(s) - len(s.lstrip("0123456789"))])
It just seems silly that this should have to be done separately, when
it's really just a tweak to the usual string-to-int conversion: when
you come to a non-digit, take one of three options (throw error, skip,
or terminate).
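To make the three options concrete, here is a rough sketch of what such a conversion could look like. The function `to_int` and its `on_nondigit` parameter are hypothetical names for illustration; nothing like this exists in the standard library as far as I know. ASCII digits only, no sign handling.

```python
def to_int(s, on_nondigit="error"):
    """Hypothetical helper: on_nondigit is 'error', 'skip', or 'terminate'."""
    if on_nondigit not in ("error", "skip", "terminate"):
        raise ValueError("unknown mode: %r" % (on_nondigit,))
    digits = []
    for c in s:
        if c in "0123456789":
            digits.append(c)
        elif on_nondigit == "error":
            raise ValueError("invalid digit %r in %r" % (c, s))
        elif on_nondigit == "terminate":
            break  # atoi-style: stop at the first non-digit
        # 'skip': ignore this character and keep scanning
    # The "0" prefix makes an empty digit list yield 0 (unlike int("")).
    return int("0" + "".join(digits))
```

With this, to_int("123xyz456", "terminate") gives 123 and to_int("123xyz456", "skip") gives 123456, which are the two readings Steven asked about.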
Anyway, not a big deal.
ChrisA