Possible bug in string handling (with kludgy work-around)

Started by: Charles Hixson <charleshixsn@earthlink.net>
First post: 2011-12-26 14:23 -0800
Last post: 2011-12-27 16:38 -0500
Articles: 11 (7 participants)

Contents

  Possible bug in string handling (with kludgy work-around) Charles Hixson <charleshixsn@earthlink.net> - 2011-12-26 14:23 -0800
    Re: Possible bug in string handling (with kludgy work-around) Rick Johnson <rantingrickjohnson@gmail.com> - 2011-12-26 14:48 -0800
      Re: Possible bug in string handling (with kludgy work-around) Chris Angelico <rosuav@gmail.com> - 2011-12-27 10:05 +1100
    Re: Possible bug in string handling (with kludgy work-around) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-12-27 01:10 +0000
      Re: Possible bug in string handling (with kludgy work-around) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2011-12-27 09:53 -0500
        Re: Possible bug in string handling (with kludgy work-around) Rick Johnson <rantingrickjohnson@gmail.com> - 2011-12-27 10:04 -0800
          Re: Possible bug in string handling (with kludgy work-around) Lie Ryan <lie.1296@gmail.com> - 2011-12-28 08:23 +1100
          Re: Possible bug in string handling (with kludgy work-around) Terry Reedy <tjreedy@udel.edu> - 2011-12-27 16:38 -0500
            Re: Possible bug in string handling (with kludgy work-around) Rick Johnson <rantingrickjohnson@gmail.com> - 2011-12-27 16:57 -0800
              Re: Possible bug in string handling (with kludgy work-around) Lie Ryan <lie.1296@gmail.com> - 2011-12-29 04:54 +1100

#17986 — Possible bug in string handling (with kludgy work-around)

From: Charles Hixson <charleshixsn@earthlink.net>
Date: 2011-12-26 14:23 -0800
Subject: Possible bug in string handling (with kludgy work-around)
Message-ID: <mailman.4112.1324938867.27778.python-list@python.org>

This doesn't cause a crash, but rather incorrect results.

self.wordList    =    ["The", "quick", "brown", "fox", "carefully",
                 "jumps", "over", "the", "lazy", "dog", "as", "it",
                 "stealthily", "wends", "its", "way", "homewards", '\b.']
for    i    in    range (len (self.wordList) ):
    if    not isinstance(self.wordList[i], str):
        self.wordList = ""
   elif self.wordList[i] != "" and self.wordList[i][0] == "\b":
        print ("0: wordList[", i, "] = \"", self.wordList[i], "\"", sep 
= "")
        print ("0a: wordList[", i, "][1] = \"", self.wordList[i][1], 
"\"", sep = "")
        tmp    =    self.wordList[i][1]             ## !! Kludge -- 
remove tmp to see the error
        self.wordList[i]    =    tmp + self.wordList[i][1:-1]  ## !! 
Kludge -- remove tmp + to see the error
        print ("1: wordList[", i, "] = \"", self.wordList[i], "\"", sep 
= "")
        print    ("len(wordList[", i, "]) = ", len(self.wordList[i]) )

-- 
Charles Hixson



#17990

From: Rick Johnson <rantingrickjohnson@gmail.com>
Date: 2011-12-26 14:48 -0800
Message-ID: <0063a1a0-0cbc-4352-90ce-13ab2eda6884@v24g2000yqk.googlegroups.com>
In reply to: #17986

On Dec 26, 4:23 pm, Charles Hixson <charleshi...@earthlink.net> wrote:
> This doesn't cause a crash, but rather incorrect results.
>
> self.wordList    =    ["The", "quick", "brown", "fox", "carefully",
>                  "jumps", "over", "the", "lazy", "dog", "as", "it",
>                  "stealthily", "wends", "its", "way", "homewards", '\b.']
> for    i    in    range (len (self.wordList) ):
>     if    not isinstance(self.wordList[i], str):
>         self.wordList = ""
>    elif self.wordList[i] != "" and self.wordList[i][0] == "\b":
>         print ("0: wordList[", i, "] = \"", self.wordList[i], "\"", sep
> = "")
>         print ("0a: wordList[", i, "][1] = \"", self.wordList[i][1],
> "\"", sep = "")
>         tmp    =    self.wordList[i][1]             ## !! Kludge --
> remove tmp to see the error
>         self.wordList[i]    =    tmp + self.wordList[i][1:-1]  ## !!
> Kludge -- remove tmp + to see the error
>         print ("1: wordList[", i, "] = \"", self.wordList[i], "\"", sep
> = "")
>         print    ("len(wordList[", i, "]) = ", len(self.wordList[i]) )
>
> --
> Charles Hixson

Handy rules for reporting bugs:

1. Always format code properly.
2. Always trim excess fat from code.
3. Always include relevant dependencies ("self.wordlist" is only valid
inside a class. In this case, change the code to a state that is NOT
dependent on a class definition.)

Most times after following these simple rules, you'll find egg on your
face BEFORE someone else has a chance to see it and ridicule you.



#17991

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-12-27 10:05 +1100
Message-ID: <mailman.4115.1324940726.27778.python-list@python.org>
In reply to: #17990

On Tue, Dec 27, 2011 at 9:48 AM, Rick Johnson
<rantingrickjohnson@gmail.com> wrote:
> Handy rules for reporting bugs:
>
> 1. Always format code properly.
> 2. Always trim excess fat from code.
> 3. Always include relative dependencies ("self.wordlist" is only valid
> inside a class. In this case, change the code to a state that is NOT
> dependent on a class definition.)
>
> Most times after following these simple rules, you'll find egg on your
> face BEFORE someone else has a chance to see it and ridicule you.

4. Don't take it personally when a known troll insults you. His
advice, in this case, is valid; but don't feel that you're going to be
ridiculed. We don't work that way on this list.

ChrisA



#18001

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-12-27 01:10 +0000
Message-ID: <4ef91afb$0$29973$c3e8da3$5496439d@news.astraweb.com>
In reply to: #17986

On Mon, 26 Dec 2011 14:23:03 -0800, Charles Hixson wrote:

> This doesn't cause a crash, but rather incorrect results.

Charles, your code is badly formatted and virtually unreadable. You have 
four spaces between some tokens, and lines are too long to fit in an email 
or News post without word-wrapping. It is a mess of unidiomatic code filled 
with repeated indexing and unnecessary backslash escapes.

You also don't tell us what result you expect, or what result you 
actually get. What is the intention of the code? What are you trying to 
do, and what happens instead?

The code as given doesn't run -- what's self?

Despite all these problems, I can see one obvious problem in your code: 
you test to see if self.wordList[i] is a string, and if not, you replace 
the *entire* wordList with the empty string. That is unlikely to do what 
you want, although I admit I'm guessing what you are trying to do (since 
you don't tell us).
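
To make that concrete, here is a minimal sketch in Python 3 syntax. The plain list `word_list` stands in for `self.wordList`, and both that name and the sample data are assumptions made for illustration. Rebinding the name to `""` throws away the whole list, whereas assigning to `word_list[i]` replaces a single element:

```python
# Hypothetical stand-in for self.wordList; the 42 is a deliberate non-string.
word_list = ["The", "quick", 42, "fox"]

# What the posted code does: rebind the name to an empty string.
replaced = word_list
replaced = ""          # the name now refers to "", not to the list

# What was probably intended (an assumption): blank out only the bad element.
fixed = list(word_list)
for i, word in enumerate(fixed):
    if not isinstance(word, str):
        fixed[i] = ""

print(repr(replaced))  # ''
print(fixed)           # ['The', 'quick', '', 'fox']
```

In the posted loop, any later iteration would then index into an empty string and raise IndexError, because `range(len(...))` was computed before the list was replaced.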

Some hints for you:

(1) Python has two string delimiters, " and ' and you should use them 
both. Instead of hard-to-read backslash escapes, just swap delimiters:

print "A string including a \" quote mark."  # No!
print 'A string including a " quote mark.'  # Yes, much easier to read.

The only time you should backslash-escape a quotation mark is if you need 
to include both sorts in a single string:

print "Python has both single ' and double \" quotation marks."
print 'Python has both single \' and double " quotation marks.'


(2) Python is not Pascal, or whatever language you seem to be writing in 
the style of. You almost never should write for-loops like this:


for i in range(len(something)):
    print something[i]


Instead, you should just iterate over "something" directly:


for obj in something:
    print obj


If you also need the index, use the enumerate function:


for i,obj in enumerate(something):
    print obj, i


If you are forced to use an ancient version of Python without enumerate, 
do yourself a favour and write your loops like this:


for i in range(len(something)):
    obj = something[i]
    print obj, i


instead of repeatedly indexing the list over and over and over and over 
again, as you do in your own code. The use of a temporary variable makes 
the code much easier to read and understand.
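
Applying these hints to the loop from the original post might look like the sketch below. It is written for Python 3 (the original post already calls `print(..., sep="")`), uses a plain `word_list` instead of `self.wordList`, and assumes the goal was to strip a leading backspace from each word; none of that is confirmed by the original poster. It also shows why the slice `[1:-1]` comes back empty for a two-character string:

```python
word_list = ["homewards", "\b."]   # shortened sample data (assumed)

for i, word in enumerate(word_list):
    if word.startswith("\b"):
        # word[1:-1] drops the first AND the last character,
        # so for the two-character string "\b." nothing is left.
        print(repr(word[1:-1]))    # ''
        # word[1:] drops only the leading backspace.
        word_list[i] = word[1:]

print(word_list)                   # ['homewards', '.']
```

If that reading of the intent is right, the slice end `-1` would explain the "incorrect results" without any bug in string handling.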



-- 
Steven



#18023

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2011-12-27 09:53 -0500
Message-ID: <mailman.4133.1324997611.27778.python-list@python.org>
In reply to: #18001

On 27 Dec 2011 01:10:19 GMT, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:


>The only time you should backslash-escape a quotation mark is if you need 
>to include both sorts in a single string:
>
>print "Python has both single ' and double \" quotation marks."
>print 'Python has both single \' and double " quotation marks.'
>

	You can get by without the backslash in this situation too, by using
triple quoting:

print """Python has both single ' and double " quotation marks."""
(Substitute ''' for """ if it looks better to you, as long as you use
the same marker at both ends. I find """ clearer: in some fonts ''' could
be mistaken for a " and a ' packed tightly together, "', whereas """ can
only be one construct.)
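
As a quick sketch, the spellings discussed so far can be checked against each other in Python 3; the variable names are only for illustration:

```python
escaped_double = "Python has both single ' and double \" quotation marks."
escaped_single = 'Python has both single \' and double " quotation marks.'
triple_double = """Python has both single ' and double " quotation marks."""
triple_single = '''Python has both single ' and double " quotation marks.'''

# All four literals denote the same string value.
assert escaped_double == escaped_single == triple_double == triple_single
print(triple_double)
```

One caveat with this approach: a triple-quoted literal cannot end with the same quote character as its delimiter without an escape.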
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/



#18038

From: Rick Johnson <rantingrickjohnson@gmail.com>
Date: 2011-12-27 10:04 -0800
Message-ID: <cce5eb97-8e7f-48c6-8f0d-ec6442871196@k28g2000yqn.googlegroups.com>
In reply to: #18023

--
Note: superfluous indentation removed for clarity!
--

On Dec 27, 8:53 am, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> You can get by without the backslash in this situation too, by using
> triple quoting:

I would not do that because:
1. Because Python already has TWO string literal delimiters (' and ")
2. Because triple quote string literals are SPECIFICALLY created to
solve the "multi-line issue"
3. Because you can confuse the hell out of someone who is reading
Python code and they may miss the true purpose of triple quotes in
Python

But this brings up a very important topic. Why do we even need triple
quote string literals to span multiple lines? Good question, and one i
have never really mused on until now. It's amazing how much BS we just
accept blindly! WE DON'T NEED TRIPLE QUOTE STRINGS! What we need is
single quote strings that span multiple lines and triple quotes then
become superfluous! For the problem of embedding quotes in string
literals, we should be using markup. A SIMPLISTIC MARKUP!

" This is a multi line
string with a single quote --> <SQ>
and a double quote --> <DQ>. Here is an
embedded newline --> <NL>. And a backspace <BS>.

Now we can dispense with all the BS!
"

>  I find """ clearer, ''' could be a " and '
> packed tightly in some fonts, "', whereas """ can only be one construct)

Another reason to ONLY use fixed width font when viewing code! Why
would you use ANY font that would obscure chars SO ubiquitous as " and
'?



#18054

From: Lie Ryan <lie.1296@gmail.com>
Date: 2011-12-28 08:23 +1100
Message-ID: <mailman.4152.1325021001.27778.python-list@python.org>
In reply to: #18038

On 12/28/2011 05:04 AM, Rick Johnson wrote:
> --
> Note: superfluous indention removed for clarity!
> --
>
> On Dec 27, 8:53 am, Dennis Lee Bieber<wlfr...@ix.netcom.com>  wrote:
>> You can get by without the backslash in this situation too, by using
>> triple quoting:
>
> I would not do that because:
> 1. Because Python already has TWO string literal delimiters (' and ")
> 2. Because triple quote string literals are SPECIFICALLY created to
> solve the "multi-line issue"
> 3. Because you can confuse the hell out of someone who is reading
> Python code and they may miss the true purpose of triple quotes in
> Python
>
> But this brings up a very important topic. Why do we even need triple
> quote string literals to span multiple lines? Good question, and one i
> have never really mused on until now. It's amazing how much BS we just
> accept blindly! WE DON'T NEED TRIPLE QUOTE STRINGS! What we need is
> single quote strings that span multiple lines and triple quotes then
> become superfluous! For the problem of embedding quotes in string
> literals, we should be using markup. A SIMPLISTIC MARKUP!
>
> " This is a multi line
> string with a single quote -->  <SQ>
> and a double quote -->  <DQ>. Here is an
> embedded newline -->  <NL>. And a backspace<BS>.
>
> Now we can dispense with all the BS!
> "

Ok, you're trolling.



#18059

From: Terry Reedy <tjreedy@udel.edu>
Date: 2011-12-27 16:38 -0500
Message-ID: <mailman.4155.1325021956.27778.python-list@python.org>
In reply to: #18038

On 12/27/2011 1:04 PM, Rick Johnson wrote:

> But this brings up a very important topic. Why do we even need triple
> quote string literals to span multiple lines? Good question, and one i
> have never really mused on until now.

I have, and the reason I thought of is that people, including me, too 
often forget or accidentally fail to properly close a string literal, 
and type something like 'this is a fairly long single line string"
and wonder why they get a syntax error lines later, or, in interactive 
mode, why the interpreter does not respond to a newline.

Color coding editors make it easier to catch such errors, but they were 
less common in 1991. And there is still uncolored interactive mode.

There may also be a technical reason as to how the lexer works.
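
The effect of the one-line restriction can be observed with the built-in `compile()` function. This is only a sketch: the source text and the filename `<example>` are made up, and the wording of the error message differs between Python versions, so only the reported line number is looked at:

```python
# Line 1 opens with ' but "closes" with ", so the literal is never terminated.
source = (
    "s = 'this is a fairly long single line string\"\n"
    "x = 1\n"
    "y = 2\n"
)

try:
    compile(source, "<example>", "exec")
except SyntaxError as err:
    reported_line = err.lineno
    print("SyntaxError on line", reported_line)
else:
    reported_line = None
```

Because a single-quoted literal cannot run past the end of its line, the error is pinned to line 1 instead of surfacing lines later.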

-- 
Terry Jan Reedy



#18079

From: Rick Johnson <rantingrickjohnson@gmail.com>
Date: 2011-12-27 16:57 -0800
Message-ID: <b67bf3de-05ad-4aec-9e0c-63ddc398f2a1@p13g2000yqd.googlegroups.com>
In reply to: #18059

On Dec 27, 3:38 pm, Terry Reedy <tjre...@udel.edu> wrote:
> On 12/27/2011 1:04 PM, Rick Johnson wrote:
>
> > But this brings up a very important topic. Why do we even need triple
> > quote string literals to span multiple lines? Good question, and one i
> > have never really mused on until now.
>
> I have, and the reason I thought of is that people, including me, too
> ofter forget or accidentally fail to properly close a string literal,

Yes, agreed.

> Color coding editors make it easier to catch such errors, but they were
> less common in 1991.

I would say the need for triple quote strings has passed long ago.
Like you say, since color lexers are ubiquitous now we don't need
them.

> And there is still uncolored interactive mode.

I don't see interactive command line programming as a problem. I mean,
who drops into a cmd line and starts writing paragraphs of string
literals? Typically, one would just make a few one-liner calls here or
there. Also, un-terminated string literal errors can be very
aggravating. Not because they are difficult to fix, no, but because
they are difficult to find! -- and sending me an error message
like...

 "Exception: Un-terminated string literal meets EOF! line: 50,466,638"

... is about as helpful as a bullet in my head!

If the interpreter finds itself at EOF BEFORE a string closes, don't
you think it would be more helpful to include the currently "opened"
string's START POSITION also? Heck, it would be wonderful to only have
the start position since the likelihood of a string ending at EOF is
astronomically small!

As an intelligent lad must know, the distance from any given string's
start position to its end position is far more likely to be shorter
than the distance from the string's beginning to the freaking EOF!
Ruby and Python are both guilty of this atrocity.



#18131

From: Lie Ryan <lie.1296@gmail.com>
Date: 2011-12-29 04:54 +1100
Message-ID: <mailman.4187.1325094872.27778.python-list@python.org>
In reply to: #18079

On 12/28/2011 11:57 AM, Rick Johnson wrote:
> On Dec 27, 3:38 pm, Terry Reedy<tjre...@udel.edu>  wrote:
>> On 12/27/2011 1:04 PM, Rick Johnson wrote:
>>
>>> But this brings up a very important topic. Why do we even need triple
>>> quote string literals to span multiple lines? Good question, and one i
>>> have never really mused on until now.
>>
>> I have, and the reason I thought of is that people, including me, too
>> ofter forget or accidentally fail to properly close a string literal,
>
> Yes, agreed.
>
>> Color coding editors make it easier to catch such errors, but they were
>> less common in 1991.
>
> I would say the need for triple quote strings has passed long ago.
> Like you say, since color lexers are ubiquitous now we don't need
> them.
>
>> And there is still uncolored interactive mode.
>
> I don't see interactive command line programming as a problem. I mean,
> who drops into a cmd line and starts writing paragraphs of string
> literals? Typically, one would just make a few one-liner calls here or
> there. Also, un-terminated string literal errors can be very
> aggravating. Not because they are difficult to fix, no, but because
> they are difficult to find! -- and sending me an error message
> like...
>
>   "Exception: Un-terminated string literal meets EOF! line: 50,466,638"
>
> ... is about as helpful as a bullet in my head!
>
> If the interpreter finds itself at EOF BEFORE a string closes, don't
> you think it would be more helpful to include the currently "opened"
> strings START POSITION also?

No, it wouldn't. Once you get an unterminated string literal, the string 
would terminate at the next string opening. Then it would fuck the 
parser, since it would try to parse what was supposed to be a string 
literal as code. For example:

hello = 'bar'
s = "boo, I missed a quote here
print 'hello = ', hello, "; s = ", s

the parser would misleadingly show that you have an unclosed string 
literal here:

                               vvv
print 'hello = ', hello, "; s = ", s
                               ^^^

instead of on line 2. While an experienced programmer should be able to 
figure out what's wrong, I can see a beginner programmer trying to "fix" 
the problem like this:

print 'hello = ', hello, "; s = ", s"

and then complaining that print doesn't print.

Limiting string literals to one line limits the possibility of damage to 
a single line. You will still have the same problem if you fail to 
close a triple-quoted string, but since triple-quoted strings are much 
rarer and they're pretty eye-catching, this sort of error is much 
harder to make.
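
A rough way to see the difference in a current interpreter is sketched below. It uses Python 3 `print()` calls rather than the Python 2 statements above, and the helper name `first_error_line` and the filename `<example>` are invented for this example:

```python
def first_error_line(source):
    """Return the line number of the SyntaxError, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err.lineno
    return None

single_quoted = (
    "hello = 'bar'\n"
    "s = \"boo, I missed a quote here\n"
    "print('hello = ', hello, \"; s = \", s)\n"
)
# One-line literals: the error is reported where the quote is missing.
print(first_error_line(single_quoted))   # 2

triple_quoted = (
    "hello = 'bar'\n"
    "s = \"\"\"boo, I missed the closing quotes here\n"
    "print('hello = ', hello)\n"
)
# The literal swallows the rest of the file; which line gets reported
# depends on the Python version, so it is printed but not assumed here.
print(first_error_line(triple_quoted))
```

With one-line literals the damage stays on line 2, while the unclosed triple-quoted literal consumes everything up to the end of the source.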


