Newsgroup: comp.lang.python, thread #54942

Re: Weird bahaviour from shlex - line no

Started by: Peter Otten <__peter__@web.de>
First post: 2013-09-28 16:52 +0200
Last post: 2013-09-28 16:59 -0400
Articles: 2 (2 participants)

This discussion starts before the indexed window, so earlier articles are not shown. The article listed as "Started by" above is the oldest one visible, not the original post.


Contents

  Re: Weird bahaviour from shlex - line no Peter Otten <__peter__@web.de> - 2013-09-28 16:52 +0200
    Re: Weird bahaviour from shlex - line no Piet van Oostrum <piet@vanoostrum.org> - 2013-09-28 16:59 -0400

#54942 — Re: Weird bahaviour from shlex - line no

From: Peter Otten <__peter__@web.de>
Date: 2013-09-28 16:52 +0200
Subject: Re: Weird bahaviour from shlex - line no
Message-ID: <mailman.412.1380379896.18130.python-list@python.org>

Dave Angel wrote:

> On 28/9/2013 02:26, Daniel Stojanov wrote:
> 
>> Can somebody explain this. The line number reported by shlex depends
>> on the previous token. I want to be able to tell if I have just popped
>> the last token on a line.
>>
> 
> I agree that it seems weird.  However, I don't think you have made
> clear why it's not what you (and I) expect.
> 
> import shlex
> 
> def parseit(string):
>     print
>     print "Parsing -", string
>     first = shlex.shlex(string)
>     token = "dummy"
>     while token:
>         token = first.get_token()
>         print token, " -- line", first.lineno
> 
> parseit("word1 word2\nword3")     #first
> parseit("word1 word2,\nword3")    #second
> parseit("word1 word2,word3\nword4")
> parseit("word1 word2+,?\nword3")
> 
> This will display the lineno attribute for every token.
> 
> shlex is documented at:
> 
> http://docs.python.org/2/library/shlex.html
> 
> And lineno is documented on that page as:
> 
> """shlex.lineno
> Source line number (count of newlines seen so far plus one).
> """
> 
> It's not at all clear what "seen so far" is intended to mean, but in
> practice, the line number is incremented for the last token on the
> line. Thus your first example
> 
> Parsing - word1 word2
> word3
> word1  -- line 1
> word2  -- line 2
> word3  -- line 2
>   -- line 2
> 
> word2 has the incremented line number.
> 
> But when the token is neither whitespace nor ASCII letters, then it
> doesn't increment lineno.  Thus second example:
> 
> Parsing - word1 word2,
> word3
> word1  -- line 1
> word2  -- line 1
> ,  -- line 1                      #we would expect this to be "line 2"
> word3  -- line 2
>   -- line 2
> 
> Anybody else have some explanation 

The explanation seems obvious: a word may be continued by the next character 
if that is in wordchars, so the parser has to look at that character. If it 
happens to be '\n' the lineno is immediately incremented. Non-wordchars are 
returned as single characters, so there is no need to peek ahead and the 
lineno is not altered.
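
A minimal Python 3 sketch reproducing the effect: it reads lineno immediately after each get_token() call, as Dave's script does. The helper name tokens_and_lineno is hypothetical; only shlex.shlex, get_token(), eof and lineno come from the standard library.

```python
import shlex


def tokens_and_lineno(text):
    """Return (token, lineno) pairs, reading lineno right after get_token()."""
    lexer = shlex.shlex(text)  # non-POSIX mode, as in the thread
    pairs = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:  # '' marks end of input in non-POSIX mode
            break
        pairs.append((token, lexer.lineno))
    return pairs


# "word2" is made of wordchars, so the lexer reads one character past it to
# see where it ends; that character is "\n", so lineno is already 2.
print(tokens_and_lineno("word1 word2\nword3"))
# -> [('word1', 1), ('word2', 2), ('word3', 2)]

# Here the character after "word2" is ",", which is pushed back and returned
# later as a one-character token; the "\n" is only read while looking for
# "word3", so "word2" and "," both report line 1.
print(tokens_and_lineno("word1 word2,\nword3"))
# -> [('word1', 1), ('word2', 1), (',', 1), ('word3', 2)]
```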

In short: this looks like an implementation accident. 

OP: I don't see a use case for the current behaviour -- I suggest that you 
file a bug report.

> or advice for Daniel, other than
> preprocessing the string by stripping any non letters off the end of the
> line?

The following gives the starting line of each token for your examples:

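import shlex

def shlexiter(s):
    """Yield (lineno, token) pairs, lineno being the line the token starts on."""
    p = shlex.shlex(s)
    # Take "\n" out of the whitespace set: newlines are no longer skipped as
    # whitespace but come back as "\n" tokens of their own, so lineno read
    # just before get_token() is the line the next real token starts on.
    p.whitespace = p.whitespace.replace("\n", "")
    while True:
        lineno = p.lineno
        token = p.get_token()
        if not token:          # '' marks end of input in non-POSIX mode
            break
        if token == "\n":      # drop the newline tokens themselves
            continue
        yield lineno, token

def parseit(string):
    print("Parsing - {!r}".format(string))
    for lineno, token in shlexiter(string):
        print("{:3} {!r}".format(lineno, token))
    print("")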

but I have no idea about the implications for more complex input.



#54978

From: Piet van Oostrum <piet@vanoostrum.org>
Date: 2013-09-28 16:59 -0400
Message-ID: <m2eh887jch.fsf@cochabamba.vanoostrum.org>
In reply to: #54942

Peter Otten <__peter__@web.de> writes:

> Dave Angel wrote:
>
>> On 28/9/2013 02:26, Daniel Stojanov wrote:
>> 
>>> Can somebody explain this. The line number reported by shlex depends
>>> on the previous token. I want to be able to tell if I have just popped
>>> the last token on a line.
[...]
> The explanation seems obvious: a word may be continued by the next character 
> if that is in wordchars, so the parser has to look at that character. If it 
> happens to be '\n' the lineno is immediately incremented. Non-wordchars are 
> returned as single characters, so there is no need to peek ahead and the 
> lineno is not altered.
>
> In short: this looks like an implementation accident. 

I think shlex should be changed to give the line number of the start of
the token in self.lineno. It isn't hard.
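
One possible shape for this, sketched as a Python 3 subclass rather than a patch to shlex itself, and exposing a separate attribute instead of changing lineno. It assumes non-POSIX mode, the default punctuation_chars=False, and no tokens pushed back by the caller with push_token(). The names _LineTrackingReader, StartLineShlex, token_lineno and tokens_with_start_lines are hypothetical. It also relies on CPython implementation details: the internal pushback deque, and read_token() only calling read(1) and readline() on its stream.

```python
import io
import shlex


class _LineTrackingReader:
    """Hypothetical stream wrapper that records the line where a token starts."""

    def __init__(self, text):
        self._stream = io.StringIO(text)
        self.line = 1           # line of the next character to be read
        self.last_line = 1      # line of the most recently read character
        self.start_line = None  # line of the first character of the token
        self._skip = ""
        self._capturing = False

    def begin_token(self, skip):
        # Called before each token; characters in `skip` cannot start one.
        self._skip = skip
        self.start_line = None
        self._capturing = True

    def read(self, n=1):
        chunk = self._stream.read(n)
        for ch in chunk:
            self.last_line = self.line
            if self._capturing and ch not in self._skip:
                self.start_line = self.line
                self._capturing = False
            if ch == "\n":
                self.line += 1
        return chunk

    def readline(self):
        # shlex calls readline() to discard the rest of a comment.
        rest = self._stream.readline()
        self.line += rest.count("\n")
        return rest


class StartLineShlex(shlex.shlex):
    """shlex whose token_lineno is the line on which the last token started."""

    def __init__(self, text):
        super().__init__(_LineTrackingReader(text))
        self.token_lineno = None
        self._pushback_line = None

    def get_token(self):
        if self.pushback:
            # A one-character token read as lookahead while finishing the
            # previous word: it is the last character the reader consumed.
            self.token_lineno = self._pushback_line
            return super().get_token()
        self.instream.begin_token(self.whitespace + self.commenters)
        token = super().get_token()
        self.token_lineno = self.instream.start_line
        self._pushback_line = self.instream.last_line
        return token


def tokens_with_start_lines(text):
    lexer = StartLineShlex(text)
    pairs = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:
            break
        pairs.append((lexer.token_lineno, token))
    return pairs


print(tokens_with_start_lines("word1 word2\nword3"))
# -> [(1, 'word1'), (1, 'word2'), (2, 'word3')]
print(tokens_with_start_lines("word1 word2,\nword3"))
# -> [(1, 'word1'), (1, 'word2'), (1, ','), (2, 'word3')]
```

Because the start line is taken from the first character that is neither whitespace nor a comment character, blank lines and comments before a token are accounted for, which a lineno snapshot taken before get_token() would miss.
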
-- 
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
