From: MRAB
Date: Wed, 21 Nov 2012 20:58:46 +0000
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Inconsistent behaviour os str.find/str.index when providing optional parameters

On 2012-11-21 19:25, Hans Mulder wrote:
> On 21/11/12 17:59:05, Alister wrote:
>> On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
>>
>>> I just came across this:
>>>
>>>>>> 'spam'.find('', 5)
>>> -1
>>>
>>>
>>> Now, reading find's documentation:
>>>
>>>>>> print(str.find.__doc__)
>>> S.find(sub [,start [,end]]) -> int
>>>
>>> Return the lowest index in S where substring sub is found,
>>> such that sub is contained within S[start:end]. Optional arguments
>>> start and end are interpreted as in slice notation.
>>>
>>> Return -1 on failure.
>>>
>>> Now, the empty string is a substring of every string so how can find
>>> fail?
>>> find, from the doc, should generally be equivalent to
>>> S[start:end].find(substring) + start, except if the substring is not
>>> found but since the empty string is a substring of the empty string it
>>> should never fail.
>>>
>>> Looking at the source code for find (in stringlib/find.h):
>>>
>>> Py_LOCAL_INLINE(Py_ssize_t)
>>> stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
>>>                const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
>>>                Py_ssize_t offset)
>>> {
>>>     Py_ssize_t pos;
>>>
>>>     if (str_len < 0)
>>>         return -1;
>>>
>>> I believe it should be:
>>>
>>>     if (str_len < 0)
>>>         return (sub_len == 0 ? 0 : -1);
>>>
>>> Is there any reason for having this unexpected behaviour or was this
>>> simply overlooked?
>>
>> why would you be searching for an empty string?
>> what result would you expect to get from such a search?
>
>
> In general, if
>
>     needle in haystack[ start: ]
>
> returns True, then you'd expect
>
>     haystack.find(needle, start)
>
> to return the smallest i >= start such that
>
>     haystack[i:i+len(needle)] == needle
>
> also returns True.
>
>>>> "" in "spam"[5:]
> True
>>>> "spam"[5:5+len("")] == ""
> True
>>>>
>
> So, you'd expect that "spam".find("", 5) would return 5.
>
> The only other consistent position would be that "spam"[5:]
> should raise an IndexError, because 5 is an invalid index.
>
> For that matter, I wouldn't mind if "spam".find(s, 5) were
> to raise an IndexError. But if slicing at position 5
> produces an empty string, then .find should be able to
> find that empty string.
>
You'd expect that given:

    found = string.find(something, start, end)

if 'something' is present then the following are true:

    0 <= found <= len(string)

    start <= found <= end

(I'm assuming here that 'start' and 'end' have already been adjusted
for counting from the end, i.e. originally they might have been negative
values.)

The only time that you can have found == len(string) and found == end
is when something == "" and start == len(string).
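
For what it's worth, the two conditions above can be checked mechanically.
The following is only a rough sketch: check_find_invariants is a made-up
helper name, and I'm assuming that "adjusted" means the same adjustment and
clamping that slice.indices() applies to negative or out-of-range values.

```python
def check_find_invariants(string, something, start=0, end=None):
    """Hypothetical helper: call str.find and assert the bounds above."""
    length = len(string)
    if end is None:
        end = length
    # Assumption: adjust start/end the way slicing does (negative values
    # count from the end, out-of-range values are clamped).
    adj_start, adj_end, _ = slice(start, end).indices(length)
    found = string.find(something, start, end)
    if found != -1:
        # The two conditions stated in the post.
        assert 0 <= found <= length
        assert adj_start <= found <= adj_end
        # A reported match must really be a match.
        assert string[found:found + len(something)] == something
    return found


if __name__ == "__main__":
    # found == len(string) only happens for the empty substring.
    print(check_find_invariants("spam", "", 4))
    # The case from the original post: start is past the end of the string.
    print(check_find_invariants("spam", "", 5))
```

With start == len(string) the empty string is found at index 4, whereas
with start == 5 find reports -1, which is the behaviour the thread is
about.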