From: Random832
Newsgroups: comp.lang.python
Subject: Re: Whittle it on down
Date: Thu, 05 May 2016 14:52:05 -0400

On Thu, May 5, 2016, at 14:03, Steven D'Aprano wrote:
> You failed to anchor the string at the beginning and end of the string,
> an easy mistake to make, but that's the point.

I don't think anchoring is properly a concern of the regex itself - .match
is anchored implicitly at the beginning, and one could easily imagine an
API that implicitly anchors at the end - or you can simply check that the
match length == the string length.

> - Data validity doesn't matter, because there's no possible way that you
> might accidentally scrape data from the wrong part of a HTML file and end
> up with junk input.

If you've scraped data from the wrong part of the file, then nothing you
do to your regex can prevent the junk input from coincidentally matching
the input format.
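
To make the anchoring point concrete, here is a minimal sketch of the two approaches: an unanchored pattern plus a length check, and an API that anchors at both ends. The pattern, the sample strings, and the helper names are made up for illustration and are not from the thread. The second helper uses re's fullmatch, which exists in Python 3.4 and later.

```python
import re

# Hypothetical pattern and inputs, chosen only for illustration.
pattern = re.compile(r"[A-Z]+")


def matches_whole_by_length(pat, s):
    """Unanchored pattern; compare where the match ends to len(s)."""
    m = pat.match(s)  # .match anchors at the start only
    return m is not None and m.end() == len(s)


def matches_whole_by_fullmatch(pat, s):
    """Let the API anchor at both ends (Python 3.4+)."""
    return pat.fullmatch(s) is not None


for s in ["ABC", "ABC123", "123ABC"]:
    print(
        s,
        pattern.match(s) is not None,          # prefix match only
        matches_whole_by_length(pattern, s),    # whole string, via length
        matches_whole_by_fullmatch(pattern, s), # whole string, via API
    )
```

One caveat on the length check: it is not always equivalent to fullmatch. With a pattern such as a|ab and the input "ab", .match stops after "a" and the length check fails, while fullmatch backtracks and accepts the whole string.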