From: Jussi Piitulainen
Newsgroups: comp.lang.python
Subject: Re: Newbie: Check first two non-whitespace characters
Date: Fri, 01 Jan 2016 10:16:29 +0200

otaksoftspamtrap@gmail.com writes:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { ....'.
>
> Best to use re and how? Something else?

No comment on whether re is good for your use case, but another comment
on how. First, some test data:

>>> import re
>>> data = '\r\n {\r\n\t[ "etc" ]}\n\n\n'

Then the actual comment: there's a special sequence in regexes, \S, that
matches any single non-whitespace character, and a function,
re.finditer, that produces the matches on demand instead of all at once:

>>> black = re.compile(r'\S')
>>> matches = re.finditer(black, data)

Then the demonstration. This accesses the first, then the second, then
the third match:

>>> empty = re.match('', '')
>>> next(matches, empty).group()
'{'
>>> next(matches, empty).group()
'['
>>> next(matches, empty).group()
'"'

The empty match object provides an appropriate .group(), namely the
empty string, when there is no first or second (and so on)
non-whitespace character in the data:

>>> matches = re.finditer(black, '\r\t\n')
>>> next(matches, empty).group()
''
>>> next(matches, empty).group()
''
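Applied to the original question, the idiom above might be wrapped up
as follows. This is a minimal sketch, not code from the post: the names
first_black and starts_with_bracket_brace are hypothetical, and it
assumes the requirement is exactly '[' followed by '{', with any amount
of whitespace allowed before and between them.

```python
import re

# Matches any single non-whitespace character.
BLACK = re.compile(r'\S')

# A match object for the empty string; its .group() is '', so it stands
# in for "there is no further non-whitespace character".
EMPTY = re.match('', '')


def first_black(text, n=2):
    """Hypothetical helper: return the first n non-whitespace characters
    of text joined together (fewer if text runs out)."""
    matches = BLACK.finditer(text)
    # next() with a default never raises StopIteration here.
    return ''.join(next(matches, EMPTY).group() for _ in range(n))


def starts_with_bracket_brace(text):
    """Hypothetical helper: True if the first two non-whitespace
    characters of text are '[' and then '{'."""
    return first_black(text, 2) == '[{'


print(starts_with_bracket_brace('[{...'))                           # True
print(starts_with_bracket_brace(' [ { ....'))                       # True
print(starts_with_bracket_brace('\r\n {\r\n\t[ "etc" ]}\n\n\n'))    # False
print(starts_with_bracket_brace(' \t\n'))                           # False
```

Because finditer is lazy, this only scans as far as the second
non-whitespace character; it does not walk the rest of a long string.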