From: Jussi Piitulainen <harvesting@is.invalid>
Newsgroups: comp.lang.python
Subject: Re: Newbie: Check first two non-whitespace characters
Date: Fri, 01 Jan 2016 10:16:29 +0200
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <lf5mvsppvdu.fsf@ling.helsinki.fi>
References: <240ab049-68a0-4a00-b911-d58cae9bfbcf@googlegroups.com>

otaksoftspamtrap@gmail.com writes:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like 
> '  [  {  ....'.
>
> Best to use re and how? Something else?

No comment on whether re is a good fit for your use case, but here is a
comment on how to use it. First, some test data:

  >>> data = '\r\n  {\r\n\t[ "etc" ]}\n\n\n'

Then the actual comment - there's a special regex escape, \S, that
matches a single non-whitespace character, and a function, re.finditer,
that produces matches on demand:

  >>> import re
  >>> black = re.compile(r'\S')
  >>> matches = re.finditer(black, data)
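
As a side note on "on demand": the iterator hands out one match object per
call to next(), and each match object also knows where it was found. A
small sketch, reusing the same test data (the names it, first and second
are only for illustration):

```python
import re

data = '\r\n  {\r\n\t[ "etc" ]}\n\n\n'
black = re.compile(r'\S')

# finditer returns an iterator; matches are produced one at a time
# as next() is called, not all at once.
it = black.finditer(data)

first = next(it)    # first non-whitespace character
second = next(it)   # second non-whitespace character

# .group() is the matched text, .start() its index in data
print(first.group(), first.start())
print(second.group(), second.start())
```

Without a default argument, next() raises StopIteration once the matches
run out, which is why a fallback value is handy.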

Then the demonstration. This accesses the first, then second, then third
match:

  >>> empty = re.match('', '')
  >>> next(matches, empty).group()
  '{'
  >>> next(matches, empty).group()
  '['
  >>> next(matches, empty).group()
  '"'

The empty match object serves as the default for next(): its .group()
returns '' when there is no first or second (and so on) non-whitespace
character in the data:

  >>> matches = re.finditer(black, '\r\t\n')
  >>> next(matches, empty).group()
  ''
  >>> next(matches, empty).group()
  ''
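
Putting the pieces together for the original question, one possible
sketch is a small helper that joins the first two non-whitespace
characters and lets the caller compare the result with '[{'. The name
first_two_nonspace is made up for this example, and this is only one way
to apply the technique above:

```python
import re

NON_SPACE = re.compile(r'\S')   # same pattern as `black` above
EMPTY = re.match('', '')        # fallback match; its .group() is ''

def first_two_nonspace(text):
    """Return the first two non-whitespace characters of text, joined.

    Hypothetical helper for illustration. If there are fewer than
    two such characters, the result is shorter than two characters.
    """
    matches = NON_SPACE.finditer(text)
    first = next(matches, EMPTY).group()
    second = next(matches, EMPTY).group()
    return first + second

print(first_two_nonspace('  [  {  ....') == '[{')   # True
print(first_two_nonspace('\r\t\n') == '[{')         # False
```

Because of the empty default, a string with too few non-whitespace
characters simply fails the comparison instead of raising an exception.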