
Newbie: Check first two non-whitespace characters

Started by	otaksoftspamtrap@gmail.com
First post	2015-12-31 10:18 -0800
Last post	2016-01-01 10:43 +0100
Articles	13 — 10 participants


  Newbie: Check first two non-whitespace characters otaksoftspamtrap@gmail.com - 2015-12-31 10:18 -0800
    Re: Newbie: Check first two non-whitespace characters MRAB <python@mrabarnett.plus.com> - 2015-12-31 18:38 +0000
    Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2015-12-31 19:54 +0100
    Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2015-12-31 20:05 +0100
      Re: Newbie: Check first two non-whitespace characters cassius.fechter@gmail.com - 2015-12-31 11:21 -0800
    Re: Newbie: Check first two non-whitespace characters Cory Madden <csmadden@gmail.com> - 2015-12-31 10:56 -0800
    Re: Newbie: Check first two non-whitespace characters Denis McMahon <denismfmcmahon@gmail.com> - 2015-12-31 20:35 +0000
    Re: Newbie: Check first two non-whitespace characters Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-31 23:25 +0000
      Re: Newbie: Check first two non-whitespace characters Steven D'Aprano <steve@pearwood.info> - 2016-01-01 12:12 +1100
    Re: Newbie: Check first two non-whitespace characters Steven D'Aprano <steve@pearwood.info> - 2016-01-01 12:23 +1100
    Re: Newbie: Check first two non-whitespace characters Random832 <random832@fastmail.com> - 2015-12-31 20:31 -0500
    Re: Newbie: Check first two non-whitespace characters Jussi Piitulainen <harvesting@is.invalid> - 2016-01-01 10:16 +0200
    Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2016-01-01 10:43 +0100

#101070 — Newbie: Check first two non-whitespace characters

From	otaksoftspamtrap@gmail.com
Date	2015-12-31 10:18 -0800
Subject	Newbie: Check first two non-whitespace characters
Message-ID	<240ab049-68a0-4a00-b911-d58cae9bfbcf@googlegroups.com>

I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').

The string would ideally be: '[{...' but could also be something like 
'  [  {  ....'.

Best to use re and how? Something else?


#101071

From	MRAB <python@mrabarnett.plus.com>
Date	2015-12-31 18:38 +0000
Message-ID	<mailman.115.1451587270.11925.python-list@python.org>
In reply to	#101070

On 2015-12-31 18:18, otaksoftspamtrap@gmail.com wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  ....'.
>
> Best to use re and how? Something else?
>
I would use .split and then ''.join:

 >>> ''.join(' [ { ....'.split())
'[{....'

It might be faster if you provide a maximum for the number of splits:

 >>> ''.join(' [ { ....'.split(None, 1))
'[{ ....'
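
To turn the split/join idea into a yes/no check, something like the following might do. This is only a sketch: the function name `starts_with_open_brackets` is made up for illustration, and it assumes a single split is enough because the two characters of interest can only sit in the first whitespace-separated piece or at the start of the remainder.

```python
def starts_with_open_brackets(text):
    # split(None, 1) splits on any run of whitespace, at most once;
    # leading whitespace is dropped, so pieces[0] starts at the first
    # non-whitespace character and pieces[1] (if any) at the next run.
    pieces = text.split(None, 1)
    return ''.join(pieces).startswith('[{')


print(starts_with_open_brackets('  [  {  ....'))
print(starts_with_open_brackets('  [ x {  ....'))
```

An empty or all-whitespace string gives an empty list of pieces, so the check simply returns False.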


#101072

From	Karim <kliateni@gmail.com>
Date	2015-12-31 19:54 +0100
Message-ID	<mailman.116.1451588066.11925.python-list@python.org>
In reply to	#101070


On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  ....'.
>
> Best to use re and how? Something else?

Use pyparsing; it is straightforward:

 >>> from pyparsing import Suppress, restOfLine

 >>> mystring = Suppress('[') + Suppress('{') + restOfLine

 >>> result = mystring.parse(' [ { .... I am learning pyparsing' )

 >>> print result.asList()

['.... I am learning pyparsing']

You'll get your string inside the list.

Hope this helps; see the pyparsing docs for in-depth study.

Karim


#101075

From	Karim <kliateni@gmail.com>
Date	2015-12-31 20:05 +0100
Message-ID	<mailman.119.1451588711.11925.python-list@python.org>
In reply to	#101070


On 31/12/2015 19:54, Karim wrote:
>
>
> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2 
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> '  [  {  ....'.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing it is straight forward:
>
> >>> from pyparsing import Suppress, restOfLine
>
> >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>
> >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>
> >>> print result.asList()
>
> ['.... I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this help see pyparsing doc for in depth study.
>
> Karim

Sorry, the method to parse a string is parseString, not parse. Please
replace that line with this one:

 >>> result = mystring.parseString(' [ { .... I am learning pyparsing' )

Regards


#101076

From	cassius.fechter@gmail.com
Date	2015-12-31 11:21 -0800
Message-ID	<3f355b97-3a94-41af-8624-9d244eb555a3@googlegroups.com>
In reply to	#101075

Thanks much to both of you!


On Thursday, December 31, 2015 at 11:05:26 AM UTC-8, Karim wrote:
> On 31/12/2015 19:54, Karim wrote:
> >
> >
> > On 31/12/2015 19:18, snailpail@gmail.com wrote:
> >> I need to check a string over which I have no control for the first 2 
> >> non-white space characters (which should be '[{').
> >>
> >> The string would ideally be: '[{...' but could also be something like
> >> '  [  {  ....'.
> >>
> >> Best to use re and how? Something else?
> >
> > Use pyparsing it is straight forward:
> >
> > >>> from pyparsing import Suppress, restOfLine
> >
> > >>> mystring = Suppress('[') + Suppress('{') + restOfLine
> >
> > >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
> >
> > >>> print result.asList()
> >
> > ['.... I am learning pyparsing']
> >
> > You'll get your string inside the list.
> >
> > Hope this help see pyparsing doc for in depth study.
> >
> > Karim
> 
> Sorry the method to parse a string is parseString not parse, please 
> replace by this line:
> 
>  >>> result = mystring.parseString(' [ { .... I am learning pyparsing' )
> 
> Regards


#101078

From	Cory Madden <csmadden@gmail.com>
Date	2015-12-31 10:56 -0800
Message-ID	<mailman.120.1451590313.11925.python-list@python.org>
In reply to	#101070

I would personally use re here.

import re

test_string = '  [{blah blah blah'
matches = re.findall(r'[^\s]', test_string)  # every non-whitespace character
result = ''.join(matches)[:2]
# result is '[{'

On Thu, Dec 31, 2015 at 10:18 AM,  <otaksoftspamtrap@gmail.com> wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  ....'.
>
> Best to use re and how? Something else?


#101082

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2015-12-31 20:35 +0000
Message-ID	<n643hu$g7k$1@dont-email.me>
In reply to	#101070

On Thu, 31 Dec 2015 10:18:52 -0800, otaksoftspamtrap wrote:

> Best to use re and how? Something else?

Split the string on the space character and check the first two non blank 
elements of the resulting list?

Maybe something similar to the following:

if [x for x in s.split(' ') if x != ''][0:3] == ['(', '(', '(']:
    pass  # first three space-separated tokens are each '(', e.g. '( ( ( ...'

-- 
Denis McMahon, denismfmcmahon@gmail.com


#101084

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2015-12-31 23:25 +0000
Message-ID	<mailman.126.1451604389.11925.python-list@python.org>
In reply to	#101070

On 31/12/2015 18:54, Karim wrote:
>
>
> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> '  [  {  ....'.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing it is straight forward:
>
>  >>> from pyparsing import Suppress, restOfLine
>
>  >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>
>  >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>
>  >>> print result.asList()
>
> ['.... I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this help see pyparsing doc for in depth study.
>
> Karim

Congratulations for writing up one of the most overengineered pile of 
cobblers I've ever seen.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


#101091

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-01-01 12:12 +1100
Message-ID	<5685d289$0$1616$c3e8da3$5496439d@news.astraweb.com>
In reply to	#101084

On Fri, 1 Jan 2016 10:25 am, Mark Lawrence wrote:

> Congratulations for writing up one of the most overengineered pile of
> cobblers I've ever seen.

You should get out more.


-- 
Steven


#101092

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-01-01 12:23 +1100
Message-ID	<5685d504$0$1607$c3e8da3$5496439d@news.astraweb.com>
In reply to	#101070

On Fri, 1 Jan 2016 05:18 am, otaksoftspamtrap@gmail.com wrote:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
> 
> The string would ideally be: '[{...' but could also be something like
> '  [  {  ....'.
> 
> Best to use re and how? Something else?

This should work, and be very fast, for moderately-sized strings:

def starts_with_brackets(the_string):
    the_string = the_string.replace(" ", "")
    return the_string.startswith("[{")

It might be a bit slow for huge strings (tens of millions of characters),
but for short strings it will be fine.

Alternatively, use a regex:

import re
regex = re.compile(r' *\[ *\{')

if regex.match(the_string):
    print("string starts with [{ as expected")
else:
    raise ValueError("invalid string")

This will probably be slower for small strings, but faster for HUGE strings
(tens of millions of characters). But I expect it will be fast enough.

It is simple enough to skip tabs as well as spaces. Easiest way is to match
on any whitespace:

regex = re.compile(r'\s*\[\s*\{')
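
As a rough sketch of the "match on any whitespace" idea: `\s` is the regex class for whitespace (whereas `\w` matches word characters), so a check that skips spaces, tabs and newlines might look like this. The function name `starts_with_brackets_ws` is invented for the example.

```python
import re

# \s* skips any run of whitespace (spaces, tabs, newlines) before each bracket.
BRACKETS = re.compile(r'\s*\[\s*\{')


def starts_with_brackets_ws(the_string):
    # match() anchors at the start of the string, so nothing but
    # whitespace may come before '[' or between '[' and '{'.
    return BRACKETS.match(the_string) is not None


print(starts_with_brackets_ws('\t[\n  { ....'))
print(starts_with_brackets_ws('  [ x { ....'))
```

Because match() stops as soon as the pattern succeeds or fails, the rest of a long string is never scanned.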

-- 
Steven


#101095

From	Random832 <random832@fastmail.com>
Date	2015-12-31 20:31 -0500
Message-ID	<mailman.132.1451611887.11925.python-list@python.org>
In reply to	#101070

otaksoftspamtrap@gmail.com writes:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like 
> '  [  {  ....'.
>
> Best to use re and how? Something else?

Is it an arbitrary string, or is it a JSON object consisting of a list
whose first element is a dictionary? Because if you're planning on
reading it as a JSON object later you could just validate the types
after you've parsed it.
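
If the input does turn out to be JSON, a minimal sketch of that "parse first, then check the types" approach could look like the following. It assumes the whole string is one JSON document; the function name `is_list_starting_with_dict` is hypothetical.

```python
import json


def is_list_starting_with_dict(text):
    try:
        value = json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    # A JSON array whose first element is a JSON object.
    return isinstance(value, list) and len(value) > 0 and isinstance(value[0], dict)


print(is_list_starting_with_dict('  [  {"a": 1}, 2 ]'))
print(is_list_starting_with_dict('[{ not json'))
```

Note that this is stricter than looking at the first two characters: a string that starts with '[{' but is not valid JSON is rejected.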


#101100

From	Jussi Piitulainen <harvesting@is.invalid>
Date	2016-01-01 10:16 +0200
Message-ID	<lf5mvsppvdu.fsf@ling.helsinki.fi>
In reply to	#101070

otaksoftspamtrap@gmail.com writes:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like 
> '  [  {  ....'.
>
> Best to use re and how? Something else?

No comment on whether re is good for your use case but another comment
on how. First, some test data:

  >>> data = '\r\n  {\r\n\t[ "etc" ]}\n\n\n'

Then the actual comment - there's a special regex type, \S, to match a
non-whitespace character, and a method to produce matches on demand:

  >>> black = re.compile(r'\S')
  >>> matches = re.finditer(black, data)

Then the demonstration. This accesses the first, then second, then third
match:

  >>> empty = re.match('', '')
  >>> next(matches, empty).group()
  '{'
  >>> next(matches, empty).group()
  '['
  >>> next(matches, empty).group()
  '"'

The empty match object provides an appropriate .group() when there is no
first or second (and so on) non-whitespace character in the data:

  >>> matches = re.finditer(black, '\r\t\n')
  >>> next(matches, empty).group()
  ''
  >>> next(matches, empty).group()
  ''
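
The same on-demand matching can be wrapped in a small helper, sketched here with itertools.islice in place of repeated next() calls. The names `NON_SPACE` and `first_non_whitespace` are invented for the example.

```python
import re
from itertools import islice

NON_SPACE = re.compile(r'\S')


def first_non_whitespace(text, count=2):
    # finditer() is lazy, so islice() stops the scan after `count` matches
    # instead of walking the whole string.
    matches = NON_SPACE.finditer(text)
    return ''.join(m.group() for m in islice(matches, count))


print(first_non_whitespace('  [  {  ....') == '[{')
```

If fewer than `count` non-whitespace characters exist, the result is just shorter (possibly empty), so comparing it with '[{' still works.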


#101101

From	Karim <kliateni@gmail.com>
Date	2016-01-01 10:43 +0100
Message-ID	<mailman.136.1451641422.11925.python-list@python.org>
In reply to	#101070


On 01/01/2016 00:25, Mark Lawrence wrote:
> On 31/12/2015 18:54, Karim wrote:
>>
>>
>> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>>> I need to check a string over which I have no control for the first 2
>>> non-white space characters (which should be '[{').
>>>
>>> The string would ideally be: '[{...' but could also be something like
>>> '  [  {  ....'.
>>>
>>> Best to use re and how? Something else?
>>
>> Use pyparsing it is straight forward:
>>
>>  >>> from pyparsing import Suppress, restOfLine
>>
>>  >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>>
>>  >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>>
>>  >>> print result.asList()
>>
>> ['.... I am learning pyparsing']
>>
>> You'll get your string inside the list.
>>
>> Hope this help see pyparsing doc for in depth study.
>>
>> Karim
>
> Congratulations for writing up one of the most overengineered pile of 
> cobblers I've ever seen.
>

You're welcome!

The intent was to give a simple introduction to pyparsing, which is a
powerful tool for building more complex parsers.

Karim

