

comp.lang.python, thread #101070 (unrolled)

Newbie: Check first two non-whitespace characters

Started by: otaksoftspamtrap@gmail.com
First post: 2015-12-31 10:18 -0800
Last post: 2016-01-01 10:43 +0100
Articles: 13 (10 participants)



Contents

  Newbie: Check first two non-whitespace characters otaksoftspamtrap@gmail.com - 2015-12-31 10:18 -0800
    Re: Newbie: Check first two non-whitespace characters MRAB <python@mrabarnett.plus.com> - 2015-12-31 18:38 +0000
    Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2015-12-31 19:54 +0100
    Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2015-12-31 20:05 +0100
      Re: Newbie: Check first two non-whitespace characters cassius.fechter@gmail.com - 2015-12-31 11:21 -0800
    Re: Newbie: Check first two non-whitespace characters Cory Madden <csmadden@gmail.com> - 2015-12-31 10:56 -0800
    Re: Newbie: Check first two non-whitespace characters Denis McMahon <denismfmcmahon@gmail.com> - 2015-12-31 20:35 +0000
    Re: Newbie: Check first two non-whitespace characters Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-31 23:25 +0000
      Re: Newbie: Check first two non-whitespace characters Steven D'Aprano <steve@pearwood.info> - 2016-01-01 12:12 +1100
    Re: Newbie: Check first two non-whitespace characters Steven D'Aprano <steve@pearwood.info> - 2016-01-01 12:23 +1100
    Re: Newbie: Check first two non-whitespace characters Random832 <random832@fastmail.com> - 2015-12-31 20:31 -0500
    Re: Newbie: Check first two non-whitespace characters Jussi Piitulainen <harvesting@is.invalid> - 2016-01-01 10:16 +0200
    Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2016-01-01 10:43 +0100

#101070 — Newbie: Check first two non-whitespace characters

From: otaksoftspamtrap@gmail.com
Date: 2015-12-31 10:18 -0800
Subject: Newbie: Check first two non-whitespace characters
Message-ID: <240ab049-68a0-4a00-b911-d58cae9bfbcf@googlegroups.com>
I need to check the first two non-whitespace characters of a string over which I have no control (they should be '[{').

The string would ideally be: '[{...' but could also be something like 
'  [  {  ....'.

Is it best to use re, and if so, how? Or something else?



#101071

From: MRAB <python@mrabarnett.plus.com>
Date: 2015-12-31 18:38 +0000
Message-ID: <mailman.115.1451587270.11925.python-list@python.org>
In reply to: #101070
On 2015-12-31 18:18, otaksoftspamtrap@gmail.com wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  ....'.
>
> Best to use re and how? Something else?
>
I would use .split and then ''.join:

 >>> ''.join(' [ { ....'.split())
'[{....'

It might be faster if you provide a maximum for the number of splits:

 >>> ''.join(' [ { ....'.split(None, 1))
'[{ ....'
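Wrapped up as a yes/no check, the split-and-join idea might look like the sketch below (the function name is hypothetical). With `sep=None`, `str.split` drops leading whitespace and consumes the whole whitespace run it splits on, so with a maximum of one split the first two non-whitespace characters end up adjacent.

```python
def starts_with_open_brackets(s):
    # split(None, 1) strips leading whitespace and splits once, at the first
    # run of whitespace; that whole run is consumed by the split.
    head = ''.join(s.split(None, 1))
    return head.startswith('[{')
```

Both '[{...' and '  [  {  ....' pass; '[ x {' does not, because its second non-whitespace character is 'x'.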



#101072

From: Karim <kliateni@gmail.com>
Date: 2015-12-31 19:54 +0100
Message-ID: <mailman.116.1451588066.11925.python-list@python.org>
In reply to: #101070

On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  ....'.
>
> Best to use re and how? Something else?

Use pyparsing; it is straightforward:

 >>> from pyparsing import Suppress, restOfLine

 >>> mystring = Suppress('[') + Suppress('{') + restOfLine

 >>> result = mystring.parse(' [ { .... I am learning pyparsing' )

 >>> print result.asList()

['.... I am learning pyparsing']

You'll get your string inside the list.

Hope this helps; see the pyparsing documentation for in-depth study.

Karim



#101075

From: Karim <kliateni@gmail.com>
Date: 2015-12-31 20:05 +0100
Message-ID: <mailman.119.1451588711.11925.python-list@python.org>
In reply to: #101070

On 31/12/2015 19:54, Karim wrote:
>
>
> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2 
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> '  [  {  ....'.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing it is straight forward:
>
> >>> from pyparsing import Suppress, restOfLine
>
> >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>
> >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>
> >>> print result.asList()
>
> ['.... I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this help see pyparsing doc for in depth study.
>
> Karim

Sorry, the method to parse a string is parseString, not parse. Please
replace that line with this one:

 >>> result = mystring.parseString(' [ { .... I am learning pyparsing' )

Regards



#101076

From: cassius.fechter@gmail.com
Date: 2015-12-31 11:21 -0800
Message-ID: <3f355b97-3a94-41af-8624-9d244eb555a3@googlegroups.com>
In reply to: #101075
Thanks much to both of you!


On Thursday, December 31, 2015 at 11:05:26 AM UTC-8, Karim wrote:
> On 31/12/2015 19:54, Karim wrote:
> >
> >
> > On 31/12/2015 19:18, snailpail@gmail.com wrote:
> >> I need to check a string over which I have no control for the first 2 
> >> non-white space characters (which should be '[{').
> >>
> >> The string would ideally be: '[{...' but could also be something like
> >> '  [  {  ....'.
> >>
> >> Best to use re and how? Something else?
> >
> > Use pyparsing it is straight forward:
> >
> > >>> from pyparsing import Suppress, restOfLine
> >
> > >>> mystring = Suppress('[') + Suppress('{') + restOfLine
> >
> > >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
> >
> > >>> print result.asList()
> >
> > ['.... I am learning pyparsing']
> >
> > You'll get your string inside the list.
> >
> > Hope this help see pyparsing doc for in depth study.
> >
> > Karim
> 
> Sorry the method to parse a string is parseString not parse, please 
> replace by this line:
> 
>  >>> result = mystring.parseString(' [ { .... I am learning pyparsing' )
> 
> Regards



#101078

From: Cory Madden <csmadden@gmail.com>
Date: 2015-12-31 10:56 -0800
Message-ID: <mailman.120.1451590313.11925.python-list@python.org>
In reply to: #101070
I would personally use re here.

import re

test_string = '  [{blah blah blah'
matches = re.findall(r'[^\s]', test_string)
result = ''.join(matches)[:2]
# result == '[{'
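`re.findall` collects every non-whitespace character in the string before the slice keeps two of them. A variant that stops early could use `re.finditer` with `itertools.islice`; the helper name `first_non_whitespace` is hypothetical, and `\S` is the standard shorthand for `[^\s]`.

```python
import re
from itertools import islice

def first_non_whitespace(s, count=2):
    # finditer yields matches lazily; islice stops requesting matches after
    # `count` of them, so scanning ends once enough characters are found.
    matches = islice(re.finditer(r'\S', s), count)
    return ''.join(m.group() for m in matches)

test_string = '  [{blah blah blah'
result = first_non_whitespace(test_string)
```

Here `result` is '[{'; if the string has fewer than `count` non-whitespace characters, the result is simply shorter.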

On Thu, Dec 31, 2015 at 10:18 AM,  <otaksoftspamtrap@gmail.com> wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> '  [  {  ....'.
>
> Best to use re and how? Something else?
> --
> https://mail.python.org/mailman/listinfo/python-list



#101082

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2015-12-31 20:35 +0000
Message-ID: <n643hu$g7k$1@dont-email.me>
In reply to: #101070
On Thu, 31 Dec 2015 10:18:52 -0800, otaksoftspamtrap wrote:

> Best to use re and how? Something else?

Split the string on the space character and check the first two non-blank
elements of the resulting list?

Maybe something similar to the following:

if [x for x in s.split(' ') if x != ''][0:2] == ['[', '{']:
    pass  # the first two non-blank elements are '[' and '{'
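Splitting on ' ' leaves tabs and newlines attached to the brackets (for example '\t['). Calling `str.split()` with no argument splits on any run of whitespace and never returns empty strings, so the filtering step is unnecessary. A sketch of that variation follows (the helper name is hypothetical). Like the original, it compares whole tokens, so it only matches when the two brackets are separated by whitespace and rejects an unspaced '[{...'.

```python
def first_tokens_are(s, expected=('[', '{')):
    # str.split() with no argument splits on runs of any whitespace
    # (spaces, tabs, newlines) and discards empty strings.
    tokens = s.split()[:len(expected)]
    return tokens == list(expected)
```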

-- 
Denis McMahon, denismfmcmahon@gmail.com



#101084

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2015-12-31 23:25 +0000
Message-ID: <mailman.126.1451604389.11925.python-list@python.org>
In reply to: #101070
On 31/12/2015 18:54, Karim wrote:
>
>
> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> '  [  {  ....'.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing it is straight forward:
>
>  >>> from pyparsing import Suppress, restOfLine
>
>  >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>
>  >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>
>  >>> print result.asList()
>
> ['.... I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this help see pyparsing doc for in depth study.
>
> Karim

Congratulations for writing up one of the most overengineered pile of 
cobblers I've ever seen.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#101091

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-01-01 12:12 +1100
Message-ID: <5685d289$0$1616$c3e8da3$5496439d@news.astraweb.com>
In reply to: #101084
On Fri, 1 Jan 2016 10:25 am, Mark Lawrence wrote:

> Congratulations for writing up one of the most overengineered pile of
> cobblers I've ever seen.

You should get out more.


-- 
Steven



#101092

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-01-01 12:23 +1100
Message-ID: <5685d504$0$1607$c3e8da3$5496439d@news.astraweb.com>
In reply to: #101070
On Fri, 1 Jan 2016 05:18 am, otaksoftspamtrap@gmail.com wrote:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
> 
> The string would ideally be: '[{...' but could also be something like
> '  [  {  ....'.
> 
> Best to use re and how? Something else?

This should work, and be very fast, for moderately-sized strings:


def starts_with_brackets(the_string):
    # Remove every space, then look at the first two remaining characters.
    the_string = the_string.replace(" ", "")
    return the_string.startswith("[{")


It might be a bit slow for huge strings (tens of millions of characters),
but for short strings it will be fine.
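For very long strings, one way to avoid building a modified copy of the whole string is to walk only the leading characters. This is a different technique from the replace-based one (the function name is hypothetical); it uses `str.isspace()`, so tabs and newlines are skipped as well as spaces.

```python
def starts_with_brackets_scan(the_string, expected='[{'):
    i = 0
    n = len(the_string)
    for ch in expected:
        # Skip any whitespace before the next expected character.
        while i < n and the_string[i].isspace():
            i += 1
        # Fail if the string ran out or the next character is the wrong one.
        if i >= n or the_string[i] != ch:
            return False
        i += 1
    return True
```

Only the characters up to the second bracket are examined, however long the rest of the string is.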

Alternatively, use a regex:


import re
regex = re.compile(r' *\[ *\{')

if regex.match(the_string):
    print("string starts with [{ as expected")
else:
    raise ValueError("invalid string")


This will probably be slower for small strings, but faster for HUGE strings
(tens of millions of characters). But I expect it will be fast enough.
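Which approach is faster depends on the Python version and the data, so the claim is worth measuring. A rough `timeit` sketch follows; the test strings, sizes, and repeat count are arbitrary assumptions, and both checks use the '[{' prefix and the spaces-only regex from above.

```python
import re
import timeit

regex = re.compile(r' *\[ *\{')

def by_replace(the_string):
    return the_string.replace(" ", "").startswith("[{")

def by_regex(the_string):
    return regex.match(the_string) is not None

def compare(sizes=(100, 10_000_000), number=10):
    # Returns {size: (replace_seconds, regex_seconds)} for `number` calls each.
    results = {}
    for size in sizes:
        s = '  [  {  ' + 'x' * size
        t_replace = timeit.timeit(lambda: by_replace(s), number=number)
        t_regex = timeit.timeit(lambda: by_regex(s), number=number)
        results[size] = (t_replace, t_regex)
    return results

if __name__ == "__main__":
    for size, (t_replace, t_regex) in compare().items():
        print(f"{size:>10} chars: replace {t_replace:.6f}s, regex {t_regex:.6f}s")
```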

It is simple enough to skip tabs as well as spaces. The easiest way is to match
on any whitespace:

regex = re.compile(r'\s*\[\s*\{')
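`\s` (whitespace) is the class needed here; `\w` matches word characters such as letters and digits. A sketch of the complete check built on the whitespace-tolerant pattern follows (the `check_brackets` name is hypothetical). `.match()` anchors at the start of the string, so no `^` is needed.

```python
import re

# \s* skips any whitespace (spaces, tabs, newlines) before each bracket.
LEADING_BRACKETS = re.compile(r'\s*\[\s*\{')

def check_brackets(the_string):
    # .match() only succeeds at the beginning of the string.
    if LEADING_BRACKETS.match(the_string) is None:
        raise ValueError("invalid string")
    return the_string
```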




-- 
Steven



#101095

From: Random832 <random832@fastmail.com>
Date: 2015-12-31 20:31 -0500
Message-ID: <mailman.132.1451611887.11925.python-list@python.org>
In reply to: #101070
otaksoftspamtrap@gmail.com writes:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like 
> '  [  {  ....'.
>
> Best to use re and how? Something else?

Is it an arbitrary string, or is it a JSON object consisting of a list
whose first element is a dictionary? Because if you're planning on
reading it as a JSON object later you could just validate the types
after you've parsed it.
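If the data really is JSON, that suggestion might be sketched with the standard `json` module as below. The function name and error message are hypothetical. `json.loads` accepts leading whitespace, and its `JSONDecodeError` is a subclass of `ValueError`, so one `except ValueError` covers both malformed and wrongly shaped input.

```python
import json

def load_list_of_objects(text):
    data = json.loads(text)  # raises json.JSONDecodeError on malformed input
    # Validate the shape after parsing instead of inspecting raw characters.
    if not isinstance(data, list) or not data or not isinstance(data[0], dict):
        raise ValueError("expected a JSON array whose first element is an object")
    return data
```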



#101100

From: Jussi Piitulainen <harvesting@is.invalid>
Date: 2016-01-01 10:16 +0200
Message-ID: <lf5mvsppvdu.fsf@ling.helsinki.fi>
In reply to: #101070
otaksoftspamtrap@gmail.com writes:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like 
> '  [  {  ....'.
>
> Best to use re and how? Something else?

No comment on whether re is right for your use case, but here is another
comment on how. First, some test data:

  >>> import re
  >>> data = '\r\n  {\r\n\t[ "etc" ]}\n\n\n'

Then the actual comment - there's a special regex class, \S, that matches any
non-whitespace character, and a function that produces matches on demand:

  >>> black = re.compile(r'\S')
  >>> matches = re.finditer(black, data)

Then the demonstration. This accesses the first, then second, then third
match:

  >>> empty = re.match('', '')
  >>> next(matches, empty).group()
  '{'
  >>> next(matches, empty).group()
  '['
  >>> next(matches, empty).group()
  '"'

The empty match object provides an appropriate .group() when there is no
first or second (and so on) non-whitespace character in the data:

  >>> matches = re.finditer(black, '\r\t\n')
  >>> next(matches, empty).group()
  ''
  >>> next(matches, empty).group()
  ''
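Packaged as a check for the original question, the same finditer-plus-default idea might look like this sketch (the `starts_with` name and its `expected` parameter are hypothetical). It pulls one match per expected character and stops at the first mismatch.

```python
import re

BLACK = re.compile(r'\S')
EMPTY = re.match('', '')  # a match object whose .group() is ''

def starts_with(data, expected='[{'):
    matches = BLACK.finditer(data)
    # Compare one match per expected character; EMPTY stands in once the
    # data has no more non-whitespace characters, so the comparison fails.
    return all(next(matches, EMPTY).group() == ch for ch in expected)
```

With the test data above, `starts_with(data)` is False because the first non-whitespace character is '{', not '['.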



#101101

From: Karim <kliateni@gmail.com>
Date: 2016-01-01 10:43 +0100
Message-ID: <mailman.136.1451641422.11925.python-list@python.org>
In reply to: #101070

On 01/01/2016 00:25, Mark Lawrence wrote:
> On 31/12/2015 18:54, Karim wrote:
>>
>>
>> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>>> I need to check a string over which I have no control for the first 2
>>> non-white space characters (which should be '[{').
>>>
>>> The string would ideally be: '[{...' but could also be something like
>>> '  [  {  ....'.
>>>
>>> Best to use re and how? Something else?
>>
>> Use pyparsing it is straight forward:
>>
>>  >>> from pyparsing import Suppress, restOfLine
>>
>>  >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>>
>>  >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>>
>>  >>> print result.asList()
>>
>> ['.... I am learning pyparsing']
>>
>> You'll get your string inside the list.
>>
>> Hope this help see pyparsing doc for in depth study.
>>
>> Karim
>
> Congratulations for writing up one of the most overengineered pile of 
> cobblers I've ever seen.
>

You're welcome!

The intent was to give a simple introduction to pyparsing, which is a
powerful tool for building more complex parsers.

Karim




