| Started by | otaksoftspamtrap@gmail.com |
|---|---|
| First post | 2015-12-31 10:18 -0800 |
| Last post | 2016-01-01 10:43 +0100 |
| Articles | 13 — 10 participants |
Newbie: Check first two non-whitespace characters otaksoftspamtrap@gmail.com - 2015-12-31 10:18 -0800
Re: Newbie: Check first two non-whitespace characters MRAB <python@mrabarnett.plus.com> - 2015-12-31 18:38 +0000
Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2015-12-31 19:54 +0100
Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2015-12-31 20:05 +0100
Re: Newbie: Check first two non-whitespace characters cassius.fechter@gmail.com - 2015-12-31 11:21 -0800
Re: Newbie: Check first two non-whitespace characters Cory Madden <csmadden@gmail.com> - 2015-12-31 10:56 -0800
Re: Newbie: Check first two non-whitespace characters Denis McMahon <denismfmcmahon@gmail.com> - 2015-12-31 20:35 +0000
Re: Newbie: Check first two non-whitespace characters Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-31 23:25 +0000
Re: Newbie: Check first two non-whitespace characters Steven D'Aprano <steve@pearwood.info> - 2016-01-01 12:12 +1100
Re: Newbie: Check first two non-whitespace characters Steven D'Aprano <steve@pearwood.info> - 2016-01-01 12:23 +1100
Re: Newbie: Check first two non-whitespace characters Random832 <random832@fastmail.com> - 2015-12-31 20:31 -0500
Re: Newbie: Check first two non-whitespace characters Jussi Piitulainen <harvesting@is.invalid> - 2016-01-01 10:16 +0200
Re: Newbie: Check first two non-whitespace characters Karim <kliateni@gmail.com> - 2016-01-01 10:43 +0100
| From | otaksoftspamtrap@gmail.com |
|---|---|
| Date | 2015-12-31 10:18 -0800 |
| Subject | Newbie: Check first two non-whitespace characters |
| Message-ID | <240ab049-68a0-4a00-b911-d58cae9bfbcf@googlegroups.com> |
I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
The string would ideally be: '[{...' but could also be something like
' [ { ....'.
Best to use re and how? Something else?
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2015-12-31 18:38 +0000 |
| Message-ID | <mailman.115.1451587270.11925.python-list@python.org> |
| In reply to | #101070 |
On 2015-12-31 18:18, otaksoftspamtrap@gmail.com wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { ....'.
>
> Best to use re and how? Something else?
>
I would use .split and then ''.join:
>>> ''.join(' [ { ....'.split())
'[{....'
It might be faster if you provide a maximum for the number of splits:
>>> ''.join(' [ { ....'.split(None, 1))
'[{ ....'
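A minimal sketch of how the split-and-join idea above could be wrapped into a check. The function name `starts_with_open_brackets` is made up for illustration and does not come from the post.

```python
def starts_with_open_brackets(text):
    """Return True if the first two non-whitespace characters are '[{'."""
    # split(None, 1) drops leading whitespace and splits once on the first
    # run of whitespace, so at most two pieces are built.
    compact = ''.join(text.split(None, 1))
    return compact.startswith('[{')


print(starts_with_open_brackets(' [ { ....'))  # True
print(starts_with_open_brackets('[{...'))      # True
print(starts_with_open_brackets(' ( { ....'))  # False
```

One split is enough here: the first piece is either '[' alone or already begins with '[{', and the remainder has its leading whitespace stripped.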
| From | Karim <kliateni@gmail.com> |
|---|---|
| Date | 2015-12-31 19:54 +0100 |
| Message-ID | <mailman.116.1451588066.11925.python-list@python.org> |
| In reply to | #101070 |
On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { ....'.
>
> Best to use re and how? Something else?
Use pyparsing; it is straightforward:
>>> from pyparsing import Suppress, restOfLine
>>> mystring = Suppress('[') + Suppress('{') + restOfLine
>>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>>> print result.asList()
['.... I am learning pyparsing']
You'll get your string inside the list.
Hope this helps; see the pyparsing docs for an in-depth study.
Karim
| From | Karim <kliateni@gmail.com> |
|---|---|
| Date | 2015-12-31 20:05 +0100 |
| Message-ID | <mailman.119.1451588711.11925.python-list@python.org> |
| In reply to | #101070 |
On 31/12/2015 19:54, Karim wrote:
>
>
> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> ' [ { ....'.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing it is straight forward:
>
> >>> from pyparsing import Suppress, restOfLine
>
> >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>
> >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>
> >>> print result.asList()
>
> ['.... I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this help see pyparsing doc for in depth study.
>
> Karim
Sorry, the method to parse a string is parseString, not parse. Please
replace it with this line:
>>> result = mystring.parseString(' [ { .... I am learning pyparsing' )
Regards
| From | cassius.fechter@gmail.com |
|---|---|
| Date | 2015-12-31 11:21 -0800 |
| Message-ID | <3f355b97-3a94-41af-8624-9d244eb555a3@googlegroups.com> |
| In reply to | #101075 |
Thanks much to both of you!
On Thursday, December 31, 2015 at 11:05:26 AM UTC-8, Karim wrote:
> On 31/12/2015 19:54, Karim wrote:
> >
> >
> > On 31/12/2015 19:18, snailpail@gmail.com wrote:
> >> I need to check a string over which I have no control for the first 2
> >> non-white space characters (which should be '[{').
> >>
> >> The string would ideally be: '[{...' but could also be something like
> >> ' [ { ....'.
> >>
> >> Best to use re and how? Something else?
> >
> > Use pyparsing it is straight forward:
> >
> > >>> from pyparsing import Suppress, restOfLine
> >
> > >>> mystring = Suppress('[') + Suppress('{') + restOfLine
> >
> > >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
> >
> > >>> print result.asList()
> >
> > ['.... I am learning pyparsing']
> >
> > You'll get your string inside the list.
> >
> > Hope this help see pyparsing doc for in depth study.
> >
> > Karim
>
> Sorry the method to parse a string is parseString not parse, please
> replace by this line:
>
> >>> result = mystring.parseString(' [ { .... I am learning pyparsing' )
>
> Regards
| From | Cory Madden <csmadden@gmail.com> |
|---|---|
| Date | 2015-12-31 10:56 -0800 |
| Message-ID | <mailman.120.1451590313.11925.python-list@python.org> |
| In reply to | #101070 |
I would personally use re here.
import re
test_string = ' [{blah blah blah'
# Collect every non-whitespace character, then keep the first two.
matches = re.findall(r'\S', test_string)
result = ''.join(matches)[:2]
# result == '[{'
On Thu, Dec 31, 2015 at 10:18 AM, <otaksoftspamtrap@gmail.com> wrote:
> I need to check a string over which I have no control for the first 2 non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { ....'.
>
> Best to use re and how? Something else?
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-12-31 20:35 +0000 |
| Message-ID | <n643hu$g7k$1@dont-email.me> |
| In reply to | #101070 |
On Thu, 31 Dec 2015 10:18:52 -0800, otaksoftspamtrap wrote:
> Best to use re and how? Something else?
Split the string on the space character and check the first two non blank
elements of the resulting list?
Maybe something similar to the following:
if [x for x in s.split(' ') if x != ''][0:3] == ['(', '(', '(']:
    # string starts '((('
    pass
--
Denis McMahon, denismfmcmahon@gmail.com
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-12-31 23:25 +0000 |
| Message-ID | <mailman.126.1451604389.11925.python-list@python.org> |
| In reply to | #101070 |
On 31/12/2015 18:54, Karim wrote:
>
>
> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>> I need to check a string over which I have no control for the first 2
>> non-white space characters (which should be '[{').
>>
>> The string would ideally be: '[{...' but could also be something like
>> ' [ { ....'.
>>
>> Best to use re and how? Something else?
>
> Use pyparsing it is straight forward:
>
> >>> from pyparsing import Suppress, restOfLine
>
> >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>
> >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>
> >>> print result.asList()
>
> ['.... I am learning pyparsing']
>
> You'll get your string inside the list.
>
> Hope this help see pyparsing doc for in depth study.
>
> Karim
Congratulations for writing up one of the most overengineered pile of
cobblers I've ever seen.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-01-01 12:12 +1100 |
| Message-ID | <5685d289$0$1616$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #101084 |
On Fri, 1 Jan 2016 10:25 am, Mark Lawrence wrote:
> Congratulations for writing up one of the most overengineered pile of
> cobblers I've ever seen.
You should get out more.
--
Steven
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-01-01 12:23 +1100 |
| Message-ID | <5685d504$0$1607$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #101070 |
On Fri, 1 Jan 2016 05:18 am, otaksoftspamtrap@gmail.com wrote:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { ....'.
>
> Best to use re and how? Something else?
This should work, and be very fast, for moderately-sized strings:
def starts_with_brackets(the_string):
    # Drop every space, then test the start of what remains.
    the_string = the_string.replace(" ", "")
    return the_string.startswith("[{")
It might be a bit slow for huge strings (tens of millions of characters),
but for short strings it will be fine.
Alternatively, use a regex:
import re
regex = re.compile(r' *\[ *\{')
if regex.match(the_string):
    print("string starts with [{ as expected")
else:
    raise ValueError("invalid string")
This will probably be slower for small strings, but faster for HUGE strings
(tens of millions of characters). But I expect it will be fast enough.
It is simple enough to skip tabs as well as spaces. Easiest way is to match
on any whitespace with \s:
regex = re.compile(r'\s*\[\s*\{')
--
Steven
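A small sketch of the regex approach using `\s`, the class that matches any whitespace character. The names `OPENING` and `check_opening` are illustrative only, not from the post.

```python
import re

# \s matches any whitespace character (space, tab, newline, ...).
OPENING = re.compile(r'\s*\[\s*\{')


def check_opening(the_string):
    """Return True if the string opens with '[' then '{'; raise ValueError otherwise."""
    # match() anchors at the start of the string, so only the prefix is scanned.
    if OPENING.match(the_string) is None:
        raise ValueError("invalid string")
    return True


print(check_opening(' [ { ....'))       # True
print(check_opening('\t[\n{"a": 1}]'))  # True
```

Because the pattern is anchored at the start and stops after the brace, the rest of a very long string is never examined.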
| From | Random832 <random832@fastmail.com> |
|---|---|
| Date | 2015-12-31 20:31 -0500 |
| Message-ID | <mailman.132.1451611887.11925.python-list@python.org> |
| In reply to | #101070 |
otaksoftspamtrap@gmail.com writes:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { ....'.
>
> Best to use re and how? Something else?
Is it an arbitrary string, or is it a JSON object consisting of a list
whose first element is a dictionary? Because if you're planning on
reading it as a JSON object later you could just validate the types
after you've parsed it.
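If the input really is meant to be JSON, the suggestion above might look roughly like this. It is a sketch under that assumption, and the function name `is_list_starting_with_dict` is invented for the example.

```python
import json


def is_list_starting_with_dict(text):
    """Return True if text is a JSON array whose first element is an object."""
    try:
        value = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    # An empty list has no first element, so it does not qualify.
    return isinstance(value, list) and len(value) > 0 and isinstance(value[0], dict)


print(is_list_starting_with_dict(' [ {"a": 1} ]'))  # True
print(is_list_starting_with_dict('[1, 2]'))         # False
```

Unlike the prefix checks, this validates the whole document, so a truncated string such as '[{' is rejected.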
| From | Jussi Piitulainen <harvesting@is.invalid> |
|---|---|
| Date | 2016-01-01 10:16 +0200 |
| Message-ID | <lf5mvsppvdu.fsf@ling.helsinki.fi> |
| In reply to | #101070 |
otaksoftspamtrap@gmail.com writes:
> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like
> ' [ { ....'.
>
> Best to use re and how? Something else?
No comment on whether re is good for your use case but another comment
on how. First, some test data:
>>> import re
>>> data = '\r\n {\r\n\t[ "etc" ]}\n\n\n'
Then the actual comment - there's a special regex type, \S, to match a
non-whitespace character, and a method to produce matches on demand:
>>> black = re.compile(r'\S')
>>> matches = re.finditer(black, data)
Then the demonstration. This accesses the first, then second, then third
match:
>>> empty = re.match('', '')
>>> next(matches, empty).group()
'{'
>>> next(matches, empty).group()
'['
>>> next(matches, empty).group()
'"'
The empty match object provides an appropriate .group() when there is no
first or second (and so on) non-whitespace character in the data:
>>> matches = re.finditer(black, '\r\t\n')
>>> next(matches, empty).group()
''
>>> next(matches, empty).group()
''
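The on-demand matching shown above could be packaged as a helper along these lines. This is only a sketch: `first_non_whitespace` and `NON_WHITESPACE` are hypothetical names, and `itertools.islice` is swapped in for the repeated `next()` calls.

```python
import re
from itertools import islice

NON_WHITESPACE = re.compile(r'\S')


def first_non_whitespace(text, count=2):
    """Return the first `count` non-whitespace characters of text as one string."""
    # finditer is lazy, so scanning stops once `count` matches have been taken.
    matches = NON_WHITESPACE.finditer(text)
    return ''.join(m.group() for m in islice(matches, count))


print(first_non_whitespace(' [ { ....') == '[{')  # True
print(repr(first_non_whitespace('\r\t\n')))       # ''
```

When the data holds fewer than `count` non-whitespace characters, the result is simply shorter, so a comparison against '[{' fails without a special case.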
| From | Karim <kliateni@gmail.com> |
|---|---|
| Date | 2016-01-01 10:43 +0100 |
| Message-ID | <mailman.136.1451641422.11925.python-list@python.org> |
| In reply to | #101070 |
On 01/01/2016 00:25, Mark Lawrence wrote:
> On 31/12/2015 18:54, Karim wrote:
>>
>>
>> On 31/12/2015 19:18, otaksoftspamtrap@gmail.com wrote:
>>> I need to check a string over which I have no control for the first 2
>>> non-white space characters (which should be '[{').
>>>
>>> The string would ideally be: '[{...' but could also be something like
>>> ' [ { ....'.
>>>
>>> Best to use re and how? Something else?
>>
>> Use pyparsing it is straight forward:
>>
>> >>> from pyparsing import Suppress, restOfLine
>>
>> >>> mystring = Suppress('[') + Suppress('{') + restOfLine
>>
>> >>> result = mystring.parse(' [ { .... I am learning pyparsing' )
>>
>> >>> print result.asList()
>>
>> ['.... I am learning pyparsing']
>>
>> You'll get your string inside the list.
>>
>> Hope this help see pyparsing doc for in depth study.
>>
>> Karim
>
> Congratulations for writing up one of the most overengineered pile of
> cobblers I've ever seen.
>
You're welcome!
The intent was to give a simple introduction to pyparsing, which is a
powerful tool for building more complex parsers.
Karim