
a little parsing challenge ☺

Started by	Xah Lee <xahlee@gmail.com>
First post	2011-07-17 00:47 -0700
Last post	2011-07-19 22:43 -0700
Articles	20 on this page of 72 — 28 participants


  a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-17 00:47 -0700
    Re: a little parsing challenge ☺ Raymond Hettinger <python@rcn.com> - 2011-07-17 02:48 -0700
      Re: a little parsing challenge ☺ Robert Klemme <shortcutter@googlemail.com> - 2011-07-17 15:20 +0200
        Re: a little parsing challenge ☺ mhenn <michihenn@hotmail.com> - 2011-07-17 15:55 +0200
          Re: a little parsing challenge ☺ Robert Klemme <shortcutter@googlemail.com> - 2011-07-17 18:01 +0200
            Re: a little parsing challenge ☺ Robert Klemme <shortcutter@googlemail.com> - 2011-07-17 18:54 +0200
      Re: a little parsing challenge ☺ Thomas Boell <tboell@domain.invalid> - 2011-07-17 17:49 +0200
        Re: a little parsing challenge ☺ Raymond Hettinger <python@rcn.com> - 2011-07-17 12:16 -0700
      Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-18 07:39 -0700
        Re: a little parsing challenge ☺ Robert Klemme <shortcutter@googlemail.com> - 2011-07-20 08:23 +0200
        Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-20 03:31 -0700
          Re: a little parsing challenge ☺ "Uri Guttman" <uri@StemSystems.com> - 2011-07-20 12:31 -0400
            Re: a little parsing challenge ☺ rusi <rustompmody@gmail.com> - 2011-07-20 10:30 -0700
            Re: a little parsing challenge ☺ merlyn@stonehenge.com (Randal L. Schwartz) - 2011-07-20 12:06 -0700
              Re: a little parsing challenge ☺ Jason Earl <jearl@notengoamigos.org> - 2011-07-20 14:57 -0600
      Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 09:54 -0700
        Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-19 20:07 +0200
          Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 05:58 -0700
            Re: a little parsing challenge ☺ Ian Kelly <ian.g.kelly@gmail.com> - 2011-07-21 08:26 -0600
              Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 08:36 -0700
                Re: a little parsing challenge ☺ python@bdurham.com - 2011-07-21 12:43 -0400
                  Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 11:53 -0700
                    Re: a little parsing challenge ☺ Terry Reedy <tjreedy@udel.edu> - 2011-07-21 18:37 -0400
            Re: a little parsing challenge ☺ John O'Hagan <research@johnohagan.com> - 2011-07-25 15:57 +1000
        Re: a little parsing challenge ☺ Ian Kelly <ian.g.kelly@gmail.com> - 2011-07-19 12:08 -0600
    Re: a little parsing challenge ☺ Chris Angelico <rosuav@gmail.com> - 2011-07-17 21:34 +1000
      Re: a little parsing challenge ☺ rusi <rustompmody@gmail.com> - 2011-07-17 04:52 -0700
      Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-17 16:15 +0200
        Re: a little parsing challenge ☺ Raymond Hettinger <python@rcn.com> - 2011-07-17 12:18 -0700
          Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-17 22:16 +0200
            Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-17 22:57 +0200
        Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-17 23:43 +0200
        Re: a little parsing challenge ☺ Rouslan Korneychuk <rouslank@msn.com> - 2011-07-18 03:09 -0400
          Re: a little parsing challenge ☺ Stefan Behnel <stefan_ml@behnel.de> - 2011-07-18 09:24 +0200
            Re: a little parsing challenge ☺ Rouslan Korneychuk <rouslank@msn.com> - 2011-07-18 04:04 -0400
          Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-18 18:46 +0200
            Re: a little parsing challenge ☺ Rouslan Korneychuk <rouslank@msn.com> - 2011-07-18 14:14 -0400
          Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 06:23 -0700
            Re: a little parsing challenge ☺ Rouslan Korneychuk <rouslank@msn.com> - 2011-07-21 17:54 -0400
    Re: a little parsing challenge ☺ gene heskett <gheskett@wdtv.com> - 2011-07-17 10:26 -0400
    Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-17 08:31 -0700
      Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 10:49 -0700
        Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-19 20:14 +0200
          Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 05:29 -0700
            Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-21 15:21 +0200
        Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-19 20:17 +0200
    Re: a little parsing challenge ☺ rantingrick <rantingrick@gmail.com> - 2011-07-17 18:52 -0700
    Re: a little parsing challenge ☺ Billy Mays <81282ed9a88799d21e77957df2d84bd6514d9af6@myhashismyemail.com> - 2011-07-18 13:12 -0400
      Re: a little parsing challenge ☺ Ian Kelly <ian.g.kelly@gmail.com> - 2011-07-18 12:10 -0600
        Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-18 23:59 +0200
          Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-19 08:09 +0200
          Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 10:32 -0700
      Re: a little parsing challenge ☺ Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-19 09:56 +1000
        Re: a little parsing challenge ☺ Billy Mays <noway@nohow.com> - 2011-07-18 22:07 -0400
          Re: a little parsing challenge ☺ rusi <rustompmody@gmail.com> - 2011-07-18 19:50 -0700
            Re: a little parsing challenge ☺ Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-19 13:11 +1000
              Re: a little parsing challenge ☺ rusi <rustompmody@gmail.com> - 2011-07-18 21:59 -0700
                Re: a little parsing challenge ☺ Chris Angelico <rosuav@gmail.com> - 2011-07-19 15:36 +1000
          Re: a little parsing challenge ☺ MRAB <python@mrabarnett.plus.com> - 2011-07-19 04:08 +0100
          Re: a little parsing challenge ☺ Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-07-18 20:54 -0700
          Re: a little parsing challenge ☺ Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-19 14:30 +1000
          Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 01:58 -0700
      Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 10:14 -0700
        Re: a little parsing challenge ☺ Billy Mays <81282ed9a88799d21e77957df2d84bd6514d9af6@myhashismyemail.com> - 2011-07-19 13:33 -0400
          Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 11:12 -0700
            Re: a little parsing challenge ☺ Terry Reedy <tjreedy@udel.edu> - 2011-07-19 15:09 -0400
              Re: a little parsing challenge ☺ jmfauth <wxjmfauth@gmail.com> - 2011-07-19 23:29 -0700
                Re: a little parsing challenge ☺ Ian Kelly <ian.g.kelly@gmail.com> - 2011-07-20 01:29 -0600
                  Re: a little parsing challenge ☺ jmfauth <wxjmfauth@gmail.com> - 2011-07-20 00:54 -0700
                    Re: a little parsing challenge ☺ Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-20 18:18 +1000
    Re: a little parsing challenge ? sln@netherlands.com - 2011-07-18 12:34 -0700
    Re: a little parsing challenge ☺ Mark Tarver <dr.mtarver@gmail.com> - 2011-07-19 22:43 -0700


#10029

From	python@bdurham.com
Date	2011-07-21 12:43 -0400
Message-ID	<mailman.1322.1311267083.1164.python-list@python.org>
In reply to	#10028

Xah,

1. Is the following string considered legal?

[ { ( ] ) }

Note: Each type of brace opens and closes in the proper sequence. But
inter-brace opening and closing does not make sense.

Or must a closing brace always balance out with the most recent opening
brace like so?

[ { ( ) } ]

2. If there are multiple unclosed braces at EOF, is the answer you're
looking for the position of the first open brace that hasn't been closed
out yet?

Malcolm
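The two readings in question 1 can be contrasted with a small sketch. It assumes only three bracket pairs, and the function names are hypothetical; counting each bracket type on its own accepts the interleaved string, while a stack (closer must match the most recent opener) rejects it.

```python
PAIRS = {"[": "]", "{": "}", "(": ")"}


def balanced_by_count(text):
    """Check each bracket type separately; nesting across types is ignored."""
    for opener, closer in PAIRS.items():
        depth = 0
        for ch in text:
            if ch == opener:
                depth += 1
            elif ch == closer:
                depth -= 1
                if depth < 0:      # closer seen before its opener
                    return False
        if depth != 0:             # opener never closed
            return False
    return True


def balanced_by_stack(text):
    """Require every closer to match the most recently opened bracket."""
    closers = set(PAIRS.values())
    stack = []                     # closers we are still waiting for
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
    return not stack
```

Under these assumptions, "[ { ( ] ) }" passes the per-type count but fails the stack check, and "[ { ( ) } ]" passes both.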


#10044

From	Xah Lee <xahlee@gmail.com>
Date	2011-07-21 11:53 -0700
Message-ID	<262ea0fe-1152-42aa-9f5b-93aa76ed6c25@q29g2000prj.googlegroups.com>
In reply to	#10029

On Jul 21, 9:43 am, pyt...@bdurham.com wrote:
> Xah,
>
> 1. Is the following string considered legal?
>
> [ { ( ] ) }
>
> Note: Each type of brace opens and closes in the proper sequence. But
> inter-brace opening and closing does not make sense.

nu!

> Or must a closing brace always balance out with the most recent opening
> brace like so?
>
> [ { ( ) } ]

yeah!

> 2. If there are multiple unclosed braces at EOF, is the answer you're
> looking for the position of the first open brace that hasn't been closed
> out yet?

well, as many pointed out, i really haven't thought it out well.

originally, i just wanted to know the position of an unmatched char.

i haven't taken the time to think about what really should be the
desired behavior. For me, the problem started because i wanted to use
the script to check my 5k html files, in particular, classic novels
that involve double curly quotes and french quotes. So, the desired
behavior is one based on the question of what would be best for the user
to see in order to correct a bracket mismatch error in a file. (which
can get quite complex for nested syntax, because, usually, once you
have one missed, it's all hell from there. I think this is similar to
the problem when a compiler/interpreter encounters bad syntax in
source code, and thus the popular situation where error messages of
computer programs are hard to understand...)

but anyway, just for this exercise, the requirement needn't be
stringent. I still think that at least the reported position should be
a matching char in the file. (and if we presume this, then only my
code works. LOL)

PS this is a warmup problem for writing a HTML tag validator. I looked
high and low in past years, but just couldn't find a script that does
simple validation in batch. The w3c one is based on SGML, with a really huge
amount of un-understandable irregular historical baggage. XML lexical
validator is much closer, but still not regular. I simply wanted one
just like the match-pair validator in our problem, except the opening
char is not a single char but a string of the form <xyz …> and the
*matching* closing one is of the form </xyz>, and with just one
exception: when a tag ends in “/>” such as <br/> then it is
skipped (i.e. not considered as opening or closing).
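A minimal sketch of such a tag matcher, under stated assumptions: the `TAG` pattern and the function name are hypothetical, comments, CDATA and a ">" inside attribute values are not handled, and the only exception is the “/>” rule described above (so a bare <br> would count as an opener).

```python
import re

# Matches <name ...>, </name> and <name .../>. A deliberate simplification:
# comments, CDATA and '>' inside attribute values are not handled.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^<>]*?(/?)>")


def first_tag_mismatch(html):
    """Return (position, tag_text) of the first unmatched tag, or None."""
    stack = []  # (position, name, tag_text) of tags still open
    for m in TAG.finditer(html):
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if self_closing:
            continue                      # <br/> and friends are skipped
        if not closing:
            stack.append((m.start(), name, m.group(0)))
        elif stack and stack[-1][1] == name:
            stack.pop()                   # closes the most recent opener
        else:
            return m.start(), m.group(0)  # closer with no matching opener
    if stack:
        pos, _name, text = stack[0]       # earliest tag left open at EOF
        return pos, text
    return None
```

Reporting the earliest tag still open at end of file is one possible choice; reporting the innermost one (`stack[-1]`) would be just as easy.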

I'll be writing this soon in elisp… since i haven't studied parsers, i
had hopes that parser expert would show some proper parser solutions…
in particular i think such can be expressed in Parsing Expression
Grammar in just a few lines… but so far no deity came forward to show
the light. lol

getting ranty… it's funny, somehow the tech geekers all want regex to
solve the problem. Regex, regex, regex, a 40-year-old deviant bastard
that by some twist of luck became a tool for matching text patterns.
One bloke came forward to show off a perl regex obfuscation. That's
like, lol. But it might be good for the lulz if his code is actually
complete and works. Then, you have a few who'd nonchalantly remark
“O, you just need push-down automata”. LOL, unless they show actual
working code, it's Automata their asses.

folks, don't get angry with me. I'm a learner. I'm curious. I always
am eager to learn. And there's always things we can learn. Don't get
into a fit and do the troll dance in a pit with me. Nobody's gonna
give a shit if you think u knew it all. If u are not the master of one
thousand and one languages yet, you can learn with me. ☺ troll!!!!

 Xah


#10051

From	Terry Reedy <tjreedy@udel.edu>
Date	2011-07-21 18:37 -0400
Message-ID	<mailman.1335.1311287848.1164.python-list@python.org>
In reply to	#10044

On 7/21/2011 2:53 PM, Xah Lee wrote:

> had hopes that parser expert would show some proper parser solutions…
> in particular i think such can be expressed in Parsing Expression
> Grammar in just a few lines… but so far no deity came forward to show
> the light. lol

I am not a parser expert but 20 years ago, I wrote a program in C to
analyze C programs for proper fence matching. My motivation was the
frequent obscurity of parser error messages derived from mismatched fences.
I just found the printed copy and an article I wrote that did not get
published.

Balance.c matches tokens, not characters (and hence can deal with /* and 
*/). It properly takes into account allowed nestings. For C, {[]} is 
legal, [{}] is not. Ditto for () instead of []. Nothing nests within '', 
"", and /* */. (I know some C compilers do nest /* */, but not the ones 
I used).

I initially started with a recursive descent parser but this 1)
hard-coded the rules for one language and made changes difficult and 2)
made the low-level parsing difficult. So I switched to a table-driven
recursive state/action machine. The tables for your challenge would be
much simpler as you did not specify any nesting rules, although they
would be needed for html checking.

A key point that simplifies things a bit is that every file is 
surrounded by an unwritten BOF-EOF pair. So the machine starts with 
having 'seen' BOF and is 'looking' for EOF. So it is always looking to 
match *something*.
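The BOF-EOF idea can be sketched in Python as follows. This is not Balance.c, only an illustration under assumptions: single characters stand in for tokens, there are no nesting rules, and the names `first_mismatch`, "BOF" and "EOF" are hypothetical. Because the stack starts with BOF on it, a stray closer and an unclosed opener are both reported by the same comparison, with no special case for an empty stack.

```python
from itertools import chain

PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())
EOF = "EOF"  # hypothetical sentinel; never equal to a single character


def first_mismatch(text):
    """Return None if balanced, else (open_pos, open_tok, close_pos, close_tok)."""
    # Start as if BOF had already been seen, so we are always looking
    # for *something*: stack entries are (position, opener, expected closer).
    stack = [(-1, "BOF", EOF)]
    # Feed every character, then one final EOF token
    tokens = chain(enumerate(text), [(len(text), EOF)])
    for pos, tok in tokens:
        if tok in PAIRS:
            stack.append((pos, tok, PAIRS[tok]))
        elif tok in CLOSERS or tok == EOF:
            open_pos, open_tok, expected = stack.pop()
            if tok != expected:
                return open_pos, open_tok, pos, tok
    return None
```

With this sketch, an unclosed "(" is reported as a "(" that met EOF, and a stray ")" as a BOF that met ")".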

The total program is nearly four pages, but one page is mostly 
declarations and command-line processing, another two pages have 
typedefs, #defines, and tables. The actual main loop is about 25 lines, 
and 10 lines of that is error reporting. The output is lines with file 
name, row and columns of the two tokens matched (optional) or 
mismatched, and what the two tokens are.

Since this program would be a useful example for my book, both 
didactically and practically, I will try to brush-up a bit on C and 
translate it to Python. I will use the re module for some of the 
low-level token parsing, like C multibyte characters. I will then change 
to tables for Python and perhaps for your challenge.

The current program assumes ascii byte input as it uses an array of 
length 128 to classify ascii chars into 14 classes: 13 special for the 
matching and 1 'normal' class for everything else. This could be 
replaced in Python with a dict 'special' that only maps special 
characters to their token class and used as "special.get(char, NORMAL)" 
so that the thousands of normal characters are mapped by default to 
NORMAL without a humongous array.
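A small sketch of that dict-based classification. The four class names below are hypothetical stand-ins (the C program is described as having 14 classes), and the bracket set is only an example.

```python
# Hypothetical token classes; the original program has 13 special ones.
NORMAL, OPEN, CLOSE, QUOTE = range(4)

# Only the special characters get an entry.
special = {}
for ch in "([{“«":
    special[ch] = OPEN
for ch in ")]}”»":
    special[ch] = CLOSE
for ch in "'\"":
    special[ch] = QUOTE


def classify(text):
    """Map each character to its class; anything unlisted is NORMAL."""
    return [special.get(ch, NORMAL) for ch in text]
```

Any Unicode character that is not a key of `special`, such as "é", falls through to NORMAL without needing a table entry.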

-- 
Terry Jan Reedy


#10244

From	John O'Hagan <research@johnohagan.com>
Date	2011-07-25 15:57 +1000
Message-ID	<mailman.1444.1311573438.1164.python-list@python.org>
In reply to	#10021

On Thu, 21 Jul 2011 05:58:48 -0700 (PDT)
Xah Lee <xahlee@gmail.com> wrote:

[...]

> > > On Sunday, July 17, 2011 2:48:42 AM UTC-7, Raymond Hettinger wrote:
> > >> On Jul 17, 12:47 am, Xah Lee <xah...@gmail.com> wrote:
> > >>> i hope you'll participate. Just post solution here. Thanks.
> >
> > >>http://pastebin.com/7hU20NNL
> >
> > > just installed py3.
> > > there seems to be a bug.
> > > in this file
> >
> > >http://xahlee.org/p/time_machine/tm-ch04.html
> >
> > > there's a mismatched double curly quote. at position 28319.
> >
> > > the python code above doesn't seem to spot it?

[...]

> >
> > That script doesn't check that the balance is zero at the end of file.
> >
> > Patch:
> >
> > --- ../xah-raymond-old.py       2011-07-19 20:05:13.000000000 +0200
> > +++ ../xah-raymond.py   2011-07-19 20:03:14.000000000 +0200
> > @@ -16,6 +16,8 @@
> >          elif c in closers:
> >              if not stack or c != stack.pop():
> >                  return i
> > +    if stack:
> > +        return i
> >      return -1
> >
> >  def scan(directory, encoding='utf-8'):
> 
> Thanks a lot for the fix Raymond.
> 
> Though, the code seems to have a minor problem.
> It works, but the report is wrong.
> e.g. output:
> 
> 30068: c:/Users/h3/web/xahlee_org/p/time_machine\tm-ch04.html
> 
> that 30068 position is the last char in the file.
> The correct should be 28319. (or at least point somewhere in the file
> at a bracket char that doesn't match.)
> 

[...]

If you want to know where brackets were opened which remain unclosed at EOF, then you have to keep the indices as well as the characters in the stack, and not return until the scan is complete, because anything still in the stack might turn out to be the earliest error. Easy enough to implement:

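def checkmatch(string):  # skipping the file handling
    """Return the 1-based position of the earliest bracket error, or None."""
    openers = {'[': ']', '(': ')', '{': '}'}  # etc.
    closers = set(openers.values())
    still_open, close_errors = [], []  # lists of (index, char) pairs
    for index, char in enumerate(string, start=1):
        if char in openers:
            still_open.append((index, char))
        elif char in closers:
            # A closer is valid only if it matches the most recent opener
            if still_open and char == openers[still_open[-1][1]]:
                still_open.pop()
            else:
                close_errors.append((index, char))
    if still_open or close_errors:
        # Earliest of: first opener left unclosed, first bad closer
        return min(still_open[:1] + close_errors[:1])[0]
    return None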


although you might as well return still_open + close_errors and see them all.
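A sketch of that variant, with the hypothetical name `all_mismatches`: it runs the same scan with the same three bracket pairs and 1-based positions, but returns every unclosed opener and every bad closer, sorted by position.

```python
def all_mismatches(string):
    """Return a sorted list of (position, char) for every bracket error."""
    openers = {'[': ']', '(': ')', '{': '}'}
    closers = set(openers.values())
    still_open, close_errors = [], []
    for index, char in enumerate(string, start=1):
        if char in openers:
            still_open.append((index, char))
        elif char in closers:
            if still_open and char == openers[still_open[-1][1]]:
                still_open.pop()
            else:
                close_errors.append((index, char))
    # Whatever is still on the stack at EOF was never closed
    return sorted(still_open + close_errors)
```

For the interleaved string "[ { ( ] ) }" this reports the "]" at position 7 and the "[" at position 1 that it consequently leaves unclosed.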

Regards,

John


#9905

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2011-07-19 12:08 -0600
Message-ID	<mailman.1267.1311098919.1164.python-list@python.org>
In reply to	#9893

On Tue, Jul 19, 2011 at 10:54 AM, Xah Lee <xahlee@gmail.com> wrote:
> On Sunday, July 17, 2011 2:48:42 AM UTC-7, Raymond Hettinger wrote:
>> On Jul 17, 12:47 am, Xah Lee <xah...@gmail.com> wrote:
>> > i hope you'll participate. Just post solution here. Thanks.
>>
>> http://pastebin.com/7hU20NNL
>
> just installed py3.
> there seems to be a bug.
> in this file
>
> http://xahlee.org/p/time_machine/tm-ch04.html
>
> there's a mismatched double curly quote. at position 28319.
>
> the python code above doesn't seem to spot it?

It would appear that Raymond forgot to check that the stack is empty
at the end of the check_balance function.  It's an easy enough thing
to fix.


#9696

From	Chris Angelico <rosuav@gmail.com>
Date	2011-07-17 21:34 +1000
Message-ID	<mailman.1166.1310902484.1164.python-list@python.org>
In reply to	#9680

On Sun, Jul 17, 2011 at 5:47 PM, Xah Lee <xahlee@gmail.com> wrote:
> the problem is to write a script that can check a dir of text files
> (and all subdirs) and reports if a file has any mismatched matching
> brackets.

I wonder will it be possible to code the whole thing as a single
regular expression... I'm pretty sure it could be done as a sed or awk
script, but I'm insufficiently expert in either to do the job.

ChrisA


#9698

From	rusi <rustompmody@gmail.com>
Date	2011-07-17 04:52 -0700
Message-ID	<77ced9cc-824b-4cca-bbe7-d7075d17030d@g3g2000prf.googlegroups.com>
In reply to	#9696

On Jul 17, 4:34 pm, Chris Angelico <ros...@gmail.com> wrote:
> On Sun, Jul 17, 2011 at 5:47 PM, Xah Lee <xah...@gmail.com> wrote:
> > the problem is to write a script that can check a dir of text files
> > (and all subdirs) and reports if a file has any mismatched matching
> > brackets.
>
> I wonder will it be possible to code the whole thing as a single
> regular expression... I'm pretty sure it could be done as a sed or awk
> script, but I'm insufficiently expert in either to do the job.
>
> ChrisA

Not possible: See http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages#Use_of_lemma

Informally stated as: regexes can't count.


#9705

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2011-07-17 16:15 +0200
Message-ID	<2396442.Iyju66GlRV@PointedEars.de>
In reply to	#9696

Chris Angelico wrote:

> On Sun, Jul 17, 2011 at 5:47 PM, Xah Lee <xahlee@gmail.com> wrote:
>> the problem is to write a script that can check a dir of text files
>> (and all subdirs) and reports if a file has any mismatched matching
>> brackets.
> 
> I wonder will it be possible to code the whole thing as a single
> regular expression... I'm pretty sure it could be done as a sed or awk
> script, but I'm insufficiently expert in either to do the job.

Did you notice the excessive crosspost?  Please do not feed the troll.

In the classical sense it is not possible, as classical regular expressions
have no concept of recursion.  Indeed, matching brackets are a textbook
example for a non-regular¹ context-free language L = {a^n b^n; n > 0} that
needs at least a pushdown automaton (PDA) to be recognized.  (Finite automata
"cannot count".)

However, it is possible (and done) to use classical regular expressions or 
non-recursive Regular Expressions (note the different character case) to 
match tokens more efficiently with a PDA implementation.  This is commonly 
called a parser.  (Programming languages tend to be specified in terms of a 
context-free grammar – they tend to be context-free languages –, which is 
why a parser is a part of a compiler or interpreter.  See for example 
Python.²)

It is possible, with Perl-compatible Regular Expressions (PCRE), provided 
that you have enough memory, to use such an extended Regular Expression (not 
to be confused with EREs³)⁴:

  \((([^()]*|(?R))*)\)

However, even Python 3.2 does not support those expressions (although it 
supports some other PCRE patterns, like named subexpressions)⁵, neither do 
standard and forked versions of sed(1) (BREs, EREs, using an NFA) nor awk 
(EREs, using a DFA or NFA).  [That is not to say it would not be possible 
with Python, or sed or awk (both of which are off-topic here), but that more 
than a Regular Expression would be required.]

On this subject, I recommend reading the preview chapters of the second and 
third editions, respectively, of Jeffrey E. F. Friedl's "Mastering Regular 
Expressions", which are available online for free at O'Reilly.com⁶.

HTH.

______
¹ because it can be proved that it does not satisfy the pumping lemma for
  regular languages; see also
  <http://en.wikipedia.org/wiki/Chomsky_hierarchy> pp.
² <http://docs.python.org/reference/>
³ <http://en.wikipedia.org/wiki/Regular_expression>
⁴ Cf. <http://php.net/manual/en/regexp.reference.recursive.php>
⁵ <http://docs.python.org/library/re.html>
⁶ <http://oreilly.com/catalog/regex/chapter/ch04.html>,
  <http://oreilly.com/catalog/9780596528126/preview#preview>
-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.


#9735

From	Raymond Hettinger <python@rcn.com>
Date	2011-07-17 12:18 -0700
Message-ID	<fdb56b69-0d3a-4ed2-9ebf-e30b304ce775@q29g2000prj.googlegroups.com>
In reply to	#9705

On Jul 17, 7:15 am, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:
> Did you notice the excessive crosspost?  Please do not feed the troll.

IMO, this was a legitimate cross post since it is for a multi-language
programming challenge and everyone can learn from comparing the
results.


Raymond


#9748

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2011-07-17 22:16 +0200
Message-ID	<2917459.EXGvIs61Ab@PointedEars.de>
In reply to	#9735

Raymond Hettinger wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Did you notice the excessive crosspost?  Please do not feed the troll.
> 
> IMO, this was a legitimate cross post since it is for a multi-language
> programming challenge and everyone can learn from comparing the
> results.

Even if so (which I seriously doubt, see also my sig), you cannot reasonably 
deny that "Xah Lee" is a well-known Usenet troll, and that this "challenge" 
is nothing more than yet another sophisticated attempt at trolling.  Please 
do not feed.

PointedEars
-- 
No article in the world is relevant to more than a small handful of
groups. If WW III is announced, it will be announced in
net.announce.important. -- Peter da Silva, bofh.cabal, "Usenet II rules"


#9751

From	Thomas Jollans <t@jollybox.de>
Date	2011-07-17 22:57 +0200
Message-ID	<mailman.1189.1310936266.1164.python-list@python.org>
In reply to	#9748

On 07/17/2011 10:16 PM, Thomas 'PointedEars' Lahn wrote:
> Raymond Hettinger wrote:
> 
>> Thomas 'PointedEars' Lahn wrote:
>>> Did you notice the excessive crosspost?  Please do not feed the troll.
>>
>> IMO, this was a legitimate cross post since it is for a multi-language
>> programming challenge and everyone can learn from comparing the
>> results.
> 
> Even if so (which I seriously doubt, see also my sig), you cannot reasonably 
> deny that "Xah Lee" is a well-known Usenet troll, and that this "challenge" 
> is nothing more than yet another sophisticated attempt at trolling.  Please 
> do not feed.

You know what you're doing? You're feeding the troll.

Yes, I know Xah Lee. Yes, he is known for trolling. So what? That alone
does not mean that every single thing he posts has to be bad. I'm all
with Raymond here.

There's nothing wrong with this post. This is the one time when it's
okay to feed the troll: reinforce good behaviour.


#9752

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2011-07-17 23:43 +0200
Message-ID	<3038846.NWcSDcC0EJ@PointedEars.de>
In reply to	#9705

Thomas 'PointedEars' Lahn wrote:

> It is possible [to parse the parentheses language], with Perl-compatible
> Regular Expressions (PCRE), provided that you have enough memory, to use
> such an extended Regular Expression (not to be confused with EREs³)⁴:
> 
>   \((([^()]*|(?R))*)\)
> 
> However, even Python 3.2 does not support those expressions (although it
> supports some other PCRE patterns, like named subexpressions)⁵, neither do
> standard and forked versions of sed(1) (BREs, EREs, using an NFA) nor awk
> (EREs, using a DFA or NFA).  [That is not to say it would not be possible
> with Python, or sed or awk (both of which are off-topic here), but that
> more than a Regular Expression would be required.]

Supplemental: Further research shows that the Python `re' module is not
going to implement (PCRE) recursive Regular Expressions.  The argument of
its maintainer, Matthew Barnett (of 2009-03-24), is that such things are
better left to modules such as `pyparsing' [1][2].

However, FWIW, here is the Python port of the start of a language parser 
originally written in (and for) ECMAScript:

import re

def matchingBraces(s):
    level = 0

    for match in re.finditer(r'[{}]', s):
        paren = match.group(0)

        if paren == "{":
            level += 1
        else:
            if level == 0: return False
            level -= 1

    return level == 0

As you can see, the theoretically necessary PDA¹ implementation can be 
simplified to a braces counter with range checks by iterative use of a 
Regular Expression.  Extensions to meet the "challenge" are left as an 
exercise to the reader.
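One possible extension, sketched under assumptions: once several bracket types may nest, a single counter no longer suffices, so this version swaps in a stack. The bracket set, the `BRACKET` pattern and the function name are hypothetical, and reporting the earliest unclosed opener at end of input is a design choice, not a requirement taken from the "challenge".

```python
import re

PAIRS = {"(": ")", "[": "]", "{": "}", "“": "”", "«": "»"}
# One character class covering every opener and closer
BRACKET = re.compile(
    "[" + re.escape("".join(PAIRS) + "".join(PAIRS.values())) + "]"
)


def first_bracket_error(s):
    """Return (position, char) of the first bracket that fails to match, or None."""
    stack = []  # (position, opener) pairs still waiting for their closer
    for match in BRACKET.finditer(s):
        ch = match.group(0)
        if ch in PAIRS:
            stack.append((match.start(), ch))
        elif stack and PAIRS[stack[-1][1]] == ch:
            stack.pop()               # closes the most recent opener
        else:
            return match.start(), ch  # wrong or unexpected closer
    return stack[0] if stack else None
```

The Regular Expression still does only the tokenizing; the stack is the pushdown part.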

It has also occurred to me that because parentheses (`(', `)') and brackets 
(`[', `]') have special meaning in Regular Expressions (grouping and 
character classes, respectively), you could escape all other special 
characters in a text and use the RE evaluator itself to find out whether 
they are properly nested, having it evaluate one large RE.  But I have not 
tested this idea yet.  (Obviously it cannot be used to satisfy the 
"challenge"'s condition that braces – `{', `}' – and other parenthesis-like 
characters are to be considered as well, and that the parenthesis-like 
characters are to be printed.)
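A quick way to try that idea for parentheses only, with one swapped-in simplification: instead of escaping the other special characters, this sketch deletes everything except `(' and `)' before handing the result to re.compile(). The function name is hypothetical. Brackets are left out because an empty `[]' is not a valid character class, and very deep nesting may exhaust the pattern parser's recursion, so treat it as an experiment rather than a checker.

```python
import re


def parens_nest_properly(text):
    """Check '(' / ')' nesting by letting re.compile() parse them as groups."""
    # Keep only the parentheses, so no other character can be special
    skeleton = "".join(ch for ch in text if ch in "()")
    try:
        re.compile(skeleton)
    except re.error:
        # Raised for an unterminated group or an unbalanced ')'
        return False
    return True
```

It only answers yes or no; the position of the error would have to be dug out of the exception, which this sketch does not attempt.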

______
¹ Pushdown automaton

References:
[1] <http://bugs.python.org/issue694374>
[2] <http://pyparsing.wikispaces.com/>
-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.


#9784

From	Rouslan Korneychuk <rouslank@msn.com>
Date	2011-07-18 03:09 -0400
Message-ID	<S_QUp.4632$kI4.1159@newsfe02.iad>
In reply to	#9705

I don't know why, but I just had to try it (even though I don't usually 
use Perl and had to look up a lot of stuff). I came up with this:

/(?|
     (\()(?&matched)([\}\]”›»】〉》」』]|$) |
     (\{)(?&matched)([\)\]”›»】〉》」』]|$) |
     (\[)(?&matched)([\)\}”›»】〉》」』]|$) |
     (“)(?&matched)([\)\}\]›»】〉》」』]|$) |
     (‹)(?&matched)([\)\}\]”»】〉》」』]|$) |
     («)(?&matched)([\)\}\]”›】〉》」』]|$) |
     (【)(?&matched)([\)\}\]”›»〉》」』]|$) |
     (〈)(?&matched)([\)\}\]”›»】》」』]|$) |
     (《)(?&matched)([\)\}\]”›»】〉」』]|$) |
     (「)(?&matched)([\)\}\]”›»】〉》』]|$) |
     (『)(?&matched)([\)\}\]”›»】〉》」]|$))
(?(DEFINE)(?<matched>(?:
     \((?&matched)\) |
     \{(?&matched)\} |
     \[(?&matched)\] |
     “(?&matched)” |
     ‹(?&matched)› |
     «(?&matched)» |
     【(?&matched)】 |
     〈(?&matched)〉 |
     《(?&matched)》 |
     「(?&matched)」 |
     『(?&matched)』 |
     [^\(\{\[“‹«【〈《「『\)\}\]”›»】〉》」』]++)*+))
/sx;

If the pattern matches, there is a mismatched bracket. $1 is set to the 
mismatched opening bracket. $-[1] is its location. $2 is the mismatched 
closing bracket or '' if the bracket was never closed. $-[2] is set to 
the location of the closing bracket or the end of the string if the 
bracket wasn't closed.


I didn't write all that manually; it was generated with this:

my @open = ('\(','\{','\[','“','‹','«','【','〈','《','「','『');
my @close = ('\)','\}','\]','”','›','»','】','〉','》','」','』');

'(?|'.join('|',map 
{'('.$open[$_].')(?&matched)(['.join('',@close[0..($_-1),($_+1)..$#close]).']|$)'} 
(0 .. $#open)).')(?(DEFINE)(?<matched>(?:'.join('|',map 
{$open[$_].'(?&matched)'.$close[$_]} (0 .. 
$#open)).'|[^'.join('',@open,@close).']++)*+))'


#9785

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2011-07-18 09:24 +0200
Message-ID	<mailman.1204.1310973876.1164.python-list@python.org>
In reply to	#9784

Rouslan Korneychuk, 18.07.2011 09:09:
> I don't know why, but I just had to try it (even though I don't usually use
> Perl and had to look up a lot of stuff). I came up with this:
>
> /(?|
> (\()(?&matched)([\}\]”›»】〉》」』]|$) |
> (\{)(?&matched)([\)\]”›»】〉》」』]|$) |
> (\[)(?&matched)([\)\}”›»】〉》」』]|$) |
> (“)(?&matched)([\)\}\]›»】〉》」』]|$) |
> (‹)(?&matched)([\)\}\]”»】〉》」』]|$) |
> («)(?&matched)([\)\}\]”›】〉》」』]|$) |
> (【)(?&matched)([\)\}\]”›»〉》」』]|$) |
> (〈)(?&matched)([\)\}\]”›»】》」』]|$) |
> (《)(?&matched)([\)\}\]”›»】〉」』]|$) |
> (「)(?&matched)([\)\}\]”›»】〉》』]|$) |
> (『)(?&matched)([\)\}\]”›»】〉》」]|$))
> (?(DEFINE)(?<matched>(?:
> \((?&matched)\) |
> \{(?&matched)\} |
> \[(?&matched)\] |
> “(?&matched)” |
> ‹(?&matched)› |
> «(?&matched)» |
> 【(?&matched)】 |
> 〈(?&matched)〉 |
> 《(?&matched)》 |
> 「(?&matched)」 |
> 『(?&matched)』 |
> [^\(\{\[“‹«【〈《「『\)\}\]”›»】〉》」』]++)*+))
> /sx;
>
> If the pattern matches, there is a mismatched bracket. $1 is set to the
> mismatched opening bracket. $-[1] is its location. $2 is the mismatched
> closing bracket or '' if the bracket was never closed. $-[2] is set to the
> location of the closing bracket or the end of the string if the bracket
> wasn't closed.
>
>
> I didn't write all that manually; it was generated with this:
>
> my @open = ('\(','\{','\[','“','‹','«','【','〈','《','「','『');
> my @close = ('\)','\}','\]','”','›','»','】','〉','》','」','』');
>
> '(?|'.join('|',map
> {'('.$open[$_].')(?&matched)(['.join('',@close[0..($_-1),($_+1)..$#close]).']|$)'}
> (0 .. $#open)).')(?(DEFINE)(?<matched>(?:'.join('|',map
> {$open[$_].'(?&matched)'.$close[$_]} (0 ..
> $#open)).'|[^'.join('',@open,@close).']++)*+))'


That's solid Perl. Both the code generator and the generated code are 
unreadable. Well done!

Stefan


#9789

From	Rouslan Korneychuk <rouslank@msn.com>
Date	2011-07-18 04:04 -0400
Message-ID	<IORUp.8380$nj1.2190@newsfe19.iad>
In reply to	#9785

On 07/18/2011 03:24 AM, Stefan Behnel wrote:
> That's solid Perl. Both the code generator and the generated code are
> unreadable. Well done!
>
> Stefan
>

Why, thank you.


#9813

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2011-07-18 18:46 +0200
Message-ID	<1918104.nKmheAe9J7@PointedEars.de>
In reply to	#9784

Rouslan Korneychuk wrote:

> I don't know why, but I just had to try it (even though I don't usually
> use Perl and had to look up a lot of stuff). I came up with this:

I don't know why … you replied to my posting/e-mail (but quoted nothing from 
it, much less referred to its content), and posted a lot of Perl code in a 
Python newsgroup/on a Python mailing list.

-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.


#9822

From	Rouslan Korneychuk <rouslank@msn.com>
Date	2011-07-18 14:14 -0400
Message-ID	<QJ_Up.60753$5v5.4006@newsfe11.iad>
In reply to	#9813

On 07/18/2011 12:46 PM, Thomas 'PointedEars' Lahn wrote:
> Rouslan Korneychuk wrote:
>
>> I don't know why, but I just had to try it (even though I don't usually
>> use Perl and had to look up a lot of stuff). I came up with this:
>
> I don't know why … you replied to my posting/e-mail (but quoted nothing from
> it, much less referred to its content), and posted a lot of Perl code in a
> Python newsgroup/on a Python mailing list.
>

Well, when I said I had to try *it*, I was referring to using a Perl 
compatible regular expression, which you brought up. I guess I should 
have quoted that part. As for what I posted, the crux of it was a single 
regular expression. The Perl code at the bottom was just to point out 
that I didn't type that monstrosity out manually. I was going to put 
that part in brackets but there were already so many.
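
For readers on the Python side, the generator snippet translates roughly as below. This is only a sketch of the string-building step: the names `OPEN`, `CLOSE` and `build_pattern` are made up for illustration, and as far as I know the standard `re` module supports neither `(?&name)` recursion nor `(?|...)` branch reset, so the result is meant to be handed to a Perl-compatible engine rather than to `re.compile`.

```python
# Bracket lists copied from the Perl generator; ASCII brackets are
# pre-escaped so they can be dropped straight into the pattern.
OPEN = [r'\(', r'\{', r'\[', '“', '‹', '«', '【', '〈', '《', '「', '『']
CLOSE = [r'\)', r'\}', r'\]', '”', '›', '»', '】', '〉', '》', '」', '』']


def build_pattern():
    """Build the same pattern text as the Perl one-liner (hypothetical helper)."""
    # One alternative per opener: the opener, a balanced run, then either
    # a closer of any *other* type or the end of the string.
    mismatch = '|'.join(
        '(' + o + ')(?&matched)([' + ''.join(CLOSE[:i] + CLOSE[i + 1:]) + ']|$)'
        for i, o in enumerate(OPEN)
    )
    # The recursive definition of balanced text.
    balanced = '|'.join(o + '(?&matched)' + c for o, c in zip(OPEN, CLOSE))
    return ('(?|' + mismatch + ')(?(DEFINE)(?<matched>(?:' + balanced
            + '|[^' + ''.join(OPEN + CLOSE) + ']++)*+))')


if __name__ == '__main__':
    print(build_pattern())
```

The output has no whitespace between alternatives, just like the output of the Perl generator; the indented version in the earlier post was laid out for reading under the `/x` flag.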


#10023

From	Xah Lee <xahlee@gmail.com>
Date	2011-07-21 06:23 -0700
Message-ID	<09e533d2-543f-4fb7-8355-a9c6d5635a97@f17g2000prf.googlegroups.com>
In reply to	#9784

2011-07-21

On Jul 18, 12:09 am, Rouslan Korneychuk <rousl...@msn.com> wrote:
> I don't know why, but I just had to try it (even though I don't usually
> use Perl and had to look up a lot of stuff). I came up with this:
>
> /(?|
>      (\()(?&matched)([\}\]”›»】〉》」』]|$) |
>      (\{)(?&matched)([\)\]”›»】〉》」』]|$) |
>      (\[)(?&matched)([\)\}”›»】〉》」』]|$) |
>      (“)(?&matched)([\)\}\]›»】〉》」』]|$) |
>      (‹)(?&matched)([\)\}\]”»】〉》」』]|$) |
>      («)(?&matched)([\)\}\]”›】〉》」』]|$) |
>      (【)(?&matched)([\)\}\]”›»〉》」』]|$) |
>      (〈)(?&matched)([\)\}\]”›»】》」』]|$) |
>      (《)(?&matched)([\)\}\]”›»】〉」』]|$) |
>      (「)(?&matched)([\)\}\]”›»】〉》』]|$) |
>      (『)(?&matched)([\)\}\]”›»】〉》」]|$))
> (?(DEFINE)(?<matched>(?:
>      \((?&matched)\) |
>      \{(?&matched)\} |
>      \[(?&matched)\] |
>      “(?&matched)” |
>      ‹(?&matched)› |
>      «(?&matched)» |
>      【(?&matched)】 |
>      〈(?&matched)〉 |
>      《(?&matched)》 |
>      「(?&matched)」 |
>      『(?&matched)』 |
>      [^\(\{\[“‹«【〈《「『\)\}\]”›»】〉》」』]++)*+))
> /sx;
>
> If the pattern matches, there is a mismatched bracket. $1 is set to the
> mismatched opening bracket. $-[1] is its location. $2 is the mismatched
> closing bracket or '' if the bracket was never closed. $-[2] is set to
> the location of the closing bracket or the end of the string if the
> bracket wasn't closed.
>
> I didn't write all that manually; it was generated with this:
>
> my @open = ('\(','\{','\[','“','‹','«','【','〈','《','「','『');
> my @close = ('\)','\}','\]','”','›','»','】','〉','》','」','』');
>
> '(?|'.join('|',map
> {'('.$open[$_].')(?&matched)(['.join('',@close[0..($_-1),($_+1)..$#close]). ']|$)'}
> (0 .. $#open)).')(?(DEFINE)(?<matched>(?:'.join('|',map
> {$open[$_].'(?&matched)'.$close[$_]} (0 ..
> $#open)).'|[^'.join('',@open,@close).']++)*+))'

Thanks for the code.

are you willing to make it complete and standalone? i.e. i can run it
like this:

perl Rouslan_Korneychuk.pl dirPath

and it prints any file that has mismatched pair and line/column number
or the char position?

i'd do it myself but so far i tried 5 codes, 3 fixes, all failed. Not
a complaint, but it does take time to gather the code, of different
langs by different people, properly document their authors and
original source urls, etc, and test it out on my environment. All
together in the past 3 days i spent perhaps a total of 4 hours running
several programs and writing back etc and so far not one really worked.

i know perl well, but your code is a bit out of the ordinary ☺. If
past days have been good experience, i might dive in and study for
fun.

 Xah


#10049

From	Rouslan Korneychuk <rouslank@msn.com>
Date	2011-07-21 17:54 -0400
Message-ID	<be1Wp.287583$lW4.64964@newsfe07.iad>
In reply to	#10023

On 07/21/2011 09:23 AM, Xah Lee wrote:
> Thanks for the code.
>
> are you willing to make it complete and standalone? i.e. i can run it
> like this:
>
> perl Rouslan_Korneychuk.pl dirPath
>
> and it prints any file that has mismatched pair and line/column number
> or the char position?
>

Since you asked, I put up a complete program at http://pastebin.com/d8GNL0kx

I don't know if it will run on Perl earlier than version 5.10 and I'm 
pretty sure it won't run below version 5.8.

Also, I realized that I had completely neglected the case of a closing 
bracket that is never opened (e.g. "stuff] stuff"). The program I put on 
pastebin has an updated regex that handles this case.
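
Since this is a Python list, here is how the failure cases (a wrong closer, a closer that was never opened, an opener that is never closed) might look with a plain stack instead of a regex. It is a minimal sketch, not the pastebin program: `first_mismatch` and `line_col` are names invented for this example, and reporting the earliest unclosed opener at end of input is just one possible choice.

```python
OPENERS = "({[“‹«【〈《「『"
CLOSERS = ")}]”›»】〉》」』"
CLOSER_FOR = dict(zip(OPENERS, CLOSERS))  # opener -> its matching closer


def first_mismatch(text):
    """Return (index, char) of the first offending bracket, or None."""
    stack = []  # (index, opener) for every bracket still open
    for i, ch in enumerate(text):
        if ch in CLOSER_FOR:
            stack.append((i, ch))
        elif ch in CLOSERS:
            # Either nothing is open, or the innermost opener wants
            # a different closer: both count as a mismatch here.
            if not stack or CLOSER_FOR[stack[-1][1]] != ch:
                return (i, ch)
            stack.pop()
    # Anything left on the stack was never closed; report the earliest one.
    return stack[0] if stack else None


def line_col(text, index):
    """Convert a 0-based character index to 1-based (line, column)."""
    line = text.count("\n", 0, index) + 1
    col = index - text.rfind("\n", 0, index)  # rfind gives -1 on line 1
    return line, col


if __name__ == "__main__":
    sample = "stuff] stuff"
    hit = first_mismatch(sample)
    if hit is not None:
        print(hit, line_col(sample, hit[0]))
```

Indexes are counted in characters, not bytes, so the file should be decoded as UTF-8 before it is passed in.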


#9706

From	gene heskett <gheskett@wdtv.com>
Date	2011-07-17 10:26 -0400
Message-ID	<mailman.1170.1310912829.1164.python-list@python.org>
In reply to	#9680

On Sunday, July 17, 2011 10:12:27 AM Xah Lee did opine:

> 2011-07-16
> 
> folks, this one will be interesting one.
> 
> the problem is to write a script that can check a dir of text files
> (and all subdirs) and reports if a file has any mismatched matching
> brackets.
> 
> • The files will be utf-8 encoded (unix style line ending).
> 
> • If a file has mismatched matching-pairs, the script will display the
> file name, and the  line number and column number of the first
> instance where a mismatched bracket occurs. (or, just the char number
> instead (as in emacs's “point”))
> 
> • the matching pairs are all single unicode chars. They are these and
> nothing else: () {} [] “” ‹› «» 【】 〈〉 《》 「」 『』
> Note that ‘single curly quote’ is not considered a matching pair here.
> 
> • Your script must be standalone. Must not be using some parser tools.
> But can call lib that's part of standard distribution in your lang.
> 
> Here's a example of mismatched bracket: ([)], (“[[”), ((, 】etc. (and
> yes, the brackets may be nested. There are usually text between these
> chars.)
> 
> I'll be writing a emacs lisp solution and post in 2 days. Ι welcome
> other lang implementations. In particular, perl, python, php, ruby,
> tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp
> (clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure,
> Scala, C, C++, or any others, are all welcome, but i won't be able to
> eval it. javascript implementation will be very interesting too, but
> please indicate which and where to install the command line version.
> 
> I hope you'll find this a interesting “challenge”. This is a parsing
> problem. I haven't studied parsers except some Wikipedia reading, so
> my solution will probably be naive. I hope to see and learn from your
> solution too.
> 
> i hope you'll participate. Just post solution here. Thanks.
> 
>  Xah

This is a very old solution, some of it nearly 30 years old.
It is written in C; the trick, I guess, is in doing it in Python.
======================begin cntx.c=======================
/*^k^s
.ds2
.hb
.fb^k^s^b                     Cntx.c, page #^k^s^b
*****************************************************************
*                                                               *
*                       CC (C Checker)                          *
*                                                               *
*             C Source Brackets, Parenthesis, brace,            *
*                    quote and comment Checker                  *
*                                                               *
*                T. Jennings  -- Sometime in 1983               *
*                Slightly tweaked and renamed MINILINT          *
*                       KAB Aug 84                              *
*                Ported to OS9 and further tweaked              *
*                       Brian Paquette March 91                 *
*          Brackets, single, double quote counters added        *
*                   & renamed Cntx  04/09/91                    *
*                       by   Gene Heskett                       *
*                                                               *
*  some additional code for ignoring "syntax" chars inside of   *  
*      double quoted strings added 3/21/93 by Gene Heskett      *
*  same for single quoted stuffs 7/10/93 by Gene Heskett        *
* And long lines handling ability added too.                    *
* Adding tab ignorers and a counter to tally how many 11/21/93  *
****************************************************************/
#define OS9           /* used for nested comment handling*/
                      /* comment out for non OS9/6809*/

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <ctype.h>
#include <string.h>
#define  FALSE 0
#define  TRUE  1

#ifdef   OS9
#define  NO  " No "
#define  YES " Yes "
char *cmnt;
#endif

/* Very crude but very effective C source debugger. Counts the numbers of
matching braces, parentheses and comments, and displays them at the left 
edge of the screen. The best way to see what it does is to do it; try

        cntx -v cntx.c

Properly handles parens and braces inside comments; they are ignored.
Also ignores single quotes if doubles are odd number, so singles
can be used in a printf string for punctuation.  IF you see the doubles
are odd at line end (the printout tally), all bets are OFF! 
Enter cntx on the command line by itself for a usage note.
*/

main(argc,argv)
int argc;
char *argv[];
{
     FILE *f;
     int c,secnd_c,lastc;   /* int, so EOF from getc() is distinguishable */
     int bracket,parens,braces,squote,dquote,comments;
     int perr,bkerr,brerr,sqerr,dqerr;
     int verbose,okay;
     int filearg = 0;
     int line, col, tabc;

     if ((argc < 2)||(argc > 3)) getout(0);
     if (argc == 3)
     {
       verbose = TRUE;      /* already tested for -v switch  */
       if((argv[1][0] == '-') && (toupper(argv[1][1]) == 'V'))
         filearg = 2;       /*file name pointed to by argv[2] */
       if((argv[2][0] == '-') && (toupper(argv[2][1]) == 'V'))
         filearg = 1;
       if(!filearg) getout(192);
     }
     else
     {
       verbose = FALSE;
       filearg = 1;
     }
     if ((f = fopen(argv[filearg],"r")) == NULL)
     {
       fprintf(stderr,"Cntx: can't open '%s'\n",argv[filearg]);
       getout(216);
     }
     bracket= braces= parens= comments= squote= dquote= 0;
     perr= bkerr= brerr= sqerr= dqerr= 0;
     line=  col= tabc= 0;
     secnd_c= lastc= '\0';
     
     while ((c = getc(f)) != EOF)
     {
        while(c==0x09) /* ignore, but tally the count */
        {
           tabc+=1;
           c=getc(f);
        }
        if (c == EOF) break; /* file ended on a tab: stop scanning */

/* print running tally if in verbose mode and at beginning of line*/
/* OS9 version prints status of whether or not one is in a comment rather*/
/* than a count, as the Microware C compiler does not nest comments*/
     
       if ((col == 0) && verbose )
       {
#ifdef OS9
         if (comments)
           cmnt = YES;
         else cmnt = NO;
         printf("%d:   [%d]   {%d}   (%d)   \'%d\'   \"%d\"   /*%s*/ tabcnt=%d\n\n",
                      line,bracket,braces,parens,squote,dquote,cmnt,tabc);
#else
         printf("%d:   [%d]   {%d}   (%d)   \'%d\'   \"%d\"   /*%d*/\n\n",
                      line,bracket,braces,parens,squote,dquote,comments);
#endif
       }
     
/* additions to help tally squote & dquote errors at line end,
squotes and dquotes should match if we don't count those squotes
present when dquotes are odd number as in inside a printf or
puts statement.  Also if they are part of an escape sequence,
don't count */
     
       if (col == 0 && (squote % 2) ) ++sqerr;
       if (col == 0 && (dquote % 2) ) ++dqerr;
       if (col == 0 && bracket )     ++bkerr;

/* now clears the error to get back in step */
       if (col == 0) squote=dquote=0; 

/* Don't count parens and braces that are inside comments. This of course
assumes that comments are properly matched; in any case, that will be the
first thing to look for. */
     
       if (comments <= 0)
       {     /* 3/20/93, 7/10/93 taking sensitivity out of quoted stuffs */
         /* here, do ++dquote if its not a char constant like this */
         if ( c == '"' ) ++dquote; /* a little simpler */

         if ( !(dquote & 1) ) /* was the && of those */
         {
           if (c == '{' ) ++braces;
           if (c == '(' ) ++parens;
           if (lastc != '\'' && secnd_c == '[' && c != '\'' ) ++bracket;
/* here, skip squotes in a "text string's" */
           if ( secnd_c != '\\' && c== '\'' && !(dquote) ) ++squote;
           if ( lastc == '\\' && secnd_c == '\\' && c == '\'' ) ++squote;
           if (c == '}' ) --braces;
           if (c == ')' ) --parens;
           if (lastc != '\'' && secnd_c == ']' && c != '\'' ) --bracket;
         } 
       }

/* Now do comments. This properly handles nested comments;
whether or not the compiler does is your responsibility */

#ifdef OS9

/* The Microware C compiler for OS9 does NOT nest comments. */
/* The comment-close-mark (asterisk-slash) will terminate */
/* (see K & R) a comment no matter how many '/*' come before it*/

       if ((c == '/') && (secnd_c == '*'))
         comments = 0;
       if ((c == '*') && (secnd_c == '/') && (comments == 0))
         ++comments;
#else
       if ( (c == '/' ) && (secnd_c == '*' ) ) --comments;
       if ( (c == '*' ) && (secnd_c == '/' ) ) ++comments;
#endif
       ++col;
       if (c == '\n' && secnd_c != '\\' )
       {            /* non-escaped newline == New Line */
         col= 0;                 /* set column 0 */
         ++line;
       }
       if (verbose)
         putchar(c);                 /* display text */
       lastc= secnd_c;                       /* update last char */
       secnd_c= c;
     }
     if (verbose)
     {
#ifdef OS9
       if (comments)
         cmnt = YES;
       else cmnt = NO;
       printf("EOF:   [%d]   {%d}   (%d)   \'%d\'   \"%d\"   /*%s*/\n", 
                bracket,braces,parens,squote,dquote,cmnt);
#else
       printf("EOF:   [%d]   {%d}   (%d)   \'%d\'   \"%d\"   /*%d*/\n", 
                bracket,braces,parens,squote,dquote,comments);
#endif
     }
     okay = TRUE;
     if (bracket||bkerr) puts("Unbalanced brackets\n"), okay = FALSE;
     if (braces) puts("Unbalanced braces\n"),okay = FALSE;
     if (parens) puts("Unbalanced parentheses\n"),okay = FALSE;
     if (sqerr||(squote%2)) puts("Unmatched single quotes\n"),okay=FALSE;
     if (dqerr||(dquote%2)) puts("Unmatched double quotes\n"),okay=FALSE;
     if (comments) puts("Unbalanced comments\n"),okay = FALSE;
     if (okay)  puts("No errors found\n");
}
getout(errex)
int    errex;
{
     fprintf(stderr,"Usage: Cntx [-v] <filename> [-v]\n");
     fprintf(stderr,"       -v = verbose mode \n");
     exit(errex);
}
=====================end cntx.c====================
=================begin cntx.hlp====================
This   "Cntx"  is based rather loosely on the  previously uploaded
file  called  MINILINT, in that if you use the -v option, it  will
show  you  the  file  and its report on a line by  line  basis  as
MINILINT  did.  Cntx however will also check for use and misuse of
more  of the usual "C" punctuation.  It's smart enough to ignore an
"escaped"  character,  or those buried in a text string  inside  a
printf("[[{{'etc");  statement.   The basic organization  is  from
"MINILINT",  but much expanded in checking scope.  It still is NOT
a "lint" which is why I didn't call it that, but it has turned out
to  be awfully handy. Ported to the Amiga, it found some stuff  in
the  code I was feeding DICE that I had totally missed, and  which
was  not being properly reported by DICE either, the errors it was
spitting  out  made  no  sense whatsoever. I had  somehow  lost  a
terminating  "}" in one of the PRINTFORM files in the  translation
to  a  C that required proto statements.  Cntx found it,  even  if
Dillon's Integrated C Environment's "dcc" didn't. But it still isn't
a "lint", not yet.

Usage: cntx [-v] filename

Without  the -v, it rapidly scans the whole source file and  gives
only  a final report of "No errors found" or "Unbalanced brackets",
etc.

Added 3/20/93 MEH: One more conditional test now causes it to skip
thru  any parens, braces or brackets found within a double  quoted
string  such as the format string for printf.  As the tally  needs
to  be  reset  at the start of a new line to  maintain  the  error
checking  phasing  in  case  there is an error, the  total  double
quote  count for the whole file is no longer kept. Only the  error
tally  now  shows at the end of a file scan.  So to see  the  line
with  the  error,  you  must use the -v option,  preferably  on  a
pausing  screen so that one screen full of data can be seen at  a
time.   I liked the totals myself, but this does work better.  Now
Edition 4.

Added 7/10/93 MEH: Essentially the same as the above paragraph but
for  stuff  inside a pair of single quotes, so now *any* character
can be single quoted without being an error.  Now Edition 5.
===================end cntx.hlp=====================

Sometimes this list is hilarious with its re-inventions of the wheel. :)

The above code isn't the final version. I had it running on Linux too, but 
one of Fedora's infamous crashes (or one of W.D.'s pissy drives) caused that 
version to go missing; I forgot it when I was doing the salvage operation to 
a new hard drive.
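
As for doing it in Python: the core of the tally idea (one running count per bracket type, with anything inside a double-quoted string skipped) might be sketched like this. It is a deliberately cut-down illustration, not a port of Cntx; `tally` is a made-up name, and comments, single quotes, escaped quotes and the per-line reset are all left out.

```python
OPENER_OF = {")": "(", "}": "{", "]": "["}  # closer -> its opener


def tally(text):
    """Net count of still-open brackets per type, skipping "quoted" runs."""
    counts = {"(": 0, "{": 0, "[": 0}
    in_string = False
    for ch in text:
        if ch == '"':
            in_string = not in_string  # assumption: no escaped quotes
        elif not in_string:
            if ch in counts:
                counts[ch] += 1
            elif ch in OPENER_OF:
                counts[OPENER_OF[ch]] -= 1
    return counts


if __name__ == "__main__":
    print(tally('printf("[[{{'"'"'etc");'))  # brackets in the string are ignored
    print(tally("([)]"))                     # every count is zero
```

Note that the second call comes back all zeros: counting alone cannot see the crossed nesting in `([)]` from the original challenge, so that requirement needs a stack rather than counters.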

Cheers, gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
A holding company is a thing where you hand an accomplice the goods while
the policeman searches you.
