| Started by | qwertmonkey@syberianoutpost.ru |
|---|---|
| First post | 2013-03-10 02:27 +0000 |
| Last post | 2013-03-10 11:16 -0700 |
| Articles | 20 on this page of 25 — 10 participants |
regexp(ing) Backus-Naurish expressions ... qwertmonkey@syberianoutpost.ru - 2013-03-10 02:27 +0000
Re: regexp(ing) Backus-Naurish expressions ... Arne Vajhøj <arne@vajhoej.dk> - 2013-03-09 21:33 -0500
Re: regexp(ing) Backus-Naurish expressions ... Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> - 2013-03-09 21:00 -0600
Re: regexp(ing) Backus-Naurish expressions ... Leif Roar Moldskred <leifm@dimnakorr.com> - 2013-03-09 23:33 -0600
Re: regexp(ing) Backus-Naurish expressions ... lipska the kat <"nospam at neversurrender dot co dot uk"> - 2013-03-10 10:27 +0000
Re: regexp(ing) Backus-Naurish expressions ... Martin Gregorie <martin@address-in-sig.invalid> - 2013-03-10 12:55 +0000
Re: regexp(ing) Backus-Naurish expressions ... Roedy Green <see_website@mindprod.com.invalid> - 2013-03-10 07:57 -0700
Re: regexp(ing) Backus-Naurish expressions ... Robert Klemme <shortcutter@googlemail.com> - 2013-03-10 22:39 +0100
Re: regexp(ing) Backus-Naurish expressions ... Roedy Green <see_website@mindprod.com.invalid> - 2013-03-10 15:54 -0700
Re: regexp(ing) Backus-Naurish expressions ... Robert Klemme <shortcutter@googlemail.com> - 2013-03-11 21:03 +0100
Re: regexp(ing) Backus-Naurish expressions ... Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> - 2013-03-11 17:00 -0500
Re: regexp(ing) Backus-Naurish expressions ... Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-03-11 18:31 -0400
Re: regexp(ing) Backus-Naurish expressions ... Arne Vajhøj <arne@vajhoej.dk> - 2013-03-11 18:40 -0400
Re: regexp(ing) Backus-Naurish expressions ... Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-03-11 21:39 -0400
Re: regexp(ing) Backus-Naurish expressions ... Martin Gregorie <martin@address-in-sig.invalid> - 2013-03-11 23:06 +0000
Re: regexp(ing) Backus-Naurish expressions ... Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> - 2013-03-11 20:56 -0500
Re: regexp(ing) Backus-Naurish expressions ... Arne Vajhøj <arne@vajhoej.dk> - 2013-03-11 22:06 -0400
Re: regexp(ing) Backus-Naurish expressions ... Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-03-12 09:30 -0400
Re: regexp(ing) Backus-Naurish expressions ... Roedy Green <see_website@mindprod.com.invalid> - 2013-03-10 16:24 -0700
Re: regexp(ing) Backus-Naurish expressions ... Robert Klemme <shortcutter@googlemail.com> - 2013-03-11 21:08 +0100
Re: regexp(ing) Backus-Naurish expressions ... Arne Vajhøj <arne@vajhoej.dk> - 2013-03-11 16:59 -0400
Re: regexp(ing) Backus-Naurish expressions ... Robert Klemme <shortcutter@googlemail.com> - 2013-03-11 22:24 +0100
Re: regexp(ing) Backus-Naurish expressions ... Robert Klemme <shortcutter@googlemail.com> - 2013-03-11 21:00 +0100
Re: regexp(ing) Backus-Naurish expressions ... Robert Klemme <shortcutter@googlemail.com> - 2013-03-13 08:07 +0100
Re: regexp(ing) Backus-Naurish expressions ... markspace <markspace@nospam.nospam> - 2013-03-10 11:16 -0700
| From | qwertmonkey@syberianoutpost.ru |
|---|---|
| Date | 2013-03-10 02:27 +0000 |
| Subject | regexp(ing) Backus-Naurish expressions ... |
| Message-ID | <khgr2k$u3b$1@speranza.aioe.org> |
I need to set up some code's running context via properties files and I want
to make sure that users don't get too playful messing with them, because that
could alter results greatly and in unexpected ways (which they most probably won't
be able to make sense of, and then they would bother the hell out of you)
~
So, I must do some sanity checks on the running parameters if entered via the
command prompt or if the defaults are used from the properties files
~
I am telling you all of that because you may know of libraries to do such
a thing
~
I think one possible way to do that is via a regexp, which should match all
the options included in the test array aISAr
~
One of the problems I am having is that if you enter as options say [true|t],
the matcher would match just the "t" of "true" and I want for "true" to be
actually matched. Another one is that, say, " true ", should be matched, as well
as "false [ nix |mac| windows ] line.separator" ...
~
Any ideas you would share?
~
thanks,
lbrtchx
~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ TEST CODE ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// __
public class RegexMatches02Test{
// __
public static void main( String args[] ){
String aRegEx;
String aIS;
Pattern Ptrn;
Matcher Mtchr;
int iCnt, iMtxStart, iMtxEnd;
// __
aRegEx = "^\\s*[true|false|t|f]{1}\\s*\\[";
aRegEx = "^\\s*[true|false|t|f]{1}";
aRegEx = "^\\s*[true|false|t|f]{1}\\s*";
aRegEx = "^\\s*[true|false t|f]{1}\\s*";
// __
String[] aISAr = new String[]{
" true[a|b |c ] q"
, " true [a|b |c ] q"
, "true [a|b |c ] q"
, "true[a|b|c] b"
, "true[a|b|c]q"
, "False[ y | n | q ] q"
, "false[nix|windows|mac]line.separator"
, "false [ nix |mac| windows ] line.separator"
, "T[y|n]q"
, "T[y]"
, "false"
, "faLse"
, "true"
, "TrUe"
, "F"
, "T"
};
int iISArL = aISAr.length, i = 0;
// __
boolean IsLoop;
Ptrn = Pattern.compile(aRegEx, Pattern.CASE_INSENSITIVE);
System.err.println("// __ matching pattern: |" + aRegEx + "|");
Mtchr = Ptrn.matcher(aISAr[i]); // get a matcher object
IsLoop = (i < iISArL);
while(IsLoop){
System.err.println("// __ |" + i + "|" + aISAr[i] + "|");
iCnt = 0;
// __
while(Mtchr.find()){
iMtxStart = Mtchr.start();
iMtxEnd = Mtchr.end();
System.err.println("|" + iCnt + "|" + iMtxStart + "|" + iMtxEnd + "|" +
aISAr[i].substring(iMtxStart, iMtxEnd) + "|");
++iCnt;
}// (Mtchr.find())
System.err.println("~");
// __
++i;
IsLoop = (i < iISArL);
if(IsLoop){ Mtchr.reset(aISAr[i]); }
}// while(IsLoop)
}
}
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2013-03-09 21:33 -0500 |
| Message-ID | <513bf104$0$32108$14726298@news.sunsite.dk> |
| In reply to | #22863 |
On 3/9/2013 9:27 PM, qwertmonkey@syberianoutpost.ru wrote:
> I need to set up some code's running context via properties files and I want
> to make sure that users don't get too playful messing with them, because that
> could alter results greatly and in unexpected ways (which they most probably won't
> be able to make sense of, and then they would bother the hell out of you)
> ~
> So, I must do some sanity checks on the running parameters if entered via the
> command prompt or if the defaults are used from the properties files
> ~
> I am telling you all of that because you may know of libraries to do such
> a thing
> ~
> I think one possible way to do that is via a regexp, which should match all
> the options included in the test array aISAr
> ~
> One of the problems I am having is that if you enter as options say [true|t],
> the matcher would match just the "t" of "true" and I want for "true" to be
> actually matched. Another one is that, say, " true ", should be matched, as well
> as "false [ nix |mac| windows ] line.separator" ...
> ~
> Any ideas you would share?
I would do it as:
- switch from properties to XML
- define a schema for the XML with strict restrictions on data
- let the application parse that with a validating parser and read it into some config object; this will ensure that required information is there and that the data types are correct
- let the application apply business validation rules in Java code on the config objects; this will ensure that the various pieces of information are consistent
Arne
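A minimal sketch of the schema-validation step, assuming a made-up config format: the element names `config`, `verbose` and `platform` and the class `ConfigSchemaCheck` are hypothetical and not from the thread. It only covers validation with the JDK's `javax.xml.validation` API; reading the result into a config object and the business rules would come afterwards.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ConfigSchemaCheck {
    // Hypothetical schema: one boolean flag and one enumerated platform name.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='config'><xs:complexType><xs:sequence>"
      + "<xs:element name='verbose' type='xs:boolean'/>"
      + "<xs:element name='platform'><xs:simpleType>"
      + "<xs:restriction base='xs:string'>"
      + "<xs:enumeration value='nix'/><xs:enumeration value='mac'/>"
      + "<xs:enumeration value='windows'/>"
      + "</xs:restriction></xs:simpleType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true when the XML text conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // schema violation, malformed XML or unreadable input
        }
    }

    public static void main(String[] args) {
        // prints true, false, false
        System.out.println(isValid("<config><verbose>true</verbose><platform>mac</platform></config>"));
        System.out.println(isValid("<config><verbose>maybe</verbose><platform>mac</platform></config>"));
        System.out.println(isValid("<config><verbose>true</verbose><platform>amiga</platform></config>"));
    }
}
```

Note that `xs:boolean` is case-sensitive, so inputs like "TrUe" from the original test array would be rejected unless the schema used a custom pattern instead.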
| From | Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> |
|---|---|
| Date | 2013-03-09 21:00 -0600 |
| Message-ID | <khgst1$tjp$1@dont-email.me> |
| In reply to | #22863 |
On 3/9/2013 8:27 PM, qwertmonkey@syberianoutpost.ru wrote:
> One of the problems I am having is that if you enter as options say [true|t],
> the matcher would match just the "t" of "true" and I want for "true" to be
> actually matched. Another one is that, say, " true ", should be matched, as well
> as "false [ nix |mac| windows ] line.separator" ...
Do you know the syntax of Java's regular expressions? See
<http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html>.
In short, anything contained within square brackets is considered to be a set of characters to match on, so [true|t] succeeds if the character it's matching against is a t, r, u, e, or |. The syntax you probably wanted was (true|t), which would either match the string "true" or the string "t".
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
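The difference between the character class and the group can be seen in a small sketch. The class name `ClassVersusGroup` and the helper `firstMatch` are made up for illustration; the trailing `\b` is an added assumption so that "t" does not match the start of some longer word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassVersusGroup {
    // Character class: matches exactly ONE character out of t, r, u, e, |.
    static final Pattern CHAR_CLASS = Pattern.compile("^\\s*[true|t]");
    // Group with alternation: matches the word "true" or the single letter "t".
    // "true" is listed first so that it is tried before "t".
    static final Pattern GROUP =
        Pattern.compile("^\\s*(true|t)\\b", Pattern.CASE_INSENSITIVE);

    // Returns the text of the first match, or null when there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + firstMatch(CHAR_CLASS, " true [a|b]") + "]"); // [ t]
        System.out.println("[" + firstMatch(GROUP, " true [a|b]") + "]");      // [ true]
        System.out.println("[" + firstMatch(GROUP, "T[y|n]q") + "]");          // [T]
    }
}
```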
| From | Leif Roar Moldskred <leifm@dimnakorr.com> |
|---|---|
| Date | 2013-03-09 23:33 -0600 |
| Message-ID | <kvOdnR7MkYC2hqHMnZ2dnUVZ8gGdnZ2d@giganews.com> |
| In reply to | #22863 |
qwertmonkey@syberianoutpost.ru wrote:
>
> I think one possible way to do that is via a regexp, which should match all
> the options included in the test array aISAr
> ~
> One of the problems I am having is that if you enter as options say [true|t],
> the matcher would match just the "t" of "true" and I want for "true" to be
> actually matched. Another one is that, say, " true ", should be matched, as well
> as "false [ nix |mac| windows ] line.separator" ...
> ~
> Any ideas you would share?
When working with regular expressions you should always remember that
you don't need to do everything in a single expression. There's no law
against splitting things up into sub-expressions or using "boring old
code" for parts of the match.
You should also bear in mind that some parsing tasks are just not
suited to regular expressions and if the regular expression starts
getting complicated you should consider if the task might be solved
more easily with another approach.
Here, assuming I've understood the problem right, I might do something
as below (I'm not on my development computer, so note that this has
not been checked for errors):
Set<String> VALID_FIRST_WORDS = toSet( "true", "false", "t", "f" );
// The dot lets names such as line.separator count as a word.
String WORD = "([\\w.]+)";
// Group 2 is the bracketed part with its brackets, group 3 only its contents.
String BRACKETED_WORD = "(\\[([^\\]]+)\\])";
// Whitespace between the parts is optional, so a bare "true" matches too.
Pattern LINE_MATCH = Pattern.compile( WORD + "\\s*" +
    BRACKETED_WORD + "?\\s*" + WORD + "?" );
boolean validLine( String inputLine ) {
    String line = inputLine.toLowerCase().trim();
    Matcher matcher = LINE_MATCH.matcher( line );
    if( matcher.matches() ) {
        String firstWord = matcher.group(1);
        // Not .group(2) as that would include the brackets.
        // group(3) is null when there is no bracketed part.
        String bracketedWord = matcher.group(3) == null ?
            null : matcher.group(3).trim();
        String lastWord = matcher.group(4);
        return firstValid( firstWord ) &&
            bracketedValid( firstWord, bracketedWord ) &&
            lastValid( firstWord, bracketedWord, lastWord );
    }
    return false;
}
boolean firstValid( String firstWord ) {
    // Alternatively, use a HashSet
    switch( firstWord ) {
        case "true" : /* Fall through */
        case "t" : /* Fall through */
        case "false" : /* Fall through */
        case "f" : return true;
        default : return false;
    }
}
// This is assuming the valid values of the bracketed
// expression depends on what the first word was
Map<String, Set<String>> LEGAL_BRACKETED = ...;
boolean bracketedValid( String firstWord, String bracketed ) {
    if( bracketed == null ) {
        return true;
    }
    Set<String> legalBracketed = LEGAL_BRACKETED.get( firstWord );
    return legalBracketed != null &&
        legalBracketed.contains( bracketed );
}
boolean lastValid( String first, String bracketed, String last ) {
    if( bracketed == null && last == null ) {
        return true;
    }
    // Implementation depends on the particulars of when certain
    // last words are valid and when not.
    ...
}
--
Leif Roar Moldskred
| From | lipska the kat <"nospam at neversurrender dot co dot uk"> |
|---|---|
| Date | 2013-03-10 10:27 +0000 |
| Message-ID | <ybednSpj74C-_aHMnZ2dnUVZ7t6dnZ2d@bt.com> |
| In reply to | #22863 |
On 10/03/13 02:27, qwertmonkey@syberianoutpost.ru wrote:
> I need to set up some code's running context via properties files and I want
> to make sure that users don't get too playful messing with them, because that
> could alter results greatly and in unexpected ways (which they most probably won't
> be able to make sense of, and then they would bother the hell out of you)
> ~
> So, I must do some sanity checks on the running parameters if entered via the
> command prompt or if the defaults are used from the properties files
> ~
> I am telling you all of that because you may know of libraries to do such
> a thing
Not sure if this is what you are after, as I've never used it myself, but
http://commons.apache.org/proper/commons-cli/
might help. I've always had a lot of luck with other commons stuff.
lipska
--
Lipska the Kat©: Troll hunter, sandbox destroyer
and farscape dreamer of Aeryn Sun
| From | Martin Gregorie <martin@address-in-sig.invalid> |
|---|---|
| Date | 2013-03-10 12:55 +0000 |
| Message-ID | <khhvrq$jrb$1@localhost.localdomain> |
| In reply to | #22863 |
On Sun, 10 Mar 2013 02:27:32 +0000, qwertmonkey wrote:
> I need to set up some code's running context via properties files and I
> want to make sure that users don't get too playful messing with them,
> because that could alter results greatly and in unexpected ways (which they
> most probably won't be able to make sense of, and then they would bother
> the hell out of you)
>
I wrote my own extended Java equivalent of the C getopt() function, which separates arguments from options, accepts both short form (-h) and long form (--help) and allows you to specify whether an option must never, may, or must have an associated value, which may be written as
-xvalue
-x value
-x=value
in short form or
--opt=value
--opt value
in long form.
Option validity and value presence are checked by the parser but both argument checks and option value checks are left for the calling code.
This is implemented as the ArgParser class in my environ.jar library and can be found at:
http://sourceforge.net/projects/cdocumenter/files/cdocumenter/environment/
and is fully documented in javadoc comments at class and method level.
--
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
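To illustrate the option forms listed above, here is a rough sketch of a getopt-style splitter. It is not the ArgParser class from environ.jar, whose API is not shown in the thread; the class `MiniGetopt` and its fields are hypothetical. It assumes only two cases (an option either never or always takes a value) and does not handle bundled flags such as -hv or negative numbers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MiniGetopt {
    final Map<String, String> options = new HashMap<>(); // name -> value ("" for flags)
    final List<String> arguments = new ArrayList<>();    // everything that is not an option

    // takesValue: names of the options that must have a value, e.g. "x" or "opt".
    MiniGetopt(String[] args, Set<String> takesValue) {
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            String name;
            String value = null;
            if (a.startsWith("--") && a.length() > 2) {
                // Long form: --opt=value or --opt value
                int eq = a.indexOf('=');
                name = eq < 0 ? a.substring(2) : a.substring(2, eq);
                if (eq >= 0) value = a.substring(eq + 1);
            } else if (a.startsWith("-") && a.length() > 1 && !a.startsWith("--")) {
                // Short form: -xvalue, -x=value or -x value
                name = a.substring(1, 2);
                if (a.length() > 2) {
                    value = a.charAt(2) == '=' ? a.substring(3) : a.substring(2);
                }
            } else {
                arguments.add(a);
                continue;
            }
            if (takesValue.contains(name)) {
                if (value == null) {
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("missing value for " + a);
                    }
                    value = args[++i]; // the value is the next command-line word
                }
            } else if (value != null) {
                throw new IllegalArgumentException("unexpected value in " + a);
            }
            options.put(name, value == null ? "" : value);
        }
    }
}
```

As in the description above, checking the option values themselves (ranges, allowed words) would stay with the calling code.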
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2013-03-10 07:57 -0700 |
| Message-ID | <vo7pj8p9b0rv8pfkp64902inolmml9vm01@4ax.com> |
| In reply to | #22863 |
On Sun, 10 Mar 2013 02:27:32 +0000 (UTC), qwertmonkey@syberianoutpost.ru wrote, quoted or indirectly quoted someone who said :
> Any ideas you would share?
Regexes are quite limited. When you bang into their limits you can write a finite state machine or use a parser.
see
http://mindprod.com/jgloss/parser.html
http://mindprod.com/jgloss/finitestate.html
--
Roedy Green Canadian Mind Products http://mindprod.com
Software gets slower faster than hardware gets faster.
~ Niklaus Wirth (born: 1934-02-15 age: 79) Wirth's Law
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2013-03-10 22:39 +0100 |
| Message-ID | <aq4cssFsm5rU1@mid.individual.net> |
| In reply to | #22873 |
On 10.03.2013 15:57, Roedy Green wrote:
> On Sun, 10 Mar 2013 02:27:32 +0000 (UTC),
> qwertmonkey@syberianoutpost.ru wrote, quoted or indirectly quoted
> someone who said :
>
>> Any ideas you would share?
>
> Regexes are quite limited.
I beg to differ: it's amazing what you can do with them. Especially modern RX engines are usually much more powerful than those needed for the class of regular languages.
> When you bang into their limits you can
> write a finite state machine or use a parser.
What limitations would make me want to write a FSM instead by hand?
Cheers
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2013-03-10 15:54 -0700 |
| Message-ID | <ke3qj8daj46td8g3lklac14mtr56na4rv4@4ax.com> |
| In reply to | #22875 |
Examples where regexes run out of steam:
parsing Java, HTML, BAT language ... to do syntax colouring.
screen scraping, where what you want can appear in arbitrary orders, be missing, or enclosed in a variety of delimiters.
creating code to simulate the output of forms. You have to do it in stages. You pick out a string then you pick out strings of that.
--
Roedy Green Canadian Mind Products http://mindprod.com
Software gets slower faster than hardware gets faster.
~ Niklaus Wirth (born: 1934-02-15 age: 79) Wirth's Law
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2013-03-11 21:03 +0100 |
| Message-ID | <aq6rkjFf1tsU2@mid.individual.net> |
| In reply to | #22877 |
On 10.03.2013 23:54, Roedy Green wrote:
> Examples where regexes run out of steam:
I never said you can do anything with regexps. You said they are "quite limited" to which I responded "I beg to differ: it's amazing what you can do with them." I think you are talking completely past me.
> parsing Java, HTML, BAT language ... to do syntax colouring.
For that you need a context free parser anyway and would not create a FSM by hand.
> screen scraping, where what you want can appear in arbitrary orders, be
> missing, or enclosed in a variety of delimiters.
Still, I haven't seen a single reason to create a FSM by hand.
> creating code to simulate the output of forms. You have to do it in
> stages. You pick out a string then you pick out strings of that
Regexps are for _parsing_ and not for _generating_.
Cheers
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
| From | Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> |
|---|---|
| Date | 2013-03-11 17:00 -0500 |
| Message-ID | <khlk24$bqu$1@dont-email.me> |
| In reply to | #22877 |
On 3/10/2013 5:54 PM, Roedy Green wrote:
> Examples where regexes run out of steam:
> parsing Java, HTML, BAT language ... to do syntax colouring.
Actually, all of those examples fall under the category of lexing, which
is very easy to do with regular expressions; the python equivalent of
flex uses regular expressions internally to do the lexing. Basically,
what you'd have to do is this:
1. For each token, compute the regex that matches the token and enclose
it in a named capturing group
2. Combine the token regexes into a single regex using disjunctions
3. Run the large regex on the input string by continually finding
matches until it runs out of them.
4. For each match, use the named capturing group to do actions for that
part of the input string.
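The four steps above can be sketched in Java roughly as follows. The token kinds, their patterns and the class name `TinyLexer` are invented for illustration and do not come from any real lexer; note that `find()` silently skips characters no token matches (here, whitespace), which a real lexer would want to report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyLexer {
    // Step 1: one regex per token kind (hypothetical kinds).
    // Order matters: the first alternative that matches at a position wins.
    static final String[][] TOKENS = {
        {"STRING",  "\"(?:[^\\\\\"]|\\\\.)*\""},
        {"NUMBER",  "\\d+"},
        {"KEYWORD", "\\b(?:if|else|while|return)\\b"},
        {"IDENT",   "[A-Za-z_]\\w*"},
        {"PUNCT",   "[(){};=+\\-*/<>]"},
    };

    static final Pattern LEXER = buildLexer();

    // Step 2: wrap each token regex in a named group and join them with '|'.
    static Pattern buildLexer() {
        StringBuilder sb = new StringBuilder();
        for (String[] t : TOKENS) {
            if (sb.length() > 0) sb.append('|');
            sb.append("(?<").append(t[0]).append('>').append(t[1]).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    static List<String> lex(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LEXER.matcher(input);
        // Step 3: keep finding matches until the input runs out.
        while (m.find()) {
            // Step 4: the named group that is non-null tells us the token kind.
            for (String[] t : TOKENS) {
                if (m.group(t[0]) != null) {
                    out.add(t[0] + ":" + m.group());
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("while (x1 < 10) s = \"a\\\"b\";"));
    }
}
```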
> screen scraping, where what you want can appear in arbitrary orders, be
> missing, or enclosed in a variety of delimiters.
([()<>,:;@])|(?:[^\\"]|\\.)*|\[(?:[^\\\]]|\\.)*\]|(?:\\.|[^
\t\r\n()<>,:;@["])+
That is an example of a production regular expression I use specifically
for tokenizing. Note in particular that I am matching two separate kinds
of string literals ("foo" and [foo]). The hard part here is that I'm
dealing with an idiot language that made comment-parsing context-free,
but I decided to say "to hell with this" and ignore that fact, banking
that it's a rare edge case I never have to deal with.
Granted, such large regular expressions can become extremely unwieldy
(said regex is actually composed out of about five lines of code plus
detailed comments above each part explaining what it does), but it's
still very simple to do in a regex.
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
| From | Eric Sosman <esosman@comcast-dot-net.invalid> |
|---|---|
| Date | 2013-03-11 18:31 -0400 |
| Message-ID | <khllr5$mg9$1@dont-email.me> |
| In reply to | #22904 |
On 3/11/2013 6:00 PM, Joshua Cranmer 🐧 wrote:
> [...]
> ([()<>,:;@])|(?:[^\\"]|\\.)*|\[(?:[^\\\]]|\\.)*\]|(?:\\.|[^
> \t\r\n()<>,:;@["])+
>
> That is an example of a production regular expression I use specifically
> for tokenizing. [...]
As Ed Post noted nearly thirty years ago:
It has been observed that a TECO command sequence
more closely resembles transmission line noise
than readable text.
-- "Real Programmers Don't Use PASCAL"
Nobody I know of uses TECO any more, but regexes satisfy
people's craving for gibberish.
--
Eric Sosman
esosman@comcast-dot-net.invalid
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2013-03-11 18:40 -0400 |
| Message-ID | <513e5d5a$0$32110$14726298@news.sunsite.dk> |
| In reply to | #22908 |
On 3/11/2013 6:31 PM, Eric Sosman wrote:
> On 3/11/2013 6:00 PM, Joshua Cranmer 🐧 wrote:
>> [...]
>> ([()<>,:;@])|(?:[^\\"]|\\.)*|\[(?:[^\\\]]|\\.)*\]|(?:\\.|[^
>> \t\r\n()<>,:;@["])+
>>
>> That is an example of a production regular expression I use specifically
>> for tokenizing. [...]
>
> As Ed Post noted nearly thirty years ago:
>
> It has been observed that a TECO command sequence
> more closely resembles transmission line noise
> than readable text.
> -- "Real Programmers Don't Use PASCAL"
>
> Nobody I know of uses TECO any more, but regexes satisfy
> people's craving for gibberish.
$ edit/teco z.z
%Can't find file "Z.Z"
%Creating new file
*ex$$
:-)
(sorry - the only thing I know about TECO is how to exit)
Arne
| From | Eric Sosman <esosman@comcast-dot-net.invalid> |
|---|---|
| Date | 2013-03-11 21:39 -0400 |
| Message-ID | <khm0r4$aqb$1@dont-email.me> |
| In reply to | #22910 |
On 3/11/2013 6:40 PM, Arne Vajhøj wrote:
> On 3/11/2013 6:31 PM, Eric Sosman wrote:
>>[...]
>> Nobody I know of uses TECO any more, but regexes satisfy
>> people's craving for gibberish.
>
> $ edit/teco z.z
> %Can't find file "Z.Z"
> %Creating new file
> *ex$$
>
> :-)
>
> (sorry - the only thing I know about TECO is how to exit)
Perhaps the most important lesson of all! ;-)
--
Eric Sosman
esosman@comcast-dot-net.invalid
| From | Martin Gregorie <martin@address-in-sig.invalid> |
|---|---|
| Date | 2013-03-11 23:06 +0000 |
| Message-ID | <khlo1h$jaf$1@localhost.localdomain> |
| In reply to | #22904 |
On Mon, 11 Mar 2013 22:28:42 +0000, Stefan Ram wrote:
> Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid>
> writes:
>>On 3/10/2013 5:54 PM, Roedy Green wrote:
>>>parsing Java
>>Actually, all of those examples fall under the category of lexing,
>
> Parsing is not lexing, usually parsing comes after lexing.
When I need to do that in Java I use the Coco/R parser generator. It generates both lexer and parser and IMO is more understandable than the classic C equivalent (Lex + Yacc or Flex + Bison), at least partly because it's easy to modify or extend the framework it runs in and the generated code is fairly readable.
--
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
| From | Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> |
|---|---|
| Date | 2013-03-11 20:56 -0500 |
| Message-ID | <khm1ti$el1$1@dont-email.me> |
| In reply to | #22904 |
On 3/11/2013 5:28 PM, Stefan Ram wrote:
> Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> writes:
>> On 3/10/2013 5:54 PM, Roedy Green wrote:
>>> parsing Java
>> Actually, all of those examples fall under the category of lexing,
>
> Parsing is not lexing, usually parsing comes after lexing.
>
I agree, but Roedy wrote:
parsing Java, HTML, BAT language ... to do syntax colouring.
Syntax coloring generally requires nothing more than lexing the input to
figure which tokens are which.
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2013-03-11 22:06 -0400 |
| Message-ID | <513e8d96$0$32110$14726298@news.sunsite.dk> |
| In reply to | #22920 |
On 3/11/2013 9:56 PM, Joshua Cranmer 🐧 wrote:
> On 3/11/2013 5:28 PM, Stefan Ram wrote:
>> Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid>
>> writes:
>>> On 3/10/2013 5:54 PM, Roedy Green wrote:
>>>> parsing Java
>>> Actually, all of those examples fall under the category of lexing,
>>
>> Parsing is not lexing, usually parsing comes after lexing.
>>
>
> I agree, but Roedy wrote:
> parsing Java, HTML, BAT language ... to do syntax colouring.
>
> Syntax coloring generally requires nothing more than lexing the input to
> figure which tokens are which.
Some languages are tricky.
C# has contextual keywords.
dynamic dynamic;
is a valid declaration and the first is a keyword and the second is a name.
Arne
| From | Eric Sosman <esosman@comcast-dot-net.invalid> |
|---|---|
| Date | 2013-03-12 09:30 -0400 |
| Message-ID | <khnahl$trr$1@dont-email.me> |
| In reply to | #22920 |
On 3/11/2013 9:56 PM, Joshua Cranmer 🐧 wrote:
> On 3/11/2013 5:28 PM, Stefan Ram wrote:
>> Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid>
>> writes:
>>> On 3/10/2013 5:54 PM, Roedy Green wrote:
>>>> parsing Java
>>> Actually, all of those examples fall under the category of lexing,
>>
>> Parsing is not lexing, usually parsing comes after lexing.
>>
>
> I agree, but Roedy wrote:
> parsing Java, HTML, BAT language ... to do syntax colouring.
>
> Syntax coloring generally requires nothing more than lexing the input to
> figure which tokens are which.
Is that how the NetBeans editor knows to display local
variables in black but class and instance fields in green?
;-)
--
Eric Sosman
esosman@comcast-dot-net.invalid
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2013-03-10 16:24 -0700 |
| Message-ID | <g95qj8t80ona2h7ut8msl3hdomg10d40j6@4ax.com> |
| In reply to | #22875 |
On Sun, 10 Mar 2013 22:39:22 +0100, Robert Klemme <shortcutter@googlemail.com> wrote, quoted or indirectly quoted someone who said :
>What limitations would make me want to write a FSM instead by hand?
Compacting out nugatory space in HTML would be another example.
Though they are quite complicated, I find FSMs very easy to write, and they almost always work first time. You can narrow your thinking to a tiny case and ignore the big picture quite safely.
In contrast, I find my regexes (of any complexity) nearly always have some unexpected behaviour, often one that does not show up immediately.
The other complicating factor is I use three different regex schemes in a day: Java, Funduc and SlickEdit. I keep borrowing syntax from one of the other schemes than the one I am using. Some day I will have to write replacements that use Java syntax.
--
Roedy Green Canadian Mind Products http://mindprod.com
Software gets slower faster than hardware gets faster.
~ Niklaus Wirth (born: 1934-02-15 age: 79) Wirth's Law
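For readers who have not written one, a hand-rolled state machine for the space-compacting example might look like the sketch below. The class `SpaceCompactor` and its three states are hypothetical; it assumes tags never contain a literal '>' inside attribute values and ignores `<pre>`, comments and scripts, which a real tool would have to handle.

```java
class SpaceCompactor {
    // TEXT: ordinary content, SPACE: inside a whitespace run, TAG: between '<' and '>'.
    enum State { TEXT, SPACE, TAG }

    // Collapses each whitespace run outside tags into one space; tags are copied verbatim.
    static String compact(String html) {
        StringBuilder out = new StringBuilder(html.length());
        State state = State.TEXT;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (state == State.TAG) {
                out.append(c);                 // never touch anything inside a tag
                if (c == '>') state = State.TEXT;
            } else if (Character.isWhitespace(c)) {
                if (state == State.TEXT) {     // first whitespace of a run
                    out.append(' ');
                    state = State.SPACE;
                }                              // in SPACE: drop the extra whitespace
            } else {
                out.append(c);
                state = (c == '<') ? State.TAG : State.TEXT;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(compact("<p>  Hello \n\t world </p>   <br  class='a  b'>"));
    }
}
```

Each branch only has to consider one state and one character, which is the "tiny case" style of reasoning described above.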
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2013-03-11 21:08 +0100 |
| Message-ID | <aq6rtkFf5ceU1@mid.individual.net> |
| In reply to | #22878 |
On 11.03.2013 00:24, Roedy Green wrote:
> On Sun, 10 Mar 2013 22:39:22 +0100, Robert Klemme
> <shortcutter@googlemail.com> wrote, quoted or indirectly quoted
> someone who said :
>
>> What limitations would make me want to write a FSM instead by hand?
>
> Compacting out nugatory space in HTML would be another example.
There are tools for processing tag based languages. Why would I want to create a FSM by hand for that?
> Though they are quite complicated, I find FSMs very easy to write, and
> they almost always work first time. You can narrow your thinking to a
> tiny case and ignore the big picture quite safely.
Certainly you can write FSMs for a lot of things. But you were claiming that a manual FSM should be used instead of a regexp engine; so the question remains unanswered: why would anyone create a FSM by hand for parsing?
> In contrast, I find my regexes (of any complexity) nearly always have
> some unexpected behaviour, often one that does not show up immediately.
Well, that certainly depends on your familiarity with the tool. To me this sounds suspiciously like NIH syndrome. I am so familiar with using regular expressions of various kinds that it would not occur to me to start writing a FSM for parsing by hand. That is such a waste of time.
> The other complicating factor is I use three different regex schemes
> in a day: Java, Funduc and SlickEdit. I keep borrowing syntax from
> one of the other schemes than the one I am using.
And how exactly do you implement a FSM in SlickEdit?
> Some day I will
> have to write replacements that use Java syntax.
Not sure what you mean by that.
Cheers
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/