Re: character classes & regular expressions

From Kaz Kylheku <kaz@kylheku.com>
Newsgroups comp.text, comp.unix.shell
Subject Re: character classes & regular expressions
Followup-To comp.unix.shell
Date 2012-05-05 02:42 +0000
Organization Aioe.org NNTP Server
Message-ID <20120504193314.15@kylheku.com>
References (1 earlier) <7QGor.22191$em4.6868@newsfe21.iad> <20120503211141.506@kylheku.com> <86r4uz3kzo.fsf@gray.siamics.net> <20120504103639.939@kylheku.com> <86mx5n2w3m.fsf_-_@gray.siamics.net>


["Followup-To:" header set to comp.unix.shell.]
On 2012-05-05, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>>> On 2012-05-04, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>
> 	[Cross-posting to news:comp.text, for the subject being
> 	discussed is hardly specific to Unix shells; really, this time.]
>
> [...]
>
> >>> In my defense, I've never used this committee-designed dog of a
> >>> syntax until today, which was only because I was groping for a
> >>> quick workaround, and likely never will again.
>
> >>> (I also refuse to implement it in my regex engine, though I have
> >>> caved in to Perl's \w, \d, \s, \W, \D and \S, which is probably as
> >>> far as I will go.)
>
> >> How do you specify a "single character, either an upper-case letter
> >> or a digit" within such a regular expression, then?
>
> > [A-Z0-9]
> > [A-Z\d]
>
> 	It happens that the native languages of the most people of the
> 	world either use extensions to the Latin script (beyond those in
> 	ASCII, such as J or W), or use a script not derived from Latin
> 	at all.  (Greek-based scripts are not uncommon, for instance;
> 	FWIW, the Latin script is based on the Greek one itself.)
>
> 	Good luck selling your product to anyone speaking French, Greek,
> 	Polish or Russian.

Sorry, you're mistaken. In Russia, France, Poland, Japan, you name it,
coders still want [A-Z] to denote exactly A through Z.

In GNU flex,

  [A-Z] { action(); }

matches A, B, C ... Z. This is baked in at the time flex is run, and is
not perturbed by any environment variables at the run time of the
generated scanner.
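
For comparison, here is a minimal sketch of the same fixed-range behavior
outside flex, using Python's re module on str patterns, which as far as I
know does not consult the locale for a bracket range. The helper names
(UPPER_OR_DIGIT, is_upper_or_digit, matched_chars) are made up for this
illustration and come from nowhere else.

```python
import re
import string

# Compiled once; for a str pattern the range A-Z is the code points
# U+0041..U+005A, whatever the environment says.
UPPER_OR_DIGIT = re.compile(r"[A-Z0-9]")


def is_upper_or_digit(ch: str) -> bool:
    """Return True if ch is a single character in A-Z or 0-9."""
    return UPPER_OR_DIGIT.fullmatch(ch) is not None


def matched_chars(pattern: str, limit: int = 0x500) -> str:
    """Collect every character below code point `limit` that the
    bracket expression `pattern` matches on its own."""
    rx = re.compile(pattern)
    return "".join(chr(c) for c in range(limit) if rx.fullmatch(chr(c)))


if __name__ == "__main__":
    # \u00c9 is E-acute, \u042f is Cyrillic Ya, \u03a9 is Greek Omega.
    for ch in ["A", "Z", "7", "a", "\u00c9", "\u042f", "\u03a9"]:
        print(ascii(ch), is_upper_or_digit(ch))
    # The range covers the 26 ASCII capitals and nothing else
    # in the Latin, Greek and Cyrillic blocks scanned here.
    print(matched_chars(r"[A-Z]") == string.ascii_uppercase)
```

The scan stops at U+0500 only to keep it quick; that limit is an
arbitrary choice for the sketch.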

A Russian parser-writing hacker expects this behavior.

[A-Z] meaning anything else is a fuckup by morons, regardless of what is
in the environment or whether or not the application invoked setlocust.
Errr, setlocale, damn it. Why do I keep doing that?
