| From | Kaz Kylheku <kaz@kylheku.com> |
|---|---|
| Newsgroups | comp.text, comp.unix.shell |
| Subject | Re: character classes & regular expressions |
| Followup-To | comp.unix.shell |
| Date | 2012-05-05 02:42 +0000 |
| Organization | Aioe.org NNTP Server |
| Message-ID | <20120504193314.15@kylheku.com> |
| References | (1 earlier) <7QGor.22191$em4.6868@newsfe21.iad> <20120503211141.506@kylheku.com> <86r4uz3kzo.fsf@gray.siamics.net> <20120504103639.939@kylheku.com> <86mx5n2w3m.fsf_-_@gray.siamics.net> |
["Followup-To:" header set to comp.unix.shell.]
On 2012-05-05, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>>> On 2012-05-04, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>
> [Cross-posting to news:comp.text, for the subject being
> discussed is hardly specific to Unix shells; really, this time.]
>
> [...]
>
> >>> In my defense, I've never used this committee-designed dog of a
> >>> syntax until today, which was only because I was groping for a
> >>> quick workaround, and likely never will again.
>
> >>> (I also refuse to implement it in my regex engine, though I have
> >>> caved in to Perl's \w, \d, \s, \W, \D and \S, which is probably as
> >>> far as I will go.)
>
> >> How do you specify a "single character, either an upper-case letter
> >> or a digit" within such a regular expression, then?
>
> > [A-Z0-9]
> > [A-Z\d]
>
> It happens that the native languages of the most people of the
> world either use extensions to the Latin script (beyond those in
> ASCII, such as J or W), or use a script not derived from Latin
> at all. (Greek-based scripts are not uncommon, for instance;
> FWIW, the Latin script is based on the Greek one itself.)
>
> Good luck selling your product to anyone speaking French, Greek,
> Polish or Russian.

Sorry, you're mistaken. In Russia, France, Poland, Japan, you name it,
coders still want [A-Z] to actually denote A-Z.

In GNU flex,

  [A-Z] { action(); }

matches A, B, C ... Z. This is nicely baked at the time flex is run,
and not perturbed by any environment variables in the run time of the
generated scanner.

A Russian parser-writing hacker expects this behavior.

[A-Z] meaning anything else is a fuckup by morons, regardless of what is
in the environment or whether or not the application invoked setlocust.

Errr, setlocale, damn it. Why do I keep doing that?
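
For illustration only, and not part of the original post: a minimal sketch of the same fixed-meaning behavior in a different tool, assuming Python 3's re module with str patterns. There [A-Z] is a plain code-point range and no locale is consulted, while \d is wider than 0-9 unless re.ASCII is given. The function names are made up for this example.

```python
import re

# [A-Z0-9] on a str pattern is a plain code-point range: U+0041..U+005A
# plus U+0030..U+0039. No locale or environment variable is consulted.
ASCII_UPPER_OR_DIGIT = re.compile(r"[A-Z0-9]")

# \d on a str pattern also accepts non-ASCII decimal digits unless
# re.ASCII is given, so [A-Z\d] is a wider class than [A-Z0-9] here.
UPPER_OR_ANY_DIGIT = re.compile(r"[A-Z\d]")


def is_ascii_upper_or_digit(ch: str) -> bool:
    """True only for a single character in A-Z or 0-9."""
    return ASCII_UPPER_OR_DIGIT.fullmatch(ch) is not None


def is_upper_or_any_digit(ch: str) -> bool:
    """True for a single character in A-Z or any Unicode decimal digit."""
    return UPPER_OR_ANY_DIGIT.fullmatch(ch) is not None
```

In this sketch the two spellings from the quoted text, [A-Z0-9] and [A-Z\d], are not equivalent: the second one also accepts digits such as U+0663 (ARABIC-INDIC DIGIT THREE), although neither accepts a Cyrillic or accented capital.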