character classes & regular expressions

Started by	Ivan Shmakov <oneingray@gmail.com>
First post	2012-05-05 09:35 +0700
Last post	2012-05-10 19:28 +0700
Articles	7 — 3 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  character classes & regular expressions Ivan Shmakov <oneingray@gmail.com> - 2012-05-05 09:35 +0700
    Re: character classes & regular expressions Kaz Kylheku <kaz@kylheku.com> - 2012-05-05 02:42 +0000
      Re: character classes & regular expressions Ivan Shmakov <oneingray@gmail.com> - 2012-05-05 10:00 +0700
    Re: character classes & regular expressions Cydrome Leader <presence@MUNGEpanix.com> - 2012-05-06 02:31 +0000
      Re: character classes & regular expressions Ivan Shmakov <oneingray@gmail.com> - 2012-05-06 12:15 +0700
        Re: character classes & regular expressions Cydrome Leader <presence@MUNGEpanix.com> - 2012-05-06 07:47 +0000
          copyright in Russia Ivan Shmakov <oneingray@gmail.com> - 2012-05-10 19:28 +0700

#19 — character classes & regular expressions

From	Ivan Shmakov <oneingray@gmail.com>
Date	2012-05-05 09:35 +0700
Subject	character classes & regular expressions
Message-ID	<86mx5n2w3m.fsf_-_@gray.siamics.net>

>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>> On 2012-05-04, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>> Kaz Kylheku <kaz@kylheku.com> writes:

	[Cross-posting to news:comp.text, for the subject being
	discussed is hardly specific to Unix shells; really, this time.]

[...]

 >>> In my defense, I've never used this committee-designed dog of a
 >>> syntax until today, which was only because I was groping for a
 >>> quick workaround, and likely never will again.

 >>> (I also refuse to implement it in my regex engine, though I have
 >>> caved in to Perl's \w, \d, \s, \W, \D and \S, which is probably as
 >>> far as I will go.)

 >> How do you specify a "single character, either an upper-case letter
 >> or a digit" within such a regular expression, then?

 > [A-Z0-9]
 > [A-Z\d]

	It happens that the native languages of most people in the
	world either use extensions to the Latin script (beyond those in
	ASCII, such as J or W), or use a script not derived from Latin
	at all.  (Greek-based scripts are not uncommon, for instance;
	FWIW, the Latin script is based on the Greek one itself.)

	Good luck selling your product to anyone speaking French, Greek,
	Polish or Russian.
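
	A minimal sketch of the point above, assuming Python's re module
	(not any of the engines named in this thread) and a hypothetical
	helper name: the ASCII range [A-Z0-9] accepts "A" and "7" but
	rejects upper-case letters from French, Greek, Polish and Russian.

```python
import re

# ASCII-only class, as proposed in the quoted reply.
ASCII_CLASS = re.compile(r"[A-Z0-9]")


def ascii_upper_or_digit(ch: str) -> bool:
    """Return True if the single character ch is matched by [A-Z0-9]."""
    return ASCII_CLASS.fullmatch(ch) is not None


# One ASCII letter, one digit, then upper-case letters from
# French, Greek, Polish and Russian respectively.
samples = ["A", "7", "É", "Ω", "Ł", "Я"]
results = {ch: ascii_upper_or_digit(ch) for ch in samples}

for ch, ok in results.items():
    print(ch, "matches" if ok else "does not match")
```

	Only the first two samples match; the other four are upper-case
	letters that fall outside the A-Z range.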

-- 
FSF associate member #7257

#20

From	Kaz Kylheku <kaz@kylheku.com>
Date	2012-05-05 02:42 +0000
Message-ID	<20120504193314.15@kylheku.com>
In reply to	#19

["Followup-To:" header set to comp.unix.shell.]
On 2012-05-05, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>>> On 2012-05-04, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>
> 	[Cross-posting to news:comp.text, for the subject being
> 	discussed is hardly specific to Unix shells; really, this time.]
>
> [...]
>
> >>> In my defense, I've never used this committee-designed dog of a
> >>> syntax until today, which was only because I was groping for a
> >>> quick workaround, and likely never will again.
>
> >>> (I also refuse to implement it in my regex engine, though I have
> >>> caved in to Perl's \w, \d, \s, \W, \D and \S, which is probably as
> >>> far as I will go.)
>
> >> How do you specify a "single character, either an upper-case letter
> >> or a digit" within such a regular expression, then?
>
> > [A-Z0-9]
> > [A-Z\d]
>
> 	It happens that the native languages of the most people of the
> 	world either use extensions to the Latin script (beyond those in
> 	ASCII, such as J or W), or use a script not derived from Latin
> 	at all.  (Greek-based scripts are not uncommon, for instance;
> 	FWIW, the Latin script is based on the Greek one itself.)
>
> 	Good luck selling your product to anyone speaking French, Greek,
> 	Polish or Russian.

Sorry, you're mistaken. In Russia, France, Poland, Japan, you name it,
coders still want [A-Z] to actually denote A-Z.

In GNU flex, 

  [A-Z] { action(); }

matches A, B, C ... Z. This is nicely baked at the time flex is run,
and not perturbed by any environment variables in the run time of the
generated scanner.

A Russian parser-writing hacker expects this behavior.

[A-Z] meaning anything else is a fuckup by morons, regardless of what is
in the environment or whether or not the application invoked setlocust.
Errr, setlocale, damn it. Why do I keep doing that?
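
For comparison only, here is a small sketch in Python rather than flex.
It assumes Python 3's re module with a str pattern and no flags, and says
nothing about how flex or any POSIX regex library behaves. It enumerates
every code point and collects those matched by [A-Z].

```python
import re
import string
import sys

# Plain range, compiled as a str pattern with no flags.
pattern = re.compile(r"[A-Z]")

# Test every code point Python can represent as a one-character string.
matched = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if pattern.fullmatch(chr(cp)) is not None
]

print(len(matched))
print("".join(matched) == string.ascii_uppercase)
```

Under those assumptions the range is defined by code point order, so the
list holds exactly the 26 ASCII capitals whatever the process locale is.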

#21

From	Ivan Shmakov <oneingray@gmail.com>
Date	2012-05-05 10:00 +0700
Message-ID	<86havv2uxi.fsf@gray.siamics.net>
In reply to	#20

>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>> On 2012-05-05, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>> On 2012-05-04, Ivan Shmakov <oneingray@gmail.com> wrote:

 > ["Followup-To:" header set to comp.unix.shell.]

	For what reason, I wonder?  The issue being discussed has little
	relation to the Unix Shells per se.  (And indeed, you provide an
	example in GNU flex in your own post.)

[...]

 >>>> How do you specify a "single character, either an upper-case
 >>>> letter or a digit" within such a regular expression, then?

 >>> [A-Z0-9]
 >>> [A-Z\d]

 >> It happens that the native languages of the most people of the
 >> world either use extensions to the Latin script (beyond those in
 >> ASCII, such as J or W), or use a script not derived from Latin
 >> at all.  (Greek-based scripts are not uncommon, for instance;
 >> FWIW, the Latin script is based on the Greek one itself.)

 >> Good luck selling your product to anyone speaking French, Greek,
 >> Polish or Russian.

 > Sorry, you're mistaken.  In Russia, France, Poland, Japan, you name
 > it, coders still want [A-Z] to actually denote A-Z.

	Yes.

	Still, they want a way to denote a "single character, either
	an upper-case letter or a digit", which is precisely what
	[[:upper:][:digit:]] is for (and which is the notation that,
	IIUC, you were opposed to.)
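
	A rough approximation of what that class is meant to express,
	sketched in Python.  Python's re module has no [[:upper:]]
	syntax, so this uses the Unicode general category "Lu" as a
	stand-in for [:upper:] and the ASCII digits for [:digit:].  Both
	choices are assumptions: a POSIX implementation decides
	membership from the current locale, and the function name is
	hypothetical.

```python
import unicodedata


def upper_or_digit(ch: str) -> bool:
    """Approximate [[:upper:][:digit:]] for a single character.

    "Lu" (Letter, uppercase) stands in for [:upper:]; the digit test is
    limited to ASCII 0-9. Neither is a locale lookup.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return unicodedata.category(ch) == "Lu" or "0" <= ch <= "9"


for ch in ["A", "7", "É", "Ω", "Ł", "Я", "a", "я", "-"]:
    print(ch, upper_or_digit(ch))
```

	Unlike [A-Z0-9], this accepts the French, Greek, Polish and
	Russian capitals while still rejecting lower-case letters and
	punctuation.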

[...]

-- 
FSF associate member #7257

#26

From	Cydrome Leader <presence@MUNGEpanix.com>
Date	2012-05-06 02:31 +0000
Message-ID	<jo4np9$k3t$1@reader1.panix.com>
In reply to	#19

In comp.unix.shell Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>>> On 2012-05-04, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
> 
>        [Cross-posting to news:comp.text, for the subject being
>        discussed is hardly specific to Unix shells; really, this time.]
> 
> [...]
> 
> >>> In my defense, I've never used this committee-designed dog of a
> >>> syntax until today, which was only because I was groping for a
> >>> quick workaround, and likely never will again.
> 
> >>> (I also refuse to implement it in my regex engine, though I have
> >>> caved in to Perl's \w, \d, \s, \W, \D and \S, which is probably as
> >>> far as I will go.)
> 
> >> How do you specify a "single character, either an upper-case letter
> >> or a digit" within such a regular expression, then?
> 
> > [A-Z0-9]
> > [A-Z\d]
> 
>        It happens that the native languages of the most people of the
>        world either use extensions to the Latin script (beyond those in
>        ASCII, such as J or W), or use a script not derived from Latin
>        at all.  (Greek-based scripts are not uncommon, for instance;
>        FWIW, the Latin script is based on the Greek one itself.)
> 
>        Good luck selling your product to anyone speaking French, Greek,
>        Polish or Russian.

like greeks have money to buy software or a russian has ever made a legit 
software purchase.

#29

From	Ivan Shmakov <oneingray@gmail.com>
Date	2012-05-06 12:15 +0700
Message-ID	<86mx5lzy7z.fsf@gray.siamics.net>
In reply to	#26

>>>>> Cydrome Leader <presence@MUNGEpanix.com> writes:
>>>>> In comp.unix.shell Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>> On 2012-05-04, Ivan Shmakov <oneingray@gmail.com> wrote:

	[Cross-posting into news:alt.conspiracy.microsoft, as a possible
	aid for kill-filing.]

[...]

 >>>> How do you specify a "single character, either an upper-case
 >>>> letter or a digit" within such a regular expression, then?

 >>> [A-Z0-9]
 >>> [A-Z\d]

 >> It happens that the native languages of the most people of the world
 >> either use extensions to the Latin script (beyond those in ASCII,
 >> such as J or W), or use a script not derived from Latin at all.
 >> (Greek-based scripts are not uncommon, for instance; FWIW, the Latin
 >> script is based on the Greek one itself.)

 >> Good luck selling your product to anyone speaking French, Greek,
 >> Polish or Russian.

 > like greeks have money to buy software or a russian has ever made a
 > legit software purchase.

	There were the rumors that Sberbank is the largest partner of
	Microsoft in Europe.  (Perhaps [1] may shed some light on this.)

	And not to mention all those gamers on Steam.

	One may sell services based on software just as well, BTW.

[1] http://download.microsoft.com/documents/customerevidence/6062_Sberbank.doc

-- 
FSF associate member #7257

#30

From	Cydrome Leader <presence@MUNGEpanix.com>
Date	2012-05-06 07:47 +0000
Message-ID	<jo5ab5$glq$1@reader1.panix.com>
In reply to	#29

In comp.unix.shell Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Cydrome Leader <presence@MUNGEpanix.com> writes:
>>>>>> In comp.unix.shell Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Kaz Kylheku <kaz@kylheku.com> writes:
>>>>>> On 2012-05-04, Ivan Shmakov <oneingray@gmail.com> wrote:
> 
>        [Cross-posting into news:alt.conspiracy.microsoft, as a possible
>        aid for kill-filing.]
> 
> [...]
> 
> >>>> How do you specify a "single character, either an upper-case
> >>>> letter or a digit" within such a regular expression, then?
> 
> >>> [A-Z0-9]
> >>> [A-Z\d]
> 
> >> It happens that the native languages of the most people of the world
> >> either use extensions to the Latin script (beyond those in ASCII,
> >> such as J or W), or use a script not derived from Latin at all.
> >> (Greek-based scripts are not uncommon, for instance; FWIW, the Latin
> >> script is based on the Greek one itself.)
> 
> >> Good luck selling your product to anyone speaking French, Greek,
> >> Polish or Russian.
> 
> > like greeks have money to buy software or a russian has ever made a
> > legit software purchase.
> 
>        There were the rumors that Sberbank is the largest partner of
>        Microsoft in Europe.  (Perhaps [1] may shed some light on this.)
> 
>        And not to mention all those gamers on Steam.
> 
>        One may sell services based on software just as well, BTW.
> 
> [1] http://download.microsoft.com/documents/customerevidence/6062_Sberbank.doc

So 10 years ago, one bank in russia may have had some legit microsoft 
licenses. This alone is actually impressive.

Everything else in russia is still pirated.

#33 — copyright in Russia

From	Ivan Shmakov <oneingray@gmail.com>
Date	2012-05-10 19:28 +0700
Subject	copyright in Russia
Message-ID	<86ehqsw785.fsf_-_@gray.siamics.net>
In reply to	#30

>>>>> Cydrome Leader <presence@MUNGEpanix.com> writes:
>>>>> In comp.unix.shell Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>> Cydrome Leader <presence@MUNGEpanix.com> writes:
>>>>> In comp.unix.shell Ivan Shmakov <oneingray@gmail.com> wrote:

	[Cross-posting to news:comp.software.licensing and
	news:misc.int-property, and setting Followup-To: there, for the
	discussion doesn't belong to the Newsgroups: currently in
	effect.]

[...]

 >>>> Good luck selling your product to anyone speaking French, Greek,
 >>>> Polish or Russian.

 >>> like greeks have money to buy software or a russian has ever made a
 >>> legit software purchase.

 >> There were the rumors that Sberbank is the largest partner of
 >> Microsoft in Europe.  (Perhaps [1] may shed some light on this.)

 >> And not to mention all those gamers on Steam.

 >> One may sell services based on software just as well, BTW.

 >> [1] http://download.microsoft.com/documents/customerevidence/6062_Sberbank.doc

 > So 10 years ago, one bank in russia may have had some legit microsoft
 > licenses.  This alone is actually impressive.

	Actually, free software (as in freedom) is quite popular in
	Russia, as is freeware (as in beer), although license terms
	violations also occur with these two.

	Also to note is that the copyright law in Russia was extended to
	cover software in 1994, IIRC, and it took a decade for the
	common people, as well as the judicial system itself, to get
	accustomed to the concept.

	In recent years, the laws have shifted towards more severe
	punishments, and there were some widely-publicized court cases
	related to the copyright law.  The net result is that illegal
	copies of software are now rarely seen, at least in state-owned
	enterprises (while being commonplace there in the mid-1990s.)
	The proliferation of mobile computers (which typically come with
	an OEM-licensed version of an OS pre-installed) also made such
	copies somewhat harder (though not impossible altogether) to
	find at home.

 > Everything else in russia is still pirated.

	When it comes to the terms, I doubt that the victims of the real
	pirates (say, [1, 2]) would readily accept the very notion of
	the corporations being "piracy victims, too."

	That being said, I share the opinion that the copyright law,
	in its current form, /impedes/ progress, instead of facilitating
	it.  I've briefly read through [3], and I'd like to recommend it
	to anyone interested in this view.

[1] http://seattletimes.nwsource.com/html/nationworld/2014376628_apuspiracyvictimsmemorial.html
[2] http://en.wikipedia.org/wiki/Piracy_in_Somalia
[3] http://mitpress.mit.edu/books/full_pdfs/Access_to_Knowledge_in_the_Age_of_Intellectual_Property.pdf

-- 
FSF associate member #7257
