Regular expression question

Started by	cerr <ron.eggler@gmail.com>
First post	2011-10-25 21:43 -0700
Last post	2011-10-28 04:12 +0200
Articles	15 — 7 participants

  Regular expression question cerr <ron.eggler@gmail.com> - 2011-10-25 21:43 -0700
    Re: Regular expression question Mike Duffy <never@you.mind.com> - 2011-10-26 12:11 +0000
    Re: Regular expression question Denis McMahon <denismfmcmahon@gmail.com> - 2011-10-26 16:54 +0000
    Re: Regular expression question Lasse Reichstein Nielsen <lrn.unread@gmail.com> - 2011-10-26 19:33 +0200
      Re: Regular expression question Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-10-26 23:40 +0200
        Re: Regular expression question Antony Scriven <adscriven@gmail.com> - 2011-10-27 17:30 -0700
          Re: Regular expression question Antony Scriven <adscriven@gmail.com> - 2011-10-27 17:39 -0700
            Re: Regular expression question Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-10-28 03:50 +0200
              Re: Regular expression question Antony Scriven <adscriven@gmail.com> - 2011-10-28 08:46 -0700
              Re: Regular expression question Lasse Reichstein Nielsen <lrn.unread@gmail.com> - 2011-10-28 18:50 +0200
                Re: Regular expression question Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-10-28 19:50 +0200
                  Re: Regular expression question Antony Scriven <adscriven@gmail.com> - 2011-10-28 11:41 -0700
                    Re: Regular expression question Antony Scriven <adscriven@gmail.com> - 2011-10-28 12:54 -0700
    Re: Regular expression question Dr J R Stockton <reply1143@merlyn.demon.co.uk> - 2011-10-27 19:41 +0100
      Re: Regular expression question Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-10-28 04:12 +0200

#7691 — Regular expression question

From	cerr <ron.eggler@gmail.com>
Date	2011-10-25 21:43 -0700
Subject	Regular expression question
Message-ID	<a9fa509f-5f3c-4926-abc6-c77a21427d8f@j36g2000prh.googlegroups.com>

Hi There,

First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck at how to recognize the second directory instead of the
first (the-lodge/ instead of site/). With "\b\/[a-z]+\/", how do I get
the second one only?
Thanks!
Ron

#7694

From	Mike Duffy <never@you.mind.com>
Date	2011-10-26 12:11 +0000
Message-ID	<Xns9F8A535826ADBnevermind@94.75.214.39>
In reply to	#7691

cerr <ron.eggler@gmail.com> wrote in
news:a9fa509f-5f3c-4926-abc6-c77a21427d8f@j36g2000prh.googlegroups.com:

> I would like to recognize the difference between this url:
> http://quaaoutlodge.com/site/the-lodge/our-history.html
> and that url:
> http://quaaoutlodge.com/site/the-lodge.html

Since you say you are a beginner, it might be easier to first strip away 
the leading "http://quaaoutlodge.com" and the trailing ".html".

Now your problem is recognizing the difference between:

"/site/the-lodge/our-history" and "/site/the-lodge". Your task has been 
reduced simply to counting "/"s.
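
A minimal sketch of this counting approach, assuming the host prefix and the ".html" suffix are fixed as in the two example URLs. The helper name countPathLevels is hypothetical and does not appear in the thread.

```javascript
// Hypothetical helper: strip the fixed prefix and suffix, then count the slashes.
function countPathLevels(url) {
  var path = url
    .replace("http://quaaoutlodge.com", "") // drop the leading host part
    .replace(/\.html$/, "");                // drop the trailing ".html"
  // split("/") yields one more element than there are slashes
  return path.split("/").length - 1;
}

var withDir = countPathLevels("http://quaaoutlodge.com/site/the-lodge/our-history.html");
var withoutDir = countPathLevels("http://quaaoutlodge.com/site/the-lodge.html");
// withDir is 3 ("/site/the-lodge/our-history"); withoutDir is 2 ("/site/the-lodge")
```

A count of 3 would then indicate a document inside a lodge directory, and a count of 2 a document directly under "site".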

-- 
http://pages.videotron.ca/duffym/index.htm#

#7698

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2011-10-26 16:54 +0000
Message-ID	<4ea83b5c$0$28640$a8266bb1@newsreader.readnews.com>
In reply to	#7691

On Tue, 25 Oct 2011 21:43:35 -0700, cerr wrote:

> Hi There,
> 
> First thing, I'm a regular expression newbie.... somewhat anyways... I
> would like to recognize the difference between this url:
> http://quaaoutlodge.com/site/the-lodge/our-history.html and that url:
> http://quaaoutlodge.com/site/the-lodge.html and at the same time extract
> the document name (our-history or the-lodge) and the directory name if
> present (the-lodge). I got stuck at how to recognize the second
> directory instead of the first (the-lodge/ instead of site/) with
> "\b\/[a-z]+\/" how do I get the second one only?

First of all, it seems that your structure is to have a "lodge-file" for 
every lodge in the "site" directory. It would make more sense to use the 
per-lodge file as the index file in the lodge directory:

eg:

http://quaaoutlodge.com/site/the-lodge.html

becomes

http://quaaoutlodge.com/site/the-lodge/index.html

Now, in your "site" directory, you only need a single "index.htm[l]" file 
that has a list with elements something like:

<li><a href='http://quaaoutlodge.com/site/the-lodge/'>the-lodge</a></li>

Now instead of having the files for each lodge spread across two 
directories, all the files for a single lodge are in a single directory. 

If you made this change, it might make your regex problem easier, because 
for any lodge file in any directory, the url will always be:

http://quaaoutlodge.com/site/the-lodge/[filename]

And now you can find the filename and the dir (lodge) without having to 
use any regex:

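// window.location is a Location object, not a string; use its href
var url = window.location.href;
var parts = url.split("/");
var fileName = parts[parts.length-1];  // last path component
var lodgeDir = parts[parts.length-2];  // directory that contains it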

See http://www.sined.co.uk/tmp/pathinfo.htm for an implementation.

Rgds

Denis McMahon

#7699

From	Lasse Reichstein Nielsen <lrn.unread@gmail.com>
Date	2011-10-26 19:33 +0200
Message-ID	<mxcnirot.fsf@gmail.com>
In reply to	#7691

cerr <ron.eggler@gmail.com> writes:

> First thing, I'm a regular expression newbie.... somewhat anyways...
> I would like to recognize the difference between this url:
> http://quaaoutlodge.com/site/the-lodge/our-history.html
> and that url:
> http://quaaoutlodge.com/site/the-lodge.html
> and at the same time extract the document name (our-history or
> the-lodge) and the directory name if present (the-lodge).
> I got stuck at how to recognize the second directory instead of the
> first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do I get
> the second one only?

When you think a RegExp might solve your problem - stop for a moment
and think whether there is also a simpler solution :)

In this case, I'd just do:

  function name(url) {
    var name_end = url.lastIndexOf(".");
    var name_start = url.lastIndexOf("/", name_end) + 1;
    return url.substr(name_start, name_end);
  }

If your URLs aren't always that simple, you'd need to adapt a RegExp too.
/L
-- 
Lasse Reichstein Holst Nielsen
 'Javascript frameworks is a disruptive technology'

#7707

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2011-10-26 23:40 +0200
Message-ID	<2997930.SPkdTlGXAF@PointedEars.de>
In reply to	#7699

Lasse Reichstein Nielsen wrote:

> cerr <ron.eggler@gmail.com> writes:
>> First thing, I'm a regular expression newbie.... somewhat anyways...
>> I would like to recognize the difference between this url:
>> http://quaaoutlodge.com/site/the-lodge/our-history.html
>> and that url:
>> http://quaaoutlodge.com/site/the-lodge.html
>> and at the same time extract the document name (our-history or
>> the-lodge) and the directory name if present (the-lodge).
>> I got stuck at how to recognize the second directory instead of the
>> first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do I get
>> the second one only?
> 
> When you think a RegExp might solve your problem - stop for a moment
> and think whether there is also a simpler solution :)

I cannot think of anything that is simpler than

  var matches = url.match(/(.*)\/([^\/]+)$/);

and then have a look at matches[1] ("directory") and matches[2] ("document 
name").  But that's me.
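
As a rough illustration of what that pattern captures, here it is applied to the two URLs from the original question. The variable names are assumptions, and the third input (a URL ending in "/") is an extra case that was not in the question.

```javascript
// The pattern from the post above: greedy prefix, last "/", then the final component.
var lastComponent = /(.*)\/([^\/]+)$/;

var deep = "http://quaaoutlodge.com/site/the-lodge/our-history.html".match(lastComponent);
// deep[1] is "http://quaaoutlodge.com/site/the-lodge", deep[2] is "our-history.html"

var shallow = "http://quaaoutlodge.com/site/the-lodge.html".match(lastComponent);
// shallow[1] is "http://quaaoutlodge.com/site", shallow[2] is "the-lodge.html"

// Assumed extra input: a URL ending in "/" has no final component, so match() returns null.
var trailing = "http://quaaoutlodge.com/site/the-lodge/".match(lastComponent);
```

Note that the first group holds everything before the last slash, not only the directory name, so isolating "the-lodge" would take one more step.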

> In this case, I'd just do:
> 
>   function name(url) {

That is a poor function identifier.

>     var name_end = url.lastIndexOf(".");
>     var name_start = url.lastIndexOf("/", name_end) + 1;

Paths may contain dots.  Resource names do not need to.

>     return url.substr(name_start, name_end);

You meant

  return url.substring(name_start, name_end);

String.prototype.substr(), OTOH, is proprietary – which is why it should not 
be used – and has different semantics:

| B.2.3 String.prototype.substr (start, length)
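
To make the difference concrete, here is a small sketch using one of the URLs from the original question. substring() takes a start and an end index, while substr() takes a start index and a length. The variable names are assumptions.

```javascript
var url = "http://quaaoutlodge.com/site/the-lodge.html";
var nameEnd = url.lastIndexOf(".");                 // index of the last dot
var nameStart = url.lastIndexOf("/", nameEnd) + 1;  // index just after the last slash

// substring(start, end): the characters from start up to, but not including, end
var viaSubstring = url.substring(nameStart, nameEnd); // "the-lodge"

// substr(start, length): here the second argument is read as a length, which
// runs past the end of the string, so the ".html" suffix is kept
var viaSubstr = url.substr(nameStart, nameEnd);       // "the-lodge.html"
```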

>   }
> 
> If your URLs aren't always that simple, you'd need to adapt a RegExp too.

The general solution to this problem is so simple that you really could have 
posted it (BTDT).  OTOH, that is also why the OP could have found it by 
STFW.


PointedEars
-- 
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
  -- Richard Cornford, cljs, <f806at$ail$1$8300dec7@news.demon.co.uk>

#7752

From	Antony Scriven <adscriven@gmail.com>
Date	2011-10-27 17:30 -0700
Message-ID	<d20558c5-26d2-4464-9068-db92e87fec9e@a17g2000yqj.googlegroups.com>
In reply to	#7707

On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:

 > Lasse Reichstein Nielsen wrote:
 > > cerr <ron.egg...@gmail.com> writes:
 > > > First thing, I'm a regular expression newbie.... somewhat
anyways...
 > > > I would like to recognize the difference between this url:
 > > >http://quaaoutlodge.com/site/the-lodge/our-history.html
 > > > and that url:
 > > >http://quaaoutlodge.com/site/the-lodge.html
 > > > and at the same time extract the document name (our-history or
the-
 > > > lodge) and the directory name if present (the-lodge).
 > > > I got stuck at how rto rcognize the second directory instead of
the
 > > > first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do
i get
 > > > the second one only?
 >
 > > When you think a RegExp might solve your problem - stop for a
moment
 > > and think whether there is also a simpler solution :)
 >
 > I cannot think of anything that is simpler than
 >
 >   var matches = url.match(/(.*)\/([^\/]+)$/);

var matches = url.match(/(.*)\/(.*)/);

--Antony

#7753

From	Antony Scriven <adscriven@gmail.com>
Date	2011-10-27 17:39 -0700
Message-ID	<c21d1200-abd0-430b-a97e-77c5bcecab47@i15g2000yqm.googlegroups.com>
In reply to	#7752

On Oct 28, 1:30 am, Antony Scriven wrote:

 > On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:
 >
 > > Lasse Reichstein Nielsen wrote:
 > > > cerr <ron.egg...@gmail.com> writes:
 > > > > First thing, I'm a regular expression newbie....
 > > > > somewhat anyways... I would like to recognize the
 > > > > difference between this url:
 > > > >
 > > > >     http://quaaoutlodge.com/site/the-lodge/our-history.html
 > > > >
 > > > > and that url:
 > > > >
 > > > >     http://quaaoutlodge.com/site/the-lodge.html
 > > > >
 > > > > and at the same time extract the document name
 > > > > (our-history or the-lodge) and the directory name
 > > > > if present (the-lodge). I got stuck at how to
 > > > > recognize the second directory instead of the first
 > > > > (the-lodge/ instead of site/) with "\b\/[a-z]+\/"
 > > > > how do I get the second one only?
 > >
 > > > When you think a RegExp might solve your problem
 > > > - stop for a moment and think whether there is also
 > > > a simpler solution :)
 > >
 > > I cannot think of anything that is simpler than
 > >
 > >   var matches = url.match(/(.*)\/([^\/]+)$/);
 >
 > var matches = url.match(/(.*)\/(.*)/);

And the reason you didn't spot that is also the reason why
Lasse's solution (using String.prototype.lastIndexOf) is
preferable IMHO. --Antony

P.S. Sorry about the mangled quoting earlier.

#7754

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2011-10-28 03:50 +0200
Message-ID	<1674123.KyhKTayAOI@PointedEars.de>
In reply to	#7753

Antony Scriven wrote:

> On Oct 28, 1:30 am, Antony Scriven wrote:
>> On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:
>> > I cannot think of anything that is simpler than
>> >   var matches = url.match(/(.*)\/([^\/]+)$/);
>> var matches = url.match(/(.*)\/(.*)/);
> 
> And the reason you didn't spot that

Spot what?  That your way is _not_ better?

> is also the reason why Lasse's solution (using
> String.prototype.lastIndexOf) is preferable IMHO. --Antony

http://foo.example/bar

> P.S. Sorry about the mangled quoting earlier.

Don't be sorry about *that*.


PointedEars
-- 
When all you know is jQuery, every problem looks $olvable.

#7789

From	Antony Scriven <adscriven@gmail.com>
Date	2011-10-28 08:46 -0700
Message-ID	<e1100967-7467-41c1-822b-125486c6d002@n38g2000yqm.googlegroups.com>
In reply to	#7754

On Oct 28, 2:50 am, Thomas 'PointedEars' Lahn wrote:

 > Antony Scriven wrote:
 > > On Oct 28, 1:30 am, Antony Scriven wrote:
 > > > On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:
 > > > > I cannot think of anything that is simpler than
 > > > >   var matches = url.match(/(.*)\/([^\/]+)$/);
 > > > var matches = url.match(/(.*)\/(.*)/);
 >
 > > And the reason you didn't spot that
 >
 > Spot what?  That your way is _not_ better?

How so? And, really, url.match(/site\/(.*\/)?(.*)/) is much
closer to what the OP actually asked for. And if the
complexity of the URLs increases at all, so does that of the
regexp. Regexps are a great way to hide bugs. --Antony
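
For reference, a short sketch of how the /site\/(.*\/)?(.*)/ pattern behaves on the two URLs from the original question. The variable names are assumptions for illustration only.

```javascript
// The pattern from the post above: optional directory part after "site/", then the rest.
var afterSite = /site\/(.*\/)?(.*)/;

var withDirectory = "http://quaaoutlodge.com/site/the-lodge/our-history.html".match(afterSite);
// withDirectory[1] is "the-lodge/", withDirectory[2] is "our-history.html"

var withoutDirectory = "http://quaaoutlodge.com/site/the-lodge.html".match(afterSite);
// withoutDirectory[1] is undefined (the optional group did not participate),
// withoutDirectory[2] is "the-lodge.html"
```

Testing whether the first group is undefined is what distinguishes the two URL shapes. The captured directory still carries its trailing slash, and the document name still carries ".html".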

#7794

From	Lasse Reichstein Nielsen <lrn.unread@gmail.com>
Date	2011-10-28 18:50 +0200
Message-ID	<vcr93vsq.fsf@gmail.com>
In reply to	#7754

Thomas 'PointedEars' Lahn <PointedEars@web.de> writes:

> Antony Scriven wrote:

>> is also the reason why Lasse's solution (using
>> String.prototype.lastIndexOf) is preferable IMHO. --Antony
>
> http://foo.example/bar

My "solution" was very hardcoded to the format that the OP used, i.e.,
ending in "/somename.html".
Since that was all the examples he gave, and no real textual explanation,
it's impossible to generalize further.

Maybe I should have said that :)

/L
-- 
Lasse Reichstein Holst Nielsen
 'Javascript frameworks is a disruptive technology'

#7797

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2011-10-28 19:50 +0200
Message-ID	<4702008.Z4qSRW7Nsu@PointedEars.de>
In reply to	#7794

Lasse Reichstein Nielsen wrote:

> Thomas 'PointedEars' Lahn <PointedEars@web.de> writes:
>> Antony Scriven wrote:
>>> is also the reason why Lasse's solution (using
>>> String.prototype.lastIndexOf) is preferable IMHO. --Antony
>> http://foo.example/bar
> 
> My "solution" was very hardcoded to the format that the OP used, i.e.,
> ending in "/somename.html".
> Since that was all the examples he gave, and no real textual explanation,

It was clear enough to me that they wanted to know the last path component 
of a URI.

> it's impossible to generalize further.

Well, it wasn't.

> Maybe I should have said that :)

It was clear to me that your code was limited, however I saw and still see 
no good reason for doing that when the general solution – the one using 
RegExp, which was being asked for – is so obvious.

PointedEars
-- 
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
 -- Richard Cornford, cljs, <cife6q$253$1$8300dec7@news.demon.co.uk> (2004)

#7800

From	Antony Scriven <adscriven@gmail.com>
Date	2011-10-28 11:41 -0700
Message-ID	<7623c451-a261-4495-a124-9ea0dbddc827@er6g2000vbb.googlegroups.com>
In reply to	#7797

On Oct 28, 6:50 pm, Thomas 'PointedEars' Lahn wrote:

 > Lasse Reichstein Nielsen wrote:
 > > Thomas 'PointedEars' Lahn <PointedE...@web.de> writes:
 > > > Antony Scriven wrote:
 > > > > is also the reason why Lasse's solution (using
 > > > > String.prototype.lastIndexOf) is preferable IMHO. --Antony
 > > > http://foo.example/bar
 >
 > > My "solution" was very hardcoded to the format that the
 > > OP used, i.e., ending in "/somename.html". Since that
 > > was all the examples he gave, and no real textual
 > > explanation,
 >
 > It was clear enough to me that they wanted to know the
 > last path component of a URI.
 >
 > > it's impossible to generalize further.
 >
 > Well, it wasn't.

Cough.

 > > Maybe I should have said that :)

Unless you have Asperger's or some other similar condition,
I don't think there's any difficulty in understanding what
Lasse wrote, and its implications.

 > It was clear to me that your code was limited, however
 > I saw and still see no good reason for doing that when
 > the general solution -- the one using RegExp, which
 > was being asked for -- is so obvious.

Well, I already showed that that isn't so. And if an expert
such as yourself can't make an obvious regexp match the
specification, then I think there is a lesson to be learnt
there. Regexps can be powerful, terse, and convenient, but
they can be very tricky things to get right, even the simple
ones. --Antony

P.S. Having said what I've said, I think it's a good thing
that its regexps are somewhat limited compared to some other
implementations.

#7801

From	Antony Scriven <adscriven@gmail.com>
Date	2011-10-28 12:54 -0700
Message-ID	<3e809505-bbd2-4019-8115-fb875010bc6a@f36g2000vbm.googlegroups.com>
In reply to	#7800

On Oct 28, 7:41 pm, Antony Scriven wrote:

 > [...]
 >
 > P.S. Having said what I've said, I think it's a good thing
 > that its regexps are somewhat limited compared to some other
 > implementations.

s/its/JS's/

#7749

From	Dr J R Stockton <reply1143@merlyn.demon.co.uk>
Date	2011-10-27 19:41 +0100
Message-ID	<VZn6PnGMXaqOFwQs@invalid.uk.co.demon.merlyn.invalid>
In reply to	#7691

In comp.lang.javascript message
<a9fa509f-5f3c-4926-abc6-c77a21427d8f@j36g2000prh.googlegroups.com>,
Tue, 25 Oct 2011 21:43:35, cerr <ron.eggler@gmail.com> posted:

>First thing, I'm a regular expression newbie.... somewhat anyways...
>I would like to recognize the difference between this url:
>http://quaaoutlodge.com/site/the-lodge/our-history.html
>and that url:
>http://quaaoutlodge.com/site/the-lodge.html
>and at the same time extract the document name (our-history or
>the-lodge) and the directory name if present (the-lodge).
>I got stuck at how to recognize the second directory instead of the
>first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do I get
>the second one only?

The easiest way, teaching nothing about RegExps, should be to use the
string method 'split' with an argument "/", and contemplate the result
and its length.
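
One way this might look, as a sketch only: it assumes every URL has the shape "http://host/site/[dir/]doc.html" as in the two examples, and the helper name describeUrl is hypothetical.

```javascript
// Hypothetical helper: split on "/" and use the array length to tell the shapes apart.
function describeUrl(url) {
  var parts = url.split("/");
  // "http://host/site/doc.html"     -> ["http:", "", host, "site", doc]       (5 parts)
  // "http://host/site/dir/doc.html" -> ["http:", "", host, "site", dir, doc]  (6 parts)
  var file = parts[parts.length - 1];
  var dot = file.lastIndexOf(".");
  return {
    directory: parts.length > 5 ? parts[parts.length - 2] : null,
    document: dot === -1 ? file : file.substring(0, dot) // drop the extension, if any
  };
}

var historyPage = describeUrl("http://quaaoutlodge.com/site/the-lodge/our-history.html");
// historyPage.directory is "the-lodge", historyPage.document is "our-history"
var lodgePage = describeUrl("http://quaaoutlodge.com/site/the-lodge.html");
// lodgePage.directory is null, lodgePage.document is "the-lodge"
```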

Also, see <http://www.merlyn.demon.co.uk/js-valid.htm> generally.

That's assuming that your datum starts as a string.

If you are writing quaaoutlodge, and use include files, then you might
be including a location.href evaluation in your pages, in order that a
page can tell which it is.  In that case, look up the other properties
of location.

-- 
 (c) John Stockton, nr London, UK. ?@merlyn.demon.co.uk  Turnpike v6.05  MIME.
  Web  <http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms and links;
  Astro stuff via astron-1.htm, gravity0.htm ; quotings.htm, pascal.htm, etc.
 No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.

#7755

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2011-10-28 04:12 +0200
Message-ID	<1326734.ds2RrIXCgJ@PointedEars.de>
In reply to	#7749

Dr J R Stockton wrote:

> <ron.eggler@gmail.com> posted:
>> First thing, I'm a regular expression newbie.... somewhat anyways...
>> I would like to recognize the difference between this url:
>> http://quaaoutlodge.com/site/the-lodge/our-history.html
>> and that url:
>> http://quaaoutlodge.com/site/the-lodge.html
>> and at the same time extract the document name (our-history or
>> the-lodge) and the directory name if present (the-lodge).
>> I got stuck at how to recognize the second directory instead of the
>> first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do I get
>> the second one only?
> 
> The easiest way, teaching nothing about RegExps, should be to use the
> string method 'split' with an argument "/", and contemplate the result
> and its length.

By contrast, that requires accessing the `length' property of the resulting 
array, too, and is inflexible with regard to potential query and fragment 
parts.
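
A brief sketch of the query and fragment concern. The "?lang=en#top" suffix is an invented example, and the URL constructor used for comparison is a later API that nobody in this thread proposed. It is shown only as one possible way to isolate the path before splitting.

```javascript
// Assumed input: one of the OP's URLs with a made-up query string and fragment.
var address = "http://quaaoutlodge.com/site/the-lodge/our-history.html?lang=en#top";

// Plain split: the last element still carries the query and fragment parts.
var rawParts = address.split("/");
var naive = rawParts[rawParts.length - 1]; // "our-history.html?lang=en#top"

// Isolating the path first (here via the URL constructor) avoids that.
var pathParts = new URL(address).pathname.split("/");
var clean = pathParts[pathParts.length - 1]; // "our-history.html"
```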


PointedEars
-- 
var bugRiddenCrashPronePieceOfJunk = (
    navigator.userAgent.indexOf('MSIE 5') != -1
    && navigator.userAgent.indexOf('Mac') != -1
)  // Plone, register_function.js:16
