


Re: Word boundary anchors \< and \> not parsed correctly on the right side of =~

Started by: Chet Ramey <chet.ramey@case.edu>
First post: 2018-07-10 10:37 -0400
Last post: 2018-07-10 10:37 -0400
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#14312 — Re: Word boundary anchors \< and \> not parsed correctly on the right side of =~

From: Chet Ramey <chet.ramey@case.edu>
Date: 2018-07-10 10:37 -0400
Subject: Re: Word boundary anchors \< and \> not parsed correctly on the right side of =~
Message-ID: <mailman.3374.1531233460.1292.bug-bash@gnu.org>

On 7/9/18 9:46 PM, marcelpaulo@gmail.com wrote:

> Bash Version: 4.4
> Patch Level: 19
> Release Status: release
> 
> Description:
> Word boundary anchors \< and \> are not parsed correctly on the right side of a =~ regex match expression. 

Bash assumes POSIX extended regular expressions (EREs), and those don't
define any word boundary anchors.

> This evaluates as false:
> 
>     [[ 'foo bar' =~ \<foo\> ]]

As it should; in a POSIX ERE, that pattern matches a literal <foo>.
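
A minimal sketch of that point, assuming bash 3.2 or later with default
compatibility settings: because `<' and `>' are shell-quoted and not special
to EREs, the pattern reaches the regex engine as the plain string <foo>, so
it matches only input that contains the angle brackets. The variable names
are illustrative.

```shell
#!/usr/bin/env bash
# \< and \> are quoted, non-special characters, so the ERE is simply <foo>.
if [[ '<foo>' =~ \<foo\> ]]; then with_brackets=match; else with_brackets=nomatch; fi
if [[ 'foo bar' =~ \<foo\> ]]; then without_brackets=match; else without_brackets=nomatch; fi
printf '%s %s\n' "$with_brackets" "$without_brackets"
```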

> From the bash reference manual:
> 
>     An additional binary operator, ‘=~’, is available, with the same precedence
>     as ‘==’ and ‘!=’. When it is used, the string to the right of the operator
>     is considered an extended regular expression and matched accordingly (as
>     in regex(3)).

That is, a POSIX extended regular expression.

> 
> Reading regex(3), I presumed the regexes would be parsed as C strings, so the
> backslashes would need to be escaped:
> 
>     [[ 'foo bar' =~ \\<foo\\> ]]
> 
> but this results in:
> 
>     bash: syntax error in conditional expression: unexpected token `<'
>     bash: syntax error near `\\<f'

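The `<' is not a character that's special to POSIX EREs, so inside [[ ]] it
retains its usual meaning as a shell operator and needs to be quoted. In
your example the `\\' is a quoted backslash, which leaves the `<' after it
unquoted.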
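
As a rough illustration (assuming bash 3.2 or later; the strings and variable
names are made up for the example), an unquoted `<' inside [[ ]] is the
string comparison operator, while any of the usual quoting forms turns it
into an ordinary pattern character:

```shell
#!/usr/bin/env bash
# Unquoted, < is the string-comparison operator of [[ ]].
if [[ apple < banana ]]; then sorts_before=yes; else sorts_before=no; fi

# Quoted with a backslash, single quotes, or double quotes, < is matched literally.
subject='a<b'
quoted_forms=0
[[ $subject =~ a\<b ]] && quoted_forms=$((quoted_forms + 1))
[[ $subject =~ a'<'b ]] && quoted_forms=$((quoted_forms + 1))
[[ $subject =~ "a<b" ]] && quoted_forms=$((quoted_forms + 1))
printf '%s %s\n' "$sorts_before" "$quoted_forms"
```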

>     
> If the regex is stored in a variable, the expression evaluates as true:
> 
>     re='\<foo\>'
>     [[ 'foo bar' =~ $re ]]

Yes, storing the pattern in a variable is the usual idiom for passing
characters that the shell would otherwise treat as quotes or operators to
the regular expression engine.
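
A small sketch of the variable idiom, assuming bash 3.2 or later. Whether
\< and \> themselves work this way depends on extensions in the system's
regex library, so the sketch uses a pattern built only from standard ERE
syntax; the pattern and variable names are illustrative.

```shell
#!/usr/bin/env bash
# Single quotes keep the space, parentheses and | away from the shell parser.
re='^(foo|bar) +([0-9]+)$'

# Unquoted expansion: the value is handed to the regex engine as a pattern.
if [[ 'foo  42' =~ $re ]]; then
    word=${BASH_REMATCH[1]}
    number=${BASH_REMATCH[2]}
fi

# Quoting the expansion makes the whole value a literal string instead.
if [[ 'foo  42' =~ "$re" ]]; then quoted=match; else quoted=nomatch; fi
printf '%s %s %s\n' "$word" "$number" "$quoted"
```

Leaving the expansion unquoted is what makes this work; with "$re" the
pattern is matched character for character.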

Bash supports the notion that you can quote characters that are special to
EREs to match them literally. To do this, it has to know which characters
are special to EREs, and it uses POSIX to determine those characters. A
system can add extensions to the standard POSIX interface, but bash can't
assume the presence of any such extension.
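
For instance (a sketch assuming bash 3.2 or later; the test strings are
arbitrary), quoting a character that is special to EREs, such as `.' or `+',
makes it match only itself:

```shell
#!/usr/bin/env bash
results=()
[[ 'abc' =~ a.c ]]   && results+=('unquoted . matches abc')
[[ 'abc' =~ a\.c ]]  || results+=('quoted . rejects abc')
[[ 'a.c' =~ a\.c ]]  && results+=('quoted . matches a.c')
[[ 'aab' =~ 'a+b' ]] || results+=('quoted + rejects aab')
[[ 'a+b' =~ 'a+b' ]] && results+=('quoted + matches a+b')
printf '%s\n' "${results[@]}"
```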

> 
> Treating the regex as C strings works for the \b anchor, so that this evaluates as true:

Because `b' is not an operator and doesn't need to be escaped.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/




