| Path | csiph.com!feeder.erje.net!2.us.feeder.erje.net!nntp.club.cc.cmu.edu!micro-heart-of-gold.mit.edu!bloom-beacon.mit.edu!bloom-beacon.mit.edu!171.64.64.130.MISMATCH!usenet.stanford.edu!not-for-mail |
|---|---|
| From | Eric Blake <eblake@redhat.com> |
| Newsgroups | gnu.utils.bug |
| Subject | Re: Bug of grep -E |
| Date | Wed, 6 Dec 2017 09:32:57 -0600 |
| Organization | Red Hat, Inc. |
| Lines | 96 |
| Approved | bug-gnu-utils@gnu.org |
| Message-ID | <mailman.5226.1512574386.27995.bug-gnu-utils@gnu.org> (permalink) |
| References | <tencent_33C2D0CF6BB7A5EA5B8D5D4EDFAB5894320A@qq.com> |
| To | iPack <2741547153@qq.com>, bug-gnu-utils <bug-gnu-utils@gnu.org> |
| Envelope-to | bug-gnu-utils@gnu.org |
| Openpgp | url=http://people.redhat.com/eblake/eblake.gpg |
| User-Agent | Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 |
| In-Reply-To | <tencent_33C2D0CF6BB7A5EA5B8D5D4EDFAB5894320A@qq.com> |
On 12/06/2017 09:02 AM, iPack wrote:
> [urain39@urain39-pc ~]$ cat test
> https://konachan.com/image/a4ff5caad2fa35faa2271df9badacd35/Konachan.com%20-%20255941%20blush%20brown_eyes%20crying%20fate_kaleid_liner_prisma_illya%20fate_%28series%29%20japanese_clothes%20kimono%20long_hair%20miyu_edelfelt%20purple_hair%20tagme_%28artist%29%20tears.jpg
>
> [urain39@urain39-pc ~]$ cat test | grep -Eo '[0-9a-f]{32}/[0-9A-Za-z%_\.\-]+'
> a4ff5caad2fa35faa2271df9badacd35/Konachan.com%20-%20255941%20blush%20brown_eyes%20crying%20fate_kaleid_liner_prisma_illya%20fate_%28series%29%20japanese_clothes%20kimono%20long_hair%20miyu_edelfelt%20purple_hair%20tagme_%28artist%29%20tears.jpg
>
> [urain39@urain39-pc ~]$ cat test | grep -Eo '[0-9a-f]{32}/[0-9A-Za-z\-%_\.]+'
> a4ff5caad2fa35faa2271df9badacd35/Konachan.com%20
>
> It is bug ? or just my syntax error ?
Your syntax error.
In the C locale,
[0-9A-Za-z%_\.\-] matches digits, letters, %, _, \ (listed twice, but
the second listing is ignored), ., and -.
[0-9A-Za-z\-%_\.] matches digits, letters, the range of ASCII bytes
between \ and % (whoops - in ASCII, \ is 92 but % is 37 - you have a
backwards range, so that portion of the range expression matches nothing
at all), then _, \, and . Hence, '-' is not one of the characters
matched, and grep's output is shorter. POSIX permits the implementation
you saw; it also permits an implementation that refuses to grep at all
by declaring your regex invalid because of the backwards range.
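The code points of the two endpoints can be checked directly from the shell. This is only a sketch: it relies on the POSIX printf rule that a numeric argument beginning with a quote yields the value of the character that follows, and the variable names start and end are made up for illustration.

```shell
# Code points of the two range endpoints in the bracket expression.
start=$(printf '%d' "'\\")   # the backslash
end=$(printf '%d' "'%")      # the percent sign
printf 'start=%s end=%s\n' "$start" "$end"   # prints: start=92 end=37

# A range only covers something when its start is not greater than its end.
if [ "$start" -gt "$end" ]; then
    printf '%s\n' 'backwards range'
fi
```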
In non-C locales, use of - in a [] expression that is not either the
first or the last member of the set is implementation-defined, and all
bets are off on what it matches (lately, GNU tools have been moving
towards rational-range-interpretation, which means treating the range as
the same bytes as it would match in the C locale; but other
implementations, or even older versions of GNU tools, tried to get fancy
and match any character that would collate between the two endpoints,
which gets weird fast).
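One way to take the locale out of the picture is to force the C locale for the one command that matters. A minimal sketch, assuming your grep supports -q (POSIX requires it); the variable name verdict is just for illustration, and what the same range matches under other locales is deliberately not shown, since that is the part that varies:

```shell
# In the C locale a range is a contiguous run of byte values,
# so [a-c] covers exactly a, b and c, and never the uppercase B.
if printf 'B\n' | LC_ALL=C grep -Eq '[a-c]'; then
    verdict='matched'
else
    verdict='no match'
fi
printf 'C locale, [a-c] against B: %s\n' "$verdict"
```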
It _looks_ like you were trying to use \- and \. as escape sequences.
But inside [] (at least in the Extended Regular Expression syntax of 'grep
-E' as defined by POSIX), \ is not an escape character, and nothing
needs escaping (there are only special rules about where ], ^, and - must
be placed to be taken literally: ] first, ^ anywhere but first, and -
first or last). Yes, there are other flavors of regex engines (perl, for
example) where \ DOES act as an escape even inside []. Which is why it
is essential that you know the quirks of each regex engine you are
targeting.
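A portable spelling of the bracket expression might look like the sketch below. The URL is a shortened stand-in for the one in the report, the variable names are illustrative, and -o is a common extension rather than something POSIX guarantees.

```shell
# Shortened stand-in for the URL in the report (the real one is much longer).
url='https://konachan.com/image/a4ff5caad2fa35faa2271df9badacd35/Konachan.com%20-%20255941%20blush.jpg'

# No backslashes inside the brackets, and the hyphen goes last so that it
# is a literal member of the set rather than a range operator.
fixed=$(printf '%s\n' "$url" | LC_ALL=C grep -Eo '[0-9a-f]{32}/[0-9A-Za-z%_.-]+')
printf '%s\n' "$fixed"

# Inside [], a backslash is just another member of the set:
# [\.] matches either a backslash or a dot.
literal=$(printf '%s\n' 'a\b' | LC_ALL=C grep -Eo '[\.]')
printf '%s\n' "$literal"
```

The first command should print everything from the 32-digit hash through tears-style file name, hyphens included; the second should print a lone backslash, since the input contains no dot.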
By the way, bug-gnu-utils is no longer the preferred bug reporting
address for grep; it means your version of grep is probably quite
outdated. These days, 'grep --help' suggests bug-grep@gnu.org for
reporting bugs.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org