From: Eric Blake
Newsgroups: gnu.utils.bug
Subject: Re: Bug of grep -E
Date: Wed, 6 Dec 2017 09:32:57 -0600
Organization: Red Hat, Inc.

On 12/06/2017 09:02 AM, iPack wrote:
> [urain39@urain39-pc ~]$ cat test
> https://konachan.com/image/a4ff5caad2fa35faa2271df9badacd35/Konachan.com%20-%20255941%20blush%20brown_eyes%20crying%20fate_kaleid_liner_prisma_illya%20fate_%28series%29%20japanese_clothes%20kimono%20long_hair%20miyu_edelfelt%20purple_hair%20tagme_%28artist%29%20tears.jpg
>
> [urain39@urain39-pc ~]$ cat test | grep -Eo '[0-9a-f]{32}/[0-9A-Za-z%_\.\-]+'
> a4ff5caad2fa35faa2271df9badacd35/Konachan.com%20-%20255941%20blush%20brown_eyes%20crying%20fate_kaleid_liner_prisma_illya%20fate_%28series%29%20japanese_clothes%20kimono%20long_hair%20miyu_edelfelt%20purple_hair%20tagme_%28artist%29%20tears.jpg
>
> [urain39@urain39-pc ~]$ cat test | grep -Eo '[0-9a-f]{32}/[0-9A-Za-z\-%_\.]+'
> a4ff5caad2fa35faa2271df9badacd35/Konachan.com%20
>
> It is bug ? or just my syntax error ?

Your syntax error.

In the C locale, [0-9A-Za-z%_\.\-] matches digits, letters, %, _, \
(listed twice, but the second listing is ignored), ., and -.

[0-9A-Za-z\-%_\.] matches digits, letters, the range of ASCII bytes
between \ and % (whoops: in ASCII, \ is 92 but % is 37, so you have a
backwards range, and that portion of the bracket expression matches
nothing at all), then _, \, and .

Hence, '-' is not one of the characters matched, and grep's output is
shorter. POSIX permits the implementation you saw; it also permits an
implementation that refuses to grep at all by declaring your regex
invalid because of the backwards range.

In non-C locales, use of - in a [] expression that is not either the
first or the last member of the set is implementation-defined, and all
bets are off on what it matches (lately, GNU tools have been moving
towards rational range interpretation, which means treating the range
as the same bytes as it would match in the C locale; but other
implementations, or even older versions of GNU tools, tried to get
fancy and match any character that would collate between the two
endpoints, which gets weird fast).

It _looks_ like you were trying to use \- and \. as escape sequences.
But inside [] (at least in the Extended Regular Expression syntax of
'grep -E' as defined by POSIX), \ is not an escape character, and
nothing needs escaping (there are only special rules about where ], ^,
and - may be placed). Yes, there are other flavors of regex engines
(perl, for example) where \ DOES act as an escape even inside []. Which
is why it is essential that you know the quirks of each regex engine you
are targeting.

By the way, bug-gnu-utils is no longer the preferred bug reporting
address for grep; it means your version of grep is probably quite
outdated. These days, 'grep --help' suggests bug-grep@gnu.org for
reporting bugs.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
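
For illustration, here is a minimal sketch of the bracket expression the
original poster probably intended: no backslashes, and the - placed last
so that it is taken literally. The URL is a shortened, hypothetical
stand-in for the one in the report, the extract function name is made up
for this example, and LC_ALL=C is set only to keep range handling
predictable. It also assumes a grep that supports -o, as the one in the
report does.

```shell
# Hypothetical, shortened stand-in for the URL in the original report.
url='https://example.com/image/a4ff5caad2fa35faa2271df9badacd35/Some.name%20-%20tears.jpg'

# Print the 32-hex-digit hash plus the file name that follows it.
# Inside [], '.' and '%' are already literal, and '-' is literal
# because it is the last member of the set, so nothing is escaped.
extract() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eo '[0-9a-f]{32}/[0-9A-Za-z%_.-]+'
}

extract "$url"
```

With the - at the end of the set, the match should run through the
literal hyphens in the file name instead of stopping at the first one.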