From: Julian Büning
Newsgroups: gnu.utils.bug
Subject: GNU regex does not terminate when matching some regular expressions on any input
Date: Wed, 11 Apr 2018 14:20:22 +0200
Cc: Daniel Schemmel

While we are fully aware that GNU regex has been decommissioned for many years (last release April 1993), several versions, some of them modified, are still in use in a number of actively maintained free and open source software projects. In some cases GNU regex serves only as a fallback; in others it is the sole implementation of regular expression matching. For example, GNU libiberty [1], mutt [2] and squid [3] still use versions of GNU regex.

We therefore report this bug not in the expectation that it will be fixed, but to remind developers of software that still uses GNU regex that it might be time to stop using it. As suggested in [4], Gnulib provides a modern alternative that should be able to replace GNU regex with reasonable effort.

Example:

    #include <regex.h>
    #include <stdio.h>

    int main(void)
    {
        regex_t preg;
        regmatch_t matched_range;

        /* Compile the problematic pattern as a POSIX extended regex. */
        int err = regcomp(&preg, ".?^*", REG_EXTENDED);
        if (err != 0) {
            char msg[255];
            regerror(err, &preg, msg, sizeof msg);
            printf("%s\n", msg);
            return -1;
        }

        /* With GNU regex, this call never returns. */
        regexec(&preg, "test", 1, &matched_range, 0);
        regfree(&preg);
        return 0;
    }

Observed behavior: When GNU regex provides the implementation of regcomp() and regexec(), the latter never returns when matching the compiled regular expression ".?^*" against "test". The same holds for any other input string, including the empty string.

Alternative behavior: With a current system-provided glibc implementation, regcomp() returns an error code and regerror() outputs the message "Invalid preceding regular expression".

Similar regular expressions exhibit the same behavior, such as ".*^+" or ".*$*".

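For readers who want to check their own regex implementation against these patterns without hanging the calling process, a minimal sketch follows. It is an illustration only, not the harness used to find the bug: probe_pattern(), the probe_result enum and the timeout parameter are hypothetical names and choices. Only regcomp(), regexec(), fork(), alarm() and waitpid() are standard POSIX calls. The match runs in a child process that SIGALRM terminates if it does not finish in time.

```c
#include <regex.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum probe_result {
    PROBE_MATCH,
    PROBE_NO_MATCH,
    PROBE_COMPILE_ERROR,
    PROBE_TIMEOUT,
    PROBE_FAILED
};

/* Compile and match `pattern` against `input` in a child process.
 * If the child is still running after `timeout_s` seconds, the default
 * action of SIGALRM terminates it and PROBE_TIMEOUT is reported. */
enum probe_result
probe_pattern(const char *pattern, const char *input, unsigned timeout_s)
{
    int status;
    pid_t pid;

    fflush(NULL);               /* avoid duplicated stdio buffers after fork */
    pid = fork();
    if (pid < 0)
        return PROBE_FAILED;

    if (pid == 0) {
        /* Child: the alarm covers both regcomp() and regexec(). */
        regex_t preg;
        regmatch_t matched_range;
        int rc;

        alarm(timeout_s);
        if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
            _exit(2);
        rc = regexec(&preg, input, 1, &matched_range, 0);
        if (rc == 0)
            _exit(0);
        _exit(rc == REG_NOMATCH ? 1 : 3);
    }

    /* Parent: translate the child's fate into a result code. */
    if (waitpid(pid, &status, 0) < 0)
        return PROBE_FAILED;
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGALRM ? PROBE_TIMEOUT : PROBE_FAILED;
    if (!WIFEXITED(status))
        return PROBE_FAILED;

    switch (WEXITSTATUS(status)) {
    case 0:  return PROBE_MATCH;
    case 1:  return PROBE_NO_MATCH;
    case 2:  return PROBE_COMPILE_ERROR;
    default: return PROBE_FAILED;
    }
}
```

Under the behavior described above, a call such as probe_pattern(".?^*", "test", 2) would be expected to yield PROBE_TIMEOUT when linked against an affected GNU regex, and PROBE_COMPILE_ERROR with an implementation that rejects the pattern in regcomp(). Other implementations may accept the pattern and return a match result instead.
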
While we were not able to track down the origin of the bug, it seems that the result of regcomp() triggers an infinite loop in regexec() that keeps pushing and popping failure points. Note that GNU Emacs [5] also uses a modified version of GNU regex, which does not seem to exhibit this particular bug.

This bug was found in klee-uclibc (which uses GNU regex by default) using symbolic execution techniques developed in the course of the SYMBIOSYS research project at COMSYS, RWTH Aachen University. This research is supported by the European Research Council (ERC) under the EU's Horizon 2020 Research and Innovation Programme, grant agreement n. 647295 (SYMBIOSYS).

[1] https://gcc.gnu.org/onlinedocs/libiberty/
[2] http://www.mutt.org/
[3] http://www.squid-cache.org/
[4] https://www.gnu.org/software/regex/
[5] https://www.gnu.org/software/emacs/

Best regards,
Julian