From: Jason Friedman
Date: Tue, 27 Dec 2011 00:16:51 +0000
Subject: Re: Regular expressions
Newsgroups: comp.lang.python

> On Tue, Dec 27, 2011 at 10:45 AM, mauriceling@acm.org
> wrote:
>> Hi
>>
>> I am trying to change to .
>>
>> Can anyone help me with the regular expressions needed?
>
> A regular expression defines a string based on rules. Without seeing a
> lot more strings, we can't know what possibilities there are for each
> part of the string. You probably know your data better than we ever
> will, even eyeballing the entire set of strings; just write down, in
> order, what the pieces ought to be - for instance, the first token
> might be a literal @ sign, followed by three upper-case letters, then
> a hyphen, then any number of alphanumerics followed by a colon, etc.
> Once you have that, it's fairly straightforward to translate that into
> regex syntax.
>
> ChrisA

The OP told me, off list, that my guess was true:

> Can we say that your string:
> 1) Contains 7 colon-delimited fields, followed by
> 2) whitespace, followed by
> 3) 3 colon-delimited fields (A, B, C), followed by
> 4) a colon?
> The transformation needed is that the whitespace is replaced by a
> slash, the "A" characters are taken as is, and the colons and fields
> following the "A" characters are eliminated?

Doubtful that my guess was 100% accurate, but nevertheless:

>>> import re
>>> string1 = "@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:"
>>> re.sub(r"(\S+)\s+(\S+?):.+", r"\g<1>/\g<2>", string1)
'@HWI-ST115:568:B08LLABXX:1:1105:6465:151103/1'
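For comparison, here is a stricter sketch that writes the guessed layout down piece by piece, as ChrisA suggested, and translates each piece into regex syntax. It assumes the layout confirmed above holds exactly (7 fields, whitespace, 3 fields, trailing colon) and that no field contains a colon or whitespace; the names HEADER and convert are hypothetical, chosen just for this illustration.

```python
import re

# Hypothetical stricter pattern, built field by field from the guessed layout.
# Assumption: no field contains a colon or whitespace.
HEADER = re.compile(
    r"""
    ^(?P<left>[^:\s]+(?::[^:\s]+){6})   # 7 colon-delimited fields
    \s+                                 # whitespace separator
    (?P<a>[^:\s]+)                      # field A (kept)
    :[^:\s]+:[^:\s]+:$                  # fields B and C plus trailing colon (dropped)
    """,
    re.VERBOSE,
)


def convert(line):
    """Return 'left/A' for a line matching the layout, or None otherwise."""
    m = HEADER.match(line)
    if m is None:
        return None
    return m.group("left") + "/" + m.group("a")


print(convert("@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:"))
```

Unlike the `\S+` version, which rewrites any line with whitespace and a later colon (and passes other lines through unchanged), this one returns None for a line that does not fit the 7 + 3 field layout, which makes it easier to spot the lines where the guess was wrong.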