From: Terry Reedy
Subject: Re: I need an idea for practise!
Date: Thu, 17 Jul 2014 17:46:51 -0400
Newsgroups: comp.lang.python

On 7/17/2014 1:20 PM, Chris Angelico wrote:
> By the way, one specific point about RR's advice: A colorizer should
> *not* be written using regexps. It'd make for an absolute nightmare of
> impossible-to-debug regexp strings, plus there are fundamental
> limitations on what you can accomplish with them. You need to use a
> lexer - a lexical analyzer. Basically, to correctly colorize code, you
> need to have something equivalent to the first part of the language
> interpreter, but with a lot more tolerance for errors. That's a pretty
> big thing to write as regexps.

It depends on how deeply one wants to colorize. idlelib.ColorDelegator
colors comments, strings, keywords, builtin names, and names following
'def' and 'class' with the following regexes.

import builtins
import keyword
import re

def any(name, alternates):
    "Return a named group pattern matching list of alternates."
    return "(?P<%s>" % name + "|".join(alternates) + ")"

def make_pat():
    kw = r"\b" + any("KEYWORD", keyword.kwlist) + r"\b"
    builtinlist = [str(name) for name in dir(builtins)
                   if not name.startswith('_') and
                   name not in keyword.kwlist]
    # self.file = open("file") :
    # 1st 'file' colorized normal, 2nd as builtin, 3rd as string
    builtin = r"([^.'\"\\#]\b|^)" + any("BUILTIN", builtinlist) + r"\b"
    comment = any("COMMENT", [r"#[^\n]*"])
    stringprefix = r"(\br|u|ur|R|U|UR|Ur|uR|b|B|br|Br|bR|BR|rb|rB|Rb|RB)?"
    sqstring = stringprefix + r"'[^'\\\n]*(\\.[^'\\\n]*)*'?"
    dqstring = stringprefix + r'"[^"\\\n]*(\\.[^"\\\n]*)*"?'
    sq3string = stringprefix + r"'''[^'\\]*((\\.|'(?!''))[^'\\]*)*(''')?"
    dq3string = stringprefix + r'"""[^"\\]*((\\.|"(?!""))[^"\\]*)*(""")?'
    string = any("STRING", [sq3string, dq3string, sqstring, dqstring])
    return kw + "|" + builtin + "|" + comment + "|" + string +\
           "|" + any("SYNC", [r"\n"])

prog = re.compile(make_pat(), re.S)
idprog = re.compile(r"\s+(\w+)", re.S)
asprog = re.compile(r".*?\b(as)\b")

I am not sure whether the separate pattern for 'as' is still needed, or
is a holdover from when 'as' was a keyword only in certain contexts.

-- 
Terry Jan Reedy
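To make the regex approach concrete, here is a minimal sketch of how a compiled pattern like prog can be applied: scan the text with finditer() and use groupdict() to see which named group matched, then use a second pattern like idprog to pick up the name after 'def' or 'class'. This is only loosely modeled on what ColorDelegator does and is not IDLE's actual code. It is a simplified subset of make_pat() (no string prefixes, no triple-quoted strings, no SYNC group), and the helper names group, make_pattern and colorize, as well as the DEFINITION tag, are hypothetical.

```python
import builtins
import keyword
import re


def group(name, alternates):
    """Return a named group matching any of the alternates (like any() above)."""
    return "(?P<%s>%s)" % (name, "|".join(alternates))


def make_pattern():
    """Hypothetical simplified make_pat(): no string prefixes or triple quotes."""
    kw = r"\b" + group("KEYWORD", keyword.kwlist) + r"\b"
    builtin_names = [name for name in dir(builtins)
                     if not name.startswith("_") and name not in keyword.kwlist]
    # The leading class rejects a name preceded by '.', a quote, a backslash
    # or '#', so attribute names such as obj.len are not tagged as builtins.
    builtin = r"([^.'\"\\#]\b|^)" + group("BUILTIN", builtin_names) + r"\b"
    comment = group("COMMENT", [r"#[^\n]*"])
    # The closing quote is optional, so an unterminated string still matches.
    sqstring = r"'[^'\\\n]*(\\.[^'\\\n]*)*'?"
    dqstring = r'"[^"\\\n]*(\\.[^"\\\n]*)*"?'
    string = group("STRING", [sqstring, dqstring])
    return "|".join([kw, builtin, comment, string])


PROG = re.compile(make_pattern(), re.S)
IDPROG = re.compile(r"\s+(\w+)", re.S)


def colorize(text):
    """Return (tag, matched_text) pairs for every span the patterns recognize."""
    spans = []
    for m in PROG.finditer(text):
        # Only the named group of the alternative that matched is non-None.
        for tag, value in m.groupdict().items():
            if not value:
                continue
            spans.append((tag, value))
            # Also tag the name that follows 'def' or 'class'.
            if tag == "KEYWORD" and value in ("def", "class"):
                name = IDPROG.match(text, m.end(tag))
                if name:
                    spans.append(("DEFINITION", name.group(1)))
    return spans


print(colorize('def f(): return len("len")  # len'))
```

Because the closing quote in the string patterns is optional, an unterminated string is still colored up to the end of the line. That is one modest way a regex colorizer tolerates the incomplete code an editor sees while the user is typing, though it does nothing for constructs that need real lexing.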