From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Refactoring in a large code base
Date: Sat, 23 Jan 2016 01:08:06 +1100

On Sat, Jan 23, 2016 at 12:30 AM, Rustom Mody wrote:
> You just gave a graphic vivid description...
> of the same thing Marko is describing: ;-) viz.
> A full-size language parser is something that you - an experienced developer -
> make a point of avoiding.

It's worth noting that "experienced developer" covers a huge range of skills. There are quite a few other areas that I do not tinker with (crypto, CPU-level optimizations, and such), not because they're impossible to understand, but because *I* don't have the skill to understand and improve them. This does mean they're complicated (they're beyond the "one weekend of tinkering" barrier that any serious geek should be able to invest), but I'm sure there are language nerds out there who are so familiar with the grammar of [...] that they'll pick up CPython's grammar and make a change with confidence that it'll do what they expect.

> So then the question comes down to this: Is this the order of nature?
> Or is it man-made disorder?
> Jury's out on that one for lexers/parsers specifically.

Lexers/parsers are as complicated as the grammars they parse.
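To make the point about small grammars concrete, here is a minimal sketch of a JSON tokenizer that decides what to read next purely from the first character it sees. The function name `tokenize` and the token kind names are hypothetical, chosen for illustration. It assumes the input is a Python `str` and only checks that each token is well formed; matching brackets and commas is left to a parser.

```python
import re
from typing import Iterator, Tuple

# (kind, text) pairs; the kind names are illustrative, not from any library.
Token = Tuple[str, str]

# JSON number: optional '-', integer part without leading zeros,
# optional fraction, optional exponent.
NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
# JSON string: plain characters or backslash escapes between double quotes.
STRING = re.compile(r'"(?:[^"\\\x00-\x1f]|\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4}))*"')
KEYWORDS = ("true", "false", "null")
PUNCTUATION = "{}[]:,"
WHITESPACE = " \t\n\r"
DIGITS = "0123456789"


def tokenize(text: str) -> Iterator[Token]:
    """Yield JSON tokens, choosing what to read from the first character."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in WHITESPACE:
            pos += 1  # insignificant whitespace: skip it
        elif ch in PUNCTUATION:
            yield ("punct", ch)  # '{' and '[' open the two recursive structures
            pos += 1
        elif ch == '"':
            m = STRING.match(text, pos)
            if m is None:
                raise ValueError(f"malformed string at offset {pos}")
            yield ("string", m.group())
            pos = m.end()
        elif ch == "-" or ch in DIGITS:
            m = NUMBER.match(text, pos)
            if m is None:
                raise ValueError(f"malformed number at offset {pos}")
            yield ("number", m.group())
            pos = m.end()
        else:
            # Anything else must be one of the three keywords.
            for kw in KEYWORDS:
                if text.startswith(kw, pos):
                    yield ("keyword", kw)
                    pos += len(kw)
                    break
            else:
                raise ValueError(f"unexpected {ch!r} at offset {pos}")
```

For example, `list(tokenize('[1, true]'))` gives `[('punct', '['), ('number', '1'), ('punct', ','), ('keyword', 'true'), ('punct', ']')]`. The whole thing stays short because one character of lookahead is always enough. For real work, the standard library's `json` module already handles all of this.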
A lexer for a simple structured text file can be pretty easy to implement. JSON, for instance, is pretty straightforward, with only a handful of cases: insignificant whitespace, three keywords, two recursive structures that start with specific characters ('{' and '['), strings (which start with '"'), and numbers (which start with a digit or a hyphen). So a parser need only look for those few possibilities, and it knows exactly what else to fetch up. I could probably write a JSON parser in a fairly short space of time, and wouldn't be scared of digging into the internals of someone else's.

It's when the grammar adds complexities to deal with the real-world issues of full-size programming languages that it becomes hairier. The CPython grammar is only ~150 lines of fairly readable directives, but the parser that implements it is ~3500 lines of C code. Pike merges the two into a YACC file of nearly 5000 lines of highly optimized code (it has different grammar paths for things a human would consider the same, in order to produce distinct code). That's where I'm ubercautious.

> For arbitrary code in general, the problem that it may be arbitrarily and unboundedly
> complex/complicated is the oldest problem in computer science: the halting problem.
>
> IOW anyone who thinks that *arbitrary* complexity can *always* be tamed either
> has a visa to utopia or needs to re-evaluate (or get) a CS degree

Exactly.

ChrisA