From: INADA Naoki
Date: Tue, 17 Mar 2015 04:41:11 +0900
Subject: Re: Python 2 to 3 conversion - embrace the pain
To: Terry Reedy
Cc: python-list@python.org
Newsgroups: comp.lang.python

On Tue, Mar 17, 2015 at 2:47 AM, Terry Reedy wrote:
> On 3/16/2015 5:13 AM, INADA Naoki wrote:
>
>> Another experience is porting Flask application in my company from
>> Python 2 to Python 3.
>> It has 26k lines of code and 7.6k lines of tests.
>>
>> Since we don't need to support both of PY2 and PY3, we used 2to3.
>> 2to3 changes 740 lines.
>
>
> That is less than 3% of the lines. Were any changes incorrect? How many
> lines *not* flagged by 2to3 needed change?

All the changes were correct. Flask (and Werkzeug) handle most of the pain: an application using Flask already uses unicode almost everywhere on Python 2.

The few changes 2to3 couldn't handle looked like this:

- reader = DictReader(open(file_path, 'r'), delimiter='\t')
+ reader = DictReader(open(file_path, 'r', encoding='utf-8'), delimiter='\t')

Since the csv module in Python 2 doesn't support unicode, we had to parse CSV as bytestrings. And since our server doesn't have a UTF-8 locale, we have to specify the encoding explicitly on PY3 (otherwise open() falls back to the locale's encoding).

There were only a few (fewer than 10, maybe) easy problems like this.

>
>> I had to replace google-api-client with
>> requests+oauthlib since
>> it had not supported PY3 yet.
>
>
> Other than those needed for this change, which 2to3 could not anticipate or
> handle?
>
>> After that, we encountered few trouble with untested code. But Porting
>> effort is surprisingly small.
>> We're happy now with Python 3. We can write non-ascii string to log
>> without fear of UnicodeError.
>> We can use csv with unicode without hack.
>
>
> People who use ascii only or perhaps one encoding everywhere severely
> underestimate the benefit of unicode strings (and utf-8) everywhere.

I agree.

On Python 2 we can easily lose log messages: if formatting a message raises UnicodeDecodeError, the record is never written, which makes investigating bugs hard.

    >>> import logging
    >>> logging.error("%s %s", u'こんにちは', 'こんにちは')
    Traceback (most recent call last):
      ...
      File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 335, in getMessage
        msg = msg % self.args
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
    Logged from file , line 1

And a log line containing unicode is hard to read:

    >>> logging.error("%s", [u'こんにちは'])
    ERROR:root:[u'\u3053\u3093\u306b\u3061\u306f']

Python 3 makes our development faster and easier.
Experienced Python programmers know how to avoid the pitfalls of Python 2, so writing Python 2 is not painful for them. But when teaching Python to PHP programmers, teaching tons of pitfalls is painful. This is why I think new applications should start with Python 3.

>
>> Porting *modern* *application* code to *PY3 only* is easy, while
>> porting libraries on the edge of
>> bytes/unicode like google-api-client to PY2/3 is not easy.
>>
>> I think application developers should use *only* Python 3 from this year.
>> If we start moving, more library developers will be able to start
>> writing Python 3 only code from next year.
>
>
> --
> Terry Jan Reedy

-- 
INADA Naoki
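For readers porting similar code, here is a minimal Python 3 sketch of the two points in the post: reading a tab-separated CSV file with an explicit encoding, and logging non-ASCII text. It is an illustration under stated assumptions, not the code from the application described above. The function names read_tsv and capture_log, the logger name py3-demo, and the sample data are hypothetical. Unlike the one-line diff in the post, it opens the file in a with block and passes newline='', which the csv module documentation recommends for files handed to csv readers.

```python
import csv
import io
import logging


def read_tsv(file_path):
    """Read a UTF-8, tab-separated file into a list of dicts of str.

    Hypothetical helper. The encoding is passed explicitly so the result
    does not depend on the server's locale; newline='' follows the csv
    module documentation's advice for files given to csv readers.
    """
    with open(file_path, 'r', encoding='utf-8', newline='') as f:
        # Consume the reader while the file is still open.
        return list(csv.DictReader(f, delimiter='\t'))


def capture_log(*args):
    """Emit one ERROR record with the given arguments; return the formatted text.

    Hypothetical helper. On Python 3 every str is text, so mixing
    non-ASCII str arguments in a %-style logging call does not raise
    UnicodeDecodeError the way mixing unicode and bytes did on Python 2.
    """
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # Same layout as logging's default basicConfig format: LEVEL:name:message
    handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
    logger = logging.getLogger('py3-demo')  # hypothetical logger name
    logger.propagate = False  # write only to the in-memory stream
    logger.addHandler(handler)
    try:
        logger.error(*args)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

With this sketch, capture_log("%s %s", 'こんにちは', 'こんにちは') returns 'ERROR:py3-demo:こんにちは こんにちは\n', and capture_log("%s", ['こんにちは']) returns "ERROR:py3-demo:['こんにちは']\n", which stays readable instead of showing \u escapes.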