From: Ben Bacarisse
Newsgroups: comp.lang.python
Subject: Re: Unicode normalisation [was Re: [beginner] What's wrong?]
Date: Sat, 09 Apr 2016 19:27:01 +0100
Message-ID: <874mba63ka.fsf@bsb.me.uk>

Rustom Mody writes:

> On Saturday, April 9, 2016 at 7:14:05 PM UTC+5:30, Ben Bacarisse wrote:
>> The problem with that theory is that 'er/re' (this is e and r in either
>> order) is the 3rd most common pair in English but have been placed
>> together. ou and et (in either order) are the 15th and 22nd most common
>> and they are separated by only one hammer position. On the other hand,
>> the QWERTY layout puts jk together, but they almost never appear
>> together in English text.
>
> Where do you get this (kind of) statistical data?

It was generated by counting the pairs found in a corpus of texts taken
from Project Gutenberg.
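A minimal sketch of that kind of count, for anyone who wants to try it on their own texts. This is not the script that produced the figures quoted here; the function name `pair_counts` is made up for illustration, and it assumes that only letters a-z matter, that case is ignored, that pairs are counted within words only (not across spaces or punctuation), and that a pair and its reverse ('er' and 're') are counted as one.

```python
from collections import Counter
import re


def pair_counts(text):
    """Count adjacent letter pairs within words, ignoring order.

    Assumption: 'er' and 're' are the same pair, so each pair is stored
    with its two letters sorted alphabetically.
    """
    counts = Counter()
    # Split into runs of letters so pairs never span spaces or punctuation.
    for word in re.findall(r"[a-z]+", text.lower()):
        for a, b in zip(word, word[1:]):
            counts["".join(sorted(a + b))] += 1
    return counts


if __name__ == "__main__":
    # Stand-in sample; a real run would read the text of a whole corpus.
    sample = "The other reader entered here"
    for pair, n in pair_counts(sample).most_common(5):
        print(pair, n)
```

To rank pairs over a real corpus, read each file's text, pass it to `pair_counts`, and add the resulting counters together before calling `most_common`.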
The numbers do vary depending on what you pick (for the complete works
of Mark Twain er/re is second, for example), and none of the texts are
very modern (because of the source), but I doubt that matters too much.

-- 
Ben.