To: Dave Angel
From: Laura Creighton
Subject: Re: Newbie question about text encoding
In-Reply-To: Message from Dave Angel of "Tue, 24 Feb 2015 12:13:24 -0500."
<54ECB134.5090304@davea.name>
Date: Tue, 24 Feb 2015 20:45:54 +0100
Cc: python-list@python.org, lac@openend.se
Newsgroups: comp.lang.python

In a message of Tue, 24 Feb 2015 12:13:24 -0500, Dave Angel writes:
>With a sample of one string, how did you read "all his strings". And
>with one non-ASCII code in that single string, how did you know that
>'latin1' was the only encoding that included a reasonable character at
>that encoding?

Ah, 2 strings. And I did not promise that latin1 was the only encoding
that included a reasonable char at his encoding. I only promised that
it was one that did. And, given the nature of the data, I was pretty
sure that this was the one he wanted. If it did not work, he would come
back and complain.

>See http://support.esri.com/cn/knowledgebase/techarticles/detail/21106
>
>according to that page, starting at ArcGIS 10.2.1, the default sets the
>code page to UTF-8 (UNICODE) in the shapefile (.DBF)

Who cares. In Europe, among Europeans, we are used to seeing Latin1 or
Latin2.

>My guess is that this is only appropriate for users who use only locally
>created data.
>Since the OP's data is apparently old (if it were current
>versions, it'd have been utf-8), who knows how consistent the encoding is.

I do. Very much so. The idea that the whole world loves utf-8 is
nonsense. Most of Europe has been using latin1, latin2 etc. since before
unicode was invented and will, as far as I know, continue to use them.
Oldness is an indication that latin1 is more likely to be the encoding
than utf-8.

Your guess is that latin1 is only used in local encodings. My data
is that we in Western Europe have this format pretty much all of the
time, everywhere, unless you are only doing local encodings (in which
case you would use utf-8).

Laura
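
A minimal sketch, in Python, of the kind of fallback this disagreement is about. The function name decode_text and the sample byte strings are made up for illustration; they are not the OP's data. It tries strict utf-8 first and falls back to latin1. Since latin1 assigns a character to every byte value, the fallback never fails, which only shows that latin1 gives *some* character, not that it is the intended one.

```python
def decode_text(raw):
    """Return (text, encoding_used) for a byte string of unknown encoding."""
    try:
        # Strict utf-8 rejects most byte sequences that were written as latin1.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin1 maps each byte 0x00-0xFF to one code point, so this cannot
        # raise. Whether the result is the right text is a separate question.
        return raw.decode("latin-1"), "latin-1"


# Hypothetical samples: the same place name stored in two encodings.
for raw in (b"Malm\xf6", b"Malm\xc3\xb6"):
    text, encoding = decode_text(raw)
    print(repr(raw), "->", repr(text), "via", encoding)

# A single byte can decode to a plausible letter in more than one encoding,
# which is why one sample cannot settle latin1 versus latin2.
print(repr(b"\xf5".decode("latin-1")), repr(b"\xf5".decode("iso8859_2")))
```

If the data turns out to be latin2 or a Windows code page instead, the fallback codec name is the only thing that needs to change; the try-utf-8-first step stays the same.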