From: "Jacob Kruger"
Newsgroups: comp.lang.python
Subject: String character encoding when converting data from one type/format to another
Date: Wed, 7 Jan 2015 13:04:16 +0200

I'm busy using something like pyodbc to pull data out of MS Access .mdb files, and then generating .sql script files to execute against MySQL databases using the MySQLdb module. The issue is characters in string values that don't fit inside the 0-127 range. The current one seems to be something like \xa3, and if I pass it through the ord() function, it comes out as character number 163.
 
Now, yes, I could just run through the hundreds of thousands of characters in these resulting strings and strip out any that are not within the basic 0-127 range, but I think that could end up corrupting the data.
 
For example, if I try something like str('\xa3').encode('utf-8'), str('\xa3').encode('ascii'), or str('\xa3').encode('latin7') (that last one is actually our preferred encoding for the MySQL database), they all just tell me they can't work with a character out of range.
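
A minimal sketch of the decode-then-encode step that calls like these usually need. In Python 2, calling .encode() on a byte string first decodes it implicitly as ASCII, which is why all three attempts fail the same way regardless of the target encoding. The sketch below is written for Python 3, and it assumes the bytes coming out of Access are Windows-1252 (cp1252); that source encoding, the SOURCE_ENCODING constant, and the transcode helper are illustrative assumptions, not something confirmed for this data.

```python
# Assumption: the raw bytes from the .mdb file are Windows-1252 encoded.
SOURCE_ENCODING = "cp1252"  # hypothetical; check what the data really uses


def transcode(raw: bytes, target: str, source: str = SOURCE_ENCODING) -> bytes:
    """Decode raw bytes to text first, then encode the text for the target."""
    text = raw.decode(source)   # bytes -> text, using the assumed source encoding
    return text.encode(target)  # text -> bytes, in the encoding the database wants


raw = b"\xa3"
print(ord(raw.decode(SOURCE_ENCODING)))  # 163, the same number ord() reported
print(transcode(raw, "utf-8"))           # b'\xc2\xa3'
print(transcode(raw, "latin7"))          # bytes for the latin7 (ISO-8859-13) codec
```

The point of the two-step form is that the source encoding is stated explicitly instead of being guessed by the interpreter.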
 
Any thoughts on a generic way to handle any and all characters that might be out of range after pulling them out of something like these MS Access databases?
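
One possible shape for such a generic handler, sketched for Python 3 under the assumption that the source encoding is not known in advance. It tries a list of candidate encodings in order and falls back to latin-1, which maps every byte to a character and so never fails. The helper names and the candidate list are hypothetical, and a fallback like this can still pick the wrong encoding without raising an error, so the result is worth spot-checking.

```python
def decode_best_effort(raw: bytes, candidates=("utf-8", "cp1252")):
    """Return (text, encoding_used), trying each candidate strictly in order."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 accepts every byte value, so this last resort cannot raise.
    return raw.decode("latin-1"), "latin-1"


def encode_for_target(text: str, target: str = "latin7") -> bytes:
    """Encode for the database; unencodable characters become '?'."""
    return text.encode(target, errors="replace")


text, used = decode_best_effort(b"\xa3")
print(used)                     # which candidate accepted the bytes
print(encode_for_target(text))  # bytes ready for a latin7 column
```

Using errors="replace" keeps the export running, at the cost of turning characters the target cannot represent into question marks; errors="strict" would surface them instead.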
 
Another side note: for fields that store binary values, I use something like the following to generate hex-based strings, which work alright when inserting those same binary values into longblob fields. I don't think this would really help, though, for what are most likely just badly chosen strings copy/pasted from documents with some strange encoding:
#sample code line for binary encoding into string output
s_values += "0x" + str(l_data[J][I]).encode("hex").replace("\\", "\\\\") + ", "
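
For comparison, a rough Python 3 equivalent of the line above, assuming the binary values arrive as bytes objects. The helper names are hypothetical. Hex output only contains the characters 0-9 and a-f, so the backslash replacement in the original line has nothing to act on and is left out here. An empty value is emitted as an empty string literal, on the assumption that a bare 0x would not be accepted as a literal.

```python
def blob_literal(data: bytes) -> str:
    """Render bytes as a 0x... hex literal for use in a generated .sql script."""
    if not data:
        return "''"  # assumption: avoid emitting a bare "0x" for empty values
    return "0x" + data.hex()


def values_clause(row) -> str:
    """Join the hex literals for one row of binary values with ', '."""
    return ", ".join(blob_literal(value) for value in row)


print(values_clause([b"\x00\xffAB", b"\xa3"]))  # 0x00ff4142, 0xa3
```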
 
TIA

Jacob Kruger
Blind Biker
Skype: BlindZA
"Roger Wilco wants to welcome you...to the space janitor's closet..."