


Unicode

MIME-Version 1.0
Date Sun, 16 Dec 2012 22:10:37 +0100
Subject Unicode
From Anatoli Hristov <tolidtm@gmail.com>
To python-list@python.org
Content-Type text/plain; charset=UTF-8
Newsgroups comp.lang.python
Message-ID <mailman.941.1355692240.29569.python-list@python.org>



Hello guys,

I'm using CentOS Linux with Python 2.4 and MySQL 5.xx, and I get an error
with Unicode. I have tried many things I found on the net, but none of
them work.

If I don't use UTF-8, the data is inserted into the DB, but some French
characters are not decoded correctly. Could you please help me?
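The symptom described above, where French characters come out wrong, usually means bytes were decoded with a different encoding from the one that produced them. The following is a minimal illustration in Python 3, not the Python 2.4 used in the post. The sample word is arbitrary.

```python
# Encode French text to UTF-8 bytes, then decode those bytes with the
# wrong codec (Latin-1) to reproduce the typical "mojibake" symptom.
text = "Caractéristiques"
raw = text.encode("utf-8")      # bytes as they might arrive over HTTP

wrong = raw.decode("latin-1")   # each byte becomes one character: "é" -> "Ã©"
right = raw.decode("utf-8")     # decoded with the codec that produced them

print(wrong)
print(right)
```

Every UTF-8 multi-byte sequence turns into two or more Latin-1 characters. That is why accented letters, and only those, look garbled.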

Thanks

import urllib
import MySQLdb

def PrepareSpecs(product_id, icecat_prod_id, icecat_image_url, name):
    """Gets the specifications of a product from Icecat.biz and inserts
    them into the DB.
    """
    # language_id -> spec text. Only the French spec is decoded to unicode
    # here; the NL and EN specs are left as returned by GetSpecsNL/GetSpecsEN.
    specs = {3: GetSpecsNL(icecat_prod_id),
             2: GetSpecsFR(icecat_prod_id).decode('utf-8'),
             1: GetSpecsEN(icecat_prod_id)}
    SpecsToSQL(product_id, specs, name)
    CategorySQL(product_id)
    StoreSQL(product_id)
    GetIMG(icecat_image_url, icecat_prod_id)
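PrepareSpecs decodes only one of the three specs, so the dict ends up mixing unicode and byte strings. The sketch below decodes every entry in the same way at a single point. It assumes all three pages are UTF-8, uses Python 3, and relies on `to_text`, a hypothetical helper. The sample values are invented stand-ins for what GetSpecsNL/FR/EN might return.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical mixed results: one UTF-8 bytes value, one str, one ASCII bytes.
specs = {
    3: "Specificaties".encode("utf-8"),
    2: "Caractéristiques",
    1: b"Specifications",
}
# Normalise once, so everything downstream only ever sees text.
specs = {lang: to_text(spec) for lang, spec in specs.items()}
```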

def GetSpecsFR(icecat_prod_id):
    opener = urllib.FancyURLopener({})
    ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;"
                      "smi=product;shopname=openICEcat-url;lang=fr" % icecat_prod_id)
    specsfr = ffr.read()
    #specsfr = specsfr.decode('utf-8')
    specsfr = RemoveHTML(specsfr)
    ##specsfr = "%r" % specsfr
##    if specsfr:
##        try:
##            specsfr = str(specsfr)
##        except UnicodeEncodeError:
##            specsfr = str(specsfr.encode('utf-16'))
    return specsfr
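GetSpecsFR reads raw bytes without checking which encoding the server declares. One option is to take the charset from the response's Content-Type header and fall back to a default only when none is given. The sketch below does this in Python 3. `decode_body` is a hypothetical helper, and the encoding Icecat actually sends is not stated in the post. With `urllib.request.urlopen`, the response's `headers.get_content_charset()` returns the same value this helper extracts.

```python
from email.message import Message


def decode_body(body, content_type, default="utf-8"):
    """Decode raw HTTP body bytes using the charset named in Content-Type.

    Falls back to `default` when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset parameter lower-cased, or `default` if absent.
    charset = msg.get_content_charset(default)
    return body.decode(charset)


# Sample data: the same word sent as Latin-1 (declared) and UTF-8 (undeclared).
latin1_page = decode_body("Qualité".encode("latin-1"), "text/html; charset=ISO-8859-1")
utf8_page = decode_body("Qualité".encode("utf-8"), "text/html")
```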

def RemoveHTML(specs):
    specs = specs.replace("<html>","")
    specs = specs.replace("<HTML>","")
    specs = specs.replace("</html>","")
    specs = specs.replace("</HTML>","")
    specs = specs.replace("<head>","")
    specs = specs.replace("<HEAD>","")
    specs = specs.replace("</head>","")
    specs = specs.replace("</HEAD>","")
    specs = specs.replace("<body>","")
    specs = specs.replace("</body>","")
    specs = specs.replace("<BODY>","")
    specs = specs.replace("</BODY>","")
    specs = specs.replace("<TITLE>","")
    specs = specs.replace("</TITLE>","")
    specs = specs.replace("<title>","")
    specs = specs.replace("</title>","")
    specs = specs.replace("<p>","")
    specs = specs.replace("</p>","")
    return specs
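Listing each tag casing by hand is easy to get wrong. A single case-insensitive regular expression can strip the same set of bare tags in one pass. The sketch below uses Python 3, and `remove_wrapper_tags` is a hypothetical name. Like RemoveHTML, it only removes these exact bare tags. It is not an HTML parser, and tags with attributes such as `<p class="x">` are left untouched.

```python
import re

# Opening or closing html/head/body/title/p tags in any letter case,
# e.g. <BODY>, </Title>, <p>. Other tags such as <pre> do not match.
_WRAPPER_TAGS = re.compile(r"</?(?:html|head|body|title|p)\s*>", re.IGNORECASE)


def remove_wrapper_tags(specs):
    """Strip bare html/head/body/title/p tags; leave other markup alone."""
    return _WRAPPER_TAGS.sub("", specs)


cleaned = remove_wrapper_tags("<HTML><Body><p>Écran</p></BODY></html>")
```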

def SpecsToSQL(product_id, specs, name):
    for lang, spec in specs.iteritems():
        InsertSpecsDB(product_id, spec, lang, name)
    return
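SpecsToSQL passes `lang` and `name` positionally, so their order must match InsertSpecsDB's parameter list exactly. If the two disagree, the language id and the product name are silently swapped. Passing the arguments by keyword removes that dependency. The sketch below uses Python 3. `insert_specs_db` is a hypothetical stand-in that returns the row it would insert instead of touching a database.

```python
def insert_specs_db(product_id, spec, lang, name):
    """Hypothetical stand-in: return the (product_id, language_id, name,
    description) row that would be inserted."""
    return (product_id, lang, name, spec)


def specs_to_sql(product_id, specs, name):
    rows = []
    for lang, spec in specs.items():
        # Keyword arguments keep lang and name in the right slots even if
        # the parameter order of insert_specs_db changes.
        rows.append(insert_specs_db(product_id=product_id, spec=spec,
                                    lang=lang, name=name))
    return rows


rows = specs_to_sql(7, {2: "Qualité"}, "Laptop")
```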

def InsertSpecsDB(product_id, spec, lang, name):
    db = MySQLdb.connect("localhost", "getit", "opencart")
    cursor = db.cursor()
    sql = ("INSERT INTO product_description "
           "(product_id, language_id, name, description) VALUES (%s,%s,%s,%s)")
    params = (product_id, lang, name, spec)
    cursor.execute(sql, params)
    db.commit()  # MySQLdb does not autocommit; transactional tables need this
    row_id = cursor.lastrowid
    print "Updated ID %s description %s" % (row_id, lang)
    db.close()
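The usual approach is to keep text as unicode inside the program and let the database driver encode it. This means passing unicode parameters and telling the driver which connection character set to use. With MySQLdb 1.2.x, that is done by passing `charset="utf8"` (optionally with `use_unicode=True`) to `MySQLdb.connect`. The table's columns also need a character set that can hold the characters, for example utf8. So that the sketch runs without a MySQL server, it substitutes Python 3's built-in `sqlite3` for MySQLdb. Note that sqlite3 uses `?` placeholders where MySQLdb uses `%s`. The table mirrors the columns in the post's INSERT, and the sample values are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_description ("
    "product_id INTEGER, language_id INTEGER, name TEXT, description TEXT)"
)


def insert_spec(conn, product_id, spec, lang, name):
    """Insert one description row; spec and name are passed as str (text)."""
    cur = conn.execute(
        "INSERT INTO product_description "
        "(product_id, language_id, name, description) VALUES (?, ?, ?, ?)",
        (product_id, lang, name, spec),
    )
    conn.commit()
    return cur.lastrowid


row_id = insert_spec(conn, 42, "Écran tactile, clavier AZERTY", 2, "Portable")
stored = conn.execute(
    "SELECT description FROM product_description WHERE rowid = ?", (row_id,)
).fetchone()[0]
```

Because the accented text is passed as a parameter rather than formatted into the SQL string, the driver handles both quoting and encoding.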



Thread

Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-16 22:10 +0100
  Re: Unicode Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-17 06:06 +0000
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 09:59 +0100
    Re: Unicode Benjamin Kaplan <benjamin.kaplan@case.edu> - 2012-12-17 01:28 -0800
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 10:45 +0100
    Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 11:02 +0100
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 11:17 +0100
    Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 11:55 +0100
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 12:14 +0100
    Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 12:56 +0100
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 18:43 +0100
    Re: Unicode Dave Angel <d@davea.name> - 2012-12-17 13:07 -0500
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 19:36 +0100
      Re: Unicode Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-18 00:07 +0000
    Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 20:55 +0100
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 21:00 +0100
    Re: Unicode Dave Angel <d@davea.name> - 2012-12-17 16:09 -0500
      Re: Unicode Hans Mulder <hansmu@xs4all.nl> - 2012-12-17 23:02 +0100
        Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 23:33 +0100
    Re: Unicode Terry Reedy <tjreedy@udel.edu> - 2012-12-17 17:03 -0500
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 23:31 +0100
