Unicode

Date 2012-12-16 22:10 +0100
Subject Unicode
From Anatoli Hristov <tolidtm@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.941.1355692240.29569.python-list@python.org> (permalink)


Hello guys,

I'm using Linux CentOS and Python 2.4 with MySQL 5.xx, and I get an
error with Unicode. I tried many things that I found on the net, but
none of them works.

If I don't use UTF-8, it inserts the data into the DB, but some French
characters are not correctly decoded. Could you please help me?

Thanks

import urllib
import MySQLdb

def PrepareSpecs(product_id, icecat_prod_id, icecat_image_url, name):
    """Gets the specifications of a product from Icecat.biz and inserts
    them into the DB
    """
    # Keys identify the language (3 = NL, 2 = FR, 1 = EN)
    specs = {3: GetSpecsNL(icecat_prod_id),
             2: GetSpecsFR(icecat_prod_id).decode('utf-8'),
             1: GetSpecsEN(icecat_prod_id)}
    SpecsToSQL(product_id, specs, name)
    CategorySQL(product_id)
    StoreSQL(product_id)
    GetIMG(icecat_image_url, icecat_prod_id)
    return

def GetSpecsFR(icecat_prod_id):
    opener = urllib.FancyURLopener({})
    url = ("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;"
           "smi=product;shopname=openICEcat-url;lang=fr" % icecat_prod_id)
    ffr = opener.open(url)
    specsfr = ffr.read()
    #specsfr = specsfr.decode('utf-8')
    specsfr = RemoveHTML(specsfr)
    ##specsfr = "%r" % specsfr
##    if specsfr:
##        try:
##            specsfr = str(specsfr)
##        except UnicodeEncodeError:
##            specsfr = str(specsfr.encode('utf-16'))
    return specsfr

def RemoveHTML(specs):
    specs = specs.replace("<html>","")
    specs = specs.replace("<HTML>","")
    specs = specs.replace("</html>","")
    specs = specs.replace("</HTML>","")
    specs = specs.replace("<head>","")
    specs = specs.replace("<HEAD>","")
    specs = specs.replace("</head>","")
    specs = specs.replace("</HEAD>","")
    specs = specs.replace("<body>","")
    specs = specs.replace("</body>","")
    specs = specs.replace("<BODY>","")
    specs = specs.replace("</BODY>","")
    specs = specs.replace("<TITLE>","")
    specs = specs.replace("</TITLE>","")
    specs = specs.replace("<title>","")
    specs = specs.replace("</title>","")
    specs = specs.replace("<p>","")
    specs = specs.replace("</p>","")
    return specs
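
A side note on RemoveHTML: the chain of replace() calls only catches the exact spellings listed, so a tag written as <Body> or <Title> would slip through. A minimal sketch of the same idea with one case-insensitive regular expression follows. The names remove_html_tags and _TAG_RE are made up for illustration and are not part of the original code; like the original, it only handles bare tags without attributes.

```python
import re

# Matches the same opening/closing tags as RemoveHTML, in any letter case.
# Tags that carry attributes (e.g. <p class="x">) are NOT matched.
_TAG_RE = re.compile(r"</?(?:html|head|body|title|p)>", re.IGNORECASE)

def remove_html_tags(specs):
    """Strip the handful of bare tags that RemoveHTML targets."""
    return _TAG_RE.sub("", specs)

sample = "<HTML><Body><p>Processeur</p></BODY></html>"
print(remove_html_tags(sample))
```

This is not an HTML parser; for anything beyond these few wrapper tags a real parser would probably be the safer choice.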

def SpecsToSQL(product_id, specs, name):
    for lang, spec in specs.iteritems():
        # Argument order must match InsertSpecsDB(product_id, spec, name, lang)
        InsertSpecsDB(product_id, spec, name, lang)
    return

def InsertSpecsDB(product_id, spec, name, lang):
    db = MySQLdb.connect("localhost","getit","opencart")
    cursor = db.cursor()
    sql = ("INSERT INTO product_description (product_id, language_id, "
           "name, description) VALUES (%s,%s,%s,%s)")
    params = (product_id, lang, name, spec)
    cursor.execute(sql, params)
    id = cursor.lastrowid
    print "Updated ID %s description %s" % (int(id), lang)
    return
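
A note on the decoding step: in the code above only the French text is decoded, so the specs dictionary mixes text and byte strings. One common approach is to decode the bytes once, right after reading them, and work with text from then on. The sketch below assumes the page is served as UTF-8, which the post does not confirm. The helper to_text and the sample bytes are hypothetical and only for illustration. It is written for current Python, where bytes is the byte-string type; on Python 2.4 that role is played by str.

```python
def to_text(raw, encoding="utf-8"):
    """Decode a byte string into text; pass existing text through unchanged.

    The default encoding is an assumption, not something the post states.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw

# UTF-8 bytes as they might come back from ffr.read(): "Caractéristiques"
raw_fr = b"Caract\xc3\xa9ristiques"
text_fr = to_text(raw_fr)

# The accented letter is two bytes on the wire but one character as text.
print(len(raw_fr), len(text_fr))

# Encoding again with the same codec gives back the original bytes.
assert text_fr.encode("utf-8") == raw_fr
```

On the database side, MySQLdb's connect() accepts charset and use_unicode keyword arguments; whether they help here depends on how the table and connection are configured, which the post does not say.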


Thread

Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-16 22:10 +0100
  Re: Unicode Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-17 06:06 +0000
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 09:59 +0100
    Re: Unicode Benjamin Kaplan <benjamin.kaplan@case.edu> - 2012-12-17 01:28 -0800
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 10:45 +0100
    Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 11:02 +0100
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 11:17 +0100
    Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 11:55 +0100
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 12:14 +0100
    Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 12:56 +0100
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 18:43 +0100
    Re: Unicode Dave Angel <d@davea.name> - 2012-12-17 13:07 -0500
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 19:36 +0100
      Re: Unicode Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-18 00:07 +0000
    Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 20:55 +0100
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 21:00 +0100
    Re: Unicode Dave Angel <d@davea.name> - 2012-12-17 16:09 -0500
      Re: Unicode Hans Mulder <hansmu@xs4all.nl> - 2012-12-17 23:02 +0100
        Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 23:33 +0100
    Re: Unicode Terry Reedy <tjreedy@udel.edu> - 2012-12-17 17:03 -0500
    Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 23:31 +0100
