

Groups > comp.lang.python > #35884 > unrolled thread

Handling Special characters in python

Started by: anilkumar.dannina@gmail.com
First post: 2013-01-01 03:35 -0800
Last post: 2013-01-01 20:46 -0800
Articles 9 — 4 participants



Contents

  Handling Special characters in python anilkumar.dannina@gmail.com - 2013-01-01 03:35 -0800
    Re: Handling Special characters in python Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-01 12:01 +0000
    Re: Handling Special characters in python Chris Rebert <chris@rebertia.com> - 2013-01-01 10:30 -0800
      Re: Handling Special characters in python anilkumar.dannina@gmail.com - 2013-01-01 20:46 -0800
        Re: Handling Special characters in python Chris Rebert <clp2@rebertia.com> - 2013-01-01 22:32 -0800
          Re: Handling Special characters in python anilkumar.dannina@gmail.com - 2013-01-02 05:39 -0800
            Re: Handling Special characters in python Chris Rebert <clp2@rebertia.com> - 2013-01-02 22:00 -0800

#35884 — Handling Special characters in python

From: anilkumar.dannina@gmail.com
Date: 2013-01-01 03:35 -0800
Subject: Handling Special characters in python
Message-ID: <b1b99ce0-6088-4c80-9e43-ac12e7799b92@googlegroups.com>
I am facing an issue in my module. I am gathering data from a SQL Server database. The data that I get from the database contains special characters like "endash". Python returns it as "\x96", but I need the actual character (endash). How can I do that? Can you please help me resolve this issue?

Waiting for your reply.

Thanks,
D Anil Kumar



#35886

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-01-01 12:01 +0000
Message-ID: <50e2d02f$0$30003$c3e8da3$5496439d@news.astraweb.com>
In reply to: #35884
On Tue, 01 Jan 2013 03:35:56 -0800, anilkumar.dannina wrote:

> I am facing one issue in my module. I am gathering data from sql server
> database. In the data that I got from db contains special characters
> like "endash". Python was taking it as "\x96". I require the same
> character(endash). How can I perform that. Can you please help me in
> resolving this issue.


"endash" is not a character, it is six characters.

On the other hand, "\x96" is a single character when written as a
Unicode string:

py> c = u"\x96"
py> assert len(c) == 1


But it is not a printable Unicode character. U+0096 is a C1 control
code with no name:

py> import unicodedata
py> unicodedata.name(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name


So it is probably not meant as a Unicode character; it is probably a byte.

py> c = "\x96"
py> print c
�


To convert byte 0x96 to an en dash character, you need to identify the
encoding to use.

(Aside: and *stop* using it. It is 2013 now, anyone who is not using
UTF-8 is doing it wrong. Legacy encodings are still necessary for legacy
data, but any new data should always use UTF-8.)

CP 1252 is one possible encoding, but there may be others:

py> uc = c.decode('cp1252')
py> unicodedata.name(uc)
'EN DASH'



-- 
Steven
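
A minimal Python 3 sketch of the same check, where bytes and text are separate types. The helper name `describe` and the list of candidate encodings are assumptions chosen for illustration; the thread itself only establishes CP 1252 as one possibility, and the real encoding depends on how the database and driver are configured.

```python
import unicodedata


def describe(raw, encodings=("utf-8", "latin-1", "cp1252")):
    """Decode raw bytes under each candidate encoding.

    Returns a dict mapping encoding name to a list of Unicode character
    names, or to None when the bytes are not valid in that encoding.
    `describe` is a hypothetical helper, not part of any library.
    """
    results = {}
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
            continue
        # Control characters such as U+0096 have no name, so supply a default.
        results[encoding] = [unicodedata.name(ch, "<unnamed>") for ch in text]
    return results


for encoding, names in describe(b"\x96").items():
    print(encoding, names)
```

For the single byte 0x96, UTF-8 rejects the input, Latin-1 yields the unnamed control character U+0096, and CP 1252 yields EN DASH, which matches the result shown above.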



#35906

From: Chris Rebert <chris@rebertia.com>
Date: 2013-01-01 10:30 -0800
Message-ID: <mailman.1527.1357065016.29569.python-list@python.org>
In reply to: #35884


On Jan 1, 2013 3:41 AM, <anilkumar.dannina@gmail.com> wrote:
>
> I am facing one issue in my module. I am gathering data from sql server
> database. In the data that I got from db contains special characters like
> "endash". Python was taking it as "\x96". I require the same
> character(endash). How can I perform that. Can you please help me in
> resolving this issue.

1. What library are you using to access the database?
2. To confirm, it's a Microsoft SQL Server database?
3. What OS are you on?



#35957

From: anilkumar.dannina@gmail.com
Date: 2013-01-01 20:46 -0800
Message-ID: <90663ba3-2307-45ed-a1b7-c3dbe5130ebd@googlegroups.com>
In reply to: #35906
On Wednesday, January 2, 2013 12:00:06 AM UTC+5:30, Chris Rebert wrote:
> On Jan 1, 2013 3:41 AM, <anilkuma...@gmail.com> wrote:
> > I am facing one issue in my module. I am gathering data from sql server database. In the data that I got from db contains special characters like "endash". Python was taking it as "\x96". I require the same character(endash). How can I perform that. Can you please help me in resolving this issue.
>
> 1. What library are you using to access the database?
> 2. To confirm, it's a Microsoft SQL Server database?
> 3. What OS are you on?



1. I am using "pymssql" module to access the database.
2. Yes, it is a SQL Server database.
3. I am on Ubuntu 11.10



#35964

From: Chris Rebert <clp2@rebertia.com>
Date: 2013-01-01 22:32 -0800
Message-ID: <mailman.1552.1357108357.29569.python-list@python.org>
In reply to: #35957


On Jan 1, 2013 8:48 PM, <anilkumar.dannina@gmail.com> wrote:
> On Wednesday, January 2, 2013 12:00:06 AM UTC+5:30, Chris Rebert wrote:
> > On Jan 1, 2013 3:41 AM, <anilkuma...@gmail.com> wrote:
> >
> > > I am facing one issue in my module. I am gathering data from sql
> > > server database. In the data that I got from db contains special characters
> > > like "endash". Python was taking it as "\x96". I require the same
> > > character(endash). How can I perform that. Can you please help me in
> > > resolving this issue.
> >
> > 1. What library are you using to access the database?
> > 2. To confirm, it's a Microsoft SQL Server database?
> > 3. What OS are you on?
>
> 1. I am using "pymssql" module to access the database.
> 2. Yes, It is a SQL server database.
> 3. I am on Ubuntu 11.10

Did you set "client charset" (to "UTF-8", unless you have good reason to
choose otherwise) in freetds.conf? That should at least ensure that the
driver itself is exchanging bytestrings via a well-defined encoding.
If you want to work in Unicode natively (recommended), you'll probably need
to ensure that the columns are of type NVARCHAR as opposed to VARCHAR.
Unless you're using SQLAlchemy or similar (which I personally would
recommend using), you may need to do the .encode() and .decode()-ing
manually, using the charset you specified in freetds.conf.

Sorry my advice is a tad general. I went the alternative route of
SQLAlchemy + PyODBC + Microsoft's SQL Server ODBC driver for Linux (
http://www.microsoft.com/en-us/download/details.aspx?id=28160 ) for my
current project, which likewise needs to fetch data from MS SQL to an
Ubuntu box. The driver is intended for Red Hat and isn't packaged nicely
(it installs via a shell script), but after that was dealt with, things
have gone smoothly. Unicode, in particular, seems to work properly.
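
The manual decoding step mentioned above could look roughly like the following Python 3 sketch. It assumes the driver hands back byte strings for text columns and that `CLIENT_CHARSET` matches the "client charset" value in freetds.conf; both the constant and the `decode_row` helper are hypothetical names, and no real database call is shown.

```python
# Assumption: this must match "client charset" in freetds.conf.
CLIENT_CHARSET = "utf-8"


def decode_row(row, charset=CLIENT_CHARSET):
    """Return a copy of a result row with every bytes value decoded to str.

    Non-bytes values (numbers, None, already-decoded text) pass through
    unchanged. `decode_row` is an illustrative helper, not a driver API.
    """
    return tuple(
        value.decode(charset) if isinstance(value, bytes) else value
        for value in row
    )


# A stand-in for a fetched row: an id, a UTF-8 encoded string containing
# an en dash (U+2013), and a NULL column.
sample_row = (42, "2012\u20132013".encode("utf-8"), None)
print(decode_row(sample_row))
```

Decoding at the boundary, once per row, keeps the rest of the program working with text only. If the configured charset and the bytes actually sent disagree, `decode` raises `UnicodeDecodeError` rather than silently producing wrong characters.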



#35978

From: anilkumar.dannina@gmail.com
Date: 2013-01-02 05:39 -0800
Message-ID: <310c83c4-cfa2-4425-b291-d1a3604b3e29@googlegroups.com>
In reply to: #35964
On Wednesday, January 2, 2013 12:02:34 PM UTC+5:30, Chris Rebert wrote:
> On Jan 1, 2013 8:48 PM, <anilkuma...@gmail.com> wrote:
> > On Wednesday, January 2, 2013 12:00:06 AM UTC+5:30, Chris Rebert wrote:
> > > On Jan 1, 2013 3:41 AM, <anilkuma...@gmail.com> wrote:
> > > > I am facing one issue in my module. I am gathering data from sql server database. In the data that I got from db contains special characters like "endash". Python was taking it as "\x96". I require the same character(endash). How can I perform that. Can you please help me in resolving this issue.
> > >
> > > 1. What library are you using to access the database?
> > > 2. To confirm, it's a Microsoft SQL Server database?
> > > 3. What OS are you on?
> >
> > 1. I am using "pymssql" module to access the database.
> > 2. Yes, It is a SQL server database.
> > 3. I am on Ubuntu 11.10
>
> Did you set "client charset" (to "UTF-8", unless you have good reason to choose otherwise) in freetds.conf? That should at least ensure that the driver itself is exchanging bytestrings via a well-defined encoding.
>
> If you want to work in Unicode natively (Recommended), you'll probably need to ensure that the columns are of type NVARCHAR as opposed to VARCHAR. Unless you're using SQLAlchemy or similar (which I personally would recommend using), you may need to do the .encode() and .decode()-ing manually, using the charset you specified in freetds.conf.
>
> Sorry my advice is a tad general. I went the alternative route of SQLAlchemy + PyODBC + Microsoft's SQL Server ODBC driver for Linux (http://www.microsoft.com/en-us/download/details.aspx?id=28160 ) for my current project, which likewise needs to fetch data from MS SQL to an Ubuntu box. The driver is intended for Red Hat and isn't packaged nicely (it installs via a shell script), but after that was dealt with, things have gone smoothly. Unicode, in particular, seems to work properly.



Thanks, Chris Rebert, for your suggestion. I tried the PyODBC module, but in place of the "en dash(-)" I am getting a '?' symbol. How can I overcome this?



#36037

From: Chris Rebert <clp2@rebertia.com>
Date: 2013-01-02 22:00 -0800
Message-ID: <mailman.20.1357192819.2939.python-list@python.org>
In reply to: #35978
On Wed, Jan 2, 2013 at 5:39 AM,  <anilkumar.dannina@gmail.com> wrote:
> On Wednesday, January 2, 2013 12:02:34 PM UTC+5:30, Chris Rebert wrote:
>> On Jan 1, 2013 8:48 PM, <anilkuma...@gmail.com> wrote:
>> > On Wednesday, January 2, 2013 12:00:06 AM UTC+5:30, Chris Rebert wrote:
>> > > On Jan 1, 2013 3:41 AM, <anilkuma...@gmail.com> wrote:
>> > > > I am facing one issue in my module. I am gathering data from sql server database. In the data that I got from db contains special characters like "endash". Python was taking it as "\x96". I require the same character(endash). How can I perform that. Can you please help me in resolving this issue.
>>
>> > > 1. What library are you using to access the database?
>> > > 2. To confirm, it's a Microsoft SQL Server database?
>> > > 3. What OS are you on?
>>
>> > 1. I am using "pymssql" module to access the database.
>> > 2. Yes, It is a SQL server database.
>> > 3. I am on Ubuntu 11.10
>>
>> Did you set "client charset" (to "UTF-8", unless you have good reason to choose otherwise) in freetds.conf? That should at least ensure that the driver itself is exchanging bytestrings via a well-defined encoding.
>> If you want to work in Unicode natively (Recommended), you'll probably need to ensure that the columns are of type NVARCHAR as opposed to VARCHAR. Unless you're using SQLAlchemy or similar (which I personally would recommend using), you may need to do the .encode() and .decode()-ing manually, using the charset you specified in freetds.conf.
>>
>> Sorry my advice is a tad general. I went the alternative route of SQLAlchemy + PyODBC + Microsoft's SQL Server ODBC driver for Linux (http://www.microsoft.com/en-us/download/details.aspx?id=28160 ) for my current project, which likewise needs to fetch data from MS SQL to an Ubuntu box. The driver is intended for Red Hat and isn't packaged nicely (it installs via a shell script), but after that was dealt with, things have gone smoothly. Unicode, in particular, seems to work properly.
>
> Thanks Chris Rebert for your suggestion, I tried with PyODBC module, But at the place of "en dash(-)", I am getting '?' symbol. How can I overcome this.

I would recommend first trying the advice in the initial part of my
response rather than the latter part. The latter part was more for
completeness and for the sake of the archives, although I can give
more details on its approach if you insist.

Additionally, giving more information as to what exactly you tried
would be helpful. What config / connection settings did you use? What
is the datatype of the relevant column of the table? What does your
code snippet look like? Etc.

Regards,
Chris
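
When gathering the details requested above, it can help to report the fetched value in a form that does not depend on the terminal's own encoding. The following Python 3 sketch uses a hypothetical helper, `inspect_value`, to show raw byte values or code points. It also shows one common way a '?' can arise, namely a lossy conversion to a charset that cannot represent the character; whether that is what happened in this case is an assumption the thread does not confirm.

```python
def inspect_value(value):
    """Describe a fetched value as hex bytes or as Unicode code points.

    `inspect_value` is an illustrative helper, not part of any driver.
    """
    if isinstance(value, bytes):
        return "bytes: " + " ".join(f"{b:02x}" for b in value)
    return "text: " + " ".join(f"U+{ord(ch):04X}" for ch in value)


# The byte originally reported, and the character that was wanted.
print(inspect_value(b"\x96"))
print(inspect_value("\u2013"))

# Encoding an en dash to ASCII with errors="replace" substitutes "?".
lossy = "\u2013".encode("ascii", errors="replace")
print(inspect_value(lossy))
```

If the value is already a literal '?' (byte 3f or U+003F) when it reaches Python, the substitution happened earlier, in the driver or server conversion, and no amount of decoding on the Python side can recover the original character.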






