Groups > comp.lang.python > #104631 > unrolled thread
| Started by | Fillmore <fillmore_remove@hotmail.com> |
|---|---|
| First post | 2016-03-11 14:41 -0500 |
| Last post | 2016-03-11 16:29 -0500 |
| Articles | 13 — 8 participants |
issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 14:41 -0500
Re: issue with CVS module Joel Goldstick <joel.goldstick@gmail.com> - 2016-03-11 15:05 -0500
Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 15:32 -0500
Re: issue with CVS module mm0fmf <none@invalid.com> - 2016-03-11 21:04 +0000
Re: issue with CVS module Ben Finney <ben+python@benfinney.id.au> - 2016-03-12 08:13 +1100
Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 15:49 -0500
Re: issue with CVS module MRAB <python@mrabarnett.plus.com> - 2016-03-11 21:14 +0000
Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 16:23 -0500
Re: issue with csv module (subject module name spelling correction, too) "Martin A. Brown" <martin@linux-ip.net> - 2016-03-11 13:20 -0800
Re: issue with CVS module Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-11 21:15 +0000
Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 16:26 -0500
Re: issue with CVS module alister <alister.ware@ntlworld.com> - 2016-03-12 09:49 +0000
Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 16:29 -0500
| From | Fillmore <fillmore_remove@hotmail.com> |
|---|---|
| Date | 2016-03-11 14:41 -0500 |
| Subject | issue with CVS module |
| Message-ID | <nbv71v$1fj8$1@gioia.aioe.org> |
I have a TSV file containing a few strings like this (double quotes are
part of the string):
'"pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52"'
After Python and the CVS module has read the file and re-printed the
value, the string has become:
'pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52'
which is NOT good for me. I went back to Perl and noticed that Perl was
correctly leaving the original string intact.
This is what I am using to read the file:
with open(file, newline='') as csvfile:
    myReader = csv.reader(csvfile, delimiter='\t')
    for row in myReader:
and this is what I use to write the cell value
sys.stdout.write(row[0])
Is there some directive I can give CVS reader to tell it to stop
screwing with my text?
Thanks
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2016-03-11 15:05 -0500 |
| Message-ID | <mailman.3.1457726734.12893.python-list@python.org> |
| In reply to | #104631 |
On Fri, Mar 11, 2016 at 2:41 PM, Fillmore <fillmore_remove@hotmail.com> wrote:
>
> I have a TSV file containing a few strings like this (double quotes are
> part of the string):
>
> '"pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52"'
>
> After Python and the CVS module has read the file and re-printed the
> value, the string has become:
>
> 'pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52'
>
> which is NOT good for me. I went back to Perl and noticed that Perl was
> correctly leaving the original string intact.
>
> This is what I am using to read the file:
>
> with open(file, newline='') as csvfile:
>     myReader = csv.reader(csvfile, delimiter='\t')
>     for row in myReader:
>
> and this is what I use to write the cell value
>
> sys.stdout.write(row[0])
>
> Is there some directive I can give CVS reader to tell it to stop screwing
> with my text?
>
> Thanks
> --
> https://mail.python.org/mailman/listinfo/python-list
>
Enter the python shell. Import csv

then type help(csv)

It is highly configurable

--
Joel Goldstick
http://joelgoldstick.com/ <http://joelgoldstick.com/stats/birthdays>
http://cc-baseballstats.info/
| From | Fillmore <fillmore_remove@hotmail.com> |
|---|---|
| Date | 2016-03-11 15:32 -0500 |
| Message-ID | <nbva0l$1m7a$1@gioia.aioe.org> |
| In reply to | #104634 |
On 3/11/2016 3:05 PM, Joel Goldstick wrote:
>
> Enter the python shell. Import csv
>
> then type help(csv)
>
> It is highly configurable
>
Possibly, but I am having a hard time letting it know that it should
leave each and every char alone, ignore quoting and just handle strings
as strings. I tried playing with the quoting related parameters, to no
avail:
Traceback (most recent call last):
File "./myscript.py", line 47, in <module>
myReader = csv.reader(csvfile, delimiter='\t',quotechar='')
TypeError: quotechar must be set if quoting enabled
I tried adding CVS.QUOTE_NONE, but things get messy :(
Traceback (most recent call last):
File "./myscript.py", line 64, in <module>
sys.stdout.write("\t"+row[h])
IndexError: list index out of range
Sorry for being a pain, but I am porting from Perl and split
/\t/,$line; was doing the job for me. Maybe I should go back to split on
'\t' for python too...
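[Editor's note] The traceback above does not say why the index was out of range. One possible cause, which the thread never confirms, is a row with fewer fields than expected: `csv.reader` returns an empty list for a blank input line. The sketch below only illustrates that case; the sample data and the `expected` column count are assumptions made up for the example.

```python
import csv

# Hypothetical sample data: the blank second line is an assumption,
# chosen only to show how a short row can arise.
lines = ['"a"\tb\tc', '', 'd\te\tf']

expected = 3  # assumed number of columns, not taken from the thread
rows = []
for row in csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE):
    if len(row) < expected:
        # A blank input line comes back as an empty list, so indexing
        # into it would raise IndexError; skip it instead.
        continue
    rows.append(row)

print(rows)
```

Checking `len(row)` before indexing is a defensive choice; whether short rows should be skipped or reported depends on the data.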
| From | mm0fmf <none@invalid.com> |
|---|---|
| Date | 2016-03-11 21:04 +0000 |
| Message-ID | <nbvbo4$drk$1@dont-email.me> |
| In reply to | #104635 |
On 11/03/2016 20:32, Fillmore wrote:
> myReader = csv.reader(csvfile, delimiter='\t',quotechar='')

From reading that the quotechar is null. You have a single quote and
single quote with nothing in the middle.

Try this:

myReader = csv.reader(csvfile, delimiter='\t',quotechar="'")

i.e doublequote singlequote doublequote

or the other way

myReader = csv.reader(csvfile, delimiter='\t',quotechar='"')

I haven't tried this, so it may be nonsense.
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2016-03-12 08:13 +1100 |
| Message-ID | <mailman.4.1457730844.12893.python-list@python.org> |
| In reply to | #104635 |
Fillmore <fillmore_remove@hotmail.com> writes:

> Possibly, but I am having a hard time letting it know that it should
> leave each and every char alone

You're using the wrong module, then. To use the ‘csv’ module is to have
the sequence of characters parsed to extract component values, which
cannot also “leave each and every character alone”.

If you want to “leave each and every character alone”, don't parse the
data as CSV. Instead, read the file as a simple text file.

--
 \     “I have always wished for my computer to be as easy to use as |
  `\     my telephone; my wish has come true because I can no longer |
_o__)        figure out how to use my telephone.” —Bjarne Stroustrup |
Ben Finney
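[Editor's note] A minimal sketch of the plain-text approach suggested above, assuming every record sits on one line and no field contains an embedded tab or newline. The `io.StringIO` object stands in for the real file, and its content is made up for illustration.

```python
import io

# Stand-in for the real file; the sample content is an assumption.
tsvfile = io.StringIO('"Please preserve my doublequotes"\ttext1\ttext2\n')

rows = []
for line in tsvfile:
    # Strip only the line ending, then split on tabs; no character
    # inside a field is interpreted or removed.
    rows.append(line.rstrip('\n').split('\t'))

print(rows[0][0])
```

The trade-off is that a plain split gives up everything the csv module does for quoted fields, so it only fits data where quotes carry no special meaning.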
| From | Fillmore <fillmore_remove@hotmail.com> |
|---|---|
| Date | 2016-03-11 15:49 -0500 |
| Message-ID | <nbvb0s$1nps$1@gioia.aioe.org> |
| In reply to | #104631 |
On 3/11/2016 2:41 PM, Fillmore wrote:
> Is there some directive I can give CVS reader to tell it to stop
> screwing with my text?
OK, I think I reproduced my problem at the REPL:
>>> import csv
>>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>>> reader = csv.reader([s], delimiter='\t')
>>> for row in reader:
...     print(row[0])
...
Please preserve my doublequotes
>>>
:(
How do I instruct the reader to preserve my doublequotes?
As an aside. split() performs the job correctly...
>>> allVals = s.split("\t")
>>> print(allVals[0])
"Please preserve my doublequotes"
>>>
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2016-03-11 21:14 +0000 |
| Message-ID | <mailman.5.1457730898.12893.python-list@python.org> |
| In reply to | #104636 |
On 2016-03-11 20:49, Fillmore wrote:
> On 3/11/2016 2:41 PM, Fillmore wrote:
>> Is there some directive I can give CVS reader to tell it to stop
>> screwing with my text?
>
> OK, I think I reproduced my problem at the REPL:
>
> >>> import csv
> >>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
> >>> reader = csv.reader([s], delimiter='\t')
> >>> for row in reader:
> ... print(row[0])
> ...
> Please preserve my doublequotes
> >>>
>
> :(
>
> How do I instruct the reader to preserve my doublequotes?
>
> As an aside. split() performs the job correctly...
>
> >>> allVals = s.split("\t")
> >>> print(allVals[0])
> "Please preserve my doublequotes"
> >>>
>
>>> import csv
>>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>>> reader = csv.reader([s], delimiter='\t', quotechar=None)
>>> for row in reader:
...     print(row[0])
...
"Please preserve my doublequotes"
>>>
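[Editor's note] The same comparison as a standalone script rather than a REPL session: a sketch that reads one line with the default settings and once with `quotechar=None`, as in the reply above. The variable names `default_row` and `raw_row` are introduced here for illustration only.

```python
import csv

s = '"Please preserve my doublequotes"\ttext1\ttext2'

# Default dialect: '"' is the quote character, so the reader strips it.
default_row = next(csv.reader([s], delimiter='\t'))

# quotechar=None, as in the reply above: the quotes are kept as data.
raw_row = next(csv.reader([s], delimiter='\t', quotechar=None))

print(default_row[0])
print(raw_row[0])
```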
| From | Fillmore <fillmore_remove@hotmail.com> |
|---|---|
| Date | 2016-03-11 16:23 -0500 |
| Message-ID | <nbvd1b$1rak$1@gioia.aioe.org> |
| In reply to | #104639 |
On 3/11/2016 4:14 PM, MRAB wrote:
>>
> >>> import csv
> >>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
> >>> reader = csv.reader([s], delimiter='\t', quotechar=None)
> >>> for row in reader:
> ...     print(row[0])
> ...
> "Please preserve my doublequotes"
> >>>

This worked! thank you MRAB
| From | "Martin A. Brown" <martin@linux-ip.net> |
|---|---|
| Date | 2016-03-11 13:20 -0800 |
| Subject | Re: issue with csv module (subject module name spelling correction, too) |
| Message-ID | <mailman.7.1457731250.12893.python-list@python.org> |
| In reply to | #104636 |
Good afternoon Fillmore,
>>>> import csv
>>>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>>>> reader = csv.reader([s], delimiter='\t')
> How do I instruct the reader to preserve my doublequotes?
Change the quoting used by the dialect on the csv reader instance:
reader = csv.reader([s], delimiter='\t', quoting=csv.QUOTE_NONE)
You can use the same technique for the writer.
If you cannot create your particular (required) variant of csv by
tuning the available parameters in the csv module's dialect control,
I'd be a touch surprised, but, it is possible that your other csv
readers and writers are more finicky.
Did you see the parameters that are available to you for tuning how
the csv module turns your csv data into records?
https://docs.python.org/3/library/csv.html#dialects-and-formatting-parameters
Judging from your example, you definitely want to use
quoting=csv.QUOTE_NONE, because you don't want the module to do much
more than split('\t').
Good luck,
-Martin
--
Martin A. Brown
http://linux-ip.net/
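[Editor's note] A sketch of the `quoting=csv.QUOTE_NONE` suggestion applied to both a reader and a writer, assuming the goal is a byte-for-byte round trip. The in-memory `src` and `dst` buffers and the sample line are assumptions standing in for real files. On the writer side `quotechar=None` is passed as well, so that there is no quote character the writer would need to escape.

```python
import csv
import io

# In-memory stand-ins for the input and output files (assumed data).
src = io.StringIO('"Please preserve my doublequotes"\ttext1\ttext2\n', newline='')
dst = io.StringIO(newline='')

# Reader: with QUOTE_NONE the double quotes are ordinary characters.
reader = csv.reader(src, delimiter='\t', quoting=csv.QUOTE_NONE)

# Writer: same quoting setting, no quote character, Unix line endings.
writer = csv.writer(dst, delimiter='\t', quoting=csv.QUOTE_NONE,
                    quotechar=None, lineterminator='\n')

for row in reader:
    writer.writerow(row)

# True when the output matches the input exactly.
print(dst.getvalue() == src.getvalue())
```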
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2016-03-11 21:15 +0000 |
| Message-ID | <mailman.6.1457730962.12893.python-list@python.org> |
| In reply to | #104631 |
On 11/03/2016 19:41, Fillmore wrote:
>
> I have a TSV file containing a few strings like this (double quotes are
> part of the string):
>
> '"pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52"'
>
> After Python and the CVS module has read the file and re-printed the
> value, the string has become:
>
> 'pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52'
>
> which is NOT good for me. I went back to Perl and noticed that Perl was
> correctly leaving the original string intact.
>
> This is what I am using to read the file:
>
> with open(file, newline='') as csvfile:
>     myReader = csv.reader(csvfile, delimiter='\t')
>     for row in myReader:
>
> and this is what I use to write the cell value
>
> sys.stdout.write(row[0])
>
> Is there some directive I can give CVS reader to tell it to stop
> screwing with my text?
>
> Thanks

https://docs.python.org/3/library/csv.html#csv.Dialect.doublequote

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
| From | Fillmore <fillmore_remove@hotmail.com> |
|---|---|
| Date | 2016-03-11 16:26 -0500 |
| Message-ID | <nbvd5b$1rak$2@gioia.aioe.org> |
| In reply to | #104640 |
On 3/11/2016 4:15 PM, Mark Lawrence wrote:
>
> https://docs.python.org/3/library/csv.html#csv.Dialect.doublequote
>

thanks, but my TSV is not using any particular dialect as far as I
understand...

Thank you, anyway
| From | alister <alister.ware@ntlworld.com> |
|---|---|
| Date | 2016-03-12 09:49 +0000 |
| Message-ID | <oCREy.1473073$wX5.813049@fx40.am4> |
| In reply to | #104643 |
On Fri, 11 Mar 2016 16:26:02 -0500, Fillmore wrote:

> On 3/11/2016 4:15 PM, Mark Lawrence wrote:
>>
>> https://docs.python.org/3/library/csv.html#csv.Dialect.doublequote
>>
>>
> thanks, but my TSV is not using any particular dialect as far as I
> understand...
>
> Thank you, anyway

Every variation of a language/format is a dialect even if (at present)
there is only one.

CSV/TSV has many - (TSV is simply a dialect of CSV or vice versa) you
also have the variable of quoting (as you have found) and line
separation (which may or may not be an issue)

Whenever you are processing an external file it is essential to know the
exact format it is using to avoid errors

--
So, you better watch out!
You better not cry!
You better not pout!
I'm telling you why,
Santa Claus is coming, to town.
He knows when you've been sleeping,
He know when you're awake.
He knows if you've been bad or good,
He has ties with the CIA.
So...
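[Editor's note] To make the "dialect" point concrete, here is a sketch that gives this particular file format a name using `csv.register_dialect`. The name `rawtsv` and the sample lines are hypothetical, chosen only for this example.

```python
import csv

# "rawtsv" is a hypothetical name chosen for this sketch: tab-delimited,
# no quote handling, Unix line endings when writing.
csv.register_dialect('rawtsv', delimiter='\t', quoting=csv.QUOTE_NONE,
                     lineterminator='\n')

# Assumed sample input; the reader accepts any iterable of lines.
lines = ['"quoted"\tplain', 'a\tb']
rows = list(csv.reader(lines, dialect='rawtsv'))

print(rows)
```

Naming the dialect once keeps the format decision in a single place, so every reader and writer in a script uses the same settings.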
| From | Fillmore <fillmore_remove@hotmail.com> |
|---|---|
| Date | 2016-03-11 16:29 -0500 |
| Message-ID | <nbvdc9$1rr0$1@gioia.aioe.org> |
| In reply to | #104631 |
On 3/11/2016 2:41 PM, Fillmore wrote:
>
> I have a TSV file containing a few strings like this (double quotes are
> part of the string):
A big thank you to everyone who helped with this and with other
questions. My porting of one of my Perl scripts to Python is over now
that the two scripts produce virtually the same result:
$ wc -l test2.txt test3.txt
70823 test2.txt
70822 test3.txt
141645 total
$ diff test2.txt test3.txt
69351d69350
<
there's only an extra empty line at the bottom that I'll leave as a tip
to Perl ;)
It was instructive.