
rst and pypandoc

Started by	al.basili@gmail.com (alb)
First post	2015-03-02 07:59 +0000
Last post	2015-03-03 02:09 +1100
Articles	20 on this page of 29 — 9 participants


  rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 07:59 +0000
    Re: rst and pypandoc Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2015-03-02 12:03 +0100
    Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 07:03 -0500
      Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 12:36 +0000
    Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-02 23:33 +1100
      Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 13:51 +0000
        Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 09:08 -0500
          Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 01:43 +1100
            Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 13:55 -0500
            Re: rst and pypandoc Ben Finney <ben+python@benfinney.id.au> - 2015-03-03 06:09 +1100
            Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 14:16 -0500
          Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:30 +0000
            Re: rst and pypandoc Chris Angelico <rosuav@gmail.com> - 2015-03-03 09:51 +1100
            Re: rst and pypandoc Ben Finney <ben+python@benfinney.id.au> - 2015-03-03 10:18 +1100
            Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 10:32 +1100
              Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:35 +0000
          Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:40 +0000
            Re: rst and pypandoc Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-02 23:08 +0000
              Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:37 +0000
            Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 10:22 +1100
              Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:46 +0000
            Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 18:23 -0500
        Re: rst and pypandoc MRAB <python@mrabarnett.plus.com> - 2015-03-02 14:37 +0000
          Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:37 +0000
            Re: rst and pypandoc Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-03-03 19:40 +1300
              Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:50 +0000
                Re: rst and pypandoc Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-03-04 11:27 +1300
        Re: rst and pypandoc MRAB <python@mrabarnett.plus.com> - 2015-03-02 14:40 +0000
        Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 02:09 +1100


#86700 — rst and pypandoc

From	al.basili@gmail.com (alb)
Date	2015-03-02 07:59 +0000
Subject	rst and pypandoc
Message-ID	<cliji5FvctU1@mid.individual.net>

Hi everyone,

I'm writing a document in reStructuredText and I'd like to convert it
to LaTeX for printing. To accomplish this I've used pandoc and its
wrapper pypandoc, semi-successfully.

My biggest issue is with figures and references to them. We have our own
macro to allocate figures, so I'm forced to bypass the rst directive
/.. figure/; moreover, I haven't found in the rst docs how to reference
a figure.

For all the above reasons I'm writing snippets of pure latex in my rst 
doc, but I'm having issues with the escape characters:

i = '\ref{fig:abc}'
print pypandoc.convert(i, 'latex', format='rst')
ef\{fig:abc\}

because the \r is interpreted by Python as a special character (carriage return).

If I try to escape with '\' I don't seem to find a way out...

Any idea/pointer/suggestion?

Al

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


#86716

From	Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de>
Date	2015-03-02 12:03 +0100
Message-ID	<mailman.24.1425294257.13471.python-list@python.org>
In reply to	#86700

On 03/02/2015 08:59 AM, alb wrote:
> Hi everyone,
>
> I'm writing a document in restructured text and I'd like to convert it
> to latex for printing. To accomplish this I've used semi-successfully
> pandoc and the wrapper pypandoc.
>
> My biggest issue is with figures and references to them. We've our macro
> to allocate figures so I'm forced to bypass the rst directive /..
> figure/, moreover I haven't happened to find how you can reference to a
> figure in the rst docs.
>
> For all the above reasons I'm writing snippets of pure latex in my rst
> doc, but I'm having issues with the escape characters:
>
> i = '\ref{fig:abc}'
> print pypandoc.convert(i, 'latex', format='rst')
> ef\{fig:abc\}
>
> because of the \r that is interpreted by python as special character.
>
> If I try to escape with '\' I don't seem to find a way out...
>

What exactly do you mean by not finding a way out? Escaping with a '\' 
should work. Of course, that backslash will print for clarity, but I 
suppose you want to write this to a file? What happens if you do so?


#86724

From	Dave Angel <davea@davea.name>
Date	2015-03-02 07:03 -0500
Message-ID	<mailman.33.1425297846.13471.python-list@python.org>
In reply to	#86700

On 03/02/2015 02:59 AM, alb wrote:
> Hi everyone,
>
> I'm writing a document in restructured text and I'd like to convert it
> to latex for printing. To accomplish this I've used semi-successfully
> pandoc and the wrapper pypandoc.

I don't see other responses yet, so I'll respond even though I don't 
know pypandoc.

>
> My biggest issue is with figures and references to them. We've our macro
> to allocate figures so I'm forced to bypass the rst directive /..
> figure/, moreover I haven't happened to find how you can reference to a
> figure in the rst docs.
>
> For all the above reasons I'm writing snippets of pure latex in my rst
> doc, but I'm having issues with the escape characters:
>
> i = '\ref{fig:abc}'
> print pypandoc.convert(i, 'latex', format='rst')
> ef\{fig:abc\}
>
> because of the \r that is interpreted by python as special character.

I don't know whether your problem is understanding what Python does with 
literals, or what pypandoc wants.  I can only help with the former.

You could try printing the i to see what it looks like, if you don't 
understand Python literal escaping.  Perhaps something like:

print "++" + i + "++"

Those pluses tend to help figure out what happens when you have control 
codes mixed in the line.  For example, as it stands, the 0x0d character 
(carriage return) sends the cursor back to the start of the line, so the 
text after it overwrites those first two "++".

A second method is to look at the string in hex:

     print i.encode("hex")
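On Python 3, where print is a function and strings no longer have a "hex" codec, the same two checks could be sketched roughly as follows. The variable name text is only illustrative, and repr() stands in for the "++" markers because it shows the control character instead of letting the terminal act on it.

```python
# The unescaped literal from the original post: "\r" becomes a carriage return.
text = '\ref{fig:abc}'

# repr() displays control characters as escapes rather than executing them.
print(repr(text))

# Hex dump of the encoded bytes; a leading 0d is the carriage return.
hex_dump = text.encode('ascii').hex()
print(hex_dump)
```

If the dump starts with 0d rather than 5c, the backslash never made it into the string.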

>
> If I try to escape with '\' I don't seem to find a way out...

You should be a lot more explicit with all three parts of that 
statement.  Try:

I'm trying to get a string of
      <here you show the string you expected from that convert statement>
When I try to escape with '\'
       i = '\\ref{fig:abc}'
I get the following exception:
       <here you include the traceback>

-- 
DaveA


#86729

From	al.basili@gmail.com (alb)
Date	2015-03-02 12:36 +0000
Message-ID	<clj3qjF54bcU1@mid.individual.net>
In reply to	#86724

Hi Dave,

Dave Angel <davea@davea.name> wrote:
[]
> You should be a lot more explicit with all three parts of that 
> statement.  Try:
> 
> 
> I'm trying to get a string of

\ref{fig:A.B}

but unfortunately I need to go through a conversion between rst and 
latex. This is because a simple text like this:

<rst-text>
this is a simple list of items:

 - item A.
 - item B.

</rst-text>

gets translated into latex by pypandoc as this:

<latex-text>
\begin{itemize}
  \item item A.
  \item item B.
\end{itemize}
</latex-text>

And it's much simpler to write my document with rst markup rather than latex.

So my question is what should my restructured text look like in order to 
get it through pypandoc and get the following:

\ref{fig:abc}

Apparently rst only allows the following type of references:

- external hyperlink targets
- internal hyperlink targets
- indirect hyperlink targets
- implicit hyperlink targets

and I want to get LaTeX output that has a reference to a figure, but none 
of those seem to be able to do so. Therefore I thought about passing 
inline text in my rst in order to get it through the conversion as is, 
but apparently I'm stuck with the various escaping mechanisms.

My python script reads the text and passes it on to pypandoc:

i = "%s\n" % text
o = pypandoc.convert(i, 'latex', format='rst')

So if text is:

<text>
this is some text with a reference to Figure \ref{fig:abc}
</text>

I would like o to be like:

this is some text with a reference to Figure \ref{fig:abc}

but I get:

ef\{fig:abc\}
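
One avenue that may be worth testing for this: docutils defines a generic "raw" role that can be specialised to pass inline text through to a single output format untouched. The sketch below (Python 3 syntax) only builds the rst source. The role name raw-latex and the helper latex_inline are illustrative choices, and whether a given pandoc version honours custom raw roles in its rst reader is an assumption to verify before relying on it.

```python
# Role definition in docutils syntax: a "raw" role restricted to LaTeX output.
ROLE_DEFINITION = '.. role:: raw-latex(raw)\n   :format: latex\n\n'


def latex_inline(snippet):
    """Wrap a LaTeX snippet in the (hypothetical) raw-latex role."""
    return ':raw-latex:`%s`' % snippet


# A raw string keeps the backslash exactly as typed.
body = ('this is some text with a reference to Figure '
        + latex_inline(r'\ref{fig:abc}'))
rst_source = ROLE_DEFINITION + body + '\n'
print(rst_source)
```

The resulting rst_source is what would be handed to the converter in place of the plain text.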

Al


#86728

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-03-02 23:33 +1100
Message-ID	<54f458a5$0$13003$c3e8da3$5496439d@news.astraweb.com>
In reply to	#86700

alb wrote:

[...]
> For all the above reasons I'm writing snippets of pure latex in my rst
> doc, but I'm having issues with the escape characters:
> 
> i = '\ref{fig:abc}'

Since \r is an escape character, that will give you carriage return followed
by "ef{fig:abc}".

The solution to that is to either escape the backslash:

i = '\\ref{fig:abc}'

or use a raw string:

i = r'\\ref{fig:abc}'

Oh, by the way, "i" is normally a terrible variable name for a string. Not
only doesn't it explain what the variable is for, but there is a very
strong convention in programming circles (not just Python, but hundreds of
languages) that "i" is a generic variable name for an integer. Not a
string.

> print pypandoc.convert(i, 'latex', format='rst')
> ef\{fig:abc\}
> 
> because of the \r that is interpreted by python as special character.
> 
> If I try to escape with '\' I don't seem to find a way out...

Can you show what you are doing? Escaping the backslash with another
backslash does work:

py> for c in '\\ref':
...     print(c, ord(c))
...
\ 92
r 114
e 101
f 102

so either you are doing something wrong, or the error lies elsewhere.

-- 
Steven


#86735

From	al.basili@gmail.com (alb)
Date	2015-03-02 13:51 +0000
Message-ID	<clj866F68tpU1@mid.individual.net>
In reply to	#86728

Hi Steven,

Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
[]
> Since \r is an escape character, that will give you carriage return followed
> by "ef{fig:abc".
> 
> The solution to that is to either escape the backslash:
> 
> i = '\\ref{fig:abc}'
> 
> 
> or use a raw string:
> 
> i = r'\\ref{fig:abc}'

ok, maybe I wasn't clear from the very beginning, but searching for a 
solution is a journey that takes time and patience.

The wrongly named variable i (as noted below) contains the *i*nput of 
my text, which is supposed to be restructured text. The output is what 
pypandoc spits out after conversion:

i = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)
o = pypandoc.convert(i, 'latex', format='rst')

Now if i contains some inline text, i.e. text I do not want to convert 
in any other format, I need my text to be formatted accordingly in order 
to inject some escape symbols in i.

Rst escapes with "\", but unfortunately python also uses "\" for escaping!

> 
> Oh, by the way, "i" is normally a terrible variable name for a string. Not
> only doesn't it explain what the variable is for, but there is a very
> strong convention in programming circles (not just Python, but hundreds of
> languages) that "i" is a generic variable name for an integer. Not a
> string.

I'm not in a position to argue about good practices; I simply found it 
more appropriate to have i for input and o for output, considering they 
are used like this:

i = "some string"
o = pypandoc.convert(i, ...)
f.write(o)

with very little risk to cause misunderstanding.

> Can you show what you are doing? Escaping the backslash with another
> backslash does work:
> 
> py> for c in '\\ref':
> ...     print(c, ord(c))
> ...
> \ 92
> r 114
> e 101
> f 102
> 
> so either you are doing something wrong, or the error lies elsewhere.

As said above, the string is converted by pandoc first and then printed. 
At this point the escaping becomes tricky (at least to me).

In [17]: inp = '\\ref{fig:abc}'

In [18]: print pypandoc.convert(inp, 'latex', format='rst')
ref\{fig:abc\}

Al


#86736

From	Dave Angel <davea@davea.name>
Date	2015-03-02 09:08 -0500
Message-ID	<mailman.39.1425305311.13471.python-list@python.org>
In reply to	#86735

On 03/02/2015 08:51 AM, alb wrote:
> Hi Steven,
>
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> []
>> Since \r is an escape character, that will give you carriage return followed
>> by "ef{fig:abc".
>>
>> The solution to that is to either escape the backslash:
>>
>> i = '\\ref{fig:abc}'
>>
>>
>> or use a raw string:
>>
>> i = r'\\ref{fig:abc}'

Actually that'd be:
    i = r'\ref{fig:abc}'
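
A quick way to see the difference between the three spellings, as a sketch in Python 3 syntax (the variable names are only for illustration):

```python
escaped = '\\ref{fig:abc}'    # ordinary literal: the escape yields ONE backslash
raw_one = r'\ref{fig:abc}'    # raw literal: one backslash, kept as typed
raw_two = r'\\ref{fig:abc}'   # raw literal: BOTH backslashes are kept

print(escaped == raw_one)          # True
print(len(raw_one), len(raw_two))  # 13 14
print(repr(raw_two))
```

The raw prefix and the doubled backslash are alternatives; using both at once gives a string with two backslashes.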


>
> ok, maybe I wasn't clear from the very beginning, but searching for a
> solution is a journey that takes time and patience.
>
> The worngly named variable i (as noted below), contains the *i*nput of
> my text which is supposed to be restructured text. The output is what
> pypandoc spits out after conversion:
>
> i = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)
> o = pypandoc.convert(i, 'latex', format='rst')
>
> Now if i contains some inline text, i.e. text I do not want to convert
> in any other format, I need my text to be formatted accordingly in order
> to inject some escape symbols in i.
>
> Rst escapes with "\", but unfortunately python also uses "\" for escaping!

Only when the string is in a literal.  If you've read it from a file, or 
built it by combining other strings, or...  then the backslash is just 
another character to Python.

>
>>
>> Oh, by the way, "i" is normally a terrible variable name for a string. Not
>> only doesn't it explain what the variable is for, but there is a very
>> strong convention in programming circles (not just Python, but hundreds of
>> languages) that "i" is a generic variable name for an integer. Not a
>> string.
>
> I'm not in the position to argue about good practices, I simply found
> more appropriate to have i for input and o for output, considering they
> are used like this:
>
> i = "some string"
> o = pypandoc.convert(i, ...)
> f.write(o)
>
> with very little risk to cause misunderstanding.

How about "in" and "out"?  Or perhaps some name that indicates what 
semantics the string represents, like   "rst_string"  and "html_string" 
or whatever they actually are?

>
>> Can you show what you are doing? Escaping the backslash with another
>> backslash does work:
>>
>> py> for c in '\\ref':
>> ...     print(c, ord(c))
>> ...
>> \ 92
>> r 114
>> e 101
>> f 102
>>
>> so either you are doing something wrong, or the error lies elsewhere.
>
> As said above, the string is converted by pandoc first and then printed.
> At this point the escaping becomes tricky (at least to me).
>
> In [17]: inp = '\\ref{fig:abc}'
>
> In [18]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}
>

What did you expect/desire the pypandoc output to be?  Now that you don't 
have the embedded 0x0d, is there something else that's wrong?

If it's in the internals of pypandoc, I'll probably be of no help.  But 
your first question was about escaping;  I'm not sure what it's about now.

-- 
DaveA


#86741

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-03-03 01:43 +1100
Message-ID	<54f47707$0$12979$c3e8da3$5496439d@news.astraweb.com>
In reply to	#86736

Dave Angel wrote:

> On 03/02/2015 08:51 AM, alb wrote:
>> Hi Steven,
>>
>> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>> []
>>> Since \r is an escape character, that will give you carriage return
>>> followed by "ef{fig:abc".
>>>
>>> The solution to that is to either escape the backslash:
>>>
>>> i = '\\ref{fig:abc}'
>>>
>>>
>>> or use a raw string:
>>>
>>> i = r'\\ref{fig:abc}'
> 
> Actually that'd be:
>     i = r'\ref{fig:abc}'


D'oh!

I mean, you spotted my deliberate mistake to check if you were paying
attention. Well done!


> How about "in" and "out"?  Or perhaps some name that indicates what
> semantics the string represents, like   "rst_string"  and "html_string"
> or whatever they actually are?

Can't use "in", it's a keyword.




-- 
Steven


#86770

From	Dave Angel <davea@davea.name>
Date	2015-03-02 13:55 -0500
Message-ID	<mailman.57.1425322560.13471.python-list@python.org>
In reply to	#86741

On 03/02/2015 09:43 AM, Steven D'Aprano wrote:
> Dave Angel wrote:
>
>> On 03/02/2015 08:51 AM, alb wrote:
>>> Hi Steven,
>>>
>>> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

>>>>
>>>> or use a raw string:
>>>>
>>>> i = r'\\ref{fig:abc}'
>>
>> Actually that'd be:
>>      i = r'\ref{fig:abc}'
>
>
> D'oh!
>
> I mean, you spotted my deliberate mistake to check if you were paying
> attention. Well done!
>
>
>> How about "in" and "out"?  Or perhaps some name that indicates what
>> semantics the string represents, like   "rst_string"  and "html_string"
>> or whatever they actually are?
>
> Can't use "in", it's a keyword.
>

And D'oh right back at ya.  Ironic isn't it that I make a second mistake 
in the same message I correct yours?


-- 
DaveA


#86775

From	Ben Finney <ben+python@benfinney.id.au>
Date	2015-03-03 06:09 +1100
Message-ID	<mailman.60.1425323380.13471.python-list@python.org>
In reply to	#86741

Dave Angel <davea@davea.name> writes:

> And D'oh right back at ya.  Ironic isn't it that I make a second
> mistake in the same message I correct yours?

<URL:https://en.wikipedia.org/wiki/Muphry%27s_law>

-- 
 \         “Truth would quickly cease to become stranger than fiction, |
  `\                     once we got as used to it.” —Henry L. Mencken |
_o__)                                                                  |
Ben Finney


#86843

From	Dave Angel <davea@davea.name>
Date	2015-03-02 14:16 -0500
Message-ID	<mailman.14.1425392617.21433.python-list@python.org>
In reply to	#86741

On 03/02/2015 02:09 PM, Ben Finney wrote:
> Dave Angel <davea@davea.name> writes:
>
>> And D'oh right back at ya.  Ironic isn't it that I make a second
>> mistake in the same message I correct yours?
>
> <URL:https://en.wikipedia.org/wiki/Muphry%27s_law>
>

I guess that word is too small to qualify as a malapropism, a word which 
I usually pronounce  "Mollypropism."


-- 
DaveA


#86786

From	al.basili@gmail.com (alb)
Date	2015-03-02 22:30 +0000
Message-ID	<clk6j8Feal8U1@mid.individual.net>
In reply to	#86736

Hi Dave,

Dave Angel <davea@davea.name> wrote:
[]
>> Rst escapes with "\", but unfortunately python also uses "\" for escaping!
> 
> Only when the string is in a literal.  If you've read it from a file, or 
> built it by combining other strings, or...  then the backslash is just 
> another character to Python.

Holy s***t! That is enlightening. I'm not going to ask why that is so, 
but essentially this changes everything. Indeed I'm passing some strings 
as literals (as in my example), while others are simply read from a file 
(well, the file is read into a list of dictionaries and then I convert 
one of those keys into latex).

Then it would mean that the following text (in a file) should be 
swallowed by python as if the backslash was just another character:

<test.txt>
this is \some restructured text.
</test.txt>

unfortunately when I pass that to pypandoc, as if it was restructured 
text, I get the following:

In [36]: f = open('test.txt', 'r')

In [37]: s = f.read()

In [38]: print s
this is \some restructured text.

In [39]: print pypandoc.convert(s, 'latex', format='rst')
this is some restructured text.

what happened to my backslash???

If I try to escape my backslash I get something worse:

In [40]: f = open('test.txt', 'r')

In [41]: s = f.read()

In [42]: print s
this is \\some restructured text.

In [43]: print pypandoc.convert(s, 'latex', format='rst')
this is \textbackslash{}some restructured text.

since a literal backslash gets converted to a literal latex backslash.

[]
>> As said above, the string is converted by pandoc first and then printed.
>> At this point the escaping becomes tricky (at least to me).
>>
>> In [17]: inp = '\\ref{fig:abc}'
>>
>> In [18]: print pypandoc.convert(inp, 'latex', format='rst')
>> ref\{fig:abc\}
>>
> 
> What did you expect/desire the pyandoc output to be?  Now that you don't 
> have the embedded 0x0a, is there something else that's wrong?

I need to get \ref{fig:abc} in my latex file in order to get a 
reference. It seems to me I'm not able to pass inline text to pandoc and 
every backslash is treated...somehow.

> If it's in the internals of pyandoc, I'll probably be of no help.  But 
> your first question was about escaping;  I'm not sure what it's about now.

It's still about escaping in both python and restructured text, since I 
want my substring (which is part of the text) to pass unchanged through 
pypandoc. 

Al


#86790

From	Chris Angelico <rosuav@gmail.com>
Date	2015-03-03 09:51 +1100
Message-ID	<mailman.68.1425336728.13471.python-list@python.org>
In reply to	#86786

On Tue, Mar 3, 2015 at 9:30 AM, alb <al.basili@gmail.com> wrote:
> Hi Dave,
>
> Dave Angel <davea@davea.name> wrote:
> []
>>> Rst escapes with "\", but unfortunately python also uses "\" for escaping!
>>
>> Only when the string is in a literal.  If you've read it from a file, or
>> built it by combining other strings, or...  then the backslash is just
>> another character to Python.
>
> Holy s***t! that is enlightning. I'm not going to ask why is that so,
> but essentially this changes everything. Indeed I'm passing some strings
> as literal (as my example), some others are simply read from a file
> (well the file is read into a list of dictionaries and then I convert
> one of those keys into latex).

You have two different things happening here. The first is the concept
of a "string literal", and the second is how pandoc handles things.

Python's string literals come in a few different forms, but the most
common is the one that looks the same as in several other languages.
You start with a quote character, you put all your stuff in the
middle, and you finish with another quote:

"Hello, world!"

Trouble is, this makes it really hard to put quotes into your string:

"I said, "Hello, world!""

That's not going to work properly! So we need to tell Python that
those interior quotes aren't the end of the string. That's done with a
backslash:

"I said, \"Hello, world!\""

And of course, that means you have to escape the backslash if you want
to have one in the text. But all of this is just for putting *string
literals* into your source code. If it's not Python source code, these
rules don't apply. You can read a line of text from the user and it'll
be unchanged:

>>> msg = input("Enter a string: ")
Enter a string: This is a string, but not a "string literal".
>>> print(msg)
This is a string, but not a "string literal".

(in Python 2, use raw_input instead of input)

Same applies to reading from a file, or anywhere else. If it's not
Python source code, it doesn't matter what characters are in the
string, they're all just characters.
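
The file case can be checked the same way. This is a small sketch in Python 3 syntax that writes a line containing a backslash to a temporary file and reads it back; the file content and variable names are made up for the demonstration.

```python
import os
import tempfile

# Write text containing a backslash to a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as handle:
    handle.write(r'see Figure \ref{fig:abc}')
    path = handle.name

# Read it back: no escape processing is applied to data read from a file.
with open(path) as handle:
    from_file = handle.read()
os.remove(path)

print(from_file)
print(from_file.count('\\'))  # number of backslash characters in the data
```

The "\r" in the file stays a backslash followed by the letter r, because the literal rules only apply to source code.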

> unfortunately when I pass that to pypandoc, as if it was restructured
> text, I get the following:
>
> In [36]: f = open('test.txt', 'r')
>
> In [37]: s = f.read()
>
> In [38]: print s
> this is \some restructured text.
>
>
> In [39]: print pypandoc.convert(s, 'latex', format='rst')
> this is some restructured text.
>
> what happened to my backslash???

That's something you'll have to figure out with pypandoc. I don't know
how it interprets the backslash, so you'll have to dig into its
documentation. At least now, though, you can print out your string and
see that it really does have its backslash in it.

ChrisA


#86792

From	Ben Finney <ben+python@benfinney.id.au>
Date	2015-03-03 10:18 +1100
Message-ID	<mailman.70.1425338274.13471.python-list@python.org>
In reply to	#86786

Chris Angelico <rosuav@gmail.com> writes:

> And of course, that means you have to escape the backslash if you want
> to have one in the text. But all of this is just for putting *string
> literals* into your source code. If it's not Python source code, these
> rules don't apply. You can read a line of text from the user and it'll
> be unchanged

To put it another way: The source code is not the value itself. The
string value is created *from* the characters in the source code, and
the sequence of characters in the string value may be different.

When the string value comes from somewhere else, it bypasses this
interpretation of source code — because it's not source code!

String literals exist in your Python source code. They are not the same
thing as the string value itself, and the sequence of characters may be
different.
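
One way to make that distinction visible, as a sketch in Python 3 syntax: hold the source characters of a literal in one string and let ast.literal_eval turn them into the value, then compare the two. The names source_text and value are only illustrative.

```python
import ast

# The characters as they would sit in a .py file:
# quote, backslash, backslash, r, e, f, quote.
source_text = r"'\\ref'"

# literal_eval applies the usual string-literal rules to that source text.
value = ast.literal_eval(source_text)

print(len(source_text))  # 7 characters of source code
print(len(value))        # 4 characters in the resulting string value
print(value)
```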

-- 
 \     “Try adding “as long as you don't breach the terms of service – |
  `\          according to our sole judgement” to the end of any cloud |
_o__)                      computing pitch.” —Simon Phipps, 2010-12-11 |
Ben Finney


#86794

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-03-03 10:32 +1100
Message-ID	<54f4f307$0$12979$c3e8da3$5496439d@news.astraweb.com>
In reply to	#86786

alb wrote:

> In [39]: print pypandoc.convert(s, 'latex', format='rst')
> this is some restructured text.
> 
> what happened to my backslash???

You'll need to read your pypandoc documentation to see what it says about
backslashes.

> If I try to escape my backslash I get something worse:
> 
> In [40]: f = open('test.txt', 'r')
> 
> In [41]: s = f.read()
> 
> In [42]: print s
> this is \\some restructured text.
> 
> 
> In [43]: print pypandoc.convert(s, 'latex', format='rst')
> this is \textbackslash{}some restructured text.
> 
> since a literal backslash gets converted to a literal latex backslash.

Why is this a problem? Isn't the ultimate aim to pass it through latex,
which will then convert the \textbackslash{} back into a backslash? If not,
I have misunderstood something.

If not, you could do something like this:

s = 'this is %(b)ssome restructured text.'
t = pypandoc.convert(s, 'latex', format='rst')
assert t == 'this is %(b)ssome restructured text.'
print t % {'b': '\\'}

taking care to escape any actual percent signs in your text as '%%'.

To be clear, what I'm doing here is using Python's % string interpolation to
post-process the Latex output:

- replace every '%' in your input string with '%%';
- replace every backslash in your input string with '%(b)s';
- convert;
- post-process using %.
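
The four steps above can be sketched as a pair of helpers (Python 3 syntax). The names protect, restore and fake_convert are hypothetical. fake_convert is only a stand-in for the pypandoc call and assumes the converter passes the placeholder and percent signs through unchanged, which a real LaTeX writer may well not do. The braces, which the outputs earlier in the thread show being escaped, are not handled here.

```python
def protect(text):
    """Hide percent signs and backslashes before conversion."""
    return text.replace('%', '%%').replace('\\', '%(b)s')


def restore(converted):
    """Put backslashes and literal percent signs back after conversion."""
    return converted % {'b': '\\'}


def fake_convert(text):
    # Stand-in for pypandoc.convert(text, 'latex', format='rst');
    # assumed (not verified) to leave the placeholder untouched.
    return text


source = r'50% done, see Figure \ref{fig:abc}'
result = restore(fake_convert(protect(source)))
print(result)
```

The order of the two replacements in protect matters: doubling the percent signs first keeps the '%' inside the inserted '%(b)s' placeholder from being doubled as well.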

-- 
Steven


#86862

From	al.basili@gmail.com (alb)
Date	2015-03-03 20:35 +0000
Message-ID	<clmk8tF2fphU1@mid.individual.net>
In reply to	#86794

Hi Steven,

Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
[]
>> In [43]: print pypandoc.convert(s, 'latex', format='rst')
>> this is \textbackslash{}some restructured text.
>> 
>> since a literal backslash gets converted to a literal latex backslash.
> 
> Why is this a problem? Isn't the ultimate aim to pass it through latex,
> which will then covert the \textbackslash{} back into a backslash? If not,
> I have misunderstood something.

\textbackslash{} is a latex command to typeset a backslash into the 
text. This is not what I need. I need to have a string of the form 
"\some" (actually we are talking about \ref or \hyperref commands).

> If not, you could do something like this:
> 
> s = 'this is %(b)ssome restructured text.'
> t = pypandoc.convert(s, 'latex', format='rst')
> assert t == 'this is %(b)ssome restructured text.'
> print t % {'b': '\\'}

This is more or less what I'm doing now, but it is very dirty and difficult to 
extend to other corner cases.

Al


#86788

From	al.basili@gmail.com (alb)
Date	2015-03-02 22:40 +0000
Message-ID	<clk777Feal8U3@mid.individual.net>
In reply to	#86736

Hi Dave,

Dave Angel <davea@davea.name> wrote:
[]
>>> or use a raw string:
>>>
>>> i = r'\\ref{fig:abc}'
> 
> Actually that'd be:
>    i = r'\ref{fig:abc}'

Could you explain why I then see the following difference:

In [56]: inp = r'\\ref{fig:abc}'

In [57]: print pypandoc.convert(inp, 'latex', format='rst')
\textbackslash{}ref\{fig:abc\}


In [58]: inp = r'\ref{fig:abc}'

In [59]: print pypandoc.convert(inp, 'latex', format='rst')
ref\{fig:abc\}

The two results are clearly *not* the same, even though the two inp 
/claim/ to be the same...

Al


#86791

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2015-03-02 23:08 +0000
Message-ID	<mailman.69.1425337698.13471.python-list@python.org>
In reply to	#86788

On 02/03/2015 22:40, alb wrote:
> Hi Dave,
>
> Dave Angel <davea@davea.name> wrote:
> []
>>>> or use a raw string:
>>>>
>>>> i = r'\\ref{fig:abc}'
>>
>> Actually that'd be:
>>     i = r'\ref{fig:abc}'
>
> Could you explain why I then see the following difference:
>
> In [56]: inp = r'\\ref{fig:abc}'
>
> In [57]: print pypandoc.convert(inp, 'latex', format='rst')
> \textbackslash{}ref\{fig:abc\}
>
>
> In [58]: inp = r'\ref{fig:abc}'
>
> In [59]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}
>
> The two results are clearly *not* the same, even though the two inp
> /claim/ to be the same...
>
> Al
>

The two inps are *not* the same.  Steven D'Aprano misled you with a 
typo, or so he claims :)  Dave Angel pointed this out.  Steven replied. 
  You've either missed these emails or simply not read them.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


#86863

From	al.basili@gmail.com (alb)
Date	2015-03-03 20:37 +0000
Message-ID	<clmkbuF2fphU2@mid.individual.net>
In reply to	#86791

Hi Mark,

Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
[]
> The two inps are *not* the same.  

My bad. I did not notice the difference, thanks for pointing that out.

Al


#86793

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-03-03 10:22 +1100
Message-ID	<54f4f0d0$0$12995$c3e8da3$5496439d@news.astraweb.com>
In reply to	#86788

alb wrote:

> Could you explain why I then see the following difference:
> 
> In [56]: inp = r'\\ref{fig:abc}'
> 
> In [57]: print pypandoc.convert(inp, 'latex', format='rst')
> \textbackslash{}ref\{fig:abc\}
> 
> 
> In [58]: inp = r'\ref{fig:abc}'
> 
> In [59]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}
> 
> The two results are clearly *not* the same, even though the two inp
> /claim/ to be the same...

The two inp are not the same.

I'm sorry if I confused you with my earlier typo, but 

    inp = r'\\ref{fig:abc}'

starts with TWO backslashes, while:

    inp = r'\ref{fig:abc}'

starts with ONE backslash. Not the same.

I suspect you've been hitting your head against this problem for so long
you're starting to shy at shadows. Take a step back, a deep breath, and
remember your basic debugging skills:

a = r'\\ref{fig:abc}'
b = r'\ref{fig:abc}'
print a == b
print a, b
print repr(a), repr(b)
print len(a), len(b)

I'm sure that you know how to do such simple things to investigate whether
two inputs are in fact the same or not, and the fact that you failed to do
so is just a sign of your frustration and stress.

-- 
Steven
