Groups > comp.lang.python > #86700 > unrolled thread

rst and pypandoc

Started by	al.basili@gmail.com (alb)
First post	2015-03-02 07:59 +0000
Last post	2015-03-03 02:09 +1100
Articles	9 on this page of 29 — 9 participants

  rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 07:59 +0000
    Re: rst and pypandoc Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2015-03-02 12:03 +0100
    Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 07:03 -0500
      Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 12:36 +0000
    Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-02 23:33 +1100
      Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 13:51 +0000
        Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 09:08 -0500
          Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 01:43 +1100
            Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 13:55 -0500
            Re: rst and pypandoc Ben Finney <ben+python@benfinney.id.au> - 2015-03-03 06:09 +1100
            Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 14:16 -0500
          Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:30 +0000
            Re: rst and pypandoc Chris Angelico <rosuav@gmail.com> - 2015-03-03 09:51 +1100
            Re: rst and pypandoc Ben Finney <ben+python@benfinney.id.au> - 2015-03-03 10:18 +1100
            Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 10:32 +1100
              Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:35 +0000
          Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:40 +0000
            Re: rst and pypandoc Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-02 23:08 +0000
              Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:37 +0000
            Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 10:22 +1100
              Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:46 +0000
            Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 18:23 -0500
        Re: rst and pypandoc MRAB <python@mrabarnett.plus.com> - 2015-03-02 14:37 +0000
          Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:37 +0000
            Re: rst and pypandoc Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-03-03 19:40 +1300
              Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:50 +0000
                Re: rst and pypandoc Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-03-04 11:27 +1300
        Re: rst and pypandoc MRAB <python@mrabarnett.plus.com> - 2015-03-02 14:40 +0000
        Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 02:09 +1100

#86865

From	al.basili@gmail.com (alb)
Date	2015-03-03 20:46 +0000
Message-ID	<clmktlF2fphU3@mid.individual.net>
In reply to	#86793

Hi Steven,

Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
[]
>> The two results are clearly *not* the same, even though the two inp
>> /claim/ to be the same...
> 
> The two inp are not the same.

Correct. My statement was wrong.

[]
> I'm sure that you know how to do such simple things to investigate whether
> two inputs are in fact the same or not, and the fact that you failed to do
> so is just a sign of your frustration and stress.

You nailed it! Indeed, all the symptoms of a stressful 
situation were there from the very beginning:

1. the OP was unclear and full of misleading information

2. some of the posts were misunderstood, causing more confusion 
than anything else.

3. my code has become a mess of workarounds, being far from pythonic.

Now that the delivery date is behind me I'll have some time to 
clean up the mess and get everything straight. Being pragmatic and 
finding workarounds is not bad, but the mess should be cleaned up 
afterwards!

Al

#86795

From	Dave Angel <davea@davea.name>
Date	2015-03-02 18:23 -0500
Message-ID	<mailman.71.1425339347.13471.python-list@python.org>
In reply to	#86788

On 03/02/2015 05:40 PM, alb wrote:
> Hi Dave,
>
> Dave Angel <davea@davea.name> wrote:
> []
>>>> or use a raw string:
>>>>
>>>> i = r'\\ref{fig:abc}'
>>
>> Actually that'd be:
>>     i = r'\ref{fig:abc}'
>
> Could you explain why I then see the following difference:
>
> In [56]: inp = r'\\ref{fig:abc}'

print inp
    and you should get
      \\ref{fig:abc}

>
> In [57]: print pypandoc.convert(inp, 'latex', format='rst')
> \textbackslash{}ref\{fig:abc\}
>
>
> In [58]: inp = r'\ref{fig:abc}'

print inp
     and you should get
        \ref{fig:abc}

This is NOT the same.

The rules are not arbitrary.  They're quite necessary, and they're the 
same in lots of different languages.  In a regular literal, the 
backslash is an escape character that combines with the following 
character.  In a raw literal, a backslash is just a backslash, with one 
exception: a backslash right before the closing quote keeps that quote 
from ending the literal (and both characters stay in the string).  So 
just don't use a *trailing* backslash in a raw literal.
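
A minimal sketch of those literal rules, written for Python 3 rather than the Python 2 used in the session above. The variable names are only illustrative; nothing here involves pypandoc.

```python
regular = '\\ref{fig:abc}'      # regular literal: "\\" is ONE backslash
raw_single = r'\ref{fig:abc}'   # raw literal: one backslash typed, one stored
raw_double = r'\\ref{fig:abc}'  # raw literal: two typed, two stored

print(regular == raw_single)
print(len(raw_single), len(raw_double))
print(repr(raw_double))

# A raw literal cannot end in a single backslash: the source text r'abc\'
# does not compile, because the backslash keeps the quote from closing it.
try:
    compile("r'abc\\'", '<demo>', 'eval')
    trailing_ok = True
except SyntaxError:
    trailing_ok = False
print(trailing_ok)
```

The first comparison is true and the two lengths differ by one, which is why In [56] and In [58] hand different strings to the converter.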

>
> In [59]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}
>
> The two results are clearly *not* the same, even though the two inp
> /claim/ to be the same...
>

When I said backslashes are not special in data read from a file, I 
should also have said that neither are quotes, or tabs, or anything 
else.  Python just reads them in, and stuffs them into a string object.  
Newlines are special if you use readline(), but if you use read(), 
they're not special either (except on MSDOS-compatible systems, which 
use two bytes for newline).  Even there, if you read a file in "b" 
mode, they're not special.

So your code is going to mostly be getting strings from files, or from 
calculations, and these backslashes won't be special.  It's only in 
*testing* that you usually deal with this literal stuff.  Or in places 
where the data is fixed, and hardcoded in the source.
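
The point about file data can be checked with a short, self-contained sketch (Python 3; the temporary file and the sample text are assumptions made up for the demonstration):

```python
import os
import tempfile

# Eight characters: \ r e f \ t a b. The doubled backslashes exist only in
# this source-code literal; the file receives single ones.
text = '\\ref\\tab'

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as handle:
    handle.write(text)
    path = handle.name

# Reading the data back applies no escape processing at all.
with open(path) as handle:
    data = handle.read()
os.remove(path)

print(repr(data))
print(len(data))
```

The string that comes back contains no carriage return and no tab, only the backslashes and letters that were written.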

-- 
DaveA

#86738

From	MRAB <python@mrabarnett.plus.com>
Date	2015-03-02 14:37 +0000
Message-ID	<mailman.41.1425307075.13471.python-list@python.org>
In reply to	#86735

On 2015-03-02 13:51, alb wrote:
> Hi Steven,
>
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> []
>> Since \r is an escape character, that will give you carriage return followed
>> by "ef{fig:abc".
>>
>> The solution to that is to either escape the backslash:
>>
>> i = '\\ref{fig:abc}'
>>
>>
>> or use a raw string:
>>
>> i = r'\\ref{fig:abc}'
>
> ok, maybe I wasn't clear from the very beginning, but searching for a
> solution is a journey that takes time and patience.
>
> The wrongly named variable i (as noted below), contains the *i*nput of
> my text which is supposed to be restructured text. The output is what
> pypandoc spits out after conversion:
>
> i = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)
> o = pypandoc.convert(i, 'latex', format='rst')
>
> Now if i contains some inline text, i.e. text I do not want to convert
> in any other format, I need my text to be formatted accordingly in order
> to inject some escape symbols in i.
>
> Rst escapes with "\", but unfortunately python also uses "\" for escaping!
>
>>
>> Oh, by the way, "i" is normally a terrible variable name for a string. Not
>> only doesn't it explain what the variable is for, but there is a very
>> strong convention in programming circles (not just Python, but hundreds of
>> languages) that "i" is a generic variable name for an integer. Not a
>> string.
>
> I'm not in the position to argue about good practices, I simply found
> more appropriate to have i for input and o for output, considering they
> are used like this:
>
> i = "some string"
> o = pypandoc.convert(i, ...)
> f.write(o)
>
> with very little risk to cause misunderstanding.
>
>> Can you show what you are doing? Escaping the backslash with another
>> backslash does work:
>>
>> py> for c in '\\ref':
>> ...     print(c, ord(c))
>> ...
>> \ 92
>> r 114
>> e 101
>> f 102
>>
>> so either you are doing something wrong, or the error lies elsewhere.
>
> As said above, the string is converted by pandoc first and then printed.
> At this point the escaping becomes tricky (at least to me).
>
> In [17]: inp = '\\ref{fig:abc}'
>
> In [18]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}
>
Have you tried escaping the escape character by doubling the backslash?

inp = '\\\\ref{fig:abc}'

or:

inp = r'\\ref{fig:abc}'
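
For reference, the two spellings suggested here denote the same string, so whatever pandoc does with one it will do with the other. A quick check in plain Python 3 (no pypandoc needed; the names are illustrative):

```python
escaped = '\\\\ref{fig:abc}'   # four backslashes typed in a regular literal
raw = r'\\ref{fig:abc}'        # two backslashes typed in a raw literal

print(escaped == raw)
print(escaped.count(chr(92)))  # number of backslash characters stored
print(escaped)
```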

#86787

From	al.basili@gmail.com (alb)
Date	2015-03-02 22:37 +0000
Message-ID	<clk70gFeal8U2@mid.individual.net>
In reply to	#86738

Hi MRAB,

MRAB <python@mrabarnett.plus.com> wrote:
[]
> Have you tried escaping the escape character by doubling the backslash?
> 
> inp = '\\\\ref{fig:abc}'

In [54]: inp = '\\\\ref{fig:abc}'

In [55]: print pypandoc.convert(inp, 'latex', format='rst')
\textbackslash{}ref\{fig:abc\}

the backslash is considered as literal text for latex and is escaped 
with the appropriate command.

> or:
> 
> inp = r'\\ref{fig:abc}'
> 

In [56]: inp = r'\\ref{fig:abc}'

In [57]: print pypandoc.convert(inp, 'latex', format='rst')
\textbackslash{}ref\{fig:abc\}

same as above. The result I aim to would be:

In [BINGO]: print pypandoc.convert(inp, 'latex', format='rst')
\ref{fig:abc}

Al

#86818

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2015-03-03 19:40 +1300
Message-ID	<cll3adFl0hpU1@mid.individual.net>
In reply to	#86787

alb wrote:
> The result I aim to would be:
> 
> In [BINGO]: print pypandoc.convert(inp, 'latex', format='rst')
> \ref{fig:abc}

From a cursory reading of the pypandoc docs, it looks
like enabling the raw_tex extension in pypandoc will
give you what you want.

Search for raw_tex on this page:

http://johnmacfarlane.net/pandoc/README.html

-- 
Greg

#86866

From	al.basili@gmail.com (alb)
Date	2015-03-03 20:50 +0000
Message-ID	<clml46F2fphU4@mid.individual.net>
In reply to	#86818

Hi Gregory,

Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
[]
> From a cursory reading of the pypandoc docs, it looks
> like enabling the raw_tex extension in pypandoc will
> give you what you want.
> 
> Search for raw_tex on this page:
> 
> http://johnmacfarlane.net/pandoc/README.html

As far as I understood the docs, it seems this extension should be 
passed to pandoc through +EXTENSION, but I don't seem to get it 
working:

In [14]: print pypandoc.convert(s, 'latex', format="md+raw_tex")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-f41e67057a59> in <module>()
----> 1 print pypandoc.convert(s, 'latex', format="md+raw_tex")

/usr/local/lib/python2.7/dist-packages/pypandoc.pyc in convert(source, to, format, extra_args, encoding)
     25     '''
     26     return _convert(_read_file, _process_file, source, to,
---> 27                     format, extra_args, encoding=encoding)
     28 
     29 

/usr/local/lib/python2.7/dist-packages/pypandoc.pyc in _convert(reader, processor, source, to, format, extra_args, encoding)
     50         raise RuntimeError(
     51             'Invalid input format! Expected one of these: ' +
---> 52             ', '.join(from_formats))
     53 
     54     if to not in to_formats:

RuntimeError: Invalid input format! Expected one of these: native, json, markdown, markdown+lhs, rst, rst+lhs, docbook, textile, html, latex, latex+lhs

#86868

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2015-03-04 11:27 +1300
Message-ID	<clmqqqF52b1U1@mid.individual.net>
In reply to	#86866

alb wrote:
> RuntimeError: Invalid input format! Expected one of these: native, json,
> markdown, markdown+lhs, rst, rst+lhs, docbook, textile, html, latex,
> latex+lhs

It looks like it's expecting the base format to be spelled
"markdown", not abbreviated to "md". (The python wrapper
expands "md" to "markdown", but not if it's followed by
any + or - options.) So try:

pypandoc.convert(s, 'latex', format="markdown+raw_tex")

BTW, I just installed pandoc on MacOSX to try this out,
and it seems that raw_tex is enabled by default for me --
I have to turn it *off* with format="markdown-raw_tex"
in order to get the behaviour you're seeing. Maybe a
different version? My pandoc says it's version 1.12.0.1.
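
If the wrapper really does skip the "md" expansion when modifiers follow, one workaround is to expand the base name yourself before calling it. The helper below is hypothetical: expand_format and ALIASES are made-up names, not part of pypandoc, and the alias table is an assumption for illustration only.

```python
import re

# Assumed alias table; extend it as needed.
ALIASES = {'md': 'markdown'}


def expand_format(spec):
    """Expand a short base format name, leaving +ext / -ext modifiers alone."""
    match = re.match(r'[^+-]+', spec)
    if match is None:
        return spec
    base = match.group(0)
    return ALIASES.get(base, base) + spec[len(base):]


print(expand_format('md+raw_tex'))
print(expand_format('markdown-raw_tex'))
print(expand_format('rst'))
```

The result could then be passed as the format argument, e.g. format=expand_format('md+raw_tex').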

-- 
Greg

#86740

From	MRAB <python@mrabarnett.plus.com>
Date	2015-03-02 14:40 +0000
Message-ID	<mailman.42.1425307205.13471.python-list@python.org>
In reply to	#86735

On 2015-03-02 14:08, Dave Angel wrote:
> On 03/02/2015 08:51 AM, alb wrote:
>> Hi Steven,
>>
>> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>>>
[snip]

>>> Oh, by the way, "i" is normally a terrible variable name for a string. Not
>>> only doesn't it explain what the variable is for, but there is a very
>>> strong convention in programming circles (not just Python, but hundreds of
>>> languages) that "i" is a generic variable name for an integer. Not a
>>> string.
>>
>> I'm not in the position to argue about good practices, I simply found
>> more appropriate to have i for input and o for output, considering they
>> are used like this:
>>
>> i = "some string"
>> o = pypandoc.convert(i, ...)
>> f.write(o)
>>
>> with very little risk to cause misunderstanding.
>
> How about "in" and "out"?  Or perhaps some name that indicates what
> semantics the string represents, like   "rst_string"  and "html_string"
> or whatever they actually are?
>
[snip]

"in" is a reserved word, but "in_" would be OK.

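The standard library's keyword module can confirm which candidate names are reserved. A small Python 3 sketch; the list of names is just the ones floated in this thread:

```python
import keyword

candidates = ('in', 'in_', 'out', 'rst_string', 'html_string')

# Map each candidate name to whether Python reserves it as a keyword.
reserved = {name: keyword.iskeyword(name) for name in candidates}

for name in candidates:
    print(name, reserved[name])
```

Only "in" is reported as reserved; the trailing-underscore form is the usual way around such clashes.
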

#86746

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-03-03 02:09 +1100
Message-ID	<54f47d1d$0$12984$c3e8da3$5496439d@news.astraweb.com>
In reply to	#86735

alb wrote:

> Hi Steven,
> 
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> []
>> Since \r is an escape character, that will give you carriage return
>> followed by "ef{fig:abc".
>> 
>> The solution to that is to either escape the backslash:
>> 
>> i = '\\ref{fig:abc}'
>> 
>> 
>> or use a raw string:
>> 
>> i = r'\\ref{fig:abc}'

Dave has corrected my typo in the above: it should be r'\ref'. The whole
point of raw strings is that you don't need to escape the backslashes.

> ok, maybe I wasn't clear from the very beginning, but searching for a
> solution is a journey that takes time and patience.
> 
> The wrongly named variable i (as noted below), contains the *i*nput of
> my text which is supposed to be restructured text. The output is what
> pypandoc spits out after conversion:

Ah, well that's not a bad convention for small utility functions, but I
wouldn't want single-letter names to be used in anything bigger than, say,
a dozen lines. Having i for input and o for output right next to each other
helps too. But you're still swimming against the convention that i means an
integer. Whether you decide it is worth going against that convention in
your own code is up to you, but when asking for help, it is worth your
while to be as unsurprising and conventional as you can manage.

> i = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)
> o = pypandoc.convert(i, 'latex', format='rst')
> 
> Now if i contains some inline text, i.e. text I do not want to convert
> in any other format, I need my text to be formatted accordingly in order
> to inject some escape symbols in i.
> 
> Rst escapes with "\", but unfortunately python also uses "\" for escaping!

Yes, but only in string literals. In Python source code, "\r" makes a
carriage return, but when reading from the keyboard (say, using the
raw_input function), from a file, or anything other than a string literal,
a string consisting of "\r" is just backslash-r.

So, worst case, you can always assemble your strings like this:

backslash = chr(92)
i = ((backslash + "begin{tag}{%s}{%s}\n %s\n " + backslash + "end{tag}")
        % (some, restructured, text))

although that is a PITA.

I recommend using raw triple strings, and avoid needing \n escapes:

i = r"""\begin{tag}{%s}{%s}
 %s
 \end{tag}""" % (some, restructured, text)
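
As a sanity check, the escaped literal, the chr(92) assembly, and the raw triple-quoted string all build the same text. The sketch below is Python 3 and uses placeholder values for some, restructured and text, since the real ones are not shown in the thread.

```python
some, restructured, text = 'a', 'b', 'c'   # placeholder values, assumed

escaped = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)

backslash = chr(92)
# The concatenation is parenthesized so that % formats the whole template,
# not just the last piece ("%" binds tighter than "+").
assembled = ((backslash + "begin{tag}{%s}{%s}\n %s\n " + backslash + "end{tag}")
             % (some, restructured, text))

raw_triple = r"""\begin{tag}{%s}{%s}
 %s
 \end{tag}""" % (some, restructured, text)

print(escaped == assembled == raw_triple)
print(raw_triple)
```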

>> Can you show what you are doing? Escaping the backslash with another
>> backslash does work:
>> 
>> py> for c in '\\ref':
>> ...     print(c, ord(c))
>> ...
>> \ 92
>> r 114
>> e 101
>> f 102
>> 
>> so either you are doing something wrong, or the error lies elsewhere.
> 
> As said above, the string is converted by pandoc first and then printed.
> At this point the escaping becomes tricky (at least to me).
> 
> In [17]: inp = '\\ref{fig:abc}'

If you print inp at this point, you should see that it contains exactly what
you expect: backslash, R E F etc.

> In [18]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}

and now the backslash is gone, and the braces are escaped. This suggests
that the problem lies with pypandoc. Perhaps you need to add extra
backslashes, so that pypandoc will convert a double-backslash to a single
one. Consult your pypandoc documentation, and try this:

inp = '\\\\ref{fig:abc}'  # That's FOUR backslashes, to get \\

# or as a raw-string:

inp = r'\\ref{fig:abc}'
assert inp[0] == inp[1] == chr(92)
out = pypandoc.convert(inp, 'latex', format='rst')
print out, out == r"\ref\{fig:abc\}"

-- 
Steven
