| Started by | al.basili@gmail.com (alb) |
|---|---|
| First post | 2015-03-02 07:59 +0000 |
| Last post | 2015-03-03 02:09 +1100 |
| Articles | 9 on this page of 29 — 9 participants |
rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 07:59 +0000
Re: rst and pypandoc Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2015-03-02 12:03 +0100
Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 07:03 -0500
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 12:36 +0000
Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-02 23:33 +1100
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 13:51 +0000
Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 09:08 -0500
Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 01:43 +1100
Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 13:55 -0500
Re: rst and pypandoc Ben Finney <ben+python@benfinney.id.au> - 2015-03-03 06:09 +1100
Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 14:16 -0500
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:30 +0000
Re: rst and pypandoc Chris Angelico <rosuav@gmail.com> - 2015-03-03 09:51 +1100
Re: rst and pypandoc Ben Finney <ben+python@benfinney.id.au> - 2015-03-03 10:18 +1100
Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 10:32 +1100
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:35 +0000
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:40 +0000
Re: rst and pypandoc Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-02 23:08 +0000
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:37 +0000
Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 10:22 +1100
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:46 +0000
Re: rst and pypandoc Dave Angel <davea@davea.name> - 2015-03-02 18:23 -0500
Re: rst and pypandoc MRAB <python@mrabarnett.plus.com> - 2015-03-02 14:37 +0000
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-02 22:37 +0000
Re: rst and pypandoc Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-03-03 19:40 +1300
Re: rst and pypandoc al.basili@gmail.com (alb) - 2015-03-03 20:50 +0000
Re: rst and pypandoc Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-03-04 11:27 +1300
Re: rst and pypandoc MRAB <python@mrabarnett.plus.com> - 2015-03-02 14:40 +0000
Re: rst and pypandoc Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-03 02:09 +1100
| From | al.basili@gmail.com (alb) |
|---|---|
| Date | 2015-03-03 20:46 +0000 |
| Message-ID | <clmktlF2fphU3@mid.individual.net> |
| In reply to | #86793 |
Hi Steven,

Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
[]
>> The two results are clearly *not* the same, even though the two inp
>> /claim/ to be the same...
>
> The two inp are not the same.

Correct. My statement was wrong.

[]
> I'm sure that you know how to do such simple things to investigate whether
> two inputs are in fact the same or not, and the fact that you failed to do
> so is just a sign of your frustration and stress.

You nailed it! Indeed there were all the symptoms of a stressed
situation from the very beginning:

1. the OP was unclear and full of misleading information
2. part of the posts were misunderstood, hence causing more confusion
   than anything else.
3. my code has become a mess of workarounds, being far from pythonic.

Now that the delivery date is passed behind me I'll have some time to
clean up the mess and get everything straight. Being pragmatic and
finding workaround is not bad but the mess should be cleaned up
afterwards!

Al
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-03-02 18:23 -0500 |
| Message-ID | <mailman.71.1425339347.13471.python-list@python.org> |
| In reply to | #86788 |
On 03/02/2015 05:40 PM, alb wrote:
> Hi Dave,
>
> Dave Angel <davea@davea.name> wrote:
> []
>>>> or use a raw string:
>>>>
>>>> i = r'\\ref{fig:abc}'
>>
>> Actually that'd be:
>> i = r'\ref{fig:abc}'
>
> Could you explain why I then see the following difference:
>
> In [56]: inp = r'\\ref{fig:abc}'
print inp
and you should get
\\ref{fig:abc}
>
> In [57]: print pypandoc.convert(inp, 'latex', format='rst')
> \textbackslash{}ref\{fig:abc\}
>
>
> In [58]: inp = r'\ref{fig:abc}'
print inp
and you should get
\ref{fig:abc}
This is NOT the same.
The rules are not arbitrary. They're quite necessary, and it's the same
for lots of different languages. When in a regular literal, the
backslash is an escape character that combines with the following
character. When in a raw literal, the backslash is a backslash, unless
it's at the end of the string, in which case it's not the end of the
string, it's an escaped quotation. (Or something. Just don't use
*trailing* backslash in a raw literal)
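The rules described above can be checked directly. This is a minimal sketch written for Python 3 (the thread itself uses Python 2 syntax); the helper name `compiles` is made up for this illustration and simply asks the built-in `compile()` whether a piece of source text is accepted.

```python
# Regular literal: the backslash combines with the following character.
regular = '\ref{fig:abc}'       # '\r' is a carriage return here
escaped = '\\ref{fig:abc}'      # one real backslash
# Raw literal: the backslash is just a backslash.
raw_one = r'\ref{fig:abc}'      # one real backslash
raw_two = r'\\ref{fig:abc}'     # two real backslashes

print(repr(regular))
print(len(regular), len(escaped), len(raw_one), len(raw_two))


def compiles(source):
    """Return True if the given source text is valid Python (illustrative helper)."""
    try:
        compile(source, '<demo>', 'exec')
        return True
    except SyntaxError:
        return False


# A raw literal cannot end in a single trailing backslash.
trailing_ok = compiles("s = r'abc\\\\'")   # source text: s = r'abc\\'
trailing_bad = compiles("s = r'abc\\'")    # source text: s = r'abc\'
print(trailing_ok, trailing_bad)
```

Printing `repr()` of a string, rather than the string itself, is usually the quickest way to see which characters it really contains.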
>
> In [59]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}
>
> The two results are clearly *not* the same, even though the two inp
> /claim/ to be the same...
>
When I said backslashes are not special in data read from a file, I
should also say neither are quotes, or tabs, or anything else. Python
just reads them in, and stuffs them into a string object. Newlines are
special if you use readline(), but if you use read(), they're not
special either (except on MSDOS-compatible variants, which use two bytes
for newline). Even there, if you read a file in "b" mode, they're not
special either.
So your code is going to mostly be getting strings from files, or from
calculations, and these backslashes won't be special. It's only in
*testing* that you usually deal with this literal stuff. Or in places
where the data is fixed, and hardcoded in the source.
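To illustrate the point that backslashes read from a file are ordinary characters, here is a small sketch for Python 3. The temporary file and its `.rst` suffix are assumptions made only for the demonstration; any text file would behave the same way.

```python
import os
import tempfile

# The raw literal puts a real backslash into the string we write out.
text = r'\ref{fig:abc}' + '\n'

with tempfile.NamedTemporaryFile('w', suffix='.rst', delete=False) as handle:
    handle.write(text)
    path = handle.name

try:
    # read() hands back the characters as stored; nothing is re-interpreted.
    with open(path) as handle:
        from_file = handle.read()
finally:
    os.remove(path)

print(repr(from_file))
```

The string that comes back starts with a backslash followed by the letter r, not with a carriage return.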
--
DaveA
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2015-03-02 14:37 +0000 |
| Message-ID | <mailman.41.1425307075.13471.python-list@python.org> |
| In reply to | #86735 |
On 2015-03-02 13:51, alb wrote:
> Hi Steven,
>
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> []
>> Since \r is an escape character, that will give you carriage return followed
>> by "ef{fig:abc".
>>
>> The solution to that is to either escape the backslash:
>>
>> i = '\\ref{fig:abc}'
>>
>>
>> or use a raw string:
>>
>> i = r'\\ref{fig:abc}'
>
> ok, maybe I wasn't clear from the very beginning, but searching for a
> solution is a journey that takes time and patience.
>
> The wrongly named variable i (as noted below), contains the *i*nput of
> my text which is supposed to be restructured text. The output is what
> pypandoc spits out after conversion:
>
> i = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)
> o = pypandoc.convert(i, 'latex', format='rst')
>
> Now if i contains some inline text, i.e. text I do not want to convert
> in any other format, I need my text to be formatted accordingly in order
> to inject some escape symbols in i.
>
> Rst escapes with "\", but unfortunately python also uses "\" for escaping!
>
>>
>> Oh, by the way, "i" is normally a terrible variable name for a string. Not
>> only doesn't it explain what the variable is for, but there is a very
>> strong convention in programming circles (not just Python, but hundreds of
>> languages) that "i" is a generic variable name for an integer. Not a
>> string.
>
> I'm not in the position to argue about good practices, I simply found
> more appropriate to have i for input and o for output, considering they
> are used like this:
>
> i = "some string"
> o = pypandoc.convert(i, ...)
> f.write(o)
>
> with very little risk to cause misunderstanding.
>
>> Can you show what you are doing? Escaping the backslash with another
>> backslash does work:
>>
>> py> for c in '\\ref':
>> ... print(c, ord(c))
>> ...
>> \ 92
>> r 114
>> e 101
>> f 102
>>
>> so either you are doing something wrong, or the error lies elsewhere.
>
> As said above, the string is converted by pandoc first and then printed.
> At this point the escaping becomes tricky (at least to me).
>
> In [17]: inp = '\\ref{fig:abc}'
>
> In [18]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}
>
Have you tried escaping the escape character by doubling the backslash?
inp = '\\\\ref{fig:abc}'
or:
inp = r'\\ref{fig:abc}'
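The two spellings suggested here should produce the same string object contents, so whichever reads better can be used. A quick way to confirm that, sketched for Python 3 (the helper `describe` is a hypothetical name, not from the thread):

```python
def describe(s, count=3):
    """Return (character, code point) pairs for the first few characters."""
    return [(c, ord(c)) for c in s[:count]]


inp_escaped = '\\\\ref{fig:abc}'   # four backslashes in source, two in the string
inp_raw = r'\\ref{fig:abc}'        # two backslashes in source, two in the string

same = inp_escaped == inp_raw
print(same)
print(describe(inp_raw))
```

If the strings compare equal here, any difference seen later must come from what the converter does with them, not from the Python literal.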
| From | al.basili@gmail.com (alb) |
|---|---|
| Date | 2015-03-02 22:37 +0000 |
| Message-ID | <clk70gFeal8U2@mid.individual.net> |
| In reply to | #86738 |
Hi MRAB,
MRAB <python@mrabarnett.plus.com> wrote:
[]
> Have you tried escaping the escape character by doubling the backslash?
>
> inp = '\\\\ref{fig:abc}'
In [54]: inp = '\\\\ref{fig:abc}'
In [55]: print pypandoc.convert(inp, 'latex', format='rst')
\textbackslash{}ref\{fig:abc\}
the backslash is considered as literal text for latex and is escaped
with the appropriate command.
> or:
>
> inp = r'\\ref{fig:abc}'
>
In [56]: inp = r'\\ref{fig:abc}'
In [57]: print pypandoc.convert(inp, 'latex', format='rst')
\textbackslash{}ref\{fig:abc\}
same as above. The result I aim to would be:
In [BINGO]: print pypandoc.convert(inp, 'latex', format='rst')
\ref{fig:abc}
Al
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2015-03-03 19:40 +1300 |
| Message-ID | <cll3adFl0hpU1@mid.individual.net> |
| In reply to | #86787 |
alb wrote:
> The result I aim to would be:
>
> In [BINGO]: print pypandoc.convert(inp, 'latex', format='rst')
> \ref{fig:abc}
From a cursory reading of the pypandoc docs, it looks
like enabling the raw_tex extension in pypandoc will
give you what you want.
Search for raw_tex on this page:
http://johnmacfarlane.net/pandoc/README.html
--
Greg
| From | al.basili@gmail.com (alb) |
|---|---|
| Date | 2015-03-03 20:50 +0000 |
| Message-ID | <clml46F2fphU4@mid.individual.net> |
| In reply to | #86818 |
Hi Gregory,
Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
[]
> From a cursory reading of the pypandoc docs, it looks
> like enabling the raw_tex extension in pypandoc will
> give you what you want.
>
> Search for raw_tex on this page:
>
> http://johnmacfarlane.net/pandoc/README.html
As far as I understood the docs, it seems this extension should be
passed to pandoc through +EXTENSION, but I don't seem to get it
working:
In [14]: print pypandoc.convert(s, 'latex', format="md+raw_tex")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-14-f41e67057a59> in <module>()
----> 1 print pypandoc.convert(s, 'latex', format="md+raw_tex")
/usr/local/lib/python2.7/dist-packages/pypandoc.pyc in convert(source, to, format, extra_args, encoding)
25 '''
26 return _convert(_read_file, _process_file, source, to,
---> 27 format, extra_args, encoding=encoding)
28
29
/usr/local/lib/python2.7/dist-packages/pypandoc.pyc in _convert(reader, processor, source, to, format, extra_args, encoding)
50 raise RuntimeError(
51 'Invalid input format! Expected one of these: ' +
---> 52 ', '.join(from_formats))
53
54 if to not in to_formats:
RuntimeError: Invalid input format! Expected one of these: native, json, markdown, markdown+lhs, rst, rst+lhs, docbook, textile, html, latex, latex+lhs
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2015-03-04 11:27 +1300 |
| Message-ID | <clmqqqF52b1U1@mid.individual.net> |
| In reply to | #86866 |
alb wrote:
> RuntimeError: Invalid input format! Expected one of these: native, json,
> markdown, markdown+lhs, rst, rst+lhs, docbook, textile, html, latex,
> latex+lhs

It looks like it's expecting the base format to be spelled
"markdown", not abbreviated to "md". (The python wrapper
expands "md" to "markdown", but not if it's followed by
any + or - options.)

So try:

pypandoc.convert(s, 'latex', format="markdown+raw_tex")

BTW, I just installed pandoc on MacOSX to try this out,
and it seems that raw_tex is enabled by default for me --
I have to turn it *off* with format="markdown-raw_tex"
in order to get the behaviour you're seeing. Maybe
a different version? My pandoc says it's version
1.12.0.1.

--
Greg
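One way to sidestep the abbreviation problem on the caller's side is to expand the base name before handing the format string to the wrapper. The following is only a sketch under assumptions: `expand_format` and the `ALIASES` table are hypothetical names invented here, not part of pypandoc or pandoc, and the table covers just the one alias mentioned in this post.

```python
import re

# Hypothetical alias table; the wrapper's own table may differ.
ALIASES = {'md': 'markdown'}


def expand_format(spec):
    """Expand an abbreviated base format, keeping any +ext / -ext suffixes."""
    # Group 1 is everything up to the first '+' or '-', group 2 is the rest.
    match = re.match(r'([^+-]*)(.*)', spec)
    base, extensions = match.group(1), match.group(2)
    return ALIASES.get(base, base) + extensions


print(expand_format('md+raw_tex'))
print(expand_format('markdown-raw_tex'))
print(expand_format('rst'))
```

The expanded string could then be passed as the `format` argument shown above, assuming the installed pandoc accepts that extension.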
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2015-03-02 14:40 +0000 |
| Message-ID | <mailman.42.1425307205.13471.python-list@python.org> |
| In reply to | #86735 |
On 2015-03-02 14:08, Dave Angel wrote:
> On 03/02/2015 08:51 AM, alb wrote:
>> Hi Steven,
>>
>> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>>> [snip]
>>> Oh, by the way, "i" is normally a terrible variable name for a string. Not
>>> only doesn't it explain what the variable is for, but there is a very
>>> strong convention in programming circles (not just Python, but hundreds of
>>> languages) that "i" is a generic variable name for an integer. Not a
>>> string.
>>
>> I'm not in the position to argue about good practices, I simply found
>> more appropriate to have i for input and o for output, considering they
>> are used like this:
>>
>> i = "some string"
>> o = pypandoc.convert(i, ...)
>> f.write(o)
>>
>> with very little risk to cause misunderstanding.
>
> How about "in" and "out"? Or perhaps some name that indicates what
> semantics the string represents, like "rst_string" and "html_string"
> or whatever they actually are?
>
[snip]
"in" is a reserved word, but "in_" would be OK.
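Whether a candidate name collides with a reserved word can be checked with the standard `keyword` module. A short sketch; the list of candidate names is just the ones floated in this exchange:

```python
import keyword

candidates = ['in', 'in_', 'out', 'rst_string']

# Keep only the names that are legal as variable names.
usable = [name for name in candidates if not keyword.iskeyword(name)]
print(usable)
```

The trailing-underscore spelling is the conventional way to dodge a keyword clash when no more descriptive name comes to mind.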
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-03-03 02:09 +1100 |
| Message-ID | <54f47d1d$0$12984$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #86735 |
alb wrote:
> Hi Steven,
>
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> []
>> Since \r is an escape character, that will give you carriage return
>> followed by "ef{fig:abc".
>>
>> The solution to that is to either escape the backslash:
>>
>> i = '\\ref{fig:abc}'
>>
>>
>> or use a raw string:
>>
>> i = r'\\ref{fig:abc}'
Dave has corrected my typo in the above: it should be r'\ref', the whole
point of raw strings is that you don't need to escape the backslashes.
> ok, maybe I wasn't clear from the very beginning, but searching for a
> solution is a journey that takes time and patience.
>
> The wrongly named variable i (as noted below), contains the *i*nput of
> my text which is supposed to be restructured text. The output is what
> pypandoc spits out after conversion:
Ah, well that's not a bad convention for small utility functions, but I
wouldn't want single-letter names to be used in anything bigger than, say,
a dozen lines. Having i for input and o for output right next to each other
helps too. But you're still swimming against the convention that i means an
integer. Whether you decide it is worth going against that convention in
your own code is up to you, but when asking for help, it is worth your
while to be as unsurprising and conventional as you can manage.
> i = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)
> o = pypandoc.convert(i, 'latex', format='rst')
>
> Now if i contains some inline text, i.e. text I do not want to convert
> in any other format, I need my text to be formatted accordingly in order
> to inject some escape symbols in i.
>
> Rst escapes with "\", but unfortunately python also uses "\" for escaping!
Yes, but only in string literals. In Python source code, "\r" makes a
carriage return, but when reading from the keyboard (say, using the
raw_input function), from a file, or anything other than a string literal,
a string consisting of "\r" is just backslash-r.
So, worst case, you can always assemble your strings like this:
backslash = chr(92)
i = ((backslash + "begin{tag}{%s}{%s}\n %s\n " + backslash + "end{tag}")
% (some, restructured, text))
although that is a PITA.
I recommend using raw triple strings, and avoid needing \n escapes:
i = r"""\begin{tag}{%s}{%s}
%s
\end{tag}""" % (some, restructured, text)
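The three ways of building the template (escaped literal, `chr(92)` assembly, raw triple-quoted string) should give identical results. A sketch for Python 3, where the values bound to `some`, `restructured` and `text` are placeholders invented for the demonstration and the template whitespace is simplified:

```python
some, restructured, text = 'a', 'b', 'c'   # placeholder values

# 1. Ordinary literal with doubled backslashes and \n escapes.
escaped = "\\begin{tag}{%s}{%s}\n%s\n\\end{tag}" % (some, restructured, text)

# 2. Assembled from chr(92). The concatenation is parenthesised because
#    % binds more tightly than +.
backslash = chr(92)
assembled = ((backslash + "begin{tag}{%s}{%s}\n%s\n" + backslash + "end{tag}")
             % (some, restructured, text))

# 3. Raw triple-quoted string: real line breaks, no escapes needed.
raw_triple = r"""\begin{tag}{%s}{%s}
%s
\end{tag}""" % (some, restructured, text)

print(escaped == assembled == raw_triple)
print(raw_triple)
```

The raw triple-quoted form is the only one of the three that looks on the page the way the resulting text does.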
>> Can you show what you are doing? Escaping the backslash with another
>> backslash does work:
>>
>> py> for c in '\\ref':
>> ... print(c, ord(c))
>> ...
>> \ 92
>> r 114
>> e 101
>> f 102
>>
>> so either you are doing something wrong, or the error lies elsewhere.
>
> As said above, the string is converted by pandoc first and then printed.
> At this point the escaping becomes tricky (at least to me).
>
> In [17]: inp = '\\ref{fig:abc}'
If you print inp at this point, you should see that it contains exactly what
you expect: backslash, R E F etc.
> In [18]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}
and now the backslash is gone, and the braces are escaped. This suggests
that the problem lies with pypandoc. Perhaps you need to add extra
backslashes, so that pypandoc will convert a double-backslash to a single
one. Consult your pypandoc documentation, and try this:
inp = '\\\\ref{fig:abc}' # That's FOUR backslashes, to get \\
# or as a raw-string:
inp = r'\\ref{fig:abc}'
assert inp[0] == inp[1] == chr(92)
out = pypandoc.convert(inp, 'latex', format='rst')
print out, out == r"\ref\{fig:abc\}"
--
Steven