
Mechanize decoding HTML entities twice

Newsgroups comp.lang.ruby
Date 2015-12-09 15:32 -0800
Message-ID <16fa9410-09ed-4108-9d37-48c3ff61652f@googlegroups.com> (permalink)
Subject Mechanize decoding HTML entities twice
From anders@sjoqvi.st



Hi,

I must have somehow misconfigured Mechanize, because it decodes entities twice. For example, the program

#!/usr/bin/env ruby
require 'mechanize'
html = "<!DOCTYPE html>\n"\
       "<html>\n"\
       " <head><title>foobar</title></head>\n"\
       " <body>\n"\
       "  <form>\n"\
       "   <input type='text' name='text1' value='&quot;' />\n"\
       "   <input type='text' name='text2' value='&amp;quot;' />\n"\
       "   <input type='text' name='text3' value='&amp;amp;quot;' />\n"\
       "  </form>\n"\
       " </body>\n"\
       "</html>"
agent = Mechanize.new
page = Mechanize::Page.new(nil, {'Content-Type' => 'text/html'}, html, nil, agent)
p page.forms.first.fields.map { |t| t.value }

prints

["\"", "\"", "&quot;"]

although I would have expected

["\"", "&quot;", "&amp;quot;"]
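To make "twice" concrete, here is a minimal sketch that does not use Mechanize at all. It decodes the same three raw attribute values with CGI.unescapeHTML from the standard library, once and then twice. The helper name `decode` and the constant `RAW_VALUES` are only for illustration, and I am not claiming this is what Mechanize does internally; it just shows that one pass gives what I expected and two passes give what I actually see.

```ruby
require 'cgi'

# The raw attribute values exactly as they appear in the HTML source above.
RAW_VALUES = ['&quot;', '&amp;quot;', '&amp;amp;quot;'].freeze

# Hypothetical helper: run CGI.unescapeHTML over each value `passes` times.
def decode(values, passes)
  values.map do |value|
    passes.times.reduce(value) { |text, _| CGI.unescapeHTML(text) }
  end
end

p decode(RAW_VALUES, 1)  # one pass:   ["\"", "&quot;", "&amp;quot;"]
p decode(RAW_VALUES, 2)  # two passes: ["\"", "\"", "&quot;"]
```

The second line of output is identical to what the Mechanize program prints, which is why I suspect a second decoding step somewhere.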

Could anyone tell me what I'm doing wrong?

Thanks!
