Reading XML to relational tables

Started by Ted Flethuseo <flethuseo@gmail.com>
First post 2011-04-01 17:47 -0500
Last post 2011-04-11 02:56 -0500
Articles 4 — 2 participants


Contents

  Reading XML to relational tables Ted Flethuseo <flethuseo@gmail.com> - 2011-04-01 17:47 -0500
    Re: Reading XML to relational tables Jesús Gabriel y Galán <jgabrielygalan@gmail.com> - 2011-04-01 18:02 -0500
    Re: Reading XML to relational tables Ted Flethuseo <flethuseo@gmail.com> - 2011-04-09 16:39 -0500
      Re: Reading XML to relational tables Jesús Gabriel y Galán <jgabrielygalan@gmail.com> - 2011-04-11 02:56 -0500

#2137 — Reading XML to relational tables

From: Ted Flethuseo <flethuseo@gmail.com>
Date: 2011-04-01 17:47 -0500
Subject: Reading XML to relational tables
Message-ID: <236f02bc699a9c809bdec25e00a2d0f2@ruby-forum.com>

Hi everyone,

I need to build 3 relational tables from an XML text. In these tables, I
need to keep track of the words wrapped in <emph> and <strong> tags,
along with each word and its count within the <p> tag. This is easier to
illustrate with an example:

I need to take this text:

<p> My name is <strong>Ted</strong>, and I like <emph>coffee</emph>.
<strong>Ted</strong> does not like tea. </p>
<p> I have a brother who likes <emph>tea</emph> but does not like
<emph>coffee</emph> </p>

To 3 normalized tables like this:

..p_table...
p_id    desc
1       My name is....
2       I have a ....


..p_to_emph_table...
p_id    e_id    count
1       2       1
2       1       1
2       2       1


..emph_table...
e_id    emph_word
1       Tea
2       Coffee

I am not sure what the best approach is to parse this XML with Ruby,
or which tool could help me do this efficiently.

Any ideas appreciated,

Ted.

-- 
Posted via http://www.ruby-forum.com/.



#2139

From: Jesús Gabriel y Galán <jgabrielygalan@gmail.com>
Date: 2011-04-01 18:02 -0500
Message-ID: <AANLkTi=XGgyLCTW7a5ktUtjOAg90zOmH__dtRXkMJ+mi@mail.gmail.com>
In reply to: #2137

On Sat, Apr 2, 2011 at 12:47 AM, Ted Flethuseo <flethuseo@gmail.com> wrote:
> Hi everyone,
>
> I need to build 3 relational tables from an xml text. In this tables, I
> need to keep track of words that have the <emph> and <bold> tags in them
> along with the
> word mentioned and its count in the <p> tag. This is easier to
> illustrate with an example:
>
> I need to take this text:
>
> <p> My name is <strong>Ted</strong>, and I like <emph>coffee</emph>.
> <strong>Ted</strong> does not like tea. </p>
> <p> I have a brother who likes <emph>tea</emph> but does not like
> <emph>coffee</emph> </p>
>
> To 3 normalized tables like this:
>
> ...p_table...
> p_id    desc
> 1       My name is....
> 2       I have a ....
>
>
> ...p_to_emph_table...
> p_id    e_id    count
> 1       2       1
> 2       1       1
> 2       2       1
>
>
> ...emph_table...
> e_id    emph_word
> 1       Tea
> 2       Coffee
>
> I am not sure what would be the best approach to parse this xml with
> ruby or what tool
> could help me do this efficiently?

What I'd do is parse the XML (with Nokogiri, for example) and get all the
p elements. For each p element, insert it into p_table if it is not
already there and get its id. Then look at every emph inside the p
element, and for each of them:
- Check whether the word is already in emph_table and get its id, or
- Insert it into emph_table and get the new id

With that id, insert or update a row in p_to_emph_table with the p id
and the word id.

This is a straightforward approach that should work. Give it a try (ask
about anything that blocks you) and let us know how it goes.

Jesus.
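
A minimal sketch of the steps described above, under several assumptions
that are not in the original posts: it uses REXML (shipped with Ruby)
instead of Nokogiri so it runs without installing a gem, it wraps the
sample text in a hypothetical <doc> root element, it keeps the three
tables as in-memory hashes rather than database tables, and it lowercases
words before looking them up. The helper names full_text and build_tables
are made up for illustration. Ids are handed out in order of first
appearance, so the numbering differs from the example in the first post.

```ruby
require 'rexml/document'

# Sample text from the first post, wrapped in a hypothetical <doc> root
# so that it is a well-formed XML document.
XML_TEXT = <<~XML
  <doc>
  <p> My name is <strong>Ted</strong>, and I like <emph>coffee</emph>.
  <strong>Ted</strong> does not like tea. </p>
  <p> I have a brother who likes <emph>tea</emph> but does not like
  <emph>coffee</emph> </p>
  </doc>
XML

# Concatenates the text of an element and of all elements nested in it.
def full_text(element)
  element.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then full_text(child)
    else ''
    end
  end.join
end

# Builds in-memory stand-ins for p_table, emph_table and p_to_emph_table.
def build_tables(xml_text)
  doc = REXML::Document.new(xml_text)
  p_table = {}             # p_id => paragraph text
  emph_table = {}          # lowercased word => e_id
  p_to_emph = Hash.new(0)  # [p_id, e_id] => count

  doc.elements.each('//p') do |p_el|
    p_id = p_table.size + 1
    p_table[p_id] = full_text(p_el).split.join(' ')  # collapse whitespace

    p_el.elements.each('.//emph') do |emph|
      word = full_text(emph).strip.downcase
      # Reuse the id if the word is known, otherwise assign the next one.
      e_id = (emph_table[word] ||= emph_table.size + 1)
      p_to_emph[[p_id, e_id]] += 1
    end
  end

  [p_table, emph_table, p_to_emph]
end

p_table, emph_table, p_to_emph = build_tables(XML_TEXT)
p_table.each { |id, text| puts "#{id}\t#{text}" }
emph_table.each { |word, id| puts "#{id}\t#{word}" }
p_to_emph.each { |(p_id, e_id), count| puts "#{p_id}\t#{e_id}\t#{count}" }
```

With a real database, the two hash lookups would presumably become a
SELECT followed by an INSERT when nothing is found, and the counter an
insert-or-update on p_to_emph_table.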



#2586

From: Ted Flethuseo <flethuseo@gmail.com>
Date: 2011-04-09 16:39 -0500
Message-ID: <56aa29ce30e3b353f2096060f9e338ff@ruby-forum.com>
In reply to: #2137

Hi Jesus,

Thank you for your help. Right now I am stuck trying to traverse the
child elements of a single XML element. I know I can use the elements
method to list them, but I am not sure how to iterate over them and get
the contents of each one individually.

require 'nokogiri'

xml = File.read('translateXML.xml')
doc = Nokogiri::XML(xml)

# split into sentences first
arr = doc.search('p')

# print the child elements of the first <p>
puts arr[0].elements




#2614

From: Jesús Gabriel y Galán <jgabrielygalan@gmail.com>
Date: 2011-04-11 02:56 -0500
Message-ID: <BANLkTikP24E9kmp45hRNTy95OY1Npz42TQ@mail.gmail.com>
In reply to: #2586

On Sat, Apr 9, 2011 at 11:39 PM, Ted Flethuseo <flethuseo@gmail.com> wrote:
> Hi Jesus,
>
> Thank you for your help. Right now I am stuck trying to traverse the
> elements in a single xml::element. I know I can use this elements method
> to list the elements, but I am not sure how
> I can traverse through them and get their contents individually.
>
> xml = File.read('translateXML.xml')
> doc = Nokogiri::XML(xml)
>
> # split into sentences first
> arr = doc.search('p')

Try something like:

require 'nokogiri'

doc = Nokogiri::XML(File.read("p.xml"))
doc.search("p").each do |p_element|
  puts "---------"
  puts p_element.text
  p_element.css("emph,strong").each do |emph|
    puts "Highlighted: #{emph.text}"
  end
end

Jesus.
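
For the narrower question of visiting the child elements of one element
and reading each one's contents, here is a small sketch. It assumes REXML
(shipped with Ruby) rather than Nokogiri so that it runs without a gem,
and a hypothetical one-paragraph sample; child_elements is a made-up
helper name. Nokogiri nodes offer elements, name and text methods that
play the same roles.

```ruby
require 'rexml/document'

# Hypothetical one-paragraph sample based on the first post.
doc = REXML::Document.new(
  '<p> My name is <strong>Ted</strong>, and I like <emph>coffee</emph>. </p>'
)

# Returns a [tag name, text] pair for each direct child element.
def child_elements(element)
  element.elements.to_a.map { |child| [child.name, child.text] }
end

pairs = child_elements(doc.root)
pairs.each { |name, text| puts "#{name}: #{text}" }
```

Only direct children are visited here; elements nested more deeply would
need a recursive walk or an XPath query such as './/emph'.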
