| Started by | Ted Flethuseo <flethuseo@gmail.com> |
|---|---|
| First post | 2011-04-01 17:47 -0500 |
| Last post | 2011-04-11 02:56 -0500 |
| Articles | 4 — 2 participants |
Reading XML to relational tables Ted Flethuseo <flethuseo@gmail.com> - 2011-04-01 17:47 -0500
Re: Reading XML to relational tables Jesús Gabriel y Galán <jgabrielygalan@gmail.com> - 2011-04-01 18:02 -0500
Re: Reading XML to relational tables Ted Flethuseo <flethuseo@gmail.com> - 2011-04-09 16:39 -0500
Re: Reading XML to relational tables Jesús Gabriel y Galán <jgabrielygalan@gmail.com> - 2011-04-11 02:56 -0500
| From | Ted Flethuseo <flethuseo@gmail.com> |
|---|---|
| Date | 2011-04-01 17:47 -0500 |
| Subject | Reading XML to relational tables |
| Message-ID | <236f02bc699a9c809bdec25e00a2d0f2@ruby-forum.com> |
Hi everyone,

I need to build 3 relational tables from an XML text. In these tables, I need to keep track of words that have the <emph> and <bold> tags in them along with the word mentioned and its count in the <p> tag. This is easier to illustrate with an example:

I need to take this text:

    <p> My name is <strong>Ted</strong>, and I like <emph>coffee</emph>.
    <strong>Ted</strong> does not like tea. </p>
    <p> I have a brother who likes <emph>tea</emph> but does not like
    <emph>coffee</emph> </p>

To 3 normalized tables like this:

    ..p_table...
    p_id  desc
    1     My name is....
    2     I have a ....

    ..p_to_emph_table...
    p_id  e_id  count
    1     2     1
    2     1     1
    2     2     1

    ..emph_table...
    e_id  emph_word
    1     Tea
    2     Coffee

I am not sure what would be the best approach to parse this XML with Ruby, or what tool could help me do this efficiently?

Any ideas appreciated,
Ted.

--
Posted via http://www.ruby-forum.com/.
| From | Jesús Gabriel y Galán <jgabrielygalan@gmail.com> |
|---|---|
| Date | 2011-04-01 18:02 -0500 |
| Message-ID | <AANLkTi=XGgyLCTW7a5ktUtjOAg90zOmH__dtRXkMJ+mi@mail.gmail.com> |
| In reply to | #2137 |
On Sat, Apr 2, 2011 at 12:47 AM, Ted Flethuseo <flethuseo@gmail.com> wrote:
> Hi everyone,
>
> I need to build 3 relational tables from an XML text. In these tables, I
> need to keep track of words that have the <emph> and <bold> tags in them
> along with the word mentioned and its count in the <p> tag. This is
> easier to illustrate with an example:
>
> I need to take this text:
>
> <p> My name is <strong>Ted</strong>, and I like <emph>coffee</emph>.
> <strong>Ted</strong> does not like tea. </p>
> <p> I have a brother who likes <emph>tea</emph> but does not like
> <emph>coffee</emph> </p>
>
> To 3 normalized tables like this:
>
> ...p_table...
> p_id  desc
> 1     My name is....
> 2     I have a ....
>
> ...p_to_emph_table...
> p_id  e_id  count
> 1     2     1
> 2     1     1
> 2     2     1
>
> ...emph_table...
> e_id  emph_word
> 1     Tea
> 2     Coffee
>
> I am not sure what would be the best approach to parse this XML with
> Ruby, or what tool could help me do this efficiently?

What I'd do is parse the XML (use Nokogiri, for example) and get all p elements. For each p element, insert it into p_table if not present and get its id. Look at all emph elements inside the p element, and for each of them:

- Check if the word is already in emph_table and get the id, or
- Insert it into emph_table and get the id

With that id, insert or update a row in p_to_emph_table with the p id and the word id.

This is a straightforward approach that should work. Give it a try (ask about anything that blocks you) and let us know how it goes.

Jesus.
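The steps described in this reply can be sketched roughly as follows. This is only an illustration under several assumptions: the three tables are modelled as in-memory hashes rather than real database tables, the helper names `full_text` and `build_tables` are made up for the example, words are lower-cased so that "Tea" and "tea" count as one entry, and the input is wrapped in a `<doc>` root element because XML needs a single root. It uses REXML, which ships with Ruby, only so that it runs without installing a gem; with Nokogiri the same loop would use `doc.search('p')` and `p_element.css('emph')`. Ids are assigned in order of first appearance, so they may not match the numbering in the question.

```ruby
require 'rexml/document'

# Concatenate all text beneath a node, including text inside child elements.
def full_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then full_text(child)
    else ''
    end
  end.join
end

# Returns three in-memory "tables" (hypothetical stand-ins for real ones):
#   p_table          p_id => paragraph text
#   emph_table       emph_word => e_id
#   p_to_emph_table  [p_id, e_id] => count
def build_tables(xml)
  p_table = {}
  emph_table = {}
  p_to_emph_table = Hash.new(0)

  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//p').each_with_index do |p_el, index|
    p_id = index + 1
    # Collapse runs of whitespace so the stored text is on one line.
    p_table[p_id] = full_text(p_el).split.join(' ')

    REXML::XPath.match(p_el, './/emph').each do |emph|
      word = full_text(emph).strip.downcase
      # Look the word up, or insert it with the next free id.
      e_id = (emph_table[word] ||= emph_table.size + 1)
      # Insert or update the row linking this paragraph to this word.
      p_to_emph_table[[p_id, e_id]] += 1
    end
  end

  [p_table, emph_table, p_to_emph_table]
end

sample = <<~XML
  <doc>
    <p> My name is <strong>Ted</strong>, and I like <emph>coffee</emph>.
    <strong>Ted</strong> does not like tea. </p>
    <p> I have a brother who likes <emph>tea</emph> but does not like
    <emph>coffee</emph> </p>
  </doc>
XML

p_table, emph_table, p_to_emph_table = build_tables(sample)
p_to_emph_table.each do |(p_id, e_id), count|
  puts "p_id=#{p_id} e_id=#{e_id} count=#{count}"
end
```

With a real database, each hash lookup would become a SELECT and each assignment an INSERT or UPDATE, but the order of the steps would stay the same.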
| From | Ted Flethuseo <flethuseo@gmail.com> |
|---|---|
| Date | 2011-04-09 16:39 -0500 |
| Message-ID | <56aa29ce30e3b353f2096060f9e338ff@ruby-forum.com> |
| In reply to | #2137 |
Hi Jesus,
Thank you for your help. Right now I am stuck trying to traverse the
elements in a single Nokogiri::XML::Element. I know I can use the
elements method to list the elements, but I am not sure how
I can traverse them and get their contents individually.

    require 'nokogiri'

    xml = File.read('translateXML.xml')
    doc = Nokogiri::XML(xml)
    # split into sentences first
    arr = doc.search('p')
    puts arr[0].elements
| From | Jesús Gabriel y Galán <jgabrielygalan@gmail.com> |
|---|---|
| Date | 2011-04-11 02:56 -0500 |
| Message-ID | <BANLkTikP24E9kmp45hRNTy95OY1Npz42TQ@mail.gmail.com> |
| In reply to | #2586 |
On Sat, Apr 9, 2011 at 11:39 PM, Ted Flethuseo <flethuseo@gmail.com> wrote:
> Hi Jesus,
>
> Thank you for your help. Right now I am stuck trying to traverse the
> elements in a single Nokogiri::XML::Element. I know I can use the
> elements method to list the elements, but I am not sure how
> I can traverse them and get their contents individually.
>
> xml = File.read('translateXML.xml')
> doc = Nokogiri::XML(xml)
>
> # split into sentences first
> arr = doc.search('p')

Try something like:

    require 'nokogiri'

    doc = Nokogiri::XML(File.read("p.xml"))
    doc.search("p").each do |p_element|
      puts "---------"
      puts p_element.text
      p_element.css("emph,strong").each do |emph|
        puts "Highlighted: #{emph.text}"
      end
    end

Jesus.