


Re: Extract values from xml file using xpath/namespace/simplexml/php

Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From Peter Flynn <peter@silmaril.ie>
Newsgroups comp.text.xml
Subject Re: Extract values from xml file using xpath/namespace/simplexml/php
Date Sun, 24 Aug 2014 22:22:33 +0100
Lines 71
Message-ID <c5v3cpFdbkU1@mid.individual.net>
References <3f75d978-fb96-4f8b-a3f1-058a94d56c45@googlegroups.com>
Mime-Version 1.0
Content-Type text/plain; charset=ISO-8859-1
Content-Transfer-Encoding 7bit
X-Trace individual.net I/MsP0tdbeWclVRsT+t2awhiF7S7gwWMeZK7RR+rXZpy1GHInz
Cancel-Lock sha1:qytOuHVm/2ouhWD1XOa7ybzvLBs=
User-Agent Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
In-Reply-To <3f75d978-fb96-4f8b-a3f1-058a94d56c45@googlegroups.com>
X-Enigmail-Version 1.5.2
Xref csiph.com comp.text.xml:783



On 08/13/2014 11:22 AM, ofuuzo@gmail.com wrote:
> Hi,
> I am new in XML. I can't figure out how to combine
> xpath/namespace/simplexml/php to extract the values of "dc.title" and
> "dc.date" in this xml file:

> <?xml version="1.0" encoding="UTF-8"?>

Tip: always post a complete, well-formed instance, not one with the
bottom snipped off the file, especially when it's in an uncommon vocabulary.
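One quick way to follow that tip is to check that the sample parses before posting it. A minimal sketch using Python's standard-library xml.etree.ElementTree (the helper name is_well_formed and the urn:example documents are hypothetical); an instance with the bottom cut off fails with ParseError:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as one complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical examples: a complete instance and one cut off before the end.
complete = '<a xmlns="urn:example"><b>x</b></a>'
snipped = '<a xmlns="urn:example"><b>x</b>'  # closing </a> missing

print(is_well_formed(complete))  # True
print(is_well_formed(snipped))   # False
```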

Given your sample instance...

<OAI-PMH
  xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
  http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
  xmlns="http://www.openarchives.org/OAI/2.0/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ListRecords>
    <record>
      <metadata
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
	<oai_dc:dc
	  xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/">
	  <dc:title>GAFBone</dc:title>
	  <dc:date>2014-05-01T00:00:00Z</dc:date>
	</oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>

This XSLT2 script will extract the title and date:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:oai="http://www.openarchives.org/OAI/2.0/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
  version="2.0">

  <xsl:output method="text"/>

  <!-- oai: is bound to the document's default namespace; an
       unprefixed name here would match only no-namespace elements -->
  <xsl:template match="/">
    <xsl:apply-templates
      select="oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata"/>
  </xsl:template>

  <!-- One line per record: title, a tab, the date, a newline -->
  <xsl:template match="oai:metadata">
    <xsl:value-of select="oai_dc:dc/dc:title"/>
    <xsl:text>&#9;</xsl:text>
    <xsl:value-of select="oai_dc:dc/dc:date"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
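For comparison, here is a sketch of the same extraction in Python using only the standard library's xml.etree.ElementTree (Python rather than PHP; the NS prefix map and the extract function are illustrative names, not part of any library). As in the XSLT, the prefixes themselves are arbitrary, but each must be bound to the URI the document actually uses:

```python
import xml.etree.ElementTree as ET

# The sample instance above (tabs turned into spaces).
SAMPLE = """<OAI-PMH
  xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
  http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
  xmlns="http://www.openarchives.org/OAI/2.0/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ListRecords>
    <record>
      <metadata
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
        <oai_dc:dc
          xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/">
          <dc:title>GAFBone</dc:title>
          <dc:date>2014-05-01T00:00:00Z</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Our own prefixes; only the URIs have to match the document.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract(xml_text):
    """Return (title, date) pairs, one per oai:metadata element."""
    root = ET.fromstring(xml_text)  # root is the oai:OAI-PMH element itself
    rows = []
    for md in root.findall("oai:ListRecords/oai:record/oai:metadata", NS):
        title = md.findtext("oai_dc:dc/dc:title", namespaces=NS)
        date = md.findtext("oai_dc:dc/dc:date", namespaces=NS)
        rows.append((title, date))
    return rows


for title, date in extract(SAMPLE):
    print(f"{title}\t{date}")

# Unprefixed names match only no-namespace elements, so this finds nothing:
print(ET.fromstring(SAMPLE).findall("ListRecords"))  # []
```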

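The trick is that you have to write the XPaths using the namespaces
which are in effect in the document. OAI-PMH, ListRecords, record and
metadata carry no prefix, but they are in the default namespace
http://www.openarchives.org/OAI/2.0/, so in an XPath they need a
prefix bound to that URI (oai: above): an unprefixed name in an XPath
expression matches only elements in no namespace, unless in XSLT 2.0
you set xpath-default-namespace. A validating parser such as rxp will
let you check what namespaces are in effect on which elements.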
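rxp is the tool suggested here; as a rough, non-validating alternative, Python's ElementTree can report where each namespace is declared and which namespace every element ends up in. A sketch under those assumptions (split_tag and namespace_report are hypothetical names, and SAMPLE is a trimmed copy of the instance above with the xsi:schemaLocation attributes dropped):

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the sample instance (xsi:schemaLocation attributes dropped).
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><record>"
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">'
    "<oai_dc:dc><dc:title>GAFBone</dc:title>"
    "<dc:date>2014-05-01T00:00:00Z</dc:date></oai_dc:dc>"
    "</metadata></record></ListRecords></OAI-PMH>"
)


def split_tag(tag):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


def namespace_report(xml_bytes):
    """Yield (depth, declarations, uri, local) per element, in document order."""
    pending = []  # (prefix, uri) pairs declared on the next start tag
    depth = 0
    events = ("start-ns", "start", "end")
    for event, item in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            pending.append(item)  # prefix is '' for a default namespace
        elif event == "start":
            uri, local = split_tag(item.tag)
            yield depth, pending, uri, local
            pending = []
            depth += 1
        else:  # "end"
            depth -= 1


for depth, decls, uri, local in namespace_report(SAMPLE.encode("utf-8")):
    note = f"  declares {decls}" if decls else ""
    print(f"{'  ' * depth}{local}  in {uri or '(no namespace)'}{note}")
```

Each start-ns event arrives just before the start tag that carries the declaration, which is why declarations are buffered and attached to the next element.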

Alternatively, use a tool that lets you omit the default namespace,
such as lxprintf; for example (assuming your document is test.xml):

$ lxprintf -e oai_dc:dc "%s\t%s\n" dc:title dc:date test.xml
GAFBone	2014-05-01T00:00:00Z
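The lxprintf convenience of leaving the default namespace unprefixed has a rough counterpart in Python 3.8 and later: an empty-string key in ElementTree's namespaces mapping is applied to every unprefixed name in the path. A sketch (the NS map and title_date_rows are hypothetical names; SAMPLE is again a trimmed copy of the instance) that prints the same tab-separated line:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample instance (xsi:schemaLocation attributes dropped).
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><record>"
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">'
    "<oai_dc:dc><dc:title>GAFBone</dc:title>"
    "<dc:date>2014-05-01T00:00:00Z</dc:date></oai_dc:dc>"
    "</metadata></record></ListRecords></OAI-PMH>"
)

# The '' key maps unprefixed path steps to the default namespace (Python 3.8+).
NS = {
    "": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def title_date_rows(xml_text):
    """Return (title, date) for each oai_dc:dc record."""
    root = ET.fromstring(xml_text)  # the OAI-PMH element
    path = "ListRecords/record/metadata/oai_dc:dc"  # no oai: prefixes needed
    return [
        (rec.findtext("dc:title", namespaces=NS),
         rec.findtext("dc:date", namespaces=NS))
        for rec in root.findall(path, NS)
    ]


for title, date in title_date_rows(SAMPLE):
    print(f"{title}\t{date}")
```

The namespace URI is still needed in the mapping; only the prefixes disappear from the path, which is roughly what lxprintf does for you.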

How to do the equivalent in PHP is left as an exercise for the reader.

///Peter



Thread

Extract  values from xml file using xpath/namespace/simplexml/php ofuuzo@gmail.com - 2014-08-13 03:22 -0700
  Re: Extract  values from xml file using xpath/namespace/simplexml/php Peter Flynn <peter@silmaril.ie> - 2014-08-24 22:22 +0100
