Perturb.org - interesting technology related things from around the internet

XPath is the coolest thing ever
2006-04-10

XPath is a query language for searching through XML/HTML/XHTML and finding specific nodes in the hierarchy. Once you have a node, you can pull out its attributes and use them later. It's extremely powerful in that you do not need to know much about the structure of the source document: a query like //item/enclosure matches every enclosure element that sits directly under an item, wherever that item is in the tree. I've been able to implement it in both PHP and Perl quite easily; the examples below show PHP first, then Perl. Here's some decent documentation on how to craft your XPath queries.

// Create the new DOM object
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;

// Load the URL into the DOM and attach it to the XPath object
$url = "http://www.npr.org/rss/podcast.php?id=4538138";
if (!$dom->load($url)) {
	die("Could not load $url\n");
}
$xpath = new DOMXPath($dom);

// Create and query the document for all the enclosure nodes
$query = "//item/enclosure";
$nodes = $xpath->query($query);

foreach ($nodes as $item) {
	print $item->getAttribute('url') . "\n";
}
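
To experiment with queries without hitting the network, the same DOMXPath calls can be run against an inline document. This is only a minimal sketch under assumptions: the sample feed and its example.com URLs are made up for illustration, and the short array syntax needs PHP 5.4 or later. It shows selecting attribute nodes directly, filtering with a predicate, and using evaluate() for counts and strings.

```php
<?php
// Hypothetical sample feed, made up for illustration; not the real NPR feed
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample podcast</title>
    <item>
      <title>Episode one</title>
      <enclosure url="http://example.com/one.mp3" type="audio/mpeg" length="100"/>
    </item>
    <item>
      <title>Episode two</title>
      <enclosure url="http://example.com/two.ogg" type="audio/ogg" length="200"/>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select the url attribute nodes directly instead of the enclosure elements
$urls = [];
foreach ($xpath->query('//item/enclosure/@url') as $attr) {
    $urls[] = $attr->value;
}

// A predicate in square brackets filters on an attribute value
$mp3 = [];
foreach ($xpath->query('//enclosure[@type="audio/mpeg"]') as $node) {
    $mp3[] = $node->getAttribute('url');
}

// evaluate() returns typed results such as numbers or strings
$count = $xpath->evaluate('count(//item)');
$firstTitle = $xpath->evaluate('string(//item[1]/title)');

print implode("\n", $urls) . "\n";
print implode("\n", $mp3) . "\n";
print "$count items, first is \"$firstTitle\"\n";
```

Run against the sample feed, this should list both enclosure URLs, then the single MP3 URL, then a one-line summary.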

# The Perl version, using LWP::Simple and XML::XPath
use strict;
use warnings;
use LWP::Simple;
use XML::XPath;

# Get the content at the URL
my $url = "http://www.npr.org/rss/podcast.php?id=4538138";
my $content = get($url);
die "Could not fetch $url\n" unless defined $content;

# Create the XPath object from the XHTML/XML we just got
my $xp = XML::XPath->new($content);
# my $xp = XML::XPath->new('filename' => "/tmp/foo.html");

# Create and run the query to get the appropriate nodes
my $query = "//item/enclosure";
my $nodeset = $xp->find($query);

# Loop through and output all the url attributes
foreach my $node ($nodeset->get_nodelist) {
	print $node->getAttribute('url') . "\n";
}
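
Since XPath works on HTML as well as XML, here is a minimal PHP sketch of the difference between an absolute path and a descendant search. The HTML snippet and its link targets are hypothetical, made up for illustration, and the code assumes PHP 5.4 or later with the standard DOM extension.

```php
<?php
// Hypothetical HTML snippet, made up for illustration
$html = '<html><body><font size="2">'
      . '<a href="/first.html">First</a> '
      . '<a href="/second.html">Second</a>'
      . '</font><p><a href="/other.html">Other</a></p></body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Absolute path: only links that are direct children of html > body > font
$inFont = [];
foreach ($xpath->query('/html/body/font/a') as $a) {
    $inFont[] = $a->getAttribute('href');
}

// Descendant search: every link in the document, wherever it sits
$all = [];
foreach ($xpath->query('//a') as $a) {
    $all[] = $a->getAttribute('href');
}

print "Inside font: " . implode(", ", $inFont) . "\n";
print "Everywhere:  " . implode(", ", $all) . "\n";
```

The absolute path breaks as soon as the page layout changes, while the // form keeps working, which is why the descendant search is usually the more forgiving choice when scraping.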