Read HTML - ThomasWeinert/FluentDOM GitHub Wiki
Read HTML
FluentDOM extends PHP's DOM classes. Like SimpleXML, it implements many of PHP's magic methods and interfaces. Unlike SimpleXML, it does not hide the DOM API; it only adds shortcuts on top of it. In addition, FluentDOM provides a static class, FluentDOM, as a starting point.
As a first example, let's load some HTML and extract the URL and caption of every link.
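// Assumption: FluentDOM was installed with Composer.
require __DIR__.'/vendor/autoload.php';

$htmlFile = 'sample.html';
$links = [];
// ALLOW_FILE is required because the source is a file name, not a string.
$document = FluentDOM::load(
  $htmlFile,
  'text/html',
  [FluentDOM\Loader\Options::ALLOW_FILE => TRUE]
);
// Collect the caption and URL of every link that has an href attribute.
foreach ($document('//a[@href]') as $a) {
  $links[] = [
    'caption' => (string)$a,
    'href' => $a['href']
  ];
}
var_dump($links);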
FluentDOM::load()
The static function FluentDOM::load() loads a given source and returns a FluentDOM\DOM\Document. It can load different formats, depending on the installed plugins. As a security measure, the standard loaders accept only strings by default; they load files only if the FluentDOM\Loader\Options::ALLOW_FILE option is set, as in the example above.
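To illustrate the string-only default, here is a minimal sketch that passes the markup itself instead of a file name. The sample markup and the Composer autoload path are assumptions, and the claim that a string source needs no options is taken from the description above rather than verified against a specific FluentDOM version.

```php
<?php
// Assumption: FluentDOM was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample markup, passed as a string rather than a file name.
$html = '<html><body><a href="https://example.com/">Example</a></body></html>';

// No ALLOW_FILE option here, because the source is a string.
$document = FluentDOM::load($html, 'text/html');

// An XPath expression may also return a scalar, here the first href value.
$href = $document('string(//a/@href)');
var_dump($href);
```

Passing the same call a file name without the option should be rejected by the standard loaders, which is the point of the safeguard.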
$document('//a[@href]')
FluentDOM\DOM\Document and the other extended node classes implement the magic method __invoke(), so they can be called like functions. The call evaluates an XPath expression with the node as the context.
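For comparison, the following sketch shows what "with the node as the context" means, using only PHP's built-in DOM extension. It does not use FluentDOM. That the shortcut behaves like DOMXPath::evaluate() with a context node is inferred from the description above, and the sample markup is made up for the illustration.

```php
<?php
// Plain PHP DOM extension only; FluentDOM is not involved here.
$html = '<div id="nav"><a href="/home">Home</a></div><p><a href="/about">About</a></p>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Document as context: the expression matches every link.
$allLinks = $xpath->evaluate('//a[@href]');

// Element as context: a relative expression only sees its descendants.
$nav = $xpath->evaluate('//div[@id="nav"]')->item(0);
$navLinks = $xpath->evaluate('.//a[@href]', $nav);

printf(
  "%d links in total, %d inside the nav element\n",
  $allLinks->length,
  $navLinks->length
);
```

With the extended classes, the DOMXPath object and the explicit context argument are what the function-call syntax saves you from writing.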
(string)$a
The extended node classes can be cast to string. The cast returns the text content of the node, including the text content of its descendant nodes.
$a['href']
FluentDOM\DOM\Element (element nodes) implements ArrayAccess, so you can use array syntax. A string offset accesses an attribute value; an integer offset accesses a child node.
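The string cast and the array syntax can be read as shorthand for standard DOM members. The sketch below spells out those counterparts with the built-in DOM extension only. The mapping is an assumption based on the descriptions above, not a quote from the FluentDOM source, and the markup is a made-up sample.

```php
<?php
// Plain PHP DOM extension only; FluentDOM is not involved here.
$document = new DOMDocument();
$document->loadHTML('<p><a href="/docs">Read the <em>docs</em></a></p>');
$a = $document->getElementsByTagName('a')->item(0);

// Counterpart of (string)$a: text of the node and all its descendants.
$caption = $a->textContent;

// Counterpart of $a['href']: the attribute value.
$href = $a->getAttribute('href');

// Counterpart of $a[1]: the second child node, here the em element.
$secondChild = $a->childNodes->item(1);

var_dump($caption, $href, $secondChild->nodeName);
```

Because FluentDOM does not hide the DOM API, both spellings remain available on the extended element classes.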