Read HTML - ThomasWeinert/FluentDOM GitHub Wiki

Read HTML

FluentDOM extends PHP's DOM classes. Like SimpleXML, it implements many of PHP's magic methods and interfaces. Unlike SimpleXML, it does not hide the DOM API; it only adds shortcuts on top of it. In addition, FluentDOM provides a static class as a starting point.

As a first example, let's load some HTML and extract the URLs and captions of all links.

$htmlFile = 'sample.html';
$links = [];

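// ALLOW_FILE permits the loader to read $htmlFile from disk.
$document = FluentDOM::load(
  $htmlFile,
  'text/html',
  [FluentDOM\Loader\Options::ALLOW_FILE => TRUE]
);
// Select every <a> element that has an href attribute.
foreach ($document('//a[@href]') as $a) {
  $links[] = [
    'caption' => (string)$a,
    'href' => $a['href']
  ];
}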

var_dump($links);

FluentDOM::load()

The static function FluentDOM::load() loads a given source and returns a FluentDOM\DOM\Document. It can load different formats, depending on the installed plugins. As a security measure, the standard loaders only accept strings by default. They will not read a file unless an option such as FluentDOM\Loader\Options::ALLOW_FILE is set, as in the example above.
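
For comparison, plain ext/dom makes the string/file distinction explicit in the method name rather than in an option. The following is a minimal sketch using only PHP's built-in DOMDocument, not FluentDOM's loader code; the sample markup and the temporary file are made up for illustration.

```php
<?php
// Illustrative markup (hypothetical, not taken from sample.html).
$html = '<html><body><a href="https://example.com/">Example</a></body></html>';

// Loading from a string: the source is the markup itself.
$fromString = new DOMDocument();
$fromString->loadHTML($html);

// Loading from a file: the source is a path, so the call reads from disk.
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, $html);
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);
unlink($path);

// Both documents contain the same single link.
echo $fromString->getElementsByTagName('a')->length, "\n";
echo $fromFile->getElementsByTagName('a')->length, "\n";
```

Requiring an explicit option before a loader reads files means that a string taken from user input cannot be turned into a file read by accident.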

$document('//a[@href]')

FluentDOM\DOM\Document and the other extended node classes implement the magic method __invoke(), so their instances can be called like functions. The call executes an XPath expression with the node as the context.
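
To show what "with the node as the context" means, here is a sketch of the roughly equivalent calls in plain ext/dom, using DOMXPath::evaluate() with and without a context node. It assumes the function-call shortcut behaves like these calls; the markup is invented for the example.

```php
<?php
$document = new DOMDocument();
$document->loadHTML(
  '<html><body><div id="nav"><a href="/a">A</a></div><p><a href="/b">B</a></p></body></html>'
);
$xpath = new DOMXPath($document);

// Absolute expression evaluated from the document: matches both links.
$all = $xpath->evaluate('//a[@href]');

// Relative expression evaluated with the div as context: matches only the link inside it.
$nav = $xpath->evaluate('//div[@id="nav"]')->item(0);
$inNav = $xpath->evaluate('.//a[@href]', $nav);

echo $all->length, ' ', $inNav->length, "\n"; // prints "2 1"
```

Note that an expression starting with `//` always searches the whole document, even when a context node is given. Use `.//` to restrict the search to the descendants of the context node.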

(string)$a

The extended node classes can be cast to string. The cast returns the text content of the node, including the text content of its descendant nodes.
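
In plain ext/dom, the same value is available through the textContent property. The sketch below assumes the string cast corresponds to that property and uses made-up markup with a nested element to show that descendant text is included.

```php
<?php
$document = new DOMDocument();
$document->loadHTML(
  '<html><body><a href="/docs">Read the <b>docs</b> now</a></body></html>'
);
$a = $document->getElementsByTagName('a')->item(0);

// textContent concatenates the text of the node and of all its descendants,
// so the text inside <b> is part of the result.
$caption = $a->textContent;

echo $caption, "\n"; // prints "Read the docs now"
```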

$a['href']

FluentDOM\DOM\Element (the element node class) implements ArrayAccess, so you can use array syntax on elements. A string offset accesses an attribute value; an integer offset accesses a child node.
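
For reference, these are the plain ext/dom calls that the two kinds of offset presumably stand in for. This is a sketch under that assumption, with invented markup, and not FluentDOM's actual implementation.

```php
<?php
$document = new DOMDocument();
$document->loadHTML(
  '<html><body><a href="/docs"><b>Read</b> the docs</a></body></html>'
);
$a = $document->getElementsByTagName('a')->item(0);

// Counterpart of a string offset such as $a['href']: read an attribute value.
$href = $a->getAttribute('href');

// Counterpart of an integer offset such as $a[0]: fetch a child node by position.
$firstChild = $a->childNodes->item(0);

echo $href, ' ', $firstChild->nodeName, "\n"; // prints "/docs b"
```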