DOMDocument - markhowellsmead/helpers GitHub Wiki

Append HTML string as node to existing document

Since 9th January 2025, the appendHTML method no longer converts the incoming HTML to UTF-8. It's assumed that the string already uses this encoding.

<?php

namespace SayHello\Theme\Package;

use DOMDocument as GlobalDOMDocument;
use DOMNode;

/**
 * Helper methods for working with DOMDocument
 *
 * @author Say Hello GmbH
 */
class DomDocument
{
	/**
	 * Helper function which parses an HTML string and
	 * appends the resulting nodes to the parent as children.
	 *
	 * @param DOMNode $parent Node which receives the new children
	 * @param string $html HTML string, assumed to be UTF-8
	 * @return void
	 */
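	public function appendHTML(DOMNode $parent, string $html): void
	{
		$document = new GlobalDOMDocument();
		// The prefix makes libxml read the string as UTF-8 rather than its ISO-8859-1 default.
		$document->loadHTML('<?xml encoding="UTF-8">' . $html);
		$body = $document->getElementsByTagName('body')->item(0);
		if ($body === null) {
			return; // Nothing was parsed, so there is nothing to append.
		}
		foreach ($body->childNodes as $node) {
			// importNode() returns a copy which belongs to the parent's document.
			$parent->appendChild($parent->ownerDocument->importNode($node, true));
		}
	}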
}
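
As a usage sketch: the following assumes a standalone function, here called append_html_fragment (a hypothetical name, not part of the helper class), which mirrors the import-and-append step of the method above so that the example runs on its own. The sample markup is made up.

```php
<?php

// Hypothetical standalone copy of the append logic, so the sketch runs without the class.
function append_html_fragment(DOMNode $parent, string $html): void
{
	$fragment = new DOMDocument();
	$fragment->loadHTML($html);
	foreach ($fragment->getElementsByTagName('body')->item(0)->childNodes as $node) {
		// importNode() copies the node into the target document before it is appended.
		$parent->appendChild($parent->ownerDocument->importNode($node, true));
	}
}

// Target document with one existing paragraph (sample markup).
$document = new DOMDocument();
$document->loadHTML('<div id="target"><p>Existing</p></div>');
$target = $document->getElementsByTagName('div')->item(0);

append_html_fragment($target, '<p>Added</p><p>Also added</p>');

// The target div now holds the original paragraph plus the two appended ones.
$count = $target->getElementsByTagName('p')->length;
```

The import step matters because a node belongs to the document which created it; appending it directly to a node from another document fails with a "Wrong Document" error.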

Revising the content of a block's HTML

This version is from 9th January 2025. I added the LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags so that the parsed document doesn't contain the html and body tags (or the doctype) which DOMDocument otherwise adds. This makes the parsing and the returned HTML a bit cleaner.
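
To make the effect of the two flags visible, here is a small comparison. It's only a sketch: the figure markup and the variable names are made up for illustration.

```php
<?php

// Collect parser warnings (e.g. about HTML5 elements such as figure) instead of printing them.
libxml_use_internal_errors(true);

// Sample block markup, made up for illustration.
$html = '<figure class="wp-block-image"><img src="cam.jpg" alt=""></figure>';

// Default behaviour: libxml adds a doctype plus html and body wrappers.
$wrapped = new DOMDocument();
$wrapped->loadHTML($html);
$wrappedOutput = $wrapped->saveHTML();

// With the flags: no implied html/body elements and no default doctype.
$bare = new DOMDocument();
$bare->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$bareOutput = $bare->saveHTML();

libxml_clear_errors();
```

Echoing $wrappedOutput and $bareOutput side by side shows the difference: only the first contains the doctype and the wrapper elements.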

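When using this method, remember that no html or body wrapper is added, so $document->documentElement (the root element) is the first element of the block's own markup rather than html. Make sure that any node search takes this into account.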
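
A short sketch of what this means for node searches, using made-up sample markup. A lookup which relies on the implied body element comes back empty, whereas an XPath query over the whole document still finds the image.

```php
<?php

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

// Sample block markup, made up for illustration.
$html = '<figure class="wp-block-image"><img src="cam.jpg" alt=""></figure>';

$document = new DOMDocument();
$document->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// The root element is the fragment's own first element, not html.
$rootName = $document->documentElement->nodeName;

// There is no implied body element, so this lookup returns null.
$body = $document->getElementsByTagName('body')->item(0);

// An XPath query over the whole document still reaches the image.
$xpath = new DOMXPath($document);
$images = $xpath->query('//img');
```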

Before returning the modified HTML, I remove the '<?xml encoding="UTF-8">' prefix, which I add before parsing so that the string is read as UTF-8. Removing it isn't strictly necessary, but it keeps the source code clean.
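
The round trip with the prefix can be sketched as follows. The markup and the variable names are assumptions for illustration; the prefix string is the one described above.

```php
<?php

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$prefix = '<?xml encoding="UTF-8">';
// Sample markup with a non-ASCII character, made up for illustration.
$html = '<figure><figcaption>Zürich</figcaption></figure>';

$document = new DOMDocument();
$document->loadHTML($prefix . $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// Thanks to the prefix, the non-ASCII character is intact in the parsed tree.
$caption = $document->getElementsByTagName('figcaption')->item(0)->textContent;

// saveHTML() serialises the prefix as well, so it is stripped from the returned string.
$output = str_replace($prefix, '', $document->saveHTML());
```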

Encoding

Finally, I removed the earlier step which converted the HTML to UTF-8. My projects always use UTF-8, so the function now simply returns the HTML untouched if the string is not valid UTF-8.
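
The encoding check on its own looks roughly like this. The helper name is_utf8 is hypothetical and only exists to make the sketch runnable; it wraps the same mb_detect_encoding() call used at the top of render().

```php
<?php

// Hypothetical helper which mirrors the guard at the top of render().
function is_utf8(string $html): bool
{
	// In strict mode, mb_detect_encoding() returns false if the bytes are not valid UTF-8.
	return mb_detect_encoding($html, 'UTF-8', true) !== false;
}

$utf8 = 'Zürich';
$latin1 = "Z\xFCrich"; // The same word encoded as ISO-8859-1.

$utf8Ok = is_utf8($utf8);
$latin1Ok = is_utf8($latin1);
```

mb_check_encoding($html, 'UTF-8') would probably express the same test more directly, but the sketch sticks to the call which the code uses.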

<?php

namespace PT\MustUse\Blocks\CoreImage;

use DOMDocument;
use DOMXPath;

class Block
{
	public function run()
	{
		add_filter('render_block_core/image', [$this, 'render'], 10, 2);
	}

	public function render($html, $block)
	{

		if (empty($html) || !mb_detect_encoding($html, 'UTF-8', true)) {
			return $html;
		}

		if (strpos($block['attrs']['className'] ?? '', 'is-style-webcam') === false) {
			return $html;
		}

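		// Collect libxml parse warnings (e.g. for HTML5 elements) instead of emitting them.
		$previous = libxml_use_internal_errors(true);
		$document = new DOMDocument();
		// The prefix makes libxml read the string as UTF-8; the flags stop it adding html/body tags and a doctype.
		$document->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
		libxml_clear_errors();
		// Restore the previous error handling so that other code isn't affected.
		libxml_use_internal_errors($previous);

		$xpath = new DOMXPath($document);

		foreach ($xpath->query('//img') as $node) {
			$src = $node->getAttribute('src');
			if ($src === '') {
				continue;
			}
			// Append a random "force" parameter, using & if the URL already has a query string.
			$separator = parse_url($src, PHP_URL_QUERY) ? '&' : '?';
			$node->setAttribute('src', $src . $separator . 'force=' . rand(1, 1000000));
		}

		// Strip the encoding prefix added above so that it doesn't appear in the output.
		return str_replace('<?xml encoding="UTF-8">', '', $document->saveHTML());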
	}
}