PHP DOMDocument Not Displaying Cyrillic Characters Properly

Case: After loading an HTML document from a string into PHP's DOMDocument, the text of any selected node comes out garbled, for example “Европейска столица”. The source file is UTF-8 encoded.

Solution

DOMDocument::loadHTML treats the string passed to it as ISO-8859-1 unless the markup itself declares another encoding.
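
To see why the wrong guess produces this kind of garbage, here is a minimal sketch that reproduces the effect by hand. It assumes the mbstring extension is available, and the variable names are purely illustrative. It takes a single Cyrillic letter, reinterprets its UTF-8 bytes as ISO-8859-1, and re-encodes the result as UTF-8, which is roughly what happens inside the parser.

```php
<?php
// Assumption: the mbstring extension is loaded.
// U+0415 is the Cyrillic capital letter IE; in UTF-8 it is the two bytes D0 95.
$original = "\u{0415}";

// Treat each UTF-8 byte as a separate ISO-8859-1 character and re-encode as UTF-8.
// Every byte >= 0x80 becomes a two-byte sequence, so one letter turns into two
// unrelated characters.
$misread = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), PHP_EOL; // d095
echo bin2hex($misread), PHP_EOL;  // c390c295
```

The doubled byte count is the telltale sign of this kind of misinterpretation: the data was never damaged, it was just read with the wrong charset.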

If the string does not contain an XML encoding declaration, one can be prepended:

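$dom = new DOMDocument();
// $txt holds the UTF-8 HTML; the prepended declaration tells the parser its encoding.
// The @ suppresses the warnings loadHTML emits for imperfect markup.
@$dom->loadHTML('<?xml encoding="utf-8" ?>' . $txt);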
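
For a fuller picture, the sketch below runs the same input through loadHTML with and without the prepended declaration and compares the extracted text. The sample markup, the helper name firstParagraphText, and the use of libxml_use_internal_errors() instead of @ are illustrative choices, not part of the original fix. The exact garbled output in the first case may vary with the libxml2 version PHP is built against.

```php
<?php
// Hypothetical UTF-8 input used only for this demonstration.
$html = '<p>Европейска столица</p>';

// Illustrative helper: parse the markup and return the text of the first <p>.
function firstParagraphText(string $source): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally rather than silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom->getElementsByTagName('p')->item(0)->textContent;
}

// No encoding declared: the parser has to guess the charset.
$withoutDeclaration = firstParagraphText($html);

// Encoding declared by prepending the XML declaration.
$withDeclaration = firstParagraphText('<?xml encoding="utf-8" ?>' . $html);

var_dump($withoutDeclaration === 'Европейска столица');
var_dump($withDeclaration === 'Европейска столица');
```

One thing to watch for: the prepended declaration appears to stay in the parsed tree as a processing-instruction node, so it may need to be removed before calling saveHTML() if the serialized output matters.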

DOMDocument is such a mess…

Ref

  1. https://www.php.net/manual/en/domdocument.loadhtml.php#95251
  2. https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly
  3. https://stackoverflow.com/questions/47397559/php-domdocument-savehtml-not-encoding-cyrillic-correctly
