PHP Simple HTML DOM Parser doesn’t work anymore – preg_match_all() Warning

Has your script that uses the PHP Simple HTML DOM Parser stopped working? Most likely you or your web host has updated PHP. If PHP error reporting is enabled, you should see the following warning:

Warning: preg_match(): Compilation failed: invalid range in character class at offset 4 in D:\xampp\htdocs\test\simple_html_dom.php on line 1364

The same warning is also reported for line 684:

Warning: preg_match_all(): Compilation failed: invalid range in character class at offset 4 in D:\xampp\htdocs\test\simple_html_dom.php on line 684

But what’s wrong with preg_match_all(), and why does it no longer work?

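The answer is simple: as of PHP 7.3, the PCRE extension of PHP uses the PCRE2 library instead of the original PCRE. PCRE2 validates patterns more strictly, so the patterns on lines 684 and 1364 no longer compile. In a character class such as [\w-:], the hyphen after \w looks like the start of a range, but \w stands for a whole set of characters and cannot be a range endpoint. The old PCRE silently treated that hyphen as a literal character; PCRE2 rejects it with "invalid range in character class". Offset 4 in the warning is exactly the position of that hyphen in /^[\w-:]+$/.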

Not only is the answer simple, so is the solution 🙂 You only have to escape the hyphen after the ‘\w’ with a backslash.
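If you want to check how your PHP installation behaves before editing anything, the following sketch tries to compile the original tag-name pattern, the escaped version, and a variant where the hyphen is the last character of the class. pattern_compiles() is a hypothetical helper written for this post, not part of the parser. On PHP 7.3 or higher the original pattern should be reported as failing.

```php
<?php
// Sketch: check which character-class variants compile with the PCRE
// library used by the running PHP version.
// pattern_compiles() is a hypothetical helper, not part of simple_html_dom.

function pattern_compiles(string $pattern): bool
{
    // preg_match() returns false (and emits a warning, silenced here with @)
    // when the pattern cannot be compiled.
    return @preg_match($pattern, '') !== false;
}

$patterns = [
    'original'          => '/^[\w-:]+$/',  // hyphen between \w and ':'
    'escaped'           => '/^[\w\-:]+$/', // hyphen escaped with a backslash
    'hyphen at the end' => '/^[\w-]+$/',   // a trailing hyphen is a literal
];

echo 'PHP ', PHP_VERSION, ', PCRE ', PCRE_VERSION, PHP_EOL;
foreach ($patterns as $label => $pattern) {
    printf("%-18s %-14s %s\n", $label, $pattern, pattern_compiles($pattern) ? 'compiles' : 'fails');
}
```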

Open your simple_html_dom.php file, go to line 1364 and replace

if (!preg_match("/^[\w-:]+$/", $tag)) {

with

if (!preg_match("/^[\w\-:]+$/", $tag)) {
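Escaping the hyphen does not change which tag names are accepted, because \- inside a character class is still a literal hyphen. A small sketch to try the corrected check on its own; is_valid_tag_name() is a hypothetical wrapper, not a function of the parser:

```php
<?php
// Sketch: the corrected tag-name pattern from line 1364, wrapped in a
// hypothetical helper so it can be tested in isolation.

function is_valid_tag_name(string $tag): bool
{
    // One or more word characters, hyphens (now escaped) or colons.
    return preg_match("/^[\w\-:]+$/", $tag) === 1;
}

foreach (['div', 'my-element', 'svg:rect', 'not a tag', ''] as $tag) {
    printf("%-12s %s\n", var_export($tag, true), is_valid_tag_name($tag) ? 'valid' : 'invalid');
}
```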

Do the same for the pattern on line 683, which the preg_match_all() call on line 684 uses. Replace

$pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";

with

$pattern = "/([\w\-:\*]*)(?:\#([\w\-]+)|\.([\w\-]+))?(?:\[@?(!?[\w\-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
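To see what the corrected pattern extracts, here is a sketch that runs it against a sample selector. parse_selector_parts() is a hypothetical wrapper; the trailing space it appends gives the last group ([\/, ]+) something to match at the end of the selector.

```php
<?php
// Sketch: run the corrected selector pattern from line 683 on its own.
// parse_selector_parts() is a hypothetical helper, not part of the parser.

function parse_selector_parts(string $selector): array
{
    $pattern = "/([\w\-:\*]*)(?:\#([\w\-]+)|\.([\w\-]+))?(?:\[@?(!?[\w\-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
    // The trailing space lets the final ([\/, ]+) group match after the last part.
    preg_match_all($pattern, trim($selector) . ' ', $matches, PREG_SET_ORDER);
    return $matches;
}

foreach (parse_selector_parts('div#main a[href]') as $m) {
    // Group 1: tag, 2: id, 3: class, 4: attribute name
    printf("tag=%s id=%s class=%s attr=%s\n", $m[1], $m[2], $m[3], $m[4]);
}
```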

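Here the hyphen is escaped in four places. Strictly speaking, only [\w-:\*] and [\w-:] fail to compile, because in [\w-] the hyphen is the last character of the class and therefore already literal, but escaping all four is harmless.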
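If you have several copies of simple_html_dom.php to fix, the edits can be scripted. This is a minimal sketch assuming your copy still contains the unescaped fragments [\w-: and [\w-]; patch_simple_html_dom() and patch_file() are hypothetical helper names. It replaces every occurrence of those fragments in the file, not only the ones on lines 683 and 1364, so compare the result with the .bak copy before deploying it.

```php
<?php
// Sketch of an automated version of the manual edits above.
// Assumption: the file still contains the unescaped fragments "[\w-:" and "[\w-]".
// Both function names are hypothetical.

function patch_simple_html_dom(string $source): string
{
    // Escape the hyphen after \w in every character class that starts this way.
    // Patched code no longer contains these fragments, so patching twice changes nothing.
    return str_replace(['[\w-:', '[\w-]'], ['[\w\-:', '[\w\-]'], $source);
}

function patch_file(string $path): bool
{
    $source = file_get_contents($path);
    if ($source === false) {
        return false;
    }
    // Keep a backup before overwriting the original file.
    if (file_put_contents($path . '.bak', $source) === false) {
        return false;
    }
    return file_put_contents($path, patch_simple_html_dom($source)) !== false;
}
```

For example, call patch_file('simple_html_dom.php') once from a small CLI script, then run your scraper again.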

Now your script with the PHP Simple HTML DOM Parser should work again.
