Quote: "During the last week I was performing some audits and like so often it contained preg_match() filters that were not correct. Most PHP developers use ^ and $ within their regular expressions without actually reading the documentation about what they really achieve. You will find a lot of input filters like the following one.
Quite common way to filter incoming data, isn't it?
However the problem is, that the author of such a regular expression did not correctly read the documentation and mistakes the $ character for the definitive end of the subject. However the real meaning, as it is even documented in the PHP manual is that $ means the end of the subject OR not the real end but nearly, only followed by a single '\n' linebreak. This means that the following request will also pass the filter.
In several circumstances a newline character can be dangerous. For example when you want to stop HTTP Response Splitting or Email Injection attacks. To correct the above regular expression it is necessary to add the D modifier to it that changes the meaning of the $ specifier to really mean the end of the subject. Here is the corrected example.
I hope this tip helps getting rid of all these wrong filters once and for all. People using ext/filter should prepare for a recompile, too."
Holes in most preg_match() filters - PHP Security Blog
And I did. The most common filter I saw looked like this:
... where the author obviously wants to allow only letters, numbers, dots and dashes, but this pattern does the complete opposite: it matches whenever the subject contains at least one character that is not a lowercase letter, a digit, a dash or a dot.
It should look like this:
Using Regular Expressions with PHP
PHP regular expression tutorial
PHP Code:
<?php
$clean = array();
if (preg_match("/^[0-9]+:[X-Z]+$/", $_GET['var'])) {
    $clean['var'] = $_GET['var'];
}
?>
Because $ also matches just before a single trailing '\n' linebreak, the following request will pass the filter as well.
Code:
http://server.tld/index.php?var=012345:XYZ%0a
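A minimal sketch of why that request gets through, assuming PHP has already URL-decoded the parameter so that %0a arrives as a newline. Here $var is only a stand-in for $_GET['var'], and urldecode() merely simulates what PHP does with the query string:

```php
<?php
// The vulnerable pattern from the example above (no D modifier).
$pattern = '/^[0-9]+:[X-Z]+$/';

// Stand-in for $_GET['var'] (an assumption for this sketch).
// The %0a in the URL becomes a trailing "\n" after decoding.
$var = urldecode('012345:XYZ%0a');

// Without D, $ also matches right before one final newline,
// so the filter accepts a value that still ends in "\n".
$passes = preg_match($pattern, $var) === 1;

var_dump($var === "012345:XYZ\n"); // bool(true)
var_dump($passes);                 // bool(true)
```

The value that lands in $clean therefore still carries the newline, which is exactly what header or email injection needs.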
PHP Code:
<?php
$clean = array();
if (preg_match("/^[0-9]+:[X-Z]+$/D", $_GET['var'])) {
    $clean['var'] = $_GET['var'];
}
?>
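As a rough comparison, the same check can also be written with the \A and \z escapes, which anchor to the absolute start and end of the subject without needing a modifier. The function names isValidVar and isValidVarZ are hypothetical, made up for this sketch, and the type declarations assume PHP 7 or later:

```php
<?php
// Hypothetical helpers for this sketch; not part of the original post.
function isValidVar(string $value): bool
{
    // D (PCRE_DOLLAR_ENDONLY): $ matches only at the very end of the subject.
    return preg_match('/^[0-9]+:[X-Z]+$/D', $value) === 1;
}

function isValidVarZ(string $value): bool
{
    // \A and \z always anchor to the absolute start and end of the subject.
    return preg_match('/\A[0-9]+:[X-Z]+\z/', $value) === 1;
}

var_dump(isValidVar("012345:XYZ"));    // bool(true)
var_dump(isValidVar("012345:XYZ\n"));  // bool(false)
var_dump(isValidVarZ("012345:XYZ"));   // bool(true)
var_dump(isValidVarZ("012345:XYZ\n")); // bool(false)
```

One reason to consider the \z form: according to the PHP manual, the D modifier is ignored when the m (multiline) modifier is also set, while \z keeps its meaning in either mode.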
The most common filter I saw looked like this:
PHP Code:
preg_match('/[^a-z0-9-.]+/', $var)
It should look like this:
PHP Code:
preg_match('/^[a-z0-9-.]+$/D', $var)
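To make the difference between the two patterns concrete, here is a small sketch that runs both against a few made-up sample values ($inputs is purely illustrative):

```php
<?php
// Made-up sample values for this sketch.
$inputs = ['file-1.txt', 'file 1.txt', '<script>', "name\n"];

$results = [];
foreach ($inputs as $var) {
    // Negated class, unanchored: matches when at least one character
    // is outside a-z, 0-9, "-" and ".".
    $hasBadChar = preg_match('/[^a-z0-9-.]+/', $var) === 1;

    // Anchored allowlist with D: matches only when every character
    // is in the allowed set and nothing follows it.
    $isClean = preg_match('/^[a-z0-9-.]+$/D', $var) === 1;

    $results[] = [$hasBadChar, $isClean];
    printf("%-14s hasBadChar=%s isClean=%s\n",
        json_encode($var),
        var_export($hasBadChar, true),
        var_export($isClean, true));
}
```

Only 'file-1.txt' comes out as clean. The first pattern is not useless in itself, it just answers the opposite question, so treating a match as "input is OK" accepts exactly the values that should be rejected.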
Code:
Regular Expression      Will match...
foo                     The string "foo"
^foo                    "foo" at the start of a string
foo$                    "foo" at the end of a string
^foo$                   "foo" when it is alone on a string
[abc]                   a, b, or c
[a-z]                   Any lowercase letter
[^A-Z]                  Any character that is not an uppercase letter
(gif|jpg)               Either "gif" or "jpg"
[a-z]+                  One or more lowercase letters
[0-9\.\-]               Any digit, dot, or minus sign
^[a-zA-Z0-9_]{1,}$      Any word of at least one letter, number or _
([wx])([yz])            wy, wz, xy, or xz
[^A-Za-z0-9]            Any symbol (not a number or a letter)
([A-Z]{3}|[0-9]{4})     Three uppercase letters or four digits
Code:
Regular expression (pattern)  Match (subject)       Not match (subject)  Comment
world                         Hello world           Hello Jim            Matches if the pattern is present anywhere in the subject
^world                        world class           Hello world          Matches if the pattern is present at the beginning of the subject
world$                        Hello world           world class          Matches if the pattern is present at the end of the subject
world/i                       This WoRLd            Hello Jim            Searches in case-insensitive mode
^world$                       world                 Hello world          The string contains only "world"
world*                        worl, world, worlddd  wor                  There are 0 or more "d" after "worl"
world+                        world, worlddd        worl                 There is at least 1 "d" after "worl"
world?                        worl, world, worly    wor, wory            There is 0 or 1 "d" after "worl"
world{1}                      world                 worly                There is 1 "d" after "worl"
world{1,}                     world, worlddd        worly                There are 1 or more "d" after "worl"
world{2,3}                    worldd, worlddd       world                There are 2 or 3 "d" after "worl"
wo(rld)*                      wo, world, worldold   wa                   There are 0 or more "rld" after "wo"
earth|world                   earth, world          sun                  The string contains "earth" or "world"
w.rld                         world, wwrld          wrld                 Any character in place of the dot
^.{5}$                        world, earth          sun                  A string with exactly 5 characters
[abc]                         abc, bbaccc           sun                  There is an "a" or "b" or "c" in the string
[a-z]                         world                 WORLD                There is a lowercase letter in the string
[a-zA-Z]                      world, WORLD, Worl12  123                  There is a lowercase or uppercase letter in the string
[^wW]                         earth                 w, W                 The character cannot be "w" or "W"
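A few of the rows above can be checked directly with preg_match(). This is only a sketch: the selection of rows is arbitrary, the /.../ delimiters are added because preg_match() requires them, and the array destructuring in foreach assumes PHP 7.1 or later:

```php
<?php
// Each row: [pattern, subject expected to match, subject expected not to match].
$rows = [
    ['/world/',      'Hello world', 'Hello Jim'],
    ['/^world/',     'world class', 'Hello world'],
    ['/world$/',     'Hello world', 'world class'],
    ['/world/i',     'This WoRLd',  'Hello Jim'],
    ['/world{2,3}/', 'worlddd',     'world'],
    ['/^.{5}$/',     'earth',       'sun'],
    ['/[^wW]/',      'earth',       'w'],
];

$failures = 0;
foreach ($rows as [$pattern, $match, $noMatch]) {
    // A row holds if the first subject matches and the second does not.
    if (preg_match($pattern, $match) !== 1 || preg_match($pattern, $noMatch) !== 0) {
        $failures++;
        echo "Unexpected result for $pattern\n";
    }
}
echo $failures === 0 ? "All rows behave as listed\n" : "$failures row(s) differ\n";
```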