Holes in most preg_match() filters


    Quote: "During the last week I was performing some audits and like so often it contained preg_match() filters that were not correct. Most PHP developers use ^ and $ within their regular expressions without actually reading the documentation about what they really achieve. You will find a lot of input filters like the following one.
    PHP Code:
    <?php
       $clean = array();
       if (preg_match("/^[0-9]+:[X-Z]+$/", $_GET['var'])) {
          $clean['var'] = $_GET['var'];
       }
    ?>
    Quite common way to filter incoming data, isn't it?

    However the problem is, that the author of such a regular expression did not correctly read the documentation and mistakes the $ character for the definitive end of the subject. However the real meaning, as it is even documented in the PHP manual is that $ means the end of the subject OR not the real end but nearly, only followed by a single '\n' linebreak. This means that the following request will also pass the filter.
    Code:
    http://server.tld/index.php?var=012345:XYZ%0a
    In several circumstances a newline character can be dangerous. For example when you want to stop HTTP Response Splitting or Email Injection attacks. To correct the above regular expression it is necessary to add the D modifier to it that changes the meaning of the $ specifier to really mean the end of the subject. Here is the corrected example.
    PHP Code:
    <?php
       $clean = array();
       if (preg_match("/^[0-9]+:[X-Z]+$/D", $_GET['var'])) {
          $clean['var'] = $_GET['var'];
       }
    ?>
    I hope this tip helps getting rid of all these wrong filters once and for all. People using ext/filter should prepare for a recompile, too."

    Holes in most preg_match() filters - PHP Security Blog
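The behaviour described in the quote is easy to reproduce. Here is a minimal sketch, assuming a subject with a trailing newline like the %0a request above; the variable names are made up for the demo:

```php
<?php
// Hypothetical sample values; "\n" stands in for the %0a from the request.
$plain   = "012345:XYZ";
$tainted = "012345:XYZ\n";

$loose  = '/^[0-9]+:[X-Z]+$/';   // $ also matches just before a final newline
$strict = '/^[0-9]+:[X-Z]+$/D';  // D modifier: $ matches only at the very end
$z      = '/^[0-9]+:[X-Z]+\z/';  // \z always means the very end of the subject

var_dump(preg_match($loose, $plain));    // int(1)
var_dump(preg_match($loose, $tainted));  // int(1), the newline slips through
var_dump(preg_match($strict, $plain));   // int(1)
var_dump(preg_match($strict, $tainted)); // int(0)
var_dump(preg_match($z, $tainted));      // int(0)
```

The \z variant is not mentioned in the quote; it is shown only as another way to get the same result as the D modifier.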

    And I did. The most common filter I saw looked like this:

    PHP Code:
    preg_match('/[^a-z0-9-.]+/', $var)
    ... where the author obviously wants to allow only letters, digits, dots and dashes, but this pattern does the complete opposite: it matches as soon as the string contains any character that is not a lowercase letter, digit, dash or dot.

    It should be like:

    PHP Code:
    preg_match('/^[a-z0-9-.]+$/D', $var)
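To see why the first pattern gets it backwards, here is a rough sketch comparing the two. The helper names has_forbidden_char() and is_valid_name() and the sample strings are hypothetical, used only for illustration:

```php
<?php
// Unanchored negated class: reports a match whenever at least one character
// outside a-z, 0-9, dash and dot appears anywhere in the string.
function has_forbidden_char(string $var): bool
{
    return preg_match('/[^a-z0-9-.]+/', $var) === 1;
}

// Anchored whitelist with the D modifier: every character must be allowed,
// and a trailing newline is rejected as well.
function is_valid_name(string $var): bool
{
    return preg_match('/^[a-z0-9-.]+$/D', $var) === 1;
}

$samples = ["file-1.txt", "file 1.txt", "<script>", "file-1.txt\n", ""];
foreach ($samples as $s) {
    echo json_encode($s),
        ' forbidden=', var_export(has_forbidden_char($s), true),
        ' valid=', var_export(is_valid_name($s), true), "\n";
}
```

Using the first pattern directly as an accept condition would let "<script>" through and reject "file-1.txt". Also note that the empty string contains no forbidden character but still fails the whitelist, because + requires at least one character, so negating the first check is not quite the same as the second.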
    Code:
    Regular Expression	Will match...
    foo	The string "foo"
    ^foo	"foo" at the start of a string
    foo$	"foo" at the end of a string
    ^foo$	"foo" when it is the only thing in the string
    [abc]	a, b, or c
    [a-z]	Any lowercase letter
    [^A-Z]	Any character that is not an uppercase letter
    (gif|jpg)	Matches either "gif" or "jpg"
    [a-z]+	One or more lowercase letters
    [0-9\.\-]	Any digit, dot, or minus sign
    ^[a-zA-Z0-9_]{1,}$	Any word of at least one letter, number or _
    ([wx])([yz])	wy, wz, xy, or xz
    [^A-Za-z0-9]	Any symbol (not a number or a letter)
    ([A-Z]{3}|[0-9]{4})	Matches three uppercase letters or four digits
    Using Regular Expressions with PHP

    Code:
    Regular expression (pattern)	Match (subject)	Not match (subject)	Comment
    world	Hello world	Hello Jim	Match if the pattern is present anywhere in the subject
    ^world	world class	Hello world	Match if the pattern is present at the beginning of the subject
    world$	Hello world	world class	Match if the pattern is present at the end of the subject
    world/i	This WoRLd	Hello Jim	Searches in case-insensitive mode (the i modifier)
    ^world$	world	Hello world	The string contains only the "world"
    world*	worl, world, worlddd	wor	There is 0 or more "d" after "worl"
    world+	world, worlddd	worl	There is at least 1 "d" after "worl"
    world?	worl, world, worly	wor, wory	There is 0 or 1 "d" after "worl"
    world{1}	world	worly	There is 1 "d" after "worl"
    world{1,}	world, worlddd	worly	There is 1 or more "d" after "worl"
    world{2,3}	worldd, worlddd	world	There are 2 or 3 "d" after "worl"
    wo(rld)*	wo, world, worldold	wa	There is 0 or more "rld" after "wo"
    earth|world	earth, world	sun	The string contains the "earth" or the "world"
    w.rld	world, wwrld	wrld	Any character in place of the dot.
    ^.{5}$	world, earth	sun	A string with exactly 5 characters
    [abc]	abc, bbaccc	sun	There is an "a" or "b" or "c" in the string
    [a-z]	world	WORLD	There is at least one lowercase letter in the string
    [a-zA-Z]	world, WORLD, Worl12	123	There is at least one lower- or uppercase letter in the string
    [^wW]	earth	w, W	The actual character can not be a "w" or "W"
    PHP regular expression tutorial
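A few rows of the table above can be checked with a short script. This is only a sketch: the table writes patterns without delimiters, so slashes are added here because preg_match() requires them, and the selection of rows is arbitrary. It needs PHP 7.1 or later for the array destructuring in foreach.

```php
<?php
// pattern => [subject that should match, subject that should not match]
$rows = [
    '/world*/'     => ['worl', 'wor'],
    '/world+/'     => ['worlddd', 'worl'],
    '/world{2,3}/' => ['worldd', 'world'],
    '/wo(rld)*/'   => ['wo', 'wa'],
    '/^.{5}$/D'    => ['earth', 'sun'],
    '/world/i'     => ['This WoRLd', 'Hello Jim'],
];

$failures = [];
foreach ($rows as $pattern => [$hit, $miss]) {
    if (preg_match($pattern, $hit) !== 1) {
        $failures[] = "$pattern should match '$hit'";
    }
    if (preg_match($pattern, $miss) !== 0) {
        $failures[] = "$pattern should not match '$miss'";
    }
}

// Print any disagreement with the table, or a short confirmation.
echo $failures ? implode("\n", $failures) . "\n" : "all rows behave as the table says\n";
```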

    #2
    Your tutorial helps me. Thank you, man.


      #3
      U r welcome.