Part II: Data Validation and Sanitization in WordPress

In the first part of this article, I explained data validation and sanitization, their importance and some key differences between them. In this part I shall cover some of the important WordPress functions that can be used to sanitize (escape) data before it is output. First, let’s briefly look at HTML elements and nodes. Knowing where a piece of data ends up in the markup makes it easier to pick the right escaping function.

Consider the following HTML snippet, which can be broadly divided into three parts:

<span class="some_class" title="some_title">some_text</span>

  1. Element node: any element in the document, such as <h1>, <a> or <span>.
  2. Attribute node: an attribute provides additional information about an element and comes as a name/value pair such as name="value". In the example above, class="some_class" and title="some_title" are attribute nodes.
  3. Text node: any text that is not part of an element tag or an attribute. In the example above, some_text is a text node.
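To make the three node types concrete, here is a small sketch. It parses an example snippet with PHP's built-in DOM extension (DOMDocument) and reports the type of each node. The markup `<span class="some_class" title="some_title">some_text</span>` is an assumed example. The script also assumes the dom extension is enabled, which it is by default in standard PHP builds.

```php
<?php
// Parse a small example snippet (hypothetical markup) with PHP's DOM extension.
$doc = new DOMDocument();
$doc->loadHTML('<span class="some_class" title="some_title">some_text</span>');

// loadHTML() wraps the fragment in <html><body>, so look the <span> up by tag name.
$span = $doc->getElementsByTagName('span')->item(0);

// The <span> itself is an element node.
printf("%s: element node? %s\n", $span->nodeName, $span->nodeType === XML_ELEMENT_NODE ? 'yes' : 'no');

// Each name/value pair on the element is an attribute node.
foreach ($span->attributes as $attribute) {
    printf("%s=\"%s\": attribute node? %s\n", $attribute->nodeName, $attribute->nodeValue,
        $attribute->nodeType === XML_ATTRIBUTE_NODE ? 'yes' : 'no');
}

// The content between the opening and closing tags is a text node.
$text = $span->firstChild;
printf("%s: text node? %s\n", $text->nodeValue, $text->nodeType === XML_TEXT_NODE ? 'yes' : 'no');
```

Dynamic data can end up in the attribute nodes and the text node. Those two places need escaping, and each needs a different function.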

So the key thing to remember is that any data we output into an attribute node or a text node should always be sanitized (escaped).

For example:

<span><?php echo $title; ?></span>
<a href="#link" title="<?php echo $title_link; ?>"><?php echo $text; ?></a>

The above code is vulnerable, and it breaks the output, if the variables hold values such as these:

<?php
$title="<>/title_name";
$title_link = '" onclick="alert(\'XSS\')';
?>
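To see why, here is a plain-PHP sketch that performs the same unescaped interpolation and prints the markup the browser would receive. The title_link payload is an assumption chosen for illustration. It starts with a double quote, so it closes the title attribute early and smuggles in a working onclick handler. The link text is also hypothetical.

```php
<?php
// Example values; an attacker could control either of them.
$title      = '<>/title_name';
// Assumed payload: the leading double quote closes the title attribute early.
$title_link = '" onclick="alert(\'XSS\')';
$text       = 'Read more'; // hypothetical link text

// Build the markup WITHOUT escaping, the same way the vulnerable template does.
$html = '<span>' . $title . '</span>' . "\n"
      . '<a href="#link" title="' . $title_link . '">' . $text . '</a>';

echo $html, "\n";
```

The second line of output is `<a href="#link" title="" onclick="alert('XSS')">Read more</a>`. The browser sees an empty title plus a brand-new onclick attribute.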

Here’s what we need to do to output the data safely in the browser:

<span><?php echo esc_html( $title ); ?></span>
<a href="#link" title="<?php echo esc_attr( $title_link ); ?>"><?php echo esc_html( $text ); ?></a>

Now let’s go through the available WordPress escaping functions in detail and look at what each one does.

esc_attr(): This function escapes content that is to be placed inside an HTML attribute, e.g. title, rel, etc. It encodes < > & " ' (less than, greater than, ampersand, double quote, single quote) and never double-encodes entities, so an existing &amp; stays as it is.

$attr = 'this >is <just "&?\'attr';
echo esc_attr($attr);

//output will be encoded as
this &gt;is &lt;just &quot;&amp;?&#039;attr
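Here is a rough plain-PHP approximation of what esc_attr() does to a string, so the behaviour can be tried without loading WordPress. The helper name sketch_esc_attr is hypothetical. The call htmlspecialchars() with ENT_QUOTES and double_encode set to false is my stand-in for core's internal escaping. WordPress's own function also checks for invalid UTF-8 and runs a filter.

```php
<?php
// Hypothetical helper: a plain-PHP approximation of esc_attr(), not WordPress's own code.
// ENT_QUOTES encodes both " and ', and double_encode = false leaves existing entities alone.
function sketch_esc_attr(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
}

echo sketch_esc_attr('this >is <just "&?\'attr'), "\n";
// An entity that is already encoded is not encoded a second time.
echo sketch_esc_attr('Tom &amp; Jerry'), "\n";
```

The first line prints the same encoded string as the esc_attr() example. The second prints Tom &amp; Jerry unchanged.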

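esc_html(): This function escapes content that will be output as an HTML text node. It encodes < > & " ' (less than, greater than, ampersand, double quote, single quote), so the browser displays the markup as text instead of interpreting it. It is very similar to esc_attr(); the main difference is the context it is meant for.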

$html = '<div class="text_class" onclick="click(\'me\')">hello there...</div>';
echo esc_html( $html );

//above code will be encoded as
&lt;div class=&quot;text_class&quot; onclick=&quot;click(&#039;me&#039;)&quot;&gt;hello there...&lt;/div&gt;
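The phrase "letting the browser render it instead of interpreting it" can be checked with a small plain-PHP sketch. Decoding the escaped string gives back exactly the original text, which is what the visitor sees on the page. The helper sketch_esc_html is hypothetical and only approximates esc_html() with htmlspecialchars().

```php
<?php
// Hypothetical helper: a plain-PHP approximation of esc_html(), not WordPress's own code.
function sketch_esc_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
}

$html    = '<div class="text_class" onclick="click(\'me\')">hello there...</div>';
$escaped = sketch_esc_html($html);
echo $escaped, "\n";

// Decoding the entities gives back the original string: this is the text the
// browser displays, because the escaped markup is never parsed as a tag.
echo htmlspecialchars_decode($escaped, ENT_QUOTES), "\n";
```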

esc_textarea(): Encodes text for use inside a <textarea> element. It is similar to esc_html() and should be used for anything that is going to be displayed in a textarea. It uses PHP’s htmlspecialchars() function.

<?php $text = 'This text contains <script type="text/javascript">alert("XSS");</script> here!'; ?>
<textarea><?php echo esc_textarea( $text ); ?></textarea>

//output will be encoded as
This text contains &lt;script type=&quot;text/javascript&quot;&gt;alert(&quot;XSS&quot;);&lt;/script&gt; here!
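The usual attack against a textarea is to close it early with </textarea> and then inject markup. The plain-PHP sketch below shows that this breakout is neutralized. The helper sketch_esc_textarea is hypothetical. It assumes, as in current WordPress core, that esc_textarea() leaves PHP's default double encoding on. That is why a literal &lt; typed by a user survives an edit round trip.

```php
<?php
// Hypothetical helper: a plain-PHP approximation of esc_textarea(), not WordPress's own code.
// Unlike the esc_attr()/esc_html() sketches, this keeps PHP's default double_encode = true.
function sketch_esc_textarea(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Input that tries to close the textarea early and inject a script.
$input = '</textarea><script>alert("XSS")</script>';
echo '<textarea>' . sketch_esc_textarea($input) . '</textarea>', "\n";

// A literal entity typed by the user is encoded again, so the textarea shows
// "&lt;" exactly as it was typed instead of turning it into "<".
echo '<textarea>' . sketch_esc_textarea('&lt;') . '</textarea>', "\n";
```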

esc_url( $url, (array) $protocols ): Always use esc_url() when sanitizing URLs (in text nodes, attribute nodes or anywhere else). It rejects URLs that do not use one of the whitelisted protocols and removes invalid and dangerous characters. The default protocols are those returned by wp_allowed_protocols(), which include http, https, ftp, ftps, mailto, news, irc, gopher, nntp, feed and telnet. It also encodes characters such as & and ' as HTML entities:

//example 1
$url1 = "http://catchinternet.com/<script>alert('XSS')</script>";
echo esc_url( $url1);

//output
http://catchinternet.com/scriptalert(&#039;XSS&#039;)/script

//example 2
$url2 = 'hellothere';
echo esc_url($url2);

//output
http://hellothere

The optional second argument restricts the accepted protocols to a subset of the defaults:

$protocol = array( 'http', 'https', 'ftp' );
$url1 = 'https://example.com';
echo esc_url( $url1, $protocol );
//output
https://example.com

$url2 = 'irc://example.com';
echo esc_url( $url2, $protocol );

//output
(an empty string, because irc is not in the $protocol list)
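The steps esc_url() goes through can be sketched in plain PHP: strip unsafe characters, assume http:// when no scheme is given, reject non-whitelisted schemes, and encode & and '. The helper sketch_esc_url is hypothetical and heavily simplified. Its default protocol list is a shortened, assumed subset of wp_allowed_protocols(), and it uses parse_url() for the scheme check rather than whatever core does internally.

```php
<?php
// Hypothetical helper: a simplified approximation of esc_url(), NOT WordPress's implementation.
// The default protocol list below is a shortened, assumed subset of wp_allowed_protocols().
function sketch_esc_url(string $url, array $protocols = ['http', 'https', 'ftp', 'ftps', 'mailto']): string
{
    $url = str_replace(' ', '%20', ltrim($url));

    // Strip every character outside a URL-safe ASCII set; this removes < > " and friends.
    $url = preg_replace('|[^a-z0-9\-~+_.?#=!&;,/:%@$\|*\'()\[\]]|i', '', $url);
    if ($url === '') {
        return '';
    }

    // No scheme and not a relative link: assume the author meant http://.
    $isRelative = in_array($url[0], ['/', '#', '?'], true);
    if (strpos($url, ':') === false && !$isRelative) {
        $url = 'http://' . $url;
    }

    // Reject absolute URLs whose scheme is not on the whitelist.
    if (!$isRelative) {
        $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
        if (!in_array($scheme, $protocols, true)) {
            return '';
        }
    }

    // Encode & and ' so the URL is safe to print inside an HTML attribute.
    return str_replace(['&', "'"], ['&#038;', '&#039;'], $url);
}

echo sketch_esc_url("http://catchinternet.com/<script>alert('XSS')</script>"), "\n";
echo sketch_esc_url('hellothere'), "\n";
var_dump(sketch_esc_url('irc://example.com', ['http', 'https', 'ftp']));
var_dump(sketch_esc_url('javascript:alert(1)'));
```

The last call shows why whitelisting matters. A javascript: URL passes the character filter untouched, so only the scheme check stops it.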

esc_js( $text ): Escapes single quotes, encodes " < > & as HTML entities (as htmlspecialchars does) and fixes line endings. It escapes text strings for echoing in JavaScript and is intended for inline JS, for example in a tag attribute such as onclick="do_something()". It escapes the kinds of quote manipulation in strings that would otherwise break the JavaScript.
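Here is a simplified plain-PHP sketch of the three steps listed for esc_js(): entity-encode, escape single quotes, and fix line endings. It also shows typical inline use. The helper sketch_esc_js is hypothetical. WordPress's own esc_js() performs some extra steps, such as a UTF-8 check and a filter, that are skipped here.

```php
<?php
// Hypothetical helper: a simplified approximation of esc_js(), NOT WordPress's implementation.
function sketch_esc_js(string $text): string
{
    // Encode & < > and " as HTML entities; ENT_COMPAT leaves single quotes alone.
    $safe = htmlspecialchars($text, ENT_COMPAT, 'UTF-8', false);
    // Drop carriage returns so Windows line endings don't survive.
    $safe = str_replace("\r", '', $safe);
    // Backslash-escape single quotes and backslashes, then turn newlines into the two characters \n.
    return str_replace("\n", '\\n', addslashes($safe));
}

$message = "It's \"done\"\r\nReally.";
// Typical use: a single-quoted JS string inside an inline event handler.
echo '<a href="#" onclick="alert(\'' . sketch_esc_js($message) . '\'); return false;">Click</a>', "\n";
```

Inside the onclick attribute, the browser first decodes &quot; back to ". The JavaScript engine then receives the well-formed literal 'It\'s "done"\nReally.'.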

These are some of the important escaping functions we can use to sanitize output data before echoing it to the browser. Remember that it’s best to escape output as late as possible, ideally right where it is echoed. That way we can always be sure the data is properly escaped, and we don’t need to remember whether a variable has already been escaped somewhere else.
NOTE: In my next article on data validation and sanitization, I shall cover the functions for input validation and their uses.
