I'm working on a web application that allows users to type short descriptions of items in a catalog. I'm allowing Markdown in my textareas so users can do some HTML formatting.
My text sanitization function strips all tags from any inputted text before inserting it in the database:
public function sanitizeText($string, $allowedTags = "") {
    $string = strip_tags($string, $allowedTags);
    if (get_magic_quotes_gpc()) {
        return mysql_real_escape_string(stripslashes($string));
    } else {
        return mysql_real_escape_string($string);
    }
}
Essentially, all I'm storing in the database is Markdown. No other HTML is allowed, not even "basic HTML" (like here at SO).
Will allowing Markdown present any security threats? Can Markdown be XSSed, even though it has no tags?
Solution
I think stripping every HTML tag from the input will get you something pretty secure, unless someone finds a way to inject some really messed-up data into Markdown and have it generate even more messed-up output ^^
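As one concrete illustration of that caveat, here is a minimal sketch. It assumes a Markdown renderer that turns inline links into anchors without checking the URL scheme, which is renderer-dependent. The input and the helper `hasDisallowedLinkScheme` are hypothetical and not part of any library. The helper only looks at inline links, so it is a rough pre-check and not a complete defence.

```php
<?php
// Hypothetical input: plain Markdown link syntax, no HTML tags anywhere.
$input = '[click me](javascript:alert(1))';

// strip_tags() has nothing to remove, so the text is stored unchanged.
$stored = strip_tags($input);
var_dump($stored === $input); // bool(true)

/**
 * Hypothetical helper (not from any library): reports whether an inline
 * Markdown link "[text](scheme:...)" uses a scheme outside the allow-list.
 * Reference-style links and encoded schemes are NOT covered by this sketch.
 */
function hasDisallowedLinkScheme(string $markdown, array $allowed = ['http', 'https', 'mailto']): bool
{
    // Capture the URI scheme that directly follows "](".
    preg_match_all('/\]\(\s*([a-z][a-z0-9+.\-]*):/i', $markdown, $matches);
    foreach ($matches[1] as $scheme) {
        if (!in_array(strtolower($scheme), $allowed, true)) {
            return true;
        }
    }
    return false;
}

var_dump(hasDisallowedLinkScheme($stored));                        // bool(true)
var_dump(hasDisallowedLinkScheme('[docs](https://example.com/)')); // bool(false)
```

Checking the HTML that the Markdown renderer produces is likely to be more robust than inspecting the Markdown source like this.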
Still, here are two things that come to mind:
First: strip_tags is not a miracle function, and it has some flaws.
For instance, it will strip everything after a '<' character in a situation like this one:
$str = "10 appels is <than 12 apples";
var_dump(strip_tags($str));
The output I get is:
string '10 appels is ' (length=13)
Which is not that nice for your users :-(
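If losing text like this is a concern, one possible alternative is to escape HTML special characters instead of stripping them. The sketch below uses a hypothetical input string and only contrasts the two built-in functions. Whether escaping should happen before or after Markdown rendering depends on the renderer, since escaping ">" beforehand can interfere with blockquote syntax.

```php
<?php
// Hypothetical input with a lone "<" that is not part of a real tag.
$str = 'price <10 dollars';

// strip_tags() treats "<10 dollars" as an unterminated tag and drops it.
$stripped = strip_tags($str);

// htmlspecialchars() keeps every character and encodes the special ones.
$escaped = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

var_dump($stripped);
var_dump($escaped); // contains: price &lt;10 dollars
```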
Second: one day or another, you might want to allow some HTML tags or attributes. Or, even today, you might want to be sure that Markdown doesn't generate certain HTML tags or attributes.
You might be interested in something like HTMLPurifier: it lets you specify which tags and attributes should be kept, and filters a string so that only those remain.
It also generates valid HTML code, which is always nice ;-)
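A minimal sketch of how that could look, assuming HTML Purifier is installed (for example via Composer) and already autoloaded by the application. The function name `purifyRenderedMarkdown` and the particular allow-list are hypothetical choices for illustration. The idea is to run it on the HTML produced by the Markdown renderer, just before output.

```php
<?php
// Assumption: HTML Purifier is installed and its classes are autoloaded,
// e.g. through Composer's vendor/autoload.php.

/**
 * Hypothetical wrapper: keeps only an allow-list of tags/attributes in the
 * HTML generated from Markdown and restricts link schemes.
 */
function purifyRenderedMarkdown(string $html): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Tags and attributes that may survive; everything else is removed.
    $config->set('HTML.Allowed', 'p,br,em,strong,code,pre,blockquote,ul,ol,li,a[href]');

    // Only these URI schemes are accepted in links.
    $config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($html);
}
```

The allow-list above is deliberately small. It would need to be extended if the catalog descriptions should support headings, images, or other Markdown features.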