Version 3.1 of the Microsoft AntiXSS library (binary download) was released on the 15th of September and now comes with HTML sanitization. Not content with Anil dropping a new release of the library, his wife also dropped a release of her own, and he’s now on paternity leave, which means the new functionality is undocumented for now.
A quick look in the help file shows two new methods, GetSafeHtml and GetSafeHtmlFragment. Both methods have the same three overloads:
GetSafeHtml(string) – takes a string containing the HTML to be made safe.
GetSafeHtml(TextReader, Stream) – takes a TextReader as the source of the HTML and outputs to the specified stream.
GetSafeHtml(TextReader, TextWriter) – takes a TextReader as the source of the HTML and outputs to the specified text writer.
The difference between GetSafeHtml and GetSafeHtmlFragment lies in the output. GetSafeHtml outputs a complete HTML page, wrapping the input in <html> and <body> tags if they’re not there. GetSafeHtmlFragment just strips unsafe HTML from the input, without turning it into a complete page. What’s considered unsafe isn’t documented yet, but it does use a whitelist of non-scriptable tags and attributes. Note that the output is HTML, not XHTML, so if you want to use this to produce safe output for your XHTML site you’re going to have to pump the output through the HtmlAgilityPack to get XHTML.
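To make the fragment-versus-page distinction concrete, here is a rough sketch of the general whitelist idea in Python. It is not the AntiXSS implementation, and since the real whitelist is undocumented, the tag and attribute lists below are assumptions made up for illustration. The function names get_safe_html_fragment and get_safe_html are hypothetical stand-ins that only mirror the shape of the two new methods.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag -> attributes allowed on it.
# This is an assumption for illustration, NOT the list AntiXSS uses.
SAFE_TAGS = {
    "p": set(), "b": set(), "i": set(), "em": set(), "strong": set(),
    "a": {"href"},
}
# Tags whose text content is dropped along with the tag itself.
DROP_CONTENT = {"script", "style"}


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in SAFE_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in SAFE_TAGS[tag]:
                continue  # e.g. onclick, style
            value = value or ""
            # Only allow plain web links; this drops javascript: URLs.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in SAFE_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def get_safe_html_fragment(html):
    """Strip anything not on the whitelist; return just the cleaned markup."""
    parser = _Sanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def get_safe_html(html):
    """Same cleaning, but wrapped so the result is a complete page."""
    return "<html><body>" + get_safe_html_fragment(html) + "</body></html>"


dirty = '<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>'
print(get_safe_html_fragment(dirty))  # <p>Hi <b>there</b></p>
print(get_safe_html(dirty))
```

The point of the whitelist approach is that anything not explicitly known to be harmless is removed, so a new scriptable attribute or tag is blocked by default rather than needing a rule added for it.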
So congrats to Anil on the new babies. I think nappies/diapers come before documentation right now!