Arc Forumnew | comments | leaders | submitlogin
2 points by kmag 6173 days ago | link | parent

I should also point out that dealing with non-normalized Unicode has caused many security bugs in the past. For instance, sometimes input validation routines have bugs where they only test for the normalized forms of unsafe inputs.

Imagine a website that checks that user-submitted JavaScript fragments only contain a "safe" subset of JavaScript, disallowing eval(). Now imagine that the user input contains eval with the "e" represented as a Latin wide character and imagine that some browser understands Latin wide versions of JavaScript keywords, or that the code that renders web pages to HTML transforms Latin wide characters to standard Latin characters.