From: Joshua Cranmer
Newsgroups: comp.lang.java.programmer
Subject: Re: Java code to output escaped Javascript?
Date: Wed, 01 Jun 2011 21:47:48 -0400

On 06/01/2011 06:17 PM, Lawrence D'Oliveiro wrote:
> In message, Joshua Cranmer wrote:
>
>> On 06/01/2011 09:11 AM, laredotornado wrote:
>>
>>> private String escapeForJS(String value) {
>>>     value = value.replace("\n", "\\n");
>>>     value = value.replace("\r", "\\r");
>>>     value = value.replace("\"", "\\\"");
>>>     return value;
>>> }
>>
>> You also forgot `\' as well as every character in the range
>> '\u0000'-'\u001f' and '\u007f-\uffff' ...
>
> Can’t they just occur literally?

According to the ECMAScript specification, line terminators (i.e., \u000A, \u000D, \u2028, and \u2029), `\', and the string's delimiting quote character (", in this case) are prohibited from appearing literally inside a string literal. In practice, anything that isn't pure ASCII puts you on shaky ground because of the potential for charset confusion. The specification also assumes that the source text has already been normalized to Unicode Normalization Form C, so what the engine sees may not be exactly what you wrote.
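A minimal sketch of a corrected `escapeForJS`, assuming the output is placed inside a double-quoted JavaScript string literal and that only the characters the specification forbids (backslash, the `"` delimiter, and the four line terminators) need escaping. The class name `JsStringEscaper` is hypothetical, and this sketch does not address the charset and control-character concerns raised in the rest of the post.

```java
public final class JsStringEscaper {
    private JsStringEscaper() {}

    /**
     * Escapes a value for use inside a double-quoted JavaScript string
     * literal. Only the characters that ECMAScript forbids from appearing
     * literally are handled: backslash, the double quote, and the four
     * line terminators (LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR).
     */
    public static String escapeForJS(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\':     sb.append("\\\\");   break; // backslash
                case '"':      sb.append("\\\"");   break; // string delimiter
                case '\n':     sb.append("\\n");    break; // line feed
                case '\r':     sb.append("\\r");    break; // carriage return
                case '\u2028': sb.append("\\u2028"); break; // line separator
                case '\u2029': sb.append("\\u2029"); break; // paragraph separator
                default:       sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: say \"hi\"\\\n
        System.out.println(escapeForJS("say \"hi\"\\\n"));
    }
}
```

Walking the string once with a `StringBuilder` also avoids an ordering trap in chained `replace` calls: if backslashes were replaced after the other substitutions, the backslashes those substitutions introduced would be doubled a second time.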
I would also regard NUL and form-feed characters, in particular, as potentially problematic.

In short:

The following characters are always safe *not* to escape:
* A-Z, a-z, 0-9
* ~!@#$%^&*()_+`-={}[]|:;<>?,./
* spaces

The following should be okay:
* ' or ", whichever one you did not use to open the string
* "simple" accented characters (i.e., \xa0-\xff in Latin-1 or Cp1252, or the corresponding characters \u00a0-\u00ff if your file is UTF-8)

Never valid:
* \, \n, \r, \u2028, and \u2029
* the quote character that opened the string

Anything else (particularly "\u0000") is potentially risky.

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it.
-- Donald E. Knuth
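The conservative policy in the summary can be sketched as an escaper that passes through only the always-safe ASCII set and escapes everything else. This is an illustration under stated assumptions, not a library API: the name `ConservativeJsEscaper` is hypothetical, both quote characters are backslash-escaped so the result works in either kind of string literal, and every other character is written as a four-hex-digit escape.

```java
public final class ConservativeJsEscaper {
    // Punctuation (plus space) from the "always safe" list. Backslash and
    // both quote characters are deliberately absent.
    private static final String SAFE_PUNCTUATION = "~!@#$%^&*()_+`-={}[]|:;<>?,./ ";

    private ConservativeJsEscaper() {}

    private static boolean isSafe(char c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || SAFE_PUNCTUATION.indexOf(c) >= 0;
    }

    public static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() * 2);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (isSafe(c)) {
                sb.append(c);
            } else if (c == '\\' || c == '"' || c == '\'') {
                // Backslash and both quotes get a plain backslash escape.
                sb.append('\\').append(c);
            } else {
                // Control characters, DEL, line separators and all non-ASCII
                // text become four-hex-digit escapes. Characters outside the
                // BMP are already two UTF-16 chars in Java, and JavaScript
                // strings are UTF-16 as well, so escaping each half of a
                // surrogate pair separately yields the right character.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9\t\"x\""));
    }
}
```

With this sketch, the `main` method prints `caf\u00e9\u0009\"x\"`: the accented letter and the tab are hex-escaped, and the quotes get a backslash. The output is pure ASCII, which sidesteps the charset-confusion problem at the cost of less readable output.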