在纯Java代码中输出HTML时,是否有一种推荐的方法来转义<,>,"和&字符?(除了手动执行以下操作之外)。
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = source.replace("<", "<").replace("&", "&"); // ...
在纯Java代码中输出HTML时,是否有一种推荐的方法来转义<,>,"和&字符?(除了手动执行以下操作之外)。
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = source.replace("<", "<").replace("&", "&"); // ...
当前回答
Be careful with this. There are a number of different 'contexts' within an HTML document: Inside an element, quoted attribute value, unquoted attribute value, URL attribute, javascript, CSS, etc... You'll need to use a different encoding method for each of these to prevent Cross-Site Scripting (XSS). Check the OWASP XSS Prevention Cheat Sheet for details on each of these contexts. You can find escaping methods for each of these contexts in the OWASP ESAPI library -- https://github.com/ESAPI/esapi-java-legacy.
其他回答
对于使用谷歌番石榴的人:
import com.google.common.html.HtmlEscapers;
[...]
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = HtmlEscapers.htmlEscaper().escape(source);
Be careful with this. There are a number of different 'contexts' within an HTML document: Inside an element, quoted attribute value, unquoted attribute value, URL attribute, javascript, CSS, etc... You'll need to use a different encoding method for each of these to prevent Cross-Site Scripting (XSS). Check the OWASP XSS Prevention Cheat Sheet for details on each of these contexts. You can find escaping methods for each of these contexts in the OWASP ESAPI library -- https://github.com/ESAPI/esapi-java-legacy.
简单的方法:
public static String escapeHTML(String s) {
StringBuilder out = new StringBuilder(Math.max(16, s.length()));
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c > 127 || c == '"' || c == '\'' || c == '<' || c == '>' || c == '&') {
out.append("&#");
out.append((int) c);
out.append(';');
} else {
out.append(c);
}
}
return out.toString();
}
基于https://stackoverflow.com/a/8838023/1199155(放大器不在那里)。根据http://www.w3.org/TR/html4/sgml/entities.html的说法,if子句中选中的四个字符是唯一低于128的字符
有一个更新版本的Apache Commons Lang库,它使用了一个不同的包名(org.apache.commons.lang3)。StringEscapeUtils现在有不同的静态方法来转义不同类型的文档(http://commons.apache.org/proper/commons-lang/javadocs/api-3.0/index.html)。转义HTML 4.0版本的字符串:
import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
String output = escapeHtml4("The less than sign (<) and ampersand (&) must be escaped before using them in HTML");
Java 8+解决方案:
public static String escapeHTML(String str) {
return str.chars().mapToObj(c -> c > 127 || "\"'<>&".indexOf(c) != -1 ?
"&#" + c + ";" : String.valueOf((char) c)).collect(Collectors.joining());
}
String#chars返回String中的char值的IntStream。然后,我们可以使用mapToObj来转义字符代码大于127的字符(非ascii字符)以及双引号(")、单引号(')、左尖括号(<)、右尖括号(>)和&号(&)。收藏家。join将字符串连接在一起。
为了更好地处理Unicode字符,可以使用String#codePoints代替。
public static String escapeHTML(String str) {
return str.codePoints().mapToObj(c -> c > 127 || "\"'<>&".indexOf(c) != -1 ?
"&#" + c + ";" : new String(Character.toChars(c)))
.collect(Collectors.joining());
}