在纯Java代码中输出HTML时,是否有一种推荐的方法来转义<,>,"和&字符?(除了手动执行以下操作之外)。
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = source.replace("<", "<").replace("&", "&"); // ...
在纯Java代码中输出HTML时,是否有一种推荐的方法来转义<,>,"和&字符?(除了手动执行以下操作之外)。
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = source.replace("<", "<").replace("&", "&"); // ...
当前回答
Java 8+解决方案:
public static String escapeHTML(String str) {
return str.chars().mapToObj(c -> c > 127 || "\"'<>&".indexOf(c) != -1 ?
"&#" + c + ";" : String.valueOf((char) c)).collect(Collectors.joining());
}
String#chars返回String中的char值的IntStream。然后,我们可以使用mapToObj来转义字符代码大于127的字符(非ascii字符)以及双引号(")、单引号(')、左尖括号(<)、右尖括号(>)和&号(&)。收藏家。join将字符串连接在一起。
为了更好地处理Unicode字符,可以使用String#codePoints代替。
public static String escapeHTML(String str) {
return str.codePoints().mapToObj(c -> c > 127 || "\"'<>&".indexOf(c) != -1 ?
"&#" + c + ";" : new String(Character.toChars(c)))
.collect(Collectors.joining());
}
其他回答
有一个更新版本的Apache Commons Lang库,它使用了一个不同的包名(org.apache.commons.lang3)。StringEscapeUtils现在有不同的静态方法来转义不同类型的文档(http://commons.apache.org/proper/commons-lang/javadocs/api-3.0/index.html)。转义HTML 4.0版本的字符串:
import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
String output = escapeHtml4("The less than sign (<) and ampersand (&) must be escaped before using them in HTML");
出于某些目的,htmltils:
import org.springframework.web.util.HtmlUtils;
[...]
HtmlUtils.htmlEscapeDecimal("&"); //gives &
HtmlUtils.htmlEscape("&"); //gives &
stringescapeutils现在已弃用。您现在必须使用org.apache.commons.text.StringEscapeUtils by
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>${commons.text.version}</version>
</dependency>
大多数库都提供转义,包括数百个符号和数千个非ascii字符,这在UTF-8世界中不是你想要的。
而且,正如Jeff Williams所指出的,没有单一的“转义HTML”选项,有几个上下文。
假设你从未使用过不带引号的属性,并记住存在不同的上下文,它写了我自己的版本:
private static final long TEXT_ESCAPE =
1L << '&' | 1L << '<';
private static final long DOUBLE_QUOTED_ATTR_ESCAPE =
TEXT_ESCAPE | 1L << '"';
private static final long SINGLE_QUOTED_ATTR_ESCAPE =
TEXT_ESCAPE | 1L << '\'';
private static final long ESCAPES =
DOUBLE_QUOTED_ATTR_ESCAPE | SINGLE_QUOTED_ATTR_ESCAPE;
// 'quot' and 'apos' are 1 char longer than '#34' and '#39'
// which I've decided to use
private static final String REPLACEMENTS = ""&'<";
private static final int REPL_SLICES = /* [0, 5, 10, 15, 19) */
5<<5 | 10<<10 | 15<<15 | 19<<20;
// These 5-bit numbers packed into a single int
// are indices within REPLACEMENTS which is a 'flat' String[]
private static void appendEscaped(
Appendable builder, CharSequence content, long escapes) {
try {
int startIdx = 0, len = content.length();
for (int i = 0; i < len; i++) {
char c = content.charAt(i);
long one;
if (((c & 63) == c) && ((one = 1L << c) & escapes) != 0) {
// -^^^^^^^^^^^^^^^ -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// | | take only dangerous characters
// | java shifts longs by 6 least significant bits,
// | e. g. << 0b110111111 is same as >> 0b111111.
// | Filter out bigger characters
int index = Long.bitCount(ESCAPES & (one - 1));
builder.append(content, startIdx, i /* exclusive */).append(
REPLACEMENTS,
REPL_SLICES >>> (5 * index) & 31,
REPL_SLICES >>> (5 * (index + 1)) & 31
);
startIdx = i + 1;
}
}
builder.append(content, startIdx, len);
} catch (IOException e) {
// typically, our Appendable is StringBuilder which does not throw;
// also, there's no way to declare 'if A#append() throws E,
// then appendEscaped() throws E, too'
throw new UncheckedIOException(e);
}
}
考虑从Gist复制粘贴,没有行长限制。
UPD:正如另一个答案所暗示的,>转义是不必要的;同样,“within attr='…'也是允许的。我已经相应地更新了代码。
你可以自己去看看:
<!DOCTYPE html>
<html lang="en">
<head><title>Test</title></head>
<body>
<p title="<"I'm double-quoted!">"><"Hello!"></p>
<p title='<"I'm single-quoted!">'><"Goodbye!"></p>
</body>
</html>
Apache Commons的替代方案:使用Spring的htmltils。htmlEscape(字符串输入)方法。