What are invalid characters in XML?

I am working with some XML that holds strings like:

<node>This is a string</node>

Some of the strings that I pass to the nodes will contain characters like &, #, $, etc.:

<node>This is a string & so is this</node>

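This is not valid because of the &.

I cannot wrap these strings in CDATA because they need to stay as they are. I tried looking for a list of characters that cannot be put in an XML node without being wrapped in CDATA.

Can anyone point me in the direction of one, or give me a list of illegal characters?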

Current answer

"XmlWriter and lower ASCII characters" worked for me:

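// Commas are literal inside a character class, so they are left out here; otherwise commas in the text would be stripped too.
string code = Regex.Replace(item.Code, @"[\u0000-\u0008\u000B\u000C\u000E-\u001F]", "");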

2018-07-04 04:43:59

Other answers

In addition to potame's answer, if you want to escape using a CDATA block:

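If you put your text in a CDATA block, you do not need to escape it. In that case you can use all characters in the following range:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]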

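Note: on top of that, you are not allowed to use the ]]> character sequence, because it would be read as the end of the CDATA block.

If there are still invalid characters (e.g. control characters), it is probably better to use some kind of encoding (e.g. base64).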
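A minimal Java sketch of both suggestions, in case it helps. The class and method names (`CdataSketch`, `toCdata`, `toBase64`) are made up for illustration. Splitting `]]>` across two CDATA sections is a common workaround rather than something stated in this answer, and note that CDATA still cannot hold characters outside the range above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of any library.
final class CdataSketch {
    /** Wraps text in CDATA; any "]]>" is split so it cannot close the section early. */
    static String toCdata(String text) {
        // "]]>" becomes "]]" + end of section + start of new section + ">"
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Encodes text as Base64 so that only plain ASCII ends up in the XML. */
    static String toBase64(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

With Base64 the consumer of the XML has to decode the value again, so this only works when both sides agree on it.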

2017-01-30 14:07:36

The list of valid characters is in the XML specification:

Char       ::=      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]  /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
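The production above translates fairly directly into a code point check. Here is a rough Java sketch, assuming XML 1.0; `XmlCharCheck`, `isXmlChar` and `stripInvalid` are names invented for this example.

```java
// Hypothetical helper mirroring the XML 1.0 Char production.
final class XmlCharCheck {
    /** Returns true if the code point matches the Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that is not a legal XML 1.0 character. */
    static String stripInvalid(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // codePoints() joins valid surrogate pairs; lone surrogates fail the check.
        text.codePoints()
            .filter(XmlCharCheck::isXmlChar)
            .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Silently dropping characters loses data, so depending on the use case it may be better to reject the input or replace the character with a placeholder instead.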

2011-02-24 20:34:52

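In summary, the valid characters in text are:

Tab, line feed, and carriage return. All non-control characters are valid except & and <. > is not valid if it follows ]].

Sections 2.2 and 2.4 of the XML specification give the answer in detail:

Characters

Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646

Character Data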

The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings " &amp; " and " &lt; " respectively. The right angle bracket (>) may be represented using the string " &gt; ", and must, for compatibility, be escaped using either " &gt; " or a character reference when it appears in the string " ]]> " in content, when that string is not marking the end of a CDATA section.
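For element text, the rule quoted above comes down to replacing three characters. A small Java sketch follows; `XmlEscapeSketch` and `escapeText` are illustrative names, and in real code an XML writer or library would normally do this for you. It escapes every > for simplicity, which also covers the ]]> case, and it does not deal with quotes in attribute values.

```java
// Hypothetical helper; escapes text for use as XML element content.
final class XmlEscapeSketch {
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break; // always escaped, so "]]>" is safe too
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

Applied to the string from the question, `escapeText("This is a string & so is this")` gives `This is a string &amp; so is this`, which can be placed inside `<node>` without CDATA.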

2018-10-24 14:41:42

For XSL (on really lazy days) I use:

capture="&amp;(?!amp;)" capturereplace="&amp;amp;"

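to turn every & sign that is not followed by amp; into a proper one.

We have cases where the input is in CDATA, but the system that consumes the XML does not take that into account. It is a sloppy fix, so beware...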
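The same negative lookahead can be tried outside XSL. A quick Java sketch, assuming plain `String.replaceAll`; the class and method names are invented here. Once the XML attribute escaping is removed, the pattern is `&(?!amp;)`.

```java
// Hypothetical helper reproducing the lazy fix with java.util.regex via String.replaceAll.
final class AmpersandFixSketch {
    /** Replaces every '&' that is not already the start of "&amp;". */
    static String escapeBareAmpersands(String text) {
        return text.replaceAll("&(?!amp;)", "&amp;");
    }
}
```

As with the original, this is sloppy: it only recognises `&amp;`, so an existing `&lt;` turns into `&amp;lt;`.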

2013-06-17 15:36:03

In the Woodstox XML processor, invalid characters are classified by this code:

if (c == 0) {
    throw new IOException("Invalid null character in text to output");
}
if (c < ' ' || (c >= 0x7F && c <= 0x9F)) {
    String msg = "Invalid white space character (0x" + Integer.toHexString(c) + ") in text to output";
    if (mXml11) {
        msg += " (can only be output using character entity)";
    }
    throw new IOException(msg);
}
if (c > 0x10FFFF) {
    throw new IOException("Illegal unicode character point (0x" + Integer.toHexString(c) + ") to output; max is 0x10FFFF as per RFC");
}
/*
 * Surrogate pair in non-quotable (not text or attribute value) content, and non-unicode encoding (ISO-8859-x,
 * Ascii)?
 */
if (c >= SURR1_FIRST && c <= SURR2_LAST) {
    throw new IOException("Illegal surrogate pair -- can only be output via character entities, which are not allowed in this content");
}
throw new IOException("Invalid XML character (0x"+Integer.toHexString(c)+") in text to output");

(Source: the Woodstox code.)

2014-12-03 10:27:05
