我如何使用jQuery解码字符串中的HTML实体?


当前回答

您不需要jQuery来解决这个问题,因为它会产生一些开销和依赖。

我知道这里有很多好的答案,但由于我实现了一个有点不同的方法,我想分享一下。

这段代码是一种非常安全的安全方法,因为转义处理程序依赖于浏览器,而不是函数。因此,如果将来会发现某些漏洞,则覆盖此解决方案。

const decodeHTMLEntities = text => {
    // Create a new element or use one from cache, to save some element creation overhead
    const el = decodeHTMLEntities.__cache_data_element 
             = decodeHTMLEntities.__cache_data_element 
               || document.createElement('div');
    
    const enc = text
        // Prevent any mixup of existing pattern in text
        .replace(/⪪/g, '⪪#')
        // Encode entities in special format. This will prevent native element encoder to replace any amp characters
        .replace(/&([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+);/gi, '⪪$1⪫');

    // Encode any HTML tags in the text to prevent script injection
    el.textContent = enc;

    // Decode entities from special format, back to their original HTML entities format
    el.innerHTML = el.innerHTML
        .replace(/⪪([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+)⪫/gi, '&$1;')
        .replace(/⪪#/g, '⪪');
   
    // Get the decoded HTML entities
    const dec = el.textContent;
    
    // Clear the element content, in order to preserve a bit of memory (in case the text is big)
    el.textContent = '';

    return dec;
}

// Example
console.log(decodeHTMLEntities("<script>alert('&awconint;&CounterClockwiseContourIntegral;&#x02233;&#8755;⪪#x02233⪫');</script>"));
// Prints: <script>alert('∳∳∳∳⪪#x02233⪫');</script>

顺便说一下,我选择使用字符⪪和⪫,因为它们很少被使用,因此通过匹配它们影响性能的几率显著降低。

其他回答

最简单的方法是为你的元素设置一个类选择器,然后使用以下代码:

$(function(){
    $('.classSelector').each(function(a, b){
        $(b).html($(b).text());
    });
});

什么都不需要了!

我有这个问题,找到了这个清晰的解决方案,它工作得很好。

我认为你混淆了文本和HTML方法。看看这个例子,如果您使用元素的内部HTML作为文本,您将得到解码的HTML标记(第二个按钮)。但如果您将它们作为HTML使用,则会得到HTML格式的视图(第一个按钮)。

<div id="myDiv">
    here is a <b>HTML</b> content.
</div>
<br />
<input value="Write as HTML" type="button" onclick="javascript:$('#resultDiv').html($('#myDiv').html());" />
&nbsp;&nbsp;
<input value="Write as Text" type="button" onclick="javascript:$('#resultDiv').text($('#myDiv').html());" />
<br /><br />
<div id="resultDiv">
    Results here !
</div>

第一个按钮写着:这是一个HTML内容。

第二个按钮写:这里是一个<B>HTML</B>内容。

顺便说一下,你可以看到我在jQuery插件中找到的一个插件- HTML decode and encode,它编码和解码HTML字符串。

就像Mike Samuel说的,不要使用jQuery.html().text()来解码html实体,因为这是不安全的。

相反,使用模板渲染器,如Mustache.js或decodeEntities从@VyvIT的评论。

js实用带库提供了escape和unescape方法,但它们对用户输入不安全:

_.escape(string)

_.unescape(string)

您不需要jQuery来解决这个问题,因为它会产生一些开销和依赖。

我知道这里有很多好的答案,但由于我实现了一个有点不同的方法,我想分享一下。

这段代码是一种非常安全的安全方法,因为转义处理程序依赖于浏览器,而不是函数。因此,如果将来会发现某些漏洞,则覆盖此解决方案。

const decodeHTMLEntities = text => {
    // Create a new element or use one from cache, to save some element creation overhead
    const el = decodeHTMLEntities.__cache_data_element 
             = decodeHTMLEntities.__cache_data_element 
               || document.createElement('div');
    
    const enc = text
        // Prevent any mixup of existing pattern in text
        .replace(/⪪/g, '⪪#')
        // Encode entities in special format. This will prevent native element encoder to replace any amp characters
        .replace(/&([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+);/gi, '⪪$1⪫');

    // Encode any HTML tags in the text to prevent script injection
    el.textContent = enc;

    // Decode entities from special format, back to their original HTML entities format
    el.innerHTML = el.innerHTML
        .replace(/⪪([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+)⪫/gi, '&$1;')
        .replace(/⪪#/g, '⪪');
   
    // Get the decoded HTML entities
    const dec = el.textContent;
    
    // Clear the element content, in order to preserve a bit of memory (in case the text is big)
    el.textContent = '';

    return dec;
}

// Example
console.log(decodeHTMLEntities("<script>alert('&awconint;&CounterClockwiseContourIntegral;&#x02233;&#8755;⪪#x02233⪫');</script>"));
// Prints: <script>alert('∳∳∳∳⪪#x02233⪫');</script>

顺便说一下,我选择使用字符⪪和⪫,因为它们很少被使用,因此通过匹配它们影响性能的几率显著降低。

您可以使用该库,从https://github.com/mathiasbynens/he获得

例子:

console.log(he.decode("J&#246;rg &amp J&#xFC;rgen rocked to &amp; fro "));
// Logs "Jörg & Jürgen rocked to & fro"

我质疑这个库的作者,是否有任何理由在客户端代码中使用这个库来支持<textarea>黑客提供的其他答案在这里和其他地方。他提供了一些可能的理由:

If you're using node.js serverside, using a library for HTML encoding/decoding gives you a single solution that works both clientside and serverside. Some browsers' entity decoding algorithms have bugs or are missing support for some named character references. For example, Internet Explorer will both decode and render non-breaking spaces (&nbsp;) correctly but report them as ordinary spaces instead of non-breaking ones via a DOM element's innerText property, breaking the <textarea> hack (albeit only in a minor way). Additionally, IE 8 and 9 simply don't support any of the new named character references added in HTML 5. The author of he also hosts a test of named character reference support at http://mathias.html5.org/tests/html/named-character-references/. In IE 8, it reports over one thousand errors. If you want to be insulated from browser bugs related to entity decoding and/or be able to handle the full range of named character references, you can't get away with the <textarea> hack; you'll need a library like he. He just darn well feels like doing things this way is less hacky.