使用Javascript的atob解码base64不能正确解码utf-8字符串

我使用Javascript window.atob()函数来解码base64编码的字符串(特别是来自GitHub API的base64编码的内容)。问题是我得到ascii编码的字符返回(如âⅱ而不是™)。我如何正确地处理传入的base64编码流，以便将其解码为utf-8?

当前回答

事物是变化的。escape/unescape方法已弃用。

你可以在对字符串进行base64编码之前对其进行URI编码。注意，这不会生成base64编码的UTF8，而是生成base64编码的url编码数据。双方必须就相同的编码达成一致。

参见工作示例:http://codepen.io/anon/pen/PZgbPW

// encode string
var base64 = window.btoa(encodeURIComponent('€ 你好 æøåÆØÅ'));
// decode string
var str = decodeURIComponent(window.atob(tmp));
// str is now === '€ 你好 æøåÆØÅ'

对于OP的问题，第三方库如js-base64应该可以解决这个问题。

2016-02-17 07:03:59

其他回答

这是我的一行程序解决方案，结合了Jackie Hans的答案和另一个问题的一些代码:

const utf8_encoded_text = new TextDecoder().decode(Uint8Array.from(window.atob(base_64_decoded_text).split("").map(x => x.charCodeAt(0))));

2022-11-15 16:43:57

以下是Mozilla开发资源中描述的2018年更新的解决方案

从unicode编码到b64

function b64EncodeUnicode(str) {
    // first we use encodeURIComponent to get percent-encoded UTF-8,
    // then we convert the percent encodings into raw bytes which
    // can be fed into btoa.
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
        function toSolidBytes(match, p1) {
            return String.fromCharCode('0x' + p1);
    }));
}

b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="

从b64解码到unicode

function b64DecodeUnicode(str) {
    // Going backwards: from bytestream, to percent-encoding, to original string.
    return decodeURIComponent(atob(str).split('').map(function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}

b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"

2018-10-04 13:02:27

对我有用的完整文章:https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding

我们从Unicode/UTF-8编码的部分是

function utf8_to_b64( str ) {
   return window.btoa(unescape(encodeURIComponent( str )));
}

function b64_to_utf8( str ) {
   return decodeURIComponent(escape(window.atob( str )));
}

// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"

这是当今最常用的方法之一。

2020-06-23 14:51:15

事物是变化的。escape/unescape方法已弃用。

参见工作示例:http://codepen.io/anon/pen/PZgbPW

// encode string
var base64 = window.btoa(encodeURIComponent('€ 你好 æøåÆØÅ'));
// decode string
var str = decodeURIComponent(window.atob(tmp));
// str is now === '€ 你好 æøåÆØÅ'

对于OP的问题，第三方库如js-base64应该可以解决这个问题。

2016-02-17 07:03:59

小修正，unescape和escape已弃用，因此:

function utf8_to_b64( str ) {
    return window.btoa(decodeURIComponent(encodeURIComponent(str)));
}

function b64_to_utf8( str ) {
     return decodeURIComponent(encodeURIComponent(window.atob(str)));
}


function b64_to_utf8( str ) {
    str = str.replace(/\s/g, '');    
    return decodeURIComponent(encodeURIComponent(window.atob(str)));
}

2015-12-08 11:32:58

使用Javascript的atob解码base64不能正确解码utf-8字符串

推荐文章

最新文章

标签