I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at all levels. But opening such CSV files (containing e.g. diacritics, cyrillic letters, Greek letters) in Excel does not achieve the expected results showing something like Г„/Г¤, Г–/Г¶. And I don't know how to force Excel understand that the open CSV file is encoded in UTF-8. I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that.

有什么解决办法吗?

附注:哪些工具可能像Excel一样?


更新

I have to say that I've confused the community with the formulation of the question. When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user, in a fluent and transparent way. However, I used a wrong formulation asking for doing it automatically. That is very confusing and it clashes with VBA macro automation. There are two answers for this questions that I appreciate the most: the very first answer by Alex https://stackoverflow.com/a/6002338/166589, and I've accepted this answer; and the second one by Mark https://stackoverflow.com/a/6488070/166589 that have appeared a little later. From the usability point of view, Excel seemed to have lack of a good user-friendly UTF-8 CSV support, so I consider both answers are correct, and I have accepted Alex's answer first because it really stated that Excel was not able to do that transparently. That is what I confused with automatically here. Mark's answer promotes a more complicated way for more advanced users to achieve the expected result. Both answers are great, but Alex's one fits my not clearly specified question a little better.


更新2

在最后一次编辑5个月后,我注意到Alex的答案不知为何消失了。我真的希望这不是一个技术问题,我希望现在不再有关于哪个答案更好的讨论。所以我认为马克的答案是最好的。


当前回答

您可以转换。csv文件到UTF-8与BOM通过notepad++:

在notepad++中打开文件。 进入“编码→转换为UTF-8-BOM”菜单。 进入菜单文件→保存。 关闭记事本+ +。 在Excel中打开文件。

在Microsoft Excel 2013 (15.0.5093.1000) MSO(15.0.5101.1000) 64位中工作,来自Microsoft Office Professional Plus 2013在Windows 8.1上,非unicode程序的区域设置为“德语(德国)”。

其他回答

简单的vba宏用于打开utf-8文本和csv文件

Sub OpenTextFile()

   filetoopen = Application.GetOpenFilename("Text Files (*.txt;*.csv), *.txt;*.csv")
   If filetoopen = Null Or filetoopen = Empty Then Exit Sub

   Workbooks.OpenText Filename:=filetoopen, _
   Origin:=65001, DataType:=xlDelimited, Comma:=True

End Sub

原点:=65001为UTF-8。 逗号:对于按列分布的.csv文件为True

保存在个人。XLSB使它始终可用。 个性化excel工具栏添加一个宏调用按钮,并从那里打开文件。 您可以添加更多的格式到宏,如列自动拟合,对齐等。

这并不是准确地解决问题,但由于我偶然发现了这一点,上面的解决方案不适合我或有要求,我不能满足,这里是另一种方式添加BOM时,你可以访问vim:

vim -e -s +"set bomb|set encoding=utf-8|wq" filename.csv

令人难以置信的是,有这么多答案,但没有一个能回答这个问题:

“当我问这个问题时,我询问了一种打开UTF-8的方法 CSV文件在Excel没有任何问题的用户,……”

被标记为200+赞成的接受答案对我来说是无用的,因为我不想给我的用户如何配置Excel的手册。 除此之外:本手册将适用于一个Excel版本,但其他Excel版本有不同的菜单和配置对话框。每个Excel版本都需要一个手册。

那么问题是如何使Excel显示UTF8数据与一个简单的双击?

好吧,至少在Excel 2007中,如果你使用CSV文件,这是不可能的,因为UTF8 BOM被忽略,你只会看到垃圾。这已经是Lyubomyr Shaydariv问题的一部分:

“我还尝试指定UTF-8 BOM EF BB BF,但Excel忽略了这一点。”

我也有同样的经历:将俄语或希腊语数据写入UTF8 CSV文件,并使用BOM在Excel中生成垃圾:

UTF8 CSV文件内容:

Colum1;Column2
Val1;Val2
Авиабилет;Tλληνικ

Excel 2007的结果:

A solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that it depends on the region settings in control panel if comma or semicolon is used as separator. So the same CSV file may open correctly on one computer but on anther computer not. "CSV" means "Comma Separated Values" but for example on a german Windows by default semicolon must be used as separator while comma does not work. (Here it should be named SSV = Semicolon Separated Values) CSV files cannot be interchanged between different language versions of Windows. This is an additional problem to the UTF-8 problem.

Excel已经存在了几十年。微软这么多年都没能实现CSV导入这样一个基本的功能,真是太遗憾了。


但是,如果您将相同的值放入HTML文件中,并将该文件保存为UTF8文件,文件扩展名为XLS,您将得到正确的结果。

UTF8 XLS文件内容:

<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>

Excel 2007的结果:

你甚至可以在HTML中使用Excel能正确显示的颜色。

<style>
.Head { background-color:gray; color:white; }
.Red  { color:red; }
</style>
<table border=1>
<tr><td class=Head>Colum1</td><td class=Head>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td class=Red>Авиабилет</td><td class=Red>Tλληνικ</td></tr>
</table>

Excel 2007的结果:

在这种情况下,只有表本身具有黑色边框和线条。如果你想要所有的单元格显示网格线,这在HTML中也是可能的:

<html xmlns:x="urn:schemas-microsoft-com:office:excel">
    <head>
        <meta http-equiv="content-type" content="text/plain; charset=UTF-8"/>
        <xml>
            <x:ExcelWorkbook>
                <x:ExcelWorksheets>
                    <x:ExcelWorksheet>
                        <x:Name>MySuperSheet</x:Name>
                        <x:WorksheetOptions>
                            <x:DisplayGridlines/>
                        </x:WorksheetOptions>
                    </x:ExcelWorksheet>
                </x:ExcelWorksheets>
            </x:ExcelWorkbook>
        </xml>
    </head>
    <body>
        <table>
            <tr><td>Colum1</td><td>Column2</td></tr>
            <tr><td>Val1</td><td>Val2</td></tr>
            <tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
        </table>
    </body>
</html>

这段代码甚至允许指定工作表的名称(这里是“MySuperSheet”)

Excel 2007的结果:

我过去也遇到过同样的问题(如何生成Excel可以读取的文件,以及其他工具也可以读取的文件)。我使用的是TSV而不是CSV,但同样的编码问题出现了。

我没能找到任何方法让Excel自动识别UTF-8,我也不愿意/不能给文件的使用者复杂的如何打开它们的指令。所以我将它们编码为UTF-16le(带有BOM)而不是UTF-8。大小是原来的两倍,但Excel可以识别编码。而且它们的压缩性很好,所以尺寸很少(但遗憾的是并非永远)重要。

这是一个老问题,但我刚刚遇到过类似的问题,解决方案可能会帮助其他人:

同样的问题是,将CSV文本数据写入文件,然后在Excel中打开生成的. CSV,将所有文本转移到单个列中。在阅读了上面的答案后,我尝试了下面的答案,这似乎可以解决问题。

在创建StreamWriter时应用UTF-8编码。就是这样。

例子:

using (StreamWriter output = new StreamWriter(outputFileName, false, Encoding.UTF8, 2 << 22)) {
   /* ... do stuff .... */
   output.Close();
}