I'm developing a database for a small web application at my school using SQL Server 2005. I've seen a couple of schools of thought on the issue of varchar vs nvarchar:

1. Use varchar unless you deal with a lot of internationalized data, then use nvarchar.
2. Just use nvarchar for everything.

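I'm starting to see the merits of view 2. I know that nvarchar takes up twice as much space, but that isn't necessarily a big deal since this is only going to store data for a few hundred students. To me it seems like it would be easiest not to worry about it and just allow everything to use nvarchar. Or is there something I'm missing?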
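Here is a rough Python sketch that puts the "twice as much space" point into numbers. It makes three assumptions:

- VARCHAR uses code page 1252, which is the code page behind the common Latin1_General / SQL_Latin1_General_CP1 collations.
- NVARCHAR stores UTF-16LE.
- Only character bytes are counted. Row and page overhead is ignored.

The student count, column count, and average length are made-up placeholders.

```python
# Rough size comparison: VARCHAR (single-byte code page) vs NVARCHAR (UTF-16LE).
# Counts character bytes only; SQL Server row and page overhead is ignored.

def varchar_bytes(text: str, code_page: str = "cp1252") -> int:
    """Bytes needed in a single-byte code page, as VARCHAR would store it."""
    return len(text.encode(code_page))


def nvarchar_bytes(text: str) -> int:
    """Bytes needed as UTF-16LE, as NVARCHAR would store it."""
    return len(text.encode("utf-16-le"))


if __name__ == "__main__":
    name = "Maria Gonzalez"
    print(varchar_bytes(name), nvarchar_bytes(name))  # 14 28

    # Hypothetical placeholders: a few hundred students, 10 text columns,
    # 30 characters per value on average.
    students, text_columns, avg_chars = 500, 10, 30
    extra = students * text_columns * avg_chars  # 1 extra byte per character
    print(f"NVARCHAR costs about {extra / 1024:.0f} KB more")  # about 146 KB
```

At this scale the difference is on the order of kilobytes. The answers below are about what happens when the same ratio is applied to millions of rows.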


Current answer

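Why, in all of this discussion, has no one mentioned UTF-8? Being able to store the full Unicode range of characters does not mean you always have to allocate two bytes per character (or "code point", to use the Unicode term). All ASCII is valid UTF-8. Does SQL Server check that the text in a VARCHAR() field is strict ASCII (i.e. that the top bit of each byte is zero)? I would hope not.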

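If you want to store Unicode and remain compatible with older ASCII-only applications, I think using VARCHAR() with UTF-8 would be the magic bullet: it only uses more space when it needs to.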
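A small Python check of the encoding property this answer relies on. ASCII text is byte-for-byte identical in UTF-8 and takes one byte per character, while other characters take more. UTF-8 is not always the smaller encoding, though: CJK characters take 3 bytes in UTF-8 versus 2 in UTF-16.

This only illustrates the encodings themselves. It does not model how any particular SQL Server version interprets UTF-8 bytes stored in a VARCHAR column.

```python
from typing import Tuple

# Per-string storage in UTF-8 vs UTF-16LE, plus the "top bit is zero" ASCII test.
# Illustrates the encodings only, not SQL Server's handling of VARCHAR bytes.

def is_strict_ascii(data: bytes) -> bool:
    """True if every byte has its top bit clear (value < 0x80)."""
    return all(b < 0x80 for b in data)


def sizes(text: str) -> Tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16LE byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    for sample in ["hello", "café", "日本"]:
        print(sample, sizes(sample), is_strict_ascii(sample.encode("utf-8")))
    # hello (5, 10) True
    # café (5, 8) False
    # 日本 (6, 4) False
```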

对于那些不熟悉UTF-8的人,我可以推荐一个入门。

Other answers

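I'm hesitant to add yet another answer here since there are already quite a few, but a few points need to be made that either haven't been made or haven't been made clearly.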

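First: Do not always use NVARCHAR. That is a very dangerous, and often costly, attitude / approach. It is no better to say "Never use cursors": cursors are sometimes the most efficient means of solving a particular problem, and the common work-around of a WHILE loop will almost always be slower than a properly written cursor.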

The only time you should use the term "always" is when advising to "always do what is best for the situation". Granted, that is often difficult to determine, especially when trying to balance short-term gains in development time (manager: "we needed this feature (that you didn't know about until just now) a week ago!") with long-term maintenance costs (manager who initially pressured the team to complete a 3-month project in a 3-week sprint: "why are we having these performance problems? How could we have possibly done X, which has no flexibility? We can't afford a sprint or two to fix this. What can we get done in a week so we can get back to our priority items? And we definitely need to spend more time in design so this doesn't keep happening!").

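Second: The answer from @gbn touches on some very important points to consider when making certain data-modeling decisions where the path isn't 100% clear. But there is even more to consider: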

- size of transaction log files
- time it takes to replicate (if using replication)
- time it takes to ETL (if ETLing)
- time it takes to ship logs to a remote system and restore (if using Log Shipping)
- size of backups
- length of time it takes to complete the backup
- length of time it takes to do a restore (this might be important some day ;-)
- size needed for tempdb
- performance of triggers (for the inserted and deleted tables that are stored in tempdb)
- performance of row versioning (if using SNAPSHOT ISOLATION, since the version store is in tempdb)
- ability to get new disk space when the CFO says that they just spent $1 million on a SAN last year and so they will not authorize another $250k for additional storage
- length of time it takes to do INSERT and UPDATE operations
- length of time it takes to do index maintenance
- etc, etc, etc.

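Wasting space has a huge cascade effect on the entire system. I wrote an article going into explicit detail on this topic: Disk Is Cheap! ORLY? (free registration required; sorry, I don't control that policy).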
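Most of the cascade effect is plain multiplication: the same extra bytes are paid again in every place the data lives. The Python sketch below assumes every character in the column fits a single-byte code page, so NVARCHAR spends exactly one extra byte per character.

The row count, average length, and copy counts are hypothetical. Substitute your own numbers for:

- the indexes that include the column,
- the log volume,
- the backups you retain.

```python
# Back-of-the-envelope multiplier for wasted space.
# Assumes every character would fit a single-byte code page, so NVARCHAR
# spends exactly one extra byte per character. All counts are hypothetical.

def wasted_bytes(rows: int, avg_chars: int, copies: dict) -> dict:
    """Extra bytes per place the data lives, given how many copies each holds."""
    per_copy = rows * avg_chars  # 1 extra byte per character, per row
    return {place: per_copy * n for place, n in copies.items()}


if __name__ == "__main__":
    copies = {
        "table + 2 indexes containing the column": 3,
        "transaction log (initial load)": 1,
        "retained full backups": 7,
    }
    result = wasted_bytes(rows=5_000_000, avg_chars=40, copies=copies)
    for place, extra in result.items():
        print(f"{place}: {extra / 1024**3:.2f} GiB")
    print(f"total: {sum(result.values()) / 1024**3:.2f} GiB")
```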

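Third: While some answers incorrectly focus on the "this is a small app" aspect, and some correctly suggest to "use what is appropriate", none of the answers has given real guidance to the O.P. An important detail mentioned in the Question is that this is a web page for their school. Great! So we can suggest that: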

- Fields for Student and/or Faculty names should probably be NVARCHAR since, over time, it is only getting more likely that names from other cultures will be showing up in those places.
- But for street address and city names? The purpose of the app was not stated (it would have been helpful), but assuming the address records, if any, pertain just to a particular geographical region (i.e. a single language / culture), then use VARCHAR with the appropriate Code Page (which is determined by the Collation of the field).
- If storing State and/or Country ISO codes (no need to store INT / TINYINT since ISO codes are fixed length, human readable, and well, standard :), use CHAR(2) for two-letter codes and CHAR(3) if using three-letter codes. And consider using a binary Collation such as Latin1_General_100_BIN2.
- If storing postal codes (i.e. zip codes), use VARCHAR since it is an international standard to never use any letter outside of A-Z. And yes, still use VARCHAR and not INT even if only storing US zip codes, since zip codes are not numbers, they are strings, and some of them have a leading "0". And consider using a binary Collation such as Latin1_General_100_BIN2.
- If storing email addresses and/or URLs, use NVARCHAR since both of those can now contain Unicode characters.
- and so on....
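The zip-code point is easy to demonstrate in Python:

- An integer cannot remember a leading zero.
- ZIP+4 values are not numbers at all.

The sample codes below are made up. The ISO helper is hypothetical and only checks the shape of a two-letter code, i.e. something that fits CHAR(2). It does not look the code up in the ISO 3166 list.

```python
# Why postal codes are strings: integers drop leading zeros,
# and ZIP+4 values are not numbers at all. Sample codes are made up.

def round_trips_as_int(code: str) -> bool:
    """True if storing code as an integer and reading it back is lossless."""
    try:
        return str(int(code)) == code
    except ValueError:  # e.g. "02134-1234" cannot be parsed as an int
        return False


def looks_like_iso_alpha2(code: str) -> bool:
    """Shape check for a two-letter ISO code (fits CHAR(2)); not a registry lookup."""
    return len(code) == 2 and code.isascii() and code.isalpha() and code.isupper()


if __name__ == "__main__":
    for zip_code in ["90210", "02134", "02134-1234"]:
        print(zip_code, round_trips_as_int(zip_code))
    # 90210 True
    # 02134 False
    # 02134-1234 False
    print(int("02134"))  # 2134: the leading "0" is gone
```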

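Fourth: Suppose all of the following are now true:

- You have NVARCHAR data taking up twice as much space as it needs to, for data that fits nicely in VARCHAR ("fits nicely" = doesn't turn into "?").
- Somehow, as if by magic, the application did grow.
- There are now millions of records in at least one of these fields.
- Most rows are standard ASCII, but some contain Unicode characters, so you have to keep NVARCHAR.

In that case, consider the following: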

- If you are using SQL Server 2008 - 2016 RTM and are on Enterprise Edition, OR if using SQL Server 2016 SP1 (which made Data Compression available in all editions) or newer, then you can enable Data Compression. Data Compression can (but won't "always") compress Unicode data in NCHAR and NVARCHAR fields. The determining factors are:
  - NCHAR(1 - 4000) and NVARCHAR(1 - 4000) use the Standard Compression Scheme for Unicode, but only starting in SQL Server 2008 R2, AND only for IN ROW data, not OVERFLOW! This appears to be better than the regular ROW / PAGE compression algorithm.
  - NVARCHAR(MAX) and XML (and I guess also VARBINARY(MAX), TEXT, and NTEXT) data that is IN ROW (not off row in LOB or OVERFLOW pages) can at least be PAGE compressed, but not ROW compressed. Of course, PAGE compression depends on the size of the in-row value: I tested with VARCHAR(MAX) and saw that 6000 character/byte rows would not compress, but 4000 character/byte rows did.
  - Any OFF ROW data, LOB or OVERFLOW = No Compression For You!
- If using SQL Server 2005, or 2008 - 2016 RTM and not on Enterprise Edition, you can have two fields: one VARCHAR and one NVARCHAR. For example, let's say you are storing URLs which are mostly all base ASCII characters (values 0 - 127) and hence fit into VARCHAR, but sometimes have Unicode characters. Your schema can include the following 3 fields:

  ```sql
    ...
    URLa VARCHAR(2048) NULL,
    URLu NVARCHAR(2048) NULL,
    URL AS (ISNULL(CONVERT(NVARCHAR(2048), [URLa]), [URLu])),
    -- exactly one of URLa / URLu must be populated
    CONSTRAINT [CK_TableName_OneUrlMax] CHECK (
        ([URLa] IS NOT NULL OR [URLu] IS NOT NULL)
    AND ([URLa] IS NULL OR [URLu] IS NULL))
  );
  ```

  In this model you only SELECT from the [URL] computed column. For inserting and updating, you determine which field to use by checking whether converting to VARCHAR alters the incoming value, which has to be of NVARCHAR type (CASE is used so this also runs on SQL Server 2005, which lacks IIF):

  ```sql
  INSERT INTO TableName (..., URLa, URLu)
  VALUES (...,
          -- value survives conversion unchanged: store it as VARCHAR
          CASE WHEN CONVERT(VARCHAR(2048), @URL) = @URL THEN @URL ELSE NULL END,
          -- conversion changed the value: store it as NVARCHAR
          CASE WHEN CONVERT(VARCHAR(2048), @URL) = @URL THEN NULL ELSE @URL END
         );
  ```

- You can GZIP incoming values into VARBINARY(MAX) and then unzip them on the way out:
  - For SQL Server 2005 - 2014: you can use SQLCLR. SQL# (a SQLCLR library that I wrote) comes with Util_GZip and Util_GUnzip in the Free version.
  - For SQL Server 2016 and newer: you can use the built-in COMPRESS and DECOMPRESS functions, which are also GZip.
- If using SQL Server 2017 or newer, you can look into making the table a Clustered Columnstore Index.
- While this is not a viable option yet, SQL Server 2019 introduces native support for UTF-8 in VARCHAR / CHAR datatypes. There are currently too many bugs with it for it to be used, but if they are fixed, then this is an option for some scenarios. Please see my post, "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?", for a detailed analysis of this new feature.
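The two-column routing rule can be modeled outside the database. This shows why it always satisfies the CHECK constraint: exactly one of the two values is ever non-NULL.

This Python sketch makes these assumptions:

- The VARCHAR column uses code page 1252.
- Unmappable characters turn into "?".
- Python's `==` is an exact comparison. In T-SQL, the comparison follows the collation in effect.

The GZip helpers compress the UTF-16LE bytes that an NVARCHAR value holds. Treating that as equivalent to COMPRESS on an NVARCHAR value is an assumption. GZip adds a fixed header, so very short values can come out larger than the original.

```python
import gzip
from typing import Optional, Tuple

# Models the URLa (VARCHAR) / URLu (NVARCHAR) routing rule.
# Assumptions: the VARCHAR column uses code page 1252, and unmappable
# characters become "?". The == comparison here is exact (binary).

def route_url(url: str, code_page: str = "cp1252") -> Tuple[Optional[str], Optional[str]]:
    """Return (urla, urlu); exactly one is non-None, matching the CHECK constraint."""
    converted = url.encode(code_page, errors="replace").decode(code_page)
    if converted == url:
        return url, None  # conversion is lossless: use the VARCHAR column
    return None, url      # conversion changed the value: use the NVARCHAR column


def computed_url(urla: Optional[str], urlu: Optional[str]) -> Optional[str]:
    """Same idea as the URL computed column: ISNULL(URLa, URLu)."""
    return urla if urla is not None else urlu


def gzip_nvarchar(text: str) -> bytes:
    """GZip the UTF-16LE bytes of text (the bytes an NVARCHAR value holds)."""
    return gzip.compress(text.encode("utf-16-le"))


def gunzip_nvarchar(blob: bytes) -> str:
    """Reverse of gzip_nvarchar."""
    return gzip.decompress(blob).decode("utf-16-le")


if __name__ == "__main__":
    for url in ["https://example.com/cafe", "https://example.com/Ω"]:
        urla, urlu = route_url(url)
        print(urla, urlu, computed_url(urla, urlu) == url)
```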

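Since your application is small, there is essentially no appreciable cost increase to using nvarchar over varchar, and you save yourself potential headaches down the road if you ever need to store Unicode data.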

I can speak from experience on this: beware of nvarchar. Unless you absolutely require it, this data type destroys performance on larger databases. I inherited a database that was hurting in terms of performance and space. We were able to reduce the size of a 30GB database by 70%! There were some other modifications made to help with performance, but I'm sure the varchars helped out significantly with that as well. If your database has the potential for growing tables to a million+ records, stay away from nvarchar at all costs.
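This answer credits several changes for the 70% reduction. A quick bound shows how much the NVARCHAR-to-VARCHAR conversion alone could account for. The Python sketch assumes all of the converted text fits a single-byte code page.

Under that assumption, the conversion at most halves the character bytes. So even a database consisting entirely of such text would shrink by no more than 50% from this change. That fits the answer's own note that other modifications contributed.

```python
# Upper bound on what converting NVARCHAR text to VARCHAR can save.
# Assumes the text fits a single-byte code page: 2 bytes/char -> 1 byte/char.
# Non-character data and row overhead are assumed not to shrink at all.

def max_savings_fraction(nvarchar_share: float) -> float:
    """Largest fraction of total size saved when nvarchar_share of it is NVARCHAR text."""
    if not 0.0 <= nvarchar_share <= 1.0:
        raise ValueError("nvarchar_share must be between 0 and 1")
    return nvarchar_share / 2


if __name__ == "__main__":
    db_gb = 30
    bound = max_savings_fraction(1.0)  # extreme case: everything is NVARCHAR text
    print(f"at most {bound:.0%} ({db_gb * bound:.0f} GB) from this change alone")
    # at most 50% (15 GB) from this change alone
```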

I deal with this question at work often:

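- FTP feeds of inventory and pricing: item descriptions and other text were in nvarchar when varchar would have worked fine. Converting these to varchar reduced the file size by almost half and really helped with uploads.
- The above scenario worked fine until someone put a special character in an item description (maybe a trademark symbol, I can't remember).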
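A Python sketch of both effects in that feed, using made-up rows:

- ASCII-only text is half the size in a single-byte encoding compared with UTF-16.
- A single character outside the target code page is enough to break the round trip.

Which characters count as "special" depends on the code page. For example, ™ exists in Windows-1252 but not in plain ASCII. Using cp1252 here is an assumption; the answer does not say which code page the feed used.

```python
# Hypothetical feed rows; cp1252 as the single-byte encoding is an assumption.

def encoded_size(rows: list, encoding: str) -> int:
    """Total bytes for the rows written in the given encoding (no BOM)."""
    return sum(len(row.encode(encoding)) for row in rows)


def unencodable_chars(text: str, encoding: str = "cp1252") -> list:
    """Characters in text that the single-byte code page cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


if __name__ == "__main__":
    rows = ["Widget, blue, 9.99", "Gadget, red, 19.99"]
    print(encoded_size(rows, "utf-16-le"), encoded_size(rows, "cp1252"))
    print(unencodable_chars("Widget™"))           # [] (™ is 0x99 in cp1252)
    print(unencodable_chars("Widget™", "ascii"))  # ['™']
    print(unencodable_chars("Resistor 10 kΩ"))    # ['Ω']
```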

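I still do not use varchar every time. If there is any doubt, or any potential for special characters, I use nvarchar. I find I mostly use varchar when I am 100% in control of what is populating the field.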