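I'm developing a database for a small web application at my school using SQL Server 2005. I see a couple of schools of thought on the issue of varchar vs nvarchar:
1. Use varchar unless you deal with a lot of internationalized data, then use nvarchar.
2. Just use nvarchar for everything.
I'm beginning to see the merits of view 2. I know that nvarchar takes up twice as much space, but that isn't necessarily a big deal since this is only going to store data for a few hundred students. To me it seems like it would be easiest not to worry about it and just allow everything to use nvarchar. Or is there something I'm missing?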
Current answer
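Disk space is not the issue... but memory and performance will be. Double the page reads, double the index size, strange LIKE and = constant behaviour, etc.
Do you need to store Chinese or other such scripts? Yes or no...
See also "Storage and Performance Effects of Unicode" in MS BOL.
Edit:
A recent SO question highlights how bad nvarchar performance can be...
"SQL Server uses high CPU when searching inside nvarchar strings"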
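The doubling this answer refers to can be checked directly. The following is only a minimal sketch: the variable names are made up for illustration, and it uses DECLARE followed by SET so that it also runs on SQL Server 2005.

```sql
-- Store the same three-character ASCII string in both types.
DECLARE @v VARCHAR(50), @n NVARCHAR(50);
SET @v = 'abc';
SET @n = N'abc';

-- DATALENGTH returns bytes, not characters.
SELECT DATALENGTH(@v) AS varchar_bytes,   -- 3 (one byte per character)
       DATALENGTH(@n) AS nvarchar_bytes;  -- 6 (two bytes per character, UTF-16)
```

The same factor of two applies to every data page and index page that holds the column, which is where the extra page reads and memory use come from.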
Other answers
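I hesitate to add yet another answer here as there are already quite a few, but a few points need to be made that either have not been made or have not been made clearly.
First: do not always use NVARCHAR. That is a very dangerous, and often costly, attitude / approach. It is no better than saying "never use cursors", since cursors are sometimes the most efficient means of solving a particular problem, and the common work-around of doing a WHILE loop will almost always be slower than a properly done cursor.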
The only time you should use the term "always" is when advising to "always do what is best for the situation". Granted that is often difficult to determine, especially when trying to balance short-term gains in development time (manager: "we need this feature -- that you didn't know about until just now -- a week ago!") with long-term maintenance costs (manager who initially pressured team to complete a 3-month project in a 3-week sprint: "why are we having these performance problems? How could we have possibly done X which has no flexibility? We can't afford a sprint or two to fix this. What can we get done in a week so we can get back to our priority items? And we definitely need to spend more time in design so this doesn't keep happening!").
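Second: @gbn's answer touches on some very important points to consider when making certain data modeling decisions when the path isn't 100% clear. But there is even more to consider: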
- size of transaction log files
- time it takes to replicate (if using replication)
- time it takes to ETL (if ETLing)
- time it takes to ship logs to a remote system and restore (if using Log Shipping)
- size of backups
- length of time it takes to complete the backup
- length of time it takes to do a restore (this might be important some day ;-)
- size needed for tempdb
- performance of triggers (for inserted and deleted tables that are stored in tempdb)
- performance of row versioning (if using SNAPSHOT ISOLATION, since the version store is in tempdb)
- ability to get new disk space when the CFO says that they just spent $1 million on a SAN last year and so they will not authorize another $250k for additional storage
- length of time it takes to do INSERT and UPDATE operations
- length of time it takes to do index maintenance
- etc, etc, etc.
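Wasting space has a huge cascade effect on the entire system. I wrote an article going into explicit detail on this topic: "Disk Is Cheap! ORLY?" (free registration required; sorry, I don't control that policy).
Third: While some answers incorrectly focus on the "this is a small app" aspect, and some correctly suggest to "use what is appropriate", none of the answers has provided real guidance to the O.P. An important detail mentioned in the question is that this is a web page for their school. Great! So we can suggest that: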
- Fields for Student and/or Faculty names should probably be NVARCHAR since, over time, it is only getting more likely that names from other cultures will be showing up in those places.
- But for street address and city names? The purpose of the app was not stated (it would have been helpful), but assuming the address records, if any, pertain just to a particular geographical region (i.e. a single language / culture), then use VARCHAR with the appropriate Code Page (which is determined from the Collation of the field).
- If storing State and/or Country ISO codes (no need to store INT / TINYINT since ISO codes are fixed length, human readable, and well, standard :) use CHAR(2) for two-letter codes and CHAR(3) if using three-letter codes. And consider using a binary Collation such as Latin1_General_100_BIN2.
- If storing postal codes (i.e. zip codes), use VARCHAR since it is an international standard to never use any letter outside of A-Z. And yes, still use VARCHAR, not INT, even if only storing US zip codes, since zip codes are not numbers, they are strings, and some of them have a leading "0". And consider using a binary Collation such as Latin1_General_100_BIN2.
- If storing email addresses and/or URLs, use NVARCHAR since both of those can now contain Unicode characters.
- and so on....
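Put together, the suggestions above might translate into a table along the following lines. This is only a sketch: the table name, column names, and lengths are all assumptions made for illustration, since the question does not describe the actual schema.

```sql
-- Hypothetical table; every name and length here is illustrative only.
CREATE TABLE dbo.Student
(
    StudentID     INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,

    -- Personal names: NVARCHAR, since names from other cultures are likely.
    FirstName     NVARCHAR(50)   NOT NULL,
    LastName      NVARCHAR(50)   NOT NULL,

    -- Addresses limited to one region: VARCHAR, using the code page
    -- implied by the column (or database default) collation.
    StreetAddress VARCHAR(100)   NULL,
    City          VARCHAR(50)    NULL,

    -- Fixed-length ISO codes with a binary collation.
    -- The version-100 collations require SQL Server 2008 or newer;
    -- on SQL Server 2005, Latin1_General_BIN2 is the closest equivalent.
    StateCode     CHAR(2) COLLATE Latin1_General_100_BIN2 NULL,
    CountryCode   CHAR(2) COLLATE Latin1_General_100_BIN2 NULL,

    -- Postal codes are strings (leading zeros matter), not numbers.
    PostalCode    VARCHAR(10) COLLATE Latin1_General_100_BIN2 NULL,

    -- Email addresses and URLs can contain Unicode characters.
    Email         NVARCHAR(320)  NULL,
    HomepageUrl   NVARCHAR(2048) NULL
);
```

One consequence of the binary collation is that comparisons on those columns become case-sensitive, so 'ca' and 'CA' are treated as different values.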
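Fourth: Now that you have NVARCHAR data taking up twice as much space as it needs to for data that fits nicely into VARCHAR ("fits nicely" = doesn't turn into "?"), and somehow, as if by magic, the application did grow and there are now millions of records in at least one of these fields, where most rows are standard ASCII but some contain Unicode characters so you have to keep NVARCHAR, consider the following: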
1. If you are using SQL Server 2008 - 2016 RTM and are on Enterprise Edition, OR if using SQL Server 2016 SP1 (which made Data Compression available in all editions) or newer, then you can enable Data Compression. Data Compression can (but won't "always") compress Unicode data in NCHAR and NVARCHAR fields. The determining factors are:
   1. NCHAR(1 - 4000) and NVARCHAR(1 - 4000) use the Standard Compression Scheme for Unicode, but only starting in SQL Server 2008 R2, AND only for IN ROW data, not OVERFLOW! This appears to be better than the regular ROW / PAGE compression algorithm.
   2. NVARCHAR(MAX) and XML (and I guess also VARBINARY(MAX), TEXT, and NTEXT) data that is IN ROW (not off row in LOB or OVERFLOW pages) can at least be PAGE compressed, but not ROW compressed. Of course, PAGE compression depends on the size of the in-row value: I tested with VARCHAR(MAX) and saw that 6000 character/byte rows would not compress, but 4000 character/byte rows did.
   3. Any OFF ROW data, LOB or OVERFLOW = No Compression For You!
2. If using SQL Server 2005, or 2008 - 2016 RTM and not on Enterprise Edition, you can have two fields: one VARCHAR and one NVARCHAR. For example, let's say you are storing URLs which are mostly all base ASCII characters (values 0 - 127) and hence fit into VARCHAR, but sometimes have Unicode characters. Your schema can include the following 3 fields:

        ...
        URLa VARCHAR(2048) NULL,
        URLu NVARCHAR(2048) NULL,
        URL AS (ISNULL(CONVERT(NVARCHAR(2048), [URLa]), [URLu])),
        CONSTRAINT [CK_TableName_OneUrlMax] CHECK (
            ([URLa] IS NOT NULL OR [URLu] IS NOT NULL)
            AND ([URLa] IS NULL OR [URLu] IS NULL))
        );

   In this model you only SELECT from the [URL] computed column. For inserting and updating, you determine which field to use by seeing if converting alters the incoming value, which has to be of NVARCHAR type:

        INSERT INTO TableName (..., URLa, URLu)
        VALUES (...,
                -- CASE rather than IIF, since IIF requires SQL Server 2012 or newer.
                -- Value survives the round trip through VARCHAR: store it in URLa.
                CASE WHEN CONVERT(VARCHAR(2048), @URL) = @URL  THEN @URL END,
                -- Value is altered by the conversion: store it in URLu.
                CASE WHEN CONVERT(VARCHAR(2048), @URL) <> @URL THEN @URL END
               );

3. You can GZIP incoming values into VARBINARY(MAX) and then unzip on the way out:
   - For SQL Server 2005 - 2014: you can use SQLCLR. SQL# (a SQLCLR library that I wrote) comes with Util_GZip and Util_GUnzip in the Free version.
   - For SQL Server 2016 and newer: you can use the built-in COMPRESS and DECOMPRESS functions, which are also GZip.
4. If using SQL Server 2017 or newer, you can look into making the table a Clustered Columnstore Index.
5. While this is not a viable option yet, SQL Server 2019 introduces native support for UTF-8 in VARCHAR / CHAR datatypes. There are currently too many bugs with it for it to be used, but if they are fixed, then this is an option for some scenarios. Please see my post, "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?", for a detailed analysis of this new feature.
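For the Data Compression and COMPRESS / DECOMPRESS options in the list above, a rough sketch could look like the following. It assumes that the answer's placeholder table `TableName` lives in the `dbo` schema, and the sample URL is invented; treat it as an illustration rather than a tested recipe for any particular system.

```sql
-- Data Compression: estimate the savings before rebuilding anything.
-- (SQL Server 2008+; Enterprise Edition only before 2016 SP1.)
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'TableName',   -- placeholder name from the answer
     @index_id         = NULL,           -- NULL = all indexes
     @partition_number = NULL,           -- NULL = all partitions
     @data_compression = N'PAGE';

-- If the estimate looks worthwhile, rebuild the table with PAGE compression.
ALTER TABLE dbo.TableName REBUILD WITH (DATA_COMPRESSION = PAGE);

-- GZip round trip with the built-in functions (SQL Server 2016+).
DECLARE @url    NVARCHAR(MAX)  = N'https://example.com/some/long/path?q=value';  -- made-up value
DECLARE @zipped VARBINARY(MAX) = COMPRESS(@url);

-- DECOMPRESS returns VARBINARY(MAX), so cast back to the original type.
SELECT DATALENGTH(@url)                          AS original_bytes,
       DATALENGTH(@zipped)                       AS compressed_bytes,
       CAST(DECOMPRESS(@zipped) AS NVARCHAR(MAX)) AS round_tripped;
```

GZip adds a fixed overhead to every value, so short strings can come out larger than they went in. Comparing `original_bytes` with `compressed_bytes` on representative data is a reasonable check before committing to this approach.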
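Since your application is small, there is essentially no appreciable cost increase to using nvarchar over varchar, and you save yourself potential headaches down the road if you ever need to store Unicode data.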
Be consistent! JOINing a VARCHAR to an NVARCHAR has a big performance hit.
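A small sketch of the mismatch being described, using two hypothetical tables (`dbo.Course` and `dbo.Enrollment` are invented for this example):

```sql
-- Hypothetical tables whose join columns have mismatched string types.
CREATE TABLE dbo.Course
(
    CourseCode VARCHAR(10) NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.Enrollment
(
    EnrollmentID INT          NOT NULL PRIMARY KEY,
    CourseCode   NVARCHAR(10) NOT NULL
);

-- NVARCHAR has higher data type precedence, so the VARCHAR column is
-- implicitly converted for every comparison. Depending on the collation,
-- that can prevent an efficient seek on the index over Course.CourseCode.
SELECT e.EnrollmentID
FROM   dbo.Enrollment AS e
JOIN   dbo.Course     AS c
       ON c.CourseCode = e.CourseCode;

-- Making the types match removes the implicit conversion.
ALTER TABLE dbo.Enrollment ALTER COLUMN CourseCode VARCHAR(10) NOT NULL;
```

When a mismatch is in play, the conversion usually shows up in the execution plan as CONVERT_IMPLICIT. The same thing happens when an NVARCHAR parameter or N'...' literal is compared against a VARCHAR column.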
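Always use nvarchar.
You may never need double-byte characters for most applications. However, if you need to support double-byte languages and you only have single-byte support in your database schema, it is really expensive to go back and modify things throughout your application.
The cost of migrating one application from varchar to nvarchar will be much more than the little bit of extra disk space you'll use in most applications.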
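To give a feel for what such a migration involves at the database level, here is a minimal sketch for a single indexed column. The table and index names are hypothetical, and the matching changes to stored procedure parameters and application code are not shown.

```sql
-- Hypothetical legacy table with a single-byte name column and an index on it.
CREATE TABLE dbo.LegacyStudent
(
    StudentID INT         NOT NULL PRIMARY KEY,
    LastName  VARCHAR(50) NOT NULL
);
CREATE INDEX IX_LegacyStudent_LastName ON dbo.LegacyStudent (LastName);

-- Changing the column's type is rejected while an index depends on it,
-- so the index has to be dropped first and recreated afterwards.
DROP INDEX IX_LegacyStudent_LastName ON dbo.LegacyStudent;

ALTER TABLE dbo.LegacyStudent
    ALTER COLUMN LastName NVARCHAR(50) NOT NULL;

CREATE INDEX IX_LegacyStudent_LastName ON dbo.LegacyStudent (LastName);
```

Every affected column needs the same treatment, along with any constraints, computed columns, or variables that reference it, which is where most of the migration effort tends to go.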