我正在开发一个多语言软件。就应用程序代码而言,本地化不是问题。我们可以使用特定于语言的资源,并拥有各种与之配合良好的工具。

但是,定义多语言数据库模式的最佳方法是什么?假设我们有很多表(100个或更多),每个表可以有多个可以本地化的列(大多数nvarchar列应该是可本地化的)。例如,其中一个表可能保存产品信息:

CREATE TABLE T_PRODUCT (
  NAME        NVARCHAR(50),
  DESCRIPTION NTEXT,
  PRICE       NUMBER(18, 2)
)

我可以想到三种方法来支持NAME和DESCRIPTION列中的多语言文本:

Separate column for each language When we add a new language to the system, we must create additional columns to store the translated text, like this: CREATE TABLE T_PRODUCT ( NAME_EN NVARCHAR(50), NAME_DE NVARCHAR(50), NAME_SP NVARCHAR(50), DESCRIPTION_EN NTEXT, DESCRIPTION_DE NTEXT, DESCRIPTION_SP NTEXT, PRICE NUMBER(18,2) ) Translation table with columns for each language Instead of storing translated text, only a foreign key to the translations table is stored. The translations table contains a column for each language. CREATE TABLE T_PRODUCT ( NAME_FK int, DESCRIPTION_FK int, PRICE NUMBER(18, 2) ) CREATE TABLE T_TRANSLATION ( TRANSLATION_ID, TEXT_EN NTEXT, TEXT_DE NTEXT, TEXT_SP NTEXT ) Translation tables with rows for each language Instead of storing translated text, only a foreign key to the translations table is stored. The translations table contains only a key, and a separate table contains a row for each translation to a language. CREATE TABLE T_PRODUCT ( NAME_FK int, DESCRIPTION_FK int, PRICE NUMBER(18, 2) ) CREATE TABLE T_TRANSLATION ( TRANSLATION_ID ) CREATE TABLE T_TRANSLATION_ENTRY ( TRANSLATION_FK, LANGUAGE_FK, TRANSLATED_TEXT NTEXT ) CREATE TABLE T_TRANSLATION_LANGUAGE ( LANGUAGE_ID, LANGUAGE_CODE CHAR(2) )

每种解决方案都有优点和缺点,我想知道您使用这些方法的经验,您有什么建议,以及您将如何设计多语言数据库模式。


当前回答

您需要记住,在创建多语言数据库时,您正在从产品表中删除名称或描述等字段,并将其移动到已翻译的资源中。

翻译后的资源可以是另一个表,就像我的例子一样,它被设计成与SQL视图一起工作,以简化查询和友好地开发底层应用程序

我分离了LabelTranslations,因为这是一个网页上字段标签的全局翻译表。你可以随意称呼它,它们是无状态的,不依赖于特定的产品或类别。

ProductTranslations的CategoryTranslations是有状态的,这意味着“name”的描述符将是实际的产品名称。

与简单的SQL视图相比,使用物化视图可以获得更好的性能(使用存储空间的成本和更多的精力用于底层应用程序开发来刷新它们),或者如果你想要使用更重的SQL视图。

在Postgres中创建类别物化视图:

CREATE MATERIALIZED VIEW VCategories AS (
    SELECT cat.id, lng.iso_639_1_code, ct.descriptor, ct.value
    FROM Categories cat
    JOIN CategoryTranslations ct ON ct.category_id = cat.id
    JOIN Languages lng ON lng.id = ct.language_id
);

查询ID为120的类别的所有翻译

SELECT * FROM VCategories WHERE id = 120 AND iso_639_1_code = 'en'

我觉得在使用应用程序的代码时很方便,您可以编写非常简单的代码来查询翻译和搜索记录

其他回答

这是一个有趣的问题,让我们来看看死灵传说。

让我们从方法一的问题开始: 问题:你为了节省速度而去规范化。 在SQL中(除了带有hstore的PostGreSQL),你不能传递一个参数语言,然后说:

SELECT ['DESCRIPTION_' + @in_language]  FROM T_Products

所以你必须这样做:

SELECT 
    Product_UID 
    ,
    CASE @in_language 
        WHEN 'DE' THEN DESCRIPTION_DE 
        WHEN 'SP' THEN DESCRIPTION_SP 
        ELSE DESCRIPTION_EN 
    END AS Text 
FROM T_Products 

这意味着如果你添加了一种新的语言,你必须改变所有的查询。 这自然会导致使用“动态SQL”,因此您不必更改所有查询。

这通常会导致类似这样的结果(顺便说一下,它不能用于视图或表值函数,如果您实际上需要过滤报告日期,这确实是一个问题)

CREATE PROCEDURE [dbo].[sp_RPT_DATA_BadExample]
     @in_mandant varchar(3) 
    ,@in_language varchar(2) 
    ,@in_building varchar(36) 
    ,@in_wing varchar(36) 
    ,@in_reportingdate varchar(50) 
AS
BEGIN
    DECLARE @sql varchar(MAX), @reportingdate datetime
    
    -- Abrunden des Eingabedatums auf 00:00:00 Uhr
    SET @reportingdate = CONVERT( datetime, @in_reportingdate) 
    SET @reportingdate = CAST(FLOOR(CAST(@reportingdate AS float)) AS datetime)
    SET @in_reportingdate = CONVERT(varchar(50), @reportingdate) 
    
    SET NOCOUNT ON;


    SET @sql='SELECT 
         Building_Nr AS RPT_Building_Number 
        ,Building_Name AS RPT_Building_Name 
        ,FloorType_Lang_' + @in_language + ' AS RPT_FloorType 
        ,Wing_No AS RPT_Wing_Number 
        ,Wing_Name AS RPT_Wing_Name 
        ,Room_No AS RPT_Room_Number 
        ,Room_Name AS RPT_Room_Name 
    FROM V_Whatever 
    WHERE SO_MDT_ID = ''' + @in_mandant + ''' 
    
    AND 
    ( 
        ''' + @in_reportingdate + ''' BETWEEN CAST(FLOOR(CAST(Room_DateFrom AS float)) AS datetime) AND Room_DateTo 
        OR Room_DateFrom IS NULL 
        OR Room_DateTo IS NULL 
    ) 
    '
    
    IF @in_building    <> '00000000-0000-0000-0000-000000000000' SET @sql=@sql + 'AND (Building_UID  = ''' + @in_building + ''') '
    IF @in_wing    <> '00000000-0000-0000-0000-000000000000' SET @sql=@sql + 'AND (Wing_UID  = ''' + @in_wing + ''') '
    
    EXECUTE (@sql) 
    
END


GO

The problem with this is (apart from SQL-injection) a) Date-formatting is very language-specific, so you get a problem there, if you don't input in ISO format (which the average garden-variety programmer usually doesn't do, and in case of a report the user sure as hell won't do for you, even if explicitly instructed to do so). and b) most significantly, you loose any kind of syntax checking. If <insert name of your "favourite" person here> alters the schema because suddenly the requirements for wing change, and a a new table is created, the old one left but the reference field renamed, you don't get any kind of warning. A report even works when you run it without selecting the wing parameter (==> guid.empty). But suddenly, when an actual user actually selects a wing ==> boom. This method completely breaks any kind of testing.


Method 2: In a nutshell: "Great" idea (warning - sarcasm), let's combine the disadvantages of method 3 (slow speed when many entries) with the rather horrible disadvantages of method 1. The only advantage of this method is that you keep all translation in one table, and therefore make maintenance simple. However, the same thing can be achieved with method 1 and a dynamic SQL stored procedure, and a (possibly temporary) table containing the translations, and the name of the target table (and is quite simple assuming you named all your text-fields the same).


Method 3: One table for all translations: Disadvantage: You have to store n Foreign Keys in the products table for n fields you want to translate. Therefore, you have to do n joins for n fields. When the translation table is global, it has many entries, and joins become slow. Also, you always have to join the T_TRANSLATION table n times for n fields. This is quite an overhead. Now, what do you do when you must accommodate custom translations per customer ? You'll have to add another 2x n joins onto an additional table. If you have to join , say 10 tables, with 2x2xn = 4n additional joins, what a mess ! Also, this design makes it possible to use the same translation with 2 tables. If I change the item name in one table, do I really want to change an entry in another table as well EVERY SINGLE TIME ?

另外,你不能再删除和重新插入表,因为现在在产品表中有外键…当然,您可以省略设置fk,然后<insert name of your“favourite”person here>可以删除表,并使用newid()[或通过在插入中指定id,但有identity-insert OFF]重新插入所有条目,这将(并将)导致数据垃圾(和空引用异常)很快。


Method 4 (not listed): Storing all the languages in a XML field in the database. e.g
-- CREATE TABLE MyTable(myfilename nvarchar(100) NULL, filemeta xml NULL )


;WITH CTE AS 
(
      -- INSERT INTO MyTable(myfilename, filemeta) 
      SELECT 
             'test.mp3' AS myfilename 
            --,CONVERT(XML, N'<?xml version="1.0" encoding="utf-16" standalone="yes"?><body>Hello</body>', 2) 
            --,CONVERT(XML, N'<?xml version="1.0" encoding="utf-16" standalone="yes"?><body><de>Hello</de></body>', 2) 
            ,CONVERT(XML
            , N'<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<lang>
      <de>Deutsch</de>
      <fr>Français</fr>
      <it>Ital&amp;iano</it>
      <en>English</en>
</lang>
            ' 
            , 2 
            ) AS filemeta 
) 

SELECT 
       myfilename
      ,filemeta
      --,filemeta.value('body', 'nvarchar') 
      --, filemeta.value('.', 'nvarchar(MAX)') 

      ,filemeta.value('(/lang//de/node())[1]', 'nvarchar(MAX)') AS DE
      ,filemeta.value('(/lang//fr/node())[1]', 'nvarchar(MAX)') AS FR
      ,filemeta.value('(/lang//it/node())[1]', 'nvarchar(MAX)') AS IT
      ,filemeta.value('(/lang//en/node())[1]', 'nvarchar(MAX)') AS EN
FROM CTE 

然后,您可以通过SQL中的XPath-Query获取值,并将字符串变量放入其中

filemeta.value('(/lang//' + @in_language + '/node())[1]', 'nvarchar(MAX)') AS bla

你可以像这样更新这个值:

UPDATE YOUR_TABLE
SET YOUR_XML_FIELD_NAME.modify('replace value of (/lang/de/text())[1] with "&quot;I am a ''value &quot;"')
WHERE id = 1 

/lang/de/…“…/' + @in_language + '/…'

有点像PostGre hstore,除了由于解析XML的开销(而不是从PG hstore中的关联数组中读取条目),它变得非常缓慢,加上XML编码使它太痛苦而没有用。


Method 5 (as recommended by SunWuKung, the one you should choose): One translation table for each "Product" table. That means one row per language, and several "text" fields, so it requires only ONE (left) join on N fields. Then you can easily add a default-field in the "Product"-table, you can easily delete and re-insert the translation table, and you can create a second table for custom-translations (on demand), which you can also delete and re-insert), and you still have all the foreign keys.

让我们举个例子来看看这是有效的:

首先,创建表:

CREATE TABLE dbo.T_Languages
(
     Lang_ID int NOT NULL
    ,Lang_NativeName national character varying(200) NULL
    ,Lang_EnglishName national character varying(200) NULL
    ,Lang_ISO_TwoLetterName character varying(10) NULL
    ,CONSTRAINT PK_T_Languages PRIMARY KEY ( Lang_ID )
);

GO




CREATE TABLE dbo.T_Products
(
     PROD_Id int NOT NULL
    ,PROD_InternalName national character varying(255) NULL
    ,CONSTRAINT PK_T_Products PRIMARY KEY ( PROD_Id )
); 

GO



CREATE TABLE dbo.T_Products_i18n
(
     PROD_i18n_PROD_Id int NOT NULL
    ,PROD_i18n_Lang_Id int NOT NULL
    ,PROD_i18n_Text national character varying(200) NULL
    ,CONSTRAINT PK_T_Products_i18n PRIMARY KEY (PROD_i18n_PROD_Id, PROD_i18n_Lang_Id)
);

GO

-- ALTER TABLE dbo.T_Products_i18n  WITH NOCHECK ADD  CONSTRAINT FK_T_Products_i18n_T_Products FOREIGN KEY(PROD_i18n_PROD_Id)
ALTER TABLE dbo.T_Products_i18n  
    ADD CONSTRAINT FK_T_Products_i18n_T_Products 
    FOREIGN KEY(PROD_i18n_PROD_Id)
    REFERENCES dbo.T_Products (PROD_Id)
ON DELETE CASCADE 
GO

ALTER TABLE dbo.T_Products_i18n CHECK CONSTRAINT FK_T_Products_i18n_T_Products
GO

ALTER TABLE dbo.T_Products_i18n 
    ADD  CONSTRAINT FK_T_Products_i18n_T_Languages 
    FOREIGN KEY( PROD_i18n_Lang_Id )
    REFERENCES dbo.T_Languages( Lang_ID )
ON DELETE CASCADE 
GO

ALTER TABLE dbo.T_Products_i18n CHECK CONSTRAINT FK_T_Products_i18n_T_Products
GO


    
CREATE TABLE dbo.T_Products_i18n_Cust
(
     PROD_i18n_Cust_PROD_Id int NOT NULL
    ,PROD_i18n_Cust_Lang_Id int NOT NULL
    ,PROD_i18n_Cust_Text national character varying(200) NULL
    ,CONSTRAINT PK_T_Products_i18n_Cust PRIMARY KEY ( PROD_i18n_Cust_PROD_Id, PROD_i18n_Cust_Lang_Id )
);

GO

ALTER TABLE dbo.T_Products_i18n_Cust  
    ADD CONSTRAINT FK_T_Products_i18n_Cust_T_Languages 
    FOREIGN KEY(PROD_i18n_Cust_Lang_Id)
    REFERENCES dbo.T_Languages (Lang_ID)

ALTER TABLE dbo.T_Products_i18n_Cust CHECK CONSTRAINT FK_T_Products_i18n_Cust_T_Languages

GO



ALTER TABLE dbo.T_Products_i18n_Cust  
    ADD CONSTRAINT FK_T_Products_i18n_Cust_T_Products 
    FOREIGN KEY(PROD_i18n_Cust_PROD_Id)
REFERENCES dbo.T_Products (PROD_Id)
GO

ALTER TABLE dbo.T_Products_i18n_Cust CHECK CONSTRAINT FK_T_Products_i18n_Cust_T_Products
GO

然后填写数据

DELETE FROM T_Languages;
INSERT INTO T_Languages (Lang_ID, Lang_NativeName, Lang_EnglishName, Lang_ISO_TwoLetterName) VALUES (1, N'English', N'English', N'EN');
INSERT INTO T_Languages (Lang_ID, Lang_NativeName, Lang_EnglishName, Lang_ISO_TwoLetterName) VALUES (2, N'Deutsch', N'German', N'DE');
INSERT INTO T_Languages (Lang_ID, Lang_NativeName, Lang_EnglishName, Lang_ISO_TwoLetterName) VALUES (3, N'Français', N'French', N'FR');
INSERT INTO T_Languages (Lang_ID, Lang_NativeName, Lang_EnglishName, Lang_ISO_TwoLetterName) VALUES (4, N'Italiano', N'Italian', N'IT');
INSERT INTO T_Languages (Lang_ID, Lang_NativeName, Lang_EnglishName, Lang_ISO_TwoLetterName) VALUES (5, N'Russki', N'Russian', N'RU');
INSERT INTO T_Languages (Lang_ID, Lang_NativeName, Lang_EnglishName, Lang_ISO_TwoLetterName) VALUES (6, N'Zhungwen', N'Chinese', N'ZH');

DELETE FROM T_Products;
INSERT INTO T_Products (PROD_Id, PROD_InternalName) VALUES (1, N'Orange Juice');
INSERT INTO T_Products (PROD_Id, PROD_InternalName) VALUES (2, N'Apple Juice');
INSERT INTO T_Products (PROD_Id, PROD_InternalName) VALUES (3, N'Banana Juice');
INSERT INTO T_Products (PROD_Id, PROD_InternalName) VALUES (4, N'Tomato Juice');
INSERT INTO T_Products (PROD_Id, PROD_InternalName) VALUES (5, N'Generic Fruit Juice');

DELETE FROM T_Products_i18n;
INSERT INTO T_Products_i18n (PROD_i18n_PROD_Id, PROD_i18n_Lang_Id, PROD_i18n_Text) VALUES (1, 1, N'Orange Juice');
INSERT INTO T_Products_i18n (PROD_i18n_PROD_Id, PROD_i18n_Lang_Id, PROD_i18n_Text) VALUES (1, 2, N'Orangensaft');
INSERT INTO T_Products_i18n (PROD_i18n_PROD_Id, PROD_i18n_Lang_Id, PROD_i18n_Text) VALUES (1, 3, N'Jus d''Orange');
INSERT INTO T_Products_i18n (PROD_i18n_PROD_Id, PROD_i18n_Lang_Id, PROD_i18n_Text) VALUES (1, 4, N'Succo d''arancia');
INSERT INTO T_Products_i18n (PROD_i18n_PROD_Id, PROD_i18n_Lang_Id, PROD_i18n_Text) VALUES (2, 1, N'Apple Juice');
INSERT INTO T_Products_i18n (PROD_i18n_PROD_Id, PROD_i18n_Lang_Id, PROD_i18n_Text) VALUES (2, 2, N'Apfelsaft');

DELETE FROM T_Products_i18n_Cust;
INSERT INTO T_Products_i18n_Cust (PROD_i18n_Cust_PROD_Id, PROD_i18n_Cust_Lang_Id, PROD_i18n_Cust_Text) VALUES (1, 2, N'Orangäsaft'); -- Swiss German, if you wonder

然后查询数据:

DECLARE @__in_lang_id int
SET @__in_lang_id = (
    SELECT Lang_ID
    FROM T_Languages
    WHERE Lang_ISO_TwoLetterName = 'DE'
)

SELECT 
     PROD_Id 
    ,PROD_InternalName -- Default Fallback field (internal name/one language only setup), just in ResultSet for demo-purposes
    ,PROD_i18n_Text  -- Translation text, just in ResultSet for demo-purposes
    ,PROD_i18n_Cust_Text  -- Custom Translations (e.g. per customer) Just in ResultSet for demo-purposes
    ,COALESCE(PROD_i18n_Cust_Text, PROD_i18n_Text, PROD_InternalName) AS DisplayText -- What we actually want to show 
FROM T_Products 

LEFT JOIN T_Products_i18n 
    ON PROD_i18n_PROD_Id = T_Products.PROD_Id 
    AND PROD_i18n_Lang_Id = @__in_lang_id 
    
LEFT JOIN T_Products_i18n_Cust 
    ON PROD_i18n_Cust_PROD_Id = T_Products.PROD_Id
    AND PROD_i18n_Cust_Lang_Id = @__in_lang_id
    

If you're lazy, then you can also use the ISO-TwoLetterName ('DE', 'EN', etc.) as primary-key of the language table, then you don't have to lookup the language id. But if you do so, you maybe want to use the IETF-language tag instead, which is better, because you get de-CH and de-DE, which is really not the same ortography-wise (double s instead of ß everywhere), although it's the same base-language. That just as a tiny little detail that may be important to you, especially considering that en-US and en-GB/en-CA/en-AU or fr-FR/fr-CA has similar issues. Quote: we don't need it, we only do our software in English. Answer: Yes - but which one ??

无论如何,如果您使用整数ID,那么您是灵活的,并且可以在以后的任何时候更改您的方法。 你应该使用这个整数,因为没有什么比一个拙劣的Db设计更烦人、更具破坏性和更麻烦的了。

参见RFC 5646, ISO 639-2,

而且,如果你仍然说“我们”只申请“一种文化”(就像en-US通常)——因此我不需要额外的整数,这将是一个很好的时间和地点提到IANA语言标签,不是吗? 因为它们是这样的:

de-DE-1901
de-DE-1996

and

de-CH-1901
de-CH-1996

(there was an orthography reform in 1996...) Try finding a word in a dictionary if it is misspelled; this becomes very important in applications dealing with legal and public service portals. More importantly, there are regions that are changing from cyrillic to latin alphabets, which may just be more troublesome than the superficial nuisance of some obscure orthography reform, which is why this might be an important consideration too, depending on which country you live in. One way or the other, it's better to have that integer in there, just in case...

编辑: 并通过添加ON DELETE CASCADE后

REFERENCES dbo.T_Products( PROD_Id )

你可以简单地说:DELETE FROM T_Products,并没有得到外键冲突。

至于整理,我会这样做:

A)拥有自己的DAL B)在语言表中保存所需的排序规则名称

你可能想把排序规则放在它们自己的表中,例如:

SELECT * FROM sys.fn_helpcollations() 
WHERE description LIKE '%insensitive%'
AND name LIKE '%german%' 

C)在auth.user.language信息中提供排序规则名称

D)像这样写SQL:

SELECT 
    COALESCE(GRP_Name_i18n_cust, GRP_Name_i18n, GRP_Name) AS GroupName 
FROM T_Groups 

ORDER BY GroupName COLLATE {#COLLATION}

E)然后,你可以在你的DAL中这样做:

cmd.CommandText = cmd.CommandText.Replace("{#COLLATION}", auth.user.language.collation)

这将为您提供完美组合的SQL-Query

SELECT 
    COALESCE(GRP_Name_i18n_cust, GRP_Name_i18n, GRP_Name) AS GroupName 
FROM T_Groups 

ORDER BY GroupName COLLATE German_PhoneBook_CI_AI

第三种选择是最好的,原因如下:

Doesn't require altering the database schema for new languages (and thus limiting code changes) Doesn't require a lot of space for unimplemented languages or translations of a a particular item Provides the most flexibility You don't end up with sparse tables You don't have to worry about null keys and checking that you're displaying an existing translation instead of some null entry. If you change or expand your database to encompass other translatable items/things/etc you can use the same tables and system - this is very uncoupled from the rest of the data.

亚当

我通常会选择这种方法(不是实际的sql),这对应于您的最后一个选项。

table Product
productid INT PK, price DECIMAL, translationid INT FK

table Translation
translationid INT PK

table TranslationItem
translationitemid INT PK, translationid INT FK, text VARCHAR, languagecode CHAR(2)

view ProductView
select * from Product
inner join Translation
inner join TranslationItem
where languagecode='en'

因为将所有可翻译文本放在一个地方可以使维护工作更加容易。有时翻译外包给翻译机构,这样你就可以给他们发送一个大的导出文件,然后很容易地导入回来。

我同意随机发生器。我不明白你为什么需要一个表“翻译”。

我想,这就足够了:

TA_product: ProductID, ProductPrice
TA_Language: LanguageID, Language
TA_Productname: ProductnameID, ProductID, LanguageID, ProductName

您需要记住,在创建多语言数据库时,您正在从产品表中删除名称或描述等字段,并将其移动到已翻译的资源中。

翻译后的资源可以是另一个表,就像我的例子一样,它被设计成与SQL视图一起工作,以简化查询和友好地开发底层应用程序

我分离了LabelTranslations,因为这是一个网页上字段标签的全局翻译表。你可以随意称呼它,它们是无状态的,不依赖于特定的产品或类别。

ProductTranslations的CategoryTranslations是有状态的,这意味着“name”的描述符将是实际的产品名称。

与简单的SQL视图相比,使用物化视图可以获得更好的性能(使用存储空间的成本和更多的精力用于底层应用程序开发来刷新它们),或者如果你想要使用更重的SQL视图。

在Postgres中创建类别物化视图:

CREATE MATERIALIZED VIEW VCategories AS (
    SELECT cat.id, lng.iso_639_1_code, ct.descriptor, ct.value
    FROM Categories cat
    JOIN CategoryTranslations ct ON ct.category_id = cat.id
    JOIN Languages lng ON lng.id = ct.language_id
);

查询ID为120的类别的所有翻译

SELECT * FROM VCategories WHERE id = 120 AND iso_639_1_code = 'en'

我觉得在使用应用程序的代码时很方便,您可以编写非常简单的代码来查询翻译和搜索记录