I'm looking for help importing a .csv file into SQL Server using BULK INSERT, and I have a few basic questions.

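Issues:

1) The CSV file data may contain , (comma) in between (e.g. in a description), so how can I make the import handle such data?
2) If the client creates the CSV from Excel, the data that contain a comma are enclosed within "..." (double quotes) [as in the example below], so how can the import handle this?
3) How do we track whether some rows have bad data and are skipped by the import? (Does the import skip rows that are not importable?)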

Here is the sample CSV with a header:

Name,Class,Subject,ExamDate,Mark,Description
Prabhat,4,Math,2/10/2013,25,Test data for prabhat.
Murari,5,Science,2/11/2013,24,"Test data for his's test, where we can test 2nd ROW, Test."
sanjay,4,Science,,25,Test Only.

And the SQL statement to import it:

BULK INSERT SchoolsTemp
FROM 'C:\CSVData\Schools.csv'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',  --CSV field delimiter
    ROWTERMINATOR = '\n',   --Use to shift the control to next row
    TABLOCK
)

Current answer

All of the answers here work great if your data is "clean" (no data constraint violations, etc.) and you are able to put the file on the server. Some of the answers provided here stop at the first error (PK violation, data-loss error, etc.) and give you one error at a time if you use SSMS's built-in Import Task. If you want to gather all errors at once (in case you want to tell the person who gave you the .csv file to clean up their data), I recommend the following approach. It also gives you complete flexibility, as you are "writing" the SQL yourself.

Note: I'm going to assume you are running Windows and have access to Excel and SSMS. If not, I'm sure you can tweak this answer to fit your needs.

1) Using Excel, open your .csv file.
2) In an empty column, write a formula that builds an individual INSERT statement for each row, such as =CONCATENATE("INSERT INTO dbo.MyTable (FirstName, LastName) VALUES ('", A1, "', '", B1,"')", CHAR(10), "GO"), where A1 is the cell that holds the first name and B1 is the cell that holds the last name, for example. CHAR(10) adds a newline character to the final result, and GO allows us to run this INSERT and continue to the next one even if there are errors.
3) Highlight the cell with your =CONCATENATE() formula.
4) Press Shift + End to highlight the same column in the rest of your rows.
5) In the ribbon, go to Home > Editing > Fill and click Down. This applies the formula all the way down the sheet so you don't have to copy-paste, drag, etc. down potentially thousands of rows by hand.
6) Press Ctrl + C to copy the formulated SQL INSERT statements.
7) Paste into SSMS.
8) You will notice that Excel, probably unexpectedly, added double quotes around each of your INSERT and GO commands. This is a "feature" (?) of copying multi-line values out of Excel. You can simply find and replace "INSERT and GO" with INSERT and GO respectively to clean that up.
9) Finally, you are ready to run your import process.
10) After the process completes, check the Messages window for any errors. You can select all the content (Ctrl + A), copy it into Excel, and use a column filter to remove the successful messages, which leaves you with all of the errors.
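To illustrate what should end up in SSMS once the stray double quotes have been cleaned up, here is a minimal sketch. The table dbo.MyTable and its two columns come from the formula above; the names in the data are made up. It also shows one thing the steps do not cover, stated here as an assumption about your data: a value that contains a single quote needs that quote doubled, otherwise the statement will not parse. In Excel this could be done by wrapping each cell reference in SUBSTITUTE(A1, "'", "''").

```sql
-- Hypothetical data; dbo.MyTable (FirstName, LastName) is the example table from the formula.
INSERT INTO dbo.MyTable (FirstName, LastName) VALUES ('Prabhat', 'Kumar')
GO
-- A single quote inside a value must be doubled to stay inside the string literal.
INSERT INTO dbo.MyTable (FirstName, LastName) VALUES ('Conan', 'O''Brien')
GO
```

GO is a batch separator understood by SSMS and sqlcmd rather than a T-SQL statement, and it has to sit on a line of its own. Because each INSERT is its own batch, a failing row reports its error in the Messages window and the script moves on to the next batch.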

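This process will definitely take longer than the other answers here, but if your data is "dirty" and full of SQL violations, you can at least gather all the errors in one pass and send them to the person who gave you the data, if that is your scenario.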

Other answers

From "How to import a CSV file into a database using SQL Server Management Studio" (2013-11-05):

First create a table in your database into which you will be importing the CSV file. After the table is created:

1) Log into your database using SQL Server Management Studio.
2) Right-click on your database and select Tasks -> Import Data...
3) Click the Next > button.
4) For the Data Source, select Flat File Source. Then use the Browse button to select the CSV file. Spend some time configuring how you want the data to be imported before clicking on the Next > button.
5) For the Destination, select the correct database provider (e.g. for SQL Server 2012, you can use SQL Server Native Client 11.0). Enter the Server name; check Use SQL Server Authentication, and enter the User name, Password, and Database before clicking on the Next > button.
6) On the Select Source Tables and Views window, you can Edit Mappings before clicking on the Next > button.
7) Check the Run immediately check box and click on the Next > button.
8) Click on the Finish button to run the package.

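2) If the client creates the CSV from Excel, the data that contain a comma are enclosed within "..." (double quotes) [as in the example below], so how can the import handle this?

You should use the FORMAT = 'CSV' and FIELDQUOTE = '"' options (available in SQL Server 2017 and later):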

BULK INSERT SchoolsTemp
FROM 'C:\CSVData\Schools.csv'
WITH
(
    FORMAT = 'CSV', 
    FIELDQUOTE = '"',
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',  --CSV field delimiter
    ROWTERMINATOR = '\n',   --Use to shift the control to next row
    TABLOCK
)

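SQL Server-based CSV import

1) The CSV file data may contain , (comma) in between (e.g. in a description), so how can I make the import handle such data?

Solution

If you use , (comma) as the delimiter, there is no way to differentiate between a comma that is a field terminator and a comma in your data. I would use a different FIELDTERMINATOR, such as ||. That handles commas and single slashes in the data, provided the file itself is exported with || between the fields.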
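A minimal sketch of what that could look like, reusing the table and path from the question. It assumes the file has been re-exported with || between fields instead of commas, which is something you would have to arrange with whoever produces the file.

```sql
-- Assumes Schools.csv was exported with || between fields (not the comma-separated sample above).
BULK INSERT SchoolsTemp
FROM 'C:\CSVData\Schools.csv'
WITH
(
    FIRSTROW = 2,             -- skip the header row
    FIELDTERMINATOR = '||',   -- two-character field delimiter
    ROWTERMINATOR = '\n',     -- one record per line
    TABLOCK
)
```

A multi-character terminator is only safe if that exact sequence never appears inside a field, so pick one that is unlikely to occur in your data.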

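2) If the client creates the CSV from Excel, the data that contain a comma are enclosed within "..." (double quotes) [as in the example below], so how can the import handle this?

Solution

If you use a plain BULK INSERT (without the FORMAT = 'CSV' option, which older versions of SQL Server do not have), there is no way to handle the double quotes; the data will be inserted into the rows with the double quotes included. After inserting the data into the table, you can replace those double quotes with '' (an empty string).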

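-- Strip every double quote from the affected column (substitute your own table and column names)
UPDATE SchoolsTemp
SET columnhavingdoublequotes = REPLACE(columnhavingdoublequotes, '"', '')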

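3) How do we track whether some rows have bad data and are skipped by the import? Does the import skip rows that are not importable?

Solution

To handle rows that are not loaded into the table because of invalid data or format, use the ERRORFILE property: specify an error file name, and the rows that have errors will be written to that file. The code should look like this.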

BULK INSERT SchoolsTemp
    FROM 'C:\CSVData\Schools.csv'
    WITH
    (
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',  --CSV field delimiter
    ROWTERMINATOR = '\n',   --Use to shift the control to next row
    ERRORFILE = 'C:\CSVDATA\SchoolsErrorRows.csv',
    TABLOCK
    )
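One related option worth knowing about is MAXERRORS. By default, BULK INSERT cancels the whole load once more than 10 rows have failed, so a very dirty file may stop before every bad row has been written to the error file. The sketch below raises that limit; the value 1000 is an arbitrary illustration, not a recommendation.

```sql
BULK INSERT SchoolsTemp
FROM 'C:\CSVData\Schools.csv'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    ERRORFILE = 'C:\CSVData\SchoolsErrorRows.csv',  -- must not already exist
    MAXERRORS = 1000,   -- illustrative value: tolerate up to 1000 bad rows before cancelling
    TABLOCK
)
```

The rejected rows are written to the ERRORFILE path as they appeared in the source file, and SQL Server also creates a companion file with the extension .ERROR.txt next to it that describes each error. The statement fails if either file already exists, so delete them between runs.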

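Perhaps not exactly what you're asking for, but another option is to use the CSV Lint plug-in for Notepad++.

The plug-in can validate the CSV data beforehand, meaning it checks for bad data such as missing quotes, incorrect decimal separators, datetime formatting errors, etc. And instead of using BULK INSERT, it can convert the CSV file to an SQL insert script.

The SQL script will contain an INSERT statement for every 1000 CSV rows and will adjust any datetime and decimal values. The plug-in automatically detects the data types in the CSV, and the script will include a CREATE TABLE part with the correct data type for each column.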
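To give an idea of the shape of such a script, here is a hand-written sketch based on the sample data from the question. It is not the plug-in's actual output: the table name SchoolsDemo, the column types and lengths, and the reading of the dates as month/day/year are all assumptions made for illustration.

```sql
-- Hypothetical table; the types are guesses from the three sample rows.
CREATE TABLE SchoolsDemo (
    Name        varchar(50),
    Class       int,
    Subject     varchar(50),
    ExamDate    date,
    Mark        int,
    Description varchar(200)
);

-- One multi-row INSERT per batch; dates rewritten as yyyy-mm-dd, empty field as NULL,
-- and the single quote in the second description doubled.
INSERT INTO SchoolsDemo (Name, Class, Subject, ExamDate, Mark, Description) VALUES
('Prabhat', 4, 'Math', '2013-02-10', 25, 'Test data for prabhat.'),
('Murari', 5, 'Science', '2013-02-11', 24, 'Test data for his''s test, where we can test 2nd ROW, Test.'),
('sanjay', 4, 'Science', NULL, 25, 'Test Only.');
```

Batching in groups of 1000 fits SQL Server's limit of 1000 rows in a single INSERT ... VALUES list.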