Parameterize an SQL IN clause

How do I parameterize a query containing an IN clause with a variable number of arguments, like this one?

SELECT * FROM Tags 
WHERE Name IN ('ruby','rails','scruffy','rubyonrails')
ORDER BY Count DESC

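In this query, the number of arguments could be anywhere from 1 to 5.

I would prefer not to use a dedicated stored procedure for this (or XML), but if there is some elegant way specific to SQL Server 2008, I am open to that.

Current answer

(Edit: if table valued parameters are not available) The best approach seems to be splitting a large number of IN parameters into multiple queries of fixed length, so that you have a number of known SQL statements with fixed parameter counts, no dummy/duplicate values, and no parsing of strings, XML and the like.

Here is some code in C# I wrote on this topic: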

using System.Collections.Generic;
using System.Linq;

public static T[][] SplitSqlValues<T>(IEnumerable<T> values)
{
    /* allowed chunk sizes, largest first; the trailing 1 guarantees any count can be covered */
    var sizes = new int[] { 1000, 500, 250, 125, 63, 32, 16, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
    int processed = 0;
    int currSizeIdx = sizes.Length - 1; /* start with last (smallest) */
    var splitLists = new List<T[]>();

    var valuesDistSort = values.Distinct().ToList(); /* remove redundant */
    valuesDistSort.Sort(); /* needs comparable elements (e.g. int, string) */
    int totalValues = valuesDistSort.Count;

    /* move to the smallest size that holds all values, or stop at the largest size */
    while (totalValues > sizes[currSizeIdx] && currSizeIdx > 0)
        currSizeIdx--; /* bigger size, by array pos. */

    while (processed < totalValues)
    {
        /* shrink until the chunk size fits into what is left */
        while (totalValues - processed < sizes[currSizeIdx])
            currSizeIdx++; /* smaller size, by array pos. */
        var partList = new T[sizes[currSizeIdx]];
        valuesDistSort.CopyTo(processed, partList, 0, sizes[currSizeIdx]);
        splitLists.Add(partList);
        processed += sizes[currSizeIdx];
    }
    return splitLists.ToArray();
}

(You may have further ideas: omit the sorting, or use valuesDistSort.Skip(processed).Take(size[…]) instead of the list/array CopyTo.)

When inserting the parameter variables, you create something like this:

foreach(int[] partList in splitLists)
{
    /* here: question mark for param variable, use named/numbered params if required */
    string sql = "select * from Items where Id in("
        + string.Join(",", partList.Select(p => "?")) 
        + ")"; /* comma separated ?, one for each partList entry */

    /* create command with sql string, set parameters, execute, merge results */
}
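The loop above leaves the "set parameters" step open. A minimal sketch of one way to fill it in, assuming named parameters of the form @p0, @p1, ... (the class InClauseBuilder and its Build method are hypothetical names, not part of the original answer; the table and column names are the placeholders used above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InClauseBuilder
{
    // Builds "select * from Items where Id in (@p0,@p1,...)" together with
    // the name/value pairs to bind. Hypothetical helper for illustration.
    public static (string Sql, List<KeyValuePair<string, object>> Parameters)
        Build<T>(IReadOnlyList<T> partList)
    {
        if (partList == null || partList.Count == 0)
            throw new ArgumentException("At least one value is required.", nameof(partList));

        // One parameter name per value: @p0, @p1, ...
        var names = Enumerable.Range(0, partList.Count)
                              .Select(i => "@p" + i)
                              .ToList();

        string sql = "select * from Items where Id in ("
            + string.Join(",", names)
            + ")";

        // Pair each parameter name with the value at the same position.
        var parameters = names
            .Select((name, i) => new KeyValuePair<string, object>(name, partList[i]))
            .ToList();

        return (sql, parameters);
    }
}
```

For a part list of three ids this yields the text "select * from Items where Id in (@p0,@p1,@p2)"; each returned pair can then be bound on the command, for example with cmd.Parameters.AddWithValue(pair.Key, pair.Value). Because the SQL text depends only on the number of values, every part list of the same length produces the same statement string.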

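I have watched the SQL generated by the NHibernate object-relational mapper (when querying data to create objects from it), and that looks best with multiple queries. In NHibernate, one can specify a batch size; if many object data rows have to be fetched, it tries to retrieve a number of rows equal to the batch size with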

SELECT * FROM MyTable WHERE Id IN (@p1, @p2, @p3, ... , @p[batch-size])

instead of sending hundreds or thousands of

SELECT * FROM MyTable WHERE Id=@id

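When the remaining IDs are fewer than the batch size, but still more than one, it splits into smaller statements that still have a fixed, known length.

If you have a batch size of 100 and a query with 118 parameters, it would create 3 queries:

one with 100 parameters (the batch size), then one with 12, and another one with 6,

but none with 118 or 18. This way, it restricts the possible SQL statements to a set of likely known statements, preventing too many different query plans, which would fill the cache and for the most part never get reused. The code above does the same, but with lengths 1000, 500, 250, 125, 63, 32, 16, 10 down to 1. Parameter lists with more than 1000 elements are also split, preventing database errors caused by size limits.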
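To make the arithmetic concrete, here is a small sketch that only computes the chunk lengths for a given total, using the same greedy rule as the code above (always take the largest allowed size that still fits). ChunkPlanner and ChunkSizes are hypothetical names. The second size list in the usage note is an assumption chosen to reproduce the 100/12/6 example; it is not claimed to be NHibernate's actual size list.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // Greedy split: repeatedly take the largest allowed size that still
    // fits into the remaining count. 'sizes' must be sorted in descending
    // order and end with 1 so that every count can be covered.
    public static List<int> ChunkSizes(int total, int[] sizes)
    {
        if (total < 0)
            throw new ArgumentOutOfRangeException(nameof(total));
        if (sizes == null || sizes.Length == 0 || sizes[sizes.Length - 1] != 1)
            throw new ArgumentException("Sizes must be descending and end with 1.", nameof(sizes));

        var result = new List<int>();
        int remaining = total;
        int idx = 0;
        while (remaining > 0)
        {
            // Skip sizes that are larger than what is left.
            while (sizes[idx] > remaining)
                idx++;
            result.Add(sizes[idx]);
            remaining -= sizes[idx];
        }
        return result;
    }
}
```

With the size list from the code above (1000, 500, 250, 125, 63, 32, 16, 10, 9, ..., 1), 118 values are split into chunks of 63, 32, 16 and 7. With an assumed list of 100, 50, 25, 12, 10, 9, ..., 1, the same rule gives 100, 12 and 6, matching the example.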

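Anyway, it is best to have a database interface that sends parameterized SQL directly, without a separate Prepare statement and a handle to call. Databases like SQL Server and Oracle remember SQL by string equality (the values change, the bind parameters in the SQL text do not!) and reuse query plans, if available. No need for separate prepare statements or tedious maintenance of query handles in code! ADO.NET works like this, but it seems Java still uses prepare/execute by handle (not sure).

I had my own question on this topic, originally suggesting to fill the IN clause with duplicates, but then preferring the NHibernate-style statement split: Parameterized SQL - IN / NOT IN with a fixed number of parameters, for query plan cache optimization?

This question is still interesting, even more than 5 years after it was asked...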

EDIT: I noted that IN queries with many values (like 250 or more) still tend to be slow, in the given case, on SQL Server. While I expected the DB to create a kind of temporary table internally and join against it, it seemed to only repeat the single-value SELECT expression n times. Time was up to about 200 ms per query, even worse than joining the original ID retrieval SELECT against the other, related tables. Also, there were some 10 to 15 CPU units in SQL Server Profiler, which is unusual for repeated execution of the same parameterized queries and suggests that new query plans were created on repeated calls. Maybe ad-hoc style individual queries are not worse at all. I would have to compare these queries to non-split queries with changing sizes for a final conclusion, but for now, it seems like long IN clauses should be avoided anyway.

2014-01-12 00:32:38

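Other answers

I have an answer that does not require a UDF or XML, because IN accepts a SELECT statement, e.g. SELECT * FROM Test WHERE Data IN (SELECT Value FROM TABLE)

You really only need a way to convert the string into a table.

This can be done with a recursive CTE, or with a query against a numbers table (or master..spt_values).

Here is the CTE version.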

DECLARE @InputString varchar(8000) = 'ruby,rails,scruffy,rubyonrails'

SELECT @InputString = @InputString + ','

;WITH RecursiveCSV(x,y) 
AS 
(
    SELECT 
        x = SUBSTRING(@InputString,0,CHARINDEX(',',@InputString,0)),
        y = SUBSTRING(@InputString,CHARINDEX(',',@InputString,0)+1,LEN(@InputString))
    UNION ALL
    SELECT 
        x = SUBSTRING(y,0,CHARINDEX(',',y,0)),
        y = SUBSTRING(y,CHARINDEX(',',y,0)+1,LEN(y))
    FROM 
        RecursiveCSV 
    WHERE
        SUBSTRING(y,CHARINDEX(',',y,0)+1,LEN(y)) <> '' OR 
        SUBSTRING(y,0,CHARINDEX(',',y,0)) <> ''
)
SELECT
    * 
FROM 
    Tags
WHERE 
    Name IN (select x FROM RecursiveCSV)
OPTION (MAXRECURSION 32767);

2011-05-13 15:03:27

Here is a quick-and-dirty technique I have used:

SELECT * FROM Tags
WHERE '|ruby|rails|scruffy|rubyonrails|'
LIKE '%|' + Name + '|%'

Here is the C# code:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
const string cmdText = "select * from tags where '|' + @tags + '|' like '%|' + Name + '|%'";

using (SqlCommand cmd = new SqlCommand(cmdText)) {
   cmd.Parameters.AddWithValue("@tags", string.Join("|", tags));
}

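Two caveats:

The performance is terrible. LIKE "%...%" queries are not indexed.
Make sure you do not have any |, blank, or null tags, or this will not work.

There are other ways to do this that some people may consider cleaner, so please keep reading.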
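The caveat about |, blank, or null tags can be checked before the list is joined. A minimal sketch, assuming the caller prefers an exception over silently wrong results (TagListBuilder and JoinTags are hypothetical names, not part of the original answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagListBuilder
{
    // Joins tags with '|' for use as the @tags parameter value, rejecting
    // values that would break the delimiter-based LIKE match.
    public static string JoinTags(IEnumerable<string> tags)
    {
        if (tags == null)
            throw new ArgumentNullException(nameof(tags));

        var list = tags.ToList();
        foreach (var tag in list)
        {
            if (string.IsNullOrWhiteSpace(tag))
                throw new ArgumentException("Tags must not be null or blank.", nameof(tags));
            if (tag.Contains("|"))
                throw new ArgumentException("Tags must not contain '|'.", nameof(tags));
        }
        return string.Join("|", list);
    }
}
```

The result can be passed in place of the plain string.Join call, for example cmd.Parameters.AddWithValue("@tags", TagListBuilder.JoinTags(tags)).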

2008-12-03 16:41:17

If you are calling from .NET, you could use Dapper dot net:

string[] names = new string[] {"ruby","rails","scruffy","rubyonrails"};
var tags = dataContext.Query<Tags>(@"
select * from Tags 
where Name in @names
order by Count desc", new {names});

Here Dapper does the thinking, so you do not have to. Something similar is possible with LINQ to SQL, of course:

string[] names = new string[] {"ruby","rails","scruffy","rubyonrails"};
var tags = from tag in dataContext.Tags
           where names.Contains(tag.Name)
           orderby tag.Count descending
           select tag;

2011-06-15 11:04:06

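This is possibly a half nasty way of doing it; I used it once and it was rather effective.

Depending on your goals, it might be of use.

Create a temp table with one column. INSERT each look-up value into that column. Instead of using an IN, you can then just use your standard JOIN rules. (Flexibility++)

This gives you a bit of added flexibility in what you can do, but it is better suited to situations where you have a large table to query, with good indexing, and you want to use the parameterized list more than once. It saves having to execute it twice and doing all the sanitization manually.

I never got around to profiling exactly how fast it was, but in my situation it was needed.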
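As a rough sketch of the steps described above, the following helper only generates the batch text and the values to bind; it does not talk to a database. Everything specific in it is an assumption for illustration: the names TempTableJoinBuilder and #Lookup, the nvarchar(100) column type, and the Tags/Name/Count names borrowed from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class TempTableJoinBuilder
{
    // Generates one batch: create a temp table, insert each look-up value
    // through its own parameter, join against it, then drop the table.
    public static (string Sql, List<KeyValuePair<string, object>> Parameters)
        Build(IEnumerable<string> lookupValues)
    {
        if (lookupValues == null)
            throw new ArgumentNullException(nameof(lookupValues));

        // Duplicates would multiply the joined rows, so remove them first.
        var values = lookupValues.Distinct().ToList();
        var parameters = new List<KeyValuePair<string, object>>();
        var sql = new StringBuilder();

        // Assumed column type; adjust to the real column definition.
        sql.AppendLine("CREATE TABLE #Lookup (Name nvarchar(100) NOT NULL);");
        for (int i = 0; i < values.Count; i++)
        {
            string name = "@p" + i;
            sql.AppendLine("INSERT INTO #Lookup (Name) VALUES (" + name + ");");
            parameters.Add(new KeyValuePair<string, object>(name, values[i]));
        }
        sql.AppendLine("SELECT t.* FROM Tags t JOIN #Lookup l ON l.Name = t.Name ORDER BY t.Count DESC;");
        sql.AppendLine("DROP TABLE #Lookup;");

        return (sql.ToString(), parameters);
    }
}
```

This variant still uses one parameter per value, so the batch text changes with the number of values. If the list is reused across several queries, it may be preferable to fill the temp table once on an open connection and run the JOIN queries separately.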

2008-12-03 17:04:00

I think this is a case when a static query is just not the way to go. Dynamically build the list for your in clause, escape your single quotes, and dynamically build SQL. In this case you probably won't see much of a difference with any method due to the small list, but the most efficient method really is to send the SQL exactly as it is written in your post. I think it is a good habit to write it the most efficient way, rather than to do what makes the prettiest code, or consider it bad practice to dynamically build SQL.

I have seen the split functions take longer to execute than the queries themselves in many cases where the parameters get large. A stored procedure with table valued parameters in SQL 2008 is the only other option I would consider, although this will probably be slower in your case. TVP will probably only be faster for large lists if you are searching on the primary key of the TVP, because SQL will build a temporary table for the list anyway (if the list is large). You won't know for sure unless you test it.

I have also seen stored procedures that had 500 parameters with default values of null, and having WHERE Column1 IN (@Param1, @Param2, @Param3, ..., @Param500). This caused SQL to build a temp table, do a sort/distinct, and then do a table scan instead of an index seek. That is essentially what you would be doing by parameterizing that query, although on a small enough scale that it won't make a noticeable difference. I highly recommend against having NULL in your IN lists, as if that gets changed to a NOT IN it will not act as intended. You could dynamically build the parameter list, but the only obvious thing that you would gain is that the objects would escape the single quotes for you. That approach is also slightly slower on the application end since the objects have to parse the query to find the parameters. It may or may not be faster on SQL, as parameterized queries call sp_prepare, sp_execute for as many times you execute the query, followed by sp_unprepare.

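Reusing execution plans for stored procedures or parameterized queries may give you a performance gain, but it will lock you into the one execution plan determined by the first query that is executed. In many cases that may be less than ideal for subsequent queries. In your case, reuse of execution plans will probably be a plus, but it might not make any difference at all, as the example is a really simple query.

Cliff notes:

For your case, anything you do, be it parameterizing with a fixed number of items in the list (null if not used), dynamically building the query with or without parameters, or using stored procedures with table valued parameters, will not make much of a difference. However, my general recommendations are as follows:

Your case / simple queries with few parameters:

Dynamic SQL, maybe with parameters if testing shows better performance.

Queries with reusable execution plans, called multiple times by simply changing the parameters, or if the query is complicated:

SQL with dynamic parameters.

Queries with large lists:

A stored procedure with table valued parameters. If the list can vary by a large amount, use WITH RECOMPILE on the stored procedure, or simply use dynamic SQL without parameters to generate a new execution plan for each query.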

2010-06-09 20:28:50
