我们所有使用关系数据库的人都知道(或正在学习)SQL是不同的。获得期望的结果,并有效地这样做,涉及到一个乏味的过程,其部分特征是学习不熟悉的范例,并发现一些我们最熟悉的编程模式在这里不起作用。常见的反模式是什么?
以下是我的前3名。
1号。指定字段列表失败。(编辑:为了防止混淆:这是一个生产代码规则。它不适用于一次性分析脚本——除非我是作者。)
SELECT *
Insert Into blah SELECT *
应该是
SELECT fieldlist
Insert Into blah (fieldlist) SELECT fieldlist
2号。使用游标和while循环,当while循环和循环变量就可以了。
DECLARE @LoopVar int
SET @LoopVar = (SELECT MIN(TheKey) FROM TheTable)
WHILE @LoopVar is not null
BEGIN
-- Do Stuff with current value of @LoopVar
...
--Ok, done, now get the next value
SET @LoopVar = (SELECT MIN(TheKey) FROM TheTable
WHERE @LoopVar < TheKey)
END
3号。DateLogic通过字符串类型。
--Trim the time
Convert(Convert(theDate, varchar(10), 121), datetime)
应该是
--Trim the time
DateAdd(dd, DateDiff(dd, 0, theDate), 0)
我最近看到了一个高峰“一个问题总比两个好,对吧?”
SELECT *
FROM blah
WHERE (blah.Name = @name OR @name is null)
AND (blah.Purpose = @Purpose OR @Purpose is null)
这个查询需要两个或三个不同的执行计划,具体取决于参数的值。对于这个SQL文本,只生成一个执行计划并保存在缓存中。无论参数的值是多少,都将使用该计划。这会导致间歇性的性能不佳。最好编写两个查询(每个预期的执行计划一个查询)。
1)我不知道这是否是一个“官方的”反模式,但我不喜欢并试图避免在数据库列中使用字符串文字作为魔法值。
MediaWiki表'image'中的一个例子:
img_media_type ENUM("UNKNOWN", "BITMAP", "DRAWING", "AUDIO", "VIDEO",
"MULTIMEDIA", "OFFICE", "TEXT", "EXECUTABLE", "ARCHIVE") default NULL,
img_major_mime ENUM("unknown", "application", "audio", "image", "text",
"video", "message", "model", "multipart") NOT NULL default "unknown",
(我只是注意到不同的大小写,另一个要避免的事情)
我设计了这样的情况,int查找表ImageMediaType和ImageMajorMime与int主键。
2)日期/字符串转换,依赖于特定的NLS设置
CONVERT(NVARCHAR, GETDATE())
没有格式标识符
Human readable password fields, egad. Self explanatory. Using LIKE against indexed columns, and I'm almost tempted to just say LIKE in general. Recycling SQL-generated PK values. Surprise nobody mentioned the god-table yet. Nothing says "organic" like 100 columns of bit flags, large strings and integers. Then there's the "I miss .ini files" pattern: storing CSVs, pipe delimited strings or other parse required data in large text fields. And for MS SQL server the use of cursors at all. There's a better way to do any given cursor task.
编辑是因为有太多了!
我最担心的是450列的访问表,这些表是由总经理最好的朋友狗美容师的8岁儿子整理的,还有那个不可靠的查找表,它之所以存在,是因为有人不知道如何正确地规范化数据结构。
通常,这个查找表是这样的:
ID INT, Name NVARCHAR(132), IntValue1 INT, IntValue2 INT, CharValue1 NVARCHAR(255), CharValue2 NVARCHAR(255), Date1 DATETIME, Date2 DATETIME
我已经记不清有多少客户的系统依赖于这种可恶的东西了。
我一直对大多数程序员倾向于在数据访问层混合他们的ui逻辑感到失望:
SELECT
FirstName + ' ' + LastName as "Full Name",
case UserRole
when 2 then "Admin"
when 1 then "Moderator"
else "User"
end as "User's Role",
case SignedIn
when 0 then "Logged in"
else "Logged out"
end as "User signed in?",
Convert(varchar(100), LastSignOn, 101) as "Last Sign On",
DateDiff('d', LastSignOn, getDate()) as "Days since last sign on",
AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' +
City + ', ' + State + ' ' + Zip as "Address",
'XXX-XX-' + Substring(
Convert(varchar(9), SSN), 6, 4) as "Social Security #"
FROM Users
通常,程序员这样做是因为他们想要将数据集直接绑定到一个网格上,而且在服务器端使用SQL Server格式比在客户端使用SQL Server格式更方便。
像上面所示的查询是非常脆弱的,因为它们将数据层与UI层紧密耦合在一起。最重要的是,这种编程风格彻底阻止了存储过程的可重用性。
select some_column, ...
from some_table
group by some_column
假设结果将按some_column排序。我在Sybase上看到过这种情况,其中假设成立(目前)。
使用SQL作为美化的ISAM(索引顺序访问方法)包。特别是嵌套游标,而不是将SQL语句组合成一个更大的语句。这也算“滥用优化器”,因为实际上优化器能做的不多。这可以与非准备语句结合使用,以获得最大的效率:
DECLARE c1 CURSOR FOR SELECT Col1, Col2, Col3 FROM Table1
FOREACH c1 INTO a.col1, a.col2, a.col3
DECLARE c2 CURSOR FOR
SELECT Item1, Item2, Item3
FROM Table2
WHERE Table2.Item1 = a.col2
FOREACH c2 INTO b.item1, b.item2, b.item3
...process data from records a and b...
END FOREACH
END FOREACH
正确的解决方案(几乎总是)是将两个SELECT语句合并为一个:
DECLARE c1 CURSOR FOR
SELECT Col1, Col2, Col3, Item1, Item2, Item3
FROM Table1, Table2
WHERE Table2.Item1 = Table1.Col2
-- ORDER BY Table1.Col1, Table2.Item1
FOREACH c1 INTO a.col1, a.col2, a.col3, b.item1, b.item2, b.item3
...process data from records a and b...
END FOREACH
双循环版本的唯一优点是,您可以很容易地发现表1中值之间的中断,因为内部循环结束了。这可能是控制中断报告中的一个因素。
此外,应用程序中的排序通常是不允许的。
我最不喜欢的是
Using spaces when creating tables, sprocs etc. I'm fine with CamelCase or under_scores and singular or plurals and UPPERCASE or lowercase but having to refer to a table or column [with spaces], especially if [ it is oddly spaced] (yes, I've run into this) really irritates me. Denormalized data. A table doesn't have to be perfectly normalized, but when I run into a table of employees that has information about their current evaluation score or their primary anything, it tells me that I will probably need to make a separate table at some point and then try to keep them synced. I will normalize the data first and then if I see a place where denormalization helps, I'll consider it. Overuse of either views or cursors. Views have a purpose, but when each table is wrapped in a view it's too much. I've had to use cursors a few times, but generally you can use other mechanisms for this. Access. Can a program be an anti-pattern? We have SQL Server at my work, but a number of people use access due to it's availabilty, "ease of use" and "friendliness" to non-technical users. There is too much here to go into, but if you've been in a similar environment, you know.
FROM TableA, TableB WHERE语法用于连接而不是FROM TableA内部连接TableB上 假设查询将以某种方式返回,而不放入ORDER BY子句,因为这是在查询工具中测试时显示的方式。
var query = "select COUNT(*) from Users where UserName = '"
+ tbUser.Text
+ "' and Password = '"
+ tbPassword.Text +"'";
盲目相信用户输入 不使用参数化查询 明文密码
我需要把我自己目前最喜欢的放在这里,只是为了使列表完整。我最喜欢的反模式是不测试您的查询。
这适用于以下情况:
您的查询涉及多个表。 您认为您有一个查询的最优设计,但不需要测试您的假设。 您接受第一个有效的查询,不知道它是否接近优化。
任何针对非典型或不充分数据进行的测试都不算数。如果它是一个存储过程,将测试语句放入注释中并保存它,并保存结果。否则,将其与结果一起放入代码中的注释中。
我发现,在性能方面,有两点是最重要的,并且可能会有很大的成本:
使用游标而不是基于集合 表达式。我想当程序员以过程的方式思考时,这种情况经常发生。 使用相关子查询,当a 连接到派生表可以执行 的工作。
The Altered View - A view that is altered too often and without notice or reason. The change will either be noticed at the most inappropriate time or worse be wrong and never noticed. Maybe your application will break because someone thought of a better name for that column. As a rule views should extend the usefulness of base tables while maintaining a contract with consumers. Fix problems but don't add features or worse change behavior, for that create a new view. To mitigate do not share views with other projects and, use CTEs when platforms allow. If your shop has a DBA you probably can't change views but all your views will be outdated and or useless in that case. The !Paramed - Can a query have more than one purpose? Probably but the next person who reads it won't know until deep meditation. Even if you don't need them right now chances are you will, even if it's "just" to debug. Adding parameters lowers maintenance time and keep things DRY. If you have a where clause you should have parameters. The case for no CASE - SELECT CASE @problem WHEN 'Need to replace column A with this medium to large collection of strings hanging out in my code.' THEN 'Create a table for lookup and add to your from clause.' WHEN 'Scrubbing values in the result set based on some business rules.' THEN 'Fix the data in the database' WHEN 'Formating dates or numbers.' THEN 'Apply formating in the presentation layer.' WHEN 'Createing a cross tab' THEN 'Good, but in reporting you should probably be using cross tab, matrix or pivot templates' ELSE 'You probably found another case for no CASE but now I have to edit my code instead of enriching the data...' END
反向观点:过度痴迷于正常化。
大多数SQL/ rbdb系统提供了许多非常有用的特性(事务、复制),即使对于非标准化的数据也是如此。磁盘空间很便宜,有时操作/过滤/搜索获取的数据比编写1NF模式更简单(更容易的代码,更快的开发时间),并处理其中的所有麻烦(复杂的连接,讨厌的子选择等)。
我发现过度标准化的系统通常是不成熟的优化,特别是在开发的早期阶段。
(再想想……http://writeonly.wordpress.com/2008/12/05/simple-object-db-using-json-and-python-sqlite/)
也许这不是一个反模式,但它惹恼了我,当某些数据库的DBA(好吧,我在这里说的是Oracle)用Oracle风格和代码约定编写SQL Server代码,当它运行如此糟糕时抱怨。受够了游标Oracle的人!SQL是基于设置的。
像这样将冗余表连接到查询中:
select emp.empno, dept.deptno
from emp
join dept on dept.deptno = emp.deptno;
我只是把这个放在一起,基于一些SQL响应这里在SO。
认为触发器之于数据库就像事件处理程序之于OOP是一种严重的反模式。有一种看法是,任何旧的逻辑都可以放入触发器中,当一个事务(事件)在表上发生时被触发。
不正确的。最大的区别之一是触发器是同步的——而且是完全同步的,因为它们在集合操作上是同步的,而不是在行操作上。在OOP方面,正好相反——事件是实现异步事务的有效方法。
使用@@IDENTITY代替SCOPE_IDENTITY()
引自以下回答:
@@IDENTITY returns the last identity value generated for any table in the current session, across all scopes. You need to be careful here, since it's across scopes. You could get a value from a trigger, instead of your current statement. SCOPE_IDENTITY returns the last identity value generated for any table in the current session and the current scope. Generally what you want to use. IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope. This lets you specify which table you want the value from, in case the two above aren't quite what you need (very rare). You could use this if you want to get the current IDENTITY value for a table that you have not inserted a record into.
SELECT FirstName + ' ' + LastName as "Full Name", case UserRole when 2 then "Admin" when 1 then "Moderator" else "User" end as "User's Role", case SignedIn when 0 then "Logged in" else "Logged out" end as "User signed in?", Convert(varchar(100), LastSignOn, 101) as "Last Sign On", DateDiff('d', LastSignOn, getDate()) as "Days since last sign on", AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' + City + ', ' + State + ' ' + Zip as "Address", 'XXX-XX-' + Substring(Convert(varchar(9), SSN), 6, 4) as "Social Security #" FROM Users
或者,把所有内容都塞进一行。
临时表滥用。
特别是这类事情:
SELECT personid, firstname, lastname, age
INTO #tmpPeople
FROM People
WHERE lastname like 's%'
DELETE FROM #tmpPeople
WHERE firstname = 'John'
DELETE FROM #tmpPeople
WHERE firstname = 'Jon'
DELETE FROM #tmpPeople
WHERE age > 35
UPDATE People
SET firstname = 'Fred'
WHERE personid IN (SELECT personid from #tmpPeople)
不要从查询中构建临时表,只是为了删除不需要的行。
是的,我在生产db中看到过这种形式的代码页。
有一张桌子
code_1
value_1
code_2
value_2
...
code_10
value_10
而不是有3个表
Code, value和code_value
你永远不知道什么时候你可能需要10对以上的代码,价值。
如果只需要一对,就不会浪费磁盘空间。
在他们职业生涯的前6个月学习SQL,在接下来的10年里从不学习其他任何东西。特别是没有学习或有效地使用窗口/分析SQL特性。特别是over()和partition by的使用。
窗口函数,比如聚合 函数时,对对象进行聚合 定义的行集(组),但是 而不是返回一个值 组,窗口函数可以返回 每个组有多个值。
请参阅O'Reilly SQL Cookbook附录A,以获得窗口函数的良好概述。
没有使用With子句或适当的连接并依赖子查询。
反模式:
select
...
from data
where RECORD.STATE IN (
SELECT STATEID
FROM STATE
WHERE NAME IN
('Published to test',
'Approved for public',
'Published to public',
'Archived'
))
好: 我喜欢使用with子句使我的意图更易于阅读。
with valid_states as (
SELECT STATEID
FROM STATE
WHERE NAME IN
('Published to test',
'Approved for public',
'Published to public',
'Archived'
)
select ... from data, valid_states
where data.state = valid_states.state
最好的:
select
...
from data join states using (state)
where
states.state in ('Published to test',
'Approved for public',
'Published to public',
'Archived'
)
回复:使用@@IDENTITY代替SCOPE_IDENTITY()
两者都不能用;使用输出代替
参见https://connect.microsoft.com/SQLServer/feedback/details/328811/scope-identity-sometimes-returns-incorrect-value
我最喜欢的SQL反模式:
对非唯一列进行JOIN,并使用SELECT DISTINCT修剪结果。
创建连接多个表的视图,只是为了从一个表中选择少数列。
CREATE VIEW my_view AS
SELECT * FROM table1
JOIN table2 ON (...)
JOIN table3 ON (...);
SELECT col1, col2 FROM my_view WHERE col3 = 123;
我看到视图定义是这样的:
CREATE OR REPLACE FORCE VIEW PRICE (PART_NUMBER, PRICE_LIST, LIST_VERSION ...)
AS
SELECT sp.MKT_PART_NUMBER,
sp.PRICE_LIST,
sp.LIST_VERSION,
sp.MIN_PRICE,
sp.UNIT_PRICE,
sp.MAX_PRICE,
...
视图中大约有50个列。有些开发人员以不提供列别名而折磨他人为傲,因此必须计算两个位置的列偏移量,以便能够找出视图中对应的列。
应用程序连接 不仅仅是一个SQL问题,而是在寻找问题的描述和发现这个问题时,我很惊讶它没有被列出。
正如我所听到的那样,应用程序连接是指从两个或多个表中取出一组行,然后用一对嵌套的for循环将它们连接到(Java)代码中。这给系统(应用程序和数据库)带来了负担,必须识别整个叉乘,检索它并将其发送给应用程序。假设应用程序可以像数据库一样快地过滤叉乘(不确定),只是更快地削减结果集意味着更少的数据传输。
编写查询的开发人员没有很好地了解SQL应用程序(包括单个查询和多用户系统)的快慢。这包括对以下方面的无知:
physical I/O minimization strategies, given that most queries' bottleneck is I/O not CPU perf impact of different kinds of physical storage access (e.g. lots of sequential I/O will be faster than lots of small random I/O, although less so if your physical storage is an SSD!) how to hand-tune a query if the DBMS produces a poor query plan how to diagnose poor database performance, how to "debug" a slow query, and how to read a query plan (or EXPLAIN, depending on your DBMS of choice) locking strategies to optimize throughput and avoid deadlocks in multi-user applications importance of batching and other tricks to handle processing of data sets table and index design to best balance space and performance (e.g. covering indexes, keeping indexes small where possible, reducing data types to minimum size needed, etc.)
推荐文章
- 在SQL server查询中将NULL替换为0
- 在SQL中修改表的模式名
- 如何在SQL Server 2005的一条语句中更新两个表?
- 如何创建临时表与SELECT * INTO tempTable从CTE查询
- 用于查找计数为>的记录的SQL查询
- “从Table1左连接Table2”和“从Table2右连接Table1”可以互换吗?
- 在SQL Server的选择语句中使用带TOP的变量,而不是动态的
- 自然连接和内部连接的区别
- MySQL现在()+1天
- 在SQL中转换月号到月名函数
- 改变一个varchar列的最大长度?
- 外键约束:何时使用ON UPDATE和ON DELETE
- 暂时关闭约束(MS SQL)
- WHERE子句中的IF子句
- Postgresql列表和排序表的大小