Why is SELECT * bad practice? Wouldn't it mean less code to change if you added a new column you wanted?

I understand that SELECT COUNT(*) is a performance problem on some databases, but what if you really wanted every column?


Answer

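If you add fields to the table, they are automatically included in every query that uses SELECT *. This may seem convenient, but it makes your application slower because you are fetching more data than you need, and at some point it can actually make your application crash.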

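There is a limit to how much data can be fetched in each row of a result. If adding fields to a table pushes a result past that limit, you will get an error when you try to run the query.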

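This is the kind of error that is hard to track down. You make a change in one place, and the failure shows up in another place that doesn't even use the new data. It may even be a rarely used query, so it can be a while before anyone runs it, which makes it even harder to connect the error to the change.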

If you specify which fields you want in the result, you won't get this kind of overflow.
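A minimal sketch of the silent widening described above, using Python's built-in sqlite3 module with an in-memory database as a stand-in (the products table and photo column are hypothetical). SQLite does not enforce the kind of per-row result limit this answer warns about, so the sketch only shows that SELECT * starts returning the new column and its payload, while the explicit column list stays the same.

```python
import sqlite3

# In-memory database with a hypothetical "products" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products (name) VALUES ('widget')")


def star_columns():
    """Column names that SELECT * returns right now."""
    cur = conn.execute("SELECT * FROM products")
    return [d[0] for d in cur.description]


def explicit_columns():
    """Column names that an explicit column list returns."""
    cur = conn.execute("SELECT id, name FROM products")
    return [d[0] for d in cur.description]


before = star_columns()

# Someone else adds a large column that the original query never needed.
conn.execute("ALTER TABLE products ADD COLUMN photo BLOB")
conn.execute("UPDATE products SET photo = ?", (b"\x00" * 1_000_000,))

after = star_columns()
star_row = conn.execute("SELECT * FROM products").fetchone()
explicit_row = conn.execute("SELECT id, name FROM products").fetchone()

# The SELECT * row now carries a 1 MB blob the caller never asked for.
print(before, after, explicit_columns(), len(star_row[2]))
```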

Other answers

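I don't think there can really be a general rule for this. In many cases I avoid SELECT *, but I have also worked with data frameworks where SELECT * was very useful.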

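As with everything, there are benefits and costs. I think part of the benefit-versus-cost equation is how much control you have over the data structures. In the cases where SELECT * worked well, the data structures were tightly controlled (it was retail software), so there was little risk that someone would drop a huge BLOB field into a table.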

There are three major reasons:

1. Inefficiency in moving data to the consumer. When you SELECT *, you're often retrieving more columns from the database than your application really needs to function. This causes more data to move from the database server to the client, slowing access and increasing load on your machines, as well as taking more time to travel across the network. This is especially true when someone adds new columns to underlying tables that didn't exist and weren't needed when the original consumers coded their data access.

2. Indexing issues. Consider a scenario where you want to tune a query to a high level of performance. If you were to use *, and it returned more columns than you actually needed, the server would often have to use more expensive methods to retrieve your data than it otherwise might. For example, you wouldn't be able to create an index which simply covered the columns in your SELECT list, and even if you did (including all columns [shudder]), the next guy who came around and added a column to the underlying table would cause the optimizer to ignore your optimized covering index, and you'd likely find that the performance of your query would drop substantially for no readily apparent reason.

3. Binding problems. When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can often crash your data consumer. Imagine a query that joins two tables, both of which contain a column called "ID". How would a consumer know which was which? SELECT * can also confuse views (at least in some versions of SQL Server) when underlying table structures change: the view is not rebuilt, and the data which comes back can be nonsense. And the worst part of it is that you can take care to name your columns whatever you want, but the next guy who comes along might have no way of knowing that he has to worry about adding a column which will collide with your already-developed names.
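The indexing and binding points can be reproduced on a small scale with Python's sqlite3 module. This is a sketch assuming SQLite rather than SQL Server, with hypothetical customers and orders tables: EXPLAIN QUERY PLAN should report a covering index only for the explicit column list, and a SELECT * join returns two columns named id, so building a name-keyed dict silently loses one of them. The view-staleness problem is specific to SQL Server and is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, notes TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_customers_name ON customers (name);
    INSERT INTO customers VALUES (1, 'Ada', 'long free-text notes');
    INSERT INTO orders VALUES (100, 1, 9.99);
""")


def plan(sql):
    """Join the 'detail' strings (last column) of SQLite's EXPLAIN QUERY PLAN."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Indexing: the index stores name plus the rowid, which id aliases
# (INTEGER PRIMARY KEY), so it covers id and name but not notes.
explicit_plan = plan("SELECT id, name FROM customers WHERE name = 'Ada'")
star_plan = plan("SELECT * FROM customers WHERE name = 'Ada'")

# Binding: both tables have an "id" column, so the join returns two "id"s.
cur = conn.execute(
    "SELECT * FROM customers c JOIN orders o ON o.customer_id = c.id"
)
names = [d[0] for d in cur.description]
row = cur.fetchone()
as_dict = dict(zip(names, row))  # the orders "id" silently overwrites the customers "id"

print(explicit_plan)
print(star_plan)
print(names, as_dict)
```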

But it's not all bad for SELECT *. I use it liberally for these use cases:

1. Ad-hoc queries. When trying to debug something, especially off a narrow table I might not be familiar with, SELECT * is often my best friend. It helps me just see what's going on without having to do a boatload of research as to what the underlying column names are. This gets to be a bigger "plus" the longer the column names get.

2. When * means "a row". In the following use cases, SELECT * is just fine, and rumors that it's a performance killer are just urban legends which may have had some validity many years ago, but don't now:

    SELECT COUNT(*) FROM table;

In this case, * means "count the rows". If you were to use a column name instead of *, it would count the rows where that column's value was not null. COUNT(*), to me, really drives home the concept that you're counting rows, and you avoid strange edge cases caused by NULLs being eliminated from your aggregates.

The same goes for this type of query:

    SELECT a.ID FROM TableA a WHERE EXISTS (SELECT * FROM TableB b WHERE b.ID = a.B_ID);

In any database worth its salt, * just means "a row". It doesn't matter what you put in the subquery. Some people use b's ID in the SELECT list, or they'll use the number 1, but IMO those conventions are pretty much nonsensical. What you mean is "a matching row exists", and that's what * signifies. Most query optimizers out there are smart enough to know this. (Though to be honest, I only know this to be true with SQL Server and Oracle.)
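A quick check of both claims, using SQLite through Python's sqlite3 module as an assumption (TableA and TableB mirror the example query; the email column and the sample rows are made up). It shows that COUNT(*) and COUNT(column) differ when the column contains NULLs, and that EXISTS returns the same rows whether the subquery selects *, 1 or b.ID. It checks results only and says nothing about how SQL Server or Oracle plan these queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, B_ID INTEGER);
    INSERT INTO TableB VALUES (1, 'a@example.com'), (2, NULL), (3, NULL);
    INSERT INTO TableA VALUES (10, 1), (11, 2), (12, 99);
""")

# COUNT(*) counts rows; COUNT(column) skips rows where that column is NULL.
count_rows = conn.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
count_emails = conn.execute("SELECT COUNT(email) FROM TableB").fetchone()[0]

# Inside EXISTS the select list is irrelevant: every variant returns the same IDs.
exists_template = (
    "SELECT a.ID FROM TableA a WHERE EXISTS "
    "(SELECT {} FROM TableB b WHERE b.ID = a.B_ID) ORDER BY a.ID"
)
results = {
    select_list: [r[0] for r in conn.execute(exists_template.format(select_list))]
    for select_list in ("*", "1", "b.ID")
}

print(count_rows, count_emails, results)
```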

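If you really want every column, I haven't seen a performance difference between SELECT * and naming the columns. The reason to name the columns may simply be to be explicit about which columns you expect to see in your code.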

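Often, though, you don't want every column, and SELECT * can make the database server do unnecessary work and send unnecessary data over the network. It's unlikely to cause a noticeable problem unless the system is heavily used or the network connection is slow.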

Quoted from this article:

Never use "SELECT *".

I have found only one reason to use "SELECT *":

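Special requirements where you are building a dynamic environment in which added or removed columns are handled automatically by the application code. In this special case you don't need to change the application or database code, and the change automatically takes effect in production. In that case you can use "SELECT *".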
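One way application code can handle added columns automatically is to read the column names from the result at run time instead of hard-coding them. A minimal sketch with Python's sqlite3 module; fetch_as_dicts and the settings table are hypothetical names for illustration, not part of any library.

```python
import sqlite3


def fetch_as_dicts(conn, table):
    """Hypothetical helper: return every row of `table` as a dict keyed by column name.

    The column list is read from cursor.description at run time, so columns
    added to the table later are picked up without changing this code.
    """
    # Table names cannot be bound as parameters; `table` must come from trusted code.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (setting_name TEXT, setting_value TEXT)")
conn.execute("INSERT INTO settings VALUES ('theme', 'dark')")
before = fetch_as_dicts(conn, "settings")

# A new column appears in the dicts without any change to fetch_as_dicts.
conn.execute("ALTER TABLE settings ADD COLUMN updated_by TEXT")
after = fetch_as_dicts(conn, "settings")

print(before)
print(after)
```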

If you name the columns in a SELECT statement, they will be returned in the order specified, and may thus safely be referenced by numerical index. If you use "SELECT *", you may end up receiving the columns in arbitrary sequence, and thus can only safely use the columns by name. Unless you know in advance what you'll be wanting to do with any new column that gets added to the database, the most probable correct action is to ignore it. If you're going to be ignoring any new columns that get added to the database, there is no benefit whatsoever to retrieving them.
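A sketch of the numerical-index hazard, using SQLite via Python's sqlite3 module and a hypothetical users table. The table is rebuilt with its columns in a different order (SQLite's usual create, copy and rename approach), after which index 1 of a SELECT * row holds a different column, while the explicit column list still returns name at index 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO users VALUES (1, 'Ada', 'ada@example.com');
""")


def second_column(sql):
    """Read a value by numerical index, as positional client code would."""
    return conn.execute(sql).fetchone()[1]


before_star = second_column("SELECT * FROM users")

# Simulate someone rebuilding the table with a different column order.
conn.executescript("""
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
    INSERT INTO users_new (id, email, name) SELECT id, email, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
""")

after_star = second_column("SELECT * FROM users")  # position 1 is now email
after_explicit = second_column("SELECT id, name, email FROM users")  # still name

print(before_star, after_star, after_explicit)
```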