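At what point does a MySQL database start to lose performance?

Does physical database size matter? Does the number of records matter? Is any performance degradation linear or exponential?

I have what I believe is a large database, with roughly 15 million records taking up almost 2GB. Based on these numbers, is there any incentive for me to clean the data out, or am I safe to let it keep growing for a few more years?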


Current answer

The physical database size doesn't matter. The number of records doesn't matter.

In my experience the biggest problem that you are going to run into is not size, but the number of queries you can handle at a time. Most likely you are going to have to move to a master/slave configuration so that the read queries can run against the slaves and the write queries run against the master. However, if you are not ready for this yet, you can always tweak your indexes for the queries you are running to speed up the response times. There is also a lot of tweaking you can do to the network stack and kernel in Linux that will help.
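
To make the read/write split concrete, here is a minimal sketch of the routing decision only. The class name, the server labels and the "first keyword" rule are all made up for illustration; the code uses "primary" for the master and "replicas" for the slaves, and it returns a label rather than opening a real connection.

```python
import itertools


class ReadWriteRouter:
    """Pick a server label for a statement: replicas for reads, primary for writes."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; with no replicas everything goes to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        words = sql.split(None, 1)
        first_word = words[0].lower() if words else ""
        # Simplifying assumption: only plain SELECT statements count as reads.
        if self._replicas is not None and first_word == "select":
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.target_for("SELECT * FROM orders WHERE id = 42"))
print(router.target_for("UPDATE orders SET status = 'paid' WHERE id = 42"))
```

A real setup needs more care than this: SELECT ... FOR UPDATE and reads inside a transaction belong on the master, and a slave can lag behind, so a read issued right after a write may not see that write.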

My own database has reached 10GB with only a moderate number of connections, and it handles requests just fine.

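I would focus first on your indexes, then have a server admin look at your OS, and if none of that helps, it may be time to implement a master/slave configuration.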

Other answers

Also watch out for complex joins. Besides transaction volume, transaction complexity can be a big factor.

Refactoring heavy queries sometimes gives a big performance boost.

I was once called upon to look at a MySQL database that had "stopped working". I discovered that the DB files were residing on a Network Appliance filer mounted with NFS2 and with a maximum file size of 2GB. And sure enough, the table that had stopped accepting transactions was exactly 2GB on disk. As for the performance curve, I'm told that it was working like a champ right up until it didn't work at all! This experience always serves as a nice reminder for me that there are always dimensions above and below the one you naturally suspect.

I'm currently managing a MySQL database on Amazon's cloud infrastructure that has grown to 160 GB. Query performance is fine. What has become a nightmare is backups, restores, adding slaves, or anything else that deals with the whole dataset, or even DDL on large tables. Getting a clean import of a dump file has become problematic. In order to make the process stable enough to automate, various choices needed to be made to prioritize stability over performance. If we ever had to recover from a disaster using a SQL backup, we'd be down for days.
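
A back-of-the-envelope sketch shows why whole-dataset operations hurt even when queries stay fast: their duration grows with total size. The function name and the import rates below are hypothetical, not measurements from this system, and 1 GB is treated as 1024 MB.

```python
def estimate_restore_hours(dataset_gb, throughput_mb_per_s):
    """Hours needed to push dataset_gb through an import at a steady rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = dataset_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Hypothetical effective import rates for a 160 GB dataset, not measurements.
for rate in (1.0, 5.0, 20.0):
    hours = estimate_restore_hours(160, rate)
    print(f"{rate:5.1f} MB/s -> {hours:6.1f} h ({hours / 24:.1f} days)")
```

Replaying a SQL dump also has to rebuild every index, so the effective rate can be far below raw disk or network speed. That is how a restore ends up measured in days.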

Horizontally scaling SQL is also pretty painful, and in most cases leads to using it in ways you probably did not intend when you chose to put your data in SQL in the first place. Shards, read slaves, multi-master, et al, they are all really shitty solutions that add complexity to everything you ever do with the DB, and not one of them solves the problem; only mitigates it in some ways. I would strongly suggest looking at moving some of your data out of MySQL (or really any SQL) when you start approaching a dataset of a size where these types of things become an issue.

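Update: a few years later, our dataset has grown to about 800 GiB. In addition, we have a single table that is 200+ GiB and a few others in the 50-100 GiB range. Everything I said before still holds. It still performs just fine, but the problems of running full-dataset operations have become worse.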

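Query performance mainly depends on the number of records the query needs to scan. Indexes play a major role in that, and index data size is proportional to the number of rows and the number of indexes.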

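Queries with conditions on indexed fields and a full value will usually return within 1 ms, but starts_with, IN, BETWEEN and, obviously, contains conditions may take more time and scan more records.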
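
The difference between an exact match on an indexed column and a contains condition can be seen in a query plan. The sketch below uses SQLite from Python's standard library purely as a runnable stand-in: MySQL's optimizer and its EXPLAIN output are different, and the table, index and helper names are invented for the demo.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the plan descriptions SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the last column is the description.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

# Full value on an indexed column: the plan is an index search.
exact = plan_for(conn, "SELECT * FROM users WHERE email = ?", ("user42@example.com",))
# Contains condition: the leading wildcard rules out the index, so every row is read.
contains = plan_for(conn, "SELECT * FROM users WHERE email LIKE '%42%'")

print(exact)
print(contains)
```

The first plan names idx_users_email as a search and the second is a scan of the whole table. The exact wording of the plan text varies between SQLite versions.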

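You will also face a lot of maintenance issues with DDL: ALTER and DROP will be slow and hard to run as live traffic grows, even just to add an index or new columns.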

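Generally it is advisable to split the database into as many clusters as required (500GB would be a general benchmark; as others have said, it depends on many factors and can vary by use case). That gives better isolation and the independence to scale specific clusters (better suited to B2B cases).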

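Talking about "database performance" is a bit pointless; "query performance" is a better term here. And the answer is: it depends on the query, the data it operates on, the indexes, the hardware, and so on. You can get an idea of how many rows will be scanned and which indexes will be used with the EXPLAIN syntax.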
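
As a small sketch of reading EXPLAIN output programmatically: the helper below assumes the rows have already been fetched as dictionaries keyed by MySQL's EXPLAIN column names (table, type, key, rows); how you get them in that shape depends on your driver. The function name and the sample rows are hypothetical and hand-written for illustration, not real output.

```python
def find_full_scans(explain_rows):
    """Return (table, estimated_rows) for every EXPLAIN row that is a full table scan."""
    flagged = []
    for row in explain_rows:
        # In MySQL's EXPLAIN output, type "ALL" means every row of the table is read.
        if row.get("type") == "ALL":
            flagged.append((row.get("table"), row.get("rows")))
    return flagged


# Hand-written sample rows for illustration only.
sample_rows = [
    {"table": "orders", "type": "ref", "key": "idx_customer", "rows": 12},
    {"table": "order_items", "type": "ALL", "key": None, "rows": 1500000},
]
print(find_full_scans(sample_rows))
```

A table that shows up here with a large rows estimate is usually the first candidate for a new index or a rewritten condition.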

2GB isn't really a "large" database; it's more of a medium-sized one.