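At what point does a MySQL database start to lose performance?
Does physical database size matter? Does the number of records matter? Is any performance degradation linear or exponential?
I have what I believe to be a large database, with roughly 15 million records that take up almost 2 GB. Based on these numbers, is there any incentive for me to clean the data out, or am I safe to let it keep growing for a few more years?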
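Current answer
It depends on your queries and validation.
For example, I worked with a table of 100,000 drugs that has a generic-name column holding more than 15 characters for each drug. I wrote a query to compare the generic names of drugs between two tables, and it took considerably longer to run. The same comparison done through the drug index, using an id column (as mentioned above), takes only a few seconds.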
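As a rough illustration of the point above, the sketch below joins two tables once on a text column and once on an integer primary key. It is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names (`drugs_a`, `drugs_b`), the column names, and the generated data are all hypothetical. Timings and query plans on a real MySQL server will differ.

```python
import sqlite3

# Hypothetical queries: the same comparison expressed two ways.
NAME_JOIN = (
    "SELECT COUNT(*) FROM drugs_a AS a "
    "JOIN drugs_b AS b ON a.generic_name = b.generic_name"
)
ID_JOIN = (
    "SELECT COUNT(*) FROM drugs_a AS a "
    "JOIN drugs_b AS b ON a.id = b.id"
)


def build_demo(n=1000):
    """Create two in-memory tables holding the same n made-up drugs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE drugs_a (id INTEGER PRIMARY KEY, generic_name TEXT);
        CREATE TABLE drugs_b (id INTEGER PRIMARY KEY, generic_name TEXT);
        """
    )
    rows = [(i, f"generic-drug-name-{i:06d}") for i in range(n)]
    conn.executemany("INSERT INTO drugs_a VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO drugs_b VALUES (?, ?)", rows)
    conn.commit()
    return conn


def count_matches(conn, sql):
    """Run one of the COUNT(*) joins and return the number of matches."""
    return conn.execute(sql).fetchone()[0]


def query_plan(conn, sql):
    """Return the human-readable plan steps the engine chose for sql."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
```

Both joins return the same count; what changes is how the engine gets there. Calling `query_plan(conn, NAME_JOIN)` and `query_plan(conn, ID_JOIN)` shows the chosen strategy for each, and on MySQL the analogous check is to run `EXPLAIN` on your own queries.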
Other answers
I once was called upon to look at a mysql that had "stopped working". I discovered that the DB files were residing on a Network Appliance filer mounted with NFS2 and with a maximum file size of 2GB. And sure enough, the table that had stopped accepting transactions was exactly 2GB on disk. But with regards to the performance curve I'm told that it was working like a champ right up until it didn't work at all! This experience always serves for me as a nice reminder that there're always dimensions above and below the one you naturally suspect.
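No, it doesn't really matter. MySQL's speed is about 7 million rows per second, so you can scale it quite a bit.
I would first focus on your indexes, then have a server admin look at your OS, and if none of that helps, it might be time for a master/slave configuration.
That's true. Another thing that usually works is to reduce the amount of data that is repeatedly worked with. If you have "old data" and "new data" and 99% of your queries touch only the new data, just move all the old data to another table, and don't look at it ;)
-> Have a look at partitioning.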
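The "move old data to another table" suggestion above could look roughly like the following. This is a minimal sketch, not a production procedure: it uses Python's sqlite3 module in place of MySQL, and the `events` and `events_archive` tables, the `created_at` column, and the ISO-date cutoff are hypothetical names chosen for illustration.

```python
import sqlite3


def make_tables(conn):
    """Create a hypothetical hot table and an archive table with the same shape."""
    conn.executescript(
        """
        CREATE TABLE events (
            id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT
        );
        CREATE TABLE events_archive (
            id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT
        );
        """
    )


def archive_old_rows(conn, cutoff):
    """Move rows with created_at before cutoff into the archive table.

    cutoff is an ISO date string such as '2021-01-01'; ISO dates compare
    correctly as plain text. Returns the number of rows moved.
    """
    # 'with conn' commits if both statements succeed and rolls back if
    # either fails, so a row is never copied without also being deleted.
    with conn:
        conn.execute(
            "INSERT INTO events_archive "
            "SELECT * FROM events WHERE created_at < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM events WHERE created_at < ?", (cutoff,)
        ).rowcount
    return moved
```

After archiving, queries against `events` only scan the recent rows. On a large MySQL table you would probably want to move rows in batches rather than in one statement, which this sketch does not attempt.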
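Database size does matter, both in bytes and in the number of rows per table. You will notice a huge performance difference between a light database and a blob-filled one. My application once got stuck because I stored binary images in fields instead of keeping the images in files on disk and putting only the file names in the database. On the other hand, iterating over a large number of rows is not free either.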
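The "file name in the database, bytes on disk" approach described above might be sketched as follows. The `images` table, the `image_dir` directory, and the function names are hypothetical, sqlite3 again stands in for MySQL, and the sketch assumes `name` is a trusted, plain file name (it does no path sanitizing).

```python
import sqlite3
from pathlib import Path


def make_image_table(conn):
    """Create a hypothetical table that stores only file names, not image bytes."""
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, file_name TEXT NOT NULL)"
    )
    conn.commit()


def save_image(conn, image_dir, name, data):
    """Write the bytes to disk and record just the file name in the database."""
    (Path(image_dir) / name).write_bytes(data)
    cur = conn.execute("INSERT INTO images (file_name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def load_image(conn, image_dir, image_id):
    """Look up the file name by id, then read the bytes back from disk."""
    row = conn.execute(
        "SELECT file_name FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return (Path(image_dir) / row[0]).read_bytes()
```

The rows stay small, so scans and backups of the table do not have to carry the image data. The trade-off is that files and rows can drift out of sync, which the database will not detect for you.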
I'm currently managing a MySQL database on Amazon's cloud infrastructure that has grown to 160 GB. Query performance is fine. What has become a nightmare is backups, restores, adding slaves, or anything else that deals with the whole dataset, or even DDL on large tables. Getting a clean import of a dump file has become problematic. In order to make the process stable enough to automate, various choices needed to be made to prioritize stability over performance. If we ever had to recover from a disaster using a SQL backup, we'd be down for days.
Horizontally scaling SQL is also pretty painful, and in most cases leads to using it in ways you probably did not intend when you chose to put your data in SQL in the first place. Shards, read slaves, multi-master, et al, they are all really shitty solutions that add complexity to everything you ever do with the DB, and not one of them solves the problem; only mitigates it in some ways. I would strongly suggest looking at moving some of your data out of MySQL (or really any SQL) when you start approaching a dataset of a size where these types of things become an issue.
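Update: a few years later, our dataset has grown to about 800 GiB. In addition, we have a single 200+ GiB table and a few others in the 50-100 GiB range. Everything I said before still holds. It still performs just fine, but the problems with running full-dataset operations have become worse.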