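Before you answer this: I have never developed anything popular enough to attain high server loads. Treat me as (sigh) an alien that has just landed on the planet, albeit one that knows PHP and a few optimisation techniques.

I'm developing a tool in PHP that could attain quite a lot of users, if it works out right. However, while I'm fully capable of developing the program, I'm pretty much clueless when it comes to making something that can deal with huge traffic. So here are a few questions on it (feel free to turn this question into a resource thread as well).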

Databases

At the moment I plan to use the MySQLi features in PHP5. However, how should I set up the databases in relation to users and content? Do I actually need multiple databases? At the moment everything's jumbled into one database, although I've been considering moving user data to one, actual content to another, and core site content (template masters etc.) to a third. My reasoning is that sending queries to different databases will ease the load on each, since one database currently serves all 3 load sources. Also, would this still be effective if they were all on the same server?

Caching

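I have a template system that is used to build the pages and swap out variables. Master templates are stored in the database, and each time a template is called its cached copy (an HTML document) is served. At the moment I have two types of variable in these templates: static vars and dynamic vars. Static vars are usually things like page names and the name of the site, things that don't change often; dynamic vars are things that change on each page load.

My question on this:

Say I have comments on different articles. Which is the better solution: store the simple comment template and render the comments (from a DB call) each time the page is loaded, or store a cached copy of the comments page as an HTML page that is re-cached each time a comment is added, edited, or deleted?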

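Finally

Does anyone have any tips/pointers for running a high-load site on PHP? I'm pretty sure it's a workable language to use - Facebook and Yahoo! give it great precedence - but are there any experiences I should watch out for?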


Current answer

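No two sites are alike. You really need to get a tool like JMeter and benchmark to see where your problem points will be. You can spend a lot of time guessing and improving, but you won't see real results until you measure and compare your changes.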
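As a rough illustration of "measure and compare" at the code level (a load tool like JMeter covers the whole-site view), here is a minimal timing sketch. The `benchmark()` helper and the two sample closures are made up for this example; it assumes PHP 7.3+ for `hrtime()`.

```php
<?php
// Hypothetical helper: run a callable several times and return the middle
// sample of the wall-clock durations, in milliseconds.
function benchmark(callable $fn, int $runs = 20): float
{
    $samples = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                  // monotonic clock, nanoseconds
        $fn();
        $samples[] = (hrtime(true) - $start) / 1e6;
    }
    sort($samples);
    // Middle sample (the upper median when the count is even).
    return $samples[intdiv(count($samples), 2)];
}

// Two placeholder implementations of the same task, to be compared.
$concat = function (): string {
    $s = '';
    for ($i = 0; $i < 10000; $i++) {
        $s .= $i . ',';
    }
    return $s;
};
$implode = function (): string {
    return implode(',', range(0, 9999)) . ',';
};

printf("concat:  %.3f ms\n", benchmark($concat));
printf("implode: %.3f ms\n", benchmark($implode));
```

The numbers only mean something relative to each other on the same machine, so run the comparison before and after each change rather than trusting a single figure.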

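For example, for many years the MySQL query cache was the solution to all of our performance problems. If your site was slow, MySQL experts suggested turning the query cache on. It turns out that if you have a high write load, the cache is actually crippling. If you turned it on without testing, you'd never know.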

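And don't forget that you are never done scaling. A site that handles 10 req/s will need changes to support 1000 req/s. And if you're lucky enough to need to support 10,000 req/s, your architecture will probably look completely different as well.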

Databases

Don't use MySQLi -- PDO is the 'modern' OO database access layer. The most important feature to use is placeholders in your queries. It's smart enough to use server-side prepares and other optimizations for you as well.

You probably don't want to break your database up at this point. If you do find that one database isn't cutting it, there are several techniques to scale up, depending on your app. Replicating to additional servers typically works well if you have more reads than writes. Sharding is a technique to split your data over many machines.
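To make the placeholder advice concrete, here is a minimal sketch. It assumes the `pdo_sqlite` extension and uses an in-memory SQLite database as a stand-in so it runs without a server; the `comments` table and its columns are invented for illustration. For MySQL only the DSN and credentials passed to `new PDO(...)` would change.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema, for illustration only.
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER, body TEXT)');

// Named placeholders: the values travel separately from the SQL text,
// so quotes in user input cannot change the query.
$insert = $pdo->prepare('INSERT INTO comments (article_id, body) VALUES (:article_id, :body)');
$insert->execute([':article_id' => 1, ':body' => "It's a 'quoted' comment"]);
$insert->execute([':article_id' => 1, ':body' => 'Second comment']);
$insert->execute([':article_id' => 2, ':body' => 'Other article']);

// The prepared statement can be reused with different bound values.
$select = $pdo->prepare('SELECT body FROM comments WHERE article_id = :article_id ORDER BY id');
$select->execute([':article_id' => 1]);
$bodies = $select->fetchAll(PDO::FETCH_COLUMN);

print_r($bodies);
```

With the MySQL driver, whether prepares are emulated on the client or performed on the server is controlled by `PDO::ATTR_EMULATE_PREPARES`, so it is worth checking that setting rather than assuming either behaviour.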

Caching

You probably don't want to cache in your database. The database is typically your bottleneck, so adding more I/O to it is typically a bad thing. There are several PHP caches out there that accomplish similar things, like APC and Zend.

Measure your system with caching on and off. I bet your cache is heavier than serving the pages straight.

If it takes a long time to build your comments and article data from the db, integrate memcache into your system. You can cache the query results and store them in a memcached instance. It's important to remember that retrieving the data from memcache must be faster than assembling it from the database to see any benefit.

If your articles aren't dynamic, or you have simple dynamic changes after they're generated, consider writing out HTML or PHP to the disk. You could have an index.php page that looks on disk for the article; if it's there, it streams it to the client. If it isn't, it generates the article, writes it to the disk, and sends it to the client. Deleting files from the disk would cause pages to be re-written. If a comment is added to an article, delete the cached copy -- it would be regenerated.
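The disk-cache flow described above (look on disk, otherwise generate, write, and send; delete the file when a comment changes) could look roughly like this. All names here (`cachePath`, `getArticleHtml`, `invalidateArticle`, the cache directory) are made up for the sketch, and the `$render` closure stands in for the real database-backed page build.

```php
<?php
// Hypothetical cache directory; a real site would use a dedicated path.
$cacheDir = sys_get_temp_dir() . '/article_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0777, true);
}

function cachePath(string $dir, int $articleId): string
{
    return $dir . '/article_' . $articleId . '.html';
}

// Serve from disk when a copy exists; otherwise build, store, and return it.
function getArticleHtml(string $dir, int $articleId, callable $render): string
{
    $path = cachePath($dir, $articleId);
    if (is_file($path)) {
        return file_get_contents($path);
    }
    $html = $render($articleId);
    // Write to a temp file, then rename, so readers never see a partial file.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $path);
    return $html;
}

// Call this whenever a comment is added, edited, or deleted.
function invalidateArticle(string $dir, int $articleId): void
{
    $path = cachePath($dir, $articleId);
    if (is_file($path)) {
        unlink($path);
    }
}

// Stand-in for the expensive database-backed rendering step.
$renderCount = 0;
$render = function (int $id) use (&$renderCount): string {
    $renderCount++;
    return "<article>Article $id, build #$renderCount</article>";
};

invalidateArticle($cacheDir, 42);                 // start from a clean state
$first  = getArticleHtml($cacheDir, 42, $render); // miss: rendered and stored
$second = getArticleHtml($cacheDir, 42, $render); // hit: read from disk
invalidateArticle($cacheDir, 42);                 // e.g. a comment was added
$third  = getArticleHtml($cacheDir, 42, $render); // miss again: re-rendered
invalidateArticle($cacheDir, 42);                 // clean up the demo file
```

Deleting on write keeps the logic simple: the next reader pays the rebuild cost once, and nobody has to work out which parts of the page changed.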

Other answers

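PDO is also very slow and its API is pretty complicated. No one in their right mind should use it if portability is not a concern. And let's face it, in 99% of all web apps it is not. You just stick with MySQL or PostgreSQL, or whatever it is you are working with.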

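As for the PHP question and what to take into account: I think premature optimization is the root of all evil. ;) Get your application done first, try to keep it clean when it comes to programming, do a little documentation, and write unit tests. With all of the above you will have no issues refactoring code when the time comes. But first you want to be done and push it out to see how people react to it.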

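Thanks for the advice on PHP's caching extensions - could you explain the reasons for using one over another? I've heard great things about memcached through IRC but have never heard of APC - what are your opinions on them? I assume using multiple caching systems is pretty counter-effective.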

Actually, many people do use APC and memcached together...

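Anyway, caching is very simple in PHP even without an extension/helper package like memcached.

All you need to do is create an output buffer using ob_start().

Create a global cache function. Call ob_start and pass the function as the callback. In the function, look for a cached version of the page. If it exists, serve it and end.

If it doesn't exist, the script will continue processing. When it reaches the matching ob_end_flush() call, it will call the function you specified. At that time, you just get the contents of the output buffer, drop them in a file, save the file, and end.

Add in some expiration/garbage collection.

And many people don't realize you can nest ob_start()/ob_end_flush() calls. So if you're already using an output buffer, say to parse in advertisements or do syntax highlighting or whatever, you can just nest another ob_start()/ob_end_flush() pair.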
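A minimal sketch of that idea follows. The helper name `startPageCache()`, the cache file path, and the 60-second lifetime are assumptions made for this example. It checks for a fresh cached copy before opening the buffer, and the demo wraps everything in an outer buffer, which also shows the nesting mentioned above.

```php
<?php
// Hypothetical helper. If a fresh cached copy exists, it is sent and true is
// returned (the caller should stop). Otherwise an output buffer is opened
// whose callback saves the finished page to disk when the buffer is flushed.
function startPageCache(string $file, int $ttl): bool
{
    // Expiration: copies older than $ttl seconds count as stale.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        readfile($file);
        return true;
    }
    ob_start(function (string $buffer) use ($file): string {
        file_put_contents($file, $buffer, LOCK_EX);
        return $buffer;            // still pass the page on to the client
    });
    return false;
}

// Demo: an outer buffer captures what would normally go to the client.
$file = sys_get_temp_dir() . '/pagecache_demo.html';
if (is_file($file)) {
    unlink($file);                 // start from a clean state
}

$builds = 0;
$renderPage = function () use ($file, &$builds): string {
    ob_start();                                    // outer buffer
    if (!startPageCache($file, 60)) {
        $builds++;
        echo "<p>Expensive page, build #$builds</p>";
        ob_end_flush();            // runs the callback, flushes to outer buffer
    }
    return ob_get_clean();                         // contents of outer buffer
};

$first  = $renderPage();   // cache miss: page is built and saved
$second = $renderPage();   // cache hit: served from the file
unlink($file);             // clean up the demo file
```

In a real page the script would typically `exit` right after `startPageCache()` returns true, and some periodic job would remove stale files so the cache directory does not grow without bound.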

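APC is an absolute must. Not only does it make for a great caching system, but the gain from the automatically cached PHP files is a godsend. As for the multiple database idea, I don't think you would get much out of having different databases on the same server. It may give you a bit of a speed gain at query time, but I doubt it would be worth the effort of deploying and maintaining the code for all three while making sure they stay in sync.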

I also highly recommend running Xdebug to find the bottlenecks in your program. It made optimizing a breeze for me.

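Looks like I was wrong. MySQLi is still being developed. But according to this article, PDO_MySQL is now being contributed to by the MySQL team. From the article: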

The MySQL Improved Extension - mysqli - is the flagship. It supports all features of the MySQL Server including Charsets, Prepared Statements and Stored Procedures. The driver offers a hybrid API: you can use a procedural or object-oriented programming style based on your preference. mysqli comes with PHP 5 and up. Note that the End of life for PHP 4 is 2008-08-08.

The PHP Data Objects (PDO) are a database access abstraction layer. PDO allows you to use the same API calls for various databases. PDO does not offer any degree of SQL abstraction. PDO_MYSQL is a MySQL driver for PDO. PDO_MYSQL comes with PHP 5. As of PHP 5.3 MySQL developers actively contribute to it. The PDO benefit of a unified API comes at the price that MySQL specific features, for example multiple statements, are not fully supported through the unified API.

Please stop using the first MySQL driver for PHP ever published: ext/mysql. Since the introduction of the MySQL Improved Extension - mysqli - in 2004 with PHP 5 there is no reason to still use the oldest driver around. ext/mysql does not support Charsets, Prepared Statements and Stored Procedures. It is limited to the feature set of MySQL 4.0. Note that the Extended Support for MySQL 4.0 ends at 2008-12-31. Don't limit yourself to the feature set of such old software! Upgrade to mysqli, see also Converting_to_MySQLi. mysql is in maintenance only mode from our point of view.

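To me, the article seems biased towards MySQLi. I suppose I am biased towards PDO. I really like PDO over MySQLi. It's straightforward to me. The API is much closer to the other languages I have programmed in. The OO database interfaces seem to work better.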

I have not come across any specific MySQL features that weren't available through PDO. I would be surprised if I ever did.