我对MongoDb非常兴奋,最近一直在测试它。我在MySQL中有一个叫posts的表,大约有2000万条记录,索引只在一个名为“id”的字段上。

我想与MongoDB比较速度,我运行了一个测试,从我们巨大的数据库中随机获取并打印15条记录。我为mysql和MongoDB分别运行了大约1000次查询,我很惊讶我没有注意到速度上有很大的差异。也许MongoDB快1.1倍。这太令人失望了。我做错什么了吗?我知道我的测试并不完美,但当涉及到读取密集的杂务时,MySQL与MongoDb不相上下。

注意:

我有双核+(2线程)i7 cpu和4GB ram 我在MySQL上有20个分区,每个分区有100万条记录

用于测试MongoDB的示例代码

<?php
function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}
$time_taken = 0;
$tries = 100;
// connect
$time_start = microtime_float();

for($i=1;$i<=$tries;$i++)
{
    $m = new Mongo();
    $db = $m->swalif;
    $cursor = $db->posts->find(array('id' => array('$in' => get_15_random_numbers())));
    foreach ($cursor as $obj)
    {
        //echo $obj["thread_title"] . "<br><Br>";
    }
}

$time_end = microtime_float();
$time_taken = $time_taken + ($time_end - $time_start);
echo $time_taken;

function get_15_random_numbers()
{
    $numbers = array();
    for($i=1;$i<=15;$i++)
    {
        $numbers[] = mt_rand(1, 20000000) ;

    }
    return $numbers;
}

?>

测试MySQL的示例代码

<?php
function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}
$BASE_PATH = "../src/";
include_once($BASE_PATH  . "classes/forumdb.php");

$time_taken = 0;
$tries = 100;
$time_start = microtime_float();
for($i=1;$i<=$tries;$i++)
{
    $db = new AQLDatabase();
    $sql = "select * from posts_really_big where id in (".implode(',',get_15_random_numbers()).")";
    $result = $db->executeSQL($sql);
    while ($row = mysql_fetch_array($result) )
    {
        //echo $row["thread_title"] . "<br><Br>";
    }
}
$time_end = microtime_float();
$time_taken = $time_taken + ($time_end - $time_start);
echo $time_taken;

function get_15_random_numbers()
{
    $numbers = array();
    for($i=1;$i<=15;$i++)
    {
        $numbers[] = mt_rand(1, 20000000);

    }
    return $numbers;
}
?>

当前回答

这里有一个小研究,探讨了RDBMS vs NoSQL使用MySQL vs Mongo,结论是一致的@Sean Reilly的回应。简而言之,好处来自于设计,而不是一些原始的速度差异。35-36页结论:

RDBMS vs NoSQL:性能和伸缩性比较

The project tested, analysed and compared the performance and scalability of the two database types. The experiments done included running different numbers and types of queries, some more complex than others, in order to analyse how the databases scaled with increased load. The most important factor in this case was the query type used as MongoDB could handle more complex queries faster due mainly to its simpler schema at the sacrifice of data duplication meaning that a NoSQL database may contain large amounts of data duplicates. Although a schema directly migrated from the RDBMS could be used this would eliminate the advantage of MongoDB’s underlying data representation of subdocuments which allowed the use of less queries towards the database as tables were combined. Despite the performance gain which MongoDB had over MySQL in these complex queries, when the benchmark modelled the MySQL query similarly to the MongoDB complex query by using nested SELECTs MySQL performed best although at higher numbers of connections the two behaved similarly. The last type of query benchmarked which was the complex query containing two JOINS and and a subquery showed the advantage MongoDB has over MySQL due to its use of subdocuments. This advantage comes at the cost of data duplication which causes an increase in the database size. If such queries are typical in an application then it is important to consider NoSQL databases as alternatives while taking in account the cost in storage and memory size resulting from the larger database size.

其他回答

这里有一个小研究,探讨了RDBMS vs NoSQL使用MySQL vs Mongo,结论是一致的@Sean Reilly的回应。简而言之,好处来自于设计,而不是一些原始的速度差异。35-36页结论:

RDBMS vs NoSQL:性能和伸缩性比较

The project tested, analysed and compared the performance and scalability of the two database types. The experiments done included running different numbers and types of queries, some more complex than others, in order to analyse how the databases scaled with increased load. The most important factor in this case was the query type used as MongoDB could handle more complex queries faster due mainly to its simpler schema at the sacrifice of data duplication meaning that a NoSQL database may contain large amounts of data duplicates. Although a schema directly migrated from the RDBMS could be used this would eliminate the advantage of MongoDB’s underlying data representation of subdocuments which allowed the use of less queries towards the database as tables were combined. Despite the performance gain which MongoDB had over MySQL in these complex queries, when the benchmark modelled the MySQL query similarly to the MongoDB complex query by using nested SELECTs MySQL performed best although at higher numbers of connections the two behaved similarly. The last type of query benchmarked which was the complex query containing two JOINS and and a subquery showed the advantage MongoDB has over MySQL due to its use of subdocuments. This advantage comes at the cost of data duplication which causes an increase in the database size. If such queries are typical in an application then it is important to consider NoSQL databases as alternatives while taking in account the cost in storage and memory size resulting from the larger database size.

来自MongoDB官方网站

观察这两个系统的一些高级查询行为,我们可以看到MySQL在选择大量记录时速度更快,而MongoDB在插入或更新大量记录时速度明显更快。

参考

在单服务器上,给定表/doc, MongoDb在读写方面不会比mysql MyISAM更快 大小从1gb到20gb不等。 在多节点集群上,MonoDB在并行缩减(Parallel Reduce)上速度更快,而Mysql不能水平扩展。

MongoDB并不是神奇地更快。如果您存储相同的数据,以基本相同的方式组织,并以完全相同的方式访问它,那么您真的不应该期望您的结果有很大的不同。毕竟,MySQL和MongoDB都是GPL,所以如果MongoDB有一些神奇的更好的IO代码,那么MySQL团队就可以把它合并到他们的代码库中。

人们看到真实世界的MongoDB性能很大程度上是因为MongoDB允许您以一种对您的工作负载更敏感的不同方式进行查询。

例如,考虑一个设计,它以规范化的方式保存了关于一个复杂实体的大量信息。这可以很容易地使用MySQL(或任何关系db)中的数十个表来以正常形式存储数据,需要许多索引来确保表之间的关系完整性。

现在考虑相同的文档存储设计。如果所有这些相关的表都从属于主表(通常是这样),那么您就可以对数据进行建模,使整个实体存储在单个文档中。在MongoDB中,您可以将其存储为单个文档,在单个集合中。这就是MongoDB开始实现卓越性能的地方。

在MongoDB中,要检索整个实体,你必须执行:

对集合进行一次索引查找(假设实体是通过id获取的) 检索一个数据库页面的内容(实际的二进制json文档)

b-树查找,二叉页读取。Log(n) + 1个IOs。如果索引可以完全驻留在内存中,则为1 IO。

在有20个表的MySQL中,你必须执行:

根表上的一次索引查找(同样,假设实体是通过id获取的) 对于聚集索引,我们可以假设根行的值在索引中 对实体的pk值进行20+范围查找(希望是在索引上) 这些可能不是聚集索引,因此一旦我们确定了适当的子行是什么,就需要进行相同的20+数据查找。

所以对于mysql,即使假设所有索引都在内存中(这很难,因为索引的数量是内存的20倍),也需要大约20次范围查找。

这些范围查找很可能由随机IO组成——不同的表肯定位于磁盘上的不同位置,对于同一个实体,同一表中相同范围中的不同行可能不是连续的(取决于实体是如何更新的,等等)。

因此,对于这个例子,与MongoDB相比,MySQL的每次逻辑访问的IO数量是MongoDB的20倍。

这就是MongoDB在某些用例中提高性能的原因。

来源:https://github.com/webcaetano/mongo-mysql

10行

mysql insert: 1702ms
mysql select: 11ms

mongo insert: 47ms
mongo select: 12ms

100行

mysql insert: 8171ms
mysql select: 10ms

mongo insert: 167ms
mongo select: 60ms

1000行

mysql insert: 94813ms (1.58 minutes)
mysql select: 13ms

mongo insert: 1013ms
mongo select: 677ms

10.000行

mysql insert: 924695ms (15.41 minutes)
mysql select: 144ms

mongo insert: 9956ms (9.95 seconds)
mongo select: 4539ms (4.539 seconds)