Is it Oracle, MySQL, or something they built themselves?
Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
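To make "a time API that exposes clock uncertainty" concrete, here is a minimal sketch of the idea, not Spanner's implementation. It assumes a clock whose `now()` returns an interval `[earliest, latest]` that contains the true time, and a writer that waits until its chosen timestamp has definitely passed. The class names, the fixed `epsilon`, and the injectable `source` and `sleep` parameters are all hypothetical and exist only for illustration.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    """An interval assumed to contain the true current time."""
    earliest: float
    latest: float


class UncertainClock:
    """Toy clock with a fixed uncertainty bound (epsilon is an assumption)."""

    def __init__(self, epsilon, source=time.monotonic):
        self.epsilon = epsilon
        self._source = source

    def now(self):
        t = self._source()
        return TimeInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True only once t has definitely passed on every clock."""
        return self.now().earliest > t


def commit_wait(clock, sleep=time.sleep):
    """Pick a timestamp, then wait out the uncertainty before returning it."""
    s = clock.now().latest
    while not clock.after(s):
        sleep(clock.epsilon / 4)
    return s
```

With an uncertainty of `epsilon`, the wait in this sketch lasts roughly `2 * epsilon`, which is why a tight uncertainty bound matters for write latency.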
Megastore is a storage system developed to meet the requirements of today's interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters. This paper describes Megastore's semantics and replication algorithm. It also describes our experience supporting a wide range of Google production services built with Megastore.
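Google Cloud Datastore has more than 100 applications in production at Google, serving both internal and external users. Applications such as Gmail, Picasa, Google Calendar, Android Market, and App Engine use Cloud Datastore and Megastore.
Google Trends uses MillWheel for stream processing. Google Ads initially used MySQL and later migrated to F1 DB, a custom-written distributed relational database. YouTube uses MySQL with Vitess. Google stores exabytes of data on commodity servers with the help of the Google File System.
Source: Google Databases: How Do Google Services Store Petabyte-Exabyte Scale Data?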
Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.
Bigtable in short:
- a fast and extremely large-scale DBMS
- a sparse, distributed, multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases
- designed to scale into the petabyte range
- it works across hundreds or thousands of machines
- it is easy to add more machines to the system, and they automatically start being used without any reconfiguration
- each table has multiple dimensions (one of which is a field for time, allowing versioning)
- tables are optimized for GFS (Google File System) by being split into multiple tablets: segments of the table, split along a row chosen such that each tablet is ~200 megabytes in size
To manage the huge tables, Bigtable splits them at row boundaries and stores the pieces as tablets. A tablet is around 200 MB, and each machine stores about 100 tablets. This setup allows tablets from a single table to be spread among many servers. It also allows for fine-grained load balancing: if one tablet is receiving many queries, its server can shed other tablets or move the busy tablet to another machine that is less loaded. Also, if a machine goes down, its tablets can be spread across many other servers so that the performance impact on any given machine is minimal.
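As a rough illustration of splitting at row boundaries, the sketch below cuts a sorted run of rows into tablets that stay under a size limit, then finds the tablet covering a given row key by binary search. It is a toy model under stated assumptions, not Bigtable's code: the function names, the `(row_key, size_bytes)` input format, and the greedy split rule are all hypothetical.

```python
import bisect


def split_into_tablets(rows, max_bytes):
    """Split sorted (row_key, size_bytes) pairs into tablets.

    A tablet is closed just before the row that would push it past
    max_bytes, so a split never falls in the middle of a row.
    """
    tablets, current, current_size = [], [], 0
    for key, size in rows:
        if current and current_size + size > max_bytes:
            tablets.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        tablets.append(current)
    return tablets


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose key range covers row_key."""
    end_keys = [tablet[-1] for tablet in tablets]
    index = bisect.bisect_left(end_keys, row_key)
    # Keys past the last end key fall into the final tablet.
    return min(index, len(tablets) - 1)
```

Because the rows are kept sorted, a lookup only needs each tablet's last key, which is what lets tablets from one table live on many different servers.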
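Another service that BigTable relies on heavily is Chubby, a highly available, reliable distributed lock service. Chubby allows a client to take a lock, possibly associating it with some metadata, and to renew it by sending keep-alive messages to Chubby. The locks are stored in a filesystem-like hierarchical naming structure.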
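The lock-plus-keep-alive idea can be sketched as a lease: a lock is held only while its owner keeps renewing it. The code below is a single-process toy under stated assumptions and does not reflect Chubby's real API; the class name, method names, lease length, and the injectable `clock` are all hypothetical.

```python
class LeaseLockService:
    """Toy lock service: locks live at path-like names and expire
    unless the owner sends a keep-alive before the lease runs out."""

    def __init__(self, lease_seconds, clock):
        self._lease = lease_seconds
        self._clock = clock          # callable returning the current time
        self._locks = {}             # path -> (owner, metadata, expiry)

    def _live_entry(self, path):
        entry = self._locks.get(path)
        if entry is not None and entry[2] > self._clock():
            return entry
        self._locks.pop(path, None)  # drop expired or missing entries
        return None

    def acquire(self, path, owner, metadata=None):
        if self._live_entry(path) is not None:
            return False
        self._locks[path] = (owner, metadata, self._clock() + self._lease)
        return True

    def keep_alive(self, path, owner):
        entry = self._live_entry(path)
        if entry is None or entry[0] != owner:
            return False
        self._locks[path] = (owner, entry[1], self._clock() + self._lease)
        return True

    def holder(self, path):
        entry = self._live_entry(path)
        return entry[0] if entry else None
```

If the owner stops sending keep-alives, the lease lapses and another client can take the lock, which is the property a system needs to guarantee a single active holder at a time.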
Master servers: assign tablets to tablet servers, keep track of where tablets are located, and redistribute tasks as needed.
Tablet servers: handle read/write requests for tablets and split tablets when they exceed size limits (usually 100 MB to 200 MB). If a tablet server fails, about 100 other tablet servers each pick up one of its tablets and the system recovers.
Lock servers: instances of the Chubby distributed lock service. Many actions within BigTable require acquiring locks, including opening tablets for writing, ensuring that there is no more than one active Master at a time, and access control checking.
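The recovery step, where many surviving servers each take over a small share of a failed server's tablets, can be sketched as follows. This is an illustrative assumption about the reassignment policy (least-loaded server first, ties broken by name), not the master's actual algorithm; the function name and data layout are hypothetical.

```python
def reassign_after_failure(assignment, failed_server):
    """Spread a failed server's tablets over the surviving servers.

    assignment maps server name -> list of tablet ids. Each orphaned
    tablet goes to whichever survivor currently holds the fewest.
    """
    survivors = {
        server: list(tablets)
        for server, tablets in assignment.items()
        if server != failed_server
    }
    if not survivors:
        raise ValueError("no surviving tablet servers")
    for tablet in assignment.get(failed_server, []):
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(tablet)
    return survivors
```

With as many survivors as orphaned tablets and equal starting load, each survivor ends up with exactly one extra tablet, so no single machine absorbs the whole failure.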
A slice of an example table that stores Web pages. The row name is a reversed URL. The contents column family contains the page contents, and the anchor column family contains the text of any anchors that reference the page. CNN's home page is referenced by both the Sports Illustrated and the MY-look home pages, so the row contains columns named anchor:cnnsi.com and anchor:my.look.ca. Each anchor cell has one version; the contents column has three versions, at timestamps t3, t5, and t6.
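The example table above can be modeled as a map from (row, column, timestamp) to a value. The sketch below is a small in-memory approximation for illustration only; the class name `SparseTable` and its methods are hypothetical, and any cell values or anchor timestamps used with it are placeholders rather than data from the original figure.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a sparse map: (row, column, timestamp) -> value."""

    def __init__(self):
        # row key -> column name -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Newest value at or before timestamp (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        usable = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(usable)] if usable else None

    def scan(self, start_row, end_row):
        """Yield (row, column names) in sorted order, start <= row < end."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, sorted(self._rows[row])


table = SparseTable()
# Three versions of the page contents, at timestamps 3, 5, and 6.
table.put("com.cnn.www", "contents:", 3, "<html>v1")
table.put("com.cnn.www", "contents:", 5, "<html>v2")
table.put("com.cnn.www", "contents:", 6, "<html>v3")
# One version per anchor cell; these timestamps are arbitrary.
table.put("com.cnn.www", "anchor:cnnsi.com", 1, "anchor text A")
table.put("com.cnn.www", "anchor:my.look.ca", 1, "anchor text B")
```

Keying rows by reversed URL means pages from the same domain sort next to each other, so a range scan over a domain touches adjacent rows.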
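Here you can find a video of Google's Jeff Dean giving a talk at the University of Washington about the Bigtable content storage system used in Google's backend.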