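I'm building a web application and need to create a branch for some major changes. Those changes require changes to the database schema, so I'd like to put the entire database under git as well.

How do I do that? Is there a specific folder that I can keep under a git repository? How do I know which one it is? How can I be sure I'm adding the right folder?

I need to be sure, because these changes are not backward compatible; I can't afford to get this wrong.

The database in my case is PostgreSQL.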

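Edit:

Someone suggested taking backups and putting the backup file under version control instead of the database itself. To be honest, I find that really hard to swallow.

There has to be a better way.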

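Update:

OK, so there is no better way, but I'm still not quite convinced, so I'll change the question a bit:

I'd like to put the entire database under version control. Which database engine can I use so that the actual database, rather than a dump of it, is under version control?

Would SQLite be git-friendly?

Since this is only the development environment, I can choose whichever database I want.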

Edit2:

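What I really want is not to track my development history, but to be able to switch from my "new radical changes" branch to the "current stable branch" and, for instance, fix some bugs or issues on the stable branch, so that when I switch branches the database automatically becomes compatible with the branch I'm on. I don't care much about the actual data.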


Current answer

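I came across this question because I have a similar problem: something approximating a DB-based directory structure stores 'files', and I need git to manage it. It is distributed across a cloud using replication, so its access point will be via MySQL.

The gist of the answers above is to suggest an alternative solution to the problem asked, which rather misses the point of using Git to manage something in a database, so I'll attempt to answer that question.

Git is a system which, in essence, stores a database of deltas (differences) that can be reassembled, in order, to reproduce a context. Normal use of git assumes that the context is a filesystem and that those deltas are diffs in that filesystem, but really all git is, is a hierarchical database of deltas (hierarchical because, in most cases, each delta is a commit with at least one parent, arranged in a tree).

In theory, as long as you can generate a delta, git can store it. The problem is that git normally expects the context on which it generates deltas to be a filesystem, and similarly, when you check out a point in the git hierarchy, it expects to generate a filesystem.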

If you want to manage change in a database, you have two discrete problems, and I would address them separately if I were you. The first is schema, the second is data (although in your question you state that data isn't something you're concerned about). A problem I had in the past was a Dev and a Prod database, where Dev could take incremental changes to the schema, and those changes had to be documented in CVS and propagated to live, along with additions to one of several 'static' tables. We did that by having a third database, called Cruise, which contained only the static data. At any point the schemas from Dev and Cruise could be compared, and we had a script that took the diff of those two files and produced an SQL file containing ALTER statements to apply it. Similarly, any new data could be distilled to an SQL file containing INSERT commands. As long as fields and tables are only added, and never deleted, the process can automate generating the SQL statements that apply the delta.
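A minimal sketch of that kind of add-only delta generator, in Python. This is not the script described above; the in-memory `Schema` shape, the function name `additive_migration`, and the table and column names in the usage note are all hypothetical, chosen only to illustrate the idea of turning a schema comparison into ALTER statements and refusing anything that is not a pure addition.

```python
from typing import Dict, List

# Hypothetical in-memory schema representation: {table: {column: SQL type}}.
Schema = Dict[str, Dict[str, str]]


def additive_migration(old: Schema, new: Schema) -> List[str]:
    """Return SQL statements that bring `old` up to `new`, additions only."""
    statements: List[str] = []
    for table in sorted(new):
        columns = new[table]
        if table not in old:
            # The table is new: create it with all of its columns.
            cols = ", ".join(f"{name} {columns[name]}" for name in sorted(columns))
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for name in sorted(columns):
            if name not in old[table]:
                statements.append(
                    f"ALTER TABLE {table} ADD COLUMN {name} {columns[name]};"
                )
    # Removals are not automated, mirroring the add-only constraint.
    removed = [t for t in old if t not in new]
    removed += [f"{t}.{c}" for t in old if t in new for c in old[t] if c not in new[t]]
    if removed:
        raise ValueError(f"non-additive change, handle manually: {sorted(removed)}")
    return statements
```

For example, comparing `{"users": {"id": "integer"}}` against a schema that adds a `users.email` column and a `roles` table yields one `CREATE TABLE roles` statement and one `ALTER TABLE users ADD COLUMN email` statement; running the comparison in the opposite direction raises `ValueError` instead of emitting a DROP.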

The mechanism by which git generates deltas is diff, and the mechanism by which it combines one or more deltas with a file is called merge. If you can come up with a method for diffing and merging from a different context, git should work, but as has been discussed you may prefer a tool that does that for you. My first thought towards solving that is https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#External-Merge-and-Diff-Tools which details how to replace git's internal diff and merge tools. I'll update this answer as I come up with a better solution to the problem, but in my case I expect to only have to manage data changes, insofar as a DB-based filestore may change, so my solution may not be exactly what you need.
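To make the "replace the diff tool" idea concrete, here is a rough Python sketch of an external diff program. It relies on the convention that git invokes such a program with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode). Everything else is an assumption for illustration: the function name `external_diff`, the use of `difflib`, and the premise that the two versions are plain-text files (for instance SQL dumps) rather than a live database.

```python
import difflib
import sys
from typing import List


def external_diff(args: List[str]) -> List[str]:
    """Build a unified diff from the seven arguments git passes to an external diff program."""
    path, old_file, _old_hex, _old_mode, new_file, _new_hex, _new_mode = args
    with open(old_file, encoding="utf-8") as fh:
        old_lines = fh.readlines()
    with open(new_file, encoding="utf-8") as fh:
        new_lines = fh.readlines()
    return list(difflib.unified_diff(old_lines, new_lines, f"a/{path}", f"b/{path}"))


# Only act when called with the seven-argument form; other invocations are ignored.
if __name__ == "__main__" and len(sys.argv) == 8:
    sys.stdout.writelines(external_diff(sys.argv[1:]))
```

Saved as an executable script (the file name is up to you), it could be tried with something like `GIT_EXTERNAL_DIFF=/path/to/script git diff`. A database-aware version would replace the `difflib` call with whatever comparison makes sense for your context; merging would need a separate merge driver.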

Other answers

Use a version-controlled database. There are now several of these.

https://www.dolthub.com/blog/2021-09-17-database-version-control/

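These products don't apply version control on top of another type of database; they are database engines in their own right that support version control operations. So you need to migrate to them, or start building on them in the first place.

I wrote one of them, DoltDB, which combines the interfaces of MySQL and Git. Check it out here: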

https://github.com/dolthub/dolt

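You can't do it without atomicity, and you can't get atomicity without using either pg_dump or a snapshotting filesystem.

My postgres instance is on zfs, which I snapshot occasionally. It's approximately instant and consistent.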

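This question is pretty much answered, but I'd like to complement X-Istence's and Dana the Sane's answers with a small suggestion.

If you need revision control with some degree of granularity, say daily, you could couple a text dump of both the tables and the schema with a tool like rdiff-backup, which does incremental backups. The advantage is that instead of storing snapshots of daily backups, you simply store the differences from the previous day.

This gives you the advantage of revision control without wasting too much space.

In any case, using git directly on big flat files that change very frequently is not a good solution. If your database becomes too big, git will start to have some problems managing the files.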

I think X-Istence is on the right track, but there are a few more improvements you can make to this strategy. First, use:

$ pg_dump --schema-only ...

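to dump the tables, sequences, and so on, and place this file under version control. You'll use it to separate the compatibility changes between your branches.

Next, perform a data dump for the set of tables that contain the configuration your application needs to operate (you should probably skip user data and the like), such as form defaults and other data that users cannot modify. You can do this selectively with: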

$ pg_dump --table=.. <or> --exclude-table=..

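This is a good idea because the repo can get really clunky once the database reaches 100 MB or more when doing a full data dump. A better idea is to back up the minimal set of data you need to test your app. If your default data is very large, though, this may still cause problems.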
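The two dump steps above could be scripted roughly as follows. This is a sketch under assumptions, not a prescribed workflow: it assumes the first step is meant to dump definitions only (pg_dump's `--schema-only` flag) and the second to dump rows for a chosen set of tables (`--data-only` with `--table`). The function names, the database name `appdb`, the output paths, and the table name `form_defaults` are all hypothetical.

```python
import shlex
from typing import List, Sequence


def schema_dump_cmd(dbname: str, outfile: str) -> List[str]:
    """pg_dump invocation that writes object definitions only, as plain SQL."""
    return ["pg_dump", "--schema-only", "--format=plain", f"--file={outfile}", dbname]


def config_data_cmd(dbname: str, outfile: str, tables: Sequence[str]) -> List[str]:
    """pg_dump invocation that writes rows for the named tables only."""
    cmd = ["pg_dump", "--data-only", "--format=plain", f"--file={outfile}"]
    cmd += [f"--table={t}" for t in tables]
    return cmd + [dbname]


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be reviewed first.
    print(shlex.join(schema_dump_cmd("appdb", "db/schema.sql")))
    print(shlex.join(config_data_cmd("appdb", "db/config_data.sql", ["form_defaults"])))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once you are happy with it. Keeping the schema and the configuration data in separate plain-text files means each branch carries a small, diffable description of the database it expects.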

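If you absolutely need to place full backups in the repo, consider doing it in a branch outside of your source tree. An external backup system with some reference to the matching svn rev is probably the best option for this, though.

Also, I suggest using text-format dumps rather than binary ones for revision purposes (for the schema at least), since these are easier to diff.

Finally, have a look at the postgres backup documentation if you haven't already. The way you comment on backing up 'the database' rather than a dump makes me wonder whether you're thinking of filesystem-based backups (see section 23.2 for caveats).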
