

allows me to choose between including or excluding output,
prevents me from accidentally committing output if I do not want it,
allows me to keep output in my local version,
allows me to see when I have changes in the inputs using my version control system (i.e. if I only version control the inputs but my local file has outputs, then I would like to be able to see if the inputs have changed (requiring a commit). Using the version control status command will always register a difference since the local file has outputs.)
allows me to update my working notebook (which contains the output) from an updated clean notebook. (update)


I accidentally commit a version with the output, thereby polluting my repository.
I clear output to use version control, but would really rather keep the output in my local copy (sometimes it takes a while to reproduce, for example).
Some of the scripts that strip output change the format slightly compared to the Cell/All Output/Clear menu option, thereby creating unwanted noise in the diffs. This is resolved by some of the answers.
When pulling changes to a clean version of the file, I need to find some way of incorporating those changes in my working notebook without having to rerun everything. (update)



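Update: I have been playing with my modified notebook version, which optionally saves a .clean version with every save, using Gregory Crosswhite's suggestions. This satisfies most of my constraints but leaves the following unresolved: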

This is not yet a standard solution (it requires a modification of the ipython source). Is there a way of achieving this behaviour with a simple extension? It needs some sort of on-save hook.
A problem I have with the current workflow is pulling changes. These will come in to the .clean file, and then need to be integrated somehow into my working version. (Of course, I can always re-execute the notebook, but this can be a pain, especially if some of the results depend on long calculations, parallel computations, etc.) I do not have a good idea about how to resolve this yet. Perhaps a workflow involving an extension like ipycache might work, but that seems a little too complicated.
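One possible direction for the pulling problem, offered only as a rough heuristic and not as something proposed in this thread: after pulling, copy outputs from the working notebook into the updated clean notebook for every code cell whose source is unchanged, so that only edited cells need re-running. The sketch assumes the worksheets/cells JSON layout used by the stripping scripts discussed here, with code cells storing their source under "input" (falling back to "source"). The function name carry_over_outputs and its helpers are hypothetical.

```python
import copy


def _cell_source(cell):
    """Return a code cell's source as a single string."""
    text = cell.get("input", cell.get("source", ""))
    return "".join(text) if isinstance(text, list) else text


def _code_cells(notebook):
    """Yield every code cell of a worksheets-style notebook dict."""
    for worksheet in notebook.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            if cell.get("cell_type") == "code":
                yield cell


def carry_over_outputs(clean_nb, working_nb):
    """Return a copy of clean_nb with outputs reused from working_nb.

    Outputs are reused only for cells whose source is identical to the
    source of some cell in the working notebook (assumption: identical
    source means the old output is still acceptable).
    """
    cached = {}
    for cell in _code_cells(working_nb):
        # Keep the first working cell seen for each distinct source.
        cached.setdefault(_cell_source(cell), cell)

    merged = copy.deepcopy(clean_nb)
    for cell in _code_cells(merged):
        old = cached.get(_cell_source(cell))
        if old is not None:
            cell["outputs"] = copy.deepcopy(old.get("outputs", []))
            if "prompt_number" in old:
                cell["prompt_number"] = old["prompt_number"]
    return merged
```

Cells that changed upstream come back without output and still have to be re-executed. Reused output can also be stale if an earlier cell it depends on was edited, so this is at best a convenience, not a guarantee of consistency.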



When the notebook is running, one can use the Cell/All Output/Clear menu option for removing the output.
There are some scripts for removing output, such as the script nbstripout.py, which removes the output but does not produce the same result as using the notebook interface. This was eventually included in the ipython/nbconvert repo, but this has been closed stating that the changes are now included in ipython/ipython, but the corresponding functionality seems not to have been included yet. (update) That being said, Gregory Crosswhite's solution shows that this is pretty easy to do, even without invoking ipython/nbconvert, so this approach is probably workable if it can be properly hooked in. (Attaching it to each version control system, however, does not seem like a good idea; this should somehow hook in to the notebook mechanism.)




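977: Notebook feature requests (Open).
1280: Clear-all on save option (Open). (From the discussion below.)
3295: autoexported notebooks: only export explicitly marked cells (Closed). Resolved by extensions 11, Add writeandexecute magic (Merged).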


1621: clear In[] prompt numbers on "Clear All Output" (Merged). (See also 2519 (Merged).)
1563: clear_output improvements (Merged).
3065: diff-ability of notebooks (Closed).
3291: Add the option to skip output cells when saving. (Closed). This seems extremely relevant, but it was closed with the suggestion to use a "clean/smudge" filter. A relevant question, "what can you use if you want to strip off output before running git diff?", seems not to have been answered.
3312: WIP: Notebook save hooks (Closed).
3747: ipynb -> ipynb transformer (Closed). This is rebased in 4175.
4175: nbconvert: Jinjaless exporter base (Merged).
142: Use STDIN in nbstripout if no input is given (Open).








from IPython.nbformat import current
import io
from os import remove, rename
from shutil import copyfile
from subprocess import Popen
from sys import argv

for filename in argv[1:]:
    # Backup the current file
    backup_filename = filename + ".backup"
    copyfile(filename, backup_filename)

    try:
        # Read in the notebook
        with io.open(filename, 'r', encoding='utf-8') as f:
            notebook = current.reads(f.read(), format="ipynb")

        # Strip out all of the output and prompt_number sections
        for worksheet in notebook["worksheets"]:
            for cell in worksheet["cells"]:
                cell.outputs = []
                if "prompt_number" in cell:
                    del cell["prompt_number"]

        # Write the stripped file
        with io.open(filename, 'w', encoding='utf-8') as f:
            current.write(notebook, f, format='ipynb')

        # Run git add to stage the non-output changes
        print("git add", filename)
        Popen(["git", "add", filename]).wait()

    finally:
        # Restore the original file;  remove is needed in case
        # we are running in windows.
        remove(filename)
        rename(backup_filename, filename)
Once the script has been run on the files whose changes you want to commit, just run git commit.
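The script above targets the older notebook format, where cells live under worksheets and the execution counter is called prompt_number. As a rough sketch only, and assuming the newer nbformat 4 JSON layout (a top-level "cells" list, with "outputs" and "execution_count" on code cells), the same stripping step could be written against the raw JSON using just the standard library. The function name strip_v4_outputs is made up for this illustration.

```python
import copy
import json


def strip_v4_outputs(notebook):
    """Return a copy of a v4-style notebook dict with outputs removed.

    Assumes the nbformat 4 layout: a top-level "cells" list in which
    code cells carry "outputs" and "execution_count".
    """
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Small in-memory example; a real notebook would be loaded with json.load.
example = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "source": "1 + 1", "metadata": {},
         "execution_count": 3,
         "outputs": [{"output_type": "execute_result"}]},
        {"cell_type": "markdown", "source": "Some notes", "metadata": {}},
    ],
}
print(json.dumps(strip_v4_outputs(example), indent=1, sort_keys=True))
```

Working on a copy leaves the in-memory notebook untouched, which mirrors the backup-and-restore idea of the script above without touching files on disk.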



Save a file with this content somewhere (for the following, let us assume ~/bin/ipynb_output_filter.py).
Make it executable (chmod +x ~/bin/ipynb_output_filter.py).
Create the file ~/.gitattributes, with the following content:
    *.ipynb    filter=dropoutput_ipynb
Run the following commands:
    git config --global core.attributesfile ~/.gitattributes
    git config --global filter.dropoutput_ipynb.clean ~/bin/ipynb_output_filter.py
    git config --global filter.dropoutput_ipynb.smudge cat
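Git runs a clean filter with the file's working-tree content on standard input and stores whatever the filter writes to standard output. The filter script referred to above is not reproduced in this thread, so the following is only a minimal sketch of what a filter of this kind might look like, not the author's script. It assumes the older worksheets/prompt_number layout used by the staging script earlier in this thread; the function name clean_stream is hypothetical.

```python
import io
import json


def clean_stream(stream_in, stream_out):
    """Read notebook JSON from stream_in and write it, without outputs, to stream_out."""
    notebook = json.load(stream_in)
    for worksheet in notebook.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            if "outputs" in cell:
                cell["outputs"] = []
            cell.pop("prompt_number", None)
    json.dump(notebook, stream_out, sort_keys=True, indent=1,
              separators=(",", ": "))
    stream_out.write("\n")


# When installed as the clean filter, the script would instead end with:
#     import sys
#     clean_stream(sys.stdin, sys.stdout)
# Here the same function is exercised on an in-memory notebook.
demo_in = io.StringIO(json.dumps({
    "worksheets": [{"cells": [
        {"cell_type": "code", "input": "1 + 1", "prompt_number": 7,
         "outputs": [{"output_type": "pyout", "text": "2"}]},
    ]}],
}))
demo_out = io.StringIO()
clean_stream(demo_in, demo_out)
print(demo_out.getvalue())
```

The JSON formatting chosen here (sorted keys, one-space indent) is an assumption and may not match what the notebook itself writes, which is exactly the kind of diff noise the question complains about.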



It works only with git.
In git, if you are in branch somebranch and you do git checkout otherbranch; git checkout somebranch, you usually expect the working tree to be unchanged. Here instead you will have lost the output and cell numbering of notebooks whose source differs between the two branches.
More generally, the output is not versioned at all, as with Gregory's solution. In order to not just throw it away every time you do anything involving a checkout, the approach could be changed by storing it in separate files (but notice that at the time the above code is run, the commit id is not known!), and possibly versioning them (but notice this would require something more than a git commit notebook_file.ipynb, although it would at least keep git diff notebook_file.ipynb free from base64 garbage).
That said, incidentally, if you do pull code (i.e. committed by someone else not using this approach) which contains some output, the output is checked out normally. Only the locally produced output is lost.



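If you do adopt the solution as I suggested it, that is, globally, you will run into trouble if there is some git repo in which you want to version output. So if you want to disable the output filtering for a specific git repository, simply create inside it a file .git/info/attributes, with the content **.ipynb filter=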


The code is now maintained in its own git repo.
If the instructions above result in ImportErrors, try adding "ipython" before the path of the script:
    git config --global filter.dropoutput_ipynb.clean ipython ~/bin/ipynb_output_filter.py

EDIT: May 2016 (updated February 2017): there are several alternatives to my script. For completeness, here is a list of those I know: nbstripout (other variants), nbstrip, jq.




It provides a CLI with a git-inspired syntax to track/update/diff notebooks inside your git repo.


# add a notebook to be tracked
gitnb add SomeNotebook.ipynb

# check the changes before committing
gitnb diff SomeNotebook.ipynb

# commit your changes (to your git repo)
gitnb commit -am "I fixed a bug"

Note that the last step, where I'm using "gitnb commit", is committing to your git repo. It is essentially a wrapper for:

# get the latest changes from your python notebooks
gitnb update

# commit your changes ** this time with the native git commit **
git commit -am "I fixed a bug"



import os
from subprocess import check_call

def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    check_call(['ipython', 'nbconvert', '--to', 'script', fname], cwd=d)

c.FileContentsManager.post_save_hook = post_save
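The same hook mechanism could, in principle, cover the on-save requirement raised in the question. As a sketch only, a hook with the same signature might write an output-free copy next to each saved notebook. The ".clean" suffix, the function name write_clean_copy, and the assumption that the saved file uses a v4-style layout with a top-level "cells" list are all choices made for this illustration.

```python
import io
import json


def write_clean_copy(model, os_path, contents_manager):
    """post-save hook: write an output-free copy of the notebook to os_path + '.clean'"""
    if model['type'] != 'notebook':
        return  # only do this for notebooks

    # Read back the notebook that was just saved.
    with io.open(os_path, 'r', encoding='utf-8') as f:
        notebook = json.load(f)

    # Drop outputs and execution counters from code cells (v4 layout assumed).
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') == 'code':
            cell['outputs'] = []
            cell['execution_count'] = None

    # Write the stripped copy alongside the working notebook.
    text = json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False)
    with io.open(os_path + '.clean', 'w', encoding='utf-8') as f:
        f.write(text + u'\n')


# In the notebook configuration file, where `c` is defined, this would be
# registered the same way as the hook above:
# c.FileContentsManager.post_save_hook = write_clean_copy
```

With a hook like this, only the .clean file would be put under version control, while the working notebook keeps its output. It does not address merging pulled changes back into the working notebook.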
