snilzzor 8 days ago

I've been clearing output with nbconvert before putting the notebook into version control. I have a pre-commit hook and a check in CI. This works for my use case, but I can understand needing to preserve output.

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace my_notebook_name.ipynb
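For anyone curious what that preprocessor actually does, it mostly boils down to resetting two fields on every code cell. A minimal stdlib-only sketch (the function name is made up, and this treats the .ipynb file as plain JSON rather than going through nbconvert):

```python
import json
import sys

def clear_outputs(nb: dict) -> dict:
    """Reset output state on every code cell, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # rendered results, tracebacks, images
            cell["execution_count"] = None  # the [N] counter shown in the UI
    return nb

if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. my_notebook_name.ipynb
    with open(path) as f:
        nb = json.load(f)
    with open(path, "w") as f:
        json.dump(clear_outputs(nb), f, indent=1)
```

Something like this could sit directly in a pre-commit hook if you'd rather not depend on nbconvert there.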

PaulHoule 9 days ago

It is simple. Code is one thing and data is another thing; you can mix arbitrary code with arbitrary data but the result might not make sense!

Neurotypicals have a hard time with this kind of contradiction: they will try one simplistic answer that almost works, then try a different one, then go back to the old one, and eventually they get interested in something else and give up.

For instance, it makes sense to strip the data out of a Jupyter notebook before checking it in; you can version-manage the code that way. However, people also really want to open the notebook on GitHub and see the analysis, the data, and the results.

  • chewxy 9 days ago

    > Neurotypicals have a hard time with this kind of contradiction ...


    The problem with Jupyter notebooks and version control is that Jupyter notebooks encapsulate temporality. You see this in the little [N] boxes on the left of each cell.
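    Those [N] counters are stored in the file itself, which is why a plain `git diff` churns even when no code changed. A minimal illustration (cell contents here are invented) of how the same cell looks on disk after two separate runs:

```python
import json

# A code cell as it appears inside the .ipynb JSON after two separate runs.
# Only the execution order changed, not the code, yet the file differs.
run_1 = {"cell_type": "code", "execution_count": 2, "source": ["x = 1\n"], "outputs": []}
run_2 = {"cell_type": "code", "execution_count": 7, "source": ["x = 1\n"], "outputs": []}

same_code = run_1["source"] == run_2["source"]
same_file = json.dumps(run_1) == json.dumps(run_2)
```

    Here `same_code` is True while `same_file` is False, so version control sees a change that carries no information.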

    I suppose this is what you mean by "data"? Its use is a little atypical.

    I have a slightly different workflow. I observed that the top of my Jupyter notebooks is generally more or less static compared to the lower parts. This lets the top parts slowly coalesce into a proper program over git commits.

    I also try to have a linear notion of variables (i.e. a variable is defined exactly once and used exactly once - permitting construction of values in loops of course).
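    A toy illustration of that linear style (the names are invented): each binding appears once and is consumed once, so the cells replay cleanly top to bottom in a fresh kernel with no hidden state:

```python
# Non-linear style: re-running a cell like this out of order changes the
# result, because it depends on how many times it has already executed:
#   data = data + [len(data)]
#
# Linear style: every name is bound exactly once and used exactly once.
raw = [3, 1, 2]
cleaned = sorted(raw)   # `raw` used once
total = sum(cleaned)    # `cleaned` used once
```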

    This style of development helps with version control as well. Restart the kernel and clear output, then run each cell exactly once before a git commit.
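    That "restart, then run each cell exactly once" discipline is easy to check mechanically: after a clean top-to-bottom run, the [N] counters must read 1, 2, 3, ... A small stdlib-only checker one could wire into a pre-commit hook (the function name is made up):

```python
def ran_linearly(nb: dict) -> bool:
    """True if every code cell was executed exactly once, in document order."""
    counts = [c.get("execution_count")
              for c in nb.get("cells", [])
              if c.get("cell_type") == "code"]
    return counts == list(range(1, len(counts) + 1))
```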

  • dvlat 9 days ago

    One solution could be storing the notebook separately from its rendered version.
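    One way to sketch that split (the directory layout here is hypothetical): keep the stripped .ipynb as the versioned source and regenerate an HTML rendering next to it with nbconvert. This only constructs the command; actually running it requires jupyter to be installed:

```python
from pathlib import Path

def render_command(src: str) -> list:
    """Build an nbconvert invocation that writes an HTML rendering into rendered/."""
    out_dir = Path("rendered")  # hypothetical directory for committed renderings
    return ["jupyter", "nbconvert", "--to", "html",
            "--output-dir", str(out_dir), src]

# To actually render (requires jupyter):
# subprocess.run(render_command("notebooks/analysis.ipynb"), check=True)
```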