Ultra-modern Python Cookiecutters

code
data science
research
Author

Arthur Turrell

Published

February 23, 2025

In January 2020, Claudio Jolowicz published an extremely influential post on Hypermodern Python. It was extremely influential on me, anyway, because it introduced me to a number of tools that I now consider essential to creating solid, high-quality and low-maintenance Python packages. As part of the article, a cookiecutter template was released to help people create new packages with all of these exciting features.1

1 Cookiecutter templates allow people to fill in a few details in the command line, like project name and Python version, to both copy a repo and populate it with their choices. It’s a simple trick that saves a lot of time in creating a basic project structure. The cookiecutter Python package is used to populate the templates.

But a lot has changed since 2020, and the brilliant hypermodern python cookiecutter repo hasn’t been updated in two years. There are at least three big changes (we’ll come to these) since then that mean it’s well worth revisiting what a modern (or ultramodern) Python package cookiecutter should look like… and I’ve tried to capture them.

So, in this post, I’m introducing two new, ultramodern repos: a Cookiecutter Python Package and a Cookiecutter Research Project. Neither of them is battle-tested, far from it, but they do deliver the core goods they promise.

(a) Package
(b) Research
Figure 1: Logos for these dev tools

The cookiecutter python package is designed to help people (including me!) build open source software. The cookiecutter research project is designed to help people (especially me!) spend more research time thinking.

Ultramodern tooling

Both of these new cookiecutters take advantage of major developments in Python tooling:

  • Astral’s extremely fast linter and formatter, Ruff, has burst onto the scene since 2020. And in this repo, it replaces Black (the formatter in the hypermodern setup), Flake8 for linting, and isort for sorting imports.

  • The arrival of another tool from Astral, uv, which replaces a whole host of tools (including Poetry, which was part of the hypermodern setup). At a minimum, this is a blazing fast package manager and drop-in replacement for pip, but it also:

    • resolves dependency conflicts
    • produces a lockfile
    • automatically creates virtual environments per-project, per-folder
    • can spin up an environment just to run a single script2
    • builds packages
    • can install some versions of Python for some systems
    • creates a valid pyproject.toml file
    • can install and run tools globally with uv tool install and uv tool run.
  • The release of Quarto, which, among many other things, can build websites, and Quartodoc, which works with Quarto to produce automatic reference documentation from a code base. Quarto means you can write your documentation in a Jupyter Notebook, importing your actual package so that the code and code outputs are in lockstep with the codebase. The addition of Quartodoc means that the API reference in your docs is also exactly what is in your code base.3 Quarto requires a standalone installation, while Quartodoc is a Python package. You can find more about using Quarto in anger for research over at Coding for Economists.

2 Yes, that’s how fast it is!

3 Note that Quartodoc is only relevant for the Cookiecutter Python Package.
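To give a flavour of the uv workflow described above, here’s a sketch of a typical session. The commands follow uv’s documented CLI; the project and package names are purely illustrative:

```bash
# Create a new project with a pyproject.toml and a managed virtual environment
uv init my-project
cd my-project

# Add a dependency: resolves conflicts and updates the uv.lock lockfile
uv add pandas

# Run a script inside the project's environment (created automatically)
uv run python analysis.py

# Install a specific Python version managed by uv
uv python install 3.12

# Install and run a tool globally, outside any project
uv tool install ruff
uv tool run ruff check .
```

Because the environment and lockfile are created and updated as a side effect of `uv add` and `uv run`, there’s no separate “activate the venv” step to forget.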

Cookiecutter Python Package

So what are the key features of the new ultramodern cookiecutter? As of the time of writing, it packs the following:

  • Nox for isolated testing
  • Modern Python dependency management with uv
  • pytest for testing
  • Code formatting with ruff (including formatting and import sorting)
  • xdoctest to check that examples of use in docstrings work as intended
  • typeguard for run-time type checking
  • Git pre-commit hooks for code quality:
    • Ruff lint/format/sort imports
    • check for added large files
    • check TOML
    • check YAML
    • end of file fixer
    • trailing whitespace trimmer
    • nbstripout for ensuring notebook outputs are not committed. (Notebook outputs are included when Quarto pushes the docs website to GitHub pages, however, as you’d expect.)
    • pydoclint for checking docstrings agree with function definitions
  • Continuous Integration/Continuous Deployment with GitHub Actions
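The pre-commit hooks above can be wired up with a .pre-commit-config.yaml along these lines. The repo URLs and hook ids follow each project’s published documentation, but the rev values shown are placeholders you should pin to real releases yourself:

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: vX.Y.Z  # pin to a real Ruff release
    hooks:
      - id: ruff          # lint (import sorting via lint rules)
        args: [--fix]
      - id: ruff-format   # format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: vX.Y.Z
    hooks:
      - id: check-added-large-files
      - id: check-toml
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/kynan/nbstripout
    rev: X.Y.Z
    hooks:
      - id: nbstripout
  - repo: https://github.com/jsh9/pydoclint
    rev: X.Y.Z
    hooks:
      - id: pydoclint
```

Running `pre-commit install` once in the cloned repo activates all of these on every commit.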

For some of this, you’ll need to get API keys. But all of it is free if your repo is public, even on a free GitHub account.

In the choice of what’s included, I’ve tried to strike a balance between not bloating the features and providing a serious foundation for a strong package. I took the decision not to include the following, for the following reasons:

  • an automatic updating utility, like dependabot; mostly because I’m not sure if there’s one that works well with uv yet.
  • a static type checker, like mypy, because it seemed to be of marginal value-add in other projects and its strictness can be daunting. There was a recent announcement that Astral is working on a static type checker code-named red-knot, and I’m inclined to see what that’s like given the quality of their other tools.

Both of these features could be added in future.

Cookiecutter Research Project

The second new dev tool is a Cookiecutter Research Project that is Python-oriented and flexible. It has a number of useful features designed to make starting, and maintaining, a research project less of a slog:

  • a well-designed folder structure with folders for data at different stages, models, notebooks, code, and outputs.
  • sensible defaults on which of these folders are ignored by git (via a .gitignore file). For example, code, references, paper, and slides folders are under version control. But data, logs, outputs, and similar folders are not.
  • a .env file for storing secrets—researchers are increasingly using cloud compute to do research (see this post for doing this with VS Code and Google Cloud.)
  • pre-commit with Ruff (linting, formatting, import sorting), nbstripout, end of file fixer, large file check, trailing whitespace fixer, and toml/yaml checks.
  • uv for managing the Python environment, and making it reproducible via the lockfile.
  • a Makefile with commands for installing the environment (for reproducibility), and for compiling the paper, and the slides.
  • paper and slides based on Quarto—more detail on these below.
  • a project config TOML file where global project settings can be stored. For example, you could have all your chart configurations here, or the hyperparameter settings.

One of the big innovations is the use of Quarto for the paper and the slides (and for the references that get picked up in both.) Quarto is a document, slide, and website publishing tool. It uses an extended form of markdown, supports LaTeX equations, and can execute and insert code into the final document.

One small advantage of Quarto over using LaTeX directly is the use of citation style language files. References are in a .bib file as normal for a LaTeX document with a bibliography, but the style of citations is defined in a .csl (csl = citation style language) file, which is clearer and more flexible than other methods (at least in my view.) You can find a very long list of citation styles, including for most major journals, here.

Another advantage, and hold your horses, because it’s a big one, is that you can automatically update charts, tables, and even text when your results change. People have long pointed their LaTeX document at their results folder so that when the results change, their paper updates. Some people have done it for slides too, if they’re created with Beamer. But now the text in the document can be updated too.

Quarto can support execution of code within the document itself. So you can have a code block that reads in your results and assigns them to the variables, say, number and big_pen. Then the syntax below shows how to insert these numbers directly into the text.

## Report

We find that the heaviest penguin, out of a total of `{python} number` penguins, has a mass of `{python} f"{big_pen:.2f}"` kilograms.

Images and equations can be included using the standard markdown syntax, and LaTeX tables can be included in PDF outputs simply by reading and printing a LaTeX table .tex file with the #| output: asis option in a code block. You can see an example in the paper.qmd file in the repo. That paper is set up to compile into something that looks like the default arXiv style. It actually compiles by creating a .tex file as an intermediary and there’s a Quarto config option to save that, should you need to (eg arXiv prefers you to submit a .tex over a PDF.)
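The #| output: asis trick looks like this in a .qmd file (the table path here is illustrative). Quarto passes whatever the block prints straight through to the intermediate .tex file, so the raw LaTeX lands verbatim in the PDF:

````markdown
```{python}
#| output: asis
#| echo: false
from pathlib import Path

# Print the pre-built LaTeX table so it passes through untouched
print(Path("outputs/tables/regression.tex").read_text())
```
````

Because the table file is read at render time, re-running the analysis and recompiling is enough to refresh the table in the paper.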

Currently, the slides use Reveal.js to output a .html file rather than a PDF, so while the code/text and charts work in the same way, the command to include LaTeX is slightly different—and I’ve only tested it for equations. The syntax is: {{< include path/to/equation.tex >}}. Anything that has an HTML output should work too, and that’s true of most regression table packages (eg Pyfixest.) If you prefer PowerPoint4, you can export to that with Quarto too, though I haven’t tested it with typical research outputs.

4 Some people might. No-one I’ve ever met. Or heard about. But it could happen. Probably.

There are a couple of features one could argue for that I have not implemented for the research cookiecutter:

  • a way to re-run all of the analysis. This is quite bespoke to your project so it was hard to include anything meaningful in the template. There is a Makefile in there, so you could extend that to include a part that executes the analysis—you can see an example along these lines here.5
  • a Dockerfile to run your project reliably across systems. It’s possible this will be added in future. Again, you can find an example research project that uses uv and a Dockerfile here.
  • a way to automatically generate a DAG (directed acyclic graph) of all the operations that go into producing the outputs. I find this very useful to understand research projects (even if they’re my own.) The example I keep mentioning uses make2graph to do this if you want to see what I mean and why it might be useful.

5 For more on making your research project a reproducible analytical pipeline, check out this page on Coding for Economists.
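If you did want to extend the template’s Makefile to re-run the analysis, a sketch might look like the following. The script names and dependency chain are entirely illustrative; the point is that make only re-runs steps whose inputs have changed:

```make
# Hypothetical analysis pipeline: adapt targets to your own scripts.
.PHONY: analysis

data/processed/clean.parquet: code/clean_data.py data/raw/input.csv
	uv run python code/clean_data.py

outputs/results.csv: code/run_model.py data/processed/clean.parquet
	uv run python code/run_model.py

analysis: outputs/results.csv
```

Run `make analysis` and make works backwards through the dependencies, rebuilding only what’s stale—which is also the structure a DAG-visualisation tool like make2graph can read.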

Imagine how much time you will save if new results can be incorporated into your paper and slides simply by recompiling! For me, this is a huge win for the amount of time actually spent on the research question vs doing tedious updating.

Conclusion

While these templates are both still wet behind the ears, they could be useful to people today—which is why I’m releasing them now. Feedback and pull requests are welcome. Finally, if you use either of them to produce your own content, let me know by raising an issue. I’ll add your work to the relevant repo—showing that these cookiecutters can be part of a recipe for success!