Arthur Turrell

Author

Arthur Turrell

Untangling spaghetti code with smartrappy

python

data science

reproducibility

code

analysis

If you’ve ever inherited responsibility for a messy Python codebase, or returned to your own code after a few months, and wondered “what on earth is going on here?”, you’re…

Efficiency in the public sector: analysis and operations

productivity

public sector

work chat

rap

I’ve been thinking a lot about efficiency in the public sector recently. This post looks at ideas for increasing the efficiency of analysis and operations through…

Efficiency in the public sector: communication and co-ordination

productivity

public sector

work chat

management

I’ve been thinking a lot about efficiency in the public sector recently, particularly how we can improve it. In this post, I’ll focus on some ideas for improving…

The data validation landscape in 2025

data science

data

data engineering

public sector

What’s going on in the world of data validation? For those of you who don’t know, data validation is the process of checking data quality in an automated or semi-automated…

Ultra-modern Python Cookiecutters

code

data science

research

In January 2020, Claudio Jolowicz published an extremely influential post on Hypermodern Python. It was extremely influential on me, anyway, because it introduced me to a…

TIL: How to resume sessions on virtual machines

code

cloud

TIL

Today I learned how to resume sessions on virtual machines while using Visual Studio Code remote.

Converting handwritten, maths-heavy lecture notes to markdown using large language models

llm

python

data science

It would be nice to have digital copies of all of those old handwritten lecture notes that I so lovingly put together. Some of them might even still be useful, though I have…

The false economy of bad IT

work chat

productivity

public sector

Many of us will have experienced bad hardware or software at work. Applications that freeze when you try and do something. A lag when typing. Some programmes ceasing to work…

TIL: how to create and work with a MySQL database on Azure

code

cloud

In this TIL, I find out how to create a new MySQL database on Microsoft Azure. This is a place to store structured, tabular data. Note that the instructions below assume you…

TIL: how to create and work with remote blob storage on Azure

code

cloud

In this TIL, I find out how to create a new blob storage account on Microsoft Azure. This is a place to store unstructured data of any kind (as opposed to, say, a SQL…

TIL: how to connect Visual Studio Code to Azure Virtual Machines

code

research

cloud

python

rstats

In a previous blog post, I looked at how to connect desktop-based Visual Studio Code to a Google Cloud Virtual machine; today, it’s how to do the same using a virtual…

TIL: how to create a virtual desktop from a cloud virtual machine

code

data science

research

Researchers frequently want to be able to access a second computer that works like a normal computer (think a virtual desktop rather than a virtual machine + command line)…

Data science with impact

code

python

rstats

data science

public sector

I was recently asked to give a talk at No. 10 Downing Street on the topic of data science with impact and, in this post, I’m going to share some of what I said in that talk.…

Why have a model registry?

code

open-source

cloud

data science

Many large institutions, including in the public sector, have a set of forecasts, predictions, or estimated statistical relationships (perhaps from a linear regression)…

Building an API in the cloud in fewer than 200 lines of code

code

open-source

cloud

data science

python

Cloud tools and Python packages have become so powerful that you can build a (scalable) cloud-based API in fewer than 200 lines of code. In this blog post, you’ll see how to…

The explosion in time series forecasting packages in data science

code

data science

python

time series

There have been a series of sometimes jaw-dropping developments in data science in the last few years, with large language models by far the most prominent (and with good…

TIL: Obsidian, and integrating it with Zotero

writing

research

TIL

I’ve long been interested in how best to store knowledge; so much that I wrote about it in this post (in the context of the public sector). Today I learned how to combine…

The self-storage problem meets chatGPT

code

data

analysis

llm

In a previous post, I looked at four ways we might be able to establish the way that the number of self-storage facilities is trending over time. You can read that post using …

The mystery of stuff: why all the self-storage?

code

data

geospatial

analysis

llm

There’s a mystery at the fringes of our towns and cities: beyond the concrete circulars and just off the dual carriageways, a seemingly growing amount of our stuff is…

Data science maturity and the cloud

code

python

rstats

data science

cloud

work chat

public sector

Data science has enormous potential to do good in the public sector. The efficiencies that are possible from automation and reproducible analytical pipelines alone are…

The public sector could be better at managing knowledge ‘data’: what can we do?

economics

productivity

public sector

Who thinks the public sector is good enough at managing its stock of knowledge; the ideas, strategies, processes, and decisions that go into the efficient provision of…

Why you shouldn’t code on your work laptop

code

open-source

cloud

data science

work chat

“Nobody ever got fired for buying Microsoft” goes an old saying. Actually, it was probably first said in the 1980s in reference to IBM (School Microcomputing Bulletin 1983)…

In praise of APIs (application programming interfaces)

code

data

In this blog, I look at some of reasons why APIs are such a great way to share data.

TIL: How to break RSS feeds

blogging

code

writing

TIL

Note: this is the first post under a new tag called TIL or “today I learned”. These are shorter format posts that lower the barrier to blogging and capture a mini piece of…

Welcome To The New Home for My Blog

news

This is the first post on a brand new blog site: welcome!

Visual Studio Code on the Cloud

code

research

cloud

python

rstats

Visual Studio Code is incredibly powerful, whether it’s for writing markdown, writing quarto (.qmd) files, getting syntax highlighting and peerless language support (eg…

Writing a Research Blog Post

blogging

research

writing

In this post, you will find hints and tips for writing impactful blog posts that summarise research or analysis. This is a cross-post with a new page on Coding for Economists…

Setting up R in Visual Studio Code

code

rstats

This post will show you how to set up Visual Studio Code as an integrated development environment for the statistical language R. This will include some useful features such…

Three ways to blog with code

code

blogging

Typically, what I want to do when I create a blog post is to combine text, code, and code output, and then push it to the github repo that hosts my website. But what are the…

10 less well-known Python packages

code

open-source

python

(Remember that to use these, you will need to run pip install packagename on the command line.)

Get organised

code

data

research

This monster blog post is going to discuss how to organise your a data science project or research project: data, code and outputs. I’ll cover how to structure the project…

Specification curve analysis

code

research

open-source

econometrics

Since publishing this post, I have written the specification_curve package for Python. specification_curve automates some aspects of specification curve analysis, namely…

Putting women scientists onto Wikipedia

outreach

code

data

In a previous post, I shared links about the predictors for not participating in higher education, and about how it is difficult to reach audiences in “remote rural or…

Who is not participating in Higher Education?

outreach

data

Given my work in both economics and Science, Technology, Engineering, and Mathematics (STEM), I’ve become interested in what factors determine groups’ participation in…

Why the latest, most exciting thing in machine learning is… game theory

research

machine-learning

And when I say latest, this particular method was invented in 1953.

Econometrics in Python Part IV - Running many regressions alongside pandas

code

econometrics

python

The fourth in the series of posts covering econometrics in Python. This time: automating the boring business of running multiple regressions on columns in a pandas dataframe.

Econometrics in Python part III - Estimating heterogeneous treatment effects using random forests

code

econometrics

machine-learning

python

The third in a series of posts covering econometrics in Python. Here I look at ‘causal forests’.

Econometrics in Python Part II - Fixed effects

code

econometrics

python

In this second in a series on econometrics in Python, I’ll look at how to implement fixed effects.

Econometrics in Python part I - Double machine learning

code

econometrics

python

The idea is that this will be the first in a series of posts covering econometrics in Python.

Making a publication quality plot with Python (and latex)

research

visualisation

code

python

I now recommend the style file below for quick, publication quality plots in Python using Matplotlib (tested on 3.3.4 and Python 3.8). To use the style, save it in a file…

The ONS API

data

code

python

The Office for National Statistics (ONS) produces most of the macroeconomic statistics for the UK. I was delighted to discover recently that they had been working on an API.

A better narrative on narratives

research

news

Pigou, Keynes and Shiller all recognised the importance of narratives and sentiment for the economy. But we don’t know too much about how narratives spread. One of the most…