Markov Wanderer


National statistics matter: they have a direct bearing on everything from government targets, to funding formulae, to the focus of the media, to the UK’s debt. But there…

If you’ve ever inherited responsibility for a messy Python codebase, or returned to your own code after a few months, and wondered “what on earth is going on here?”, you’re…

I’ve been thinking a lot about efficiency in the public sector recently. This post looks at ideas for increasing the efficiency of analysis and operations through…

I’ve been thinking a lot about efficiency in the public sector recently, particularly how we can improve it. In this post, I’ll focus on some ideas for improving…

What’s going on in the world of data validation? For those of you who don’t know, data validation is the process of checking data quality in an automated or semi-automated…

In January 2020, Claudio Jolowicz published an extremely influential post on Hypermodern Python. It was extremely influential on me, anyway, because it introduced me to a…

Today I learned how to resume sessions on virtual machines while using Visual Studio Code remote.

It would be nice to have digital copies of all of those old handwritten lecture notes that I so lovingly put together. Some of them might even still be useful, though I have…

Many of us will have experienced bad hardware or software at work. Applications that freeze when you try to do something. A lag when typing. Some programs ceasing to work…

In this TIL, I find out how to create a new MySQL database on Microsoft Azure. This is a place to store structured, tabular data. Note that the instructions below assume you…

In this TIL, I find out how to create a new blob storage account on Microsoft Azure. This is a place to store unstructured data of any kind (as opposed to, say, a SQL…

In a previous blog post, I looked at how to connect desktop-based Visual Studio Code to a Google Cloud virtual machine; today, it’s how to do the same using a virtual…

Researchers frequently want to be able to access a second computer that works like a normal computer (think a virtual desktop rather than a virtual machine + command line)…

I was recently asked to give a talk at No. 10 Downing Street on the topic of data science with impact and, in this post, I’m going to share some of what I said in that talk.…

Many large institutions, including in the public sector, have a set of forecasts, predictions, or estimated statistical relationships (perhaps from a linear regression)…

Cloud tools and Python packages have become so powerful that you can build a (scalable) cloud-based API in fewer than 200 lines of code. In this blog post, you’ll see how to…

There have been a series of sometimes jaw-dropping developments in data science in the last few years, with large language models by far the most prominent (and with good…

I’ve long been interested in how best to store knowledge; so much that I wrote about it in this post (in the context of the public sector). Today I learned how to combine…

In a previous post, I looked at four ways we might be able to establish the way that the number of self-storage facilities is trending over time. You can read that post using …

There’s a mystery at the fringes of our towns and cities: beyond the concrete circulars and just off the dual carriageways, a seemingly growing amount of our stuff is…

Data science has enormous potential to do good in the public sector. The efficiencies that are possible from automation and reproducible analytical pipelines alone are…

Who thinks the public sector is good enough at managing its stock of knowledge: the ideas, strategies, processes, and decisions that go into the efficient provision of…

“Nobody ever got fired for buying Microsoft” goes an old saying. Actually, it was probably first said in the 1980s in reference to IBM (School Microcomputing Bulletin 1983)

In this blog, I look at some of the reasons why APIs are such a great way to share data.

Note: this is the first post under a new tag called TIL or “today I learned”. These are shorter format posts that lower the barrier to blogging and capture a mini piece of…

This is the first post on a brand new blog site: welcome!

Visual Studio Code is incredibly powerful, whether it’s for writing Markdown, writing Quarto (.qmd) files, getting syntax highlighting and peerless language support (eg…

In this post, you will find hints and tips for writing impactful blog posts that summarise research or analysis. This is a cross-post with a new page on Coding for Economists

This post will show you how to set up Visual Studio Code as an integrated development environment for the statistical language R. This will include some useful features such…

Typically, what I want to do when I create a blog post is to combine text, code, and code output, and then push it to the GitHub repo that hosts my website. But what are the…

(Remember that to use these, you will need to run pip install packagename on the command line.)

This monster blog post is going to discuss how to organise your data science or research project: data, code and outputs. I’ll cover how to structure the project…

Since publishing this post, I have written the specification_curve package for Python. specification_curve automates some aspects of specification curve analysis, namely…

In a previous post, I shared links about the predictors for not participating in higher education, and about how it is difficult to reach audiences in “remote rural or…

Given my work in both economics and Science, Technology, Engineering, and Mathematics (STEM), I’ve become interested in what factors determine groups’ participation in…

And when I say latest, this particular method was invented in 1953.

The fourth in the series of posts covering econometrics in Python. This time: automating the boring business of running multiple regressions on columns in a pandas dataframe.

The third in a series of posts covering econometrics in Python. Here I look at ‘causal forests’.

In this second in a series on econometrics in Python, I’ll look at how to implement fixed effects.

The idea is that this will be the first in a series of posts covering econometrics in Python.

I now recommend the style file below for quick, publication-quality plots in Python using Matplotlib (tested on 3.3.4 and Python 3.8). To use the style, save it in a file…

The Office for National Statistics (ONS) produces most of the macroeconomic statistics for the UK. I was delighted to discover recently that they had been working on an API.

Pigou, Keynes and Shiller all recognised the importance of narratives and sentiment for the economy. But we don’t know too much about how narratives spread. One of the most…