How much economic growth can you generate from making a private dataset open? Let’s create a simple model.
Assume firms have total factor productivity \(A\), with baseline \(A_0\). When a firm has access to new data relevant to its production, it moves to TFP \(A_d>A_0\) with
\[ A_{\text{data}} \equiv A_d = A_0(1+\gamma),\qquad \gamma>0 \]
Here \(\gamma\) is the proportional TFP boost from using the data, \(d\). A fraction \(f\in(0,1)\) of firms can use \(d\) (with \(f\ll1\) for any given data release). The other \(1-f\) firms do not use the data directly, but they do get some spillover benefits, which have strength \(\varphi\ge 0\). The non-data firms’ TFP becomes
\[ A_{\text{other}} = A_0\big(1 + \varphi\cdot f\cdot\gamma\big). \]
Aggregate TFP \(A_{\mathrm{agg}}\) is the weighted average of firm TFPs:
\[ \begin{aligned} A_{\mathrm{agg}} &= f\cdot A_{\text{data}} \;+\; (1-f)\cdot A_{\text{other}} \\ &= f A_0(1+\gamma) \;+\; (1-f) A_0\big(1+\varphi f\gamma\big)\\ &= A_0\Big[1 \;+\; f\gamma \;+\; (1-f)\varphi f\gamma\Big]\\ &= A_0\Big[1 \;+\; f\gamma\big(1+\varphi(1-f)\big)\Big]. \end{aligned} \]
If \(f\) is small, then \(1-f\approx 1\) and you have
\[ A_{\mathrm{agg}}\approx A_0\big[1 + f\gamma(1+\varphi)\big]. \]
What is the (one-off) growth effect of releasing data?
Assume capital is fixed in the short run (so output changes proportionally to TFP). Then the immediate jump in aggregate output is
\[ g \equiv \frac{\Delta Y}{Y} \approx \frac{\Delta A_{\mathrm{agg}}}{A_0} = f\gamma\big(1+\varphi(1-f)\big)\approx f\gamma(1+\varphi). \]
What’s going on here? A small fraction \(f\) of firms each get a boost \(\gamma\), which raises aggregate TFP by \(f\gamma\) directly; spillovers to the rest of the economy then scale that direct effect up by a factor of roughly \(1+\varphi\).
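To make the algebra concrete, here is a minimal Python sketch of the model so far. The function names and the illustrative parameter values are my own choices for the sketch, not anything calibrated; it simply checks that the weighted-average definition of aggregate TFP and the closed-form growth expression agree, and shows how close the small-\(f\) approximation is.

```python
def aggregate_tfp(a0, f, gamma, phi):
    """Aggregate TFP as the weighted average of the two firm types."""
    a_data = a0 * (1 + gamma)             # firms that use the data directly
    a_other = a0 * (1 + phi * f * gamma)  # firms that only receive spillovers
    return f * a_data + (1 - f) * a_other


def growth_exact(f, gamma, phi):
    """One-off growth effect g = f*gamma*(1 + phi*(1 - f)), capital held fixed."""
    return f * gamma * (1 + phi * (1 - f))


def growth_approx(f, gamma, phi):
    """Small-f approximation g ~ f*gamma*(1 + phi)."""
    return f * gamma * (1 + phi)


# Arbitrary illustrative values (assumptions for this sketch only).
a0, f, gamma, phi = 1.0, 0.01, 0.05, 0.5

g_from_definition = aggregate_tfp(a0, f, gamma, phi) / a0 - 1
print(f"from weighted average: {g_from_definition:.6%}")
print(f"closed form:           {growth_exact(f, gamma, phi):.6%}")
print(f"small-f approximation: {growth_approx(f, gamma, phi):.6%}")
```

The approximation always slightly overstates the exact figure, because it ignores the fact that the data-using firms do not also receive the spillover.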
Let’s see how this would look for some plausible numbers. Let’s say a dataset was made public that affects 10,000 firms. For reference, there are around 25,000 estate agents in the UK. You could imagine a dataset relevant to addresses being made open and some number of those estate agents benefitting from it. The UK has 2.72 million businesses, so we’re looking at \(f \approx 0.37\%\). Let’s say the data makes those firms 1% more productive in TFP terms, so \(\gamma = 1\%\).
And what of spillovers? It’s not so clear, but for research and development investments, it seems like spillovers are around 2.5 times the private increase (see this Frontier Economics article). So let’s borrow that figure and run with \(\varphi = 2.5\).
Plugging those numbers in, you could see a whole economy one-off growth benefit of
\[ g = f\gamma\big(1+\varphi(1-f)\big) = 0.37\%\cdot 1\%\cdot (1 + 2.5(1-0.37\%)) \approx 0.01 \% \]
Recent UK GDP growth year-on-year has been in the 1% to 2% range, so while 0.01 pp extra is small, it’s not nothing. 2024 UK GDP was 2,851 billion GBP, so we’re talking about roughly 285 million pounds (or around 366 million if you carry the unrounded growth figure of about 0.013% through). Worth having!
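The arithmetic for the made-up UK example can be reproduced in a few lines of Python. This is only a sketch using the numbers quoted in the text; the variable names are mine. Note that it carries the unrounded growth rate through to the pound figure, so the result comes out somewhat higher than if you round \(g\) to 0.01% first.

```python
# Inputs quoted in the text (the 1% boost and the 10,000 firms are made-up).
n_affected = 10_000      # firms that can use the dataset
n_firms = 2_720_000      # UK businesses
gamma = 0.01             # 1% TFP boost for data-using firms
phi = 2.5                # spillover strength borrowed from the R&D literature
gdp_gbp_bn = 2851        # 2024 UK GDP, billions of GBP

f = n_affected / n_firms                    # share of firms directly affected
g = f * gamma * (1 + phi * (1 - f))         # exact one-off growth effect
gain_gbp_m = g * gdp_gbp_bn * 1000          # convert billions to millions

print(f"f    = {f:.3%}")
print(f"g    = {g:.4%}")
print(f"gain = {gain_gbp_m:,.0f} million GBP")
```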
What does this framework tell us about a real case?
There’s a lovely paper by Loomis et al. (2015) that uses contingent valuation to work out the value of opening up the US’s Landsat satellite imagery. The authors find that the value of opening up the data is 1.8 billion USD. Using this and other numbers from the paper, together with our equation, we can back out the productivity boost.
Based on the valuation, the one-time economic benefit is equivalent to 0.012% of US GDP,
\[\frac{\$1.8 \text{ billion}}{ \$15.094 \text{ trillion}} \times 100 \approx 0.012\%\]
This is pretty close to the growth effect from the made-up numbers. For the Landsat case study, there were 13,473 users, so let’s take that as the number of firms affected and divide by the US’s 32 million firms to get \(f\). Keeping \(\varphi = 2.5\),
\[ f\gamma(1+\varphi) = \frac{13473}{32\times10^6} \gamma (1+2.5) = 0.012\% \longrightarrow \gamma \approx 8.1\% \]
This makes sense: compared to the made-up example, the fraction of firms directly affected is substantially smaller at 0.04%, but the growth effect is similar, so the productivity boost must be proportionately higher.
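Backing out \(\gamma\) is just the growth equation rearranged, \(\gamma = g / \big(f(1+\varphi)\big)\). Here is a short Python sketch of that inversion using the figures quoted above; the variable names are mine, and treating each Landsat user as one firm is the same simplifying assumption the text makes.

```python
# Figures quoted in the text for the Landsat case.
value_usd = 1.8e9        # estimated value of opening the data
gdp_usd = 15.094e12      # US GDP used as the denominator
n_users = 13_473         # Landsat users, treated here as firms (assumption)
n_firms = 32e6           # US firms
phi = 2.5                # same spillover strength as the made-up example

g = value_usd / gdp_usd  # one-off growth effect implied by the valuation
f = n_users / n_firms    # share of firms directly affected

# Invert g = f*gamma*(1 + phi) (small-f approximation) for gamma.
gamma_approx = g / (f * (1 + phi))

# Invert the exact expression g = f*gamma*(1 + phi*(1 - f)) for comparison.
gamma_exact = g / (f * (1 + phi * (1 - f)))

print(f"g            = {g:.4%}")
print(f"f            = {f:.4%}")
print(f"gamma approx = {gamma_approx:.2%}")
print(f"gamma exact  = {gamma_exact:.2%}")
```

With \(f\) this small, the exact and approximate inversions are practically indistinguishable, so the choice between them does not matter for a back-of-the-envelope number.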
Of course, this is all just back-of-the-envelope, tons of caveats apply, and there are downsides to releasing (some) data. But, to an order of magnitude, open data can have a positive effect on growth.