“Nobody ever got fired for buying Microsoft” goes an old saying. Actually, it was probably first said in the 1980s in reference to IBM (School Microcomputing Bulletin 1983), but the meaning remains the same: as the Chief Technology Officer, or similar, you’re not going to get in trouble for buying the boring old thing that everyone else is buying. (But you might get in trouble if you bought something that many of your staff weren’t comfortable with.) The attraction of what is considered safe, known, and widely used is especially strong for large public institutions because they tend to be risk averse, subject to intense scrutiny, and responsible for statutory processes that simply cannot go wrong.
Enterprise IT can sacrifice productivity for safety
There’s a trade-off, though: when it comes to software and hardware, what is widely used and known may not be at the cutting edge and, if it’s made endlessly safe, it can actively stifle innovation. To create at least a sense of safety, the “nobody ever got fired for buying …” type of software tends to be locked down. In the case of Microsoft’s popular operating system, Windows, which is endemic in large organisations, you may not even be able to change the screensaver: large, otherwise proud organisations humbly display whatever screensaver Microsoft has deemed appropriate rather than anything to do with the firm. Of course, screensavers don’t matter much: it’s when you want to automate the start-up of a computer, or edit a particular type of file, or create a reproducible analytical pipeline, that the locked-down nature of enterprise IT starts to preserve safety at the cost of productivity. It isn’t just control either; the quality of laptop needed by different employees varies enormously, but enterprise IT will often see everyone landed with the same kit.
Why is this such a problem in large organisations? You can’t imagine a tech start-up stopping its staff from installing whatever is needed to get the job done. The assumption CTOs have made is that giving all users the power to change their screensaver or execute scripts means someone will eventually harm the organisation (for example, by unintentionally deleting someone else’s data or releasing something publicly that should have been private). Or perhaps it’s that enterprise IT doesn’t offer the fine-grained control needed to empower staff to be productive: the systems that CTOs of large organisations can buy are either locked down or they are not, with no customisability in between. In any case, the fact that large organisations harbour a very large and heterogeneous set of IT users is behind the policies and processes that stop people installing the software they need or changing settings to get things done. One size fits all, and the system is crafted around minimising risk rather than maximising productivity. What makes the trade-off much harder is that safety breaches and computing accidents are very countable and discoverable, but the productivity lost from innovations that never happened is not.
The locked-down nature of enterprise IT presents a real challenge for anyone trying to make their organisation more efficient using data science (or any other innovation, for that matter). Want to deploy a dashboard? Hard. Want to deploy a machine learning model? Very hard. Want to ensure everyone has the same code environment for a training course? Tricky, especially if people have different ‘home’ IT because they’re drawn from different units across the enterprise, each with its own variant of the IT. Even downloading the software to build a machine learning model is nigh on impossible in a locked-down Windows ecosystem: installing Python is often blocked or requires a call to a service desk; installing packages is often blocked and, even when a package delivery solution is in place, it may not work as intended; some packages are frequently blocked from running because they require on-the-fly compilation (PyMC, for example); and then many frameworks do not work on Windows itself. (Windows Subsystem for Linux is not a silver bullet for these problems.) Even basic automation of scripts is trickier on Windows, assuming you are able to run scripts at all. As the final cherry on the cake, Microsoft Outlook blocks .py files as they might be harmful (though please put your code under version control rather than emailing it around).
You might think I’m picking on Microsoft here. I am, because they are so dominant in the marketplace, even though they make some fantastic software (Visual Studio Code is genuinely incredible). A typical day at a large firm will often involve logging on to a Microsoft Windows computer, opening up Microsoft Outlook to read emails, having calls on Microsoft Teams, surfing the internet via Microsoft Edge, writing a note in Microsoft Word, creating a slide deck in Microsoft PowerPoint, taking notes in Microsoft OneNote, entering a discussion on Microsoft’s Yammer network, and sharing files on the dreaded Microsoft SharePoint. I simply do not believe that Microsoft produces the best tools for email, calendar, operating system, word processing, data analysis, file sharing, internet browsing, presentations, community discussion, and video conferencing. (Do you think there might be a competition problem here?) I’ll allow that OneNote is pretty good, though.
You might also think that enterprise IT solutions mean that everything that works on one computer will work on another. You’d be wrong! Updates are applied at different times for different people, hardware is rolled out gradually rather than all at once, and people can still change their own systems by choosing to install extras from a provided ‘Software Centre’. So, for all that enterprise IT is controlled, it still suffers from the “it works on my computer” problem.
From problems to solutions
There are a lot of issues to sort out here, and something as drastic as competition policy may be needed to unleash the productivity that better software could bring to most firms. But I do think there is a potential solution for data scientists and people working on automation, and one that the Chief Technology Officer and Chief Data Officer can happily support.
The problem we’re really trying to solve for data scientists who want to improve their organisation is this: how can we run the latest, greatest packages on the same infrastructure without dealing with locked-down IT, while also retaining as much as possible of the safety that large organisations hanker for? Switching everyone to Linux might help, sure; it could save some money as the operating system is free (though human support isn’t), and there’s evidence that some firms using (free versions of) Linux are more productive (Nagle 2019). But this would require organisation-level change and mass upskilling, and is unlikely to happen due to proprietary software lock-in. Fortunately, there’s a simpler way.
My proposal is that we should simply stop coding on work laptops. Just stop. It is simply too difficult to get locked-down enterprise Windows laptops to do everything we really need to improve an organisation while still satisfying the security constraints.
Where can I code?
So, if I’m saying do not code on your work Windows laptop, where should you code? The answer, in short, is the cloud. At its best, this provides an isolated, reproducible environment. It completely solves the “it works on my computer” problem. It solves the operating system problem too, because cloud computing can be on any operating system—including ones that are specified in code (“infrastructure as code”). It better integrates with (and even encourages) version control and Continuous Integration and Continuous Deployment (CI/CD). Best of all, these isolated environments aren’t subject to the vagaries of enterprise IT because they are separate and (typically) accessed only through a browser window.
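To make “infrastructure as code” concrete, here’s a minimal sketch using Pulumi, one tool (Terraform is another) that lets you declare cloud resources in ordinary Python. Everything here is illustrative: the bucket name is a placeholder, and it assumes the pulumi and pulumi-aws packages are installed and that cloud credentials are already configured.

```python
# A minimal, illustrative infrastructure-as-code sketch using Pulumi.
# The bucket name is a placeholder; assumes the pulumi and pulumi-aws
# packages are installed and AWS credentials are configured.
import pulumi
from pulumi_aws import s3

# Declare a storage bucket. Running `pulumi up` creates or updates the
# real resource to match this declaration, so the infrastructure itself
# lives in version control alongside your analysis code.
bucket = s3.Bucket("analysis-data-bucket")

# Export the generated bucket name so other scripts or pipelines can
# look it up.
pulumi.export("bucket_name", bucket.id)
```

The point is less the specific tool than the principle: when the environment is described in code, it can be reviewed, versioned, and rebuilt exactly, which is precisely what locked-down laptops make difficult.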
I want to be clear: this does not mean that doing your coding via cloud computing is unsafe. It’s almost certainly safer, in many ways: whatever you are doing on the cloud should not go anywhere near your email application, and, if your IT department has cloud expertise, they can do things that will greatly reduce the risk of any kind of cloud-based data leak. Best practice for sensitive data is considered to be holding them in a secure cloud environment anyway. And, with asset-level control, you can grant access only to the users who need it—quite a contrast to having a writeable file on a network drive (yes, this still happens in 2023). Of course, someone actively trying to do harm still can, but that is true on any system.
There are a number of services out there that provide these reproducible coding environments at low or even no cost, depending on the number of hours used per month. The big players are Google Cloud Platform, Amazon Web Services, and Microsoft Azure. These require a bit more expertise to set up, and typically have to be used with buy-in and help from architecture experts. But there is an increasing number of off-the-shelf reproducible code environments that you can use for all but very sensitive or confidential data. These include:
- GitHub Codespaces, which has a free tier, uses Visual Studio Code by default, and can be accessed in the browser or via the Visual Studio Code desktop app. It works at the level of a GitHub repository, so it has particularly good integration with version control.
- Gitpod, which also has a free tier, uses Visual Studio Code by default, and can be accessed in the browser or via the Visual Studio Code desktop app.
- Google Cloud Workstations, which takes more setup and uses Code-OSS (the open-source version of Visual Studio Code).
GitHub Codespaces is probably the easiest service to access if your IT department has no real expertise in cloud computing. It only requires that each person has a GitHub account, that your IT department has unblocked GitHub’s website, and that you have some billing in place in case you go over the free-tier hours. Of course, GitHub is actually owned by Microsoft, and your firm is probably already buying Microsoft (“Nobody ever got fired for buying Microsoft”!), so all you need to do is convince IT to pay for an extra service from a firm they already have a relationship with. (If you’re lucky, your IT department will already know a lot about cloud computing and will have arranged empowered access to it for you—but a large number of firms are unlikely to be able to provide this.)
Now, you will still need to think carefully about where any data you are using will live and how you get it into your codespace (or similar). But having a (secure) connection to some sort of cloud storage bucket is a good default if there isn’t an API around that you can consume directly.
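To illustrate (and this is a sketch under assumptions, not a prescription): if your data sit in an S3-compatible bucket, a couple of lines of Python will pull them into a codespace session. The bucket and file names below are placeholders, and it assumes the pandas and s3fs packages are installed, with credentials supplied via environment variables (for example, as Codespaces secrets) rather than hard-coded.

```python
# Sketch: read a CSV directly from a cloud storage bucket inside a
# codespace. The bucket and path are placeholders. Assumes pandas and
# s3fs are installed; s3fs picks up credentials from the standard AWS
# environment variables, which can be set as Codespaces secrets.
import pandas as pd

df = pd.read_csv("s3://example-bucket/analysis/input-data.csv")
print(df.head())
```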
Summary
In short, if you’re looking to get a reproducible, working code environment that is consistent across users and you have an IT department that doesn’t have much expertise in coding or cloud computing, my recommendation is that you:
- do not try to get your IT department to put analytical programming languages on work laptops;
- instead, get them to unblock GitHub and have users create accounts on it (with billing in place if necessary);
- use GitHub for version control; and
- use GitHub Codespaces for your coding environment (with optional Docker containers for reproducible environments).
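If you go down this route, a tiny script run in each person’s codespace can confirm that the environment really is the same for everyone. This is just an illustrative check; swap the package list for whatever your project actually pins.

```python
# Illustrative sanity check: run inside each codespace to confirm that
# everyone shares the same environment. The package list is a
# placeholder; use the packages your project pins.
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

print(f"OS:     {platform.platform()}")
print(f"Python: {sys.version.split()[0]}")

for pkg in ("pandas", "numpy"):  # placeholder package list
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```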