In this TIL, I find out how to create a new MySQL database on Microsoft Azure. This is a place to store structured, tabular data. Note that the instructions below assume you are using a bash-like terminal, for example zsh, rather than Powershell.
Prerequisites
You’ll need to sign up for a Microsoft Azure account for this, and create a “resource group”. You’ll also need the Azure Command Line Interface (CLI), which you can find information on here. (Alternatively, you can do this through the Azure Portal, but it’s often useful to have the reproducible commands.)
It’s also helpful to understand the lingo and how different bits of the cloud-based SQL setup relate to one another:
- there is the resource group, which many Azure services could live under
- there is a server, which co-ordinates resources
- there are individual databases, which live on particular services
These three are hierarchical: a database lives on a server that sits under a particular resource group.
You may be wondering why MySQL rather than another type of SQL database? The pricing information is quite difficult to parse, but for hobby projects, it looks like a MySQL database is a lot cheaper than an Azure SQL (MSSQL) database. A basic MySQL database (burstable, b1ms model) is around US$13.00 per month whereas a provisioned Azure SQL database is hundreds of dollars a month. The latter option has a more complex set of options too, though, so there’s more to make a mistake with. So we’ll be going with the open source MySQL in this case.
Instructions
Creating a MySQL server
First set and load your configs. You’ll need to decide on a login and password for your SQL server.
resourcegroup=<>
server=<>
login=<>
password=<>
location=<> # eg "uksouth"
Then to create the server (with conservative cost settings), it’s
az mysql flexible-server create --location $location --resource-group $resourcegroup --name $server --admin-user $login --admin-password $password --sku-name Standard_B1ms --tier Burstable --public-access Enabled --version 5.7 --high-availability Disabled --storage-auto-grow Disabled
Configure permissions to use the SQL server
The default options are cautious on two sets of permissions, the certificate and the IP addresses that are allowed to connect. We need to open these up for your computer.
Go on to Azure portal page for the newly created server and click “configure networking”. Then enable connections from your specific IP address through the firewall (“add current client IP address”). On the same page, there’s an option to download the SSL certificate. Do this, and put it in a secrets folder that is not under version control.
Connect to the SQL server using the command line
Let’s check that it worked! Assuming you saved the SSL certificate as sql_cert.crt.pem
in the same directory in which you are working, you should be able to run the following on the command line:
mysql -h $server.mysql.database.azure.com -u $login -p "--ssl-ca=sql_cert.crt.pem"
If you saved the certificate in secrets/
, change the path in the above command appropriately.
Note that you may need to regenerate the .pem
file from time to time.
Note also that if you are accessing this resource through a VPN or other managed network, this is where you may have difficulties. To circumvent these, you can use a normal internet connection or connect from a virtual machine.
Working with the SQL database from Python
Python has a number of great tools for working with SQL databases. For this, we’ll look at just two: MySQL Connector and the wonderful Ibis project, which translates across a wide number of data analysis backends.
Just as we did before, we need to plug in our credentials. The syntax is slightly different from bash and it’s recommended to use a secrets manager like python-dotenv to manage these and ensure they do not get mistakenly committed to version control.
= <>
username = <>
password = "<>.mysql.database.azure.com" hostname
Connecting and creating a database with MySQL Connector
Next, we start up the connection.
= mysql.connector.connect(
mydb =hostname,
host=username,
user=password,
password="secrets/sql_cert.crt.pem",
ssl_ca )
where it’s assumed that the certificate is in a folder called secrets.
If running this works, you won’t see an error message or a confirmation message, but you can check all is as it should be by running:
print(mydb)
which should result in something like:
<mysql.connector.connection_cext.CMySQLConnection object at 0x3213r4g90>
Now it’s not very exciting to have a SQL server with no databases in. So, next, we’re going to create a database.
= "CREATE DATABASE online_movie_rating"
create_db_query = mydb.cursor()
mycursor mycursor.execute(create_db_query)
If you then run
"Show databases;")
mycursor.execute(for db in mycursor:
print(db)
you’ll get a list of databases that should include online_movie_rating
.
But an empty database is no fun—let’s populate it. First, the table we wish to create:
= """
create_movies_table_query CREATE TABLE movies(
id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(100),
release_year YEAR(4),
genre VARCHAR(100),
collection_in_mil INT
)
"""
And then the execution of that create table command:
= mysql.connector.connect(
mydb =hostname,
host=username,
user=password,
password="online_movie_rating",
database="secrets/sql_cert.crt.pem",
ssl_ca
)= mydb.cursor()
mycursor
mycursor.execute(create_movies_table_query) mydb.commit()
Note the difference with the earlier command—this one has a database=
keyword argument in it.
With the table created, we can also populate it with some data:
= """
insert_movies_query INSERT INTO movies (title, release_year, genre, collection_in_mil)
VALUES
("Forrest Gump", 1994, "Drama", 330.2),
("3 Idiots", 2009, "Drama", 2.4),
("Eternal Sunshine of the Spotless Mind", 2004, "Drama", 34.5),
("Good Will Hunting", 1997, "Drama", 138.1),
("Skyfall", 2012, "Action", 304.6),
("Gladiator", 2000, "Action", 188.7),
("Black", 2005, "Drama", 3.0),
("Titanic", 1997, "Romance", 659.2),
("The Shawshank Redemption", 1994, "Drama",28.4),
("Udaan", 2010, "Drama", 1.5),
("Home Alone", 1990, "Comedy", 286.9),
("Casablanca", 1942, "Romance", 1.0),
("Avengers: Endgame", 2019, "Action", 858.8),
("Night of the Living Dead", 1968, "Horror", 2.5),
("The Godfather", 1972, "Crime", 135.6),
("Haider", 2014, "Action", 4.2),
("Inception", 2010, "Adventure", 293.7),
("Evil", 2003, "Horror", 1.3),
("Toy Story 4", 2019, "Animation", 434.9),
("Air Force One", 1997, "Drama", 138.1),
("The Dark Knight", 2008, "Action",535.4),
("Bhaag Milkha Bhaag", 2013, "Sport", 4.1),
("The Lion King", 1994, "Animation", 423.6),
("Pulp Fiction", 1994, "Crime", 108.8),
("Kai Po Che", 2013, "Sport", 6.0),
("Beasts of No Nation", 2015, "War", 1.4),
("Andadhun", 2018, "Thriller", 2.9),
("The Silence of the Lambs", 1991, "Crime", 68.2),
("Deadpool", 2016, "Action", 363.6),
("Drishyam", 2015, "Mystery", 3.0)
"""
mycursor.execute(insert_movies_query) mydb.commit()
Let’s see if we can retrieve some of this:
= "SELECT * FROM movies LIMIT 5"
select_movies_query
mycursor.execute(select_movies_query)= mycursor.fetchall()
result for row in result:
print(row)
(1, 'Forrest Gump', 1994, 'Drama', 330)
(2, '3 Idiots', 2009, 'Drama', 2)
(3, 'Eternal Sunshine of the Spotless Mind', 2004, 'Drama', 35)
(4, 'Good Will Hunting', 1997, 'Drama', 138)
(5, 'Skyfall', 2012, 'Action', 305)
Finally, to close the connection, use mydb.close()
.
Using Ibis to work with a cloud-based SQL database
The Ibis project makes working with a ton of different back-end data tools easy by making the syntax consistent. Examples are: MySQL, BigQuery, DuckDB, pandas, MSSQL, and much, much more!
Once you’ve imported Ibis (import ibis
), you may find it more fun and useful to use it in interactive mode, which looks more like how pandas works:
= True ibis.options.interactive
To see some data, we need to create a connection:
= ibis.mysql.connect(
con =username,
user=password,
password=hostname,
host=3306,
port="online_movie_rating",
database="sql/sql_cert.crt.pem",
ssl_ca )
And to grab our “movies” table again, but as a pandas dataframe:
= con.table("movies")
t t.head().to_pandas()
id | title | release_year | genre | collection_in_mil | |
---|---|---|---|---|---|
0 | 1 | Forrest Gump | 1994 | Drama | 330 |
1 | 2 | 3 Idiots | 2009 | Drama | 2 |
2 | 3 | Eternal Sunshine of the Spotless Mind | 2004 | Drama | 35 |
3 | 4 | Good Will Hunting | 1997 | Drama | 138 |
4 | 5 | Skyfall | 2012 | Action | 305 |
Ibis is great to explore data. It has fully fledged pandas-style and SQL-style APIs. So you can do:
filter(t.release_year<1996) t.
But you can also get this as SQL:
filter(t.release_year<1996)) ibis.to_sql(t.
SELECT
id,
t0.
t0.title,
t0.release_year,
t0.genre,
t0.collection_in_milFROM movies AS t0
WHERE
< 1996 t0.release_year
It’s a really powerful downstream analysis tool.
And that’s it! Most other SQL commands are as you’d expect and can be looked up in the MySQL documentation.