A picture of me smiling at the thought of eating waffles in Camden.

Hi, I’m Kemi. I’m currently doing a master’s degree in Data Science, and this is a place to explore my Data Visualisation skills using tools like Tableau, R, Python, D3.js and more.

I previously worked as a journalist, and wrote for The Mirror, The Week UK and many more publications. You can read some of my work at kemielizabeth.com

Please scroll down to see some Data Visualisations I’ve worked on.

To celebrate Halloween 2020, here is an interactive visualisation showcasing the highest-rated horror movies by year. I made this using the plotly package in Python. The dataset was curated from the 10 highest-grossing horror movies from 2015 to the present, and merged with a dataset of Rotten Tomatoes ‘Tomatometer’ ratings. To view the interactive version of this visualisation, click here.

If you hover over each horizontal bar, you will see the name of the film, how much it made at the box office in its year of release, as well as its Rotten Tomatoes rating. You can also see, through the gradient colouring, that the darkest orange bars (i.e. Venom, It) made the most at the box office, while the lightest made the least. This is accompanied by the bar lengths, which show that while ‘A Quiet Place’ did not do spectacularly at the box office, it has the highest Rotten Tomatoes rating.
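The merge step described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code behind the chart: the film titles, figures, field names and the helpers `merge_on_title` and `hover_text` are all hypothetical placeholders, and it assumes the two datasets share an exact film title to join on.

```python
# Hypothetical placeholder rows standing in for the two source datasets.
# None of these titles or numbers are real figures.
box_office = [
    {"title": "Film A", "year": 2018, "gross_musd": 120.0},
    {"title": "Film B", "year": 2019, "gross_musd": 300.0},
    {"title": "Film C", "year": 2020, "gross_musd": 45.0},
]
ratings = {"Film A": 95, "Film B": 60}  # "Film C" has no rating on purpose


def merge_on_title(box_office_rows, ratings_by_title):
    """Inner join: keep only films that appear in both datasets."""
    merged = []
    for row in box_office_rows:
        if row["title"] in ratings_by_title:
            merged.append({**row, "tomatometer": ratings_by_title[row["title"]]})
    return merged


def hover_text(row):
    """Build the kind of label a hover tooltip could show for one bar."""
    return (
        f"{row['title']} ({row['year']}): "
        f"${row['gross_musd']:.0f}M, {row['tomatometer']}% Tomatometer"
    )


merged = merge_on_title(box_office, ratings)
# Sort by rating so that bar length would run from lowest to highest rating.
merged.sort(key=lambda row: row["tomatometer"])
for row in merged:
    print(hover_text(row))
```

In a chart like the one described, the rating would then drive the bar length and the box office figure would drive the colour gradient.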
This is an interactive data visualisation based on a dataset of London pollution levels from 2013 and 2016. It allows users to see how levels changed in their boroughs, and how schools in their areas were affected. To use the interactive version of this visualisation, please click here.
These are visualisations of a decision tree trained on data containing variables like humidity, light and more, with the aim of predicting room occupancy from them. I chose to use the rpart and rpart.plot packages in R for this model as the output looked the most aesthetically pleasing, and displayed the results as clearly and coherently as possible.
These visualisations were made with the ggplot2 package in R. I used a density plot, scatter plot and histogram to visualise the relationship between ‘Popularity’ and ‘Genre’ (coupled with ‘Energy’ in the scatter plot). The dataset, obtained from Kaggle, covers 2019’s top 50 tracks on Spotify, described by variables like liveliness, energy, BPM, length, danceability and more.
The bottom-left linear regression was performed on a dataset in R, and it shows the strong negative relationship between the horsepower and miles per gallon variables. I also plotted the confidence and prediction intervals for the model. The three ROC curve visualisations were derived from a logistic regression model trained on a room occupancy dataset.
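For readers unfamiliar with ROC curves, here is a minimal sketch of how the points on such a curve can be computed from a classifier's predicted probabilities. It is written in plain Python rather than R, the labels and scores are made-up placeholders (not the room occupancy data), and `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    The threshold is swept from the highest score to the lowest;
    tied scores are handled together so they produce a single point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: 1 = occupied, 0 = empty, with predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print(curve)
print(round(auc(curve), 3))
```

Plotting the first element of each pair on the x-axis and the second on the y-axis gives the ROC curve; the closer the area is to 1, the better the model separates the two classes.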
This visualisation was made in Tableau, and it’s a brief analysis of affiliate revenue, sales and clicks during lockdown. I imported the data into an Excel spreadsheet and cleaned it there, before opening it in Tableau and exploring the relationships between the variables through interactive scatter plots, clusters and line graphs. I was also keen to make it look eye-catching, hence the use of a multi-coloured palette.
A quick run-through of some of the books I read (or re-read) last year that made the most impact. They’re grouped by genre, country and year of publication. I created the visualisations using Tableau and put them together in Photoshop.