Hi! I’m a data scientist / neuroscientist / cat lover interested in the complex
dynamics of the world we live in and how we can use data and science to solve
humanity’s big and small challenges. One idea at a time.
Fun fact: I started university when I was 15 years old and got my PhD at 25,
and, while I wouldn’t recommend it, this has allowed me to explore many of my
interests throughout my life and to develop a diverse and unique set of skills. 🙃
You can find out more about my professional background on
LinkedIn

Posts
This is a selection of hobby projects I have worked on in the past:
Detecting fraudulent transactions is essential to keeping financial systems trustworthy. Traditionally, fraud detection relies on analyzing and vetting carefully engineered features of individual transactions or of the individual entities involved (companies, accounts, individuals). Here I illustrate an end-to-end approach to node classification with graph neural networks to identify suspicious transactions. I compare my results on the Elliptic dataset with the available literature and propose ideas for future work.
Dec 15, 2019
At the beginning of the COVID-19 pandemic, I created a Twitter bot to automatically gather the latest data for Germany and share it with the Twittersphere. I also built and deployed a dashboard app on Heroku (links below). I was motivated by the lack, at the time, of easily accessible incidence and ICU occupancy data at the local and city level. Back then, the official Robert Koch Institute (RKI) website only offered data aggregated by _Bundesland_ and at the national level. Since then, both the RKI and DIVI (for ICU data) have improved the granularity of their dashboard data. I still maintain both the Twitter bot and the dashboard, as they run with little overhead. Link to the dashboard
Jan 1, 2021
This is a time-series analysis of one year of activity and sleep data from a Fitbit user. I use this data to forecast an additional year for the same user with Generalized Additive Models.
Apr 1, 2018
This was my approach to the "Personalized Medicine: Redefining Cancer Treatment" Kaggle competition. The goal of the competition was to create a machine learning algorithm that can classify genetic variations that are present in cancer cells.
Oct 7, 2017
Health care systems worldwide are under pressure due to the high costs associated with disease. In this post, I analyze Medicare data from the USA. I also use an open drug-disease database to cluster the costs by disease, and I identify the most expensive diseases (mostly chronic diseases such as diabetes) and the most expensive medicines.
Feb 6, 2017
In this post, I use Python to visualize two different series of events, plotting them on top of each other to gain insights from time series data.
Feb 6, 2017
I take a look at how we can model the future revenue of a product by making certain assumptions and running a Monte Carlo simulation.
Oct 15, 2016