A step-by-step illustration of how to use SOLID to solve a refactoring challenge. Photo by Lucas Davies on Unsplash. Introduction: Code refactoring challenges are well known to software engineers but less so to data scientists, who can also benefit greatly from practising them. By practising these, especially when applying the SOLID principles, you learn…
I discovered the Himalayan Database a few weeks ago and decided to create a few “whimsical” visualizations based on this dataset. In two previous articles I created a simple elevation plot for Everest expeditions and a plot showing the relative number of deaths for 5 Himalayan peaks. This time I wanted to explore expedition accident…
Data comes in different shapes and forms, one of which is categorical data. This poses a problem because most machine learning algorithms accept only numerical data as input. However, categorical data is usually not a challenge to deal with, thanks to simple, well-defined functions that transform it into numerical values.…
Removing the outer border of Landsat satellite images using the STAC file (source: author). Telling stories with satellite images is straightforward. The mesmerising landscapes do most of the work. Yet, visualising them takes some work such as selecting and scaling the RGB channels. In this article, we will go further. We will see how we can…
On a scale from 1 to 10, how good are your data ingestion skills? Photo by Blake Connally on Unsplash. Data ingestion is a crucial step in data engineering. Data engineers load huge amounts of data into various database systems for further transformation and processing. While dealing with relatively small amounts of data on staging we…
Four Apache Airflow internals you might have missed. Image generated via DALL-E. I have been working with Airflow for more than three years now and, overall, I am quite confident with it. It’s a powerful orchestrator that helps me build data pipelines quickly and in a scalable fashion while for most things I am looking to…
Insights after two years in the industry. Example of an encoder and a graph in the latent space (image by author). The scenario: a high-speed production line is producing thousands of products. Two cameras are installed to continuously control the quality of each product. The goal: develop an algorithm that can check each product as fast…
If you have been experimenting with large language models (LLMs) for search and retrieval tasks, you have likely come across retrieval-augmented generation (RAG) as a technique to add relevant contextual information to LLM-generated responses. By connecting an LLM to private data, RAG can enable a better response by feeding relevant data in the…
Image generated by Midjourney. And 5 ways to use it in data science and machine learning. @property is my favorite decorator in Python. I have been using Python for many years now, and with each passing year, my expertise and comfort level with the language gradually grow. Among all the techniques and tricks that I’ve learned…
Do you find it difficult to keep up with the latest ML research? Are you overwhelmed with the massive amount of papers about LLMs, vector databases, or RAGs? In this post, I will show how to build an AI assistant that mines this large amount of information easily. You’ll ask it your questions in…