Guide to using the standardized syntax within Tidymodels to build and compare various models and metrics. When I first learned to build models, eons ago, there were many approaches to constructing models across different packages, each with its own parameter names. Then…
How I became proficient in SQL to help land my first data science job. So, you want to learn SQL? Well, in this article, I will run through how I learned SQL in just 2 weeks, which helped me land my first entry-level data science role.
How the Transformer architecture has been adapted to computer vision tasks. In 2017, the paper “Attention Is All You Need” [1] took the NLP research community by storm. Cited more than 100,000 times so far, the Transformer it introduced has become the cornerstone of most major NLP architectures today. To learn about…
…
Discussing the basic principles and methodology of data validation. Although it may not be the most glamorous aspect of data work, data validation is crucial to any data-related task. It can also be tedious. When we think of data validation, what is the first thing that comes into your…
Choosing between frequentist and Bayesian approaches has been the great debate of the last century, with a recent surge in Bayesian adoption in the sciences. Figure: Number of articles referring to Bayesian statistics on sciencedirect.com (April 2024); graph by the author. What’s the difference? The philosophical difference is actually quite subtle, and some propose that the great Bayesian…
A deep dive into biases in machine learning, with a focus on historical (or social) biases. Humans are biased. To anyone who has had to deal with bigoted individuals, unfair bosses, or oppressive systems (in other words, all of us), this is no surprise. We should thus welcome machine learning models that can…
Sports Analytics: Which players could help Fulham overcome their major flaws? A few days ago, I was fortunate to participate in a football analytics hackathon organized by xfb Analytics [1], Transfermarkt [2], and Football Forum Hungary [3]. As we recently received permission to share our work, I decided to…
Fabric Madness part 2. Image by author and ChatGPT, generated with the prompt “Design an illustration, focusing on a basketball player in action, this time the theme is on using pyspark to generate features for machine leaning models in a graphic novel style” (ChatGPT, 4, OpenAI, 4 April 2024, https://chat.openai.com). A huge thanks to Martim Chaves, who co-authored this…
A comparative overview. What is cuDF Pandas? If you use the Pandas library in Python and want or need to reduce your programs’ run times, you have a few options available to you. Most of these options revolve around external libraries that supplant existing Pandas operations and are…
Missing puzzle piece to LLM enterprise augmentation. Since early last year, when we led the development of an enterprise-level GenAI-as-a-service platform, we have understandably been bombarded with questions like “What are the art of possibles for …” or “Can LLM do …”. In this blog post, we will dive into a critical skill that will…