
Comparing Outlier Detection Methods | by John Andrews | Dec, 2023

Using batting stats from Major League Baseball’s 2023 season

Shohei Ohtani, photo by Erik Drost on Flickr, CC BY 2.0

Outlier detection is an unsupervised machine learning task that identifies anomalies (unusual observations) within a given data set. It is helpful in the many real-world cases where the available dataset is already “contaminated” by anomalies. Scikit-learn…


The Unstructured Data Funnel. Why a funnel is the centre of the war… | by Hugo Lu | Dec, 2023

Why a funnel is the centre of the war between data’s heaviest hitters

Unstructured data takes varying forms. It’s typically text-heavy, but may also contain data such as dates, numbers, and dictionaries. Data engineers commonly encounter unstructured data in the form of deeply nested JSON. However, the term “unstructured” data really refers to anything non-tabular;…


How to Improve Your ChatGPT Outputs Using Configuration Parameters | by Angelica Lo Duca | Dec, 2023

ChatGPT, Generative AI

A focus on configuring the temperature, the Top P, the frequency penalty, and the presence penalty directly in your ChatGPT prompts

Photo by Growtika on Unsplash

I’ve recently been reading a very interesting book by David Clinton, entitled The Complete Obsolete Guide to Generative AI, published by Manning Publications. In the second chapter,…
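For readers new to these parameters, here is a minimal sketch of how temperature and Top P are commonly described as shaping next-token sampling. It is an illustration only, not a description of ChatGPT's internals or of the article's examples: the function names `softmax_with_temperature` and `top_p_filter` are hypothetical, the logits are made up, and the frequency and presence penalties are left out.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=1.0):
    """Keep the most likely tokens until their cumulative mass reaches top_p,
    then renormalise the kept probabilities so they sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Illustrative (made-up) scores for three candidate tokens.
example_logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(example_logits, temperature=0.5)
flat = softmax_with_temperature(example_logits, temperature=2.0)
nucleus = top_p_filter([0.5, 0.3, 0.2], top_p=0.7)
```

In this sketch a lower temperature concentrates probability on the top-scoring token, while a lower Top P shrinks the set of tokens that can be sampled at all.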


Solving Autocorrelation Problems in General Linear Model on a Real-World Application | by Rodrigo da Motta | Dec, 2023

Delving into one of the most common nightmares for data scientists

Introduction

One of the biggest problems in linear regression is autocorrelated residuals. In this context, this article revisits linear regression, delves into the Cochrane–Orcutt procedure as a way to solve this problem, and explores a real-world application in fMRI brain activation analysis.

Photo by…
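As a rough illustration of the idea, the sketch below implements the textbook iterative Cochrane–Orcutt procedure for a single-predictor regression with AR(1) errors, in plain Python. It is not the article's code: the function names (`ols`, `estimate_rho`, `cochrane_orcutt`) are hypothetical, and the single-predictor and AR(1) restrictions are simplifying assumptions.

```python
def ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_rho(resid):
    """Estimate the AR(1) coefficient by regressing e_t on e_{t-1} (no intercept)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(e * e for e in resid[:-1])
    return num / den


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterate between estimating rho and refitting on quasi-differenced data.

    Returns (a, b, rho) for the model y_t = a + b*x_t + u_t,
    with u_t = rho*u_{t-1} + e_t (assumed AR(1) errors).
    """
    a, b = ols(x, y)  # start from the plain OLS fit
    rho = 0.0
    for _ in range(max_iter):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        new_rho = estimate_rho(resid)
        # Quasi-difference both series; the first observation is dropped.
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols(x_star, y_star)
        a = a_star / (1 - new_rho)  # recover the original intercept
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho
```

Calling `cochrane_orcutt(x, y)` on two equal-length lists returns the intercept, the slope, and the estimated autocorrelation of the errors. The sketch does not guard against degenerate inputs such as all-zero residuals or an estimated `rho` of exactly 1.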


Evaluating RAG Applications with RAGAs | by Leonie Monigatti | Dec, 2023

RAGAs (Retrieval-Augmented Generation Assessment) is a framework (GitHub, Docs) that provides you with the necessary ingredients to help you evaluate your RAG pipeline on a component level.

Evaluation Data

What’s interesting about RAGAs is that it started out as a framework for “reference-free” evaluation [1]. That means, instead of having to rely on human-annotated ground…
