In the trade flow maps, I aimed to represent two-way trade relationships between countries. For example, the export from Nepal to India would be represented by the first arrow (A1-A2) and the import by Nepal from India would be represented by a second arrow (A3-A4). In this way, each country pair relationship would require four…
Modern Data Warehousing. State-of-the-art data platform design | by 💡Mike Shakhomirov | Dec, 2023
State-of-the-art data platform design. Photo by Nubelson Fernandes on Unsplash. In this story, I will try to shed some light on the benefits of modern data warehouse solutions (DWH) compared to other data platform architecture types. I would dare to say that DWH is the most…
Using batting stats from Major League Baseball’s 2023 season. Shohei Ohtani, photo by Erik Drost on Flickr, CC BY 2.0. Outlier detection is an unsupervised machine learning task that identifies anomalies (unusual observations) within a given data set. This task is helpful in many real-world cases where our available dataset is already “contaminated” by anomalies. Scikit-learn…
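To make the idea of flagging unusual observations concrete, here is a minimal sketch using the interquartile-range (IQR) rule with only the Python standard library. This is a simpler, swapped-in technique, not the Scikit-learn approach the excerpt goes on to mention. The function name `iqr_outliers`, the multiplier `k=1.5`, and the toy numbers are assumptions for illustration; they are not real batting stats.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: the IQR rule is one simple, label-free way to
    flag anomalies; k=1.5 is a common convention, not a value from the text.
    """
    # Quartiles computed with the "inclusive" method (data treated as the
    # full population, endpoints included).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up season totals for seven hypothetical players.
toy_totals = [10, 12, 11, 13, 12, 11, 44]
flagged = iqr_outliers(toy_totals)
print(flagged)
```

No labels are needed here, which is what makes the task unsupervised: the rule only looks at how far each value sits from the bulk of the data.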
Understand the logic behind the fundamental algorithm used inside gradient descent. In time series analysis, there is often a need to understand the trend direction of a sequence by taking into account previous values. Approximation of the next values in a sequence can be performed in several ways, including the usage of simple…
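As an illustration of tracking a trend from previous values, the sketch below computes an exponential moving average. The excerpt is cut off before it names a specific method, so choosing this one is an assumption, as are the function name `exponential_moving_average`, the smoothing factor `beta`, and the choice to seed the average with the first observation.

```python
def exponential_moving_average(values, beta=0.9):
    """Smooth a sequence so each point blends the past with the present.

    Hypothetical helper: smoothed[t] = beta * smoothed[t-1] + (1 - beta) * values[t],
    with smoothed[0] initialised to values[0] (one of several possible choices).
    """
    if not values:
        return []
    smoothed = [values[0]]
    for v in values[1:]:
        # A larger beta gives more weight to history and a smoother curve.
        smoothed.append(beta * smoothed[-1] + (1 - beta) * v)
    return smoothed


print(exponential_moving_average([1.0, 2.0, 3.0], beta=0.5))
```

A rising smoothed series suggests an upward trend; how quickly it reacts to new values depends on `beta`.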
Recall that Rule 6, from Part 1, shows how to make Rust SIMD algorithms fully generic across type and LANES. We next need to pick our algorithm and set LANES. In this rule, we’ll see how to use the popular criterion crate to benchmark and evaluate our algorithms and options. In the context of range-set-blaze,…
1. Initial Setup. Before we start coding our AI agent, it is recommended that you have a solid understanding of Object Oriented Programming (OOP) principles in Python. If you do not have Python installed already, below is a simple tutorial by Bhargav Bachina to get you started. The version I will be using is 3.11.6.…
Why a funnel is the centre of the war between data’s heaviest hitters. Unstructured data takes varying forms. It’s typically text-heavy, but may contain data such as dates, numbers, and dictionaries as well. Data Engineers commonly encounter unstructured data in the form of deeply nested JSON documents. However, the term “unstructured” data really refers to anything non-tabular;…
You might say 2023 was an eventful year for data scientists and ML professionals, but that wouldn’t quite capture the amount of hectic activity we’ve seen in the field in the past 12 months. As much as we always aim to resist hype and hyperbole, we have to concede that yes, we’ve seen some dramatic…
How to Improve Your ChatGPT Outputs Using Configuration Parameters | by Angelica Lo Duca | Dec, 2023
ChatGPT, Generative AI. A focus on configuring the temperature, the Top P, the frequency penalty, and the presence penalty directly in your ChatGPT prompts. Photo by Growtika on Unsplash. I’ve recently been reading a very interesting book by David Clinton, entitled The Complete Obsolete Guide to Generative AI, published by Manning Publications. In the second chapter,…
Delving into one of the most common nightmares for data scientists. Introduction: One of the biggest problems in linear regression is autocorrelated residuals. In this context, this article revisits linear regression, delves into the Cochrane–Orcutt procedure as a way to solve this problem, and explores a real-world application in fMRI brain activation analysis. Photo by…
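For readers who want a feel for the procedure, below is a minimal sketch of an iterative Cochrane–Orcutt fit for a single-predictor regression with AR(1) errors, written in plain Python. It is a textbook-style illustration under stated assumptions, not the article's implementation. The helper names (`ols`, `estimate_rho`, `cochrane_orcutt`, `simulate_ar1_regression`) and the simulated data are all hypothetical and have nothing to do with the fMRI application.

```python
import random


def ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def estimate_rho(resid):
    """Regress e[t] on e[t-1] without an intercept to estimate rho."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den if den else 0.0


def cochrane_orcutt(x, y, max_iter=50, tol=1e-8):
    """Iterate: residuals -> rho -> quasi-differenced OLS, until rho settles."""
    intercept, slope = ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        new_rho = estimate_rho(resid)
        # Quasi-difference both series; the first observation is dropped.
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, slope = ols(xs, ys)
        # The transformed intercept equals a * (1 - rho).
        intercept = a_star / (1 - new_rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return intercept, slope, rho


def simulate_ar1_regression(n=500, intercept=2.0, slope=3.0, rho=0.7, seed=0):
    """Hypothetical toy data: y = a + b*x + e, with e[t] = rho*e[t-1] + noise."""
    rng = random.Random(seed)
    x, y, err = [], [], 0.0
    for _ in range(n):
        err = rho * err + rng.gauss(0.0, 1.0)
        xi = rng.uniform(0.0, 10.0)
        x.append(xi)
        y.append(intercept + slope * xi + err)
    return x, y


x, y = simulate_ar1_regression()
a_hat, b_hat, rho_hat = cochrane_orcutt(x, y)
print(f"intercept={a_hat:.3f} slope={b_hat:.3f} rho={rho_hat:.3f}")
```

The key design choice is the quasi-differencing step: subtracting `rho` times the previous observation removes the AR(1) component from the errors, so ordinary least squares on the transformed series is working with approximately uncorrelated residuals.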