And easy solutions that can immediately turn them around. Photo by t Kaiser on Unsplash. Every data engineer wants to feel like they are constantly evolving as a professional and growing their technical skills. As data engineers, we like to be challenged and feel we are progressing towards our end goal. This is the nature of…
In this case, assume I am the owner of an ecommerce website. I would like to create a chatbot so that my users can ask specific questions about anything on the website (price, product, service, shipping, etc.) as if they were in the store. The chatbot will be supplied with the “private knowledge” and ground its answers…
After a credit card? An insurance policy? Ever wondered about the three-digit number that shapes these decisions? Introduction: Scores are used by a large number of industries to make decisions. Financial institutions and insurance providers are using scores to determine whether someone is right for credit or a policy. Some nations are even using social…
Learn how to ensure the quality of your embeddings, which can be essential for your machine-learning system. Creating quality embeddings is an essential part of most AI systems. Embeddings are the foundation on which an AI model can do its job, and creating high-quality embeddings is, therefore, an important element in making high-accuracy AI models.…
If you like or want to learn machine learning with scikit-learn, check out my tutorial series on this amazing package: Sklearn tutorial. All images by author. Dummy models are very simple models that are meant to serve as a baseline against which you compare your actual models. A baseline is just some kind of reference…
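To make the baseline idea concrete, here is a minimal sketch of a dummy model that always predicts the most frequent label. The class name `MajorityClassBaseline`, the `accuracy` helper, and the toy data are all made up for illustration; scikit-learn offers the same idea through `DummyClassifier(strategy="most_frequent")` in `sklearn.dummy`, which is not used here so the example runs without dependencies.

```python
from collections import Counter


class MajorityClassBaseline:
    """Hypothetical helper: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Features are ignored on purpose; only the label counts matter.
        self.prediction_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One identical prediction per input row.
        return [self.prediction_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy data: 7 of the 10 labels are "no".
X_train = [[i] for i in range(10)]
y_train = ["no"] * 7 + ["yes"] * 3

baseline = MajorityClassBaseline().fit(X_train, y_train)
baseline_accuracy = accuracy(y_train, baseline.predict(X_train))
print(baseline_accuracy)  # 0.7
```

Any real model should beat this number; if it does not, it has learned nothing beyond the class imbalance.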
The dataset used in Part 1 is simple and can be easily modeled with just a mixture of Gaussians. However, most real-world datasets are far more complex. In this part of the story, we will apply several synthetic data generators to some popular real-world datasets. Our primary focus is on comparing the distributions of maximum…
In this new post I present the outcome of my quest for the most advanced and powerful libraries for web-based data visualization and analysis as judged by me after a careful analysis of performance, flexibility, and richness of features. Some of the libraries I selected are not popular at all, but they offer surprising capabilities…
As human beings, we can read and understand texts (at least some of them). Computers, by contrast, “think in numbers”, so they can’t automatically grasp the meaning of words and sentences. If we want computers to understand natural language, we need to convert this information into a format that computers can work with…
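As a rough illustration of turning text into numbers, the sketch below builds plain word-count vectors (a bag-of-words representation). This is only the simplest possible encoding, chosen here as an assumption for illustration and not necessarily the approach the article goes on to describe; the function names and sample sentences are hypothetical.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map every distinct lowercase token to a fixed column index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_count_vector(text, vocabulary):
    """Turn one text into a list of word counts aligned with the vocabulary."""
    counts = Counter(text.lower().split())
    # Dict iteration follows insertion order, i.e. the sorted token order.
    return [counts.get(tok, 0) for tok in vocabulary]


# Made-up sample sentences.
texts = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(texts)
vectors = [to_count_vector(t, vocabulary) for t in texts]

print(list(vocabulary))  # ['cat', 'dog', 'down', 'sat', 'the']
print(vectors)           # [[1, 0, 0, 1, 1], [0, 1, 1, 1, 1]]
```

Counts like these ignore word order and meaning, which is the gap that learned embeddings are meant to close.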
How to know the unknowable in observational studies. Contents: Introduction; Problem Setup (2.1. Causal Graph, 2.2. Model With and Without Z, 2.3. Strength of Z as a Confounder); Sensitivity Analysis (3.1. Goal, 3.2. Robustness Value); PySensemakr; Conclusion; Acknowledgements; References. The specter of unobserved confounding (aka omitted variable bias) is a notorious problem in observational studies. In…
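The "model with and without Z" comparison named in the outline can be illustrated with a small simulation. This is a sketch under stated assumptions, not the article's own setup: the variable names `d` (treatment) and `y` (outcome), the coefficients, and the sample size are invented for illustration, and the adjusted estimate uses the Frisch-Waugh-Lovell trick of partialling Z out of both variables in place of a full multiple regression.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
true_effect = 1.0  # assumed effect of the treatment on the outcome

z = [rng.gauss(0, 1) for _ in range(n)]   # confounder Z
d = [zi + rng.gauss(0, 1) for zi in z]    # treatment depends on Z
y = [true_effect * di + 2.0 * zi + rng.gauss(0, 1) for di, zi in zip(d, z)]

# Model without Z: the slope absorbs part of Z's effect.
naive = slope(d, y)

# Model with Z (Frisch-Waugh-Lovell): remove Z from both d and y first.
adjusted = slope(residuals(z, d), residuals(z, y))

print(f"without Z: {naive:.2f}, with Z: {adjusted:.2f}")
```

With these made-up coefficients the estimate without Z lands near 2 while the estimate with Z lands near the assumed true effect of 1. In a real observational study Z is unobserved, which is why a sensitivity analysis asks how strong such a confounder would have to be to change the conclusion.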
Automation, machine learning and LLMs in the chip industry (source: ChatGPT). I felt like one of those guys from Monsters Inc. You know, the ones in the big yellow hazmat suits. A necessary precaution! I was entering the most complex manufacturing environment in the world. One that requires so much precision that even microscopic particulates from…