Optimize the Embedding Space for Improving RAG
Image by author. AI generated.
Embeddings are vector representations that capture the semantic meaning of words or sentences. Besides having quality data, choosing a good embedding model is the most important and underrated step for optimizing your RAG application. Multilingual models are especially challenging as most are pre-trained on…
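To make the idea of "vector representations that capture semantic meaning" concrete, here is a minimal sketch of how closeness between embeddings is commonly measured with cosine similarity. The three-dimensional vectors and the helper names (`toy_embeddings`, `cosine_similarity`, `most_similar`) are invented for illustration and do not come from the article or from any real embedding model, which would typically produce hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made "embeddings" (assumption: not model output).
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}


def most_similar(query, embeddings):
    """Return the other key whose vector points most nearly the same way."""
    return max(
        (key for key in embeddings if key != query),
        key=lambda key: cosine_similarity(embeddings[query], embeddings[key]),
    )


print(most_similar("cat", toy_embeddings))
```

In a retrieval setting the same comparison would run between a query vector and the stored document vectors, which is why the choice of embedding model shapes what gets retrieved.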
Image by Author
You’ve read on these pages (and I’m guilty of writing some of those articles) that data science projects are crucial for developing the whole package of technical data science skills. That’s true, they are. But what’s also vital is having high-quality datasets for your data science projects. Collecting quality data…
Large Language Models, GPT-1 — Generative Pre-Trained Transformer | by Vyacheslav Efimov | Jan, 2024
Diving deeply into the working structure of the first version of gigantic GPT-models
2017 was a historic year in machine learning. Researchers from the Google Brain team introduced the Transformer, which rapidly outperformed most of the existing approaches in deep learning. The famous attention mechanism became the key component in the future models derived from…
Image by Author
Are you looking to switch to a data science career? If so, chances are you’ve already signed up for an online course, a bootcamp, or the like. Perhaps, you’ve bookmarked a self-study data science roadmap that you’re planning to work through. So how will yet another guide—this guide—help you?
If…
The way you retrieve variables from Airflow can impact the performance of your DAGs
Photo by Daniele Franchi on Unsplash
What happens if multiple data pipelines need to interact with the same API endpoint? Would you really have to declare this endpoint in every pipeline? In case this endpoint changes in the near future, you will…
Image by Author
In the world of data, SQL still stands as the lingua franca for interacting with databases.
It remains one of the most widely used languages for working with data and is considered a must-have for any good data professional.
However, anyone who has worked with complex SQL…
Recent advancements in generative models for text-to-image (T2I) tasks have led to impressive results in producing high-resolution, realistic images from textual prompts. However, extending this capability to text-to-video (T2V) models poses challenges due to the complexities introduced by motion. Current T2V models face limitations in video duration, visual quality, and realistic motion generation, primarily due…
A few personal lessons learned from developing LLM applications
Source: DALL·E 3 prompted with “Operationalizing LLMs, watercolor”
It’s been fun posting articles exploring new Large Language Model (LLM) techniques and libraries as they emerge, but most of the time has been spent behind the scenes working on the operationalization of LLM solutions. Many organizations are working…
Image by Author
There are many courses and resources available on machine learning and data science, but very few on data engineering. This raises some questions. Is it a difficult field? Does it pay poorly? Is it not considered as exciting as other tech roles? However, the reality is that many companies…
When you think of AI, you might think of ChatGPT, AI-generated art, or maybe something like the Terminator. But let’s take a step back and ask the basic question, “What is AI?” AI is short for artificial intelligence — which may not tell us much because one of these words is problematic. The first word,…