Ten of my LinkedIn posts on LLMs. 1. Non-determinism in LLMs: The best LLM use cases are those where you use the LLM as a tool rather than exposing it directly. As Richard Seroter says, how many chatbots do you need? However, this use case of replacing static product pages with personalized product summaries is like many…
Experimenting with Large Language Models for free (Part 2). Photo by Glib Albovsky, Unsplash. In the first part of the story, we used a free Google Colab instance to run a Mistral-7B model and extract information using the FAISS (Facebook AI Similarity Search) database. In this part, we will go further, and I will show how…
Repositories with the most stars! Happy New Year 2024! For the first post of the new year, just as I did before, I’m curious about which Python projects have been the most popular so far. GitHub is definitely the most suitable place to get these statistics. Although not all the open-sourced projects will be…
Use various data source types to quickly generate text data for artificial datasets. Image generated with DALL-E 3. In a previous article, we explored creating many-to-one relationships between columns in a synthetic PySpark DataFrame. This DataFrame only consisted of Foreign Key information and we didn’t produce any textual information that might be useful in a demo…
Part 3: Causality. Image by Cottonbro Studios from Pexels.com. My hope is that by the end of this article you will have a good understanding of how philosophical thinking around causation applies to your work as a data scientist. Ideally you will have a deeper philosophical perspective to give context to your work! This is the…
Discussion backed up by some concrete examples, sketching broad guidelines on how to develop better AI systems. Photo by National Cancer Institute on Unsplash. Artificial Intelligence has become an integral tool in scientific research, but concerns are growing that the misuse of these powerful tools is leading to a reproducibility crisis in science and its technological…
PYTHON PROGRAMMING. Tuples are a powerful Python type, but named tuples even more so! Named tuples join the strengths of names and tuples. Photo by Ainur Iman on Unsplash. The three most popular Python data types are the list, the dictionary, and the tuple. Lists and dictionaries are mutable, meaning that their elements can be…
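To make the "names plus tuples" idea concrete, here is a minimal sketch using the standard library's `collections.namedtuple`. The `Point` type and its fields are hypothetical names chosen for illustration and do not come from the article.

```python
from collections import namedtuple

# Hypothetical example type: a 2D point with named fields.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1.0, y=2.0)

# Fields are reachable by name and by position, like a regular tuple.
by_name = p.x
by_index = p[0]

# Unpacking works exactly as with a plain tuple.
x, y = p

# Named tuples are immutable: assigning to a field raises AttributeError.
try:
    p.x = 5.0
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new instance instead of changing the original.
moved = p._replace(x=5.0)
```

Because `Point` is a subclass of `tuple`, it can be used anywhere a tuple is expected, while the field names keep call sites readable.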
Geospatial indexing, or Geocoding, is the process of indexing latitude-longitude pairs to small subdivisions of geographical space, and it is a technique that we data scientists often find ourselves using when faced with geospatial data. Though the first popular geospatial indexing technique “Geohash” was invented as recently as 2008, indexing latitude-longitude pairs to manageable subdivisions…
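As a rough illustration of indexing a latitude-longitude pair to a subdivision, the sketch below implements the commonly described Geohash encoding: alternately bisect the longitude and latitude ranges, then pack each group of five bits into a base-32 character. The function name `geohash_encode` and its `precision` parameter are illustrative choices, not taken from the article, and this is not necessarily the approach the article goes on to use.

```python
# Base-32 alphabet used by Geohash (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude-longitude pair as a Geohash string (sketch)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulator for the current 5-bit group
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


cell = geohash_encode(57.64911, 10.40744, precision=5)
```

Each extra character narrows the cell, so a shorter hash is always a prefix of a longer hash for the same point. That prefix property is what makes these strings convenient as grouping keys.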
A Clinical Perspective on Medical Innovation. Image generated by DALL-E 3. Being an oncologic surgeon is my primary job and passion. It allows me to interact with people and immerse myself in the healthcare system, not the fancy corporate healthcare, just everyday medicine. And, as a researcher in AI, I’m noticing a growing disconnect between…
Boost the performance of your supervised fine-tuned models. Image by author. Pre-trained Large Language Models (LLMs) can only perform next-token prediction, making them unable to answer questions. This is why these base models are then fine-tuned on pairs of instructions and answers to act as helpful…