Repositories with the most stars! Happy New Year 2024! For the first post of the new year, as I have done before, I am curious which Python projects have been the most popular so far. GitHub is definitely the most suitable place to find these statistics. Although not all open-source projects will be…
Use various data source types to quickly generate text data for artificial datasets. Image generated with DALL-E 3. In a previous article, we explored creating many-to-one relationships between columns in a synthetic PySpark DataFrame. This DataFrame consisted only of foreign key information, and we didn’t produce any textual information that might be useful in a demo…
Part 3: Causality. Image by Cottonbro Studios from Pexels.com. My hope is that by the end of this article you will have a good understanding of how philosophical thinking around causation applies to your work as a data scientist. Ideally you will have a deeper philosophical perspective to give context to your work! This is the…
A discussion backed up by concrete examples, sketching broad guidelines on how to develop better AI systems. Photo by National Cancer Institute on Unsplash. Artificial Intelligence has become an integral tool in scientific research, but concerns are growing that the misuse of these powerful tools is leading to a reproducibility crisis in science and its technological…
PYTHON PROGRAMMING. Tuples are a powerful Python type, but named tuples even more so! Named tuples join the strengths of names and tuples. Photo by Ainur Iman on Unsplash. The three most popular Python data types are the list, the dictionary, and the tuple. Lists and dictionaries are mutable, meaning that their elements can be…
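As a quick illustration of how named tuples combine names with tuple behaviour, here is a minimal sketch using the standard library's `collections.namedtuple`. The `Point` type and its `x` and `y` fields are hypothetical names chosen for this example; they do not come from the article.

```python
from collections import namedtuple

# Hypothetical example type: a 2D point with named fields.
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1.5, y=-2.0)

# Fields are reachable by name and by position, like a regular tuple.
by_name = p.x + p.y
by_index = p[0] + p[1]

# Unpacking works exactly as it does for a plain tuple.
x, y = p

# Named tuples are immutable: assigning to a field raises AttributeError.
try:
    p.x = 10.0
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new instance and leaves the original untouched.
moved = p._replace(x=10.0)
```

Because `p` cannot be changed in place, `_replace` is the usual way to derive a modified copy.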
Geospatial indexing, or Geocoding, is the process of indexing latitude-longitude pairs to small subdivisions of geographical space, and it is a technique that we data scientists often find ourselves using when faced with geospatial data. Though the first popular geospatial indexing technique, “Geohash”, was invented as recently as 2008, indexing latitude-longitude pairs to manageable subdivisions…
A Clinical Perspective on Medical Innovation. Image generated by DALL-E 3. Being an oncologic surgeon is my primary job and passion. It allows me to interact with people and immerse myself in the healthcare system: not the fancy corporate healthcare, just everyday medicine. And, as a researcher in AI, I’m noticing a growing disconnect between…
Boost the performance of your supervised fine-tuned models. 10 min read · 14 hours ago. Image by author. Pre-trained Large Language Models (LLMs) can only perform next-token prediction, making them unable to answer questions. This is why these base models are then fine-tuned on pairs of instructions and answers to act as helpful…
Learning to code when AI assistants already master the skill. Image created by author using Midjourney. The revelation came in the summer of 2023, when I took on a high school student as a summer intern. Their task was to develop a machine learning model to predict air quality in our city, using Jupyter notebooks, basic…
What Can Microsoft Fabric Bring to the Table in 2024? 17 min read · 17 hours ago. Photo by Ricardo Loaiza on Unsplash. Introduction · What is Microsoft Fabric? · The Major Components of Microsoft Fabric · 3 Upsides to Using Microsoft Fabric · 3 Downsides to Using Microsoft Fabric · Should you Change? · Wrapping Up. Microsoft Fabric…