My learnings from Databricks customer engagements. Figure 1: a technical diagram of how to write Apache Spark. Image by author. At Databricks, I help large retail organizations deploy and scale data and machine learning pipelines. Here are the 8 most important Spark tips and tricks I’ve learned in the field. Throughout this post, we assume a general working…
A comprehensive guide to the best open-source GIS software. Photo by Louis Hansel on Unsplash. More than 10 years ago, when I started my data career as a GIS (Geographic Information System) analyst, two pieces of do-it-all GIS software were prominent. 10 years later, it is still…
Here we won’t start from scratch. As stated earlier, in sprint 3 we already developed the code that builds a Pyomo model of the TSP and solves it. And trust me, that was the hardest part. Now, we have the easier task of organizing what we did in a way that makes it general, hiding…
Additionally, Gaussian splatting doesn’t involve any neural network at all. There isn’t even a small MLP, nothing “neural”; a scene is essentially just a set of points in space. This in itself is already an attention grabber. It is quite refreshing to see such a method gaining popularity in our AI-obsessed world with research companies…
The recent rapid advances in the natural language processing capabilities of large language models (LLMs) have stirred tremendous excitement about their potential to achieve human-level intelligence. Their ability to produce remarkably coherent text and engage in dialogue after exposure to vast datasets seems to point towards flexible, general-purpose reasoning skills. However, a growing chorus of…
When I began my data science journey in grad school, I had a naive view of the discipline. Namely, I was hyper-focused on learning tools and technologies (e.g. LSTM, SHAP, VAE, SOM, SQL). While a technical foundation is necessary to be a successful data scientist, focusing too much on tools creates the “Hammer Problem”…
Math behind this parameter-efficient fine-tuning method. Fine-tuning large pre-trained models is computationally challenging, often involving the adjustment of millions of parameters. This traditional fine-tuning approach, while effective, demands substantial computational resources and time, posing a bottleneck for adapting these models to specific tasks. LoRA presented an effective solution to this problem by decomposing the update…
1. Choosing a Chatbot: As simple as this one may sound, it is far from a trivial question. The options are manifold. You can build your own chatbot using open-source code.[1] You can use one of the gazillion chatbot APIs on the market, which offer the simplest and quickest ready-set-go setup.[2] Finetuning your…
Social media spam as a case study. Photo by Nong on Unsplash. Disclaimer: the examples in this post are for illustrative purposes and are not commentary on any specific content policy at any specific company. All views expressed in this article are mine and do not reflect my employer. Why is there any spam on social…
GENERATIVE AI: A step-by-step tutorial on querying SQL databases with human language. Image by the author (generated via Midjourney). Many businesses have a lot of proprietary data stored in their databases. If there’s a virtual agent that understands human language and can query these databases, it opens up big opportunities for these businesses. Think of customer…