CFDI is an electronic invoicing (e-invoice) standard and related process required to conduct business in Mexico and Latin America. This article focuses on Mexico’s digital transformation program requiring CFDI-compliant electronic sales invoices instead of paper-based invoices in Mexico. What is CFDI? CFDI (Comprobante Fiscal Digital por Internet) is a government standard for conducting business in…
When working with time-series data it can be important to apply filtering to remove noise. This story shows how to implement a low-pass filter in SQL / BigQuery that can come in handy when improving ML features. Filtering of time-series data is one of the most useful preprocessing tools in Data Science. In reality, data…
Task-agnostic model pre-training is now the norm in Natural Language Processing, driven by the recent revolution in large language models (LLMs) like ChatGPT. These models showcase proficiency in tackling intricate reasoning tasks, adhering to instructions, and serving as the backbone for widely used AI assistants. Their success is attributed to a consistent enhancement in performance…
Where it stands out from other swarm algorithms. This article is a continuation of my nature-inspired series. Previously, I talked about the Evolutionary Algorithm (EA), Particle Swarm Optimization (PSO), and Artificial Bee Colony (ABC). Nature is everywhere, and there are certainly more areas where humans can benefit by learning from it. Today, we focus on…
You’re reading this because you’re thinking about joining the ranks of aspiring data scientists. And who can blame you? Data science is a growing field, even a decade after its now-infamous “sexiest job” accolade from the Harvard Business Review. The US Bureau of Labor Statistics currently predicts the employment rate…
In the constantly evolving field of machine learning, particularly in semantic segmentation, the accurate estimation and validation of uncertainty have become increasingly vital. Despite numerous studies claiming advances in uncertainty methods, there remains a disconnection between theoretical development and practical application. Fundamental questions linger, such as whether it is feasible to separate data-related (aleatoric) and…
Automate resource provisioning with modern tools. Modern data stacks consist of various tools and frameworks for processing data. Typically, such a stack is a large collection of different cloud resources aimed at transforming the data and bringing it to the state…
At a time when analytical data processing can make the difference between a successful business and a failing one, we need a tool stack that supports those needs. Advances in technology have helped improve the data tools we need, namely DuckDB and MotherDuck.
DuckDB is an…
The practical deployment of multi-billion parameter neural rankers in real-world systems is a significant challenge in information retrieval (IR). These advanced neural rankers are highly effective but are hampered by their substantial computational requirements for inference, making them impractical for production use. This dilemma poses a critical problem in IR, as it is necessary to…
The question is no longer whether we can solve the problem with AI but to what extent it returns sustainable and reliable results. Good craftsmanship, governance, ethics, and education on AI are what we need now. Since I was a kid, I have always been intrigued and interested in new…