Building a 28% more accurate multimodal image search engine with VLMs. Until recently, AI models were narrow in scope and limited to understanding either language or specific images, but rarely both. In this respect, general language models like GPTs were a HUGE leap since we went from specialized models to general yet much more powerful…
