Video generation has rapidly become a focal point in artificial intelligence research, especially in generating temporally consistent, high-fidelity videos. This area involves creating video sequences that maintain visual coherence across frames and preserve details over time. Machine learning models, particularly diffusion transformers (DiTs), have emerged as powerful tools for these tasks, surpassing previous methods like…
Building a 28% more accurate multimodal image search engine with VLMs. Until recently, AI models were narrow in scope and limited to understanding either language or specific images, but rarely both. In this respect, general language models like GPTs were a huge leap, since we went from specialized models to general yet much more powerful…
Understanding and analyzing long videos has been a significant challenge in AI, primarily due to the vast amount of data and computational resources required. Traditional Multimodal Large Language Models (MLLMs) struggle to process extensive video content because of limited context length. This challenge is especially evident with hour-long videos, which need hundreds of thousands of…
Published: 30 October 2024
…
What working as a data scientist at various companies and industries over the past 6+ years has taught me about the future of data science and AI engineering. GenAI and Large Language Models (LLMs) continue changing how we work and what work will mean in the future, especially for the data science domain, where in…