Multimodal Large Language Models (MLLMs), having contributed to remarkable progress in AI, face challenges in accurately processing and responding to misleading information, leading to incorrect or hallucinated responses. This vulnerability raises concerns about the reliability of MLLMs in applications where accurate interpretation of text and visual data is crucial.
Recent research has explored visual instruction…
