LLaVA vs. GPT-4V Amidst Snow Geese Migration

The Goose Chase for Large Multimodal Model Supremacy

Published in Level Up Coding

10 min read · Nov 13, 2023

Photo by author at Middle Creek Wildlife Refuge in Pennsylvania

In our last article, we explored LLaVA (Large Language and Vision Assistant) 1.5, a large multimodal model. OpenAI released GPT-4V (GPT-4 with Vision), a multimodal model, at its first-ever DevDay on November 6, 2023. In this article, let’s put them side by side, compare and contrast their pros and…

Written by Wenqi Glantz