LLaVA vs. GPT-4V Amidst Snow Geese Migration

The Goose Chase for Large Multimodal Model Supremacy

Wenqi Glantz
Published in Level Up Coding
10 min read · Nov 13, 2023

Photo by author at Middle Creek Wildlife Refuge in Pennsylvania

In our last article, we explored LLaVA (Large Language and Vision Assistant) 1.5, a large multimodal model. OpenAI released GPT-4V (GPT-4 with Vision), its own multimodal model, at its first-ever DevDay on November 6, 2023. In this article, let’s put them side by side and compare and contrast their pros and…
