🕹️ Try and Play with VAR! We provide a demo website for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling! We provide a demo website for ...
Abstract: Vision-Language Models (VLMs), such as CLIP, excel in zero-shot image-level visual understanding but struggle with object-based tasks requiring precise localization and recognition. Visual ...
If you find any work missing or have any suggestions (papers, implementations, and other resources), feel free to pull requests. We will add the missing papers to this repo as soon as possible. You ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results