Comparison Of Blip2 Captioning Models With 1 Click Windows & RunPod Installer (Patreon)
I recently coded a Gradio app from scratch for the well-known Blip2 captioning models.
1-click auto installers with instructions are posted here: https://www.patreon.com/posts/sota-image-for-2-90744385
That post also includes 1-click Windows & RunPod installers, with Gradio interfaces that also support batch captioning, for the following vision models: LLaVA (4-bit, 8-bit, 16-bit; 7b, 13b, 34b), Qwen-VL (4-bit, 8-bit, 16-bit), and a Clip_Interrogator Gradio app that supports 115 CLIP vision models combined with 5 caption models.
All precisions also work on Windows with our special installers.
16-bit mode is the fastest and 8-bit mode is the slowest. 4-bit mode is slower than 16-bit but faster than 8-bit.
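For readers who want to try the three precisions outside the installer, here is a minimal sketch of how they are commonly selected with the Hugging Face `transformers` library and `bitsandbytes`. This is an assumption about a typical setup, not the code of the app described here; `quantization_settings` and `load_blip2` are hypothetical helper names, and 8-bit and 4-bit loading assume the `bitsandbytes` and `accelerate` packages are installed.

```python
def quantization_settings(precision: str) -> dict:
    """Map a precision label to BitsAndBytesConfig keyword arguments.

    An empty dict means plain 16-bit weights with no quantization.
    """
    settings = {
        "16-bit": {},
        "8-bit": {"load_in_8bit": True},
        "4-bit": {"load_in_4bit": True},
    }
    if precision not in settings:
        raise ValueError(f"unknown precision: {precision!r}")
    return dict(settings[precision])


def load_blip2(model_id: str, precision: str):
    """Load a Blip2 checkpoint at the requested precision (hypothetical helper)."""
    # Imported lazily so the mapping above can be inspected without a GPU stack.
    import torch
    from transformers import (
        BitsAndBytesConfig,
        Blip2ForConditionalGeneration,
        Blip2Processor,
    )

    settings = quantization_settings(precision)
    quant_config = BitsAndBytesConfig(**settings) if settings else None
    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.float16,        # 16-bit for any non-quantized weights
        quantization_config=quant_config,  # None keeps the model unquantized
        device_map="auto",                # assumes the accelerate package
    )
    return processor, model
```

For example, `load_blip2("Salesforce/blip2-opt-6.7b", "4-bit")` would request the 4-bit variant of one of the models tested in this post.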
The full results are below.
Blip 2 Models Batch Image Captioning App
The test results are below.
During batch processing, only 1 image is captioned at a time, so images were not captioned in parallel.
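To make the measurement concrete, here is a rough sketch of a sequential batch loop that reports seconds per image. It is an illustration only, not the app's actual code: `caption_sequentially` and `caption_fn` are hypothetical names, and `caption_fn` stands in for whatever function runs the model on a single image.

```python
import time
from typing import Callable, Iterable


def caption_sequentially(
    image_paths: Iterable[str],
    caption_fn: Callable[[str], str],
) -> tuple[dict[str, str], float]:
    """Caption images one at a time and return (captions, seconds per image)."""
    captions: dict[str, str] = {}
    count = 0
    start = time.perf_counter()
    for path in image_paths:
        # Each call finishes before the next image starts: no parallelism.
        captions[path] = caption_fn(path)
        count += 1
    elapsed = time.perf_counter() - start
    seconds_per_image = elapsed / count if count else 0.0
    return captions, seconds_per_image


if __name__ == "__main__":
    # Stand-in captioner so the sketch runs without a model or GPU.
    def fake_caption(path: str) -> str:
        return f"a photo ({path})"

    captions, speed = caption_sequentially(["a.png", "b.png"], fake_caption)
    print(captions, f"{speed:.4f} second/image")
```

Because the loop handles one image per call, the reported figure is simply total elapsed time divided by the number of images.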
Salesforce/blip2-opt-6.7b — 16-bit precision
Batch processing speed on RTX A6000: 0.32 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*o3JPh2Da-_bdijucQc_hxA.png)
Salesforce/blip2-opt-6.7b — 8-bit precision
Batch processing speed on RTX A6000: 1.7 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*iE5nwcLOhxcBNBMfcH11tw.png)
Salesforce/blip2-opt-6.7b — 4-bit precision
Batch processing speed on RTX A6000: 0.65 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*glfI1mxxDWiSKUW0xDaVnw.png)
Salesforce/blip2-flan-t5-xxl— 16-bit precision
Batch processing speed on RTX A6000: 0.41 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*YcPfxsIVwbPbGB9PZsKx-w.png)
Salesforce/blip2-flan-t5-xxl — 8-bit precision
Batch processing speed on RTX A6000: 1.6 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*roNWZVRfl-I5rfMGaT8IYA.png)
Salesforce/blip2-flan-t5-xxl — 4-bit precision
Batch processing speed on RTX A6000: 0.82 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*LtUVcPGHIwX3PsOTFwSq5A.png)
Salesforce/blip2-opt-6.7b-coco— 16-bit precision
Batch processing speed on RTX A6000: 0.39 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*tc4JNK-D7Fg3bxMny80OWw.png)
Salesforce/blip2-opt-6.7b-coco — 8-bit precision
Batch processing speed on RTX A6000: 2.01 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*ilCCfeP_3zwqPs9RduKiJQ.png)
Salesforce/blip2-opt-6.7b-coco — 4-bit precision
Batch processing speed on RTX A6000: 0.74 seconds/image
![](https://miro.medium.com/v2/resize:fit:875/1*ClJ6SCIGVo9_PdAb2R6nqg.png)
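To compare the precisions at a glance, the timings reported above can be turned into slowdown factors relative to 16-bit. The short sketch below only does that arithmetic; the numbers are copied from this post, while the names `SECONDS_PER_IMAGE` and `slowdown_vs_16bit` are made up for illustration.

```python
# Seconds per image on an RTX A6000, copied from the results above.
SECONDS_PER_IMAGE = {
    "Salesforce/blip2-opt-6.7b": {"16-bit": 0.32, "8-bit": 1.7, "4-bit": 0.65},
    "Salesforce/blip2-flan-t5-xxl": {"16-bit": 0.41, "8-bit": 1.6, "4-bit": 0.82},
    "Salesforce/blip2-opt-6.7b-coco": {"16-bit": 0.39, "8-bit": 2.01, "4-bit": 0.74},
}


def slowdown_vs_16bit(timings: dict) -> dict:
    """Divide each timing by the same model's 16-bit timing."""
    return {
        model: {
            precision: seconds / by_precision["16-bit"]
            for precision, seconds in by_precision.items()
        }
        for model, by_precision in timings.items()
    }


if __name__ == "__main__":
    for model, ratios in slowdown_vs_16bit(SECONDS_PER_IMAGE).items():
        print(model, {p: f"{r:.2f}x" for p, r in ratios.items()})
```

With these figures, 8-bit comes out roughly 4 to 5 times slower than 16-bit, and 4-bit about 2 times slower, on all three models.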