microsoft/kosmos-2.5 · Kosmos-2.5 - Containerized & made available over an API

While Kosmos-2.5 is an incredibly useful model, and especially precious as an open-source MLLM that excels at OCR (not so much at markdown generation in my testing!), it is also very difficult to deploy and get working locally. It's even more difficult to deploy it in a useful way: one where other applications and development tasks can actually call it.

This is due in large part to its many specific requirements, both hardware and software. One example is its use of a custom version of the transformers library: Kosmos-2.5 requires a special "v4.32.0.dev0", whereas newer LLMs such as Google's Gemma 2 require a more recent version to work correctly. Another is the custom Fairseq library, which does not work outside of Python v3.10. Such requirements can hamper other application development and utilization tasks in the same environment.
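
To make the conflict concrete, here is a minimal sketch of a preflight check for the two software constraints mentioned above. The helper names (`find_conflicts`, `installed_transformers`) are hypothetical, and the sketch assumes the custom transformers build reports its version as exactly "4.32.0.dev0"; treat it as an illustration, not as the check the model itself performs.

```python
import sys
from importlib import metadata

# Requirements as stated in this post; adjust if the upstream project changes them.
REQUIRED_PYTHON = (3, 10)
REQUIRED_TRANSFORMERS = "4.32.0.dev0"  # assumed exact version string


def installed_transformers():
    """Return the installed transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def find_conflicts(python_version, transformers_version):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"but the custom Fairseq library needs 3.10"
        )
    if transformers_version != REQUIRED_TRANSFORMERS:
        problems.append(
            f"transformers {transformers_version!r} found, "
            f"but Kosmos-2.5 needs {REQUIRED_TRANSFORMERS!r}"
        )
    return problems


if __name__ == "__main__":
    for problem in find_conflicts(sys.version_info, installed_transformers()):
        print(problem)
```

An environment set up for a newer model will typically fail the transformers check, which is exactly why sharing one environment between Kosmos-2.5 and other models is painful.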

In terms of hardware requirements, the use of Flash Attention limits the userbase to specific generations of Nvidia GPUs only.

While not much can be done about the hardware requirements, I did see an opportunity to ease the software challenges: by containerizing the model and its dependencies and using Flask to expose the model over a RESTful API, Kosmos-2.5 can be made available as a service, thus providing fully local and high-performance OCR capabilities from a cutting-edge MLLM!
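
As a rough illustration of the idea, here is a minimal sketch of a Flask app that accepts an image upload and returns OCR text as JSON. It is not the code from my images: the `/ocr` route, the `image` form field, and the `run_ocr` function are all hypothetical placeholders, and `run_ocr` returns a dummy string instead of calling the model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def run_ocr(image_bytes: bytes) -> str:
    """Hypothetical stand-in for the Kosmos-2.5 inference call.

    A real service would load the model once at startup and run it here.
    """
    return f"<placeholder OCR result for {len(image_bytes)} bytes>"


@app.route("/ocr", methods=["POST"])  # route name is an assumption
def ocr():
    # Expect the image as a multipart form upload under the field "image".
    upload = request.files.get("image")
    if upload is None:
        return jsonify(error="missing 'image' file field"), 400
    text = run_ocr(upload.read())
    return jsonify(text=text)
```

Inside a container, an app like this could be started with `flask --app <module> run --host 0.0.0.0` so that the port can be published to the host. Any application that can send an HTTP POST can then use the model without installing any of its dependencies.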

I have open-sourced the prebuilt images and documented how to deploy and use them, how to build them from scratch, and even how to deploy the model uncontainerized, all in my GitHub repo: https://github.com/abgulati/kosmos-2_5-containerized

Hope this helps the community deploy, port and use the model more easily!