PyTorch implementation of Google’s PaliGemma VLM with a SigLIP image encoder, KV caching, rotary embeddings, and grouped-query attention. Modular, research-friendly, and easy to extend for experimentation.

6DEADSHOT9/Pali-pa-Jamma

PaliGemma

This repository contains a PyTorch implementation of Google's PaliGemma model.

Features

  • PyTorch-based implementation

  • Modular configuration for vision transformer models

  • Easily extensible for research and experimentation

  • Includes an implementation of the SigLIP image encoder

You can modify or extend the model in siglip.py.
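As a rough illustration of what "modular configuration" can look like for a vision transformer, here is a minimal sketch in plain Python. The class name `VisionConfig`, its fields, and the default values are all hypothetical and are not taken from siglip.py; check the actual file for the real names and defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionConfig:
    """Hypothetical config for a ViT-style image encoder (illustrative names only)."""

    image_size: int = 224   # input resolution in pixels (square images assumed)
    patch_size: int = 16    # side length of each square patch
    hidden_size: int = 768  # embedding width of each patch token
    num_layers: int = 12    # number of transformer encoder blocks
    num_heads: int = 12     # attention heads per block

    def __post_init__(self) -> None:
        # Fail early on shapes that cannot be split evenly.
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")

    @property
    def num_patches(self) -> int:
        # A square image is cut into a grid of non-overlapping patches.
        return (self.image_size // self.patch_size) ** 2

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden width.
        return self.hidden_size // self.num_heads


base = VisionConfig()
larger = VisionConfig(image_size=448)  # change the resolution, keep everything else
```

Keeping derived quantities such as `num_patches` as properties means an experiment only has to change one field, and the dependent shapes follow from it.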

Project Structure

  • siglip.py: Contains the SigLIP model implementation.

  • requirements.txt: Python dependencies.

  • pyproject.toml: Project metadata and dependencies.

License

This project is for research and educational purposes only. Please refer to Google's original PaliGemma license for usage restrictions.
