Vision Transformers · Self-Supervised Learning · Multimodal Reasoning
I am a deep learning engineer and AI researcher working at the intersection of computer vision, multimodal learning, and large-scale model training. My work focuses on pushing transformer architectures into resource-constrained, real-world settings, from medical imaging to autonomous perception. I enjoy turning recent papers into production systems and shipping models that survive contact with real data.
Highlights
- Trained ViT-B/16 from scratch on a custom 5M-image dataset
- Published at top-tier vision conferences (CVPR / ICCV workshop track)
- Led on-device CV inference on Jetson Orin (sub-30 ms latency)
- Open-source contributor with 1.5k+ GitHub stars across repos