The problem
Public CLIP weights are trained on web-scraped image-text pairs, so they are biased toward web data. Retrieval in a specialized domain needs custom training.
The approach
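A two-tower contrastive architecture (one encoder per modality, trained so matching pairs score highest), mixed-precision training across 8×A100 GPUs, and a learned temperature that scales the similarity logits.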
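To make the contrastive objective with a learned temperature concrete, here is a minimal pure-Python sketch of a symmetric InfoNCE-style loss of the kind CLIP uses. It is an illustration under stated assumptions, not the project's training code: the function names, the `log_inv_temp` parameterization (the temperature stored as the log of its inverse), and the toy embeddings are all hypothetical. In real training these would be batched tensor operations and `log_inv_temp` would be updated by the optimizer along with the encoder weights.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(image_emb, text_emb, log_inv_temp):
    """Symmetric contrastive loss over a batch of matched (image, text) pairs.

    Pair i is (image_emb[i], text_emb[i]); every other combination in the
    batch is treated as a negative. log_inv_temp is the learned scalar
    (assumed parameterization): logits are multiplied by exp(log_inv_temp).
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    scale = math.exp(log_inv_temp)  # equals 1 / temperature
    n = len(imgs)
    logits = [
        [scale * sum(a * b for a, b in zip(imgs[i], txts[j])) for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: row i should assign most probability to column i.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should assign most probability to row j.
    t2i = -sum(
        log_softmax([logits[i][j] for i in range(n)])[j] for j in range(n)
    ) / n
    return 0.5 * (i2t + t2i)

# Toy batch of two perfectly aligned pairs (made-up values).
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[2.0, 0.0], [0.0, 3.0]]
loss_soft = contrastive_loss(images, texts, log_inv_temp=0.0)
loss_sharp = contrastive_loss(images, texts, log_inv_temp=math.log(10.0))
```

On this toy batch a larger logit scale (a lower temperature) gives a smaller loss, because the correct pairs are already the most similar ones. Learning the scale lets the model choose how sharply to separate positives from negatives rather than fixing it by hand.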
Results
Reached 71.4% Recall@1 (R@1) on a held-out test set, only 3 percentage points behind a 6× larger baseline.
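For readers unfamiliar with the metric, R@1 is the fraction of queries whose top-ranked retrieved item is the correct match. A minimal sketch follows, assuming a square similarity matrix in which query i's correct item is gallery item i. The function name `recall_at_k` and the toy scores are hypothetical, and the summary does not state which retrieval direction or evaluation protocol produced the reported figure.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose correct item appears in the top-k results.

    similarity[i][j] is the score of query i against gallery item j.
    Assumption: the correct match for query i is gallery item i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Indices of the k highest-scoring gallery items for this query.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if i in top_k:
            hits += 1
    return hits / len(similarity)

# Toy scores (made-up): queries 0 and 1 rank their own item first,
# query 2 ranks item 0 above its correct item 2.
toy_similarity = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.7, 0.2, 0.3],
]
r_at_1 = recall_at_k(toy_similarity, k=1)
r_at_3 = recall_at_k(toy_similarity, k=3)
```

Here two of the three toy queries retrieve their correct item first, so R@1 is 2/3, while R@3 is trivially 1.0 because the gallery holds only three items.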