The problem
Edge hardware cannot run the full teacher network in real time, yet the deployed model's accuracy must stay high enough to be useful.
The approach
Feature-level and response-level distillation, INT8 quantization, and TensorRT graph optimization.
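To make the two distillation terms concrete, here is a minimal sketch in plain Python of how a response-level loss (KL divergence on temperature-softened logits) and a feature-level loss (mean squared error on intermediate features) might be combined. The function names, the weights `alpha` and `beta`, and the temperature of 4.0 are illustrative assumptions, not values taken from this project, and a real training loop would compute the same quantities on tensors in a deep learning framework.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def response_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def feature_loss(student_feat, teacher_feat):
    """Mean squared error between flattened feature maps of equal length."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature maps must have the same number of elements")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat,
                      alpha=0.5, beta=0.5, temperature=4.0):
    """Weighted sum of the two terms; alpha and beta are assumed weights."""
    return (alpha * response_loss(student_logits, teacher_logits, temperature)
            + beta * feature_loss(student_feat, teacher_feat))


if __name__ == "__main__":
    # Toy values only; they are not measurements from the project.
    loss = distillation_loss(
        student_logits=[1.0, 0.5, -0.2],
        teacher_logits=[2.0, 0.1, -1.0],
        student_feat=[0.1, 0.4, 0.3],
        teacher_feat=[0.2, 0.5, 0.1],
    )
    print(f"combined distillation loss: {loss:.4f}")
```

In practice the student's feature maps often have fewer channels than the teacher's, so a small projection layer is usually needed before the feature term can be computed; the sketch assumes the shapes already match.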
Results
mAP@0.5 dropped by only 2.1 points, while throughput rose from 9 FPS to 38 FPS on Orin Nano.
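The throughput figures are easier to interpret as a speedup factor and a per-frame latency. The short calculation below derives both from the two reported FPS values; the 30 FPS real-time target is a hypothetical threshold chosen for illustration, not a requirement stated for this project.

```python
def frame_latency_ms(fps):
    """Average time per frame in milliseconds at a given throughput."""
    return 1000.0 / fps


baseline_fps = 9.0    # reported throughput before optimization
optimized_fps = 38.0  # reported throughput after optimization
target_fps = 30.0     # assumed real-time target, for illustration only

speedup = optimized_fps / baseline_fps
meets_target = frame_latency_ms(optimized_fps) <= frame_latency_ms(target_fps)

print(f"speedup: {speedup:.2f}x")                                 # speedup: 4.22x
print(f"baseline: {frame_latency_ms(baseline_fps):.1f} ms")       # baseline: 111.1 ms
print(f"optimized: {frame_latency_ms(optimized_fps):.1f} ms")     # optimized: 26.3 ms
print(f"meets assumed 30 FPS budget: {meets_target}")             # True
```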