Multimodal Retrieval: Learning Deep Supervised Representations with Online Hard Triplet Mining

Finetuning a modified version of OpenAI’s CLIP to leverage already aligned image and text encoders for accurate, jointly learned…Continue reading on Medium »

Multimodal Retrieval: Learning Deep Supervised Representations with Online Hard Triplet Mining

Finetuning a modified version of OpenAI’s CLIP to leverage already aligned image and text encoders for accurate, jointly learned…