Scalable Zero-Shot Logo Recognition
Abstract
Brand logo recognition is a task focused on the identification and classification of logos, with various applications such as brand protection and market discovery. Since the number of brands is dynamic and evolves over time, we propose a two-step, zero-shot framework for this problem. In the first step, we train a Scaled-YOLOv4 single-stage universal logo detector to identify regions containing logo-like objects. Our results indicate that this detector achieves generalizability comparable to the two-stage Faster-RCNN model. In the second step, we employ an enhanced CLIP model for zero-shot classification of the identified regions. Our experiments demonstrate that the CLIP model outperforms state-of-the-art few-shot classifiers in terms of accuracy. Additionally, we adopt test-time augmentation to improve the model’s resistance against false positives. We also present a proof-of-concept for fine-tuning the CLIP model, which enhances its cosine similarity measures. Our proposed end-to-end solution is scalable in terms of the number of brands and requires only the brand names for detection. The logo detector achieves superior performance on the FlickrLogos-32 dataset without the need for additional fine-tuning.
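The second step can be illustrated with a minimal sketch of zero-shot classification by cosine similarity. It is not the authors' implementation. It assumes each detected region has already been encoded into one embedding per augmented view, and each brand name into a text embedding, by some CLIP-style encoder that is not shown here. The function names, the averaging of scores across views, the rejection threshold, and the toy three-dimensional vectors are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_region(
    view_embeddings: List[Vector],
    brand_embeddings: Dict[str, Vector],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> Tuple[Optional[str], float]:
    """Pick the brand whose text embedding best matches a detected region.

    view_embeddings: one image embedding per augmented view of the region.
    brand_embeddings: one text embedding per brand name.
    Scores are averaged over views (an assumed form of test-time
    augmentation); a best score below `threshold` is rejected as a
    likely false positive and returned as (None, score).
    """
    if not view_embeddings:
        raise ValueError("at least one view embedding is required")

    best_brand: Optional[str] = None
    best_score = -1.0
    for brand, text_vec in brand_embeddings.items():
        score = sum(cosine_similarity(v, text_vec) for v in view_embeddings)
        score /= len(view_embeddings)
        if score > best_score:
            best_brand, best_score = brand, score

    if best_score < threshold:
        return None, best_score
    return best_brand, best_score


# Toy embeddings standing in for encoder outputs (hypothetical brands).
brands = {"acme": [1.0, 0.0, 0.0], "globex": [0.0, 1.0, 0.0]}
logo_views = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
background_views = [[0.0, 0.0, 1.0]]

accepted = classify_region(logo_views, brands)
rejected = classify_region(background_views, brands)
```

Under these assumptions, adding a brand only requires one more entry in `brand_embeddings`, which is the sense in which such a pipeline scales with the number of brands and needs only their names.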