Models

  • Object Detection – A Research Review

    Introduction. Object detection is a computer vision task that involves identifying and localizing objects within an image or video. It consists of two main steps: The output of an object detection model for each identified instance is a tuple comprising a class label (c_i), the bounding box parameters (e.g., center coordinates, width, and height: x_i,…


  • MViTv2: Improved Multi-scale Vision Transformers for Classification and Detection

    Facebook AI Research, UC Berkeley. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Introduction. Motivation: Designing a single, simple, yet effective architecture for diverse visual recognition tasks (image, video, detection). While Vision Transformers (ViT) are powerful, their standard architecture struggles with…


  • MViT: Multiscale Vision Transformer

    Facebook AI Research, UC Berkeley. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021. Introduction. Motivation: Convolutional Neural Networks (CNNs) have long benefited from multiscale feature hierarchies (pyramids), where spatial resolution decreases while channel complexity increases through the network. Vision Transformers (ViT) maintain a constant resolution and channel capacity throughout,…


  • [ViT] An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

    Google Research, Brain Team. 9th International Conference on Learning Representations (ICLR), 2021. Introduction. Motivation: The Transformer model and its variants have been shown to match or even surpass the state of the art in several tasks, especially in NLP. Objective Related…
