AI Roadmap

  • MViTv2: Improved Multi-scale Vision Transformers for Classification and Detection

    Facebook AI Research, UC Berkeley. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Motivation: design a single, simple, yet effective architecture for diverse visual recognition tasks (image, video, detection). While Vision Transformers (ViT) are powerful, their standard architecture struggles with…



  • MViT: Multiscale Vision Transformer

    Multiscale Vision Transformers. Facebook AI Research, UC Berkeley. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021. Motivation: Convolutional Neural Networks (CNNs) have long benefited from multiscale feature hierarchies (pyramids), in which spatial resolution decreases while channel capacity increases through the network. Vision Transformers (ViT), by contrast, maintain a constant resolution and channel capacity throughout,…

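    The pyramid idea above, where the token grid shrinks between stages while channels widen, can be sketched with a simple spatial pooling of the token sequence. This is a minimal NumPy illustration, not MViT's actual mechanism (the paper pools queries/keys/values with learned, strided pooling inside each attention block); the function name and the use of average pooling are assumptions for illustration only.

    ```python
    import numpy as np

    def pool_tokens(x, h, w, stride=2):
        """Spatially downsample a (h*w, c) token sequence by `stride`.

        A toy stand-in for an MViT stage transition: sequence length
        shrinks by stride**2 while, in the real model, the channel
        dimension grows. Average pooling is an illustrative choice.
        """
        c = x.shape[-1]
        grid = x.reshape(h, w, c)                      # back to a 2-D grid
        hp, wp = h // stride, w // stride
        pooled = grid[:hp * stride, :wp * stride]      # crop to a multiple of stride
        pooled = pooled.reshape(hp, stride, wp, stride, c).mean(axis=(1, 3))
        return pooled.reshape(hp * wp, c), hp, wp

    # A plain ViT keeps 14*14 = 196 tokens at every layer; a multiscale
    # model halves the grid per stage: 196 -> 49 -> ... tokens.
    tokens = np.random.randn(14 * 14, 96)
    pooled, hp, wp = pool_tokens(tokens, 14, 14)
    ```

    The contrast with ViT is that no such `pool_tokens` step ever occurs there: the sequence length stays fixed from the first block to the last.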


  • [SAM] Segment Anything

    Paper: https://arxiv.org/abs/2304.02643
    Code:
    Web: https://segment-anything.com/

    Objective: Segment Anything (SAM) = interactive segmentation + automatic segmentation. Loss function: 1) L_mask supervises the mask prediction with a linear combination of focal loss [65] and dice loss [73], weighted 20:1 in favor of focal loss. 2) L_IoU: the IoU prediction head is trained…

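    The 20:1 focal/dice combination described above can be sketched as follows. The focal-loss settings (`alpha`, `gamma`) are the conventional defaults from the focal-loss paper, not confirmed SAM hyperparameters, and the function names are illustrative; the sketch operates on a single flattened mask, omitting SAM's multi-mask machinery.

    ```python
    import numpy as np

    def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-6):
        # Per-pixel binary focal loss (Lin et al.); alpha/gamma are the
        # paper's conventional defaults, not confirmed SAM settings.
        p_t = np.where(targets == 1, probs, 1.0 - probs)
        alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
        return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps))

    def dice_loss(probs, targets, eps=1e-6):
        # 1 - Dice coefficient over the flattened mask.
        inter = np.sum(probs * targets)
        return 1.0 - (2.0 * inter + eps) / (np.sum(probs) + np.sum(targets) + eps)

    def mask_loss(probs, targets):
        # Linear combination at the 20:1 focal-to-dice ratio stated above.
        return 20.0 * focal_loss(probs, targets) + 1.0 * dice_loss(probs, targets)
    ```

    A confident, correct prediction should score lower than its inversion, e.g. `mask_loss(probs, targets) < mask_loss(1 - probs, targets)` for `probs` close to the binary `targets`.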


  • [ViT] An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
