Mask2Former: Universal Image Segmentation by Meta

658 runs · Jan 2022 · Cog 0.1.3+shimmed
Tags: image-segmentation, instance-segmentation, masked-attention, panoptic-segmentation, semantic-segmentation, transformer, universal-segmentation, segmentation, computer-vision

Mask2Former by Meta AI Research is a universal segmentation model. It replaces the need for separate panoptic, instance, and semantic segmentation pipelines with one architecture that handles all three.

About

Mask2Former is Meta's unified architecture for image segmentation. It handles panoptic, instance, and semantic segmentation with a single model, instead of requiring separate specialized networks for each task.

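The model uses masked attention: each query's cross-attention in the Transformer decoder is restricted to the foreground region of the mask predicted for that query by the previous layer, instead of attending over the full image. This localized attention improves both accuracy and efficiency compared to earlier approaches like MaskFormer.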
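
To make the masked attention idea concrete, here is a minimal NumPy sketch. It assumes a single attention head, no learned projections, and a ready-made boolean mask per query. The function name, argument names, and the fallback for empty masks are illustrative assumptions, not code from the Mask2Former repository.

```python
import numpy as np

def masked_attention(queries, keys, values, mask):
    """Single-head cross-attention restricted by a boolean mask.

    queries: (Q, D) one vector per object query
    keys, values: (N, D) one vector per image location
    mask: (Q, N) boolean, True where a query is allowed to attend
    """
    dim = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(dim)  # (Q, N) similarity scores

    # Assumption of this sketch: a query with an empty mask attends
    # everywhere, so the softmax never runs over zero locations.
    empty = ~mask.any(axis=1, keepdims=True)
    allowed = mask | empty

    logits = np.where(allowed, logits, -np.inf)          # block masked-out locations
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

# Toy example: 2 queries, 5 image locations, 4-dimensional features
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 4))
m = np.zeros((2, 5), dtype=bool)
m[0, :2] = True  # query 0 may only look at locations 0 and 1; query 1 has an empty mask
out, w = masked_attention(q, k, v, m)
print(w.round(3))
```

In the printed weights, the first row is zero outside the first two locations, while the second row spreads over all five because its mask is empty.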

What it can do

  • Panoptic segmentation: assign every pixel a class label, plus an instance ID for "thing" classes (countable objects); "stuff" classes (amorphous regions such as sky or road) get a class label only
  • Instance segmentation: detect and separate individual objects, even overlapping ones
  • Semantic segmentation: classify every pixel by category without distinguishing instances
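
The relationship between the three tasks can be sketched on a toy label map. The example below assumes a simple encoding (one class-ID array and one instance-ID array per image) and hypothetical class IDs. It is not the output format of this model, and masks derived this way never overlap, unlike general instance segmentation output.

```python
import numpy as np

def panoptic_to_views(class_map, instance_map, thing_classes):
    """Derive semantic and instance views from a toy panoptic labeling.

    class_map: (H, W) int array, class id per pixel
    instance_map: (H, W) int array, instance id per pixel (0 for stuff)
    thing_classes: set of class ids that are countable objects
    """
    semantic = class_map.copy()  # semantic view ignores instance ids
    instances = []
    for cls in sorted(thing_classes):
        # One binary mask per distinct instance id of this thing class
        for inst in np.unique(instance_map[class_map == cls]):
            mask = (class_map == cls) & (instance_map == inst)
            instances.append((cls, int(inst), mask))
    return semantic, instances

# Hypothetical ids: class 0 = sky (stuff), class 1 = person (thing)
class_map = np.array([[0, 1, 1, 1],
                      [0, 1, 1, 1]])
instance_map = np.array([[0, 1, 2, 2],
                         [0, 1, 2, 2]])
semantic, instances = panoptic_to_views(class_map, instance_map, {1})
print(len(instances))  # two separate person instances
```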

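At the time of publication, the Mask2Former paper reported state-of-the-art results on all three tasks: 57.8 PQ for panoptic segmentation on COCO, 50.1 AP for instance segmentation on COCO, and 57.7 mIoU for semantic segmentation on ADE20K.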

Performance Metrics

Prediction time: 2.90s
Total time: 111.19s

Input Parameters

image (type: string)
Input image for segmentation. The output is a single image that stacks the panoptic segmentation (top), instance segmentation (middle), and semantic segmentation (bottom) results.
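
Since the three results arrive stacked in one image, a small helper can cut them apart again. This is a sketch that assumes the three panels have equal height, which the page does not state. The `split_panels` name and the toy array standing in for a downloaded output image are illustrative.

```python
import numpy as np

def split_panels(stacked):
    """Split a vertically stacked (H, W, C) image into three equal-height panels."""
    height = stacked.shape[0]
    if height % 3 != 0:
        # Assumption violated: the panels cannot all have the same height
        raise ValueError(f"height {height} is not divisible by 3")
    panoptic, instance, semantic = np.split(stacked, 3, axis=0)
    return {"panoptic": panoptic, "instance": instance, "semantic": semantic}

# Toy stand-in for a downloaded output image: three 2x4 RGB panels
demo = np.concatenate(
    [np.full((2, 4, 3), value, dtype=np.uint8) for value in (10, 20, 30)],
    axis=0,
)
panels = split_panels(demo)
print({name: panel.shape for name, panel in panels.items()})
```

With a real output file, the array could come from any image loader that returns a height-by-width-by-channels array.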

Output Schema

Type: array
Items type: object

Version Details
Version ID: 97c0c2edeeb7c120c2859dca4fdee58d185131f79c857ba519e3a5cb7cdd7c66
Version created: February 20, 2022
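
The version ID is what pins a prediction to this exact build. As a hedged sketch, the snippet below builds a request for Replicate's HTTP predictions endpoint using only the standard library, without sending it. It assumes the `image` input accepts a URL string and that the API token is stored in a `REPLICATE_API_TOKEN` environment variable. The image URL is a placeholder.

```python
import json
import os
import urllib.request

# Version ID copied from the version details above
VERSION_ID = "97c0c2edeeb7c120c2859dca4fdee58d185131f79c857ba519e3a5cb7cdd7c66"
API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(image_url, token):
    """Build (but do not send) a request that would start one prediction."""
    payload = {"version": VERSION_ID, "input": {"image": image_url}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    request = build_prediction_request(
        "https://example.com/street.jpg",           # placeholder image URL
        os.environ.get("REPLICATE_API_TOKEN", ""),  # assumed env var name
    )
    # Sending is left to the reader: urllib.request.urlopen(request)
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect before any API call is made.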