ADE20K
Canonical
43 papers using it
First seen: 2016
ADE20K is a dataset used to evaluate semantic segmentation models. It contains a diverse set of images annotated with object and stuff categories.
Papers using ADE20K (43)
- Masked-attention Mask Transformer For Universal Image Segmentation
- Semantic Understanding Of Scenes Through The ADE20K Dataset
- Topformer: Token Pyramid Transformer For Mobile Semantic Segmentation
- Multi-scale High-resolution Vision Transformer For Semantic Segmentation
- Dsnet: A Novel Way To Use Atrous Convolutions In Semantic Segmentation
- In Defense Of Lazy Visual Grounding For Open-vocabulary Semantic Segmentation
- Full Contextual Attention For Multi-resolution Transformers In Semantic Segmentation
- Enhancing Transformer-based Vision Models: Addressing Feature Map Anomalies Through Novel Optimization Strategies
- Diffusion For Out-of-distribution Detection On Road Scenes And Beyond
- Token Cropr: Faster Vits For Quite A Few Tasks
- SOS: Segment Object System For Open-world Instance Segmentation With Object Priors
- ARTA: Adaptive Mixed-resolution Token Allocation For Efficient Dense Feature Extraction
- Locality-Attending Vision Transformer
- Exploring Open-Vocabulary Object Recognition in Images using CLIP
- Seeing Through Clutter: Structured 3D Scene Reconstruction via Iterative Object Removal
- Cross-domain Semantic Segmentation With Large Language Model-assisted Descriptor Generation
- Context Encoding For Semantic Segmentation
- Segnext: Rethinking Convolutional Attention Design For Semantic Segmentation
- K-net: Towards Unified Image Segmentation
- Seaformer++: Squeeze-enhanced Axial Transformer For Mobile Visual Recognition
- You Only Segment Once: Towards Real-time Panoptic Segmentation
- MVP: Multimodality-guided Visual Pre-training
- Semantic Segmentation Via Highly Fused Convolutional Network With Multiple Soft Cost Functions
- Content-aware Token Sharing For Efficient Semantic Segmentation With Vision Transformers
- Rest V2: Simpler, Faster And Stronger
- Remax: Relaxing For Better Training On Efficient Panoptic Segmentation
- Skip-attention: Improving Vision Transformers By Paying Less Attention
- Incepformer: Efficient Inception Transformer With Pyramid Pooling For Semantic Segmentation
- Dmformer: Closing The Gap Between CNN And Vision Transformers
- PAUMER: Patch Pausing Transformer For Semantic Segmentation
- Structtoken : Rethinking Semantic Segmentation With Structural Prior
- Mambavision: A Hybrid Mamba-transformer Vision Backbone
- Spiralmlp: A Lightweight Vision MLP Architecture
- PNM: Pixel Null Model For General Image Segmentation
- Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
- A Unified View of Masked Image Modeling
- Decoder Denoising Pretraining for Semantic Segmentation
- Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields
- Feature Selective Transformer for Semantic Image Segmentation
- HCFormer: Unified Image Segmentation with Hierarchical Clustering
- Low-Resolution Self-Attention for Semantic Segmentation
- A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
- Transformer Scale Gate for Semantic Segmentation