CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts
2025 · Leonard Hackel, Tom Burgert, Begüm Demir
Abstract
Self-supervised learning through masked autoencoders has attracted great attention for remote sensing (RS) foundation model (FM) development, enabling improved representation learning across diverse sensors and downstream tasks. However, existing RS FMs often either suffer from substantial computational complexity during both training and inference or exhibit limited representational capacity. These issues restrict their practical applicability in RS. To address these limitations, we propose an adaptation that enhances the efficiency of RS FMs by integrating the Soft mixture-of-experts (MoE) mechanism into the FM. This integration allows modality-specific expert specialization alongside shared cross-sensor representation learning. To demonstrate the effectiveness of our adaptation, we apply it to the Cross-Sensor Masked Autoencoder (CSMAE) model, resulting in the Cross-Sensor Mixture-of-Experts (CSMoE) model. In addition, we introduce a thematic-climatic descripto
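As a rough illustration of the Soft MoE mechanism the abstract refers to, the sketch below implements the generic Soft MoE forward pass in plain Python: each slot receives a softmax-weighted average of all tokens, each expert processes its own slots, and each token's output is a softmax-weighted mix of all slot outputs. This is not the CSMoE implementation. The toy experts, the helper names (`soft_moe_layer`, `make_toy_expert`), the dimensions, and the random parameters are all assumptions made for illustration, and the modality-specific expert assignment mentioned in the abstract is not modelled.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def make_toy_expert(scale):
    """Hypothetical stand-in for an expert MLP: a scaled ReLU on a d-vector."""
    return lambda vec: [scale * max(0.0, v) for v in vec]


def soft_moe_layer(tokens, phi, experts, slots_per_expert):
    """Generic Soft MoE forward pass (illustrative, not the CSMoE code).

    tokens: m x d input tokens
    phi:    d x (n * p) slot parameters, for n experts with p slots each
    experts: list of n callables mapping a d-vector to a d-vector
    """
    logits = matmul(tokens, phi)                      # m x (n * p)
    # Dispatch weights: for each slot, a softmax over all tokens.
    dispatch_t = [softmax(col) for col in transpose(logits)]  # (n * p) x m
    # Combine weights: for each token, a softmax over all slots.
    combine = [softmax(row) for row in logits]        # m x (n * p)
    # Each slot input is a convex combination of the input tokens.
    slot_inputs = matmul(dispatch_t, tokens)          # (n * p) x d
    # Slot i is processed by expert i // p.
    slot_outputs = [
        experts[i // slots_per_expert](slot) for i, slot in enumerate(slot_inputs)
    ]
    # Each output token is a convex combination of the slot outputs.
    output = matmul(combine, slot_outputs)            # m x d
    return output, dispatch_t, combine


random.seed(0)
m, d, n, p = 6, 4, 2, 2  # tokens, feature dim, experts, slots per expert
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
phi = [[random.gauss(0, 1) for _ in range(n * p)] for _ in range(d)]
experts = [make_toy_expert(1.0), make_toy_expert(-0.5)]
output, dispatch_t, combine = soft_moe_layer(tokens, phi, experts, p)
```

Because the experts operate on a fixed number of slots rather than on every token, the expert compute depends on the slot count `n * p` and not on the sequence length `m`, which is where the efficiency argument for this kind of layer comes from.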
Related papers
- Exploring Masked Autoencoders For Sensor-agnostic Image Retrieval In Remote Sensing (2024) · 10.74
- Exploring A Fine-grained Multiscale Method For Cross-modal Remote Sensing Image Retrieval (2022) · 16.73
- A Novel Self-supervised Cross-modal Image Retrieval Method In Remote Sensing (2022) · 8.35
- Mixture Of Experts With Soft Nearest Neighbor Loss: Resolving Expert Collapse Via Representation Disentanglement (2026) · 0.00
- Vlm2geovec: Toward Universal Multimodal Embeddings For Remote Sensing (2025) · 0.00
- CSMF: Cascaded Selective Mask Fine-tuning For Multi-objective Embedding-based Retrieval (2025) · 0.00
- SMEC: Rethinking Matryoshka Representation Learning For Retrieval Embedding Compression (2025) · 0.00
- Large Language Models For Captioning And Retrieving Remote Sensing Images (2024) · 0.00