Abstract— The increasing scale and complexity of remote sensing (RS) observations demand distributed processing to manage the vast volumes of data generated. However, distributed processing presents significant challenges, including bandwidth limitations, high latency, and privacy concerns, especially when transmitting high-resolution images. To address these issues, we propose a novel scheme that leverages the encoder of a masked autoencoder (MAE) to generate compact embeddings (CLS tokens) from masked images, enabling deep learning models to be trained under federated learning (FL) scenarios. This approach allows compact image patches, rather than full images, to be transmitted to the processing nodes, drastically reducing bandwidth usage. On the processing nodes, classifiers are trained on the CLS tokens, and model weights are aggregated using the FedAvg and FedProx FL algorithms. Experimental results on benchmark datasets demonstrate that the proposed approach significantly reduces data transmission requirements while maintaining, and even surpassing, the accuracy of systems with access to the full data.