Note on DiffusionSat

Khanna, S., Liu, P., Zhou, L., Meng, C., Rombach, R., Burke, M., Lobell, D., & Ermon, S. (2023, December 6). DiffusionSat: A Generative Foundation Model for Satellite Imagery. International Conference on Learning Representations (ICLR 2024). https://doi.org/10.48550/arXiv.2312.03606

Overview

Introduction

Main Contributions

To fill this gap, we propose DiffusionSat, a generative foundation model for satellite imagery inspired by Stable Diffusion (SD). Using metadata commonly associated with satellite images, including latitude, longitude, timestamp, and ground-sampling distance (GSD), we train our model for single-image generation on a collection of publicly available satellite image datasets. Further, inspired by ControlNet (Zhang & Agrawala, 2023), we design conditioning models that can easily be trained for specific generative tasks or inverse problems, including super-resolution, in-painting, and temporal generation (code sketches of both conditioning mechanisms follow the list below). Specifically, our contributions include:

  1. We propose a novel generative foundation model for satellite image data with the ability to generate high-resolution satellite imagery from numerical metadata as well as text.
  2. We design a novel 3D-conditioning extension that enables DiffusionSat to achieve state-of-the-art performance on super-resolution, temporal generation, and in-painting.
  3. We collect and compile a global generative pre-training dataset from large, publicly available satellite image datasets.
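
The metadata conditioning can be made concrete with a short sketch. The code below is a minimal, hypothetical PyTorch illustration, not the paper's released implementation: it assumes each numerical metadatum (latitude, longitude, GSD, timestamp components) is encoded with sinusoidal features in the same way diffusion timesteps are embedded, then projected by an MLP so the result can be added to the UNet's timestep embedding. The names `sinusoidal_embedding` and `MetadataEmbedder`, the embedding sizes, and the field normalization are all assumptions.

```python
# Hedged sketch of numerical-metadata conditioning in the spirit of DiffusionSat.
# All names and dimensions here are illustrative, not the paper's code.
import math
import torch
import torch.nn as nn


def sinusoidal_embedding(x: torch.Tensor, dim: int = 128) -> torch.Tensor:
    """Encode a scalar batch (B,) into (B, dim) sinusoidal features,
    analogous to diffusion timestep embeddings."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    angles = x[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


class MetadataEmbedder(nn.Module):
    """Project numerical metadata (latitude, longitude, GSD, timestamp, ...)
    into the same space as the UNet's timestep embedding so the two can be summed."""

    def __init__(self, num_fields: int, embed_dim: int = 128, out_dim: int = 1280):
        super().__init__()
        self.num_fields = num_fields
        self.embed_dim = embed_dim
        self.mlp = nn.Sequential(
            nn.Linear(num_fields * embed_dim, out_dim),
            nn.SiLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, metadata: torch.Tensor) -> torch.Tensor:
        # metadata: (B, num_fields); in practice each field would be scaled
        # to a fixed range first (raw values are shown below for illustration).
        feats = [sinusoidal_embedding(metadata[:, i], self.embed_dim)
                 for i in range(self.num_fields)]
        return self.mlp(torch.cat(feats, dim=-1))


# Usage: embed (lat, lon, gsd, year, month, day) for a batch of 2 images,
# then add the result to the timestep embedding inside the denoising UNet.
meta = torch.tensor([[37.42, -122.08, 0.5, 2021.0, 6.0, 15.0],
                     [48.85, 2.35, 10.0, 2019.0, 1.0, 3.0]])
emb = MetadataEmbedder(num_fields=6)(meta)  # shape (2, 1280)
```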
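The ControlNet-style conditioning behind the 3D extension can be sketched similarly. The block below is a hypothetical illustration of the general recipe, not the paper's architecture: a trainable control branch processes a sequence of conditioning frames with temporal self-attention and injects its output into the frozen base UNet through a zero-initialized convolution, so training starts from the unmodified base model. `TemporalControlBlock`, `zero_conv`, the mean-pooling step, and all sizes are assumptions.

```python
# Hedged sketch of ControlNet-style temporal conditioning.
# Names, pooling choice, and sizes are illustrative assumptions.
import torch
import torch.nn as nn


def zero_conv(channels: int) -> nn.Conv2d:
    """1x1 convolution initialized to zero so the control branch contributes
    nothing at the start of training (the frozen base model is unchanged)."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv


class TemporalControlBlock(nn.Module):
    """Per-resolution control block: mixes features from a sequence of
    conditioning frames (B, T, C, H, W) with temporal self-attention,
    then injects the result through a zero-initialized convolution."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.out = zero_conv(channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = feats.shape
        # Attend over the time axis independently at every spatial location.
        x = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        x, _ = self.temporal_attn(x, x, x)
        x = x.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        # Pool the sequence and inject via the zero conv (outputs zeros at init).
        pooled = x.mean(dim=1)  # (B, C, H, W)
        return self.out(pooled)


# Usage: 2 samples, 4 conditioning frames, 64 channels, 32x32 feature maps.
block = TemporalControlBlock(channels=64, num_heads=8)
residual = block(torch.randn(2, 4, 64, 32, 32))  # (2, 64, 32, 32), zeros at init
```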

Methodology

Results

Appendix

Datasets

References

Zhang, L., & Agrawala, M. (2023). Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543. https://doi.org/10.48550/arXiv.2302.05543