OP-Gen: A High-Quality Remote Sensing Image Generation Algorithm Guided by OSM Images and Textual Prompts
Huolin Xiong, Zekun Li, Qunbo Lv, Baoyu Zhu, Yu Zhang, Chaoyang Yu, Zheng Tan

The application of diffusion models to remote sensing image generation has significantly improved the performance of generation algorithms. However, existing methods still exhibit notable limitations: in particular, they cannot controllably generate images that combine rich texture detail with minimal geometric distortion. To address these shortcomings, this work introduces OP-Gen, a remote sensing image generation algorithm guided by textual descriptions and OpenStreetMap (OSM) images. OP-Gen incorporates two information extraction branches: ControlNet and OSM-prompt (OP). The ControlNet branch extracts structural and spatial information from OSM images and injects it into the diffusion model, guiding the overall structural framework of the generated images. In the OP branch, we design an OP-Controller module that extracts fine-grained semantic information from textual prompts, conditioned on the structural information of the OSM image. This information is likewise injected into the diffusion model, enriching the generated images with fine-grained details and aligning those details with the structural framework, thereby significantly enhancing the realism of the output. OP-Gen achieves state-of-the-art performance in both qualitative and quantitative evaluations. Qualitatively, it outperforms existing methods in structural coherence and texture detail richness. Quantitatively, it achieves a Fréchet inception distance (FID) of 45.01, a structural similarity index measure (SSIM) of 0.1904, and a Contrastive Language-Image Pretraining (CLIP) score of 0.3071, the best reported results among comparable methods.