
Abstract

Most existing diffusion models have primarily utilized reference images for image-to-image translation rather than for super-resolution (SR). In SR-specific tasks, diffusion methods rely solely on low-resolution (LR) inputs, limiting their ability to leverage reference information. Prior reference-based diffusion SR methods have shown that incorporating appropriate references can significantly enhance reconstruction quality; however, identifying suitable references in real-world scenarios remains a critical challenge. Recently, Retrieval-Augmented Generation (RAG) has emerged as an effective framework that integrates retrieval-based and generation-based information from databases to enhance the accuracy and relevance of responses. Inspired by RAG, we propose an image-based RAG framework (iRAG) for realistic super-resolution, which employs a trainable hashing function to retrieve either real-world or generated references given an LR query. Retrieved patches are passed to a restoration module that generates high-fidelity super-resolved features, and a hallucination filtering mechanism is used to refine generated references from pre-trained diffusion models. Experimental results demonstrate that our approach not only resolves practical difficulties in reference selection but also delivers superior performance over existing diffusion and non-diffusion RefSR methods.

iRAG pipeline

Method

In the context of image super-resolution (SR), an auxiliary high-resolution (HR) image containing semantically or texturally similar information to the input low-resolution (LR) image is often leveraged to guide the restoration of fine details and structural integrity. However, obtaining such a reference image for real-world datasets is challenging. Large datasets, such as ImageNet, typically consist of single-view images, making them difficult to use directly as reference patches. Furthermore, curating a reference image from these extensive datasets is computationally onerous. To address this challenge, we propose a reference-based SR method that involves three key steps: (i) augmenting the existing dataset by enriching it with auxiliary HR images, (ii) retrieving a relevant HR image from a large database to match the target LR image, and (iii) generating a high-quality HR image by integrating LR and reference features into a diffusion model.
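The retrieval step (ii) can be illustrated with a minimal sketch: binarize image features into short hash codes and rank database entries by Hamming distance. Here a random linear projection stands in for the trained hashing network, and the feature vectors are synthetic; this is an assumption-laden toy, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Binarize features into short hash codes via a linear projection.
    (Stand-in for the trained hashing model; here a random projection.)"""
    return (features @ projection > 0).astype(np.uint8)

def retrieve(query_feat: np.ndarray, db_feats: np.ndarray,
             projection: np.ndarray, k: int = 1) -> np.ndarray:
    """Return indices of the k database entries closest in Hamming distance."""
    q = hash_codes(query_feat[None, :], projection)   # (1, bits)
    db = hash_codes(db_feats, projection)             # (N, bits)
    dists = np.count_nonzero(db != q, axis=1)         # Hamming distance per entry
    return np.argsort(dists)[:k]

# Toy database: 100 feature vectors mapped to 64-bit codes
db = rng.standard_normal((100, 32))
proj = rng.standard_normal((32, 64))
query = db[7] + 0.01 * rng.standard_normal(32)        # near-duplicate of entry 7
print(retrieve(query, db, proj, k=1))
```

Because the query is a slightly perturbed copy of entry 7, its hash code differs from that entry in at most a few bits, so the retrieval returns index 7.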

Reference Database

We split the DF2K-OST dataset so that 75% of the images form the training set for image restoration; the remaining 25% is used to build the reference database. To enrich this database, we add synthetic images in a one-to-one ratio with the real patches, generated using the diffusers pipeline and SDEdit. An unsupervised hashing model is trained on these patches using positive pairs produced via standard data augmentation. A hallucination threshold (average variance of 0.03) is applied; if exceeded, generation is repeated up to 10 times, retaining the sample with the lowest variance.
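A minimal sketch of the hallucination filter described above, assuming the score is the average per-pixel variance across a few stochastic generations. The SDEdit/diffusers pipeline is abstracted behind a hypothetical `toy_generate` stub; the real generator and the paper's exact variance definition may differ.

```python
import numpy as np

VAR_THRESHOLD = 0.03   # average-variance hallucination threshold from the paper
MAX_RETRIES = 10       # maximum number of regeneration attempts

def hallucination_score(samples: np.ndarray) -> float:
    """Average per-pixel variance across repeated generations (assumed proxy)."""
    return float(samples.var(axis=0).mean())

def filtered_generate(generate, threshold=VAR_THRESHOLD, max_retries=MAX_RETRIES):
    """Regenerate until the score drops below the threshold, keeping the
    lowest-variance candidate seen across at most max_retries attempts."""
    best_sample, best_score = None, float("inf")
    for _ in range(max_retries):
        samples = generate()                    # a few stochastic generations
        score = hallucination_score(samples)
        if score < best_score:
            best_sample, best_score = samples.mean(axis=0), score
        if score <= threshold:
            break                               # low variance: accept as reference
    return best_sample, best_score

rng = np.random.default_rng(1)

def toy_generate() -> np.ndarray:
    """Hypothetical stand-in for an SDEdit pass: 4 noisy 8x8 'generations'."""
    return 0.5 + 0.1 * rng.standard_normal((4, 8, 8))

sample, score = filtered_generate(toy_generate)
print(f"accepted reference with score {score:.4f}")
```

With the toy generator's low noise level, the first attempt already falls under the 0.03 threshold and is accepted immediately.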


Conclusion

We proposed a novel image-based Retrieval-Augmented Generation framework that combines latent diffusion models with an efficient hashing-code strategy, achieving robust reference matching and realistic reference-based SR. Operating in a compact latent space with short binary hash codes, our method addresses the challenge of reference selection and improves domain consistency between low-resolution inputs and high-resolution references. Experiments on real-world datasets demonstrate that our approach outperforms existing diffusion-based super-resolution and reference-based methods in fidelity, perceptual quality, and computational efficiency.

Citation

If you use this work or find it helpful, please consider citing:

@InProceedings{lee2025irag,
    author    = {Lee, Byeonghun and Cho, Hyunmin and Choi, Hong Gyu and Kang, Soo Min and Ahn, Iljun and Jin, Kyong Hwan},
    title     = {Reference-based Super-Resolution via Image-based Retrieval-Augmented Generation Diffusion},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {10764-10774}
}