20. Thank you!
Hyungjin Chung Byeongsu Sim Jong Chul Ye
hj.chung@kaist.ac.kr byeongsu.s@kaist.ac.kr jong.ye@kaist.ac.kr
Editor's Notes
Hi, I’m Hyungjin Chung, and I will be presenting Come-Closer-Diffuse-Faster (CCDF), a work aimed at accelerating diffusion models for inverse problems.
Diffusion models are slow. They define an iterative chain that starts from pure Gaussian noise and gradually denoises it into a high-fidelity image, and this process typically needs on the order of a thousand iterations or more to perform well. Written as a stochastic difference equation, the reverse process takes the following form, where z represents Gaussian noise.
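For reference, a generic form of such a reverse-time stochastic difference equation is written below. Here f and g are placeholder names for the drift and noise-scale terms of the chosen discretization, so the exact expressions on the slide may differ.

```latex
x_{i-1} = f(x_i, i) + g(x_i, i)\, z_i, \qquad z_i \sim \mathcal{N}(0, I), \qquad i = N, N-1, \dots, 1
```

In typical samplers each step requires at least one evaluation of the trained network, so the sampling cost grows linearly with N.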
Diffusion models are also very flexible: an unconditionally trained model can be used directly to solve inverse problems without any fine-tuning. Simply by imposing data consistency constraints between the reverse diffusion steps, one can sample from the conditional distribution given a measurement. However, again, the process is slow.
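A minimal sketch of what imposing data consistency between steps can look like for inpainting, assuming a variance-preserving (DDPM-style) linear noise schedule. `denoise_step` is a hypothetical callable standing in for one reverse step of an unconditionally trained model, and the projection used here (diffuse the measurement to the current noise level, then paste it into the observed pixels) is one common choice, not necessarily the exact operator used in the paper.

```python
import numpy as np

def inpaint_with_data_consistency(y, mask, denoise_step, abar, rng):
    """Unconditional reverse diffusion + data-consistency projection (sketch).

    y            : measured image; only entries where mask == 1 are trusted
    mask         : array of 0/1 values, 1 = observed pixel
    denoise_step : hypothetical callable (x_i, i) -> x_{i-1}, one reverse step
                   of an unconditionally trained model (no fine-tuning)
    abar         : cumulative products of (1 - beta), abar[0] = 1, length N + 1
    """
    n_steps = len(abar) - 1
    x = rng.standard_normal(y.shape)                 # x_N: pure Gaussian noise
    for i in range(n_steps, 0, -1):
        x = denoise_step(x, i)                       # unconditional step to level i - 1
        # Diffuse the measurement to the same noise level i - 1 ...
        z = rng.standard_normal(y.shape)
        y_noisy = np.sqrt(abar[i - 1]) * y + np.sqrt(1.0 - abar[i - 1]) * z
        # ... and overwrite the observed pixels (data-consistency projection).
        x = mask * y_noisy + (1.0 - mask) * x
    return x

# Toy usage: a linear beta schedule with N = 1000 and a trivial stand-in "model".
rng = np.random.default_rng(0)
abar = np.concatenate([[1.0], np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))])
y = rng.standard_normal((8, 8))
mask = (rng.random((8, 8)) > 0.5).astype(float)
x = inpaint_with_data_consistency(y, mask, lambda x, i: 0.99 * x, abar, rng)
```

Note that the loop still runs all N reverse steps, which is exactly the cost the talk sets out to reduce.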
When we inspect the reverse process of conditional diffusion, it seems odd that almost half of it does not contain any information relevant to the reconstruction. So we ask: is this part necessary?
Let us assume that the total number of discretization steps, N, is set to 1000, and define N’ as t_0 times N, with t_0 between 0 and 1.
For inverse problems, recall that we are given the corrupted image x_0. Because the forward diffusion is linear, we can sample the intermediate state at step N’ directly from x_0 with a single reparameterization step, instead of simulating N’ forward steps.
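The one-shot jump can be sketched as follows, assuming a variance-preserving forward process x_{N’} = a_{N’} x_0 + b_{N’} z with a_{N’} = sqrt(abar[N’]) and b_{N’} = sqrt(1 - abar[N’]) under a linear DDPM-style beta schedule; `alpha_bars_vp` and `jump_to_intermediate` are hypothetical helper names for this illustration.

```python
import numpy as np

def alpha_bars_vp(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """abar[i] = prod_{k<=i} (1 - beta_k); abar[0] = 1 corresponds to the clean image."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def jump_to_intermediate(x0, t0, abar, rng):
    """Sample x_{N'} from x_0 in a single reparameterization step."""
    n_steps = len(abar) - 1
    n_prime = int(round(t0 * n_steps))          # N' = t_0 * N
    a = np.sqrt(abar[n_prime])                  # signal scale a_{N'}
    b = np.sqrt(1.0 - abar[n_prime])            # noise scale b_{N'}
    z = rng.standard_normal(x0.shape)           # fresh Gaussian noise
    return a * x0 + b * z, n_prime

rng = np.random.default_rng(0)
abar = alpha_bars_vp(1000)
x0 = rng.standard_normal((8, 8))                # stand-in for the corrupted image x_0
x_np, n_prime = jump_to_intermediate(x0, t0=0.2, abar=abar, rng=rng)
```

With N = 1000 and t_0 = 0.2, the jump lands at N’ = 200 at the cost of a single sampling operation.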
If we compare this with the forward diffusion of the ground-truth image, tilde x_0, to the same step, the two noisy images look very similar.
Our intuition is that starting the reverse diffusion from x_N’ will eventually lead us to tilde x_0, provided that the model is properly trained.
To study this property more rigorously, let us first define epsilon_0 as the error between the ground truth and the measurement.
Next, define bar epsilon N’ as the expected distance between the two images after the forward diffusion.
Finally, we define bar epsilon 0, reverse, as the expected distance between the two images after the reverse diffusion.
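For reference, the three quantities can be written as below, assuming squared Euclidean distance, a forward jump of the form x_{N'} = a_{N'} x_0 + b_{N'} z (with a_{N'}, b_{N'} the signal and noise scales of the chosen diffusion), and independent noise draws z and \tilde{z} for the two images. Writing bar epsilon 0, reverse, as \bar{\epsilon}_{0,r}, the labels x_{0,r} and \tilde{x}_{0,r} (our notation) denote the outputs of the reverse diffusion started from x_{N'} and \tilde{x}_{N'}, and n is the number of pixels. Under these assumptions the forward error has a closed form:

```latex
\begin{aligned}
\epsilon_0 &:= \lVert x_0 - \tilde{x}_0 \rVert^2, \\
\bar{\epsilon}_{N'} &:= \mathbb{E}\,\lVert x_{N'} - \tilde{x}_{N'} \rVert^2
  = \mathbb{E}\,\lVert a_{N'}(x_0 - \tilde{x}_0) + b_{N'}(z - \tilde{z}) \rVert^2
  = a_{N'}^2\,\epsilon_0 + 2\,b_{N'}^2\,n, \\
\bar{\epsilon}_{0,r} &:= \mathbb{E}\,\lVert x_{0,r} - \tilde{x}_{0,r} \rVert^2 .
\end{aligned}
```

The cross term vanishes because z - \tilde{z} has zero mean, and \mathbb{E}\lVert z - \tilde{z} \rVert^2 = 2n for independent standard Gaussians in n dimensions. The noise term grows with N’, which is why the error after the forward diffusion increases with the starting step.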
Our first theorem then states that the error bar epsilon 0, reverse, decreases exponentially as the reverse diffusion proceeds, which follows from the theory of stochastic contraction. Here, the specific values of lambda and C are determined by the type of diffusion process used, and tau by the ill-posedness of the problem.
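As a rough illustration of where the exponential decay comes from, suppose (an assumption for this sketch, not the paper's exact statement) that each reverse step contracts the expected squared error by a factor λ², with 0 < λ < 1, and adds at most a constant δ ≥ 0. Writing e_i for the expected squared distance between the two trajectories at step i, so that e_{N'} = \bar{\epsilon}_{N'} and e_0 = \bar{\epsilon}_{0,r}, unrolling the inequality over N’ steps gives:

```latex
\begin{aligned}
e_{i-1} &\le \lambda^2 e_i + \delta, \qquad i = N', \dots, 1, \\
e_0 &\le \lambda^{2N'} e_{N'} + \delta \sum_{k=0}^{N'-1} \lambda^{2k}
     = \lambda^{2N'} \bar{\epsilon}_{N'} + \delta\,\frac{1 - \lambda^{2N'}}{1 - \lambda^2}
     \le \lambda^{2N'} \bar{\epsilon}_{N'} + \frac{\delta}{1 - \lambda^2}.
\end{aligned}
```

The first term decays exponentially in the number of reverse steps, while the second is a floor that does not vanish. In the theorem, the constants C and τ presumably enter through a floor of this kind; the exact expression is given in the paper.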
If we plot this error as the reverse diffusion progresses, it looks as follows; this is the case when we run the full reverse diffusion.
Our second theorem, which is our main theorem, states that for any mu between 0 and 1, there exists a minimum N’ such that the final error is bounded by mu times epsilon_0. Moreover, this optimal N’ decreases as epsilon_0, the initial error of the problem, gets smaller.
Visually, the error increases polynomially with the step N’ at which the forward diffusion is initialized and decreases exponentially with the reverse diffusion, so we can find an optimal value of N’, or equivalently of t_0.
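The trade-off can be made concrete with a toy model: a per-pixel forward error a_{N’}² ε_0 + 2 b_{N’}² (assuming independent noise and a linear DDPM-style schedule), multiplied by an assumed contraction factor λ^{2N’}, plus an assumed constant floor δ / (1 - λ²). All constants here (λ, δ, ε_0, μ, the schedule) are illustrative and are not values from the paper; the helper names are hypothetical. The search scans N’ upward and returns the first value whose bound falls below μ ε_0.

```python
import numpy as np

def vp_scales(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Signal/noise scales a_i, b_i for i = 0..N under a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    abar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])
    return np.sqrt(abar), np.sqrt(1.0 - abar)

def error_bound(n_prime, eps0, a, b, lam, delta):
    """Toy per-pixel bound on the final error when the reverse diffusion starts at N'."""
    forward = a[n_prime] ** 2 * eps0 + 2.0 * b[n_prime] ** 2   # grows with N'
    contraction = lam ** (2 * n_prime)                          # decays exponentially in N'
    floor = delta / (1.0 - lam ** 2)                            # does not vanish
    return contraction * forward + floor

def minimal_n_prime(eps0, mu, a, b, lam, delta):
    """Smallest N' whose toy bound is at most mu * eps0, or None if none exists."""
    target = mu * eps0
    for n_prime in range(len(a)):
        if error_bound(n_prime, eps0, a, b, lam, delta) <= target:
            return n_prime
    return None

a, b = vp_scales(1000)
n_prime = minimal_n_prime(eps0=0.05, mu=0.5, a=a, b=b, lam=0.99, delta=1e-5)
print(n_prime, n_prime / 1000)   # N' and the corresponding t_0
```

Because a smaller N’ is always cheaper, the useful choice under this model is the smallest N’ that meets the target, which is why a linear scan from N’ = 0 suffices.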
Even better, if we have a feed-forward network trained for a specific task, for example super-resolution, we can start directly from its estimate and use an even smaller number of diffusion steps, as in this figure.
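Putting the pieces together, a sketch of the accelerated sampler with a feed-forward initializer might look like this. `init_net`, `denoise_step`, and `project` are hypothetical placeholders for a task-specific network, one unconditional reverse step, and a data-consistency step, and the variance-preserving forward jump is an assumption; this illustrates the idea and is not the authors' code.

```python
import numpy as np

def ccdf_sample(y, init_net, denoise_step, project, abar, t0, rng):
    """Start from a good initial estimate, jump forward to N', run only N' reverse steps.

    init_net     : hypothetical feed-forward estimator, y -> x_0 (e.g. an SR network)
    denoise_step : hypothetical (x_i, i) -> x_{i-1} unconditional reverse step
    project      : hypothetical (x, i) -> x data-consistency step at noise level i
    abar         : cumulative products of (1 - beta), abar[0] = 1, length N + 1
    """
    n_steps = len(abar) - 1
    n_prime = int(round(t0 * n_steps))                       # N' = t_0 * N
    x0 = init_net(y)                                         # "come closer"
    z = rng.standard_normal(x0.shape)
    x = np.sqrt(abar[n_prime]) * x0 + np.sqrt(1.0 - abar[n_prime]) * z  # one-shot forward jump
    for i in range(n_prime, 0, -1):                          # only N' steps: "diffuse faster"
        x = denoise_step(x, i)
        x = project(x, i - 1)
    return x

calls = []
def toy_step(x, i):          # stand-in for one reverse step; records which steps run
    calls.append(i)
    return 0.9 * x

abar = np.concatenate([[1.0], np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))])
out = ccdf_sample(np.zeros((8, 8)), lambda y: y, toy_step, lambda x, i: x,
                  abar, t0=0.02, rng=np.random.default_rng(0))
```

With N = 1000 and t_0 = 0.02, only 20 reverse steps run instead of 1000. Passing an identity function as `init_net` corresponds to starting from the measurement itself, which only makes sense when the measurement already lives in image space (as in inpainting).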
Using our strategy, we show that we can achieve state-of-the-art reconstruction results on various imaging problems. First, for super-resolution, we obtain high-quality reconstructions with only 20-step diffusion, whereas other diffusion-based methods such as SR3 or ILVR degrade heavily.
Plotting the FID score, we again observe that CCDF consistently outperforms ILVR under the same step budget.
We see similarly high-quality results with 20-step diffusion for image inpainting, whereas the strategy adopted in, for example, score-SDE fails to remove the noise properly.
Our final application shows that our strategy can also be applied directly to compressed sensing MRI, arguably the most practical setting, since fast reconstruction is of utmost importance in medical imaging. There, our 20-step diffusion even outperforms the strategy that uses the full reverse diffusion.