<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.3.3">Jekyll</generator><link href="https://icewired-yy.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://icewired-yy.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-05-11T10:58:42+00:00</updated><id>https://icewired-yy.github.io/feed.xml</id><title type="html">blank</title><subtitle>A simple, whitespace theme for academics. Based on [*folio](https://github.com/bogoli/-folio) design. </subtitle><entry><title type="html">NDF and Microfacet Model</title><link href="https://icewired-yy.github.io/blog/2025/EN-NDF-and-Microfacet-Theory/" rel="alternate" type="text/html" title="NDF and Microfacet Model"/><published>2025-11-11T00:00:00+00:00</published><updated>2025-11-11T00:00:00+00:00</updated><id>https://icewired-yy.github.io/blog/2025/EN-NDF-and-Microfacet-Theory</id><content type="html" xml:base="https://icewired-yy.github.io/blog/2025/EN-NDF-and-Microfacet-Theory/"><![CDATA[<h2 id="preliminary">Preliminary</h2> <p>The goal of this post is to record my understanding of microfacet theory. It will be updated whenever my understanding is refined. I will try to derive the microfacet model in a friendly way for all readers who want to get familiar with this theory as well. If you find a mistake in this post, I would greatly appreciate it if you pointed it out in the comments below. 
Also, any discussion on this topic is welcome.</p> <hr/> <h2 id="microsurface-and-geometric-surface">Microsurface and Geometric surface</h2> <table> <thead> <tr> <th>Symbol</th> <th>Definition</th> </tr> </thead> <tbody> <tr> <td>$\mathcal{M}$</td> <td>The surface area of the microsurface</td> </tr> <tr> <td>$\mathcal{G}$</td> <td>The surface area of the geometric surface</td> </tr> <tr> <td>$p_m$</td> <td>A point on the microsurface $\mathcal{M}$</td> </tr> <tr> <td>$p_g$</td> <td>A point on the geometric surface $\mathcal{G}$</td> </tr> <tr> <td>$\omega_g$</td> <td>The normal of the geometric surface</td> </tr> <tr> <td>$&lt;\cdot, \cdot&gt;$</td> <td>The inner product of two vectors, clamped to 0</td> </tr> </tbody> </table> <p>Before going deeper into microfacet theory, we need to answer why we need this theory. The reason is easy to see. Let’s look at the figure below:</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <img src="/assets/posts/ndf_microfacet/macrosurface_microsurface.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> The observer receives the light from a pixel, whose footprint covers a large area of surface with complex microstructure. </div> <p>We call the area of the object surface that is covered by one pixel the <code class="language-plaintext highlighter-rouge">footprint</code>. As illustrated in the figure, there are many complex microstructures within that footprint; we call them the <code class="language-plaintext highlighter-rouge">microsurface</code>. Due to these microstructures, the appearance of this area should be highly spatially varying. However, since <strong>one pixel can only return one RGB value</strong>, we need to summarize the appearance of these microstructures with only one RGB value. 
Obviously, we need to derive the statistical (or aggregated) properties of the microsurface from its spatial properties. Once we have its statistical properties, we can assume that the area within the footprint is flat and can be described by a single normal vector. This flat surface is called the <code class="language-plaintext highlighter-rouge">geometric surface</code> or <code class="language-plaintext highlighter-rouge">macrosurface</code>. This assumed macrosurface needs a complex microfacet theory that summarizes the appearance into one value from its statistical properties. That’s why we need the microfacet model.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <img src="/assets/posts/ndf_microfacet/geo-micro.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> The bijection between microsurface and geometric surface. </div> <p>The classical microfacet theory rests on a basic assumption: there must be a bijection between the microsurface $\mathcal{M}$ and the geometric surface $\mathcal{G}$.</p> <blockquote> <p><strong><em>Assumption 1</em></strong>: Each point $p_g$ on $\mathcal{G}$ is the projection, along the geometric normal $\omega_g$, of one and only one point $p_m$ on $\mathcal{M}$.</p> </blockquote> <p>This assumption leads to:</p> <p>\begin{equation} \label{eq: Relationship between micro and geo} \int_{\mathcal{M}} &lt;\omega_m(p_m), \omega_g&gt; \mathrm{d} p_m = \int_{\mathcal{G}} \mathrm{d} p_g = A_g, \end{equation}</p> <p>where $A_g$ is the area of the geometric surface. One can set $A_g = 1m^2$ without loss of generality. 
This term will always be cancelled out in the following derivation.</p> <hr/> <h2 id="normal-distribution-function">Normal Distribution Function</h2> <table> <thead> <tr> <th>Symbol</th> <th>Definition</th> </tr> </thead> <tbody> <tr> <td>$\Omega$</td> <td>The spherical space</td> </tr> <tr> <td>$D(\omega)$</td> <td>The normal distribution function</td> </tr> <tr> <td>$\delta_{\omega^{\prime}}(\omega)$</td> <td>The Dirac delta function, $+\infty$ if $\omega = \omega^{\prime}$ and 0 otherwise, normalized so that it integrates to 1</td> </tr> <tr> <td>$\omega_i$</td> <td>The incident direction</td> </tr> <tr> <td>$\omega_o$</td> <td>The outgoing direction</td> </tr> <tr> <td>$\omega_m$</td> <td>The normal at a point on $\mathcal{M}$</td> </tr> </tbody> </table> <p>One important piece of information we need to describe the microsurface’s geometric appearance is a summary of the normals on the microsurface. We can use a function called the <code class="language-plaintext highlighter-rouge">Normal Distribution Function (NDF)</code> to describe it. One needs to distinguish it from the <code class="language-plaintext highlighter-rouge">Probability Density Function (PDF)</code> of the normal, which describes the probability density of the normal at a uniformly sampled random point on the microsurface. The definition of the NDF is:</p> <p>\begin{equation} \label{eq: def of NDF} D(\omega) = \int_{\mathcal{M}} \delta_{\omega}(\omega_m(p_m)) \mathrm{d} p_m. \end{equation}</p> <p>It describes the total area of the microsurface, per unit solid angle, whose normal points in direction $\omega$. Thus, the unit of $D(\omega)$ is $\frac{m^2}{sr}$. Why do we need to define the NDF? The reason is that <strong>the NDF is a bridge connecting two different spaces</strong>: one is the spatial space, that is, the microsurface space $\mathcal{M}$, and the other is the statistical space, that is, the spherical space $\Omega$. 
An informal understanding from the units is that $D(\omega)\mathrm{d}\omega$ ($\frac{m^2}{sr} \cdot sr$) gives the total differential area $\mathrm{d}p_m$ ($m^2$) whose normal points toward $\omega$. A more precise description of this relationship is that, given a subset $\Omega^{\prime}$ of $\Omega$, there is a corresponding subset $\mathcal{M}^{\prime}$ of $\mathcal{M}$ that contains all the points on the microsurface whose normals lie inside $\Omega^{\prime}$, and we have:</p> \[\int_{\Omega^{\prime}} D(\omega) \mathrm{d} \omega = \int_{\mathcal{M}^{\prime}} \mathrm{d}p_m\] <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <img src="/assets/posts/ndf_microfacet/relationship_ndf.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> The relationship between the statistical integral via the NDF and the spatial integral that sums the specific area. </div> <p><strong>We need to understand the NDF well before we get into the following content; this is the basis of microfacet theory.</strong> Here are some deductions related to the NDF:</p> <p><strong><em>Statistical area counting</em></strong>. The counting of the microsurface area can be converted from a spatial integral to a statistical integral via the NDF, leading to:</p> \[\int_{\Omega} D(\omega) &lt;\omega, \omega_g&gt; \mathrm{d} \omega = \int_{\mathcal{M}} &lt;\omega_m(p_m), \omega_g&gt; \mathrm{d} p_m = \int_{\mathcal{G}} \mathrm{d} p_g = A_g \left(= 1m^2\right)\] <p><strong><em>Relationship between NDF and normal PDF</em></strong>. 
The unit of the PDF of the normal $p(\omega)$ is $\frac{1}{sr}$, so it is obvious that:</p> \[p(\omega) = \frac{D(\omega)}{\int_{\Omega} D(\omega) \mathrm{d} \omega},\] <p>where the denominator is the total area of the microsurface.</p> <hr/> <h2 id="masking-function">Masking Function</h2> <table> <thead> <tr> <th>Symbol</th> <th>Definition</th> </tr> </thead> <tbody> <tr> <td>$W_m(p_m, \omega_o)$</td> <td>The projection factor at $p_m$ toward direction $\omega_o$</td> </tr> <tr> <td>$A_{proj}(\omega_o)$</td> <td>The view-dependent projected area of the microsurface toward direction $\omega_o$.</td> </tr> <tr> <td>$L(\omega_o)$</td> <td>The aggregated outgoing radiance across the whole microsurface toward $\omega_o$</td> </tr> <tr> <td>$L(p_m, \omega_o)$</td> <td>The outgoing radiance at the point $p_m$ on the microsurface toward direction $\omega_o$</td> </tr> <tr> <td>$G(p_m, \omega_o) \in \{0, 1\}$</td> <td>The spatial geometric masking term toward direction $\omega_o$</td> </tr> <tr> <td>$G(\omega_m, \omega_o) \in [0, 1]$</td> <td>The statistical geometric masking term: the fraction of the microsurface area with normal $\omega_m$ that is not masked.</td> </tr> </tbody> </table> <p>The reason why we introduce microfacet theory is to calculate the aggregated outgoing radiance from the microsurface covered by the pixel’s footprint. Thus, we first need to figure out two quantities: the view-dependent projected area and the formula of the outgoing radiance.</p> <h3 id="view-dependent-projected-area">View-dependent Projected Area</h3> <p>This projected area measures the area of the microsurface that we can observe from a given view direction. To analyze the formula, we can assume that every differential area $\mathrm{d}p_m$ has a projection factor $W_m(p_m, \omega_o)$ toward the given view direction $\omega_o$. 
Thus, the projected area can be formulated as:</p> \[A_{proj}(\omega_o) = \int_{\mathcal{M}} W_m(p_m, \omega_o)\mathrm{d}p_m\] <p>As a consequence of the microfacet assumption (with $A_g = 1m^2$), the visible projected area of the microsurface always equals the projected area of the geometric surface, $&lt;\omega_g, \omega_o&gt;$, leading to:</p> \[\int_{\mathcal{M}} W_m(p_m, \omega_o)\mathrm{d}p_m = &lt;\omega_g, \omega_o&gt; = \cos\theta_o\] <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <img src="/assets/posts/ndf_microfacet/view-dependent%20projected%20area.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> How the microsurface is projected toward the given view direction. </div> <h3 id="outgoing-radiance">Outgoing Radiance</h3> <p>Obviously, the observer can only receive the outgoing radiance from the view-dependent projected area, that is, the visible part of the microsurface. To formulate this, we can reason that every point $p_m$ on the microsurface has an outgoing radiance $L(p_m, \omega_o)$ that may contribute to the final aggregated outgoing radiance $L(\omega_o)$, and the contribution weight is the view-dependent projected area at $p_m$. Since this weight is not guaranteed to be normalized, the final formula is:</p> \[L(\omega_o) = \frac{\int_{\mathcal{M}} W_m(p_m, \omega_o) L(p_m, \omega_o) \mathrm{d} p_m }{\int_{\mathcal{M}} W_m(p_m, \omega_o) \mathrm{d} p_m}\] <p>Maybe you have thought about how to convert this spatial integral into a statistical integral as before. However, we have not analyzed the components of $W_m(p_m, \omega_o)$ so far, which we will discuss in the next subsection.</p> <h3 id="geometric-masking">Geometric Masking</h3> <p>Obviously, there are many places on the microsurface whose outgoing radiance will be occluded (or masked) by another part of the microsurface. 
Like the visibility term, we also need a term called the masking function $G(p_m, \omega_o)$ to indicate whether the outgoing radiance is unoccluded (1) or occluded (0). With this masking function defined, the projection factor $W_m(p_m, \omega_o)$ can be written as:</p> \[W_m(p_m, \omega_o) = G(p_m, \omega_o) &lt;\omega_m(p_m), \omega_o&gt;.\] <p>And we can immediately get the following spatial integral equations:</p> <p>\begin{equation} \label{eq: Spatial Projected Area} \int_{\mathcal{M}} G(p_m, \omega_o) &lt;\omega_m(p_m), \omega_o&gt;\mathrm{d}p_m = \cos\theta_o \end{equation}</p> <p>\begin{equation} \label{eq: Spatial Outgoing Radiance} L(\omega_o) = \frac{\int_{\mathcal{M}} G(p_m, \omega_o) &lt;\omega_m(p_m), \omega_o&gt; L(p_m, \omega_o) \mathrm{d} p_m }{\int_{\mathcal{M}} G(p_m, \omega_o) &lt;\omega_m(p_m), \omega_o&gt; \mathrm{d} p_m} \end{equation}</p> <p>Conventionally, we also need a statistical version of the masking function. Unlike the normal distribution function, the statistical masking function $G(\omega, \omega_o)$ is defined as the visible fraction of the microsurface area whose normal points toward $\omega$, leading to:</p> \[G(\omega, \omega_o) = \frac{\int_{\mathcal{M}}\delta_{\omega}(\omega_m(p_m)) G(p_m, \omega_o)\mathrm{d}p_m}{\int_{\mathcal{M}}\delta_{\omega}(\omega_m(p_m))\mathrm{d}p_m}\] <p>Then, the equation \eqref{eq: Spatial Projected Area} has its statistical version:</p> <p>\begin{equation} \label{eq: Statistical Projected Area} \int_{\Omega} G(\omega, \omega_o) &lt;\omega, \omega_o&gt; D(\omega) \mathrm{d}\omega = \cos\theta_o \end{equation}</p> <p>This is also a constraint on the masking function.</p> <aside> <figure> <picture> <img src="/assets/posts/ndf_microfacet/Masking%20function%20from%20Heitz.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <p> E. 
Heitz, "Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs". </p> </aside> <hr/> <h2 id="microfacet-brdf">Microfacet BRDF</h2> <table> <thead> <tr> <th>Symbol</th> <th>Definition</th> </tr> </thead> <tbody> <tr> <td>$f_r(\omega_i, \omega_o)$</td> <td>The microfacet BRDF</td> </tr> <tr> <td>$f_m(p_m, \omega_i, \omega_o)$</td> <td>The microsurface BRDF at point $p_m$</td> </tr> <tr> <td>$f_m(\omega_m, \omega_i, \omega_o)$</td> <td>The microsurface BRDF where the normal points toward $\omega_m$</td> </tr> <tr> <td>$G(\omega_m, \omega_i, \omega_o)$</td> <td>The shadowing-masking function</td> </tr> <tr> <td>$R(\omega;\omega_m)$</td> <td>The mirror-reflected direction when the incident / outgoing direction is $\omega$ and the surface normal is $\omega_m$.</td> </tr> <tr> <td>$F(\omega_m, \omega_i)$</td> <td>The Fresnel term.</td> </tr> <tr> <td>$\vec{h}$</td> <td>The unnormalized half vector</td> </tr> <tr> <td>$\omega_h$</td> <td>The normalized half vector</td> </tr> </tbody> </table> <p>To find the formula of the microfacet BRDF $f_r(\omega_i, \omega_o)$ that satisfies:</p> \[\mathrm{d}L(\omega_o) = f_r(\omega_i, \omega_o) L(\omega_i) &lt;\omega_i, \omega_g&gt; \mathrm{d} \omega_i,\] <p>we need to utilize the differential version of equation \eqref{eq: Spatial Outgoing Radiance}, which looks like:</p> \[\mathrm{d}L(\omega_o) = \frac{1}{\cos\theta_o} \int_{\mathcal{M}} G(p_m, \omega_o) &lt;\omega_m(p_m), \omega_o&gt; \mathrm{d}L(p_m, \omega_o) \mathrm{d} p_m\] <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <img src="/assets/posts/ndf_microfacet/differential%20outgoing%20radiance.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> All the differential outgoing radiance on the microsurface is finally aggregated into the differential outgoing radiance of 
the whole region. </div> <p>Recall that for every point on the microsurface, we have the classic rendering equation (without self-emission):</p> \[L(p_m, \omega_o) = \int_\Omega f_m(p_m, \omega_i, \omega_o) L(p_m, \omega_i) &lt;\omega_i, \omega_m(p_m)&gt; \mathrm{d} \omega_i\] <p>as well as the definition of the BRDF $f_m(p_m, \omega_i, \omega_o) = \frac{\mathrm{d}L(p_m, \omega_o)}{L(p_m, \omega_i) &lt;\omega_i, \omega_m(p_m)&gt; \mathrm{d}\omega_i}$.</p> <p>Substituting the $\mathrm{d}L(p_m, \omega_o)$ term, we have:</p> \[\mathrm{d}L(\omega_o) = \frac{1}{\cos\theta_o} \int_{\mathcal{M}} G(p_m, \omega_i,\omega_o) f_m(p_m, \omega_i, \omega_o) L(p_m, \omega_i) &lt;\omega_m(p_m), \omega_o&gt; &lt;\omega_i, \omega_m(p_m)&gt; \mathrm{d}\omega_i \mathrm{d} p_m\] <p>Here, the original masking function $G(p_m, \omega_o)$ has been replaced by the new shadowing-masking function. The reason is that $p_m$ may not be directly illuminated by the light source, which is the symmetric case to masking. So we extend it into the shadowing-masking function $G(p_m, \omega_i,\omega_o)$.</p> <p>Those microstructures are tiny compared with the distance between the light source and the geometric surface, so the location $p_m$ makes no significant difference to the incident light. 
Hence, we can reasonably assume that the incident radiance $L(p_m, \omega_i)$ is independent of the location $p_m$, leading to $L(p_m, \omega_i) = L(\omega_i)$. The equation can then be rewritten as:</p> \[\mathrm{d}L(\omega_o) = \frac{1}{\cos\theta_o} L(\omega_i) \mathrm{d}\omega_i \int_{\mathcal{M}} G(p_m, \omega_i,\omega_o) f_m(p_m, \omega_i, \omega_o) &lt;\omega_m(p_m), \omega_o&gt; &lt;\omega_i, \omega_m(p_m)&gt; \mathrm{d} p_m\] <p>Comparing with the definition of $f_r(\omega_i, \omega_o)$, we have the initial formula of the microfacet BRDF:</p> \[f_r(\omega_i, \omega_o) = \frac{1}{&lt;\omega_i, \omega_g&gt;&lt;\omega_o, \omega_g&gt;} \int_{\mathcal{M}} G(p_m, \omega_i, \omega_o) f_m(p_m, \omega_i, \omega_o) &lt;\omega_m(p_m), \omega_o&gt; &lt;\omega_i, \omega_m(p_m)&gt; \mathrm{d} p_m\] <p>Now we have the spatial formula of the microfacet BRDF. To convert it into a statistical integral, we need to introduce a new assumption:</p> <blockquote> <p><strong><em>Assumption 2</em></strong>: The material properties across all the microsurface covered by the same footprint are identical.</p> </blockquote> <p>This assumption makes the microsurface BRDF $f_m(p_m, \omega_i, \omega_o)$ depend on the location $p_m$ only through its normal, giving $f_m(p_m, \omega_i, \omega_o) = f_m(\omega_m, \omega_i, \omega_o)$. And we can have the statistical integral:</p> <p>\begin{equation} \label{eq: Statistical Microfacet BRDF} f_r(\omega_i, \omega_o) = \frac{1}{&lt;\omega_i, \omega_g&gt;&lt;\omega_o, \omega_g&gt;} \int_{\Omega} G(\omega_m, \omega_i, \omega_o) f_m(\omega_m, \omega_i, \omega_o) D(\omega_m) &lt;\omega_m, \omega_o&gt; &lt;\omega_m, \omega_i&gt; \mathrm{d} \omega_m \end{equation}</p> <p>Hence, what we need to do next is to work out the form of the microsurface BRDF $f_m(\omega_m, \omega_i, \omega_o)$.</p> <h3 id="pure-specular-brdf">Pure Specular BRDF</h3> <p>Before introducing the pure specular BRDF, let me explain why we need it. 
This is because of the third assumption of microfacet theory:</p> <blockquote> <p><strong><em>Assumption 3</em></strong>: The microsurface is purely specular.</p> </blockquote> <p>So what does the pure specular BRDF look like? This is what we need to solve in this subsection. To begin with, we need to get familiar with the <code class="language-plaintext highlighter-rouge">Fresnel effect</code>. The left image below is borrowed from <a href="https://www.scratchapixel.com/lessons/3d-basic-rendering/introduction-to-shading/reflection-refraction-fresnel.html">Reflection, Refraction and Fresnel</a>.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <img src="/assets/posts/ndf_microfacet/fresnel_img.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> The Fresnel effect. </div> <p>The Fresnel effect describes that when light hits a surface, some of its energy is reflected and the rest is refracted. The amount of reflected energy depends on the angle of incidence. In radiometry, we use the energy per second, i.e., the flux, to describe this effect mathematically. 
Here we only consider the relationship between the reflected outgoing flux and the incident flux:</p> \[\mathrm{d}\Phi(\omega_o) = F(\omega_m, \omega_i)\mathrm{d}\Phi(\omega_i).\] <p>Since</p> \[\mathrm{d}\Phi(\omega)=L(\omega)\cos\theta\mathrm{d}A\mathrm{d}\omega = L(\omega)\cos\theta\mathrm{d}A\sin\theta\mathrm{d}\theta\mathrm{d}\phi,\] <p>we have</p> \[L(\omega_o)\cos\theta_o\mathrm{d}A\sin\theta_o\mathrm{d}\theta\mathrm{d}\phi = F(\omega_m, \omega_i) L(\omega_i)\cos\theta_i\mathrm{d}A\sin\theta_i\mathrm{d}\theta\mathrm{d}\phi\] <p>Because mirror reflection gives $\theta_o = \theta_i$ and maps the differential solid angle onto one of equal size, the geometric terms cancel out, leaving:</p> \[L(\omega_o) = F(\omega_m, \omega_i) L(\omega_i).\] <p>On the other hand, since we are deriving the BRDF $f_m(\omega_m, \omega_i, \omega_o)$ of a “pure specular” surface, we only receive light from the direction $\omega_i = R(\omega_o; \omega_m)$, which is mirror-reflected toward $\omega_o$. Thus, there must be a delta term in $f_m(\omega_m, \omega_i, \omega_o)$, leading to:</p> \[f_m(\omega_m, \omega_i, \omega_o) = f(\omega_m, \omega_i, \omega_o) \delta_{R(\omega_o;\omega_m)}(\omega_i).\] <p>Subsequently, the rendering equation becomes:</p> <p>\begin{equation} L(\omega_o) = \int_\Omega f(\omega_m, \omega_i, \omega_o) \delta_{R(\omega_o;\omega_m)}(\omega_i) L(\omega_i) &lt;\omega_i, \omega_m&gt; \mathrm{d} \omega_i = f(\omega_m, R(\omega_o;\omega_m), \omega_o) L(R(\omega_o;\omega_m)) &lt;R(\omega_o;\omega_m), \omega_m&gt;. 
\end{equation}</p> <p>Substituting $L(\omega_o) = F(\omega_m, \omega_i) L(\omega_i)$, we get:</p> \[f(\omega_m, \omega_i, \omega_o) = \frac{F(\omega_m, \omega_i)}{&lt;\omega_i, \omega_m&gt;}.\] <p>So, the pure specular BRDF is:</p> <p>\begin{equation} \label{eq: Pure Specular BRDF} f_m(\omega_m, \omega_i, \omega_o) = \frac{F(\omega_m, \omega_i)\delta_{R(\omega_o;\omega_m)}(\omega_i)}{&lt;\omega_i, \omega_m&gt;} \end{equation}</p> <h3 id="introducing-half-direction">Introducing Half Direction</h3> <p><strong>HOWEVER</strong>, it’s too soon to celebrate; there is one more thing we need to do. Maybe you have noticed that, in equation \eqref{eq: Statistical Microfacet BRDF}, the incident direction $\omega_i$ and the outgoing direction $\omega_o$ are fixed; the integration variable is the microsurface normal $\omega_m$. We know that the incident radiance can be reflected to the desired outgoing direction only if the normal points along the half vector between $\omega_i$ and $\omega_o$. So there is indeed a delta term. But our previous delta function is defined on $\omega_i$ or $\omega_o$, so we cannot use it directly to solve the integral.</p> <p>We can make the equation more concise if we introduce the half vector $\vec{h} = \omega_i + \omega_o$ into equation \eqref{eq: Pure Specular BRDF}. In doing so, one needs to be very cautious because the variable of the delta function changes, so a Jacobian term needs to be considered. Intuitively, many pairs of $\omega_i$ and $\omega_o$ share the same half vector $\vec{h}$, so the function needs a transformation term to stay consistent before and after the change of variable. 
This is called the <a href="https://en.wikipedia.org/wiki/Change_of_variables"><code class="language-plaintext highlighter-rouge">Change of Variables Theorem</code></a>.</p> <p>By changing the variable from $\omega_o$ to $\omega_h$ (the same works for $\omega_i$), we obtain a new BRDF equation:</p> \[f_m(\omega_m, \omega_i, \omega_o) = \frac{F(\omega_m, \omega_i)\delta_{\omega_m}(\omega_h)}{&lt;\omega_i, \omega_m&gt;} |\frac{\partial\omega_h}{\partial\omega_o}|=\lim_{\mathrm{d}\omega_o\rightarrow0} \frac{F(\omega_m, \omega_i)\delta_{\omega_m}(\omega_h)}{&lt;\omega_i, \omega_m&gt;} |\frac{\mathrm{d}\omega_h}{\mathrm{d}\omega_o}|\] <p>Perhaps we are quite unfamiliar with how to solve this derivative, but Heitz gave us an ingenious solution.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <img src="/assets/posts/ndf_microfacet/derivative.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> The illustration from Heitz<d-cite key="2014Understanding"></d-cite>. </div> <p>We can directly read off the relationship between $\mathrm{d}\omega_o$ and $\mathrm{d}\omega_h$:</p> \[|\frac{\mathrm{d}\omega_h}{\mathrm{d}\omega_o}| = \frac{|\omega_o \cdot \omega_h|}{||\vec{h}||^2} = \frac{|\omega_o \cdot \omega_h|}{(\omega_h\cdot\vec{h})^2} = \frac{|\omega_o \cdot \omega_h|}{(\omega_h\cdot(\omega_i+\omega_o))^2}=\frac{|\omega_o \cdot \omega_h|}{(2\omega_h\cdot\omega_o)^2} = \frac{1}{4|\omega_h\cdot\omega_o|}.\] <p>Thus, we have:</p> \[f_m(\omega_m, \omega_i, \omega_o) = \frac{F(\omega_m, \omega_i)\delta_{\omega_m}(\omega_h)}{&lt;\omega_i, \omega_m&gt;} \frac{1}{4|\omega_h\cdot\omega_o|}.\] <h3 id="the-microfacet-brdf">The microfacet BRDF</h3> <p>We are almost there! 
By substituting the microsurface BRDF into \eqref{eq: Statistical Microfacet BRDF}, we get:</p> \[f_r(\omega_i, \omega_o) = \frac{1}{&lt;\omega_i, \omega_g&gt;&lt;\omega_o, \omega_g&gt;} \int_{\Omega} G(\omega_m, \omega_i, \omega_o) \frac{F(\omega_m, \omega_i)\delta_{\omega_m}(\omega_h)}{&lt;\omega_i, \omega_m&gt;} \frac{1}{4|\omega_h\cdot\omega_o|} D(\omega_m) &lt;\omega_m, \omega_o&gt; &lt;\omega_m, \omega_i&gt; \mathrm{d} \omega_m .\] <p>Thanks to the delta term, we can evaluate this integral and get:</p> \[f_r(\omega_i, \omega_o) = \frac{1}{&lt;\omega_i, \omega_g&gt;&lt;\omega_o, \omega_g&gt;} G(\omega_h, \omega_i,\omega_o) \frac{F(\omega_h, \omega_i)}{&lt;\omega_i, \omega_h&gt;} \frac{1}{4|\omega_h\cdot\omega_o|} D(\omega_h) &lt;\omega_h, \omega_o&gt; &lt;\omega_h, \omega_i&gt;.\] <p>And now we get the microfacet BRDF:</p> <p>\begin{equation} \label{eq: Microfacet BRDF} f_r(\omega_i, \omega_o) = \frac{F(\omega_h, \omega_i)G(\omega_h, \omega_i, \omega_o) D(\omega_h) }{4&lt;\omega_i, \omega_g&gt;&lt;\omega_o, \omega_g&gt;}. 
\end{equation}</p>]]></content><author><name></name></author><category term="rendering,"/><category term="basic"/><category term="theory"/><summary type="html"><![CDATA[To show what the NDF is and how to derive the general formula of the microfacet model]]></summary></entry><entry><title type="html">PaperReading - Compressive Rendering Sensing</title><link href="https://icewired-yy.github.io/blog/2025/EN-paper-reading-compressive-rendering/" rel="alternate" type="text/html" title="PaperReading - Compressive Rendering Sensing"/><published>2025-10-02T12:00:00+00:00</published><updated>2025-10-02T12:00:00+00:00</updated><id>https://icewired-yy.github.io/blog/2025/EN-paper-reading-compressive-rendering</id><content type="html" xml:base="https://icewired-yy.github.io/blog/2025/EN-paper-reading-compressive-rendering/"><![CDATA[<h2 id="compressive-sensing-in-rendering-beyond-light-field-reconstruction">Compressive Sensing in Rendering: Beyond Light Field Reconstruction</h2> <p>In our previous <a href="https://icewired-yy.github.io/blog/2025/EN-paper-reading-compressive-light-transport-sensing/">blog post</a>, we explored the fundamental principles of Compressive Sensing (CS) and its pioneering application in Light Transport reconstruction. To further demonstrate the flexibility and broad potential of CS theory, this post shifts our focus to its application in another core area of computer graphics: image rendering.</p> <p>The work, titled “Compressive Rendering: A Rendering Application of Compressed Sensing,” is motivated by a profound observation of the traditional rendering pipeline. A conventional rendering process typically invests immense computational resources to render every single pixel in an image, creating a complete picture, which is then compressed using formats like JPEG or JPEG2000 for storage and transmission. This process is essentially “build completely, then discard redundancy.” The authors astutely point out that this is clearly a roundabout approach. 
A natural question arises: can we skip the intermediate steps and directly capture the crucial information of an image during the rendering phase, thus avoiding the waste of precious computation time on redundant information that will eventually be discarded?</p> <p>So, how can we apply the concepts of Compressive Sensing to the rendering process?</p> <h2 id="building-the-linear-relationship">Building the Linear Relationship</h2> <p>Before diving into the specific methodology, a crucial prerequisite must be emphasized: <strong>If you want to apply Compressive Sensing theory to a new problem, the first and most critical step is to mathematically formulate that problem as a linear system.</strong></p> <p>The authors propose a clear linear model for this. Assume our target image \(x\) has a total of \(n\) pixels, but we only render \(k\) of them (where \(k &lt; n\)) via ray tracing, using these results as our observation vector \(y\). This sampling process can be expressed with a linear system:</p> <p>\begin{equation} y = Sx \end{equation}</p> <p>Here, \(S\) is a \(k \times n\) sampling matrix, with only one “1” in each row, designed to “select” the pixels we actually rendered from the full image \(x\). Our goal is to recover the unknown full image \(x\) from the known \(y\) and \(S\).</p> <p>According to CS theory, directly solving this underdetermined system is impossible. However, if we can find a basis in which the signal \(x\) is sparse, the problem becomes solvable. For natural images, the Wavelet Basis is an excellent choice, as it can represent image information very sparsely.</p> <p>Thus, we can introduce the wavelet transform matrix \(\Psi\), allowing the image \(x\) to be represented by the inverse transform of its wavelet coefficients \(\hat{x}\), i.e., \(x = \Psi^{-1}\hat{x}\). 
Substituting this into our linear system, we get:</p> <p>\begin{equation} y = S\Psi^{-1}\hat{x} \end{equation}</p> <p>This equation establishes a linear relationship between our actual measurements \(y\) and the sparse wavelet coefficients \(\hat{x}\) we aim to solve for. By letting the measurement matrix \(A = S\Psi^{-1}\), the problem is transformed into the standard form of Compressive Sensing: find the sparsest solution \(\hat{x}\) to \(y=A\hat{x}\).</p> <p>However, we have momentarily overlooked another critical condition for CS theory to hold: <strong>the incoherence between the measurement basis and the sparsity basis.</strong></p> <h2 id="incoherence-and-the-invertible-gaussian-filter">Incoherence and the Invertible Gaussian Filter</h2> <p>Once the linear system is established, the next core challenge is to ensure the measurement matrix \(A\) satisfies the requirements of CS, specifically that its constituent measurement basis (point sampling) and sparsity basis (wavelets) have sufficiently low coherence.</p> <p>The bad news is that, in our application, the point sampling basis and the wavelet basis are highly coherent. Directly applying the model above would result in a reconstruction riddled with artifacts.</p> <p>To solve this thorny issue, the authors devised a highly innovative solution. Through experimental observation, they found that if the original image is subjected to a Gaussian blur, the coherence between the point sampling basis and the wavelet basis of the <em>blurred</em> image decreases significantly as the degree of blur increases.</p> <p>This inspired the authors’ core idea: instead of directly reconstructing the original sharp image \(x\), we reconstruct a blurred version \(x_b\). 
Once the blurred image is successfully reconstructed, we then sharpen it back to our desired clear image \(x\) using an “invertible” Gaussian filter operation.</p> <p>While this idea is clever, it introduces a new trade-off: a stronger blur leads to lower coherence and better CS reconstruction, but the subsequent sharpening (inverse Gaussian filtering) process is more likely to amplify noise, degrading the final image quality. Therefore, the degree of Gaussian blur becomes a critical hyperparameter that needs to be carefully chosen.</p> <p>Based on this, the Compressive Sensing formulation is revised as follows:</p> <ol> <li>We assume the sharp image \(x\) and the blurred image \(x_b\) are related by an invertible Gaussian filter \(\Phi\): \(x = \Phi^{-1}x_b\).</li> <li>Substitute this relationship into the initial sampling equation \(y=Sx\) to get \(y = S\Phi^{-1}x_b\).</li> <li>Next, we assume the blurred image \(x_b\) is sparse in the wavelet domain, i.e., \(x_b = \Psi^{-1}\hat{x}_b\).</li> <li>Finally, we arrive at the new linear system:</li> </ol> <p>\begin{equation} y = S\Phi^{-1}\Psi^{-1}\hat{x}_b \end{equation}</p> <p>In this new framework, the measurement matrix becomes \(A = S\Phi^{-1}\Psi^{-1}\), and the sparse vector we need to solve for is the wavelet coefficients of the blurred image, \(\hat{x}_b\).</p> <p>In the actual implementation, the authors also introduced two key optimizations:</p> <ol> <li> <p><strong>Frequency Domain Computation</strong>: To speed up calculations, the Gaussian convolution and its inverse are performed via matrix multiplication in the Fourier domain. The filter \(\Phi\) can be represented as \(\mathcal{F}^{H}G\mathcal{F}\), where \(\mathcal{F}\) is the Fourier transform and \(G\) is a diagonal matrix containing Gaussian function values. The final form of the measurement matrix is thus \(A = S\mathcal{F}^{H}G^{-1}\mathcal{F}\Psi^{-1}\). 
Once we solve for \(\hat{x}_b\), the final image is recovered via \(x=\mathcal{F}^{H}G^{-1}\mathcal{F}\Psi^{-1}\hat{x}_b\).</p> </li> <li> <p><strong>Wiener Filter</strong>: A core part of the implementation is how to compute the “invertible” Gaussian filter, which is the sharpening operation \(\Phi^{-1}\). A naive approach of simply dividing each frequency component by the corresponding Gaussian function value in the frequency domain (i.e., multiplying by \(G^{-1}\) with diagonal elements of \(1/G_{i,i}\)) runs into a serious problem. The issue is that the values of a Gaussian function are very close to zero in high-frequency regions. When a number (especially a signal that may contain noise) is divided by a number that is almost zero, the result becomes enormous. This process would dramatically amplify any pre-existing high-frequency noise in the image, resulting in a completely unusable, noisy reconstruction. To solve this, the authors employed a more robust and intelligent method: <strong>using a Wiener filter to approximate the inverse operation of the Gaussian filter</strong>. The formula for the Wiener filter is: \begin{equation} G_{i,i}^{-1} = \frac{G_{i,i}}{G_{i,i}^2 + \lambda} \end{equation} Here, \(\lambda\) is a small positive constant that acts as a regularization parameter. The brilliance of this formula lies in its adaptive nature:</p> <ul> <li><strong>For low and mid-frequency components</strong>: The value of the Gaussian function \(G_{i,i}\) is relatively large, so \(G_{i,i}^2\) is much larger than \(\lambda\). In this case, the denominator is approximately \(G_{i,i}^2\), and the whole expression approximates \(G_{i,i}/G_{i,i}^2 = 1/G_{i,i}\). The effect is nearly identical to a direct inversion, accurately restoring the signal.</li> <li><strong>For high-frequency components</strong>: The value of \(G_{i,i}\) is very small, close to zero. Here, the \(\lambda\) term in the denominator becomes dominant. 
It prevents the denominator from becoming too small, thereby averting a numerical “explosion”. This effectively suppresses the disproportionate amplification of high-frequency noise.</li> </ul> <p>Therefore, by incorporating the Wiener filter, the authors implemented a robust “inverse Gaussian convolution” operation that can effectively sharpen the image while avoiding noise interference.</p> </li> </ol> <h2 id="summary-and-thoughts">Summary and Thoughts</h2> <p>Through this elegant design, the method successfully applies Compressive Sensing to accelerate rendering. Experimental results show that the framework’s reconstruction quality significantly surpasses traditional interpolation or inpainting algorithms, especially in preserving edges and details in the image. At the same time, the time spent on the reconstruction step is only a small fraction (about 2%) of the total rendering time, proving its efficiency.</p> <p>Of course, the method has its limitations. For instance, it cannot recover isolated, single high-frequency pixel spikes (unless that pixel happens to be sampled), and its performance is inferior to simple interpolation at very low sampling rates (below 5%). Additionally, the algorithm is quite sensitive to the parameters of the Gaussian filter, which requires careful tuning.</p> <p>From my personal perspective, this work opens up an interesting direction. The \(k\) sample points used in the paper are currently selected randomly based on a Poisson-disk distribution, which provides a spatially uniform sampling pattern. A natural extension to consider is: could we combine this method with Importance Sampling? For example, by using a quick preprocessing step or information from a previous frame, we could predict which regions of the image are more complex or contain more detail. 
By placing more samples in these “important” areas, we might achieve the same or even better reconstruction quality with fewer total samples, further enhancing the efficiency of compressive rendering.</p>]]></content><author><name></name></author><category term="paper-reading"/><category term="paper,"/><category term="rendering,"/><category term="en-blog,"/><category term="compressive-sensing"/><summary type="html"><![CDATA[About Compressive Rendering Sensing]]></summary></entry><entry><title type="html">论文解读 - 渲染压缩感知</title><link href="https://icewired-yy.github.io/blog/2025/paper-reading-compressive-rendering/" rel="alternate" type="text/html" title="论文解读 - 渲染压缩感知"/><published>2025-10-02T12:00:00+00:00</published><updated>2025-10-02T12:00:00+00:00</updated><id>https://icewired-yy.github.io/blog/2025/paper-reading-compressive-rendering</id><content type="html" xml:base="https://icewired-yy.github.io/blog/2025/paper-reading-compressive-rendering/"><![CDATA[<h2 id="压缩感知在渲染中的应用不止于光场重建">压缩感知在渲染中的应用：不止于光场重建</h2> <p>在<a href="https://icewired-yy.github.io/blog/2025/paper-reading-compressive-light-transport-sensing/">之前的博客</a>中，我们探讨了压缩感知（Compressive Sensing, CS）的基本原理及其在光传输（Light Transport）重建上的开创性应用。为了进一步展示压缩感知理论的灵活性和广泛潜力，这篇博客我们将目光转向它在另一个计算机图形学核心领域的应用：图像渲染。</p> <p>这篇名为《Compressive Rendering: A Rendering Application of Compressed Sensing》的工作，其核心动机源于对传统渲染流程的一个深刻观察。 传统的渲染管线通常会投入巨大的计算资源，逐一渲染出图像中的每一个像素，形成一幅完整的图像，随后再通过JPEG或JPEG2000等格式将其压缩以便存储和传输。 这个过程本质上是“先完整构建，再剔除冗余”。作者敏锐地指出，这无疑是在绕远路。一个自然而然的问题浮现出来：我们能否跳过中间步骤，在渲染阶段就直接捕获图像的关键信息，从而避免在那些最终会被丢弃的冗余信息上浪费宝贵的计算时间？</p> <p>那么，如果想将压缩感知的思想应用于渲染过程，我们应该如何着手呢？</p> <h2 id="构建线性关系">构建线性关系</h2> <p>在我们深入探讨具体方法之前，必须强调一个前提：<strong>如果你想在一个新问题上应用压缩感知理论，首要任务，也是最关键的一步，就是将该问题数学化地构建（formulate）为一个线性系统</strong>。</p> <p>作者为此提出了一个清晰的线性模型。假设我们希望得到的目标图像 \(x\) 总共有 \(n\) 个像素，但我们只通过光线追踪渲染了其中的 \(k\) 个像素（其中 \(k &lt; n\)），并将这些渲染结果作为观测向量 \(y\)。 那么，这个采样过程可以用一个线性系统来表达：</p> <p>\begin{equation} y = Sx \end{equation}</p> <p>在这里，\(S\) 是一个 \(k \times n\) 
的采样矩阵，它每一行只有一个“1”，用来从完整图像 \(x\) 中“挑选”出我们实际渲染的那个像素。 我们的目标，就是从已知的 \(y\) 和 \(S\) 中恢复出未知的完整图像 \(x\)。</p> <p>根据压缩感知理论，直接求解这个欠定方程是不可行的，但如果我们能找到一个基底，使得信号 \(x\) 在该基底下是稀疏的，问题就迎刃而解。 对于自然图像而言，小波基（Wavelet Basis）正是一个极佳的选择，因为它能非常有效地稀疏表示图像信息。</p> <p>于是，我们可以引入小波变换矩阵 \(\Psi\)，使得图像 \(x\) 可以表示为其小波系数 \(\hat{x}\) 的逆变换，即 \(x = \Psi^{-1}\hat{x}\)。 将此关系代入我们的线性系统，得到：</p> <p>\begin{equation} y = S\Psi^{-1}\hat{x} \end{equation}</p> <p>这个方程建立了我们实际测量值 \(y\) 与待求解的稀疏小波系数 \(\hat{x}\) 之间的线性关系。令测量矩阵 \(A = S\Psi^{-1}\)，问题就转化为了压缩感知的标准形式：求解 \(y=A\hat{x}\) 的最稀疏解 \(\hat{x}\)。</p> <p>然而，我们暂时忽略了压缩感知理论成立所需的另一个关键条件：<strong>测量基与稀疏基之间的非相干性（incoherence）</strong>。</p> <h2 id="非相干性与可逆高斯滤波">非相干性与可逆高斯滤波</h2> <p>线性系统建立后，下一个核心挑战是如何确保测量矩阵 \(A\) 满足压缩感知的要求，特别是其构成的测量基（点采样）与稀疏基（小波）之间具有足够低的相干性。</p> <p>一个坏消息是，在我们的应用中，点采样基和小波基之间存在很强的相干性。 直接应用上述模型会导致重建结果充满伪影。</p> <p>为了解决这个棘手的问题，作者提出了一个极具创造性的方案。他们通过实验观察发现，如果对原始图像进行高斯模糊，随着模糊程度的提高，点采样基与模糊图像的小波基之间的相干性会显著降低。</p> <p>这启发了作者的核心思路：我们不直接重建原始清晰图像 \(x\)，而是去重建一张模糊后的图像 \(x_b\)，待重建成功后，再通过一个“可逆”的高斯滤波操作将模糊图像 \(x_b\) 锐化回我们想要的清晰图像 \(x\)。</p> <p>这个思路虽然巧妙，但也带来一个新的权衡：模糊程度越大，相干性越低，压缩感知重建的效果就越好；但与此同时，后续的锐化（逆高斯滤波）过程也更容易放大噪声，从而降低最终图像的质量。 因此，高斯模糊的程度成为了一个需要被精心选择的关键超参数。</p> <p>基于此，压缩感知求解的公式被修正为：</p> <ol> <li>我们假设清晰图像 \(x\) 和模糊图像 \(x_b\) 之间通过一个可逆的高斯滤波器 \(\Phi\) 相关联：\(x = \Phi^{-1}x_b\)。</li> <li>将此关系代入最初的采样方程 \(y=Sx\)，得到 \(y = S\Phi^{-1}x_b\)。</li> <li>接着，我们假设模糊图像 \(x_b\) 在小波域是稀疏的，即 \(x_b = \Psi^{-1}\hat{x}_b\)。</li> <li>最终，我们得到新的线性系统：</li> </ol> <p>\begin{equation} y = S\Phi^{-1}\Psi^{-1}\hat{x}_b \end{equation}</p> <p>在这个新的框架下，测量矩阵变为了 \(A = S\Phi^{-1}\Psi^{-1}\)，而我们需要求解的稀疏向量是模糊图像的小波系数 \(\hat{x}_b\)。</p> <p>在具体的实现中，作者还引入了两个关键的优化：</p> <ol> <li> <p><strong>频域计算</strong>：为了加速，高斯卷积和其逆操作在傅里叶频域中通过矩阵乘法完成。滤波器 \(\Phi\) 可以表示为 \(\mathcal{F}^{H}G\mathcal{F}\)，其中 \(\mathcal{F}\) 是傅里叶变换，而 \(G\) 是一个对角矩阵，其对角线元素是高斯函数值。 因此，测量矩阵的最终形式为 \(A = S\mathcal{F}^{H}G^{-1}\mathcal{F}\Psi^{-1}\)。 当我们求解得到 \(\hat{x}_b\) 后，最终图像通过 \(x=\mathcal{F}^{H}G^{-1}\mathcal{F}\Psi^{-1}\hat{x}_b\) 来还原。</p> </li> <li> <p><strong>维纳滤波器（Wiener 
Filter）</strong>： 在实现中，一个核心环节是如何计算“可逆”的高斯滤波，也就是锐化操作 \(\Phi^{-1}\)。如果简单地在频域中用每个频率分量去除以对应的高斯函数值（即乘以 \(G^{-1}\)，其中对角元素为 \(1/G_{i,i}\)），会遇到一个严重的问题。 问题在于，高斯函数在高频区域的值非常接近于0。当一个数（尤其是可能带有噪声的信号）除以一个几乎为0的数时，结果会变得极大。这个过程会极度放大图像中原本存在的高频噪声，导致重建出的图像充满噪点，完全不可用。 为了解决这个问题，作者采用了一种更稳健、更智能的方法：<strong>使用维纳滤波器（Wiener filter）来近似实现高斯滤波的逆操作</strong>。维纳滤波器的公式如下： \begin{equation} G_{i,i}^{-1} = \frac{G_{i,i}}{G_{i,i}^2 + \lambda} \end{equation} 这里的 \(\lambda\) 是一个很小的正常数，作为正则化参数。这个公式的精妙之处在于它的自适应性：</p> <ul> <li><strong>对于低频和中频部分</strong>：高斯函数的值 \(G_{i,i}\)较大，因此\(G_{i,i}^2\)远大于\(\lambda\)。此时，分母约等于 \(G_{i,i}^2\)，整个式子约等于 \(G_{i,i}/G_{i,i}^2 = 1/G_{i,i}\)。效果就和直接求逆几乎一样，能够准确还原信号。</li> <li><strong>对于高频部分</strong>：高斯函数的值 \(G_{i,i}\)非常小，接近于0。此时，分母中的\(\lambda\) 起到了主导作用，它防止了分母因过小而导致结果“爆炸”。这有效地抑制了高频噪声被不成比例地放大。</li> </ul> <p>因此，通过引入维纳滤波器，作者实现了一个既能有效锐化图像，又能避免噪声干扰的、鲁棒的“逆高斯卷积”操作。</p> </li> </ol> <h2 id="总结与思考">总结与思考</h2> <p>通过上述精巧的设计，该方法成功地将压缩感知应用于渲染加速。实验结果表明，该框架的重建质量显著优于传统的插值或图像修复（inpainting）算法，尤其在保留图像的边缘和细节方面表现出色。 同时，重建步骤所花费的时间仅占总渲染时间的一个很小比例（约2%），证明了其高效性。</p> <p>当然，该方法也存在一些局限性。例如，它无法恢复孤立的单个高频像素点（除非该点恰好被采样到），并且在极低的采样率（低于5%）下，其性能会劣于简单的插值。 此外，算法对高斯滤波器的参数比较敏感，需要仔细调优。</p> <p>从我个人的角度看，这项工作开启了一个有趣的方向。目前论文中用于采样的 \(k\) 个点是基于泊松盘分布随机选择的，这是一种在空间上相对均匀的采样方式。 一个很自然的想法是：我们是否能将这种方法与重要性采样（Importance Sampling）相结合？例如，通过一个快速的预处理步骤或者利用前一帧的信息，来预测图像中哪些区域更为复杂、包含更多细节，从而在这些“重要”区域放置更多的采样点。这或许能用更少的总采样数达到同等甚至更好的重建质量，进一步提升压缩渲染的效率。</p>]]></content><author><name></name></author><category term="paper-reading"/><category term="paper,"/><category term="rendering,"/><category term="cn-blog,"/><category term="compressive-sensing"/><summary type="html"><![CDATA[About Compressive Rendering Sensing]]></summary></entry><entry><title type="html">PaperReading - Compressive Light Transport Sensing</title><link href="https://icewired-yy.github.io/blog/2025/EN-paper-reading-compressive-light-transport-sensing/" rel="alternate" type="text/html" title="PaperReading - Compressive Light Transport 
Sensing"/><published>2025-08-17T12:00:00+00:00</published><updated>2025-08-17T12:00:00+00:00</updated><id>https://icewired-yy.github.io/blog/2025/EN-paper-reading-compressive-light-transport-sensing</id><content type="html" xml:base="https://icewired-yy.github.io/blog/2025/EN-paper-reading-compressive-light-transport-sensing/"><![CDATA[<h1 id="introduction">Introduction</h1> <h2 id="what-is-compressive-sensing">What is Compressive Sensing?</h2> <p>Before understanding compressive sensing, we first need a notion of compression itself. Suppose a signal, when projected onto a space spanned by a certain set of basis vectors, shows a strong concentration of coefficients: the large coefficients fall on a small number of basis vectors, while the coefficients on the remaining ones are very small. If we keep only the basis vectors with significant coefficients, together with those coefficients, the reconstructed signal still retains most of the information of the original signal. We call such a space a compressible space for the signal, and we say the signal is compressible in it.</p> <p>We also know that, according to the Nyquist sampling theorem, if we do not perform any transformation on the signal, our sampling frequency must be at least twice the highest frequency present in the signal in order to fully recover it.</p> <p>Compressive Sensing studies the following problem: can we project a signal into a specific compressible space, changing the objective to reconstructing a k-sparse compressed signal, thereby reconstructing the original signal with a number of samples significantly smaller than what is required by the Nyquist theorem?</p> <blockquote> <p>K-Sparse: Refers to a vector containing at most k non-zero elements.</p> </blockquote> <h1 id="underdetermined-linear-systems">Underdetermined Linear Systems</h1> <p>For a linear equation such as this: \begin{equation} \boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} \end{equation} where \(\boldsymbol{x}\) is an 
n-dimensional <strong>unknown signal to be solved for</strong>, \(\boldsymbol{A}\) is an \(m \times n\) matrix, which is the <strong>measurement matrix</strong> used to measure the unknown signal \(\boldsymbol{x}\), and the vector \(\boldsymbol{y}\) is an m-dimensional column vector, which is the <strong>observation result</strong> after measurement. Generally, if \(m \ge n\) and \(\boldsymbol{A}\) has full column rank (that is, its rank equals \(n\)), then this unknown signal \(\boldsymbol{x}\) can be solved for precisely. However, if \(m \ll n\), then this equation is <strong>underdetermined</strong>, and its solution is not unique.</p> <p>There are many existing algorithms for finding the sparse solution of underdetermined linear systems, such as Basis Pursuit.</p> <h2 id="lets-look-again-at-what-compressive-means">Let’s look again at what <strong>Compressive</strong> means?</h2> <p>As mentioned, Compressive Sensing aims to find a sparse solution in an underdetermined linear system. The “compressive” nature is embedded in the sparsity of this solution. Why is that?</p> <p>Any vector is essentially a discrete representation under a certain set of basis vectors, such as the common basis in three-dimensional space: \((1, 0, 0), (0, 1, 0), (0, 0, 1)\). <strong>If our solution is sparse, it means that in a space spanned by \(n\) basis vectors, the signal can be represented using only \(k\) of them.</strong> The coefficients on the remaining basis vectors are zero or negligible. 
We only need to retain these \(k\) basis vectors to reconstruct the original signal with high fidelity, thus achieving <strong>compression</strong>.</p> <h1 id="applying-compressive-sensing-to-light-transport">Applying Compressive Sensing to Light Transport</h1> <p>Consider a scene, for example:</p> <figure> <picture> <img src="/assets/posts/image.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <p><strong>In this scene, every object, including the camera and the light source, is fixed in place.</strong> We want to find a Light Transport function such that when the illumination emitted by our light source changes, we can solve for the scene under the new lighting conditions. As you can see, this is a constrained Capture-Relighting problem.</p> <p>Previous work has shown that this problem can be described by a linear equation: \begin{equation} \boldsymbol{C}=\boldsymbol{T}\boldsymbol{L} \end{equation} Here, \(\boldsymbol{C}\) is a \(p \times m\) matrix of observed results, where \(p\) is the number of pixels and \(m\) is the number of captures (observations). \(\boldsymbol{T}\) is a \(p \times n\) Light Transport matrix, where \(n\) is the number of light source parameters. \(\boldsymbol{L}\) is an \(n \times m\) matrix describing the light sources.</p> <p>This method of directly describing the rendering process with a linear equation has the following conventions:</p> <ul> <li><strong>There is no need to model camera parameters, light source sizes, or the relative positions of objects.</strong> All this information is implicitly contained within the Light Transport Matrix (somewhat like a neural network).</li> <li><strong>The light source needs to be parameterized.</strong> For instance, if the source consists of \(n\) point lights, the dimension of the light parameter vector is \(n\). For a constant, uniform area light, the dimension is \(1\). 
For an area light controlled by a texture map, the dimension is the number of pixels in the texture, \(p'\). Each value can represent radiance.</li> </ul> <p>In this equation, each row vector of \(\boldsymbol{T}\) is a Light Transport Function (or Reflectance Function) that we need to solve for. To solve for this matrix, suppose we have a complex light source with \(128 \times 128\) resolution (like a textured area light). A single pixel would require solving for 16,384 coefficients, meaning we would need at least 16,384 captures. This clearly involves a massive overhead and is unacceptable for practical applications.</p> <p>Therefore, we want to introduce Compressive Sensing to the problem of solving for the Light Transport Function: to reduce the number of measurements, we transform the function into a compressible basis and solve for a sparse solution there. Previous research has confirmed that <strong>Reflectance Functions</strong> are compressible in certain bases (like wavelets or spherical harmonics). This means we can approximate the original reflectance function with high accuracy using far fewer than \(n\) coefficients.</p> <h2 id="transforming-to-the-haar-wavelet-space">Transforming to the Haar Wavelet Space</h2> <p>The Haar wavelet is a very simple set of wavelet basis functions, composed only of the elements <code class="language-plaintext highlighter-rouge">0, -1, 1</code>, and the basis vectors are mutually orthogonal. Haar wavelets are also commonly used for image compression.</p> <p>To leverage the sparsity of the reflectance function, we need to transform the solving process into our chosen basis space. 
Here, we’ll use a general orthogonal basis \(\boldsymbol{B}\) as an example (which is the Haar wavelet basis in the paper’s implementation).</p> <p>We start with the original light transport equation: \begin{equation} \boldsymbol{C} = \boldsymbol{T}\boldsymbol{L} \end{equation} Our goal is to solve for the sparse coefficient matrix \(\hat{\boldsymbol{T}}\) in the basis \(\boldsymbol{B}\), rather than the dense matrix \(\boldsymbol{T}\). We’ll adopt the convention that each column vector of \(\boldsymbol{B}\) is a basis vector. We can insert an identity matrix \(\boldsymbol{I} = \boldsymbol{B}\boldsymbol{B}^T\) into the equation: \begin{equation} \boldsymbol{C} = \boldsymbol{T}(\boldsymbol{B}\boldsymbol{B}^T)\boldsymbol{L} = (\boldsymbol{T}\boldsymbol{B})(\boldsymbol{B}^T\boldsymbol{L}) \end{equation} We define the transport matrix in basis \(\boldsymbol{B}\) as \(\hat{\boldsymbol{T}} = \boldsymbol{T}\boldsymbol{B}\). Now, each row of \(\hat{\boldsymbol{T}}\) is a sparse or compressible vector. The equation becomes: \begin{equation} \boldsymbol{C} = \hat{\boldsymbol{T}}(\boldsymbol{B}^T\boldsymbol{L}) \end{equation} This form is not yet a standard CS equation. To construct a standard CS problem, we need to carefully design our <strong>illumination pattern matrix</strong> \(\boldsymbol{L}\) to eliminate the extra \(\boldsymbol{B}^T\) term. The paper proposes designing the illumination patterns as the product of the basis functions and a specially designed measurement matrix \(\boldsymbol{\Phi}\): \begin{equation} \boldsymbol{L} = \boldsymbol{B}\boldsymbol{\Phi} \end{equation} Here, \(\boldsymbol{\Phi}\) is an \(n \times m\) matrix that meets the requirements of CS theory (which we will detail later). 
Substituting this carefully designed \(\boldsymbol{L}\) gives: \begin{equation} \boldsymbol{C} = \hat{\boldsymbol{T}}(\boldsymbol{B}^T(\boldsymbol{B}\boldsymbol{\Phi})) \end{equation} Since \(\boldsymbol{B}\) is an orthogonal basis, \(\boldsymbol{B}^T\boldsymbol{B} = \boldsymbol{I}\), which greatly simplifies the equation to: \begin{equation} \boldsymbol{C} = \hat{\boldsymbol{T}}\boldsymbol{\Phi} \end{equation}</p> <p>Now, let’s consider the case for a single pixel \(i\), which corresponds to the \(i\)-th row of the matrices: \(\boldsymbol{c}_{i,.} = \hat{\boldsymbol{t}}_{i,.}\boldsymbol{\Phi}\) By transposing this row vector equation, we get: \(\boldsymbol{c}_{i,.}^T = \boldsymbol{\Phi}^T \hat{\boldsymbol{t}}_{i,.}^T\) This equation perfectly matches the standard CS form \(\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}\) that we introduced at the beginning. Here, \(\boldsymbol{y} = \boldsymbol{c}_{i,.}^T\) represents our \(m\) observations for pixel \(i\), \(\boldsymbol{A} = \boldsymbol{\Phi}^T\) is the measurement matrix, and \(\boldsymbol{x} = \hat{\boldsymbol{t}}_{i,.}^T\) is precisely the sparse wavelet coefficient vector for pixel \(i\) that we want to solve for.</p> <p>Through this transformation, we have successfully converted the problem of solving for light transport into a standard compressive sensing problem that can be solved independently for each pixel.</p> <h1 id="how-to-solve-for-the-coefficients">How to Solve for the Coefficients?</h1> <h2 id="the-importance-of-the-measurement-matrix">The Importance of the Measurement Matrix</h2> <p>Through the derivation above, we have successfully formulated a compressive sensing problem for each pixel: \(\boldsymbol{c}_{i,.}^T = \boldsymbol{\Phi}^T \hat{\boldsymbol{t}}_{i,.}^T\). 
The question now is, how should we choose the measurement matrix \(\boldsymbol{\Phi}\) (or its transpose \(\boldsymbol{A} = \boldsymbol{\Phi}^T\)) to ensure that we can stably and accurately recover the sparse signal \(\boldsymbol{x}\) from the underdetermined observations \(\boldsymbol{y}\)?</p> <h3 id="the-restricted-isometry-property-rip">The Restricted Isometry Property (RIP)</h3> <p>Clearly, the measurement matrix \(\boldsymbol{A}\) cannot be chosen arbitrarily. It must possess a special structure to ensure that different sparse signals are mapped to sufficiently different measurement results, thus avoiding ambiguity during the solution process. This core condition is known as the <strong>Restricted Isometry Property (RIP)</strong>.</p> <p>The mathematical definition of RIP is as follows: A matrix \(\boldsymbol{A}\) is said to satisfy the k-RIP if there exists a small constant \(\delta_k \in (0, 1)\) such that for <strong>any k-sparse</strong> vector \(\boldsymbol{x}\), the following inequality holds: \begin{equation} (1-\delta_k)\lVert\boldsymbol{x}\rVert_2^2 \le \lVert\boldsymbol{A}\boldsymbol{x}\rVert_2^2 \le (1+\delta_k)\lVert\boldsymbol{x}\rVert_2^2 \end{equation} The intuitive meaning of this property is that the matrix \(\boldsymbol{A}\), when applied to any sparse vector, acts approximately as an <strong>isometry</strong>. That is, it nearly preserves the Euclidean length (i.e., energy) of sparse signals. It doesn’t “squash” two different sparse signals together, nor does it excessively stretch them. It is this property that guarantees the stability and uniqueness of recovering \(\boldsymbol{x}\) from the measurements \(\boldsymbol{y}\).</p> <h3 id="mutual-coherence">Mutual Coherence</h3> <p>While RIP is the “gold standard” for guaranteeing signal recovery, it has a major practical drawback: verifying whether a given matrix satisfies RIP is an NP-hard problem. 
This makes it unsuitable as a practical criterion for matrix design.</p> <p>Fortunately, we can satisfy RIP indirectly by using a more easily computable metric: <strong>Mutual Coherence</strong>, denoted by \(\mu\). Previous research has established that an upper bound for the RIP constant \(\delta_k\) can be expressed in terms of the mutual coherence \(\mu\) and the sparsity \(k\): \begin{equation} \delta_k \le (k-1)\mu \end{equation} This inequality tells us that if we can design a matrix with a very low mutual coherence \(\mu\), and the signal’s sparsity \(k\) is not too large, we can guarantee that its RIP constant \(\delta_k\) will also be small. This makes mutual coherence a practical and sufficient substitute for RIP.</p> <p>The definition of mutual coherence is as follows: For a matrix \(\boldsymbol{A}\) with all its columns \(a_j\) normalized, its mutual coherence \(\mu(\boldsymbol{A})\) is defined as the maximum absolute value of the inner product between any two distinct columns. \begin{equation} \mu(\boldsymbol{A}) = \max_{i \neq j} |\langle a_i, a_j \rangle| = \max_{i \neq j} |a_i^T a_j| \end{equation} It measures the “worst-case” similarity between the columns of the matrix. A low value of \(\mu\) means all column vectors are nearly orthogonal, which is exactly what we desire.</p> <h3 id="random-matrices-are-the-answer">Random Matrices are the Answer</h3> <p>Our objective becomes clear: we need to find a measurement matrix \(\boldsymbol{A}\) that is both full-rank (to provide sufficient measurement information) and has the lowest possible mutual coherence \(\mu\).</p> <p>A surprising but highly effective answer is: <strong>randomization</strong>. 
Random matrices, such as Gaussian or Bernoulli random matrices, satisfy both of these requirements with a very high probability.</p> <ul> <li><strong>Gaussian Random Matrix</strong>: Each entry in the matrix is independently drawn from a standard Gaussian distribution.</li> <li><strong>Bernoulli Random Matrix</strong>: Each entry in the matrix is randomly assigned a value of +1 or -1 with equal probability.</li> </ul> <blockquote> <p><strong>Why do Random Matrices Work?</strong></p> <ol> <li><strong>Full Rank Property</strong>: In a high-dimensional space, it is almost impossible for a random vector to lie exactly in the subspace spanned by several other random vectors. Therefore, an \(m \times n\) (\(m &lt; n\)) random matrix will have linearly independent rows with extremely high probability, thus ensuring its rank is \(m\).</li> <li><strong>Low Mutual Coherence</strong>: This stems from the phenomenon of measure concentration in high dimensions. In a high-dimensional space, any two randomly generated unit vectors are, with very high probability, nearly orthogonal. Their inner product (i.e., coherence) is highly concentrated around its expected value of 0, and the probability of it being far from 0 decreases exponentially with the dimension. Consequently, a matrix composed of random vectors will have very low coherence between any two of its columns.</li> </ol> </blockquote> <p>In the practical application of the paper <strong><em>Compressive Light Transport Sensing</em></strong>, the authors ultimately choose a <strong>binary, Bernoulli-like</strong> illumination pattern. This decision is primarily based on several practical engineering considerations:</p> <ul> <li><strong>Convenience of Numerical Implementation</strong>: Binary patterns (e.g., light on/off) are very easy to implement on physical devices like monitors or projectors. 
Using a Gaussian matrix would require mapping floating-point values to the limited RGB levels of a display, a quantization process that would introduce errors.</li> <li><strong>No Need for Precise Gamma Correction</strong>: By using only the maximum and minimum intensity values, the system is insensitive to the non-linear response (Gamma curve) of the display device, which removes a potential source of calibration error.</li> <li><strong>Signal-to-Noise Ratio (SNR) Considerations</strong>: The paper specifically designs a “binary segregated ensemble” pattern. This pattern is crafted to maximize the dynamic range during measurement, thereby improving the SNR in a way that is difficult to achieve with standard Gaussian patterns.</li> </ul> <h1 id="spatial-coherency">Spatial Coherency</h1> <p>So far, we have established and solved an independent compressive sensing problem for each pixel. While this “brute-force per-pixel” approach is theoretically sound, it overlooks a crucial piece of prior information: <strong>the reflectance functions of adjacent pixels in an image are typically highly correlated</strong>. For instance, within an area of uniform material, the reflective behavior of neighboring pixels is nearly identical. Solving for them in complete isolation can cause small measurement noises to introduce unnatural variations in the reconstructed adjacent reflectance functions, leading to visible noise or artifacts in the final rendered image.</p> <h2 id="a-coarse-to-fine-reconstruction-strategy">A Coarse-to-Fine Reconstruction Strategy</h2> <p>To address this issue and leverage the spatial coherency between pixels to regularize the solution process and improve quality, the paper proposes a <strong>coarse-to-fine</strong> hierarchical reconstruction algorithm. 
The core idea is that instead of directly solving at the original resolution, we start with a very coarse, low-resolution version of the image to robustly estimate an approximate solution for the reflectance functions. Then, we progressively increase the resolution, using the coarser but more robust solution from the previous level as an initial guess to guide the solution at the current, finer level.</p> <p>This multi-resolution structure can be naturally obtained from our chosen <strong>Haar wavelet basis</strong>. The wavelet transform is inherently a multi-resolution analysis tool. After performing a multi-level wavelet decomposition on an image, we get a series of sub-band images at different frequencies. The low-frequency component is essentially a downsampled and blurred version of the original image at different scales. This naturally provides the image pyramid needed for our coarse-to-fine strategy.</p> <h2 id="solving-for-the-difference-signal">Solving for the Difference Signal</h2> <p>The key to the hierarchical algorithm is how to effectively use the solution from the previous level to guide the solution at the next. Let’s assume we have obtained the sparse coefficients for the reflectance function, \(\hat{\boldsymbol{t}}_{init}\), from level \(l-1\) (a coarser level). 
When we move to level \(l\) (a finer level), it is reasonable to believe that the true solution for this level, \(\hat{\boldsymbol{t}}\), will not be very different from the initial guess, \(\hat{\boldsymbol{t}}_{init}\), inherited from the level above.</p> <p>Therefore, instead of directly solving for the large and complex \(\hat{\boldsymbol{t}}\), we can shift our perspective and solve for the <strong>difference signal</strong>, \(\boldsymbol{d}\), between them: \begin{equation} \hat{\boldsymbol{t}} = \hat{\boldsymbol{t}}_{init} + \boldsymbol{d} \end{equation} Since \(\hat{\boldsymbol{t}}_{init}\) is already a good approximation, we can expect the difference signal \(\boldsymbol{d}\) to be <strong>even sparser</strong> than the original signal \(\hat{\boldsymbol{t}}\) itself. In the compressive sensing framework, solving for a sparser signal is generally more robust and accurate.</p> <p>We can substitute this idea into our familiar CS equation, \(\boldsymbol{y} = \boldsymbol{A}\hat{\boldsymbol{t}}\) (where \(\boldsymbol{y}\) is the observation and \(\boldsymbol{A}\) is the measurement matrix): \begin{equation} \boldsymbol{y} = \boldsymbol{A}(\hat{\boldsymbol{t}}_{init} + \boldsymbol{d}) = \boldsymbol{A} \hat{\boldsymbol{t}}_{init} + \boldsymbol{A}\boldsymbol{d} \end{equation}</p> <p>By moving the known terms to the left side, we formulate a new compressive sensing problem: \begin{equation} \boldsymbol{y} - \boldsymbol{A}\hat{\boldsymbol{t}}_{init} = \boldsymbol{A}\boldsymbol{d} \end{equation} In this new problem, the unknown we need to solve for is the sparse difference signal \(\boldsymbol{d}\), and our new “measurement” becomes the <strong>residual</strong>, \((\boldsymbol{y} - \boldsymbol{A}\hat{\boldsymbol{t}}_{init})\). 
After solving for \(\boldsymbol{d}\), we can update our solution at the current level with a simple addition: \(\hat{\boldsymbol{t}} = \hat{\boldsymbol{t}}_{init} + \boldsymbol{d}\).</p> <p>The entire algorithm starts at the top of the pyramid (the coarsest level) and iteratively performs this “solve for difference and update” process until it reaches the bottom level at the original resolution. In this manner, spatial coherency is implicitly passed down and reinforced through the levels, resulting in a reconstructed reflectance field that is spatially smoother and more continuous, significantly outperforming the independent per-pixel approach.</p> <h1 id="final-optimization---separating-high-and-low-frequency-information">Final Optimization - Separating High and Low-Frequency Information</h1> <p>Up to this point, we have applied the sparse recovery of the reflectance function uniformly across all Haar wavelet basis functions. However, based on prior knowledge, we understand that an object’s Reflectance Function is typically composed of a smooth, low-frequency component (like diffuse reflection) and sparse, high-frequency details (like specular highlights). The low-frequency part is almost always present and thus its coefficients in the wavelet domain are not sparse, whereas the high-frequency details are the truly sparse component.</p> <p>This observation inspires a potential optimization strategy: we can split the acquisition process into two parts.</p> <ol> <li>For the low-frequency information, which contributes most of the energy but is not sparse, we can bypass compressive sensing and use a more direct and precise “full sampling” method to solve for its corresponding wavelet coefficients.</li> <li>For the genuinely sparse high-frequency details, we continue to use the compressive sensing approach for efficient capture.</li> </ol> <p>This hybrid strategy can lead to more robust reconstruction results. 
To recover the low-frequency components precisely, we can introduce <strong>Hadamard Patterns</strong>. A Hadamard matrix is a square matrix whose entries are all +1 or -1 and whose rows (and columns) are mutually orthogonal. Illuminating and measuring with a full set of Hadamard patterns amounts to applying a complete Walsh-Hadamard transform (an invertible, Fourier-like orthogonal transform) to the signal, so in the absence of noise the original signal can be recovered exactly.</p> <p>In low-dimensional cases (e.g., when we only care about the first few dozen low-frequency wavelet coefficients), using Hadamard patterns is more advantageous than using Bernoulli random matrices. This is because a complete Hadamard matrix is <strong>guaranteed</strong> to be full-rank and orthogonal, whereas a random matrix is full-rank only with <strong>very high probability</strong> and is at best approximately orthogonal. For the critical low-frequency part of the signal that forms its backbone, this deterministic guarantee is crucial.</p> <h1 id="extra-bonus">Extra Bonus</h1> <p>In our previous discussion, “mutual coherence” was used to measure the similarity between the column vectors within a single measurement matrix. 
In fact, this concept can be generalized to describe the degree of <strong>incoherence between two different bases</strong>.</p> <p>This leads to a profound observation known as the <strong>Generalized Uncertainty Principle</strong>: A non-zero signal cannot be sparse in two mutually incoherent bases <strong>simultaneously</strong>.</p> <p>The most classic example of this principle is the duality between the time and frequency domains:</p> <ul> <li>A <strong>Dirac Delta Function</strong> is perfectly sparse in the time domain (the standard basis), as it has a non-zero value at only a single point in time.</li> <li>However, its Fourier transform is a complex exponential with constant magnitude across the entire frequency domain (the Fourier basis), meaning its energy is uniformly spread across all frequencies, making it perfectly <strong>dense</strong>.</li> </ul> <p>Conversely, a single-frequency sine wave is sparse in the frequency domain but extends over all time, so it is dense in the time domain. This phenomenon of “energy spreading” intuitively explains the uncertainty principle: the more “certain” (sparse) a signal is in one domain, the more “uncertain” (dense) it must be in another incoherent domain.</p> <p>This relationship is not just a qualitative observation; it has a rigorous mathematical proof and is analogous to the <strong>Heisenberg Uncertainty Principle</strong> in physics. That principle states that the position and the momentum of a particle cannot both be known with arbitrary precision: the product of the uncertainty in position \(\Delta x\) and the uncertainty in momentum \(\Delta p\) is bounded from below, \begin{equation} \Delta x \Delta p \ge \frac{h}{4\pi}. \end{equation} In the field of signal processing, a more generalized uncertainty principle has been proven. 
For any non-zero signal, if the number of non-zero coefficients in basis \(\Phi\) is \(\lVert\boldsymbol{\alpha}\rVert_0 = A\) and the number in basis \(\Psi\) is \(\lVert\boldsymbol{\beta}\rVert_0 = B\) (here \(A\) is a count, not the measurement matrix \(\boldsymbol{A}\) used earlier), they must satisfy the following inequality: \begin{equation} \sqrt{A \cdot B} \ge \frac{1}{M} \end{equation} or a slightly weaker but more commonly used form, which follows from the inequality of arithmetic and geometric means: \begin{equation} A + B \ge \frac{2}{M} \end{equation} Here, \(M\) is the mutual coherence between the two orthonormal bases \(\Phi\) and \(\Psi\), defined as \(M = \sup_ {i,j} |\langle \phi_i, \psi_j \rangle|\). This inequality precisely quantifies our observation: if two bases are highly incoherent (meaning \(M\) is very small), then \(1/M\) will be very large. This forces at least one of \(A\) or \(B\) to be large, confirming that a signal cannot be sparse in both bases simultaneously.</p>]]></content><author><name></name></author><category term="paper-reading"/><category term="paper,"/><category term="rendering,"/><category term="relighting,"/><category term="en-blog,"/><category term="compressive-sensing"/><summary type="html"><![CDATA[About Compressive Light Transport Sensing]]></summary></entry><entry><title type="html">Paper Reading - Compressive Light Transport Sensing</title><link href="https://icewired-yy.github.io/blog/2025/paper-reading-compressive-light-transport-sensing/" rel="alternate" type="text/html" title="Paper Reading - Compressive Light Transport Sensing"/><published>2025-08-17T12:00:00+00:00</published><updated>2025-08-17T12:00:00+00:00</updated><id>https://icewired-yy.github.io/blog/2025/paper-reading-compressive-light-transport-sensing</id><content type="html" xml:base="https://icewired-yy.github.io/blog/2025/paper-reading-compressive-light-transport-sensing/"><![CDATA[<h1 id="引言">Introduction</h1> <h2 id="什么是压缩感知">What Is Compressive Sensing?</h2> <p>Before looking at compressive sensing, we first need a notion of compression itself. Suppose that when a signal is projected onto the space spanned by some basis, its coefficients are strongly concentrated: the large coefficients fall on a small subset of the basis vectors, while the coefficients on the remaining basis vectors are small. If we keep only the basis vectors with significant coefficients, together with those coefficients, the reconstructed signal still retains most of the information in the original signal. We call such a space a compressible space, and we say the signal is compressible in that space.</p> 
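<p>The notion of compressibility described above can be illustrated with a minimal sketch. It is not taken from the paper: the helper names (haar_forward, haar_inverse, keep_largest, approximate, relative_error) and the test signal are assumptions chosen for illustration. The signal is piecewise constant on dyadic blocks, so only three of its 64 orthonormal Haar coefficients are non-zero, and keeping those three is enough to rebuild it.</p>

```python
import math


def haar_forward(signal):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n != 1:
        half = n // 2
        # Scaled pairwise sums (coarse part) and differences (detail part).
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = avg + diff
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = 1
    while n != len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged.append((a + d) / math.sqrt(2))
            merged.append((a - d) / math.sqrt(2))
        out[:2 * n] = merged
        n *= 2
    return out


def keep_largest(coeffs, k):
    """Zero every coefficient except the k with the largest magnitude."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def approximate(signal, k):
    """Rebuild the signal from its k most significant Haar coefficients."""
    return haar_inverse(keep_largest(haar_forward(signal), k))


def relative_error(original, approx):
    """Euclidean error of the approximation relative to the signal norm."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, approx)))
    den = math.sqrt(sum(a ** 2 for a in original))
    return num / den


if __name__ == "__main__":
    # Hypothetical test signal: three constant blocks of length 32, 16 and 16.
    signal = [1.0] * 32 + [3.0] * 16 + [0.5] * 16
    for k in (1, 2, 3, 64):
        err = relative_error(signal, approximate(signal, k))
        print(f"k = {k:2d}  relative error = {err:.3e}")
```

<p>Real reflectance functions are only approximately sparse, so in practice the discarded coefficients are small rather than exactly zero, and the reconstruction error is small rather than zero.</p>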
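<p>On the other hand, if we apply no transform to the signal, the Nyquist sampling theorem tells us that the sampling rate must be at least twice the highest frequency present in the signal in order to recover it completely.</p> <p>Compressive Sensing studies the following question: can we project the signal into a particular compressible space and change the goal to reconstructing a compressed signal of sparsity k, so that the original signal can be rebuilt from significantly fewer samples than Nyquist sampling requires?</p> <blockquote> <p>K-Sparse (sparsity k): the vector contains at most k non-zero elements.</p> </blockquote> <h1 id="欠定线性系统">Underdetermined Linear Systems</h1> <p>Consider a linear equation of the form \begin{equation} \boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} \end{equation} where \(\boldsymbol{x}\) is the n-dimensional <strong>unknown signal to be solved for</strong>, \(\boldsymbol{A}\) is an \(m \times n\) <strong>measurement matrix</strong> used to measure the unknown signal \(\boldsymbol{x}\), and \(\boldsymbol{y}\) is an \(m\)-dimensional column vector holding the <strong>observations</strong> produced by the measurement. In general, if \(m \ge n\) and \(\boldsymbol{A}\) has full column rank \(n\), the unknown signal \(\boldsymbol{x}\) can be solved for exactly. If there are fewer equations than unknowns, and in particular when \(m \ll n\), the system is <strong>underdetermined</strong> and its solution is not unique.</p> <p>Many existing algorithms find sparse solutions of underdetermined linear systems, for example Basis Pursuit.</p> <h2 id="再来看看什么是compressive">Revisiting What <strong>Compressive</strong> Means</h2> <p>As mentioned above, Compressive Sensing looks for a sparse solution of an underdetermined linear system, and the word Compressive refers precisely to the sparsity of that solution. Why is that?</p> <p>Any vector is really a list of coordinates with respect to some basis, for example the most common basis of three-dimensional space: \((1, 0, 0), (0, 1, 0), (0, 0, 1)\). <strong>If our solution is sparse, then in a space spanned by \(n\) basis vectors the signal can be represented with only \(k\) of them</strong>. The remaining basis vectors can be ignored: keeping just these \(k\) basis vectors and their coefficients is enough to reconstruct the original signal with high fidelity, which is how <strong>compression</strong> is achieved.</p> <h1 id="将compressive-sensing引入到light-transport">Bringing Compressive Sensing to Light Transport</h1> <p>Consider a scene such as the following:</p> <figure> <picture> <img src="/assets/posts/image.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> <p><strong>Every object in the scene, including the camera and the light source, is fixed in place</strong>. We want to find a Light Transport function such that, when the illumination changes, we can compute the appearance of the scene under the new lighting. This is a constrained Capture-Relighting problem.</p> <p>Prior work has shown that this problem can be described by a linear equation: \begin{equation} \boldsymbol{C}=\boldsymbol{T}\boldsymbol{L} \end{equation} where \(\boldsymbol{C}\) is the \(p \times m\) matrix of observations, \(p\) is the number of pixels, \(m\) is the number of captured photographs (measurements), \(\boldsymbol{T}\) is the \(p \times n\) Light Transport matrix, and \(n\) is the number of light source parameters. \(\boldsymbol{L}\) is the \(n \times m\) matrix describing the illumination.</p> <p>Describing the rendering process directly with a linear equation comes with the following conventions:</p> <ul> 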
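<li><strong>There is no need to model camera parameters, light source size, or the relative positions of objects</strong>. All of this information is implicit in the Light Transport Matrix (somewhat in the spirit of a neural network).</li> <li><strong>The light source must be described parametrically</strong>. For example, if the illumination consists of m point lights, the light parameter vector has dimension \(m\); for a constant, uniform area light the dimension is \(1\); for an area light controlled by a texture, the dimension is the number of texture pixels \(p'\). Each entry can be a radiance value. In the notation of the equation above, this dimension is \(n\).</li> </ul> <p>In this equation, each row of \(\boldsymbol{T}\) is the Light Transport Function we want to solve for, also called the Reflectance Function. Suppose we want to solve for this matrix with a complex \(128 \times 128\) light source (for example, a textured area light). Each pixel then has 16384 coefficients to solve for, so we would need to capture 16384 photographs and solve for the coefficients one by one. This is clearly very expensive and unacceptable in practice.</p> <p>We therefore want to bring Compressive Sensing to the recovery of the Light Transport Function: reduce the number of measurements, and transform the Light Transport Function into a basis in which it is compressible so that we can solve for a sparse solution. Prior work has shown that the coefficient vector of the <strong>Reflectance Function</strong> is compressible in certain bases (such as wavelets or spherical harmonics). This means we can approximate the original reflectance function to high accuracy with far fewer than \(n\) coefficients.</p> <h2 id="转换到haar小波空间">Transforming to the Haar Wavelet Space</h2> <p>Haar wavelets form a very simple wavelet basis: before normalization, every basis vector contains only the three values <code class="language-plaintext highlighter-rouge">0,-1,1</code>, and the basis vectors are mutually orthogonal. Haar wavelets are also commonly used for image compression.</p> <p>To exploit the sparsity of the reflectance function, we need to move the problem into our chosen basis. Here we use a generic orthonormal basis \(\boldsymbol{B}\) as an example (in the paper’s implementation it is the Haar wavelet basis).</p> <p>We start from the original light transport equation: \begin{equation} \boldsymbol{C} = \boldsymbol{T}\boldsymbol{L} \end{equation} Our goal is to solve for the sparse coefficient matrix \(\hat{\boldsymbol{T}}\) in the basis \(\boldsymbol{B}\), rather than solving directly for the dense \(\boldsymbol{T}\). By convention, each column of \(\boldsymbol{B}\) is one basis vector. We can insert an identity matrix \(\boldsymbol{I} = \boldsymbol{B}\boldsymbol{B}^T\) into the equation: \begin{equation} \boldsymbol{C} = \boldsymbol{T}(\boldsymbol{B}\boldsymbol{B}^T)\boldsymbol{L} = (\boldsymbol{T}\boldsymbol{B})(\boldsymbol{B}^T\boldsymbol{L}) \end{equation} We define the transport matrix in the basis \(\boldsymbol{B}\) as \(\hat{\boldsymbol{T}} = \boldsymbol{T}\boldsymbol{B}\). Each row of \(\hat{\boldsymbol{T}}\) is now a sparse or compressible vector, and the equation becomes: \begin{equation} \boldsymbol{C} = \hat{\boldsymbol{T}}(\boldsymbol{B}^T\boldsymbol{L}) \end{equation} This is not yet the standard compressive sensing equation. To obtain a standard compressive sensing problem, we need to design our <strong>illumination pattern matrix</strong> \(\boldsymbol{L}\) carefully so that the extra \(\boldsymbol{B}^T\) cancels. The paper proposes designing the illumination patterns as the product of the basis and a specially designed measurement matrix \(\boldsymbol{\Phi}\): \begin{equation} \boldsymbol{L} = \boldsymbol{B}\boldsymbol{\Phi} \end{equation} Here, \(\boldsymbol{\Phi}\) is an \(n \times m\) matrix that satisfies the requirements of compressive sensing theory (described in detail later). Substituting this carefully designed \(\boldsymbol{L}\), we get: \begin{equation} \boldsymbol{C} = \hat{\boldsymbol{T}}(\boldsymbol{B}^T(\boldsymbol{B}\boldsymbol{\Phi})) \end{equation} Because \(\boldsymbol{B}\) is an orthonormal basis, \(\boldsymbol{B}^T\boldsymbol{B} = \boldsymbol{I}\), and the equation simplifies greatly: \begin{equation} \boldsymbol{C} = \hat{\boldsymbol{T}}\boldsymbol{\Phi} \end{equation}</p> <p>Now consider a single pixel \(i\), that is, the \(i\)-th row of the matrices: \(\boldsymbol{c}_{i,.} = \hat{\boldsymbol{t}}_{i,.}\boldsymbol{\Phi}\) Transposing this row-vector equation gives: \(\boldsymbol{c}_{i,.}^T = \boldsymbol{\Phi}^T \hat{\boldsymbol{t}}_{i,.}^T\) This equation matches the standard compressive sensing form \(\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}\) introduced at the beginning. Here, \(\boldsymbol{y} = \boldsymbol{c}_{i,.}^T\) holds our \(m\) observations of pixel \(i\), \(\boldsymbol{A} = \boldsymbol{\Phi}^T\) is the measurement matrix, and \(\boldsymbol{x} = \hat{\boldsymbol{t}}_{i,.}^T\) is exactly the sparse wavelet coefficient vector of pixel \(i\) that we want to solve for.</p> <p>With this transformation, we have turned the problem of solving for light transport into a standard compressive sensing problem that can be solved independently for each pixel.</p> <h1 id="如何求解系数解">How Do We Solve for the Sparse Solution?</h1> <h2 id="重点在于测量矩阵">The Key Is the Measurement Matrix</h2> <p>The derivation above gives us one compressive sensing problem per pixel: \(\boldsymbol{c}_{i,.}^T = \boldsymbol{\Phi}^T \hat{\boldsymbol{t}}_{i,.}^T\). The question now is how to choose the measurement matrix \(\boldsymbol{\Phi}\) (or equivalently its transpose \(\boldsymbol{A} = \boldsymbol{\Phi}^T\)) so that the sparse signal \(\boldsymbol{x}\) can be recovered stably and accurately from the underdetermined observations \(\boldsymbol{y}\).</p> <h3 id="受限等距性质-restricted-isometry-property-rip">Restricted Isometry Property (RIP)</h3> <p>Clearly, the measurement matrix \(\boldsymbol{A}\) cannot be chosen arbitrarily. It must have a special structure that ensures different sparse signals are mapped to sufficiently different measurements, so that they cannot be confused during recovery. This core condition is called the <strong>Restricted Isometry Property (RIP)</strong>.</p> <p>The mathematical definition of RIP is as follows: a matrix \(\boldsymbol{A}\) satisfies the k-RIP condition if there exists a small constant \(\delta_k \in (0, 1)\) such that the following inequality holds for <strong>every k-sparse</strong> vector \(\boldsymbol{x}\): \begin{equation} (1-\delta_k)\lVert\boldsymbol{x}\rVert_2^2 \le \lVert\boldsymbol{A}\boldsymbol{x}\rVert_2^2 \le (1+\delta_k)\lVert\boldsymbol{x}\rVert_2^2 \end{equation} Intuitively, this property says that \(\boldsymbol{A}\) behaves approximately like an <strong>isometry</strong> when it acts on sparse vectors. In other words, it approximately preserves the Euclidean length (that is, the energy) of sparse signals: it neither “squashes” two different sparse signals together nor stretches them excessively. (Strictly speaking, telling two k-sparse signals apart requires the property to hold for 2k-sparse vectors, because their difference is 2k-sparse.) It is this property that makes recovering \(\boldsymbol{x}\) from the measurements \(\boldsymbol{y}\) stable and unique.</p> <h3 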
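id="互相关性-mutual-coherence">Mutual Coherence</h3> <p>Although RIP is the “gold standard” guarantee for signal recovery, it has a major practical obstacle: verifying whether a given matrix satisfies RIP is NP-hard. This makes it unusable as a practical criterion for designing matrices.</p> <p>Fortunately, we can satisfy RIP indirectly through a quantity that is much easier to compute: the <strong>Mutual Coherence</strong>, denoted \(\mu\). Prior work has shown that the RIP constant \(\delta_k\) is bounded from above in terms of the mutual coherence \(\mu\) and the sparsity \(k\): \begin{equation} \delta_k \le (k-1)\mu \end{equation} This inequality tells us that as long as we design a matrix with very low mutual coherence \(\mu\), and the sparsity \(k\) of the signal is not too large, its RIP constant \(\delta_k\) is guaranteed to be small as well. This makes low mutual coherence a practical sufficient condition that can stand in for RIP.</p> <p>Mutual coherence is defined as follows: for a matrix \(\boldsymbol{A}\) whose columns \(a_j\) have all been normalized, the mutual coherence \(\mu(\boldsymbol{A})\) is the largest absolute inner product between any two distinct columns. \begin{equation} \mu(\boldsymbol{A}) = \max_{i \neq j} |\langle a_i, a_j \rangle| = \max_{i \neq j} |a_i^T a_j| \end{equation} It measures the “worst-case” similarity between the columns of the matrix. A low value of \(\mu\) means that all columns are nearly orthogonal to one another, which is exactly what we want.</p> <h3 id="随机矩阵是答案">Random Matrices Are the Answer</h3> <p>Our goal is now clear: we need a measurement matrix \(\boldsymbol{A}\) that has full rank (rank \(m\), so that every measurement contributes new information) and whose mutual coherence \(\mu\) is as low as possible.</p> <p>A seemingly surprising but extremely effective answer is <strong>randomization</strong>. Random matrices, such as Gaussian or Bernoulli random matrices, satisfy both requirements with very high probability.</p> <ul> <li><strong>Gaussian random matrix</strong>: every entry is drawn independently from the standard Gaussian distribution.</li> <li><strong>Bernoulli random matrix</strong>: every entry independently takes the value +1 or -1 with equal probability.</li> </ul> <blockquote> <p><strong>Why do random matrices work?</strong></p> <ol> <li><strong>Full rank</strong>: in a high-dimensional space, a random vector almost never falls exactly inside the subspace spanned by a few other random vectors. Therefore, the rows of an \(m \times n\) (\(m &lt; n\)) random matrix are linearly independent with very high probability, which gives the matrix rank \(m\).</li> <li><strong>Low mutual coherence</strong>: this follows from the concentration of measure phenomenon in high-dimensional spaces. Any two randomly generated unit vectors in a high-dimensional space are nearly orthogonal with very high probability. Their inner product (that is, their correlation) concentrates tightly around its expected value of 0, and the probability of deviating from 0 decays exponentially as the dimension grows. As a result, any two columns of a matrix built from random vectors have very low correlation.</li> </ol> </blockquote> <p>In the practical setting of the paper <strong><em>Compressive Light Transport Sensing</em></strong>, the authors ultimately chose <strong>binary, Bernoulli-like</strong> illumination patterns. This is mainly due to the following engineering considerations:</p> <ul> <li><strong>Ease of implementation</strong>: binary patterns (for example, switching lights on and off) are very easy to realize on physical devices such as monitors or projectors. With a Gaussian matrix, floating-point values would have to be mapped onto the limited number of RGB intensity levels of the display, and this quantization introduces error.</li> <li><strong>No need for precise Gamma correction</strong>: because only the maximum and minimum intensity values are used, the system is insensitive to the non-linear response of the display device (its Gamma curve), which removes a potential source of calibration error.</li> <li><strong>Signal-to-noise ratio (SNR)</strong>: the paper specifically designs a “binary segregated ensemble” pattern, which maximizes the dynamic range used during measurement and thereby improves the signal-to-noise ratio, something that is difficult to achieve with standard Gaussian patterns.</li> </ul> <h1 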
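id="空间相关性">Spatial Coherence</h1> <p>At this point, we have set up and solved an independent compressive sensing problem for every pixel. This “brute-force” per-pixel approach is feasible in theory, but it ignores a very important piece of prior information: <strong>the reflectance functions of neighboring pixels in an image are usually highly correlated</strong>. For example, within a region of uniform material, the reflectance behavior of adjacent pixels is almost identical. If we solve for them completely independently, small amounts of measurement noise can cause spurious jumps between the reconstructed reflectance functions of neighboring pixels, producing visible noise or artifacts in the final rendering.</p> <h2 id="从粗到精的求解策略">A Coarse-to-Fine Solving Strategy</h2> <p>To address this problem, and to use the spatial coherence between pixels to regularize the solve and improve the quality of the result, the paper proposes a hierarchical <strong>Coarse-to-Fine</strong> reconstruction algorithm. The core idea is that instead of solving directly at the original resolution, we start with a very coarse, low-resolution version of the image to robustly estimate an approximate solution for the reflectance functions. Then, we progressively increase the resolution, using the coarser but more robust solution from the previous level as an initial guess to guide the solution at the current level.</p> <p>This multi-resolution structure can be naturally obtained from our chosen <strong>Haar wavelet basis</strong>. The wavelet transform is inherently a multi-resolution analysis tool. After performing a multi-level wavelet transform on an image, we get a series of sub-band images at different frequencies, and the low-frequency components are downsampled and blurred versions of the original image at different scales. This naturally provides the coarse-to-fine image pyramid we need.</p> <h2 id="求解差分信号">Solving for the Difference Signal</h2> <p>The key to the hierarchical algorithm is how to effectively use the solution from the previous level to guide the solution at the next. Assume we have obtained the sparse coefficients of the reflectance function, \(\hat{\boldsymbol{t}}_{init}\), at level \(l-1\) (a coarser level). When we move to level \(l\) (a finer level), it is reasonable to believe that the true solution at this level, \(\hat{\boldsymbol{t}}\), will not be very different from the initial guess \(\hat{\boldsymbol{t}}_{init}\) inherited from the level above.</p> <p>Therefore, instead of solving directly for the large and complex \(\hat{\boldsymbol{t}}\), we can change perspective and solve for the <strong>difference signal</strong> \(\boldsymbol{d}\) between them: \begin{equation} \hat{\boldsymbol{t}} = \hat{\boldsymbol{t}}_{init} + \boldsymbol{d} \end{equation} Since \(\hat{\boldsymbol{t}}_{init}\) is already a good approximation, we can expect the difference signal \(\boldsymbol{d}\) to be <strong>even sparser</strong> than the original signal \(\hat{\boldsymbol{t}}\) itself. In the compressive sensing framework, solving for a sparser signal is generally more robust and accurate.</p> <p>We substitute this idea into our familiar compressive sensing equation \(\boldsymbol{y} = \boldsymbol{A}\hat{\boldsymbol{t}}\) (where \(\boldsymbol{y}\) is the observation and \(\boldsymbol{A}\) is the measurement matrix): \begin{equation} \boldsymbol{y} = \boldsymbol{A}(\hat{\boldsymbol{t}}_{init} + \boldsymbol{d}) = \boldsymbol{A} \hat{\boldsymbol{t}}_{init} + \boldsymbol{A}\boldsymbol{d} \end{equation}</p> <p>Moving the known terms to the left side gives a new compressive sensing problem: \begin{equation} \boldsymbol{y} - \boldsymbol{A}\hat{\boldsymbol{t}}_{init} = \boldsymbol{A}\boldsymbol{d} \end{equation} In this new problem, the unknown is the sparse difference signal \(\boldsymbol{d}\), and our “measurement” becomes the <strong>residual</strong> \((\boldsymbol{y} - \boldsymbol{A}\hat{\boldsymbol{t}}_{init})\). After solving for \(\boldsymbol{d}\), we can update our solution at the current level with a simple addition: \(\hat{\boldsymbol{t}} = \hat{\boldsymbol{t}}_{init} + \boldsymbol{d}\).</p> <p>The entire algorithm starts at the top of the pyramid (the coarsest level) and iteratively performs this “solve for the difference and update” step until it reaches the bottom level at the original resolution. In this way, spatial coherence is implicitly passed down and reinforced from level to level, and the reconstructed reflectance field is spatially smoother and more continuous, significantly outperforming the independent per-pixel approach.</p> <h1 id="最后一步优化---分离高频与低频信息">Final Optimization - Separating High and Low-Frequency Information</h1> <p>Up to this point, we have applied sparse recovery of the reflectance function uniformly across all Haar wavelet basis functions. However, from prior knowledge we know that an object’s reflectance function is typically composed of a smooth, low-frequency part (such as diffuse reflection) and sparse, high-frequency details (such as specular highlights). The low-frequency part is almost always present, so its coefficients in the wavelet domain are not sparse, whereas the high-frequency details are the truly sparse part.</p> <p>This observation suggests a potential optimization: we can split the solve into two parts.</p> <ol> <li>For the low-frequency information, which contributes most of the energy but is not sparse, we skip compressive sensing and use a more direct and precise “full sampling” approach to solve for the corresponding wavelet coefficients.</li> <li>For the genuinely sparse high-frequency details, we continue to use compressive sensing to capture them efficiently.</li> </ol> <p>This hybrid strategy can lead to more robust reconstruction results. To recover the low-frequency part precisely, we can introduce <strong>Hadamard Patterns</strong>. A Hadamard matrix is a square matrix whose entries are all +1 or -1 and whose rows (and columns) are mutually orthogonal. Illuminating and measuring with a full set of Hadamard patterns amounts to applying a complete Walsh-Hadamard transform (an invertible, Fourier-like orthogonal transform) to the signal, so in the absence of noise the original signal can be recovered exactly.</p> <p>In low-dimensional cases (for example, when we only care about the first few, or few dozen, low-frequency wavelet coefficients), Hadamard patterns have an advantage over Bernoulli random matrices. A complete Hadamard matrix is <strong>guaranteed</strong> to be full-rank and orthogonal, whereas a random matrix is full-rank only with <strong>very high probability</strong> and is at best approximately orthogonal. For the low-frequency part, which forms the bulk of the signal and cannot be allowed to fail, this deterministic guarantee is crucial.</p> <h1 id="extra-bonus">Extra Bonus</h1> <p>In the previous discussion, “mutual coherence” was used to measure the similarity between the column vectors within a single measurement matrix. In fact, this concept can be generalized to describe the degree of <strong>incoherence between two different bases</strong>.</p> <p>This leads to a profound observation known as the <strong>Generalized Uncertainty Principle</strong>: a non-zero signal cannot be sparse in two mutually incoherent bases <strong>simultaneously</strong>.</p> <p>The most classic example of this principle is the duality between the time and frequency domains:</p> <ul> <li>A <strong>Dirac Delta Function</strong> is perfectly sparse in the time domain (the standard basis), because it is non-zero at only a single point in time.</li> <li>However, its Fourier transform is a complex exponential with constant magnitude over the entire frequency domain (the Fourier basis); its energy is spread uniformly across all frequencies, which makes it perfectly <strong>dense</strong>.</li> </ul> <p>The converse also holds: a single-frequency sine wave is sparse in the frequency domain but extends over all time, so it is dense in the time domain. This “energy spreading” intuitively explains the uncertainty principle: the more “certain” (sparse) a signal is in one domain, the more “uncertain” (dense) it must be in another, incoherent domain.</p> <p>This relationship is not just a qualitative observation; it has a rigorous mathematical proof and is analogous to the <strong>Heisenberg Uncertainty Principle</strong> in physics. That principle states that the position and the momentum of a particle cannot both be known with arbitrary precision: the product of the uncertainty in position \(\Delta x\) and the uncertainty in momentum \(\Delta p\) is bounded from below: \begin{equation} \Delta x \Delta p \ge \frac{h}{4\pi}. 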
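\end{equation} In the field of signal processing, a more generalized uncertainty principle has been proven. For any non-zero signal, if the number of non-zero coefficients in basis \(\Phi\) is \(\lVert\boldsymbol{\alpha}\rVert_0 = A\) and the number in basis \(\Psi\) is \(\lVert\boldsymbol{\beta}\rVert_0 = B\) (here \(A\) is a count, not the measurement matrix \(\boldsymbol{A}\) used earlier), they must satisfy the following inequality: \begin{equation} \sqrt{A \cdot B} \ge \frac{1}{M} \end{equation} or a slightly weaker but more commonly used form, which follows from the inequality of arithmetic and geometric means: \begin{equation} A + B \ge \frac{2}{M} \end{equation} Here, \(M\) is the mutual coherence between the two orthonormal bases \(\Phi\) and \(\Psi\), defined as \(M = \sup_ {i,j} |\langle \phi_i, \psi_j \rangle|\). This inequality precisely quantifies our observation: if the two bases are highly incoherent (meaning \(M\) is very small), then \(1/M\) is very large, which forces at least one of \(A\) and \(B\) to be large and confirms that a signal cannot be sparse in both bases simultaneously.</p>]]></content><author><name></name></author><category term="paper-reading"/><category term="paper,"/><category term="rendering,"/><category term="relighting,"/><category term="cn-blog,"/><category term="compressive-sensing"/><summary type="html"><![CDATA[About Compressive Light Transport Sensing]]></summary></entry></feed>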