Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
Authors
Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan
Abstract
Neural Radiance Fields (NeRF) is a popular view synthesis technique that represents a scene as a continuous volumetric function, parameterized by multilayer perceptrons that provide the volume density and view-dependent emitted radiance at each location. While NeRF-based techniques excel at representing fine geometric structures with smoothly varying view-dependent appearance, they often fail to accurately capture and reproduce the appearance of glossy surfaces. We address this limitation by introducing Ref-NeRF, which replaces NeRF's parameterization of view-dependent outgoing radiance with a representation of reflected radiance and structures this function using a collection of spatially-varying scene properties. We show that together with a regularizer on normal vectors, our model significantly improves the realism and accuracy of specular reflections. Furthermore, we show that our model's internal representation of outgoing radiance is interpretable and useful for scene editing.
Concepts
The Big Picture
Imagine photographing a chrome trophy from a dozen different angles, then asking a computer to render it from a viewpoint the camera never visited. The computer just needs to “understand” the object in 3D and simulate what it would look like. But glossy surfaces are tricksters. The reflection you see in a mirror ball depends exquisitely on where you’re standing. Shift a few inches, and the entire image in the ball shifts too. For AI systems trying to reconstruct scenes from photos, that angle-dependent glimmer has been a persistent headache.
Neural Radiance Fields (NeRF) took the computer vision world by storm starting in 2020. The technique trains a neural network to represent a 3D scene as a continuous volumetric function (a kind of fog of light-emitting particles) that can be “photographed” from any virtual angle. NeRF works beautifully for scenes with smooth, gently varying appearance.
Point it at a glossy teapot or a shiny car, though, and the results look haunted: bright reflective spots that flicker and swim between frames, ghostly semitransparent shells floating inside objects, reflections that look painted-on rather than physically real. A team from Harvard University and Google Research set out to fix this. Their method, Ref-NeRF, doesn’t rebuild NeRF from scratch. It restructures how the neural network thinks about light, replacing an awkward mathematical setup with one that matches how reflections actually work.
Key Insight: The core problem isn’t that neural networks are bad at learning reflections. NeRF was asking them to learn the wrong function. Ref-NeRF switches from viewing direction to reflected direction as input, making the interpolation problem far simpler.
How It Works
The root of NeRF’s glossiness problem is subtle. When NeRF models how a surface looks from different angles, it takes the raw viewing direction as input to its neural network. For a glossy surface, the brightness you see depends on whether that angle aligns with a specular highlight, and highlights move fast as the viewing angle changes. The function NeRF has to learn is jagged and complicated. Filling in the gaps between training photos produces the uncanny flickering artifacts that plague glossy renderings.
Ref-NeRF’s first move: instead of feeding the viewing direction into the network, feed in the reflected direction (the viewing vector mirrored about the surface’s local normal). This is exactly how a mirror works. Move your head while staring at a mirror ball, and the reflection you see is whatever sits along the reflected direction. That reflected radiance function is far smoother. It doesn’t depend on the viewer’s absolute position, only on the surface orientation and the environment lighting.
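Mirroring the view vector about the normal is the standard reflection formula from classical optics. A minimal numpy sketch (illustrative, not the authors' code):

```python
import numpy as np

def reflect(view_dir, normal):
    """Mirror the view direction about the surface normal.

    Both inputs are unit vectors, with view_dir pointing from the
    surface toward the camera. Standard reflection formula:
    r = 2 (d . n) n - d.
    """
    return 2.0 * np.dot(view_dir, normal) * normal - view_dir

# Looking straight down at an upward-facing surface, the reflected
# direction points straight back at the viewer.
d = np.array([0.0, 0.0, 1.0])
n = np.array([0.0, 0.0, 1.0])
print(reflect(d, n))  # -> [0. 0. 1.]
```

Because the reflected direction depends only on the local normal and the viewing ray, two cameras looking at different points on a mirror ball can still query the same reflected direction, which is what makes the learned function smooth.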
The second piece is Integrated Directional Encoding (IDE), a way to encode the reflected direction so that it smoothly blends between sharp and blurry appearances. Ref-NeRF also explicitly decomposes surface appearance into three learnable components:
- Diffuse color — the base pigment, which looks the same from all angles
- Specular tint — how the surface color tints its reflections
- Roughness — how sharp or blurry the highlights appear (polished mirror vs. brushed metal)
This decomposition means the network never confuses “this surface is red” with “this surface is shiny,” keeping each component smooth and learnable.
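To make the two ideas concrete, here is a toy sketch, with hypothetical function names (`ide_1d`, `shade`) chosen for illustration. The real IDE operates on spherical harmonics over 3D directions; this 1-D analogue only shows the key behavior, that roughness damps the high-frequency parts of the directional encoding, so rough surfaces see a blurred environment. The `shade` combine is likewise simplified (the paper also applies a fixed tone-mapping to the sum):

```python
import numpy as np

def ide_1d(angle, roughness, num_levels=4):
    """Toy 1-D analogue of Integrated Directional Encoding.

    Each frequency 2**l is damped by a factor that shrinks as
    roughness grows, so high roughness leaves only low-frequency
    (blurry) information about the reflected direction.
    """
    feats = []
    for level in range(num_levels):
        freq = 2.0 ** level
        atten = np.exp(-0.5 * roughness * freq ** 2)
        feats += [atten * np.sin(freq * angle), atten * np.cos(freq * angle)]
    return np.array(feats)

def shade(diffuse, tint, specular):
    """Combine the learned components into an output RGB color:
    view-independent diffuse color plus tinted specular reflection.
    All inputs are RGB triples in [0, 1]."""
    return np.clip(diffuse + tint * specular, 0.0, 1.0)
```

With roughness 0 the encoding keeps all frequencies intact; with large roughness the high-frequency features collapse toward zero, which is exactly the smooth sharp-to-blurry blend described above.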
There’s a catch. To compute the reflection direction at any point, you need an accurate surface normal. NeRF’s volumetric geometry is famously “foggy,” with density smeared out around surfaces rather than concentrated at them. Normals derived from that kind of geometry are noisy and unreliable.
Ref-NeRF handles this with a normal vector regularizer: a penalty that pushes the model toward keeping its density tightly packed at surfaces, with normals pointing in physically consistent directions. Cleaner geometry yields better normals, which yield more accurate reflection vectors, which improve the appearance model. Each piece reinforces the others.
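The orientation part of this regularizer can be sketched in a few lines. This is a simplified illustration under the paper's stated form, a weighted penalty on normals that face away from the camera, where the weights are each sample's contribution to the rendered pixel:

```python
import numpy as np

def orientation_penalty(weights, normals, view_dir):
    """Penalize back-facing normals among visible volume samples.

    weights:  (num_samples,) rendering weights along the ray
    normals:  (num_samples, 3) unit normals at each sample
    view_dir: (3,) unit vector from the camera into the scene

    A normal pointing away from the camera has a positive dot
    product with view_dir; camera-facing normals incur no penalty.
    Mirrors the paper's R_o = sum_i w_i * max(0, n_i . d)^2.
    """
    dots = normals @ view_dir  # (num_samples,)
    return np.sum(weights * np.maximum(0.0, dots) ** 2)
```

During training this loss is added to the photometric reconstruction loss, discouraging the "foggy" density configurations whose normals point into the surface.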
Why It Matters
The improvements are immediately visible. On benchmark scenes with highly specular objects, Ref-NeRF substantially outperforms previous methods, including mip-NeRF (the variant it builds on). Specular highlights now move across surfaces smoothly as the camera translates, rather than blinking in and out. Objects look solid rather than surrounded by ghostly halos.
Ref-NeRF’s structured representation also gives you something NeRF never offered: editability. Because the model has explicitly learned separate components (surface normals, material roughness, diffuse texture, specular tint), each can be manipulated independently. Change the roughness field and a shiny object becomes matte. Swap the diffuse colors and you repaint the scene. These aren’t tricks bolted on afterward; they fall out of how the model organizes what it knows about the scene.
That points toward AI-reconstructed 3D scenes that aren’t just viewable but usable as editable assets for film, games, and virtual environments.
The approach matters for the broader NeRF ecosystem, too. NeRF has spawned hundreds of extensions for dynamic scenes, human avatars, relighting, autonomous driving, and scientific visualization. Ref-NeRF’s reparameterization and regularization are modular, slotting into many of these downstream systems. Get the physics of reflection right at the foundation, and the whole family of techniques benefits.
Bottom Line: By replacing an awkward view-direction parameterization with a physically motivated reflected-direction representation, Ref-NeRF makes glossy surfaces tractable for neural scene reconstruction and delivers interpretable, editable 3D representations as a bonus.
IAIFI Research Highlights
Ref-NeRF encodes the geometry of specular reflection, a concept rooted in classical optics, directly into a neural network's structure. It shows how domain knowledge from physics can reshape AI model design rather than just inform training data.
Parameterization choices profoundly affect a network's ability to interpolate. The Integrated Directional Encoding and diffuse/specular decomposition offer a reusable template for handling view-dependent effects in future neural rendering systems.
By representing normals, roughness, and specular tint with physically grounded parameters, Ref-NeRF moves neural rendering closer to genuine physically-based simulation, a step toward AI systems that capture light-matter interaction with real fidelity.
Future work may extend Ref-NeRF's reflection framework to handle interreflections, subsurface scattering, and dynamic lighting, working toward fully relightable neural scene representations; the paper is available at [arXiv:2112.03907](https://arxiv.org/abs/2112.03907).
Original Paper Details
Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
arXiv:2112.03907
Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan