nerf_factory Source Code Notes (2)
nerf_factory likewise provides interfaces for many models, such as nerf, mipnerf, mipnerf360, and plenoxel, as shown in the figure below:
Let's start with training. The code is built on pytorch_lightning, so it is highly structured. First, an overview of its training logic, to speed up your reading of the source:
A LightningModule organizes your PyTorch code into 6 sections:

- Initialization (`__init__` and `setup()`)
- Train Loop (`training_step()`)
- Validation Loop (`validation_step()`)
- Test Loop (`test_step()`)
- Prediction Loop (`predict_step()`)
- Optimizers and LR Schedulers (`configure_optimizers()`)
This is the official explanation. In other words, you write the initialization and the train/val/test logic in the corresponding functions, and the framework calls them for you. Under the hood, the training logic for train is:
```python
# put model in train mode and enable gradient calculation
model.train()
torch.set_grad_enabled(True)

for batch_idx, batch in enumerate(train_dataloader):
    loss = training_step(batch, batch_idx)

    # clear gradients
    optimizer.zero_grad()

    # backward
    loss.backward()

    # update parameters
    optimizer.step()
```
So we only need to override the corresponding functions.
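Since the framework owns the loop, a runnable toy (plain PyTorch standing in for Lightning's Trainer; the ToyModule, its linear net, and the synthetic data are placeholders, not nerf_factory code) shows how an overridden training_step plugs into that loop:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ToyModule(nn.Module):
    """Toy stand-in for a LightningModule: we only write the hooks;
    the hand-written loop below plays the role of the Trainer."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3, 1)  # placeholder model

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-2)


# What the framework does for us, in the same order as the pseudocode above:
model = ToyModule()
optimizer = model.configure_optimizers()
xs = torch.randn(200, 3)
ys = xs.sum(dim=-1, keepdim=True)  # learnable target: y = x1 + x2 + x3

model.train()
torch.set_grad_enabled(True)
losses = []
for batch_idx in range(50):
    batch = (xs[batch_idx * 4 : batch_idx * 4 + 4],
             ys[batch_idx * 4 : batch_idx * 4 + 4])
    loss = model.training_step(batch, batch_idx)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
# the loss trends downward as the linear fit converges
```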
Let's first go through the nerf model's code, starting with training_step:
```python
def training_step(self, batch, batch_idx):
    rendered_results = self.model(
        batch, self.randomized, self.white_bkgd, self.near, self.far
    )
    rgb_coarse = rendered_results[0][0]
    rgb_fine = rendered_results[1][0]
    target = batch["target"]

    loss0 = helper.img2mse(rgb_coarse, target)
    loss1 = helper.img2mse(rgb_fine, target)
    loss = loss1 + loss0

    psnr0 = helper.mse2psnr(loss0)
    psnr1 = helper.mse2psnr(loss1)

    self.log("train/psnr1", psnr1, on_step=True, prog_bar=True, logger=True)
    self.log("train/psnr0", psnr0, on_step=True, prog_bar=True, logger=True)
    self.log("train/loss", loss, on_step=True)

    return loss
```
The key call is self.model, which does the main computation; the rest is loss bookkeeping. MSE is the loss, and PSNR is the standard quality metric derived from it:
```python
import numpy as np
import torch


def img2mse(x, y):
    return torch.mean((x - y) ** 2)


def mse2psnr(x):
    return -10.0 * torch.log(x) / np.log(10)
```
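As a sanity check on these helpers (a self-contained sketch that restates the two functions): a constant per-pixel error of 0.1 gives an MSE of 0.01, i.e. 20 dB PSNR, and lower MSE means higher PSNR.

```python
import numpy as np
import torch


def img2mse(x, y):
    return torch.mean((x - y) ** 2)


def mse2psnr(x):
    # PSNR = -10 * log10(MSE), assuming pixel values in [0, 1]
    return -10.0 * torch.log(x) / np.log(10)


pred = torch.full((4, 3), 0.6)
target = torch.full((4, 3), 0.5)  # constant error of 0.1 -> MSE = 0.01
mse = img2mse(pred, target)
psnr = mse2psnr(mse)
print(round(mse.item(), 4), round(psnr.item(), 2))  # 0.01 20.0
```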
This matches the loss in the paper:
$$
\mathcal{L}=\sum_{r\in\mathcal{R}}\left[\left\|\hat{C}_c(r)-C(r)\right\|_2^2+\left\|\hat{C}_f(r)-C(r)\right\|_2^2\right]
$$
Now into the NeRF module itself. Model structure: as the paper describes, the model consists of a coarse and a fine network.
```python
class NeRF(nn.Module):
    def __init__(
        ...
    ):
        ...
        super(NeRF, self).__init__()
        self.rgb_activation = nn.Sigmoid()
        self.sigma_activation = nn.ReLU()
        self.coarse_mlp = NeRFMLP(min_deg_point, max_deg_point, deg_view)
        self.fine_mlp = NeRFMLP(min_deg_point, max_deg_point, deg_view)
```
Stepping into the coarse_mlp model:
```python
class NeRFMLP(nn.Module):
    def __init__(
        self,
        min_deg_point,
        max_deg_point,
        deg_view,
        netdepth: int = 8,
        netwidth: int = 256,
        netdepth_condition: int = 1,
        netwidth_condition: int = 128,
        skip_layer: int = 4,
        input_ch: int = 3,
        input_ch_view: int = 3,
        num_rgb_channels: int = 3,
        num_density_channels: int = 1,
    ):
        for name, value in vars().items():
            if name not in ["self", "__class__"]:
                setattr(self, name, value)

        super(NeRFMLP, self).__init__()

        self.net_activation = nn.ReLU()
        # input dim of sample positions (matches the positional encoding discussed later)
        pos_size = ((max_deg_point - min_deg_point) * 2 + 1) * input_ch
        # input dim of view directions (matches the positional encoding discussed later)
        view_pos_size = (deg_view * 2 + 1) * input_ch_view

        init_layer = nn.Linear(pos_size, netwidth)  # first input layer [63, 256]
        init.xavier_uniform_(init_layer.weight)  # initialize weights
        pts_linear = [init_layer]

        for idx in range(netdepth - 1):
            if idx % skip_layer == 0 and idx > 0:
                # skip connection: re-inject the encoded position
                module = nn.Linear(netwidth + pos_size, netwidth)
            else:
                module = nn.Linear(netwidth, netwidth)
            init.xavier_uniform_(module.weight)
            pts_linear.append(module)

        self.pts_linears = nn.ModuleList(pts_linear)  # grouped into one module list

        views_linear = [nn.Linear(netwidth + view_pos_size, netwidth_condition)]
        for idx in range(netdepth_condition - 1):
            layer = nn.Linear(netwidth_condition, netwidth_condition)
            init.xavier_uniform_(layer.weight)
            views_linear.append(layer)

        self.views_linear = nn.ModuleList(views_linear)  # grouped into one module list

        self.bottleneck_layer = nn.Linear(netwidth, netwidth)
        self.density_layer = nn.Linear(netwidth, num_density_channels)  # outputs density σ
        self.rgb_layer = nn.Linear(netwidth_condition, num_rgb_channels)  # outputs RGB

        init.xavier_uniform_(self.bottleneck_layer.weight)
        init.xavier_uniform_(self.density_layer.weight)
        init.xavier_uniform_(self.rgb_layer.weight)
```
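The pos_size and view_pos_size formulas mirror NeRF's positional encoding: with min_deg_point=0, max_deg_point=10 and 3 input channels, pos_size = (10·2+1)·3 = 63, i.e. the raw coordinate plus sin/cos at 10 frequencies. A minimal sketch of that encoding (an assumed form chosen to match the dimension formula, not necessarily nerf_factory's exact helper):

```python
import torch


def pos_enc(x, min_deg, max_deg):
    """Map (..., C) coordinates to (..., ((max_deg - min_deg) * 2 + 1) * C)
    features: the identity plus sin/cos at frequencies 2^min_deg .. 2^(max_deg-1)."""
    scales = 2.0 ** torch.arange(min_deg, max_deg, dtype=x.dtype)  # (D,)
    # broadcast coords against frequencies, then flatten the last two dims
    xb = (x[..., None, :] * scales[:, None]).reshape(*x.shape[:-1], -1)  # (..., D*C)
    return torch.cat([x, torch.sin(xb), torch.cos(xb)], dim=-1)


pts = torch.randn(1024, 3)
print(pos_enc(pts, 0, 10).shape)   # torch.Size([1024, 63]) -> pos_size
views = torch.randn(1024, 3)
print(pos_enc(views, 0, 4).shape)  # torch.Size([1024, 27]) -> view_pos_size
```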
This code corresponds one-to-one with the model diagram below.

In the end there are two networks with the same structure: the values predicted by the coarse network are used to refine the sampling strategy along each ray, and the re-sampled points are then fed into the fine network, which produces the final result.
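That coarse-to-fine step is hierarchical (inverse-transform) sampling: the coarse network's weights along a ray define a piecewise-constant PDF over depth, and the fine pass draws new samples where that PDF concentrates. A simplified sketch (uniform rather than stratified draws; not the repo's exact sample_pdf):

```python
import torch


def sample_pdf(bins, weights, n_samples):
    """Draw n_samples depths per ray from the piecewise-constant PDF defined
    by `weights` over the intervals in `bins` (inverse-transform sampling).
    bins: (R, B+1) interval edges; weights: (R, B) coarse-network weights."""
    pdf = weights / torch.sum(weights, dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)  # (R, B+1)
    u = torch.rand(*cdf.shape[:-1], n_samples)  # uniform draws in [0, 1)
    # locate each draw's interval in the CDF, then interpolate within it
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, bins.shape[-1] - 1)
    cdf_lo = torch.gather(cdf, -1, idx - 1)
    cdf_hi = torch.gather(cdf, -1, idx)
    bin_lo = torch.gather(bins, -1, idx - 1)
    bin_hi = torch.gather(bins, -1, idx)
    t = (u - cdf_lo) / (cdf_hi - cdf_lo).clamp_min(1e-8)
    return bin_lo + t * (bin_hi - bin_lo)


# One ray whose coarse weights concentrate in [0.4, 0.6):
torch.manual_seed(0)
bins = torch.linspace(0.0, 1.0, 6).expand(1, 6)  # 5 equal depth intervals
weights = torch.tensor([[0.01, 0.01, 0.95, 0.02, 0.01]])
samples = sample_pdf(bins, weights, 128)
# most fine samples land in the high-weight interval [0.4, 0.6)
print(((samples >= 0.4) & (samples < 0.6)).float().mean())
```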