Bristol Vision Institute - Research
Reached state-of-the-art on the BVI-RLV benchmark (29.22 dB PSNR) by re-engineering a CNN-RNN video restoration pipeline on HPC infrastructure.
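The headline number is a PSNR score, which is a standard fidelity metric rather than anything project-specific. As a reference point, a minimal sketch of how PSNR in dB is typically computed for tensors normalized to [0, 1] (the `psnr` helper here is illustrative, not the benchmark's official evaluation code):

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; inputs assumed in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and therefore 20 dB.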
Outcomes
- Higher temporal consistency
- Faster experiment cycles
- More stable tensor pipelines
Stack
- PyTorch
- SLURM
- Python
- HPC
What I Learned
- Feature alignment: Bidirectional warping aligns forward/backward features before fusion.
- Temporal stabilization: ConvGRU passes preserve frame-to-frame detail under motion.
- Regression prevention: Tensor-shape instrumentation catches concat and recurrent mismatches.
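The bidirectional warping step above can be sketched with PyTorch's `grid_sample`. This is a generic backward-warping implementation, assuming flow in pixel units with channel 0 = x-displacement and channel 1 = y-displacement; the project's actual flow estimator and warp details are not shown here:

```python
import torch
import torch.nn.functional as F

def warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp features [B, C, H, W] by optical flow [B, 2, H, W] (pixels)."""
    b, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # displaced x coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # displaced y coordinates
    # Normalize to [-1, 1] as grid_sample expects.
    grid = torch.stack(
        [2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0], dim=-1
    )
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)
```

With zero flow this reduces to the identity, which is a useful sanity check before fusing forward and backward features.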
Implementation Notes
- Frame batches enter forward and backward optical-flow warping blocks.
- Bidirectional features are fused before recurrent refinement.
- ConvGRU layers propagate temporal context across multiple passes.
- Decoder reconstructs denoised frames and computes reconstruction losses.
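The notes mention reconstruction losses without specifying which; a common choice in video restoration is the Charbonnier (robust L1) loss, sketched here as an illustrative assumption rather than the pipeline's confirmed objective:

```python
import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """Charbonnier loss: a smooth, outlier-robust variant of L1."""
    return torch.mean(torch.sqrt((pred - target) ** 2 + eps ** 2))
```

Compared with plain MSE, it penalizes large per-pixel errors less aggressively, which tends to avoid over-smoothed reconstructions.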
Code Snippet
# CNN-RNN re-engineering loop with explicit tensor guards
state = None  # ConvGRU hidden state, created inside the cell on the first step
for t in range(seq_len):
    fwd = warp(features[:, t], flow_fwd[:, t])                # [B, C, H, W]
    bwd = warp(features[:, seq_len - 1 - t], flow_bwd[:, t])  # [B, C, H, W]
    fused = torch.cat([fwd, bwd], dim=1)                      # [B, 2C, H, W]
    if fused.shape[1] != 2 * hidden_channels:
        raise RuntimeError(f"unexpected channels: {fused.shape}")
    state = conv_gru(fused, state)  # temporal refinement
    out[:, t] = decoder(state)      # denoised frame t
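The snippet assumes a `conv_gru` cell. A minimal ConvGRU, i.e. GRU gating with convolutions in place of matrix multiplies, could look like the following; this is a generic sketch, not the project's exact cell, and it lazily creates a zero hidden state on the first call so the loop can start with `state = None`:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal ConvGRU cell: GRU update/reset gating via 2D convolutions."""

    def __init__(self, in_channels: int, hidden_channels: int,
                 kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # One conv produces both gates; another produces the candidate state.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               2 * hidden_channels, kernel_size, padding=pad)
        self.cand = nn.Conv2d(in_channels + hidden_channels,
                              hidden_channels, kernel_size, padding=pad)
        self.hidden_channels = hidden_channels

    def forward(self, x, h=None):
        if h is None:  # lazily initialize the hidden state to zeros
            h = x.new_zeros(x.shape[0], self.hidden_channels, *x.shape[2:])
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1.0 - z) * h + z * h_tilde  # gated blend of old and new state
```

Because the gating is a convex blend of the previous state and the candidate, the hidden state changes smoothly frame to frame, which is what drives the temporal consistency noted above.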