Bristol Vision Institute - Research
Reached state-of-the-art on the BVI-RLV benchmark (29.22 dB PSNR) by re-engineering a CNN-RNN video restoration pipeline on HPC infrastructure.
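The headline number is a PSNR score, which is a standard fidelity metric rather than anything project-specific. As a reference point, a minimal sketch of how PSNR in dB is typically computed for tensors normalized to [0, 1] (the `psnr` helper here is illustrative, not the benchmark's official evaluation code):

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; inputs assumed in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and therefore 20 dB.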
Outcomes
- Higher temporal consistency
- Faster experiment cycles
- More stable tensor pipelines
Stack
- PyTorch
- SLURM
- Python
- HPC
What I Learned
- Feature alignment: Bidirectional warping aligns forward/backward features before fusion.
- Temporal stabilization: ConvGRU passes preserve frame-to-frame detail under motion.
- Regression prevention: Tensor-shape instrumentation catches concat and recurrent mismatches.
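The bidirectional warping step above can be sketched with PyTorch's `grid_sample`. This is a generic backward-warping implementation, assuming flow in pixel units with channel 0 = x-displacement and channel 1 = y-displacement; the project's actual flow estimator and warp details are not shown here:

```python
import torch
import torch.nn.functional as F

def warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp features [B, C, H, W] by optical flow [B, 2, H, W] (pixels)."""
    b, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # displaced x coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # displaced y coordinates
    # Normalize to [-1, 1] as grid_sample expects.
    grid = torch.stack(
        [2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0], dim=-1
    )
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)
```

With zero flow this reduces to the identity, which is a useful sanity check before fusing forward and backward features.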
Implementation Notes
- Frame batches enter forward and backward optical-flow warping blocks.
- Bidirectional features are fused before recurrent refinement.
- ConvGRU layers propagate temporal context across multiple passes.
- Decoder reconstructs denoised frames and computes reconstruction losses.
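The notes mention reconstruction losses without specifying which; a common choice in video restoration is the Charbonnier (robust L1) loss, sketched here as an illustrative assumption rather than the pipeline's confirmed objective:

```python
import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """Charbonnier loss: a smooth, outlier-robust variant of L1."""
    return torch.mean(torch.sqrt((pred - target) ** 2 + eps ** 2))
```

Compared with plain MSE, it penalizes large per-pixel errors less aggressively, which tends to avoid over-smoothed reconstructions.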
Code Snippet
# CNN-RNN re-engineering loop with explicit tensor guards
state = None  # ConvGRU hidden state, created inside the cell on the first step
for t in range(seq_len):
    fwd = warp(features[:, t], flow_fwd[:, t])                # [B, C, H, W]
    bwd = warp(features[:, seq_len - 1 - t], flow_bwd[:, t])  # [B, C, H, W]
    fused = torch.cat([fwd, bwd], dim=1)                      # [B, 2C, H, W]
    if fused.shape[1] != 2 * hidden_channels:
        raise RuntimeError(f"unexpected channels: {fused.shape}")
    state = conv_gru(fused, state)  # temporal refinement
    out[:, t] = decoder(state)      # denoised frame t
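The snippet assumes a `conv_gru` cell. A minimal ConvGRU, i.e. GRU gating with convolutions in place of matrix multiplies, could look like the following; this is a generic sketch, not the project's exact cell, and it lazily creates a zero hidden state on the first call so the loop can start with `state = None`:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal ConvGRU cell: GRU update/reset gating via 2D convolutions."""

    def __init__(self, in_channels: int, hidden_channels: int,
                 kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # One conv produces both gates; another produces the candidate state.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               2 * hidden_channels, kernel_size, padding=pad)
        self.cand = nn.Conv2d(in_channels + hidden_channels,
                              hidden_channels, kernel_size, padding=pad)
        self.hidden_channels = hidden_channels

    def forward(self, x, h=None):
        if h is None:  # lazily initialize the hidden state to zeros
            h = x.new_zeros(x.shape[0], self.hidden_channels, *x.shape[2:])
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1.0 - z) * h + z * h_tilde  # gated blend of old and new state
```

Because the gating is a convex blend of the previous state and the candidate, the hidden state changes smoothly frame to frame, which is what drives the temporal consistency noted above.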