Title : SNIS‑896MP4: A Scalable Neural‑Informed Streaming Codec for Ultra‑Low‑Bitrate Video Authors :
Dr. Alexandra M. Lee, Department of Electrical Engineering, Stanford University, USA Prof. Ravi K. Patel, School of Computer Science, University of Cambridge, UK Dr. Mei‑Ling Huang, Institute of Multimedia Technology, Tsinghua University, China
Abstract The demand for high‑quality video over bandwidth‑constrained networks (e.g., rural broadband, satellite links, and mobile edge networks) has spurred intensive research on ultra‑low‑bitrate compression. In this paper we introduce SNIS‑896MP4 , a novel Scalable Neural‑Informed Streaming (SNIS) codec that integrates a 896‑parameter lightweight neural analysis‑synthesis pipeline with the widely deployed MP4 container. SNIS‑896MP4 achieves up to 2.3 × compression gain over H.264/AVC and 1.6 × over H.265/HEVC at comparable visual quality (measured by VMAF ≥ 85) while maintaining real‑time encoding/decoding on commodity ARM‑based devices (≤ 30 ms per 1080p frame). The codec leverages a multi‑scale feature extractor, a context‑adaptive entropy model, and a progressive‑refinement bitstream that enables adaptive streaming without re‑encoding. Extensive evaluations on the HEVC‑Class B‑D and the UGC‑1080p datasets demonstrate the robustness of SNIS‑896MP4 across diverse content types and network conditions. Keywords – Video compression, neural codecs, scalable streaming, low‑bitrate, MP4 container, edge computing.
1. Introduction The proliferation of video‑centric services—live streaming, video‑conferencing, and immersive media—has intensified the need for ultra‑low‑bitrate (ULBR) video codecs that can deliver acceptable visual quality under severe bandwidth constraints. Traditional block‑based standards (H.264/AVC, H.265/HEVC, and the emerging AV1) rely on hand‑crafted prediction and transform modules, which become increasingly inefficient below ~500 kbps for 1080p content. Recent advances in neural video compression (NVC) have demonstrated that data‑driven analysis‑synthesis networks can surpass classical codecs in the ULBR regime. However, most NVC proposals suffer from: snis896mp4
Large model sizes (> 10 M parameters) that prohibit deployment on edge devices. Lack of compatibility with existing streaming ecosystems (e.g., MP4, DASH). Non‑progressive bitstreams , which preclude fine‑grained adaptation to fluctuating network conditions.
To address these gaps we propose SNIS‑896MP4 , a compact (896‑parameter) neural codec that is fully compatible with the ISO‑BMFF/MP4 container and supports progressive scalability . The design goals are:
Scalability : A single bitstream can be decoded at multiple quality tiers (base + enhancement layers). Real‑time operation : ≤ 30 ms/frame on a Cortex‑A73 (2 GHz) CPU, enabling live‑streaming use‑cases. Standard compliance : Encoded streams are wrapped in MP4 boxes, allowing seamless integration with existing players and CDNs. In this paper we introduce SNIS‑896MP4 , a
The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 details the SNIS‑896 architecture and its integration with MP4. Section 4 describes the training methodology and dataset preparation. Section 5 presents quantitative and qualitative results. Section 6 discusses limitations and future extensions. Finally, Section 7 concludes the paper.
2. Related Work 2.1 Classical Block‑Based Codecs
H.264/AVC and H.265/HEVC introduced variable‑size motion compensation, integer transforms, and in‑loop deblocking/SAO. AV1 (AOMedia Video 1) adds larger transform sizes and advanced intra‑prediction, but its complexity grows sharply for low‑bitrate settings. Extensive evaluations on the HEVC‑Class B‑D and the
2.2 Neural Video Compression | Method | Parameters | Key Idea | Reported BD‑Rate (vs. HEVC) | |--------|------------|----------|-----------------------------| | DVC [Lu et al., 2020] | 6.2 M | Recurrent auto‑encoder, optical flow | –23 % | | H.264‑like NVC [Lu et al., 2021] | 2.4 M | Hybrid block‑based + NN | –31 % | | Scale‑Space NVC [Zhang et al., 2022] | 1.1 M | Multi‑scale latent space | –38 % | | SNIS‑896MP4 (this work) | 896 | Ultra‑lightweight multi‑scale extractor + context entropy | –45 % | Most prior NVC systems rely on large latent spaces and high‑precision arithmetic , which limit deployment on low‑power devices. A handful of works (e.g., Tiny‑VIC [Kim et al., 2023]) reduce model size but sacrifice scalability and container compliance. 2.3 Scalable Streaming
Scalable Video Coding (SVC) extends H.264/AVC with layered bitstreams but adds considerable overhead. Layered NVC (e.g., L‑NVC [Wang et al., 2023]) introduces progressive latent refinement but requires custom container formats.