Generative inbetweening aims to generate intermediate frame sequences from two key frames given as input. Although video generation models have made remarkable progress, generative inbetweening still struggles to maintain temporal stability because the interpolation path between two key frames is ambiguous. This issue becomes particularly severe when there is a large motion gap between the input frames. In this paper, we propose a straightforward yet highly effective Frame-wise Conditions-driven Video Generation (FCVG) method that significantly enhances the temporal stability of interpolated video frames. Specifically, FCVG provides an explicit condition for each frame, which makes the interpolation path between the two input frames much easier to identify and thus ensures temporally stable production of visually plausible video frames. To achieve this, we propose extracting matched lines from the two input frames; these lines can be easily interpolated frame by frame and serve as frame-wise conditions that integrate seamlessly into existing video generation models. In extensive evaluations covering diverse scenarios such as natural landscapes, complex human poses, camera movements, and animations, existing methods often exhibit incoherent transitions across frames. In contrast, FCVG generates temporally stable videos using both linear and non-linear interpolation curves.
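To make the idea of frame-wise conditions concrete, the following is a minimal sketch of interpolating matched line segments between the two key frames. It assumes the line matching step has already produced index-aligned pairs of segments (the matcher itself is not shown), and it stops before rendering the segments into condition images for the video model. The function names `interpolate_matched_lines`, `linear`, and `smoothstep` are hypothetical, and smoothstep is only one illustrative choice of non-linear curve, not necessarily the one FCVG uses.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def linear(t: float) -> float:
    """Identity curve: constant-speed motion along the interpolation path."""
    return t


def smoothstep(t: float) -> float:
    """Illustrative non-linear curve (assumption): slow start and slow end."""
    return t * t * (3.0 - 2.0 * t)


def interpolate_matched_lines(
    lines_start: List[Segment],
    lines_end: List[Segment],
    num_frames: int,
    curve: Callable[[float], float] = linear,
) -> List[List[Segment]]:
    """Return one list of segments per frame, including both key frames.

    Assumes lines_start[i] and lines_end[i] are a matched pair produced by
    some line matcher (not shown here).
    """
    if len(lines_start) != len(lines_end):
        raise ValueError("matched line lists must have equal length")
    if num_frames < 2:
        raise ValueError("need at least the two key frames")

    frames = []
    for k in range(num_frames):
        # Map the normalized time t in [0, 1] through the chosen curve.
        s = curve(k / (num_frames - 1))
        frame = []
        for (a0, b0), (a1, b1) in zip(lines_start, lines_end):
            # Interpolate each segment endpoint independently.
            a = ((1 - s) * a0[0] + s * a1[0], (1 - s) * a0[1] + s * a1[1])
            b = ((1 - s) * b0[0] + s * b1[0], (1 - s) * b0[1] + s * b1[1])
            frame.append((a, b))
        frames.append(frame)
    return frames


if __name__ == "__main__":
    start = [((0.0, 0.0), (10.0, 0.0))]
    end = [((10.0, 10.0), (20.0, 10.0))]
    for segs in interpolate_matched_lines(start, end, 5, curve=smoothstep):
        print(segs)
```

With the linear curve the segment moves at constant speed between the key frames; swapping in a non-linear curve changes only the timing, while every intermediate frame still receives an explicit, unambiguous condition.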
Comparison with State-of-the-Art Methods
[Video comparison, Cases 0 to 10: Start Frame and End Frame inputs, with results from FILM, TRF, GI, and Ours]
Ablation Study
The 'w/o Pose' and 'w/o Matching' variants remove the human pose condition and the line matching condition, respectively. The line matching condition governs the overall motion of the scene, while the pose condition refines the details of human movement.