zsxkib/star 🔢🖼️📝❓ → 🖼️
About
STAR Video Upscaler: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Example Output
Prompt:
"The video is a black-and-white silent film featuring two men in wheelchairs on a pier. The foreground man, in a suit and hat, holds a sign reading 'HELP A CRIPPLE.' The background shows a building and a boat, with early 20th-century clothing and image quality suggesting a narrative of disability and assistance"
Output
Performance Metrics
- Prediction Time: 109.45s
- Total Time: 109.46s
All Input Parameters
{
  "steps": 2,
  "video": "https://replicate.delivery/pbxt/MPyx99JPK5KknsPe31tsSGL79Noiq8asmC8re9rID8fagp2X/016_video.mp4",
  "prompt": "The video is a black-and-white silent film featuring two men in wheelchairs on a pier. The foreground man, in a suit and hat, holds a sign reading 'HELP A CRIPPLE.' The background shows a building and a boat, with early 20th-century clothing and image quality suggesting a narrative of disability and assistance",
  "upscale": 4,
  "chunk_size": 3,
  "solver_mode": "normal",
  "max_chunk_len": 24,
  "guidance_scale": 7.5
}
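The same parameters can be submitted from code. The following is a minimal sketch, assuming the official `replicate` Python client is installed and a `REPLICATE_API_TOKEN` is set in the environment. `build_input` and `run_star` are hypothetical helper names introduced here for illustration; the model reference combines the model name with the Version ID listed on this page.

```python
# Model name plus the Version ID shown under "Version Details".
MODEL_REF = (
    "zsxkib/star:"
    "f466b8466225aa1d2f444f683369b82c75dab0b270e4b2f7360960b364e0fa1e"
)


def build_input(video_url: str, prompt: str) -> dict:
    """Return a payload with the same settings as the example run above."""
    return {
        "steps": 2,
        "video": video_url,
        "prompt": prompt,
        "upscale": 4,
        "chunk_size": 3,
        "solver_mode": "normal",
        "max_chunk_len": 24,
        "guidance_scale": 7.5,
    }


def run_star(video_url: str, prompt: str):
    """Submit the job (hypothetical wrapper; needs network access and a token)."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_REF, input=build_input(video_url, prompt))
```

Keeping payload construction separate from the network call makes it possible to inspect or log the inputs before any paid prediction is started.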
Input Parameters
- steps: Number of diffusion steps (normal mode only). 1-5: balanced; 10-50: extreme detail (much slower).
- video (required): Input video file to enhance.
- prompt: Detailed text description of the video content. Include main subjects, colors, motion details, and quality aspects. Example: '4K close-up of a golden retriever running through autumn leaves, vibrant orange and yellow colors, sharp details'
- upscale: Super-resolution scaling factor.
- chunk_size: Parallel processing batches.
- solver_mode: Sampling strategy. 'fast' uses a fixed 15 steps for quick results; 'normal' uses the custom steps value for quality tuning.
- max_chunk_len: Frame group size for temporal processing. Higher values improve motion consistency but increase VRAM usage (24 = ~8GB VRAM).
- guidance_scale: Text-video alignment strength. Lower values (5.0-7.5) allow creative interpretation; higher values (8.0-15.0) enforce strict prompt adherence.
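How `solver_mode` interacts with `steps`, and how `upscale` determines the output size, can be sketched as below. This is an illustration inferred from the parameter descriptions, not the model's actual code; `effective_steps` and `target_resolution` are hypothetical names, and the rejection of values below 1 is an assumption.

```python
def effective_steps(solver_mode: str, steps: int) -> int:
    """Steps actually run: 'fast' ignores `steps` and uses a fixed 15."""
    if solver_mode == "fast":
        return 15
    if solver_mode == "normal":
        if steps < 1:
            # Assumption: at least one diffusion step is required.
            raise ValueError("steps must be at least 1 in normal mode")
        return steps
    raise ValueError(f"unknown solver_mode: {solver_mode!r}")


def target_resolution(height: int, width: int, upscale: int) -> tuple:
    """Scale both dimensions by the super-resolution factor."""
    return (height * upscale, width * upscale)


print(effective_steps("normal", 2))    # 2
print(effective_steps("fast", 2))      # 15
print(target_resolution(240, 320, 4))  # (960, 1280)
```

The last call matches the example run, whose log reports an input resolution of (240, 320) and a target resolution of (960, 1280) with `upscale` set to 4.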
Output Schema
Output
Example Execution Logs
2025-01-31 16:06:52,250 - video_to_video - INFO - checkpoint_path: pretrained_weight/I2VGen-XL_heavy_deg
2025-01-31 16:06:59,212 - video_to_video - INFO - Build encoder with FrozenOpenCLIPEmbedder
Model found at pretrained_weight/I2VGen-XL_heavy_deg, skipping download.
model_file_path: pretrained_weight/I2VGen-XL_heavy_deg/model.pt
2025-01-31 16:07:10,872 - video_to_video - INFO - Load model path pretrained_weight/I2VGen-XL_heavy_deg, with local status <All keys matched successfully>
2025-01-31 16:07:10,873 - video_to_video - INFO - Build diffusion with GaussianDiffusion
Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with: ``` pip install accelerate ``` .
2025-01-31 16:07:11,405 - video_to_video - INFO - Build Temporal VAE
Enhancing video: 0%| | 0/2 [00:00<?, ?it/s]
2025-01-31 16:07:11,558 - video_to_video - INFO - input video path: /tmp/tmpq3y7lksq016_video.mp4
2025-01-31 16:07:11,558 - video_to_video - INFO - text: The video is a black-and-white silent film featuring two men in wheelchairs on a pier. The foreground man, in a suit and hat, holds a sign reading 'HELP A CRIPPLE.' The background shows a building and a boat, with early 20th-century clothing and image quality suggesting a narrative of disability and assistance
2025-01-31 16:07:11,572 - video_to_video - INFO - input frames length: 100
2025-01-31 16:07:11,572 - video_to_video - INFO - input fps: 25.0
2025-01-31 16:07:11,642 - video_to_video - INFO - input resolution: (240, 320)
2025-01-31 16:07:11,642 - video_to_video - INFO - target resolution: (960, 1280)
2025-01-31 16:07:11,690 - video_to_video - INFO - video_data shape: torch.Size([100, 3, 960, 1280])
Diffusion steps: 0%| | 0/2 [00:00<?, ?step/s]
2025-01-31 16:07:20,006 - video_to_video - INFO - step: 0
Diffusion steps: 50%|█████ | 1/2 [00:26<00:26, 26.21s/step]
2025-01-31 16:07:46,213 - video_to_video - INFO - step: 1
Diffusion steps: 100%|██████████| 2/2 [00:52<00:00, 26.22s/step]
Diffusion steps: 100%|██████████| 2/2 [00:52<00:00, 26.21s/step]
2025-01-31 16:08:12,434 - video_to_video - INFO - sampling, finished.
2025-01-31 16:08:34,479 - video_to_video - INFO - temporal vae decoding, finished.
Enhancing video: 0%| | 0/2 [01:30<?, ?it/s]
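The timestamps in the log show roughly where the run time goes. The sketch below parses four of the log lines above, copied verbatim, and computes the elapsed time between them. `parse_timestamp` and `seconds_between` are hypothetical helper names, and reading the gap between two log lines as the duration of a stage is an assumption about how the pipeline logs its progress.

```python
from datetime import datetime

# Four lines copied from the example execution log.
LOG_LINES = [
    "2025-01-31 16:06:52,250 - video_to_video - INFO - checkpoint_path: pretrained_weight/I2VGen-XL_heavy_deg",
    "2025-01-31 16:07:20,006 - video_to_video - INFO - step: 0",
    "2025-01-31 16:08:12,434 - video_to_video - INFO - sampling, finished.",
    "2025-01-31 16:08:34,479 - video_to_video - INFO - temporal vae decoding, finished.",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' stamp of a log line."""
    stamp = line.split(" - ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def seconds_between(first: str, second: str) -> float:
    """Elapsed seconds from one log line to a later one."""
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


sampling_s = seconds_between(LOG_LINES[1], LOG_LINES[2])
decode_s = seconds_between(LOG_LINES[2], LOG_LINES[3])
logged_s = seconds_between(LOG_LINES[0], LOG_LINES[3])

print(f"sampling: {sampling_s:.3f}s")        # sampling: 52.428s
print(f"VAE decoding: {decode_s:.3f}s")      # VAE decoding: 22.045s
print(f"first to last line: {logged_s:.3f}s")  # first to last line: 102.229s
```

Under that reading, the two diffusion steps account for about half of the logged time, which agrees with the progress bar's roughly 26 s per step, and temporal VAE decoding adds about 22 s more.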
Version Details
- Version ID: f466b8466225aa1d2f444f683369b82c75dab0b270e4b2f7360960b364e0fa1e
- Version Created: January 31, 2025