zsxkib/star 🔢🖼️📝❓ → 🖼️

▶️ 648 runs 📅 Jan 2025 ⚙️ Cog 0.13.7

About

STAR Video Upscaler: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Example Output

Prompt:

"The video is a black-and-white silent film featuring two men in wheelchairs on a pier. The foreground man, in a suit and hat, holds a sign reading 'HELP A CRIPPLE.' The background shows a building and a boat, with early 20th-century clothing and image quality suggesting a narrative of disability and assistance"

Performance Metrics

109.45s Prediction Time

109.46s Total Time

All Input Parameters

{
  "steps": 2,
  "video": "https://replicate.delivery/pbxt/MPyx99JPK5KknsPe31tsSGL79Noiq8asmC8re9rID8fagp2X/016_video.mp4",
  "prompt": "The video is a black-and-white silent film featuring two men in wheelchairs on a pier. The foreground man, in a suit and hat, holds a sign reading 'HELP A CRIPPLE.' The background shows a building and a boat, with early 20th-century clothing and image quality suggesting a narrative of disability and assistance",
  "upscale": 4,
  "chunk_size": 3,
  "solver_mode": "normal",
  "max_chunk_len": 24,
  "guidance_scale": 7.5
}
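
A request with these parameters could be submitted from Python roughly as follows. This is a minimal sketch, assuming the third-party `replicate` client package is installed and `REPLICATE_API_TOKEN` is set in the environment. `build_input` and `run_star` are hypothetical helper names introduced here for illustration; the defaults are the ones listed on this page.

```python
# Model reference in "owner/name:version" form, using the version ID from this page.
MODEL_REF = (
    "zsxkib/star:"
    "f466b8466225aa1d2f444f683369b82c75dab0b270e4b2f7360960b364e0fa1e"
)

# Documented defaults for the optional parameters.
DEFAULTS = {
    "steps": 5,
    "upscale": 4,
    "chunk_size": 3,
    "solver_mode": "normal",
    "max_chunk_len": 24,
    "guidance_scale": 7.5,
}


def build_input(video, prompt, **overrides):
    """Merge the required fields with the defaults and any overrides."""
    payload = dict(DEFAULTS)
    payload.update(overrides)
    payload["video"] = video
    payload["prompt"] = prompt
    return payload


def run_star(payload):
    """Submit the prediction (hypothetical wrapper; needs network access)."""
    import replicate  # third-party: pip install replicate

    return replicate.run(MODEL_REF, input=payload)


# Usage (not executed here because it calls the hosted API):
# output = run_star(build_input(
#     "https://replicate.delivery/pbxt/MPyx99JPK5KknsPe31tsSGL79Noiq8asmC8re9rID8fagp2X/016_video.mp4",
#     "The video is a black-and-white silent film ...",
#     steps=2,
# ))
```

Keeping the payload construction separate from the network call makes it possible to inspect or log the exact inputs before spending a prediction on them.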

Input Parameters

steps (integer, default: 5, range: 2-50): Number of diffusion steps (normal mode only). 1-5: balanced; 10-50: extreme detail (much slower).
video (string, required): Input video file to enhance.
prompt (string, default: "Realistic high quality video with realistic details and vibrant colors"): Detailed text description of the video content. Include main subjects, colors, motion details, and quality aspects. Example: '4K close-up of a golden retriever running through autumn leaves, vibrant orange and yellow colors, sharp details'
upscale (integer, default: 4, range: 1-4): Super-resolution scaling factor.
chunk_size (integer, default: 3, range: 1-10): Parallel processing batches.
solver_mode (default: normal): Sampling strategy. 'fast' uses a fixed 15 steps for quick results; 'normal' uses the custom steps value for quality tuning.
max_chunk_len (integer, default: 24, range: 1-32): Frame group size for temporal processing. Higher values improve motion consistency but increase VRAM usage (24 = ~8GB VRAM).
guidance_scale (number, default: 7.5, range: 1-20): Text-video alignment strength. Lower values (5.0-7.5) allow creative interpretation; higher values (8.0-15.0) enforce strict prompt adherence.
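
The ranges above can be checked on the client side before a request is sent. The following is a sketch only: `validate_inputs` is a hypothetical helper, and the hosted model may apply its own validation with different error messages.

```python
# Inclusive ranges as documented in the parameter list.
RANGES = {
    "steps": (2, 50),
    "upscale": (1, 4),
    "chunk_size": (1, 10),
    "max_chunk_len": (1, 32),
    "guidance_scale": (1, 20),
}
SOLVER_MODES = {"fast", "normal"}


def validate_inputs(inputs):
    """Return a list of problems; an empty list means the inputs look valid."""
    errors = []
    if not inputs.get("video"):
        errors.append("video is required")
    for name, (low, high) in RANGES.items():
        if name in inputs and not low <= inputs[name] <= high:
            errors.append(f"{name}={inputs[name]} is outside {low}-{high}")
    mode = inputs.get("solver_mode", "normal")
    if mode not in SOLVER_MODES:
        errors.append(f"solver_mode={mode!r} is not one of {sorted(SOLVER_MODES)}")
    return errors


print(validate_inputs({"video": "clip.mp4", "steps": 1, "upscale": 4}))
```

Returning a list instead of raising on the first problem lets a caller report every out-of-range value in one pass.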

Output Schema

Output

Type: string • Format: uri
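
Because the output is a URI, the enhanced video still has to be fetched to be used locally. A minimal sketch using only the standard library, assuming the URI is a plain HTTPS link to the file; `local_name` and `download` are hypothetical helpers, and some client versions may wrap the output in an object instead of returning a bare string.

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_name(output_uri, fallback="output.mp4"):
    """Derive a local file name from the last path segment of the URI."""
    name = PurePosixPath(urlparse(output_uri).path).name
    return name or fallback


def download(output_uri, dest_dir="."):
    """Fetch the file into dest_dir and return the local path (needs network)."""
    path = os.path.join(dest_dir, local_name(output_uri))
    urlretrieve(output_uri, path)
    return path
```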

Example Execution Logs

2025-01-31 16:06:52,250 - video_to_video - INFO - checkpoint_path: pretrained_weight/I2VGen-XL_heavy_deg
2025-01-31 16:06:59,212 - video_to_video - INFO - Build encoder with FrozenOpenCLIPEmbedder
Model found at pretrained_weight/I2VGen-XL_heavy_deg, skipping download.
model_file_path: pretrained_weight/I2VGen-XL_heavy_deg/model.pt
2025-01-31 16:07:10,872 - video_to_video - INFO - Load model path pretrained_weight/I2VGen-XL_heavy_deg, with local status <All keys matched successfully>
2025-01-31 16:07:10,873 - video_to_video - INFO - Build diffusion with GaussianDiffusion
Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with:
```
pip install accelerate
```
.
2025-01-31 16:07:11,405 - video_to_video - INFO - Build Temporal VAE
Enhancing video:   0%|          | 0/2 [00:00<?, ?it/s]
2025-01-31 16:07:11,558 - video_to_video - INFO - input video path: /tmp/tmpq3y7lksq016_video.mp4
2025-01-31 16:07:11,558 - video_to_video - INFO - text: The video is a black-and-white silent film featuring two men in wheelchairs on a pier. The foreground man, in a suit and hat, holds a sign reading 'HELP A CRIPPLE.' The background shows a building and a boat, with early 20th-century clothing and image quality suggesting a narrative of disability and assistance
2025-01-31 16:07:11,572 - video_to_video - INFO - input frames length: 100
2025-01-31 16:07:11,572 - video_to_video - INFO - input fps: 25.0
2025-01-31 16:07:11,642 - video_to_video - INFO - input resolution: (240, 320)
2025-01-31 16:07:11,642 - video_to_video - INFO - target resolution: (960, 1280)
2025-01-31 16:07:11,690 - video_to_video - INFO - video_data shape: torch.Size([100, 3, 960, 1280])
Diffusion steps:   0%|          | 0/2 [00:00<?, ?step/s]
2025-01-31 16:07:20,006 - video_to_video - INFO - step: 0
Diffusion steps:  50%|█████     | 1/2 [00:26<00:26, 26.21s/step]
2025-01-31 16:07:46,213 - video_to_video - INFO - step: 1
Diffusion steps: 100%|██████████| 2/2 [00:52<00:00, 26.22s/step]
Diffusion steps: 100%|██████████| 2/2 [00:52<00:00, 26.21s/step]
2025-01-31 16:08:12,434 - video_to_video - INFO - sampling, finished.
2025-01-31 16:08:34,479 - video_to_video - INFO - temporal vae decoding, finished.
Enhancing video:   0%|          | 0/2 [01:30<?, ?it/s]
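
The figures in these logs can be cross-checked with a little arithmetic. The sketch below parses three of the timestamps to estimate how long sampling and VAE decoding took, and confirms that the target resolution equals the input resolution times the `upscale` factor. The helper names are hypothetical, and the resolution check assumes plain multiplication with no padding or rounding, which matches this run but is not confirmed for other inputs.

```python
from datetime import datetime

# Three lines copied from the execution logs above.
LOG_LINES = [
    "2025-01-31 16:07:20,006 - video_to_video - INFO - step: 0",
    "2025-01-31 16:08:12,434 - video_to_video - INFO - sampling, finished.",
    "2025-01-31 16:08:34,479 - video_to_video - INFO - temporal vae decoding, finished.",
]


def timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' prefix of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def seconds_between(lines, start_marker, end_marker):
    """Elapsed seconds between the first lines containing each marker."""
    start = next(line for line in lines if start_marker in line)
    end = next(line for line in lines if end_marker in line)
    return (timestamp(end) - timestamp(start)).total_seconds()


def scaled_resolution(resolution, upscale):
    """Multiply both dimensions by the upscale factor (assumed: no padding)."""
    first, second = resolution
    return (first * upscale, second * upscale)


sampling_s = seconds_between(LOG_LINES, "step: 0", "sampling, finished")
decoding_s = seconds_between(
    LOG_LINES, "sampling, finished", "temporal vae decoding, finished"
)
clip_s = 100 / 25.0  # input frames length / input fps

print(f"sampling: {sampling_s:.3f}s, decoding: {decoding_s:.3f}s")
print(f"clip length: {clip_s:.1f}s, target: {scaled_resolution((240, 320), 4)}")
```

For this run, the two diffusion steps account for roughly half of the 109.45s prediction time, with model loading and VAE decoding making up most of the rest.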

Version Details

Version ID: f466b8466225aa1d2f444f683369b82c75dab0b270e4b2f7360960b364e0fa1e
Version Created: January 31, 2025
