vufinder/vggt-1b-point 🖼️✓🔢📝❓ → ❓

▶️ 1.4K runs 📅 Oct 2025 ⚙️ Cog 0.16.8 🔗 GitHub 📄 Paper ⚖️ License

3d-reconstruction camera-pose-estimation depth-estimation image-to-3d point-cloud point-cloud-generation video-to-3d

Performance

3.6sTypical run time

~463sCold start (first call)

1.4KTotal runs

About

Feed-forward neural network that directly infers all key 3D attributes of a scene.

Example Output

Output

Performance Metrics

3.63s Prediction Time

462.71s Total Time

All Input Parameters

{
  "inputs": [
    "https://replicate.delivery/pbxt/OF7OLqU16XK5izm8GIyLoP3ui824VPOQdOHwhg55LcNpTpct/001.mp4"
  ],
  "to_base64": true,
  "return_pcd": true,
  "sampling_rate": 24,
  "keys_to_exclude": "",
  "alpha_blend_onto": "keep"
}

Input Parameters

inputs (required) Type: array: Input files, accepts JPG, JPEG, PNG, WEBP files for images and MP4, AVI, MOV files for video. The input images and sampled video frames will be padded to a single aspect ratio and resized to a maximum dimension of 518 pixels.
to_base64 Type: booleanDefault: true: Whether to return the arrays in JSON files as base64 strings with shape and dtype.
return_pcd Type: booleanDefault: true: Whether to return a point cloud file or not.
sampling_rate Type: integerDefault: 24: Sampling rate for video input as every n-th frame. Only applies to video inputs. First and last frames of the video will always be included.
keys_to_exclude Type: stringDefault:: Comma-separated list of keys to exclude from the output JSON files.
alpha_blend_onto Default: white: Blend mode for images with alpha channels. The 'mean' mode blends the image onto ImageNet mean RGB values. The 'keep' mode keeps the original pixel values.

Output Schema

Example Execution Logs

2026-01-14 09:45:02 [INFO] predict: Received 0 valid images.
2026-01-14 09:45:02 [INFO] predict: Received 1 valid videos.
2026-01-14 09:45:02 [WARNING] predict: Some of the input files were ignored!
2026-01-14 09:45:02 [WARNING] predict: Accepted image extensions: .jpg, .jpeg, .png, .webp
2026-01-14 09:45:02 [WARNING] predict: Accepted video extensions: .mp4, .avi, .mov
2026-01-14 09:45:02 [INFO] utils.load: Extracting frames from video 1 of 1...
2026-01-14 09:45:03 [INFO] utils.load: Extracted 9 frames from video 1 of 1!
2026-01-14 09:45:03 [INFO] predict: Running inference...
2026-01-14 09:45:04 [INFO] predict: Inference done!
2026-01-14 09:45:04 [INFO] predict: Postprocessing the results...
2026-01-14 09:45:04 [INFO] utils.glb: Building GLB scene...
2026-01-14 09:45:04 [INFO] utils.glb: GLB Scene built!
2026-01-14 09:45:04 [INFO] utils.output: Dumping data to 9 JSON files...
2026-01-14 09:45:04 [INFO] utils.output: Dumped 9 JSON files!
2026-01-14 09:45:04 [INFO] predict: Postprocessed the results!

Version Details

Version ID: 9f6a02f93150fe8a7054d5bfed686076c4d88df610211e2e42302675a7e455da
Version Created: January 14, 2026

Run on Replicate →