vufinder/vggt-1b 🖼️✓❓🔢📝 → ❓

▶️ 12.1K runs 📅 Oct 2025 ⚙️ Cog 0.16.12 🔗 GitHub 📄 Paper ⚖️ License
3d-reconstruction camera-pose-estimation depth-estimation image-to-3d video-to-3d

About

Feed-forward neural network that directly infers all key 3D attributes of a scene.

Example Output

Output

Performance Metrics

3.96s Prediction Time
303.13s Total Time
All Input Parameters
{
  "inputs": [
    "https://replicate.delivery/pbxt/NttVVVnd5x4l2kwGctcHSV4MmWztBlPoPeu4FxYR7d15uUH0/001.mp4"
  ],
  "normals": true,
  "to_base64": true,
  "pcd_source": "point_head",
  "return_pcd": true,
  "point_scales": true,
  "sampling_rate": 24,
  "keys_to_exclude": "",
  "alpha_blend_onto": "keep",
  "weighted_pose_transform": true,
  "enable_pose_postprocessing": true
}
Input Parameters
inputs (required) Type: array
Input files, accepts JPG, JPEG, PNG, WEBP files for images and MP4, AVI, MOV files for video. The input images and sampled video frames will be padded to a single aspect ratio and resized to a maximum dimension of 518 pixels.
normals Type: booleanDefault: false
Whether to compute normals for the point cloud or not.
to_base64 Type: booleanDefault: true
Whether to return the arrays in JSON files as base64 strings with shape and dtype.
pcd_source Default: point_head
Whether point cloud is generated from the output of the point or depth head. Has no effect if return_pcd is False.
return_pcd Type: booleanDefault: true
Whether to return a point cloud file or not.
point_scales Type: booleanDefault: false
Whether to compute a scaling factor for each point based on its depth.
sampling_rate Type: integerDefault: 24
Sampling rate for video input as every n-th frame. Only applies to video inputs. First and last frames of the video will always be included.
keys_to_exclude Type: stringDefault:
Comma-separated list of keys to exclude from the output JSON files.
alpha_blend_onto Default: white
Blend mode for images with alpha channels. The 'mean' mode blends the image onto ImageNet mean RGB values. The 'keep' mode keeps the original pixel values.
weighted_pose_transform Type: booleanDefault: false
Whether to apply a weighted transformation to the predicted camera poses. Used when `enable_pose_postprocessing` is True. Defaults to False.
enable_pose_postprocessing Type: booleanDefault: false
Whether to postprocess the predicted camera poses. If True, the poses will be transformed by fitting unprojected depth to world points. Defaults to False.
Output Schema
Example Execution Logs
2026-03-11 12:42:13 [INFO] predict: Received 1 input files.
2026-03-11 12:42:13 [INFO] predict: Received 0 valid images.
2026-03-11 12:42:13 [INFO] predict: Received 1 valid videos.
2026-03-11 12:42:13 [INFO] utils.load: Extracting frames from video 1 of 1...
2026-03-11 12:42:13 [INFO] utils.load: Extracted 9 frames from video 1 of 1!
2026-03-11 12:42:14 [INFO] predict: Running inference...
2026-03-11 12:42:14 [INFO] predict: Inference done!
2026-03-11 12:42:14 [INFO] predict: Postprocessing the results...
2026-03-11 12:42:14 [INFO] utils.pose: Postprocessing 9 poses...
2026-03-11 12:42:15 [INFO] utils.pose: Postprocessed poses!
2026-03-11 12:42:15 [INFO] utils.normals: Computing normals from world points...
2026-03-11 12:42:15 [INFO] utils.normals: Normals computed!
2026-03-11 12:42:15 [INFO] utils.scales: Computing point scales from depth...
2026-03-11 12:42:15 [INFO] utils.scales: Point scales computed!
2026-03-11 12:42:15 [INFO] utils.glb: Building GLB scene...
2026-03-11 12:42:15 [INFO] utils.glb: GLB Scene built!
2026-03-11 12:42:15 [INFO] utils.output: Dumping data to 9 JSON files...
2026-03-11 12:42:15 [INFO] utils.output: Dumped 9 JSON files!
2026-03-11 12:42:15 [INFO] predict: Postprocessed the results!
Version Details
Version ID
8f588e57226dc37aecdfceda935eac3ab3f8632b48d385a6c2d86cf6bf73cd23
Version Created
March 11, 2026
Run on Replicate →