vufinder/vggt-1b 🖼️✓❓🔢📝 → ❓
About
Feed-forward neural network that directly infers all key 3D attributes of a scene.
Example Output
Output
Performance Metrics
3.96s
Prediction Time
303.13s
Total Time
All Input Parameters
{
"inputs": [
"https://replicate.delivery/pbxt/NttVVVnd5x4l2kwGctcHSV4MmWztBlPoPeu4FxYR7d15uUH0/001.mp4"
],
"normals": true,
"to_base64": true,
"pcd_source": "point_head",
"return_pcd": true,
"point_scales": true,
"sampling_rate": 24,
"keys_to_exclude": "",
"alpha_blend_onto": "keep",
"weighted_pose_transform": true,
"enable_pose_postprocessing": true
}
Input Parameters
- inputs (required)
- Input files, accepts JPG, JPEG, PNG, WEBP files for images and MP4, AVI, MOV files for video. The input images and sampled video frames will be padded to a single aspect ratio and resized to a maximum dimension of 518 pixels.
- normals
- Whether to compute normals for the point cloud or not.
- to_base64
- Whether to return the arrays in JSON files as base64 strings with shape and dtype.
- pcd_source
- Whether point cloud is generated from the output of the point or depth head. Has no effect if return_pcd is False.
- return_pcd
- Whether to return a point cloud file or not.
- point_scales
- Whether to compute a scaling factor for each point based on its depth.
- sampling_rate
- Sampling rate for video input as every n-th frame. Only applies to video inputs. First and last frames of the video will always be included.
- keys_to_exclude
- Comma-separated list of keys to exclude from the output JSON files.
- alpha_blend_onto
- Blend mode for images with alpha channels. The 'mean' mode blends the image onto ImageNet mean RGB values. The 'keep' mode keeps the original pixel values.
- weighted_pose_transform
- Whether to apply a weighted transformation to the predicted camera poses. Used when `enable_pose_postprocessing` is True. Defaults to False.
- enable_pose_postprocessing
- Whether to postprocess the predicted camera poses. If True, the poses will be transformed by fitting unprojected depth to world points. Defaults to False.
Output Schema
Example Execution Logs
2026-03-11 12:42:13 [INFO] predict: Received 1 input files. 2026-03-11 12:42:13 [INFO] predict: Received 0 valid images. 2026-03-11 12:42:13 [INFO] predict: Received 1 valid videos. 2026-03-11 12:42:13 [INFO] utils.load: Extracting frames from video 1 of 1... 2026-03-11 12:42:13 [INFO] utils.load: Extracted 9 frames from video 1 of 1! 2026-03-11 12:42:14 [INFO] predict: Running inference... 2026-03-11 12:42:14 [INFO] predict: Inference done! 2026-03-11 12:42:14 [INFO] predict: Postprocessing the results... 2026-03-11 12:42:14 [INFO] utils.pose: Postprocessing 9 poses... 2026-03-11 12:42:15 [INFO] utils.pose: Postprocessed poses! 2026-03-11 12:42:15 [INFO] utils.normals: Computing normals from world points... 2026-03-11 12:42:15 [INFO] utils.normals: Normals computed! 2026-03-11 12:42:15 [INFO] utils.scales: Computing point scales from depth... 2026-03-11 12:42:15 [INFO] utils.scales: Point scales computed! 2026-03-11 12:42:15 [INFO] utils.glb: Building GLB scene... 2026-03-11 12:42:15 [INFO] utils.glb: GLB Scene built! 2026-03-11 12:42:15 [INFO] utils.output: Dumping data to 9 JSON files... 2026-03-11 12:42:15 [INFO] utils.output: Dumped 9 JSON files! 2026-03-11 12:42:15 [INFO] predict: Postprocessed the results!
Version Details
- Version ID
8f588e57226dc37aecdfceda935eac3ab3f8632b48d385a6c2d86cf6bf73cd23- Version Created
- March 11, 2026