vufinder/vggt-1b 🖼️✓❓🔢📝 → ❓
About
Feed-forward neural network that directly infers all key 3D attributes of a scene.
Example Output
Output
Performance Metrics
3.68s
Prediction Time
201.71s
Total Time
All Input Parameters
{
"inputs": [
"https://replicate.delivery/pbxt/NttVVVnd5x4l2kwGctcHSV4MmWztBlPoPeu4FxYR7d15uUH0/001.mp4"
],
"to_base64": true,
"pcd_source": "point_head",
"return_pcd": true,
"sampling_rate": 24,
"keys_to_exclude": "",
"alpha_blend_onto": "keep",
"weighted_pose_transform": false,
"enable_pose_postprocessing": false
}
Input Parameters
- inputs (required)
- Input files, accepts JPG, JPEG, PNG, WEBP files for images and MP4, AVI, MOV files for video. The input images and sampled video frames will be padded to a single aspect ratio and resized to a maximum dimension of 518 pixels.
- to_base64
- Whether to return the arrays in JSON files as base64 strings with shape and dtype.
- pcd_source
- Whether point cloud is generated from the output of the point or depth head. Has no effect if return_pcd is False.
- return_pcd
- Whether to return a point cloud file or not.
- sampling_rate
- Sampling rate for video input as every n-th frame. Only applies to video inputs. First and last frames of the video will always be included.
- keys_to_exclude
- Comma-separated list of keys to exclude from the output JSON files.
- alpha_blend_onto
- Blend mode for images with alpha channels. The 'mean' mode blends the image onto ImageNet mean RGB values. The 'keep' mode keeps the original pixel values.
- weighted_pose_transform
- Whether to apply a weighted transformation to the predicted camera poses. Used when `enable_pose_postprocessing` is True. Defaults to False.
- enable_pose_postprocessing
- Whether to postprocess the predicted camera poses. If True, the poses will be transformed by fitting unprojected depth to world points. Defaults to False.
Output Schema
Example Execution Logs
2026-01-14 09:26:30 [INFO] predict: Received 0 valid images. 2026-01-14 09:26:30 [INFO] predict: Received 1 valid videos. 2026-01-14 09:26:30 [WARNING] predict: Some of the input files were ignored! 2026-01-14 09:26:30 [WARNING] predict: Accepted image extensions: .jpg, .jpeg, .png, .webp 2026-01-14 09:26:30 [WARNING] predict: Accepted video extensions: .mp4, .avi, .mov 2026-01-14 09:26:30 [INFO] utils.load: Extracting frames from video 1 of 1... 2026-01-14 09:26:31 [INFO] utils.load: Extracted 9 frames from video 1 of 1! 2026-01-14 09:26:31 [INFO] predict: Running inference... 2026-01-14 09:26:31 [INFO] predict: Inference done! 2026-01-14 09:26:31 [INFO] predict: Postprocessing the results... 2026-01-14 09:26:31 [INFO] utils.glb: Building GLB scene... 2026-01-14 09:26:32 [INFO] utils.glb: GLB Scene built! 2026-01-14 09:26:32 [INFO] utils.output: Dumping data to 9 JSON files... 2026-01-14 09:26:32 [INFO] utils.output: Dumped 9 JSON files! 2026-01-14 09:26:32 [INFO] predict: Postprocessed the results!
Version Details
- Version ID
35b7a45ef4430c677a6c71f5467ee0d98991502effe0d65bdbd8863fa205cc54- Version Created
- January 14, 2026