vufinder/vggt-1b
About
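Feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views.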

Example Output
Performance Metrics
- Prediction Time: 26.88s
- Total Time: 277.57s
All Input Parameters
{ "inputs": [ "https://replicate.delivery/pbxt/NttVVVnd5x4l2kwGctcHSV4MmWztBlPoPeu4FxYR7d15uUH0/001.mp4" ], "pcd_source": "point_head", "return_pcd": true, "sampling_rate": 24 }
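The example run's parameters can be rebuilt with a small helper. The sketch below is hypothetical: `build_vggt_input` is not part of the model or of Replicate's tooling. It checks only the constraints stated on this page (the accepted file extensions), plus one assumption of its own, that `sampling_rate` is a positive integer. Its defaults mirror the example run and are not necessarily the model's own defaults.

```python
import os
from urllib.parse import urlparse

# File extensions accepted by the "inputs" parameter, per its description.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov"}


def build_vggt_input(inputs, pcd_source="point_head", return_pcd=True, sampling_rate=24):
    """Return an input dict shaped like the example run (hypothetical helper).

    Defaults mirror the example run, not necessarily the model's own defaults.
    """
    if not inputs:
        raise ValueError("inputs is required and must not be empty")
    for item in inputs:
        # Look only at the URL/file path (query string dropped) for the extension.
        ext = os.path.splitext(urlparse(item).path)[1].lower()
        if ext not in IMAGE_EXTS | VIDEO_EXTS:
            raise ValueError(f"unsupported file type: {item!r}")
    # Assumption: "every n-th frame" implies n >= 1.
    if not isinstance(sampling_rate, int) or sampling_rate < 1:
        raise ValueError("sampling_rate must be a positive integer")
    return {
        "inputs": list(inputs),
        "pcd_source": pcd_source,
        "return_pcd": bool(return_pcd),
        "sampling_rate": sampling_rate,
    }
```

You could pass the returned dict as the `input` argument of the Replicate Python client, for example `replicate.run("vufinder/vggt-1b:<version>", input=payload)`, where `<version>` is the Version ID listed under Version Details. That call requires the `replicate` package and an API token.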
Input Parameters
- inputs (required): Input files. Accepts JPG, JPEG, PNG, and WEBP files for images and MP4, AVI, and MOV files for video. All input images and sampled video frames are padded to a single aspect ratio and resized to a maximum dimension of 518 pixels.
- pcd_source: Which prediction head the point cloud is generated from, either the point head or the depth head. Has no effect if return_pcd is false.
- return_pcd: Whether to return a point cloud file.
- sampling_rate: Frame interval for video inputs. Every n-th frame is kept, where n is this value. Only applies to video inputs. The first and last frames of each video are always included.
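The inputs description says every image and frame is padded to a single aspect ratio and resized to a maximum dimension of 518 pixels. It does not say which ratio is used or where the padding goes. The sketch below assumes centered padding to a square (1:1) canvas, which is one way to satisfy that description. The hypothetical `square_pad_plan` only computes the padding and scale factor. It does not touch any pixels.

```python
def square_pad_plan(width, height, max_dim=518):
    """Plan centered square padding followed by a resize to max_dim x max_dim.

    Assumption: the "single aspect ratio" is 1:1 and padding is centered.
    Returns (padding as (left, top, right, bottom), scale factor, output size).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    side = max(width, height)
    pad_x, pad_y = side - width, side - height
    left, top = pad_x // 2, pad_y // 2
    # Any odd leftover pixel goes to the right/bottom edge.
    padding = (left, top, pad_x - left, pad_y - top)
    scale = max_dim / side
    return padding, scale, (max_dim, max_dim)
```

For a 1920x1080 frame, this plan adds 420 pixels above and 420 below, then scales by 518/1920.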
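The point head predicts a 3D point for each pixel directly. When pcd_source selects the depth head instead, the per-pixel depth has to be lifted into 3D using the predicted cameras. One common way to do this is pinhole unprojection, sketched below with numpy. The sketch assumes a 3x3 intrinsic matrix and a 3x4 camera-from-world extrinsic [R | t] (OpenCV convention). This page does not state which conventions the deployment uses, and `depth_to_world_points` is a hypothetical name.

```python
import numpy as np


def depth_to_world_points(depth, intrinsic, extrinsic):
    """Unproject an (H, W) depth map to (H, W, 3) world-space points.

    Assumptions: pinhole intrinsics K (3x3, no skew) and a 3x4
    camera-from-world extrinsic [R | t], so x_cam = R @ x_world + t.
    """
    h, w = depth.shape
    # Pixel grid: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy = intrinsic[0, 0], intrinsic[1, 1]
    cx, cy = intrinsic[0, 2], intrinsic[1, 2]
    # Back-project each pixel into camera coordinates.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    cam = np.stack([x, y, depth], axis=-1)  # (H, W, 3)
    R, t = extrinsic[:, :3], extrinsic[:, 3]
    # Invert x_cam = R x_world + t, i.e. x_world = R^T (x_cam - t),
    # written for row vectors as (x_cam - t) @ R.
    return (cam - t) @ R
```

Working with row vectors lets a single matrix product transform every pixel at once, without reshaping the (H, W, 3) array.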
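The sampling_rate rule can be sketched as an index computation. This sketch assumes that sampling starts at frame 0 and that the last frame is appended only when the stride does not already land on it. The helper name `sampled_frame_indices` is hypothetical, and this page does not document the deployment's exact frame extraction.

```python
def sampled_frame_indices(num_frames: int, sampling_rate: int) -> list[int]:
    """Indices of the frames kept when every sampling_rate-th frame is taken.

    Assumption: sampling starts at frame 0, and the last frame is appended
    if the stride does not already land on it, so that the first and last
    frames are always included.
    """
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be >= 1")
    if num_frames <= 0:
        return []
    indices = list(range(0, num_frames, sampling_rate))
    last = num_frames - 1
    if indices[-1] != last:
        indices.append(last)
    return indices
```

For a 100-frame clip with sampling_rate 24, this keeps frames 0, 24, 48, 72, 96, and 99.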
Example Execution Logs
Received 0 valid images.
Received 1 valid videos.
Extracting frames from video 1 of 1...
Extracted 9 frames from video 1 of 1!
Running inference...
Inference done!
Postprocessing the results...
Building GLB scene...
GLB Scene built!
Dumping data to 9 JSON files...
Dumped 9 JSON files!
Postprocessed the results!
Version Details
- Version ID: 75dd551d859879ad5c953c6865442ab885fa3c0e924637f8c03b6f4bbd3649d8
- Version Created: October 17, 2025