vufinder/vggt-1b-depth
About
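A feed-forward neural network (VGGT) that directly infers the key 3D attributes of a scene from one or more input images or video frames. This deployment can return depth images and a point cloud file.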

Example Output
Performance Metrics
- Prediction Time: 4.00s
- Total Time: 172.42s
All Input Parameters
{ "images": [ "https://replicate.delivery/pbxt/NqKIgWflm2NVvlBpLwR3rSlNKogFLy0VGwGI9apml3mtJEUy/0-1.jpg" ], "return_pcd": true, "return_depth": true }
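The example input above could be sent from Python roughly as follows. This is a minimal sketch, not an official client snippet: it assumes the third-party `replicate` package is installed and a `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_prediction` are hypothetical, and the key `images` is copied from the example input on this page. The output structure is not documented here, so the result is returned unmodified.

```python
from typing import Any

# Model reference built from the name and Version ID listed on this page.
MODEL_VERSION = (
    "vufinder/vggt-1b-depth:"
    "0c77fc7a66d501970981d5d9d32c8d0699425c787eb27bfaa9cfa449dfd574ec"
)


def build_input(
    image_urls: list[str],
    return_pcd: bool = True,
    return_depth: bool = True,
) -> dict[str, Any]:
    """Build an input payload shaped like the example input on this page.

    Hypothetical helper; the defaults mirror the example, not documented defaults.
    """
    return {
        "images": list(image_urls),
        "return_pcd": return_pcd,
        "return_depth": return_depth,
    }


def run_prediction(image_urls: list[str]) -> Any:
    """Run the model through the Replicate Python client (assumed installed)."""
    import replicate  # third-party; imported lazily so build_input works without it

    return replicate.run(MODEL_VERSION, input=build_input(image_urls))
```

Calling `run_prediction([...])` with one or more image URLs blocks until the prediction finishes. In the example run above, that took far longer in total than the 4.00s prediction time.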
Input Parameters
- inputs (required): Input files. Accepts JPG, JPEG, PNG, and WEBP files for images and MP4, AVI, and MOV files for video. Input images and sampled video frames are padded to a single aspect ratio and resized to a maximum dimension of 518 pixels.
- return_pcd: Whether to return a point cloud file.
- return_depth: Whether to return depth images.
- sampling_rate: Sampling rate for video input, expressed as every n-th frame. Applies only to video inputs. The first and last frames of the video are always included.
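The sampling_rate behaviour described above (every n-th frame, with the first and last frames always kept) can be illustrated with a short sketch. This is an assumption about how such sampling might work, not the model's actual implementation. The function name `sampled_frame_indices` is hypothetical, and whether sampling starts counting at frame 0 is a guess.

```python
def sampled_frame_indices(total_frames: int, sampling_rate: int) -> list[int]:
    """Return 0-based indices of every n-th frame, always keeping first and last.

    Illustrative only: assumes sampling starts at frame 0.
    """
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be a positive integer")
    if total_frames <= 0:
        return []

    # Every n-th frame, starting with the first frame (index 0).
    indices = list(range(0, total_frames, sampling_rate))

    # Append the last frame if the stride did not land on it.
    last = total_frames - 1
    if indices[-1] != last:
        indices.append(last)
    return indices
```

For a 10-frame video with a sampling rate of 4, this sketch selects frames 0, 4, and 8, then adds frame 9 because it is the last frame.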
Output Schema
Example Execution Logs
Received 1 valid images.
Running inference...
Inference done!
Postprocessing the results...
Building GLB scene...
GLB Scene built!
Dumping data to 1 JSON files...
Dumped 1 JSON files!
Dumping depth data into PNG files...
Dumped 1 PNG files!
Postprocessed the results!
Version Details
- Version ID: 0c77fc7a66d501970981d5d9d32c8d0699425c787eb27bfaa9cfa449dfd574ec
- Version Created: October 17, 2025