vufinder/vggt-1b

49 runs · Oct 2025 · Cog 0.16.8
3d-reconstruction camera-pose-estimation depth-estimation image-to-3d video-to-3d

About

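A feed-forward neural network (VGGT) that directly infers all key 3D attributes of a scene, including camera parameters, depth maps, point maps, and 3D point tracks, from one, a few, or many of its views.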
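Among the attributes VGGT predicts for each frame are a depth map and camera parameters, and a point cloud can be recovered from these two by back-projecting every pixel. The sketch below shows standard pinhole back-projection in pure Python. `unproject_depth` is a hypothetical helper, and it assumes world-to-camera extrinsics in the OpenCV convention (X_cam = R·X_world + t) with intrinsics given as fx, fy, cx, cy. It illustrates the general technique behind building a point cloud from depth, not this model's own post-processing code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
    R: Sequence[Sequence[float]],
    t: Sequence[float],
) -> List[Point]:
    """Back-project a depth map (rows of per-pixel depths) to world-space points.

    Assumption: extrinsics map world to camera, X_cam = R @ X_world + t,
    so X_world = R^T @ (X_cam - t). Pixels with depth <= 0 are skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: pixel row (y)
        for u, d in enumerate(row):          # u: pixel column (x)
            if d <= 0:
                continue                     # treat non-positive depth as missing
            # Pinhole back-projection into camera coordinates.
            cam = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            # Undo the translation, then apply R transposed (R is a rotation).
            shifted = [cam[k] - t[k] for k in range(3)]
            world = tuple(
                sum(R[j][i] * shifted[j] for j in range(3)) for i in range(3)
            )
            points.append(world)
    return points


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    depth = [[1.0, 0.0], [2.0, 1.0]]  # 2x2 toy depth map; 0.0 marks a missing pixel
    print(unproject_depth(depth, 1.0, 1.0, 0.0, 0.0, identity, [0.0, 0.0, 0.0]))
    # → [(0.0, 0.0, 1.0), (0.0, 2.0, 2.0), (1.0, 1.0, 1.0)]
```

With an identity camera at the origin, each point is simply the pixel ray scaled by its depth; a real pipeline would vectorize this (for example with NumPy) rather than loop per pixel.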

Example Output

Performance Metrics

Prediction time: 26.88 s
Total time: 277.57 s
All Input Parameters
{
  "inputs": [
    "https://replicate.delivery/pbxt/NttVVVnd5x4l2kwGctcHSV4MmWztBlPoPeu4FxYR7d15uUH0/001.mp4"
  ],
  "pcd_source": "point_head",
  "return_pcd": true,
  "sampling_rate": 24
}
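To reproduce this run programmatically, the input can be assembled as a plain dict. The sketch below uses two hypothetical helpers (`input_kind` and `build_vggt_input` are not part of Replicate or the model) that check file extensions against the accepted types listed under Input Parameters and check that `sampling_rate` is a positive integer, then return a dict matching the JSON above.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Accepted extensions, as listed in the `inputs` parameter description.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov"}


def input_kind(path_or_url: str) -> str:
    """Classify an input as 'image', 'video', or 'unsupported' by its extension."""
    suffix = PurePosixPath(urlparse(path_or_url).path).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    return "unsupported"


def build_vggt_input(inputs, pcd_source="point_head", return_pcd=True, sampling_rate=24):
    """Hypothetical helper: validate arguments and return a dict shaped like the example input."""
    if not inputs:
        raise ValueError("inputs is required and must contain at least one file")
    unsupported = [s for s in inputs if input_kind(s) == "unsupported"]
    if unsupported:
        raise ValueError(f"unsupported file type(s): {unsupported}")
    # Assumption: "every n-th frame" only makes sense for n >= 1.
    if (isinstance(sampling_rate, bool) or not isinstance(sampling_rate, int)
            or sampling_rate < 1):
        raise ValueError("sampling_rate must be a positive integer")
    return {
        "inputs": list(inputs),
        "pcd_source": pcd_source,
        "return_pcd": bool(return_pcd),
        "sampling_rate": sampling_rate,
    }


if __name__ == "__main__":
    video = ("https://replicate.delivery/pbxt/"
             "NttVVVnd5x4l2kwGctcHSV4MmWztBlPoPeu4FxYR7d15uUH0/001.mp4")
    # Should print the same JSON as the example run's input parameters.
    print(json.dumps(build_vggt_input([video]), indent=2))
```

The resulting dict can be passed as the `input` argument of the Replicate Python client's `replicate.run(...)`, referencing the model as `vufinder/vggt-1b` together with the version ID listed under Version Details.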
Input Parameters
inputs (required) Type: array
Input files: JPG, JPEG, PNG, or WEBP for images and MP4, AVI, or MOV for videos. Input images and sampled video frames are padded to a single common aspect ratio and resized so that their largest dimension is 518 pixels.
pcd_source Type: string, Default: point_head
Selects whether the point cloud is generated from the output of the point head or the depth head. Has no effect if return_pcd is false.
return_pcd Type: boolean, Default: true
Whether to return a point cloud file.
sampling_rate Type: integer, Default: 24
Video sampling rate: every n-th frame is kept, where n is this value. Applies only to video inputs; the first and last frames of each video are always included.
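The sampling rule can be sketched as a function of a video's frame count. `sampled_frame_indices` is a hypothetical helper written from the parameter description alone; the model's actual frame extraction may differ, for instance in how it reads the frame count or decodes frames.

```python
from typing import List


def sampled_frame_indices(num_frames: int, sampling_rate: int = 24) -> List[int]:
    """Return the indices of every sampling_rate-th frame, always keeping the first and last."""
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be >= 1")
    if num_frames < 1:
        return []  # nothing to sample from an empty video
    indices = list(range(0, num_frames, sampling_rate))  # 0, n, 2n, ... (first frame included)
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)  # always include the final frame
    return indices


if __name__ == "__main__":
    print(sampled_frame_indices(100))  # → [0, 24, 48, 72, 96, 99]
```

For example, a 100-frame video at the default rate yields frames 0, 24, 48, 72, 96, and 99; the final frame is appended because index 99 is not itself a multiple of 24.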
Example Execution Logs
Received 0 valid images.
Received 1 valid videos.
Extracting frames from video 1 of 1...
Extracted 9 frames from video 1 of 1!
Running inference...
Inference done!
Postprocessing the results...
Building GLB scene...
GLB Scene built!
Dumping data to 9 JSON files...
Dumped 9 JSON files!
Postprocessed the results!
Version Details
Version ID: 75dd551d859879ad5c953c6865442ab885fa3c0e924637f8c03b6f4bbd3649d8
Version created: October 17, 2025