vufinder/vggt-1b-depth

78 runs | Oct 2025 | Cog 0.16.8
3d-reconstruction depth-estimation image-to-3d video-to-3d

About

Feed-forward neural network that directly infers all key 3D attributes of a scene.

Example Output

{"data":["https://replicate.delivery/xezq/jvHaj6zhfExqOi6HEwLe6XExnXFcL0Bxc9Dybsei1vy6Wj6rA/image_0001.json","https://replicate.delivery/xezq/YsViN6AyOuL6CBOIeBT1yZPCcDvYBt6LLdieqeLrxcE6Wj6rA/image_0002.json","https://replicate.delivery/xezq/kxrdNRpp4s7PDRTCmPu4fn6Qer5sXn9RELW9uAP0fZD6Wj6rA/image_0003.json","https://replicate.delivery/xezq/UXdKPDtJLnY9JZTCSqeDt2TVemRXKeIv1iYeBR4U5rfubNqvC/image_0004.json","https://replicate.delivery/xezq/ufPR31UGtbXpCyTucRtJ3U5ljSu2PxGvTd1S7E47mUhu1oeVA/image_0005.json","https://replicate.delivery/xezq/XPhRK1T9O5LdJVh5PnGMKeZQiZsdUonbWKwMasAqj3hu1oeVA/image_0006.json","https://replicate.delivery/xezq/xGLC3RfvJxUREKOtdnq3ekiOeI74jThMCvmbwogwrfF6tG1XB/image_0007.json","https://replicate.delivery/xezq/SPMGuXj9sSroN1uV6YQjpZjKE2JnCNa9QrhwhUWNK2o3aUfKA/image_0008.json","https://replicate.delivery/xezq/VLheGZS2G3wiZqlNbQi4Q0ulXfKk0oqZ3kQJPxwNHSLeWj6rA/image_0009.json"],"point_cloud":"https://replicate.delivery/xezq/W7w8e0mXMLVZMaPRmyIT7eq5Q6OtFauae4GAirtjGGE4Wj6rA/point_cloud.glb","depth_images":["https://replicate.delivery/xezq/dITo2XcOoPKRD9NubdCFJ0x8xlfXjIY1CHCQEk6YETfcrR9VA/image_0001.png","https://replicate.delivery/xezq/6BMNdBklheVBcSPFb2wV95HfoEfZpNLrhdJWbNwaqyfwtG1XB/image_0002.png","https://replicate.delivery/xezq/XneHDz8SSKW6Iach5AWqQThsdEmTBQXhGt9dZaZrhwfcrR9VA/image_0003.png","https://replicate.delivery/xezq/NWITdEoGdUamPtHfeRM8l8jHEiIedX7NBTVlPTgGlq87Wj6rA/image_0004.png","https://replicate.delivery/xezq/mS8BDUTEsHaIIJ3eWwxYgSezON8opIpYDuR8AAolVCqdrR9VA/image_0005.png","https://replicate.delivery/xezq/2pM16e2QYiyegUNogmafBiYsqXvxC1LV27URfwEveh4qbNqvC/image_0006.png","https://replicate.delivery/xezq/tDUmN89Vw3JhHVNUoVrDPVzheun5LUYRfl1n8sWvKP5drR9VA/image_0007.png","https://replicate.delivery/xezq/XXHHH3XLEhqCFZypp2j7T7h8v8GP3pWaEqdebmxIsuqu1oeVA/image_0008.png","https://replicate.delivery/xezq/dK7XRQnbCg76Ohgr3rQIMNNichxc1kVqnvZKP7k0vMX3aUfKA/image_0009.png"]}

Performance Metrics

4.26s Prediction Time
435.20s Total Time
All Input Parameters
{
  "inputs": [
    "https://replicate.delivery/pbxt/OF7NySDTCZeexnSJBQhUapm0Ifi6C5LMMNIcHX9ckd2lEI3V/001.mp4"
  ],
  "to_base64": true,
  "return_pcd": true,
  "return_depth": true,
  "sampling_rate": 24,
  "keys_to_exclude": "",
  "alpha_blend_onto": "keep"
}
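A minimal sketch of how the parameters above might be assembled and sent with the Replicate Python client. The helper names `build_input` and `run_model` are hypothetical, the check that sampling_rate is at least 1 is an assumption, and the model reference combines the model name with the version ID listed under Version Details. The call itself needs the third-party `replicate` package and a `REPLICATE_API_TOKEN` in the environment.

```python
# Model name plus the version ID shown under "Version Details".
MODEL_REF = (
    "vufinder/vggt-1b-depth:"
    "cb9ca7fbf8477f0fdc598bd89a480b0e9fac4a06d7c50494137b1dd8cccd846e"
)


def build_input(inputs, to_base64=True, return_pcd=True, return_depth=True,
                sampling_rate=24, keys_to_exclude="", alpha_blend_onto="white"):
    """Build the input payload using the defaults documented on this page."""
    if not inputs:
        raise ValueError("inputs is required and must contain at least one file")
    if sampling_rate < 1:
        # Assumption: "every n-th frame" only makes sense for n >= 1.
        raise ValueError("sampling_rate must be a positive integer")
    return {
        "inputs": list(inputs),
        "to_base64": to_base64,
        "return_pcd": return_pcd,
        "return_depth": return_depth,
        "sampling_rate": sampling_rate,
        "keys_to_exclude": keys_to_exclude,
        "alpha_blend_onto": alpha_blend_onto,
    }


def run_model(payload):
    """Send the payload to Replicate; imported lazily so the sketch loads offline."""
    import replicate  # third-party client, reads REPLICATE_API_TOKEN

    return replicate.run(MODEL_REF, input=payload)
```

Calling `build_input(["https://.../001.mp4"], alpha_blend_onto="keep")` would reproduce the shape of the example payload; the result of `run_model` is expected to carry the "data", "point_cloud", and "depth_images" entries shown in the example output.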
Input Parameters
inputs (required) Type: array
Input files. Accepts JPG, JPEG, PNG, and WEBP files for images and MP4, AVI, and MOV files for video. The input images and sampled video frames are padded to a single aspect ratio and resized to a maximum dimension of 518 pixels.
to_base64 Type: boolean, Default: true
Whether to return the arrays in JSON files as base64 strings with shape and dtype.
return_pcd Type: boolean, Default: true
Whether to return a point cloud file or not.
return_depth Type: boolean, Default: true
Whether to return depth images or not.
sampling_rate Type: integer, Default: 24
Sampling rate for video input: every n-th frame is kept. Only applies to video inputs. The first and last frames of the video are always included.
keys_to_exclude Type: string, Default: (empty string)
Comma-separated list of keys to exclude from the output JSON files.
alpha_blend_onto Default: white
Blend mode for images with alpha channels. The 'mean' mode blends the image onto ImageNet mean RGB values. The 'keep' mode keeps the original pixel values.
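To make the blend modes concrete, here is a rough per-pixel sketch of standard alpha compositing. It is not the model's actual code. The ImageNet mean values (0.485, 0.456, 0.406) are the commonly used normalization constants; scaling them to 0-255, the rounding, and treating the default 'white' mode as a pure white background are assumptions made for illustration. The function name `blend_pixel` is hypothetical.

```python
# Commonly used ImageNet channel means in the 0..1 range.
IMAGENET_MEAN = (0.485, 0.456, 0.406)


def blend_pixel(rgba, mode="white"):
    """Composite one 8-bit RGBA pixel onto a background chosen by mode.

    'keep' returns the RGB values untouched; 'mean' and 'white' use the
    standard formula out = alpha * fg + (1 - alpha) * bg.
    """
    r, g, b, a = rgba
    if mode == "keep":
        return (r, g, b)
    if mode == "mean":
        background = tuple(round(c * 255) for c in IMAGENET_MEAN)
    elif mode == "white":
        # Assumption: 'white' means an opaque white background.
        background = (255, 255, 255)
    else:
        raise ValueError(f"unsupported mode in this sketch: {mode!r}")
    alpha = a / 255
    return tuple(
        round(alpha * fg + (1 - alpha) * bg)
        for fg, bg in zip((r, g, b), background)
    )
```

A fully transparent pixel takes the background color in 'mean' and 'white' modes, while 'keep' leaves whatever RGB values were stored under the transparent alpha.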
Example Execution Logs
2026-01-14 09:58:50 [INFO] predict: Received 1 input files.
2026-01-14 09:58:50 [INFO] predict: Received 0 valid images.
2026-01-14 09:58:50 [INFO] predict: Received 1 valid videos.
2026-01-14 09:58:50 [INFO] utils.load: Extracting frames from video 1 of 1...
2026-01-14 09:58:50 [INFO] utils.load: Extracted 9 frames from video 1 of 1!
2026-01-14 09:58:51 [INFO] predict: Running inference...
2026-01-14 09:58:51 [INFO] predict: Inference done!
2026-01-14 09:58:51 [INFO] predict: Postprocessing the results...
2026-01-14 09:58:51 [INFO] utils.glb: Building GLB scene...
2026-01-14 09:58:51 [INFO] utils.glb: GLB Scene built!
2026-01-14 09:58:52 [INFO] utils.output: Dumping data to 9 JSON files...
2026-01-14 09:58:52 [INFO] utils.output: Dumped 9 JSON files!
2026-01-14 09:58:52 [INFO] utils.output: Dumping depth data into PNG files...
2026-01-14 09:58:52 [INFO] utils.output: Dumped 9 PNG files!
2026-01-14 09:58:52 [INFO] predict: Postprocessed the results!
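The log above reports 9 frames extracted from one video at sampling_rate 24. The sampling rule described for sampling_rate (every n-th frame, with the first and last frames always included) can be sketched as follows. The function name is hypothetical, and the model's own implementation may differ in details such as how it treats a last frame that already falls on the sampling grid.

```python
def sampled_frame_indices(frame_count, sampling_rate):
    """Return the frame indices kept when taking every n-th frame.

    Index 0 is always included by the stride; the last index is appended
    if the stride does not land on it.
    """
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be a positive integer")
    if frame_count <= 0:
        return []
    indices = list(range(0, frame_count, sampling_rate))
    last = frame_count - 1
    if indices[-1] != last:
        indices.append(last)
    return indices
```

The frame count of the example video is not stated on this page. Under this sketch, a hypothetical 193-frame clip sampled at 24 would yield 9 indices, which is consistent with the log but does not confirm the actual length.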
Version Details
Version ID
cb9ca7fbf8477f0fdc598bd89a480b0e9fac4a06d7c50494137b1dd8cccd846e
Version Created
January 14, 2026