gfodor/text2vox

131 runs | Mar 2025 | Cog 0.14.3
Tags: game-asset-generation, text-to-3d, voxel

About

Generates MagicaVoxel VOX models from a text prompt, using Flux dev for image generation and Hunyuan3D-2 for 3D shape generation. It can produce high-detail or low-detail models at several resolutions.

Example Output

Prompt:

"A wizard in a purple robe and pointed hat, wielding a staff and casting a spell."

Output

[Four example output images]

Performance Metrics

Prediction Time: 159.44s
Total Time: 166.16s

All Input Parameters
{
  "seed": 1234,
  "steps": 50,
  "prompt": "A wizard in a purple robe and pointed hat, wielding a staff and casting a spell.",
  "guidance": 6,
  "detail_level": "high",
  "guidance_scale": 5.5,
  "prompt_strength": 0.8,
  "octree_resolution": 512,
  "remove_background": true,
  "num_inference_steps": 50
}
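
The payload above can be assembled and range-checked before it is sent. The following is a minimal sketch, not the author's code: `build_input`, `run_text2vox`, `DEFAULTS`, and `RANGES` are hypothetical names, the ranges are assumed to be inclusive, and the hosted call assumes the `replicate` Python client is installed with `REPLICATE_API_TOKEN` set. The defaults, ranges, and version ID are copied from this page.

```python
"""Sketch: build and range-check an input payload for gfodor/text2vox."""

# Version ID copied from the "Version Details" section of this page.
MODEL_REF = (
    "gfodor/text2vox:"
    "740dd98e409d65e7a0fa02ec9bf549bd903b19a18a0cab02865d4615225c8ba5"
)

# Defaults as listed on this page.
DEFAULTS = {
    "seed": 1234,
    "steps": 50,
    "guidance": 6,
    "detail_level": "high",
    "guidance_scale": 5.5,
    "prompt_strength": 0.8,
    "octree_resolution": 512,
    "remove_background": True,
    "num_inference_steps": 50,
}

# (min, max) ranges as listed on this page; treated as inclusive (assumption).
RANGES = {
    "steps": (20, 50),
    "guidance": (0, 10),
    "guidance_scale": (1, 20),
    "prompt_strength": (0, 1),
    "num_inference_steps": (1, 50),
}


def build_input(prompt, **overrides):
    """Return a full input dict, rejecting unknown keys and out-of-range values."""
    if not prompt:
        raise ValueError("prompt is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "prompt": prompt}
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name}={payload[name]} is outside {low} - {high}")
    return payload


def run_text2vox(prompt, **overrides):
    """Call the hosted model; requires the `replicate` package and an API token."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_REF, input=build_input(prompt, **overrides))
```

Calling `build_input("A wizard in a purple robe", steps=30)` returns the defaults with `steps` overridden, while `steps=10` raises `ValueError` because it falls below the listed minimum of 20.
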
Input Parameters
seed | Type: integer | Default: 1234
Random seed
steps | Type: integer | Default: 50 | Range: 20 - 50
Number of inference steps for Hunyuan
prompt (required) | Type: string
Prompt for generated image
guidance | Type: number | Default: 6 | Range: 0 - 10
Guidance scale for Flux
detail_level | Default: high
Detail level. "high" tries to generate a good high-resolution voxel model; "low" tries to generate a good low-resolution voxel model.
guidance_scale | Type: number | Default: 5.5 | Range: 1 - 20
Guidance scale for Hunyuan
prompt_strength | Type: number | Default: 0.8 | Range: 0 - 1
Prompt strength for img2img in Flux (only applicable if image is provided)
octree_resolution | Default: 512
Octree resolution for Hunyuan
remove_background | Type: boolean | Default: true
Remove the background from the generated image. Turn this off if you want to generate a full voxel scene.
num_inference_steps | Type: integer | Default: 50 | Range: 1 - 50
Number of inference steps for Flux
Output Schema

Output

Type: array | Items Type: string | Items Format: uri
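
Since the output is an array of URI strings, a caller will typically loop over it and save each file. The sketch below assumes that shape and nothing more; `local_name` and `download_all` are hypothetical helper names, and the URL in the usage note is a made-up placeholder, not a real output location. Each item is passed through `str()` in case the client returns file-like objects rather than plain strings.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def local_name(uri):
    """Return the last path segment of a URI, ignoring any query string."""
    return posixpath.basename(urlparse(uri).path)


def download_all(uris, dest_dir="."):
    """Download every URI into dest_dir and return the list of local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for uri in uris:
        uri = str(uri)  # tolerate non-string items that stringify to a URL
        path = os.path.join(dest_dir, local_name(uri))
        urllib.request.urlretrieve(uri, path)
        paths.append(path)
    return paths
```

For instance, `local_name("https://example.com/out/mesh_small.vox?sig=abc")` yields `mesh_small.vox`, which `download_all` then uses as the file name inside `dest_dir`.
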

Example Execution Logs
Using seed: 1234
0it [00:00, ?it/s]
1it [00:00,  3.51it/s]
2it [00:00,  2.62it/s]
3it [00:01,  2.42it/s]
4it [00:01,  2.34it/s]
5it [00:02,  2.29it/s]
6it [00:02,  2.27it/s]
7it [00:02,  2.25it/s]
8it [00:03,  2.24it/s]
9it [00:03,  2.23it/s]
10it [00:04,  2.23it/s]
11it [00:04,  2.22it/s]
12it [00:05,  2.22it/s]
13it [00:05,  2.21it/s]
14it [00:06,  2.21it/s]
15it [00:06,  2.21it/s]
16it [00:07,  2.21it/s]
17it [00:07,  2.21it/s]
18it [00:07,  2.21it/s]
19it [00:08,  2.21it/s]
20it [00:08,  2.21it/s]
21it [00:09,  2.21it/s]
22it [00:09,  2.21it/s]
23it [00:10,  2.21it/s]
24it [00:10,  2.21it/s]
25it [00:11,  2.21it/s]
26it [00:11,  2.21it/s]
27it [00:12,  2.21it/s]
28it [00:12,  2.20it/s]
29it [00:12,  2.21it/s]
30it [00:13,  2.21it/s]
31it [00:13,  2.20it/s]
32it [00:14,  2.21it/s]
33it [00:14,  2.20it/s]
34it [00:15,  2.21it/s]
35it [00:15,  2.20it/s]
36it [00:16,  2.21it/s]
37it [00:16,  2.20it/s]
38it [00:17,  2.20it/s]
39it [00:17,  2.21it/s]
40it [00:17,  2.21it/s]
41it [00:18,  2.20it/s]
42it [00:18,  2.20it/s]
43it [00:19,  2.20it/s]
44it [00:19,  2.20it/s]
45it [00:20,  2.20it/s]
46it [00:20,  2.20it/s]
47it [00:21,  2.20it/s]
48it [00:21,  2.20it/s]
49it [00:22,  2.20it/s]
50it [00:22,  2.20it/s]
50it [00:22,  2.22it/s]
Total safe images: 1 out of 1
Diffusion Sampling::   0%|          | 0/50 [00:00<?, ?it/s]
Diffusion Sampling::   4%|▍         | 2/50 [00:00<00:03, 14.64it/s]
Diffusion Sampling::   8%|▊         | 4/50 [00:00<00:03, 12.88it/s]
Diffusion Sampling::  12%|█▏        | 6/50 [00:00<00:03, 12.38it/s]
Diffusion Sampling::  16%|█▌        | 8/50 [00:00<00:03, 12.14it/s]
Diffusion Sampling::  20%|██        | 10/50 [00:00<00:03, 11.93it/s]
Diffusion Sampling::  24%|██▍       | 12/50 [00:00<00:03, 11.99it/s]
Diffusion Sampling::  28%|██▊       | 14/50 [00:01<00:03, 11.93it/s]
Diffusion Sampling::  32%|███▏      | 16/50 [00:01<00:02, 11.87it/s]
Diffusion Sampling::  36%|███▌      | 18/50 [00:01<00:02, 11.85it/s]
Diffusion Sampling::  40%|████      | 20/50 [00:01<00:02, 11.83it/s]
Diffusion Sampling::  44%|████▍     | 22/50 [00:01<00:02, 11.80it/s]
Diffusion Sampling::  48%|████▊     | 24/50 [00:02<00:02, 11.79it/s]
Diffusion Sampling::  52%|█████▏    | 26/50 [00:02<00:02, 11.77it/s]
Diffusion Sampling::  56%|█████▌    | 28/50 [00:02<00:01, 11.79it/s]
Diffusion Sampling::  60%|██████    | 30/50 [00:02<00:01, 11.82it/s]
Diffusion Sampling::  64%|██████▍   | 32/50 [00:02<00:01, 11.79it/s]
Diffusion Sampling::  68%|██████▊   | 34/50 [00:02<00:01, 11.80it/s]
Diffusion Sampling::  72%|███████▏  | 36/50 [00:03<00:01, 11.79it/s]
Diffusion Sampling::  76%|███████▌  | 38/50 [00:03<00:01, 11.80it/s]
Diffusion Sampling::  80%|████████  | 40/50 [00:03<00:00, 11.80it/s]
Diffusion Sampling::  84%|████████▍ | 42/50 [00:03<00:00, 11.81it/s]
Diffusion Sampling::  88%|████████▊ | 44/50 [00:03<00:00, 11.81it/s]
Diffusion Sampling::  92%|█████████▏| 46/50 [00:03<00:00, 11.79it/s]
Diffusion Sampling::  96%|█████████▌| 48/50 [00:04<00:00, 11.81it/s]
Diffusion Sampling:: 100%|██████████| 50/50 [00:04<00:00, 11.81it/s]
Diffusion Sampling:: 100%|██████████| 50/50 [00:04<00:00, 11.89it/s]
2025-03-24 07:17:43,790 - hy3dgen.shapgen - INFO - FlashVDMVolumeDecoding Resolution: [63, 126, 252, 504]
FlashVDM Volume Decoding:   0%|          | 0/2 [00:00<?, ?it/s]
FlashVDM Volume Decoding: 100%|██████████| 2/2 [00:00<00:00, 63.39it/s]
Extracting textures from mesh.glb...
param = textures
Launching asset import ...           OK
Validating postprocessing flags ...  OK
0 %
50 %
Importing file ...                   OK
import took approx. 0.01072 seconds
assimp extract: Exporting 1 textures
assimp extract: Texture 0 is compressed (png). Writing native file format.
assimp extract: Wrote texture 0 to textures_img0.png
Exporting to OBJ format...
param = mesh_with_textures.obj
param = -ptv
assimp export: select file format: 'obj' (Wavefront OBJ format)
Launching asset import ...           OK
Validating postprocessing flags ...  OK
0 %
50 %
50 %
51.5152 %
53.0303 %
54.5455 %
56.0606 %
57.5758 %
59.0909 %
60.6061 %
62.1212 %
63.6364 %
65.1515 %
66.6667 %
68.1818 %
69.697 %
71.2121 %
72.7273 %
74.2424 %
75.7576 %
77.2727 %
78.7879 %
80.303 %
81.8182 %
83.3333 %
84.8485 %
86.3636 %
87.8788 %
89.3939 %
90.9091 %
92.4242 %
93.9394 %
95.4546 %
96.9697 %
98.4848 %
100 %
Importing file ...                   OK
import took approx. 0.01152 seconds
Launching asset export ...           OK
Exporting file ...                   OK
export took approx. 0.15870 seconds
assimp export: wrote output file: mesh_with_textures.obj
Fixing texture references in MTL file...
sed: can't read : No such file or directory
Converting to VOX format with resolution 96...
wine: created the configuration directory '/root/.wine'
0048:err:explorer:initialize_display_settings Failed to query current display settings for L"\\\\.\\DISPLAY1".
0048:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hr 0x80004002
0048:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, hr 0x80004002
0048:err:ole:apartment_get_local_server_stream Failed: 0x80004002
002c:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
002c:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0050:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0050:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0050:err:ole:apartment_createwindowifneeded CreateWindow failed with error 0
0050:err:ole:apartment_createwindowifneeded CreateWindow failed with error 0
0050:err:ole:apartment_createwindowifneeded CreateWindow failed with error 14007
0050:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hr 0x800736b7
0050:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, hr 0x800736b7
0050:err:ole:apartment_get_local_server_stream Failed: 0x800736b7
0050:err:ole:start_rpcss Failed to open RpcSs service
0040:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0040:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0040:err:setupapi:SetupDefaultQueueCallbackW copy error 1812 L"@wineusb.sys,-1" -> L"C:\\windows\\inf\\wineusb.inf"
Could not find Wine Gecko. HTML rendering will be disabled.
0098:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0098:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
Could not find Wine Gecko. HTML rendering will be disabled.
wine: configuration in L"/root/.wine" has been updated.
Reading mesh_with_textures.obj
Reading mesh_with_textures.mtl.... found
mesh_with_textures.mtl NOT FOUND or not supported format
textures_img0.png loaded successfully
Scale factor used (voxel/polygon units):48.865746
x:0..69, y:0..53, z:0..96
Writing tmp.vox (70x54x97)
0.26 seconds
Done.
Converting to MagicaVoxel format...
Generating SVOX and GLTF...
Assertion failed: RGBA length mismatch 65456:65452
Converting GLTF to GLB...
Total: 58.206ms
Conversion complete: output/mesh_large.vox, output/mesh_large.vox.gltf, and output/mesh_large.vox.glb
Cleaning up intermediate files...
Extracting textures from mesh.glb...
param = textures
Launching asset import ...           OK
Validating postprocessing flags ...  OK
0 %
50 %
Importing file ...                   OK
import took approx. 0.00699 seconds
assimp extract: Exporting 1 textures
assimp extract: Texture 0 is compressed (png). Writing native file format.
assimp extract: Wrote texture 0 to textures_img0.png
Exporting to OBJ format...
param = mesh_with_textures.obj
param = -ptv
assimp export: select file format: 'obj' (Wavefront OBJ format)
Launching asset import ...           OK
Validating postprocessing flags ...  OK
0 %
50 %
50 %
51.5152 %
53.0303 %
54.5455 %
56.0606 %
57.5758 %
59.0909 %
60.6061 %
62.1212 %
63.6364 %
65.1515 %
66.6667 %
68.1818 %
69.697 %
71.2121 %
72.7273 %
74.2424 %
75.7576 %
77.2727 %
78.7879 %
80.303 %
81.8182 %
83.3333 %
84.8485 %
86.3636 %
87.8788 %
89.3939 %
90.9091 %
92.4242 %
93.9394 %
95.4546 %
96.9697 %
98.4848 %
100 %
Importing file ...                   OK
import took approx. 0.00774 seconds
Launching asset export ...           OK
Exporting file ...                   OK
export took approx. 0.14839 seconds
assimp export: wrote output file: mesh_with_textures.obj
Fixing texture references in MTL file...
sed: can't read : No such file or directory
Converting to VOX format with resolution 80...
0050:err:explorer:initialize_display_settings Failed to query current display settings for L"\\\\.\\DISPLAY1".
Reading mesh_with_textures.obj
Reading mesh_with_textures.mtl.... found
mesh_with_textures.mtl NOT FOUND or not supported format
textures_img0.png loaded successfully
Scale factor used (voxel/polygon units):40.712963
x:0..57, y:0..44, z:0..80
Writing tmp.vox (58x45x81)
0.27 seconds
Done.
Converting to MagicaVoxel format...
Skipping SVOX, GLTF, and GLB generation as requested
Conversion complete: output/mesh_medium.vox
Cleaning up intermediate files...
Extracting textures from mesh.glb...
param = textures
Launching asset import ...           OK
Validating postprocessing flags ...  OK
0 %
50 %
Importing file ...                   OK
import took approx. 0.00719 seconds
assimp extract: Exporting 1 textures
assimp extract: Texture 0 is compressed (png). Writing native file format.
assimp extract: Wrote texture 0 to textures_img0.png
Exporting to OBJ format...
param = mesh_with_textures.obj
param = -ptv
assimp export: select file format: 'obj' (Wavefront OBJ format)
Launching asset import ...           OK
Validating postprocessing flags ...  OK
0 %
50 %
50 %
51.5152 %
53.0303 %
54.5455 %
56.0606 %
57.5758 %
59.0909 %
60.6061 %
62.1212 %
63.6364 %
65.1515 %
66.6667 %
68.1818 %
69.697 %
71.2121 %
72.7273 %
74.2424 %
75.7576 %
77.2727 %
78.7879 %
80.303 %
81.8182 %
83.3333 %
84.8485 %
86.3636 %
87.8788 %
89.3939 %
90.9091 %
92.4242 %
93.9394 %
95.4546 %
96.9697 %
98.4848 %
100 %
Importing file ...                   OK
import took approx. 0.00741 seconds
Launching asset export ...           OK
Exporting file ...                   OK
export took approx. 0.14602 seconds
assimp export: wrote output file: mesh_with_textures.obj
Fixing texture references in MTL file...
sed: can't read : No such file or directory
Converting to VOX format with resolution 64...
Reading mesh_with_textures.obj
Reading mesh_with_textures.mtl.... found
mesh_with_textures.mtl NOT FOUND or not supported format
textures_img0.png loaded successfully
Scale factor used (voxel/polygon units):32.560181
x:0..46, y:0..35, z:0..64
Writing tmp.vox (47x36x65)
0.24 seconds
Done.
Converting to MagicaVoxel format...
Skipping SVOX, GLTF, and GLB generation as requested
Conversion complete: output/mesh_small.vox
Cleaning up intermediate files...
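
The log above reports three conversion passes (resolutions 96, 80, and 64), each writing a VOX grid whose longest axis is the resolution plus one. A small parser can pull those facts out of a log like this one. This is a sketch that assumes the log wording stays exactly as shown here; `summarize` and the pattern names are hypothetical, and the sample lines are copied from the log above.

```python
import re

# Patterns match the log wording shown on this page (assumed stable).
RES = re.compile(r"Converting to VOX format with resolution (\d+)")
DIMS = re.compile(r"Writing tmp\.vox \((\d+)x(\d+)x(\d+)\)")
DONE = re.compile(r"Conversion complete: (\S+?\.vox)\b")


def summarize(log_text):
    """Return one dict per conversion pass: resolution, grid size, output file."""
    results = []
    current = {}
    for line in log_text.splitlines():
        if m := RES.search(line):
            current = {"resolution": int(m.group(1))}
        elif m := DIMS.search(line):
            current["size"] = tuple(int(g) for g in m.groups())
        elif m := DONE.search(line):
            current["file"] = m.group(1)
            results.append(current)
            current = {}
    return results


SAMPLE = """\
Converting to VOX format with resolution 96...
Writing tmp.vox (70x54x97)
Conversion complete: output/mesh_large.vox, output/mesh_large.vox.gltf, and output/mesh_large.vox.glb
Converting to VOX format with resolution 64...
Writing tmp.vox (47x36x65)
Conversion complete: output/mesh_small.vox
"""

for entry in summarize(SAMPLE):
    print(entry["file"], entry["resolution"], entry["size"])
# prints:
# output/mesh_large.vox 96 (70, 54, 97)
# output/mesh_small.vox 64 (47, 36, 65)
```
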
Version Details
Version ID
740dd98e409d65e7a0fa02ec9bf549bd903b19a18a0cab02865d4615225c8ba5
Version Created
March 24, 2025