gfodor/text2vox
About
Generates MagicaVoxel VOX models using Flux dev and Hunyuan3D-2. It can generate high-detail and low-detail models at varying resolutions.
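To check what a generated file actually contains, the grid dimensions can be read straight out of the VOX container. The sketch below is a minimal reader, assuming the standard MagicaVoxel layout (a `VOX ` magic, a version integer, then a `MAIN` chunk whose children include `SIZE` chunks holding x, y, z as little-endian 32-bit integers). The helper name `read_vox_sizes` is hypothetical and not part of this model.

```python
import struct


def read_vox_sizes(data: bytes) -> list[tuple[int, int, int]]:
    """Return the (x, y, z) dimensions of every SIZE chunk in a .vox byte string."""
    if data[:4] != b"VOX ":
        raise ValueError("not a MagicaVoxel .vox file")
    # Bytes 4..8 hold the format version; the MAIN chunk header starts at byte 8.
    chunk_id, content_len, children_len = struct.unpack_from("<4sii", data, 8)
    if chunk_id != b"MAIN":
        raise ValueError("expected a MAIN chunk after the header")
    pos = 20 + content_len          # first child chunk of MAIN
    end = pos + children_len        # end of MAIN's children
    sizes = []
    while pos + 12 <= min(end, len(data)):
        chunk_id, content_len, children_len = struct.unpack_from("<4sii", data, pos)
        pos += 12                   # skip the 12-byte chunk header
        if chunk_id == b"SIZE":
            sizes.append(struct.unpack_from("<iii", data, pos))
        pos += content_len + children_len  # jump to the next sibling chunk
    return sizes


if __name__ == "__main__":
    import sys

    # Usage: python read_vox.py path/to/model.vox
    with open(sys.argv[1], "rb") as fh:
        print(read_vox_sizes(fh.read()))
```

A file with several models carries one `SIZE` chunk per model, which is why the function returns a list.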

Example Output
Prompt:
"A wizard in a purple robe and pointed hat, wielding a staff and casting a spell."
Output
Performance Metrics
- Prediction Time: 159.44s
- Total Time: 166.16s
All Input Parameters
{
  "seed": 1234,
  "steps": 50,
  "prompt": "A wizard in a purple robe and pointed hat, wielding a staff and casting a spell.",
  "guidance": 6,
  "detail_level": "high",
  "guidance_scale": 5.5,
  "prompt_strength": 0.8,
  "octree_resolution": 512,
  "remove_background": true,
  "num_inference_steps": 50
}
Input Parameters
- seed: Random seed.
- steps: Number of inference steps for Hunyuan.
- prompt (required): Prompt for the generated image.
- guidance: Guidance scale for Flux.
- detail_level: Detail level. High tries to generate a good high-resolution voxel model, and low tries to generate a good low-resolution voxel model.
- guidance_scale: Guidance scale for Hunyuan.
- prompt_strength: Prompt strength for img2img in Flux (only applicable if an image is provided).
- octree_resolution: Octree resolution for Hunyuan.
- remove_background: Remove the background from the generated image. Useful to turn off if you want to generate a full voxel scene.
- num_inference_steps: Number of inference steps for Flux.
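The parameter names are easy to mix up, because `steps` and `guidance_scale` belong to Hunyuan while `num_inference_steps` and `guidance` belong to Flux. The sketch below groups them by stage when building the input payload. The helper names (`build_input`, `run_text2vox`) and keyword names are hypothetical, and the default values are simply those of the example run above, not documented defaults. The call itself assumes the third-party `replicate` Python client and a configured API token.

```python
from typing import Any

# Model reference built from the version ID listed on this page.
MODEL_REF = (
    "gfodor/text2vox:"
    "740dd98e409d65e7a0fa02ec9bf549bd903b19a18a0cab02865d4615225c8ba5"
)


def build_input(
    prompt: str,
    *,
    seed: int = 1234,
    detail_level: str = "high",
    remove_background: bool = True,
    # Flux (text-to-image) stage
    flux_steps: int = 50,
    flux_guidance: float = 6,
    prompt_strength: float = 0.8,
    # Hunyuan (image-to-3D) stage
    hunyuan_steps: int = 50,
    hunyuan_guidance: float = 5.5,
    octree_resolution: int = 512,
) -> dict[str, Any]:
    """Map stage-specific keyword names onto the model's input parameter names."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    return {
        "seed": seed,
        "prompt": prompt,
        "detail_level": detail_level,
        "remove_background": remove_background,
        "num_inference_steps": flux_steps,   # Flux
        "guidance": flux_guidance,           # Flux
        "prompt_strength": prompt_strength,  # Flux img2img only
        "steps": hunyuan_steps,              # Hunyuan
        "guidance_scale": hunyuan_guidance,  # Hunyuan
        "octree_resolution": octree_resolution,
    }


def run_text2vox(prompt: str, **overrides: Any) -> Any:
    """Run the model through the Replicate client (assumed to be installed)."""
    import replicate  # third-party; imported lazily so build_input works without it

    return replicate.run(MODEL_REF, input=build_input(prompt, **overrides))
```

Calling `build_input("A wizard in a purple robe")` with no overrides reproduces the example parameters shown earlier, apart from the shortened prompt.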
Output Schema
Output
Example Execution Logs
Using seed: 1234
50it [00:22, 2.22it/s]
Total safe images: 1 out of 1
Diffusion Sampling:: 100%|██████████| 50/50 [00:04<00:00, 11.89it/s]
2025-03-24 07:17:43,790 - hy3dgen.shapgen - INFO - FlashVDMVolumeDecoding Resolution: [63, 126, 252, 504]
FlashVDM Volume Decoding: 100%|██████████| 2/2 [00:00<00:00, 63.39it/s]
Extracting textures from mesh.glb...
param = textures
Launching asset import ... OK
Validating postprocessing flags ... OK
0 % 50 %
Importing file ... OK
import took approx. 0.01072 seconds
assimp extract: Exporting 1 textures
assimp extract: Texture 0 is compressed (png). Writing native file format.
assimp extract: Wrote texture 0 to textures_img0.png
Exporting to OBJ format...
param = mesh_with_textures.obj
param = -ptv
assimp export: select file format: 'obj' (Wavefront OBJ format)
Launching asset import ... OK
Validating postprocessing flags ... OK
0 % 50 % 50 % 51.5152 % 53.0303 % 54.5455 % 56.0606 % 57.5758 % 59.0909 % 60.6061 % 62.1212 % 63.6364 % 65.1515 % 66.6667 % 68.1818 % 69.697 % 71.2121 % 72.7273 % 74.2424 % 75.7576 % 77.2727 % 78.7879 % 80.303 % 81.8182 % 83.3333 % 84.8485 % 86.3636 % 87.8788 % 89.3939 % 90.9091 % 92.4242 % 93.9394 % 95.4546 % 96.9697 % 98.4848 % 100 %
Importing file ... OK
import took approx. 0.01152 seconds
Launching asset export ... OK
Exporting file ... OK
export took approx. 0.15870 seconds
assimp export: wrote output file: mesh_with_textures.obj
Fixing texture references in MTL file...
sed: can't read : No such file or directory
Converting to VOX format with resolution 96...
wine: created the configuration directory '/root/.wine'
0048:err:explorer:initialize_display_settings Failed to query current display settings for L"\\\\.\\DISPLAY1".
0048:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hr 0x80004002
0048:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, hr 0x80004002
0048:err:ole:apartment_get_local_server_stream Failed: 0x80004002
002c:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
002c:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0050:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0050:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0050:err:ole:apartment_createwindowifneeded CreateWindow failed with error 0
0050:err:ole:apartment_createwindowifneeded CreateWindow failed with error 0
0050:err:ole:apartment_createwindowifneeded CreateWindow failed with error 14007
0050:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hr 0x800736b7
0050:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, hr 0x800736b7
0050:err:ole:apartment_get_local_server_stream Failed: 0x800736b7
0050:err:ole:start_rpcss Failed to open RpcSs service
0040:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0040:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0040:err:setupapi:SetupDefaultQueueCallbackW copy error 1812 L"@wineusb.sys,-1" -> L"C:\\windows\\inf\\wineusb.inf"
Could not find Wine Gecko. HTML rendering will be disabled.
0098:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0098:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
Could not find Wine Gecko. HTML rendering will be disabled.
wine: configuration in L"/root/.wine" has been updated.
Reading mesh_with_textures.obj
Reading mesh_with_textures.mtl.... found mesh_with_textures.mtl NOT FOUND or not supported format textures_img0.png loaded successfully
Scale factor used (voxel/polygon units):48.865746
x:0..69, y:0..53, z:0..96
Writing tmp.vox (70x54x97)
0.26 seconds
Done.
Converting to MagicaVoxel format...
Generating SVOX and GLTF...
Assertion failed: RGBA length mismatch 65456:65452
Converting GLTF to GLB...
Total: 58.206ms
Conversion complete: output/mesh_large.vox, output/mesh_large.vox.gltf, and output/mesh_large.vox.glb
Cleaning up intermediate files...
Extracting textures from mesh.glb...
param = textures
Launching asset import ... OK
Validating postprocessing flags ... OK
0 % 50 %
Importing file ... OK
import took approx. 0.00699 seconds
assimp extract: Exporting 1 textures
assimp extract: Texture 0 is compressed (png). Writing native file format.
assimp extract: Wrote texture 0 to textures_img0.png
Exporting to OBJ format...
param = mesh_with_textures.obj
param = -ptv
assimp export: select file format: 'obj' (Wavefront OBJ format)
Launching asset import ... OK
Validating postprocessing flags ... OK
0 % 50 % 50 % 51.5152 % 53.0303 % 54.5455 % 56.0606 % 57.5758 % 59.0909 % 60.6061 % 62.1212 % 63.6364 % 65.1515 % 66.6667 % 68.1818 % 69.697 % 71.2121 % 72.7273 % 74.2424 % 75.7576 % 77.2727 % 78.7879 % 80.303 % 81.8182 % 83.3333 % 84.8485 % 86.3636 % 87.8788 % 89.3939 % 90.9091 % 92.4242 % 93.9394 % 95.4546 % 96.9697 % 98.4848 % 100 %
Importing file ... OK
import took approx. 0.00774 seconds
Launching asset export ... OK
Exporting file ... OK
export took approx. 0.14839 seconds
assimp export: wrote output file: mesh_with_textures.obj
Fixing texture references in MTL file...
sed: can't read : No such file or directory
Converting to VOX format with resolution 80...
0050:err:explorer:initialize_display_settings Failed to query current display settings for L"\\\\.\\DISPLAY1".
Reading mesh_with_textures.obj
Reading mesh_with_textures.mtl.... found mesh_with_textures.mtl NOT FOUND or not supported format textures_img0.png loaded successfully
Scale factor used (voxel/polygon units):40.712963
x:0..57, y:0..44, z:0..80
Writing tmp.vox (58x45x81)
0.27 seconds
Done.
Converting to MagicaVoxel format...
Skipping SVOX, GLTF, and GLB generation as requested
Conversion complete: output/mesh_medium.vox
Cleaning up intermediate files...
Extracting textures from mesh.glb...
param = textures
Launching asset import ... OK
Validating postprocessing flags ... OK
0 % 50 %
Importing file ... OK
import took approx. 0.00719 seconds
assimp extract: Exporting 1 textures
assimp extract: Texture 0 is compressed (png). Writing native file format.
assimp extract: Wrote texture 0 to textures_img0.png
Exporting to OBJ format...
param = mesh_with_textures.obj
param = -ptv
assimp export: select file format: 'obj' (Wavefront OBJ format)
Launching asset import ... OK
Validating postprocessing flags ... OK
0 % 50 % 50 % 51.5152 % 53.0303 % 54.5455 % 56.0606 % 57.5758 % 59.0909 % 60.6061 % 62.1212 % 63.6364 % 65.1515 % 66.6667 % 68.1818 % 69.697 % 71.2121 % 72.7273 % 74.2424 % 75.7576 % 77.2727 % 78.7879 % 80.303 % 81.8182 % 83.3333 % 84.8485 % 86.3636 % 87.8788 % 89.3939 % 90.9091 % 92.4242 % 93.9394 % 95.4546 % 96.9697 % 98.4848 % 100 %
Importing file ... OK
import took approx. 0.00741 seconds
Launching asset export ... OK
Exporting file ... OK
export took approx. 0.14602 seconds
assimp export: wrote output file: mesh_with_textures.obj
Fixing texture references in MTL file...
sed: can't read : No such file or directory
Converting to VOX format with resolution 64...
Reading mesh_with_textures.obj
Reading mesh_with_textures.mtl.... found mesh_with_textures.mtl NOT FOUND or not supported format textures_img0.png loaded successfully
Scale factor used (voxel/polygon units):32.560181
x:0..46, y:0..35, z:0..64
Writing tmp.vox (47x36x65)
0.24 seconds
Done.
Converting to MagicaVoxel format...
Skipping SVOX, GLTF, and GLB generation as requested
Conversion complete: output/mesh_small.vox
Cleaning up intermediate files...
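The log shows three conversion passes, each announcing a target resolution, the intermediate grid size, and the final output file. A minimal sketch for pulling those three facts out of a captured log follows. The function name `summarize_passes` is hypothetical, and the patterns are matched only against the message wording seen in this example run, so other versions of the pipeline may log differently.

```python
import re

# One alternation per log message of interest, as worded in the example logs.
_TOKEN = re.compile(
    r"Converting to VOX format with resolution (?P<res>\d+)"
    r"|Writing tmp\.vox \((?P<x>\d+)x(?P<y>\d+)x(?P<z>\d+)\)"
    r"|Conversion complete: (?P<path>[^\s,]+\.vox)"
)


def summarize_passes(log: str) -> list[dict]:
    """Collect resolution, intermediate grid size, and output path per pass."""
    passes: list[dict] = []
    current: dict = {}
    for match in _TOKEN.finditer(log):
        if match.group("res"):
            # A new pass starts with its target resolution.
            current = {"resolution": int(match.group("res"))}
        elif match.group("x"):
            current["grid"] = tuple(int(match.group(k)) for k in ("x", "y", "z"))
        else:
            # "Conversion complete" closes the pass; keep the first .vox path.
            current["output"] = match.group("path")
            passes.append(current)
            current = {}
    return passes


if __name__ == "__main__":
    sample = (
        "Converting to VOX format with resolution 64... "
        "Writing tmp.vox (47x36x65) 0.24 seconds Done. "
        "Conversion complete: output/mesh_small.vox"
    )
    for entry in summarize_passes(sample):
        print(entry)
```

Applied to the full log above, this reports resolution 96 with a 70x54x97 grid for `output/mesh_large.vox`, resolution 80 with 58x45x81 for `output/mesh_medium.vox`, and resolution 64 with 47x36x65 for `output/mesh_small.vox`.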
Version Details
- Version ID: 740dd98e409d65e7a0fa02ec9bf549bd903b19a18a0cab02865d4615225c8ba5
- Version Created: March 24, 2025