lucataco/qwen2.5-omni-7b
About
Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.

Example Output
{"text":"Oh, that's a really cool drawing! It looks like a guitar. You've got the body and the neck drawn in a simple yet effective way. The lines are clean and the shape is well-defined. What made you choose to draw a guitar?","voice":"https://replicate.delivery/xezq/xpYIpzUZvRIIONe0qe1CfAcq53wfz46ecayvhcPCRY3mcI7jC/output.wav"}
Performance Metrics
- Prediction Time: 113.05s
- Total Time: 219.54s
All Input Parameters
{ "video": "https://replicate.delivery/pbxt/MmJqxKbRSknHd9fwtTEbywWqDDdhgsx5tNYLIDnFqJ9j5ObC/draw.mp4", "voice_type": "Chelsie", "system_prompt": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.", "generate_audio": true, "use_audio_in_video": true }
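The example above could be reproduced roughly as follows. This is a minimal sketch, assuming the third-party `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The model reference combines the model name and version ID listed on this page; the helper name `run_example` is hypothetical, and the exact type of the returned object may vary between client versions.

```python
# Input payload copied from the example on this page.
EXAMPLE_INPUT = {
    "video": "https://replicate.delivery/pbxt/MmJqxKbRSknHd9fwtTEbywWqDDdhgsx5tNYLIDnFqJ9j5ObC/draw.mp4",
    "voice_type": "Chelsie",
    "system_prompt": (
        "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, "
        "capable of perceiving auditory and visual inputs, as well as generating "
        "text and speech."
    ),
    "generate_audio": True,
    "use_audio_in_video": True,
}

# "<owner>/<model>:<version id>", using the version ID shown under Version Details.
MODEL_REF = (
    "lucataco/qwen2.5-omni-7b:"
    "0ca8160f7aaf85703a6aac282d6c79aa64d3541b239fa4c5c1688b10cb1faef1"
)


def run_example(model_input=None):
    """Hypothetical helper: send one prediction request and return the result."""
    # Imported lazily so the module loads even without the client installed.
    import replicate

    payload = EXAMPLE_INPUT if model_input is None else model_input
    return replicate.run(MODEL_REF, input=payload)
```

Calling `run_example()` sends a real, billable request, so the sketch defines the helper without invoking it.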
Input Parameters
- audio: Optional audio input
- image: Optional image input
- video: Optional video input
- prompt: Text prompt for the model
- voice_type: Voice type for audio output (for example, "Chelsie")
- system_prompt: System prompt for the model
- generate_audio: Whether to generate audio output
- use_audio_in_video: Whether to use the audio track of the input video
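Since most of these parameters are optional, a request payload can be assembled from only the values that are actually set. The sketch below is illustrative: `build_input` is a hypothetical helper, and its check that at least one of `prompt`, `audio`, `image`, or `video` is present is an assumption of this sketch, not documented model behavior.

```python
from typing import Optional


def build_input(
    prompt: Optional[str] = None,
    audio: Optional[str] = None,
    image: Optional[str] = None,
    video: Optional[str] = None,
    voice_type: Optional[str] = None,
    system_prompt: Optional[str] = None,
    generate_audio: Optional[bool] = None,
    use_audio_in_video: Optional[bool] = None,
) -> dict:
    """Return a payload containing only the parameters that were provided."""
    candidates = {
        "prompt": prompt,
        "audio": audio,
        "image": image,
        "video": video,
        "voice_type": voice_type,
        "system_prompt": system_prompt,
        "generate_audio": generate_audio,
        "use_audio_in_video": use_audio_in_video,
    }
    # Drop unset values so the model's own defaults apply to them.
    payload = {key: value for key, value in candidates.items() if value is not None}

    # Assumption of this sketch: a request with no content at all is a mistake.
    if not any(key in payload for key in ("prompt", "audio", "image", "video")):
        raise ValueError("provide at least one of: prompt, audio, image, video")
    return payload
```

For instance, `build_input(video=url, generate_audio=True)` yields a two-key dictionary and leaves every other parameter to the model's defaults.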
Output Schema
- text: Generated text response
- voice: URL of the generated speech audio (a .wav file in the example output)
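A response with this schema can be unpacked with the standard library alone. The sketch below assumes the output arrives either as a JSON string or as a dictionary-like object with `text` and `voice` keys, as in the example output; `parse_output` and `save_voice` are hypothetical helper names.

```python
import json
import urllib.request


def parse_output(raw):
    """Return (text, voice_url) from a JSON string or a mapping."""
    data = json.loads(raw) if isinstance(raw, str) else dict(raw)
    text = data.get("text", "")
    voice = data.get("voice")
    # The voice entry may be absent or empty when no audio was generated
    # (an assumption of this sketch).
    return text, (str(voice) if voice else None)


def save_voice(voice_url, path="output.wav"):
    """Download the generated speech file to a local path."""
    urllib.request.urlretrieve(voice_url, path)
    return path
```

`save_voice` performs a network download, so it is best called only after `parse_output` has returned a non-empty URL.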
Example Execution Logs
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/qwen_omni_utils/v2_5/audio_process.py:50: UserWarning: PySoundFile failed. Trying audioread instead.
audios.append(librosa.load(path, sr=16000)[0])
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
Deprecated as of librosa version 0.10.0.
It will be removed in librosa version 1.0.
y, sr_native = __audioread_load(path, offset, duration, dtype)
qwen-vl-utils using decord to read video.
Setting `pad_token_id` to `eos_token_id`:8292 for open-end generation.
Version Details
- Version ID: 0ca8160f7aaf85703a6aac282d6c79aa64d3541b239fa4c5c1688b10cb1faef1
- Version Created: April 4, 2025