pseudoram/rvc-v2

1.5M runs · Jun 2024 · Cog 0.9.10

audio-to-audio speech-style-transfer voice-cloning voice-conversion

About

Speech-to-speech conversion with any RVC v2-trained AI voice.

Example Output


Performance Metrics

8.01s Prediction Time

8.05s Total Time

All Input Parameters

{
  "protect": 0.5,
  "f0_method": "rmvpe",
  "rvc_model": "CUSTOM",
  "index_rate": 1,
  "input_audio": "https://replicate.delivery/pbxt/LAxQbQLiKJZevqiV9Raodpdd6W0ihu3Wnb1K6xCpE6rcUIu5/ttsMP3.com_VoiceText_2024-6-29_0-22-2.mp3",
  "pitch_change": 8,
  "rms_mix_rate": 1,
  "filter_radius": 1,
  "output_format": "mp3",
  "crepe_hop_length": 128,
  "custom_rvc_model_download_url": "https://huggingface.co/Argax/doofenshmirtz-RUS/resolve/main/doofenshmirtz.zip"
}
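The same example input can be submitted from Python. This is a minimal sketch assuming the official `replicate` Python client (`pip install replicate`, with `REPLICATE_API_TOKEN` set); `MODEL_REF` joins the page title with the Version ID listed under Version Details, and `run_conversion` is a hypothetical helper name. Depending on the client version, the returned value may be the output URI as a string or a file-like object wrapping it.

```python
# Model reference: "<owner>/<name>:<version id>", taken from this page.
MODEL_REF = (
    "pseudoram/rvc-v2:"
    "d18e2e0a6a6d3af183cc09622cebba8555ec9a9e66983261fc64c8b1572b7dce"
)

# The example input from "All Input Parameters", unchanged.
EXAMPLE_INPUT = {
    "protect": 0.5,
    "f0_method": "rmvpe",
    "rvc_model": "CUSTOM",
    "index_rate": 1,
    "input_audio": "https://replicate.delivery/pbxt/LAxQbQLiKJZevqiV9Raodpdd6W0ihu3Wnb1K6xCpE6rcUIu5/ttsMP3.com_VoiceText_2024-6-29_0-22-2.mp3",
    "pitch_change": 8,
    "rms_mix_rate": 1,
    "filter_radius": 1,
    "output_format": "mp3",
    "crepe_hop_length": 128,
    "custom_rvc_model_download_url": "https://huggingface.co/Argax/doofenshmirtz-RUS/resolve/main/doofenshmirtz.zip",
}


def run_conversion(inputs=None):
    """Hypothetical helper: submit a prediction and return its output."""
    # Imported lazily so this module loads even without the client installed.
    import replicate

    payload = EXAMPLE_INPUT if inputs is None else inputs
    return replicate.run(MODEL_REF, input=payload)
```

Calling `run_conversion()` blocks until the prediction finishes, which took about 8 seconds in the example run above.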

Input Parameters

protect (number; default 0.33; range 0 to 0.5): Controls how much of the original vocals' breath sounds and voiceless consonants to keep in the AI vocals. Set to 0.5 to disable.
f0_method (default: rmvpe): Pitch detection algorithm. Use 'rmvpe' for clearer vocals or 'mangio-crepe' for smoother vocals.
rvc_model (default: Obama): RVC model for a specific voice. When using a custom model, this must match the name of the downloaded model. If 'custom_rvc_model_download_url' is provided, this is set automatically to the downloaded model's name.
index_rate (number; default 0.5; range 0 to 1): Controls how much of the AI voice's accent to keep in the vocals.
input_audio (string): The input audio file to convert.
pitch_change (number; default 0): Pitch shift applied to the AI vocals, in semitones. Positive values raise the pitch; negative values lower it.
rms_mix_rate (number; default 0.25; range 0 to 1): Controls whether the output follows the original vocals' loudness (0) or uses a fixed loudness (1).
filter_radius (integer; default 3; range 0 to 7): If 3 or greater, applies median filtering to the harvested pitch results.
output_format (default: mp3): 'wav' gives the best quality with large files; 'mp3' gives decent quality with small files.
crepe_hop_length (integer; default 128): When `f0_method` is set to `mangio-crepe`, controls how often pitch changes are checked, in milliseconds.
custom_rvc_model_download_url (string): URL of a custom RVC model. If provided, the model is downloaded and used for prediction regardless of the 'rvc_model' value; as the example logs show, an existing model directory with the same name is overwritten.
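The defaults, ranges, and model-name rule above can be checked client-side before submitting a run. This is a hypothetical sketch, not the model's own validation: `resolve_inputs`, `model_name_from_url`, and `pitch_ratio` are illustrative names, it assumes `input_audio` is required (no default is listed) and that only 'wav' and 'mp3' are accepted, and it assumes the model name is the archive's file name without its extension, which matches the example logs ('doofenshmirtz.zip' becomes 'doofenshmirtz'). `pitch_ratio` shows what `pitch_change` means in frequency terms: a shift of n semitones multiplies pitch by 2^(n/12), so the example's +8 is roughly ×1.59.

```python
import posixpath
from urllib.parse import urlparse

# Defaults and ranges transcribed from the "Input Parameters" list.
DEFAULTS = {
    "protect": 0.33,
    "f0_method": "rmvpe",
    "rvc_model": "Obama",
    "index_rate": 0.5,
    "pitch_change": 0,
    "rms_mix_rate": 0.25,
    "filter_radius": 3,
    "output_format": "mp3",
    "crepe_hop_length": 128,
}
RANGES = {
    "protect": (0.0, 0.5),
    "index_rate": (0.0, 1.0),
    "rms_mix_rate": (0.0, 1.0),
    "filter_radius": (0, 7),
}
# Assumption: only the two formats named in the docs are accepted.
OUTPUT_FORMATS = ("wav", "mp3")


def model_name_from_url(url):
    """Hypothetical: archive file name minus extension ('x/doofenshmirtz.zip' -> 'doofenshmirtz')."""
    filename = posixpath.basename(urlparse(url).path)
    return posixpath.splitext(filename)[0]


def resolve_inputs(user_inputs):
    """Hypothetical client-side check: fill defaults, validate ranges, resolve rvc_model."""
    params = {**DEFAULTS, **user_inputs}
    # Assumption: input_audio is required because no default is documented.
    if not params.get("input_audio"):
        raise ValueError("input_audio is required")
    for name, (low, high) in RANGES.items():
        if not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]!r} is outside [{low}, {high}]")
    if params["output_format"] not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {params['output_format']!r}")
    url = params.get("custom_rvc_model_download_url")
    if url:
        # A custom URL overrides rvc_model with the downloaded model's name.
        params["rvc_model"] = model_name_from_url(url)
    return params


def pitch_ratio(semitones):
    """Frequency multiplier for a pitch_change of `semitones` (12-tone equal temperament)."""
    return 2 ** (semitones / 12)
```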

Output Schema

Output

Type: string • Format: uri
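Because the output is a URI rather than the audio bytes, a client has to fetch it separately. A minimal sketch using only the Python standard library; `local_filename` and `download_output` are hypothetical helper names, and the fallback name `output.mp3` is an assumption matching the default `output_format`.

```python
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def local_filename(output_uri, fallback="output.mp3"):
    """Use the last path segment of the output URI as the local file name."""
    name = posixpath.basename(urlparse(output_uri).path)
    return name or fallback


def download_output(output_uri, dest_path=None):
    """Stream the converted audio at `output_uri` to disk and return the path."""
    dest_path = dest_path or local_filename(output_uri)
    with urllib.request.urlopen(output_uri) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```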

Example Execution Logs

[!] The model will be downloaded as 'doofenshmirtz'.
[!] Voice model directory doofenshmirtz already exists. Overwriting...
[*] Downloading model from https://huggingface.co/Argax/doofenshmirtz-RUS/resolve/main/doofenshmirtz.zip...
[*] Extracting model to /src/rvc_models/doofenshmirtz...
2024-06-28 15:49:59 | INFO | fairseq.tasks.hubert_pretraining | current directory is /src
2024-06-28 15:49:59 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-06-28 15:49:59 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
gin_channels: 256 self.spk_embed_dim: 109
<All keys matched successfully>
[+] Converted audio generated at /src/voice_output/converted_tmp_fayvycvttsMP3.com_VoiceText_2024-6-29_0-22-2.wav
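The HubertModel Config line in the logs is enough to work out the time resolution of the HuBERT features. In fairseq, `conv_feature_layers` lists one `(channels, kernel_size, stride)` tuple per convolution, so the product of the strides gives the hop between feature frames and, with the logged `sample_rate` of 16000, the frame rate. This is a side calculation from the logged values, not code from the model itself.

```python
from math import prod

# (channels, kernel_size, stride) per conv layer, copied from 'conv_feature_layers'.
CONV_FEATURE_LAYERS = [(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512, 2, 2)] * 2
SAMPLE_RATE = 16000  # 'sample_rate' in the HubertPretrainingTask Config


def total_stride(layers):
    """Input samples between consecutive output frames."""
    return prod(stride for _, _, stride in layers)


def receptive_field(layers):
    """Input samples that influence a single output frame."""
    field, jump = 1, 1
    for _, kernel, stride in layers:
        field += (kernel - 1) * jump  # each layer widens the view by (k - 1) input hops
        jump *= stride
    return field


stride = total_stride(CONV_FEATURE_LAYERS)   # 5 * 2**4 * 2**2 = 320 samples (20 ms)
frame_rate = SAMPLE_RATE / stride            # 50.0 frames per second
window = receptive_field(CONV_FEATURE_LAYERS)  # 400 samples (25 ms)
```

The resulting 50 frames per second agrees with the logged `'label_rate': 50.0`.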

Version Details

Version ID: d18e2e0a6a6d3af183cc09622cebba8555ec9a9e66983261fc64c8b1572b7dce
Version Created: June 28, 2024
