mingyan

SoundClone - Create Audio Task

Create an audio generation task with a voice cloning model ID

POST /kyyReactApiServer/v1/soundCloning/audios

Create an audio generation task using the modelId returned by a voice cloning preview task. A modelId is valid for 3 days and cannot be used after it expires. The first time a valid modelId is used with this API, it is automatically converted to a permanent model and can then be used for audio generation indefinitely.
All requests must include a Bearer token in the request header:
Authorization: Bearer {{key}}
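
As a minimal sketch, the header above could be built in Python as follows. The helper name build_headers is hypothetical, and the Content-Type value is an assumption (the page only specifies the Authorization header).

```python
def build_headers(api_key: str) -> dict:
    """Return request headers carrying the Bearer token."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: body parameters are sent as JSON; the page does not say.
        "Content-Type": "application/json",
    }
```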

Base URL

https://zcbservice.aizfw.cn/kyyReactApiServer
baseUrl is the shared prefix for all public APIs. The endpoint shown at the top of this page is the full path and already includes this prefix, so the complete request URL is https://zcbservice.aizfw.cn/kyyReactApiServer/v1/soundCloning/audios.
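
The sketch below shows one way to compose the request URL without repeating the shared prefix. The function name full_url is hypothetical; the two constants are copied from this page.

```python
from urllib.parse import urlsplit

BASE_URL = "https://zcbservice.aizfw.cn/kyyReactApiServer"
ENDPOINT_PATH = "/kyyReactApiServer/v1/soundCloning/audios"


def full_url(base_url: str, endpoint_path: str) -> str:
    """Join base URL and endpoint path, dropping a duplicated shared prefix."""
    prefix = urlsplit(base_url).path.rstrip("/")  # e.g. "/kyyReactApiServer"
    if prefix and endpoint_path.startswith(prefix + "/"):
        # The endpoint already carries the prefix, so strip it once.
        endpoint_path = endpoint_path[len(prefix):]
    return base_url.rstrip("/") + "/" + endpoint_path.lstrip("/")
```

Both the full endpoint path and a path without the prefix resolve to the same URL, so callers can pass whichever form they have.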

Request Parameters

modelId (body, string, required)
Voice model ID returned by the preview task query result.
contentText (body, string, required)
Text content to generate audio from. The content must be shorter than 10000 characters.
To control pauses in the speech, insert <#x#> between characters, where x is the pause length in seconds. x must be between 0.01 and 99.99 and may have at most two decimal places.
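
A minimal sketch of building pause markers on the client side, assuming the marker format described above. The helper names pause_marker and join_with_pauses are hypothetical, and the range check simply mirrors the documented limits.

```python
from decimal import Decimal


def pause_marker(seconds) -> str:
    """Return a <#x#> marker for a pause of the given length in seconds."""
    value = Decimal(str(seconds))
    if not (Decimal("0.01") <= value <= Decimal("99.99")):
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    if value != value.quantize(Decimal("0.01")):
        raise ValueError("pause may have at most two decimal places")
    return f"<#{value}#>"


def join_with_pauses(parts, seconds) -> str:
    """Join text fragments with the same pause between each pair."""
    return pause_marker(seconds).join(parts)
```

For example, join_with_pauses(["Hello", "world"], 0.5) yields "Hello<#0.5#>world". Whether the markers count toward the 10000-character limit is not stated on this page.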
soundVersion (body, string)
Voice model version.
  • v1: Model 1, supports 24 languages
  • v2: Model 2, supports 40 languages
language (body, string)
Language type. Defaults to auto when omitted.
Supported by both v1 and v2 (24 values; note that Chinese,Yue, i.e. Cantonese, is a single value): Chinese; Chinese,Yue; English; Arabic; Russian; Spanish; French; Portuguese; German; Turkish; Dutch; Ukrainian; Vietnamese; Indonesian; Japanese; Italian; Korean; Thai; Polish; Romanian; Greek; Czech; Finnish; Hindi.
The following languages require soundVersion to be v2: Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto.
Example: Chinese.
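
The language and version rules above can be checked before sending a request. This is a sketch only: the names are hypothetical, the value lists are copied from this page, and treating Chinese,Yue as one value is an assumption based on v1 supporting 24 languages.

```python
BOTH_VERSIONS = frozenset({
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi",
})

V2_ONLY = frozenset({
    "Bulgarian", "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish",
    "Croatian", "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan",
    "Nynorsk", "Tamil", "Afrikaans", "auto",
})


def language_allowed(language: str, sound_version: str) -> bool:
    """Return True if the language value is listed for the given version."""
    if sound_version not in ("v1", "v2"):
        raise ValueError("sound_version must be 'v1' or 'v2'")
    if language in BOTH_VERSIONS:
        return True
    if language in V2_ONLY:
        return sound_version == "v2"
    return False  # not a documented language value
```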
emotion (body, string)
Emotion type. Defaults to neutral when omitted.
Supported values: happy, sad, angry, fearful, disgusted, surprised, neutral.
Example: happy.
speed (body, BigDecimal)
Speech speed. Valid range: [0.5, 2]. Defaults to 1.0 when omitted. A larger value means faster speech.
Example: 1.2.
vol (body, BigDecimal)
Volume. Valid range: (0, 10], that is, greater than 0 and at most 10. Defaults to 1.0 when omitted. A larger value means louder audio.
Example: 2.5.
pitch (body, integer)
Pitch. Valid range: [-12, 12]; the value must be an integer. Defaults to 0 when omitted, which outputs the original voice tone.
Example: 5.
subtitleEnable (body, boolean)
Whether to generate subtitles. Defaults to false when omitted.
subtitleType (body, string)
Subtitle type. This parameter can be passed when subtitleEnable is true.
  • Omitted: sentence-level subtitles
  • word: word-level subtitles
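
Putting the request parameters together, the following is a minimal sketch of a client-side builder that validates the documented ranges and omits unset optional fields. The function name is hypothetical; sending BigDecimal values as plain JSON numbers and rejecting subtitleType when subtitles are disabled are assumptions, not behavior confirmed by this page.

```python
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def build_audio_request(model_id, content_text, *, sound_version=None,
                        language=None, emotion=None, speed=None, vol=None,
                        pitch=None, subtitle_enable=None, subtitle_type=None):
    """Validate parameters and return the request body as a dict."""
    if not model_id:
        raise ValueError("modelId is required")
    if not content_text or len(content_text) >= 10000:
        raise ValueError("contentText is required and must be shorter than 10000 characters")
    if sound_version is not None and sound_version not in ("v1", "v2"):
        raise ValueError("soundVersion must be v1 or v2")
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError("unsupported emotion")
    if speed is not None and not (0.5 <= speed <= 2):
        raise ValueError("speed must be in [0.5, 2]")
    if vol is not None and not (0 < vol <= 10):
        raise ValueError("vol must be in (0, 10]")
    if pitch is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(pitch, bool) or not isinstance(pitch, int) or not (-12 <= pitch <= 12):
            raise ValueError("pitch must be an integer in [-12, 12]")
    if subtitle_type is not None:
        if subtitle_type != "word":
            raise ValueError("subtitleType must be 'word' or omitted")
        if not subtitle_enable:
            # Assumption: treat this combination as a client-side error.
            raise ValueError("subtitleType requires subtitleEnable to be true")

    body = {
        "modelId": model_id,
        "contentText": content_text,
        "soundVersion": sound_version,
        "language": language,
        "emotion": emotion,
        "speed": speed,
        "vol": vol,
        "pitch": pitch,
        "subtitleEnable": subtitle_enable,
        "subtitleType": subtitle_type,
    }
    # Omit unset fields so the server applies its documented defaults.
    return {key: value for key, value in body.items() if value is not None}
```

For example, build_audio_request("m1", "Hello", speed=1.2, pitch=5) returns a body containing only modelId, contentText, speed, and pitch; the modelId value here is a placeholder.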

Response Parameters

id (string)
Task ID, used to query the task status later.
object (string)
Object type, fixed as audio.
created (integer)
Task creation timestamp.
model (string)
Model name used by the task. For audio generation tasks, it is soundCloningAudio.
status (string)
Task status. It is usually queued after creation.
error (string)
Error message, returned when the task fails.
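
A sketch of submitting the task and reading the response fields listed above, using only the Python standard library. It assumes the body is sent as JSON and that the response fields appear at the top level of a JSON object; neither is confirmed by this page, and the names AudioTask, parse_task, and create_audio_task are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

URL = "https://zcbservice.aizfw.cn/kyyReactApiServer/v1/soundCloning/audios"


@dataclass
class AudioTask:
    id: str
    object: str
    created: int
    model: str
    status: str
    error: Optional[str] = None  # only present when the task fails


def parse_task(data: dict) -> AudioTask:
    """Map a decoded response object onto the documented fields."""
    return AudioTask(
        id=data["id"],
        object=data.get("object", ""),
        created=data.get("created", 0),
        model=data.get("model", ""),
        status=data.get("status", ""),
        error=data.get("error"),
    )


def create_audio_task(api_key: str, body: dict, timeout: float = 30.0) -> AudioTask:
    """POST the request body and return the parsed task (performs network I/O)."""
    request = urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_task(json.loads(response.read().decode("utf-8")))
```

Keep the returned id, since it is needed to query the task status later.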