SoundClone - Create Audio Task

Create an audio generation task with a voice cloning model ID

POST /kyyReactApiServer/v1/soundCloning/audios

Create an audio generation task using the modelId returned by a voice cloning preview task. The modelId is valid for 3 days and cannot be used after it expires. The first time a valid modelId is used with this API, it is automatically converted to a permanent model and can then be used for audio generation indefinitely.

Authentication

Get Key

All requests must include a Bearer token in the request header:

Authorization: Bearer {{key}}

Base URL

https://zcbservice.aizfw.cn/kyyReactApiServer

The base URL is the shared prefix for all public APIs. The endpoint path shown for this page already includes the /kyyReactApiServer prefix, so the full request URL is https://zcbservice.aizfw.cn/kyyReactApiServer/v1/soundCloning/audios.
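As a minimal sketch of how the base URL, endpoint path, and Bearer header fit together, the request could be composed as follows. The helper name build_create_audio_request is hypothetical, and the JSON Content-Type is an assumption based on the parameters being described as body fields; the page itself does not state the body encoding.

```python
import json
import urllib.request

BASE_URL = "https://zcbservice.aizfw.cn/kyyReactApiServer"
ENDPOINT_PATH = "/v1/soundCloning/audios"  # endpoint path with the shared prefix removed


def build_create_audio_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Compose the POST request without sending it (hypothetical helper)."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + ENDPOINT_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the body is sent as JSON.
            "Content-Type": "application/json",
        },
    )


request = build_create_audio_request(
    "YOUR_KEY", {"modelId": "YOUR_MODEL_ID", "contentText": "Hello"}
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any network call is made.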

Request Parameters

modelId (body, string, required)

Voice model ID returned by the preview task query result.

contentText (body, string, required)

Text content to generate audio from. The content must be shorter than 10000 characters.

To insert a pause in the speech, place <#x#> between characters, where x is the pause length in seconds. x must be between 0.01 and 99.99 and may have at most two decimal places.
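The pause syntax can be generated with a small helper that enforces the stated limits. This is only a sketch: the function name pause_marker is hypothetical, and the validation runs locally, so the server may apply its own checks.

```python
from decimal import Decimal


def pause_marker(seconds) -> str:
    """Return a <#x#> pause marker after checking the documented limits."""
    value = Decimal(str(seconds))
    if not value.is_finite():
        raise ValueError("pause length must be a finite number")
    if not (Decimal("0.01") <= value <= Decimal("99.99")):
        raise ValueError("pause length must be between 0.01 and 99.99 seconds")
    if value != value.quantize(Decimal("0.01")):
        raise ValueError("pause length may have at most two decimal places")
    return f"<#{value}#>"


content_text = "Hello" + pause_marker(1.5) + "world"
# content_text == "Hello<#1.5#>world"
```

Using Decimal rather than float keeps the two-decimal-place check exact.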

soundVersion (body, string)

Voice model version.

v1: Model 1, supports 24 languages
v2: Model 2, supports 40 languages

language (body, string)

Language type. Defaults to auto when omitted.

Supported by both v1 and v2 (24 values): Chinese, Chinese,Yue (Cantonese; this is a single value), English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi.

The following languages require soundVersion to be v2: Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto.

Example: Chinese.
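The language and soundVersion rules above can be checked on the client before a request is sent. The sketch below copies the values from the two lists; the function name language_allowed is hypothetical. It treats "Chinese,Yue" as one value, which is what makes the shared list come to the 24 languages stated for v1.

```python
# Values copied from the lists above; "Chinese,Yue" is treated as ONE value.
COMMON_LANGUAGES = frozenset({
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi",
})
V2_ONLY_LANGUAGES = frozenset({
    "Bulgarian", "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish",
    "Croatian", "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan",
    "Nynorsk", "Tamil", "Afrikaans", "auto",
})


def language_allowed(language: str, sound_version: str) -> bool:
    """Return True if the language value is listed for the given soundVersion."""
    if sound_version not in ("v1", "v2"):
        raise ValueError("soundVersion must be 'v1' or 'v2'")
    if language in COMMON_LANGUAGES:
        return True
    return sound_version == "v2" and language in V2_ONLY_LANGUAGES
```

For example, language_allowed("Tamil", "v1") is False, while language_allowed("Tamil", "v2") is True.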

emotion (body, string)

Emotion type. Defaults to neutral when omitted.

Supported values: happy, sad, angry, fearful, disgusted, surprised, neutral.

Example: happy.

speed (body, BigDecimal)

Speech speed. Valid range: [0.5, 2]. Defaults to 1.0 when omitted. A larger value means faster speech.

Example: 1.2.

vol (body, BigDecimal)

Volume. Valid range: (0, 10]. Defaults to 1.0 when omitted. A larger value means louder audio.

Example: 2.5.

pitch (body, integer)

Pitch adjustment. Valid range: [-12, 12], integers only. Defaults to 0 when omitted, which keeps the original voice pitch.

Example: 5.

subtitleEnable (body, boolean)

Whether to generate subtitles. Defaults to false when omitted.

subtitleType (body, string)

Subtitle granularity. This parameter applies only when subtitleEnable is true.

Omitted: sentence-level subtitles
word: word-level subtitles
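Putting the request parameters together, a client might validate the documented ranges and include only the optional fields that were explicitly set, leaving the rest to their server-side defaults. This is a sketch under assumptions: build_payload is a hypothetical name, the length check counts Python string characters (the page does not say how characters are counted), and rejecting subtitleType when subtitles are disabled is a client-side choice rather than documented server behaviour.

```python
from decimal import Decimal

EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def build_payload(model_id, content_text, *, emotion=None, speed=None, vol=None,
                  pitch=None, subtitle_enable=None, subtitle_type=None) -> dict:
    """Build the request body, validating the documented ranges locally."""
    if not model_id:
        raise ValueError("modelId is required")
    if not content_text or len(content_text) >= 10000:
        raise ValueError("contentText is required and must be shorter than 10000 characters")
    payload = {"modelId": model_id, "contentText": content_text}
    if emotion is not None:
        if emotion not in EMOTIONS:
            raise ValueError(f"unsupported emotion: {emotion}")
        payload["emotion"] = emotion
    if speed is not None:
        if not (Decimal("0.5") <= Decimal(str(speed)) <= Decimal("2")):
            raise ValueError("speed must be in [0.5, 2]")
        payload["speed"] = speed
    if vol is not None:
        if not (Decimal("0") < Decimal(str(vol)) <= Decimal("10")):
            raise ValueError("vol must be in (0, 10]")
        payload["vol"] = vol
    if pitch is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(pitch, bool) or not isinstance(pitch, int) or not (-12 <= pitch <= 12):
            raise ValueError("pitch must be an integer in [-12, 12]")
        payload["pitch"] = pitch
    if subtitle_enable is not None:
        payload["subtitleEnable"] = bool(subtitle_enable)
    if subtitle_type is not None:
        if not subtitle_enable:
            raise ValueError("subtitleType requires subtitleEnable to be true")
        if subtitle_type != "word":
            raise ValueError("the only documented subtitleType value is 'word'")
        payload["subtitleType"] = subtitle_type
    return payload


payload = build_payload("YOUR_MODEL_ID", "Hello world", emotion="happy", speed=1.2, pitch=5)
```

The language and soundVersion fields are left out here to keep the sketch short; they would be added to the payload in the same way.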

Response Parameters

id (string)

Task ID, used to query the task status later.

object (string)

Object type. Always audio.

created (integer)

Task creation timestamp.

model (string)

Model name used by the task. For audio generation tasks, it is soundCloningAudio.

status (string)

Task status. Usually queued immediately after creation.

error (string)

Error message, returned when the task fails.
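The response fields above could be mapped onto a small typed record. This sketch assumes the fields arrive as a flat JSON object; the page does not show a raw response, so the actual payload may be wrapped in an outer envelope. The AudioTask class, the parse_task function, and the sample values are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioTask:
    id: str          # task ID, used to query the task status later
    object: str      # documented as always "audio"
    created: int     # task creation timestamp (unit not stated on this page)
    model: str       # "soundCloningAudio" for audio generation tasks
    status: str      # usually "queued" right after creation
    error: Optional[str] = None  # present only when the task fails


def parse_task(raw: str) -> AudioTask:
    """Parse a response body, assuming a flat JSON object with the documented fields."""
    data = json.loads(raw)
    return AudioTask(
        id=data["id"],
        object=data["object"],
        created=data["created"],
        model=data["model"],
        status=data["status"],
        error=data.get("error"),
    )


# Illustrative body only; the values are made up.
sample = ('{"id": "task_123", "object": "audio", "created": 1700000000, '
          '"model": "soundCloningAudio", "status": "queued"}')
task = parse_task(sample)
```

Keeping the returned id is the important part, since it is what a later status query needs.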