AivisSpeech Engine
May 7, 2026
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
AivisSpeech Engine is a Japanese speech synthesis engine based on VOICEVOX ENGINE.
It is built into AivisSpeech, a Japanese speech synthesis application, and makes it easy to generate emotionally expressive speech.
Download AivisSpeech here! / Download AivisSpeech Engine here
Tip
Aivis Cloud API is now available!
It is a real-time speech synthesis API optimized for LLM integration, with latency as low as 0.3 seconds, and you can try it for free!
AivisSpeech Engine is designed for a single user on an ordinary PC. It is not optimized for API-server workloads that must handle many requests quickly.
AivisSpeech Engine is built on ONNX Runtime so that it runs fast even on a CPU alone. On GPU servers, however, it has fundamental bottlenecks in generation speed and scalability.
Keeping API compatibility with VOICEVOX ENGINE also comes at a cost. New features and specification changes are technically hard to add, and the API itself is hard to understand.
Internally, Aivis Cloud API uses "Citoras", a speech synthesis API server product developed from scratch to handle large volumes of synthesis requests on GPU servers.
Citoras keeps the same generation quality and adds a range of enterprise-oriented features that AivisSpeech Engine lacks:
- Outstanding processing speed - On an NVIDIA RTX A4000, generation takes as little as 0.3 seconds. Even 30 seconds of audio can be generated in 0.8 seconds or less.
- Low-latency real-time streaming - Playback can start before generation finishes, which greatly reduces perceived latency in real-time AI chat.
- Many audio formats - WAV / FLAC / MP3 / AAC / Opus are supported, covering uses from real-time streaming to high-fidelity audio.
- High compatibility with existing systems - A subset of SSML is supported, which minimizes the cost of migrating from conventional speech synthesis engines such as Google Cloud Speech / Amazon Polly.
- High-accuracy built-in dictionary plus a full-featured multi-tenant user dictionary API - The built-in dictionary is updated automatically, even while the server is running. You can also add user dictionaries tuned down to Tokyo-dialect accent, part of speech, and priority, and switch between multiple user dictionaries.
  - On Aivis Cloud API, user dictionaries can be edited from the Cloud API dashboard. User dictionary data exported from AivisSpeech can also be imported (under development as of October 2025).
- Model deployment to S3-compatible storage - Model files in AIVM format (.aivm) placed in S3-compatible storage are detected automatically at any time, which enables secure and scalable model management.
  - On Aivis Cloud API, you can upload model files privately to AivisHub and use your own private models through the cloud API!
  - You can of course also use the speech synthesis models published on AivisHub (subject to each model's license terms).
- Making the most of limited hardware - A three-tier GPU VRAM / CPU RAM / SSD strategy with LRU optimization lets a single GPU server run more speech synthesis models than fit in VRAM.
- Robust enterprise operations - It is designed for deployment with Docker. It also provides access control through multiple API keys and APIs for monitoring server load and statistics.
For companies that need to run their own servers, Citoras is offered as an on-premises speech synthesis API server product on a monthly subscription.
Interested companies are welcome to reach out via the contact form!
- For users
- Operating environment
- Supported speech synthesis models
- Installation
- Using the speech synthesis API
- Compatibility with the VOICEVOX API
- Frequently asked questions / Q&A
- Development policy
- Setting up the development environment
- Development
- License
For users
If you are looking for instructions on using AivisSpeech, see the AivisSpeech official website.
This page mainly contains information for developers.
The documentation below is for users.
Operating environment
PCs running Windows, macOS, or Linux are supported.
AivisSpeech Engine needs at least 1.5GB of free memory (RAM) to start.
- Windows: Windows 10 (22H2 or later) / Windows 11
- macOS: macOS 13 Ventura or later
- Linux: Ubuntu 20.04 or later
Tip
The desktop app AivisSpeech supports only Windows and macOS.
The speech synthesis API server AivisSpeech Engine, however, can also be used on Ubuntu / Debian-based Linux.
Note
Operation on Macs with Intel CPUs is not actively tested.
Intel Macs are no longer manufactured, and even preparing test and build environments for them is getting difficult. We recommend using an Apple Silicon Mac where possible.
Warning
On Windows 10, operation has been verified only on version 22H2.
AivisSpeech Engine has been reported to crash and fail to start on older, out-of-support versions of Windows 10 and on LTSC (Long Term Servicing Channel) editions of Windows 10.
For security reasons as well, Windows 10 users are strongly encouraged to update to at least version 22H2 before use.
Supported speech synthesis models
AivisSpeech Engine supports speech synthesis model files in the AIVMX (Aivis Voice Model for ONNX) format (extension .aivmx).
AIVM (Aivis Voice Model) / AIVMX (Aivis Voice Model for ONNX) is an open file format for AI speech synthesis models. It packs the trained model, hyperparameters, style vectors, and speaker metadata (name, overview, license, icon, voice samples, and so on) into a single file.
For details on the AIVM specification and AIVM / AIVMX files, see the AIVM specification drawn up by Aivis Project.
Note
"AIVM" is also the umbrella term for the format and metadata specifications of both AIVM and AIVMX.
Concretely, an AIVM file is a model file in "Safetensors format with AIVM metadata added", and an AIVMX file is a model file in "ONNX format with AIVM metadata added".
"AIVM metadata" means the metadata attached to a trained model, as defined in the AIVM specification.
Important
AivisSpeech Engine is also the reference implementation of the AIVM specification, but it is deliberately designed to support only AIVMX files.
This removes the dependency on PyTorch, which reduces install size, and enables fast CPU inference with ONNX Runtime.
Tip
With AIVM Generator, you can generate AIVM / AIVMX files from existing speech synthesis models and edit the metadata of existing AIVM / AIVMX files!
Supported model architectures
AIVMX files of the following model architectures can be used:
- Style-Bert-VITS2
- Style-Bert-VITS2 (JP-Extra)
Note
The AIVM metadata specification allows multilingual speakers to be defined. Like VOICEVOX ENGINE, however, AivisSpeech Engine supports Japanese speech synthesis only.
Therefore, even a model that supports English or Chinese cannot synthesize any language other than Japanese.
Model file location
Place AIVMX files in the following folder for your OS:
- Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Models
- macOS: ~/Library/Application Support/AivisSpeech-Engine/Models
- Linux: ~/.local/share/AivisSpeech-Engine/Models
The actual folder path is shown as Models directory: in the log right after AivisSpeech Engine starts.
Tip
When using AivisSpeech, you can easily add speech synthesis models from the AivisSpeech UI!
End users are generally recommended to add models this way.
Important
For the development version (when running without a PyInstaller build), the folder is under AivisSpeech-Engine-Dev instead of AivisSpeech-Engine.
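The folder rules above can be expressed as a small shell helper, including the -Dev suffix for development runs. This is a hypothetical sketch: aivis_data_dir is not a command shipped with AivisSpeech Engine. It hard-codes the default Linux and macOS locations listed above, does not cover Windows, and ignores any custom configuration. The Models directory: line in the startup log therefore remains the authoritative path.

```shell
# Hypothetical helper: print the default AivisSpeech Engine data directory.
# Usage: aivis_data_dir [Linux|Darwin] [dev]
#   - The OS argument defaults to the output of `uname -s`.
#   - Pass "dev" for non-PyInstaller (development) runs, which use the -Dev suffix.
aivis_data_dir() {
  local os="${1:-$(uname -s)}"
  local name="AivisSpeech-Engine"
  if [ "${2:-}" = "dev" ]; then
    name="${name}-Dev"
  fi
  case "$os" in
    Linux)  printf '%s\n' "$HOME/.local/share/$name" ;;
    Darwin) printf '%s\n' "$HOME/Library/Application Support/$name" ;;
    *)
      echo "unsupported OS: $os (on Windows, see %APPDATA%\\$name)" >&2
      return 1
      ;;
  esac
}
```

For example, cp my_model.aivmx "$(aivis_data_dir)/Models/" copies a model into place on Linux or macOS. Here my_model.aivmx is a placeholder file name.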
Installation
AivisSpeech Engine offers handy command-line options such as the following:
- --host 0.0.0.0 lets other devices on the same network access AivisSpeech Engine.
- --cors_policy_mode all allows CORS requests from all domains.
- --load_all_models preloads all installed speech synthesis models when AivisSpeech Engine starts.
- --help lists and describes all available options.
Many other options are available. Run with --help for details.
Tip
Running with the --use_gpu option speeds up speech synthesis by using DirectML on Windows and NVIDIA GPUs (CUDA) on Linux.
On Windows, DirectML inference also works on PCs that have only a CPU-integrated GPU (iGPU). In most cases, however, it is much slower than CPU inference, so we do not recommend it.
See the FAQ for details.
Note
By default, AivisSpeech Engine listens on port 10101.
If it conflicts with another application, change the port with the --port option.
Warning
Unlike VOICEVOX ENGINE, some options are not implemented in AivisSpeech Engine.
Windows / macOS
On Windows / macOS you can install AivisSpeech Engine on its own. However, it is easier to launch the AivisSpeech Engine bundled with AivisSpeech by itself.
The AivisSpeech Engine executable (run.exe / run) bundled with AivisSpeech is located at the following paths:
- Windows: C:\Program Files\AivisSpeech\AivisSpeech-Engine\run.exe
  - If installed with user privileges: C:\Users\(username)\AppData\Local\Programs\AivisSpeech\AivisSpeech-Engine\run.exe
- macOS: /Applications/AivisSpeech.app/Contents/Resources/AivisSpeech-Engine/run
  - If installed with user privileges: ~/Applications/AivisSpeech.app/Contents/Resources/AivisSpeech-Engine/run
Note
On first launch, the engine automatically downloads the default model (about 250MB) and the BERT model required for inference (about 650MB). Startup can therefore take up to several minutes.
Please wait until startup completes.
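Scripts that launch run (or the Docker image) can wait for this first start instead of failing on the first request. This is a minimal sketch that assumes GET /version, an endpoint inherited from the VOICEVOX ENGINE API, answers once startup has finished. The name wait_for_aivis, the 2-second polling interval, and the 600-second default timeout are arbitrary choices, not engine settings.

```shell
# Hypothetical helper: poll GET /version until the engine answers or a timeout passes.
# Usage: wait_for_aivis [BASE_URL] [TIMEOUT_SECONDS]
wait_for_aivis() {
  local base="${1:-http://127.0.0.1:10101}" timeout="${2:-600}" waited=0
  # -f makes curl fail on HTTP errors, -s hides the progress meter.
  until curl -sf "$base/version" >/dev/null; do
    if (( waited >= timeout )); then
      echo "AivisSpeech Engine at $base not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}
```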
To add speech synthesis models to AivisSpeech Engine, see "Model file location".
You can also add them in AivisSpeech from "Settings" → "Manage speech synthesis models".
Linux
Running on Linux with an NVIDIA GPU has strict requirements. The CUDA / cuDNN versions supported by ONNX Runtime must match the CUDA / cuDNN versions on the host.
Specifically, the ONNX Runtime used by AivisSpeech Engine requires CUDA 12.x / cuDNN 9.x or later.
Docker works regardless of the host OS environment, so we recommend installing via Docker. GPU containers still need the NVIDIA driver and the NVIDIA Container Toolkit on the host.
Linux + Docker
When running the Docker container, always mount ~/.local/share/AivisSpeech-Engine on the host to /home/user/.local/share/AivisSpeech-Engine-Dev inside the container.
This keeps installed speech synthesis models and the BERT model cache (about 650MB) after the container is stopped and restarted.
To add speech synthesis models to AivisSpeech Engine in Docker, place model files (.aivmx) under ~/.local/share/AivisSpeech-Engine/Models on the host.
Important
Always mount to /home/user/.local/share/AivisSpeech-Engine-Dev.
AivisSpeech Engine in the Docker image is not built with PyInstaller. Its data folder name therefore gets the -Dev suffix and becomes AivisSpeech-Engine-Dev.
Important
For security, AivisSpeech Engine in the Docker container runs with regular user privileges.
Before running the container, create the ~/.local/share/AivisSpeech-Engine directory on the host and set its owner to the user the container runs as (normally uid=1000).
You can check your current user ID with the id command.
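The preparation described above can be scripted before the first docker run. This sketch rests on several assumptions. The names prepare_aivis_docker_dir and CONTAINER_UID are hypothetical, and uid 1000 comes from the note above. The Models subfolder is created only for convenience. The script tries stat -c (GNU) and then stat -f (BSD) to read the owner. It reports a mismatch rather than running chown itself, because changing ownership usually needs sudo.

```shell
# Hypothetical helper: create the host-side data directory for the Docker image
# and check that it is owned by the uid the container runs as.
# Returns 0 when ready, 1 if the directory cannot be created, 2 on owner mismatch.
prepare_aivis_docker_dir() {
  local dir="${1:-$HOME/.local/share/AivisSpeech-Engine}"
  local want_uid="${CONTAINER_UID:-1000}"   # uid=1000 is the usual container user
  local owner
  mkdir -p "$dir/Models" || return 1
  # GNU stat uses -c, BSD/macOS stat uses -f.
  owner=$(stat -c '%u' "$dir" 2>/dev/null || stat -f '%u' "$dir")
  if [ "$owner" != "$want_uid" ]; then
    echo "warning: $dir is owned by uid $owner, expected $want_uid" >&2
    echo "fix with: sudo chown -R $want_uid:$want_uid \"$dir\"" >&2
    return 2
  fi
}
```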
Running on CPU
docker pull ghcr.io/aivis-project/aivisspeech-engine:cpu-latest
docker run --rm -p '10101:10101' \
-v ~/.local/share/AivisSpeech-Engine:/home/user/.local/share/AivisSpeech-Engine-Dev \
ghcr.io/aivis-project/aivisspeech-engine:cpu-latest
Running on NVIDIA GPU (CUDA)
docker pull ghcr.io/aivis-project/aivisspeech-engine:nvidia-latest
docker run --rm --gpus all -p '10101:10101' \
-v ~/.local/share/AivisSpeech-Engine:/home/user/.local/share/AivisSpeech-Engine-Dev \
ghcr.io/aivis-project/aivisspeech-engine:nvidia-latest
Using the speech synthesis API
Running the following one-liner in Bash writes a synthesized WAV file to audio.wav.
Important
This assumes that AivisSpeech Engine is already running. It also assumes that a speech synthesis model (.aivmx) for the style ID is stored under the directory shown as Models directory: in the log.
# STYLE_ID is the style ID to synthesize with; get it separately from the /speakers API
STYLE_ID=888753760 && \
echo -n "こんにちは、音声合成の世界へようこそ！" > text.txt && \
curl -s -X POST "127.0.0.1:10101/audio_query?speaker=$STYLE_ID" --get --data-urlencode text@text.txt > query.json && \
curl -s -H "Content-Type: application/json" -X POST -d @query.json "127.0.0.1:10101/synthesis?speaker=$STYLE_ID" > audio.wav && \
rm text.txt query.json
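The same two requests can be wrapped in a function that takes the text as an argument. The function passes the AudioQuery between calls without temporary files. It is a sketch only: aivis_synthesize and the AIVIS_URL variable are hypothetical names. The -f flag makes curl fail on HTTP errors. A missing style ID or model then shows up as a non-zero exit status, instead of an error body saved as the WAV file.

```shell
# Hypothetical wrapper around POST /audio_query and POST /synthesis.
# Usage: aivis_synthesize STYLE_ID TEXT OUTPUT_WAV
aivis_synthesize() {
  local style_id="$1" text="$2" out="$3"
  local base="${AIVIS_URL:-http://127.0.0.1:10101}"
  local query
  # --get moves the URL-encoded data into the query string; -X POST keeps the method.
  query=$(curl -sf -X POST --get \
    --data-urlencode "speaker=$style_id" \
    --data-urlencode "text=$text" \
    "$base/audio_query") || return 1
  # Send the AudioQuery JSON unchanged to /synthesis and write the WAV file.
  printf '%s' "$query" | curl -sf -X POST \
    -H "Content-Type: application/json" \
    --data-binary @- \
    -o "$out" \
    "$base/synthesis?speaker=$style_id"
}
```

For example: aivis_synthesize 888753760 "こんにちは、音声合成の世界へようこそ！" audio.wav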
Tip
For detailed API request and response specifications, see the API documentation and "Compatibility with the VOICEVOX API". The API documentation always reflects changes in the latest development version.
While AivisSpeech Engine or the AivisSpeech editor is running, you can view the running engine's API documentation (Swagger UI) at http://127.0.0.1:10101/docs.
Compatibility with the VOICEVOX API
AivisSpeech Engine is largely compatible with the VOICEVOX ENGINE HTTP API.
Software that supports the VOICEVOX ENGINE HTTP API should work with AivisSpeech Engine after you simply switch the API URL to http://127.0.0.1:10101.
Important
However, some API clients edit the AudioQuery returned by the /audio_query API before passing it to the /synthesis API. For those clients, synthesis may fail because of specification differences (described below).
For this reason, the AivisSpeech editor can use both AivisSpeech Engine and VOICEVOX ENGINE (via its multi-engine feature), but AivisSpeech Engine is not supported from the VOICEVOX editor.
If you use AivisSpeech Engine from the VOICEVOX editor anyway, limitations in the editor's implementation significantly degrade synthesis quality. AivisSpeech Engine's own parameters cannot be used, and calls to unsupported features may cause errors.
For better results, we strongly recommend using the AivisSpeech editor.
Note
Compatibility should hold for common API use cases. However, speech synthesis systems with fundamentally different model architectures have been forced into a single API specification, so there may be incompatible APIs beyond those listed below.
If you report them in an Issue, we will fix those where compatibility can be improved.
The API changes from VOICEVOX ENGINE are as follows.
Style IDs in AivisSpeech Engine
Each AIVMX file contains an AIVM manifest. In that manifest, the local IDs of speaker styles are numbered sequentially from 0 for each speaker.
For Style-Bert-VITS2 models, this value matches the model hyperparameter data.style2id.
For historical reasons, the VOICEVOX ENGINE API passes only the "style ID" (style_id) to the synthesis APIs and does not specify the "speaker UUID" (speaker_uuid).
VOICEVOX ENGINE ships a fixed set of speakers and styles, so its developers could manage "style IDs" as unique values.
AivisSpeech Engine, in contrast, lets users add speech synthesis models freely.
A VOICEVOX API-compatible "style ID" must therefore stay unique no matter which models are added.
Otherwise, adding a new model could produce style IDs that collide with speaker styles in existing models.
AivisSpeech Engine therefore combines the speaker UUID and the style ID from the AIVM manifest into a globally unique, VOICEVOX API-compatible "style ID".
The generation method is as follows:
- Convert the speaker UUID to an MD5 hash value
- Combine the lower 27 bits of that hash with the 5-bit local style ID (0 ~ 31)
- Convert the result to a 32-bit signed integer
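The three steps above can be sketched in shell. This is an illustrative reimplementation, not the engine's code, and it makes several assumptions:
- the UUID is hashed in its lowercase, hyphenated text form;
- "lower 27 bits" means the low 27 bits of the 128-bit MD5 digest;
- those bits are shifted above the 5-bit local style ID.

The function names are hypothetical. The script uses GNU md5sum; on macOS, md5 -q is the equivalent. Compare its output with /speakers before relying on it.

```shell
# Hypothetical sketch of the documented style ID derivation.
# Usage: aivis_style_id SPEAKER_UUID LOCAL_STYLE_ID
aivis_style_id() {
  local speaker_uuid="$1" local_id="$2" hex
  if (( local_id < 0 || local_id > 31 )); then
    echo "local style ID must be in 0..31" >&2
    return 1
  fi
  # md5sum prints "<32 hex digits>  -"; keep only the digest.
  hex=$(printf '%s' "$speaker_uuid" | md5sum | cut -d' ' -f1)
  # The last 7 hex digits hold the low 28 bits; mask them down to 27 bits.
  local low27=$(( 16#${hex:25:7} & 0x7FFFFFF ))
  # Assumed layout: hash in bits 5..31, local style ID in bits 0..4.
  local combined=$(( (low27 << 5) | local_id ))
  # Reinterpret the 32-bit pattern as a signed integer.
  if (( combined >= 2147483648 )); then
    combined=$(( combined - 4294967296 ))
  fi
  printf '%d\n' "$combined"
}

# Recover the local style ID (low 5 bits) from a global style ID.
aivis_local_style_id() {
  printf '%d\n' $(( $1 & 31 ))
}
```

Under this layout, the 5 low bits identify the style within a speaker, and only 27 bits of the hash distinguish speakers. That is why style IDs of different speakers can collide, as the warning that follows explains.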
Warning
As a result, VOICEVOX API clients that do not expect a "style ID" to be an arbitrary 32-bit signed integer (which may be large or negative) may behave unexpectedly.
Warning
Fitting the ID into a 32-bit signed integer sacrifices the global uniqueness of the speaker UUID. Style IDs of different speakers may therefore collide, although the probability is extremely low.
There is currently no workaround if style IDs collide, but in practice this should not be a problem in almost all cases.
Tip
You can retrieve the VOICEVOX API-compatible "style IDs" that AivisSpeech Engine generates automatically from the /speakers API.
This API returns the list of speakers installed in AivisSpeech Engine.
Changes to the AudioQuery type
The AudioQuery type is the query used to synthesize speech from a given text or phoneme sequence.
The main changes from VOICEVOX ENGINE's AudioQuery type are as follows.
- The meaning of the intonationScale field is different.
  - In VOICEVOX ENGINE it controlled the overall intonation. In AivisSpeech Engine it controls how strongly the speaker style's emotion is expressed.
  - It sets the strength of the selected style's emotional expression in the range 0.0 ~ 2.0 (default: 1.0).
  - The larger the value, the more strongly the voice carries the emotion of the selected style.
  - For example, if the style is "upbeat" (上機嫌), larger values give a happier, brighter delivery.
  - On the other hand, for some speakers or styles, raising the value too far can distort the pronunciation or produce a flat, unnatural monotone.
  - The highest value that still pronounces correctly differs by speaker and style. Adjust it as needed to find the best value.
  - For the Normal style, which is the average of all styles, an appropriate emotional expression is chosen automatically, so this value is ignored.
  - The Style-Bert-VITS2 "style strength" parameter corresponds to AivisSpeech Engine's intonationScale as follows:
    - intonationScale 0.0 ~ 1.0 corresponds to 0.0 ~ 1.0 in Style-Bert-VITS2.
    - intonationScale 1.0 ~ 2.0 corresponds to 1.0 ~ 10.0 in Style-Bert-VITS2.
- A tempoDynamicsScale field has been added.
  - This parameter is specific to AivisSpeech Engine. It sets how strongly the speaking tempo varies, in the range 0.0 ~ 2.0 (default: 1.0).
  - The larger the value, the faster-paced the speech, with more lively, natural intonation.
  - The Style-Bert-VITS2 "tempo variation" parameter corresponds to AivisSpeech Engine's tempoDynamicsScale as follows:
    - tempoDynamicsScale 0.0 ~ 1.0 corresponds to 0.0 ~ 0.2 in Style-Bert-VITS2.
    - tempoDynamicsScale 1.0 ~ 2.0 corresponds to 0.2 ~ 1.0 in Style-Bert-VITS2.
- The pitchScale field behaves differently.
  - Unlike VOICEVOX ENGINE, changing this value from 0.0 may degrade audio quality.
- The pauseLength and pauseLengthScale fields are not supported.
  - They exist for compatibility but are always ignored.
- The kana field behaves differently.
  - In VOICEVOX ENGINE it was a read-only field holding AquesTalk-style notation. In AivisSpeech Engine it holds the plain text to be read aloud.
  - If it is null or an empty string, the engine reads a hiragana string generated automatically from the accent phrases instead. This may lead to unnatural intonation.
  - For more natural results, specify the normal text to be read whenever possible.
For details of the changes, see model.py.
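The two Style-Bert-VITS2 correspondences above can be written as small piecewise-linear converters. The README gives only the endpoints of each segment. Linear interpolation inside a segment is therefore an assumption, and so are the function names. The converters use awk because shell arithmetic is integer-only. Inputs outside 0.0 ~ 2.0 are clamped.

```shell
# Hypothetical converter: intonationScale -> Style-Bert-VITS2 style strength.
# 0.0..1.0 maps to 0.0..1.0, 1.0..2.0 maps to 1.0..10.0 (linear in each segment).
intonation_to_sbv_style_strength() {
  awk -v x="$1" 'BEGIN {
    if (x < 0) x = 0; if (x > 2) x = 2
    y = (x <= 1) ? x : 1 + (x - 1) * 9
    printf "%g\n", y
  }'
}

# Hypothetical converter: tempoDynamicsScale -> Style-Bert-VITS2 tempo variation.
# 0.0..1.0 maps to 0.0..0.2, 1.0..2.0 maps to 0.2..1.0 (linear in each segment).
tempo_dynamics_to_sbv() {
  awk -v x="$1" 'BEGIN {
    if (x < 0) x = 0; if (x > 2) x = 2
    y = (x <= 1) ? x * 0.2 : 0.2 + (x - 1) * 0.8
    printf "%g\n", y
  }'
}
```

For example, intonation_to_sbv_style_strength 1.5 prints 5.5, and tempo_dynamics_to_sbv 1.5 prints 0.6. The default value of 1.0 maps to 1 and 0.2 respectively.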
Changes to the Mora type
The Mora type is a data structure representing one mora of the text being read aloud.
Tip
A mora is the smallest unit of sound as actually pronounced (for example, 「あ」「か」「ん」).
The Mora type is never used on its own in API requests or responses. It is always used indirectly through AudioQuery.accent_phrases[n].moras or AudioQuery.accent_phrases[n].pause_mora.
The main changes from VOICEVOX ENGINE's Mora type are as follows.
- Symbols are treated as moras.
  - VOICEVOX ENGINE treated symbols such as exclamation marks and punctuation as pause_mora. AivisSpeech Engine treats them as ordinary moras.
  - For a symbol mora, text holds the symbol as is and vowel is set to "pau".
- The consonant / vowel fields are read-only.
  - The value of the text field is always used as the reading during synthesis.
  - Changing these fields does not affect the synthesis result.
- The consonant_length / vowel_length / pitch fields are not supported.
  - AivisSpeech Engine's implementation cannot compute these values, so it always returns a dummy value of 0.0.
  - They exist for compatibility but are always ignored.
For details of the changes, see tts_pipeline/model.py.
Changes to the Preset type
The Preset type holds preset information that the editor uses to set the initial values of a synthesis query.
The changes largely mirror the changes to the intonationScale / tempoDynamicsScale / pitchScale / pauseLength / pauseLengthScale fields described for the AudioQuery type.
For details of the changes, see preset/model.py.
API endpoints not supported by AivisSpeech Engine
Warning
Singing synthesis APIs and the cancellable synthesis API are not supported.
They exist as endpoints for compatibility, but always return 501 Not Implemented.
For details, see app/routers/character.py / app/routers/tts_pipeline.py.
- GET /singers
- GET /singer_info
- POST /cancellable_synthesis
- POST /sing_frame_audio_query
- POST /sing_frame_volume
- POST /frame_synthesis
Warning
The /synthesis_morphing API, which provides morphing, is not supported.
Speaking timing differs between speakers, so morphing cannot be implemented properly: it runs, but the result is unbearable to listen to. The API therefore always returns 400 Bad Request.
The /morphable_targets API, which reports whether morphing is allowed for each speaker, reports morphing as disallowed for all speakers.
For details, see app/routers/morphing.py.
- POST /synthesis_morphing
- POST /morphable_targets
API parameters not supported by AivisSpeech Engine
Warning
These exist as parameters for compatibility, but are always ignored.
For details, see app/routers/character.py / app/routers/tts_pipeline.py.
- core_version parameter
  - Specifies the VOICEVOX CORE version.
  - AivisSpeech Engine has no component corresponding to VOICEVOX CORE, so this parameter is always ignored.
- enable_interrogative_upspeak parameter
  - Automatically raises the pitch at the end of interrogative sentences.
  - AivisSpeech Engine always reads text with natural intonation that follows symbols in the text such as 「？」「！」「…」.
  - Simply appending 「？」 to the end of the text, as in どうですか…？, is therefore enough to get question intonation.
Frequently asked questions / Q&A
Tip
Please also see the AivisSpeech editor's FAQ / Q&A.
Q. Raising "emotional expression strength" (intonationScale) makes the pronunciation strange.
This is how the Style-Bert-VITS2 model architecture supported by AivisSpeech Engine currently behaves.
For some speakers and styles, raising intonationScale too far can distort the pronunciation or produce a flat, unnatural monotone.
The highest intonationScale that still pronounces correctly differs by speaker and style.
Adjust it as needed to find the best value for each voice.
Q. The reading or accent differs from what I expected.
AivisSpeech Engine's processing aims to produce the correct reading and accent on the first attempt, but some readings or accents will still be wrong.
Words missing from the built-in dictionary, such as rare proper nouns and personal names (especially unconventional "kira-kira" names), are often read incorrectly.
You can change how these words are read by registering them in the dictionary. Try registering words from the AivisSpeech editor or via the API.
For compound words and English words, dictionary entries may not take effect regardless of word priority. This is the current behavior.
Q. Sending a long text to the synthesis API at once makes the speech unnatural or causes memory leaks.
AivisSpeech Engine is designed to synthesize relatively short units of text, such as one sentence or one coherent chunk of meaning.
Sending a text of more than 1000 characters to the /synthesis API at once may therefore cause problems such as:
- Memory usage spikes and the PC slows down
- A memory leak occurs and AivisSpeech Engine crashes
- The intonation becomes unnatural, like a flat monotone
To synthesize a long text, split it at positions like the following and send each part to the synthesis API separately.
There is no hard limit, but 500 characters or fewer per synthesis call is preferable.
- At punctuation (「。」「、」)
- At breaks in meaning (paragraph boundaries and so on)
- At boundaries of dialogue (parts enclosed in 「」)
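Splitting at sentence-ending punctuation, the first criterion in the list above, is easy to automate. In this minimal sketch, split_sentences is a hypothetical helper that breaks text after 。, ！ and ？. It uses bash's literal string substitution, so it does not depend on the locale. It does not split at 、 or at dialogue brackets, and it does not enforce the 500-character guideline. Very long sentences may still need splitting at 、.

```shell
# Hypothetical helper: print one sentence per line, splitting after 。！？.
split_sentences() {
  local t="$1" nl=$'\n' line
  t="${t//。/。$nl}"
  t="${t//！/！$nl}"
  t="${t//？/？$nl}"
  # Drop empty lines left when the text ends with punctuation.
  while IFS= read -r line; do
    if [ -n "$line" ]; then
      printf '%s\n' "$line"
    fi
  done <<< "$t"
}
```

Each output line can then be sent through /audio_query and /synthesis as in the one-liner under "Using the speech synthesis API", for example inside a while IFS= read -r line; do ...; done loop.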
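Tip
Splitting at breaks in meaning tends to produce speech with more natural intonation.
This is because the engine applies emotional expression and intonation, matched to the content, across the whole text sent in one request.
Splitting the text appropriately resets the emotional expression and intonation for each sentence, which gives a more natural reading.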
Q. Can I use it on an offline PC?
An internet connection is needed only the first time AivisSpeech starts, to download model data.
From the second launch on, you can use it even when the PC is offline.
Q. I want to import / export the dictionary.
You can do this from the settings page of the running AivisSpeech Engine.
While AivisSpeech Engine is running, open http://127.0.0.1:[AivisSpeech Engine port]/setting in a browser to reach its settings page.
The default AivisSpeech Engine port is 10101.
Q. I switched to GPU mode (--use_gpu), but speech generation is slower than in CPU mode.
GPU mode also works on PCs that have only a CPU-integrated GPU (iGPU). In most cases, however, it is much slower than CPU mode, so we do not recommend it.
Integrated GPUs are less powerful than discrete GPUs (dGPUs) and handle heavy workloads such as AI speech synthesis poorly.
Modern CPUs, on the other hand, have improved greatly and can generate speech fast enough on their own.
For this reason, we recommend CPU mode on PCs without a dGPU.
Q. During speech generation, Intel 12th-gen or later CPUs do not reach full performance.
On PCs with Intel 12th-gen or later CPUs (hybrid P-core / E-core design), the Windows power settings can greatly affect speech generation performance.
In the default "Balanced" mode, speech generation tasks tend to be assigned to the power-saving E-cores.
Changing the setting as follows makes full use of both P-cores and E-cores and speeds up speech generation:
- Open Windows 11 Settings
- Go to System → Power
- Change "Power mode" to "Best performance"
⚠️ The "Power plans" in Control Panel also include a "High performance" setting, but it configures something different.
On Intel 12th-gen or later CPUs, we recommend changing "Power mode" in the Windows 11 Settings app.
Q. Is AivisSpeech Engine free to use? Do I need to give credit?
AivisSpeech aims to be free AI speech synthesis software that places no restrictions on how it is used.
Requirements may depend on the licenses of the speech synthesis models used in your work. The software itself, at least, requires no credit and can be used freely for personal, corporate, commercial, or non-commercial purposes.
If you use speech synthesis models published on AivisHub and elsewhere under the ACML / ACML-NC / public domain (CC0) licenses, there is no obligation to give credit.
…That said, we would also like more people to know about AivisSpeech.
If you like, we would be happy if you credited AivisSpeech somewhere in your work. (Any credit format is fine.)
Q. Where can I find AivisSpeech Engine error logs?
They are saved in the following folders:
- Windows: C:\Users\(username)\AppData\Roaming\AivisSpeech-Engine\Logs
- Mac: ~/Library/Application Support/AivisSpeech-Engine/Logs
- Linux: ~/.local/share/AivisSpeech-Engine/Logs
Q. How do I send feedback or report a bug?
For impressions and requests, please tweet on Twitter (X) with the hashtag #AivisSpeech!
If something does not work or you have found a bug, please contact us in one of the following ways.
1. GitHub Issue (recommended)
If you have a GitHub account, reporting through GitHub Issues lets us respond sooner.
2. Contact form
You can also report through the Aivis Project contact form.
Tip
Describing the situation as concretely as possible helps us respond faster. Please include error messages, what you did, and the following:
- What the problem is
- Steps to reproduce (please attach videos or images if you have them)
- OS type and AivisSpeech version
- What you have tried to resolve it
- Whether you use antivirus software or similar (if it seems relevant)
- Error messages shown
- Error logs
Development policy
VOICEVOX is a very large piece of software and is still under active development.
AivisSpeech Engine is therefore developed on top of the latest VOICEVOX ENGINE under the following policy:
- Keep modifications to the necessary minimum, so that it stays easy to follow the latest VOICEVOX release.
  - Rebrand VOICEVOX ENGINE to AivisSpeech Engine only where necessary.
    - Deliberately do not rebrand the voicevox_engine directory, because renaming it would produce a huge diff in import statements.
  - Do not refactor.
    - Refactoring would easily cause conflicts with VOICEVOX ENGINE, and we are not familiar with the whole codebase.
  - Do not delete code, even for features AivisSpeech does not use (such as singing synthesis).
    - Again, this avoids conflicts.
  - Disable unused code by commenting it out rather than deleting it.
    - To keep the diff from VOICEVOX ENGINE minimal, use """ """ rather than # when commenting out large blocks.
    - Configuration files such as the Dockerfile and GitHub Actions workflows, and build tools, are exceptions.
      - These were heavily modified for AivisSpeech Engine from the start, and commenting out would make the code extremely cluttered.
  - Do not update the documentation, since maintaining it and keeping it in sync would be difficult.
    - As a result, the various documents are not updated at all and do not reflect changes made in AivisSpeech Engine.
  - Do not add test code, since keeping tests maintained alongside the AivisSpeech Engine modifications would be difficult.
    - Existing tests are maintained only passively: some parts are fixed or commented out so that the tests pass.
    - Because of AivisSpeech Engine modifications, test-result snapshots differ from VOICEVOX ENGINE.
    - Tests broken by AivisSpeech Engine modifications are commented out rather than fixed.
    - No test code is added for parts newly developed for AivisSpeech Engine, given the maintenance cost.
Setting up the development environment
The procedure differs substantially from the original VOICEVOX ENGINE.
Python 3.11 must be installed beforehand.
# Install uv and pre-commit
pip install uv pre-commit
# Enable the pre-commit hooks
pre-commit install
# Install all dependencies
uv sync --group dev --group build
Development
The procedure differs substantially from the original VOICEVOX ENGINE.
# Start AivisSpeech Engine in the development environment
uv run task serve
# Show AivisSpeech Engine help
uv run task serve --help
# Auto-fix code formatting
uv run task format
# Check code formatting
uv run task lint
# Check for typos with typos
uv run task typos
# Run the tests
uv run task test
# Update the test snapshots
uv run task update-snapshots
# Update license information
uv run task update-licenses
# Build AivisSpeech Engine
uv run task build
License
VOICEVOX ENGINE, the base project, is dual-licensed. AivisSpeech Engine inherits only the LGPL-3.0 license.
The documents below and under docs/ are inherited unmodified from the upstream VOICEVOX ENGINE documentation. There is no guarantee that their contents also apply to AivisSpeech Engine.
Special Thanks
AivisSpeech Engine is deeply supported by many wonderful open-source projects and their contributions.
We sincerely thank everyone who has developed open-source software, and the community, for their contributions and support.
- @Stardust-minus
- @tuna2134
- @googlefan256
- @WariHima
- @Patchethium
- VOICEVOX ENGINE Contributors
- Everyone in AI声づくり技術研究会
VOICEVOX ENGINE
This is the VOICEVOX engine.
It is actually an HTTP server, so you can perform text-to-speech by sending requests to it.
(The editor is VOICEVOX, the core is VOICEVOX CORE, and the overall architecture is described in detail here.)
Table of contents
Guides for each purpose:
- User guide: for those who want to synthesize speech
- Contributor guide: for those who want to contribute
- Developer guide: for those who want to use the code
User guide
Download
Download the engine for your environment from here.
API documentation
Please see the API documentation.
With the VOICEVOX engine or editor running, you can also view the running engine's documentation at http://127.0.0.1:50021/docs.
For future plans and related topics, "Integration with the VOICEVOX speech synthesis engine" may also be helpful.
Docker images
CPU
docker pull voicevox/voicevox_engine:cpu-latest
docker run --rm -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-latest
GPU
docker pull voicevox/voicevox_engine:nvidia-latest
docker run --rm --gpus all -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:nvidia-latest
Troubleshooting
When you use the GPU version, errors may occur in some environments. In that case, adding --runtime=nvidia to docker run may solve the problem.
Sample code for speech synthesis via HTTP requests
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
The generated audio uses a somewhat unusual sampling rate of 24000Hz, so some audio players may not be able to play it.
The value passed as speaker is the id of a style obtained from the /speakers endpoint. It is named speaker for compatibility.
Sample code for adjusting the speech
You can adjust the speech by editing the parameters of the synthesis query returned by /audio_query.
For example, let's make the speaking speed 1.5x.
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
# Use sed to change speedScale to 1.5
sed -i -r 's/"speedScale":[0-9.]+/"speedScale":1.5/' query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio_fast.wav
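The sed edit above extends to any top-level numeric AudioQuery field, including AivisSpeech Engine's intonationScale and tempoDynamicsScale. This is a hedged sketch. set_query_param is a hypothetical helper. It assumes compact JSON like the engine's responses, with no space after the colon. It requires GNU sed for -i without a suffix. It replaces only the first match, so use it for the top-level *Scale fields, not for per-mora fields such as pitch. jq is the more robust choice when it is available.

```shell
# Hypothetical helper: set one numeric top-level field in an AudioQuery JSON file.
# Usage: set_query_param FILE FIELD VALUE
set_query_param() {
  local file="$1" key="$2" value="$3"
  # Refuse to edit if the field is absent, so typos do not pass silently.
  if ! grep -q "\"$key\":" "$file"; then
    echo "no field $key in $file" >&2
    return 1
  fi
  sed -i -E "s/\"$key\":[-0-9.eE+]+/\"$key\":$value/" "$file"
}
```

For example, set_query_param query.json tempoDynamicsScale 1.5 before the /synthesis call gives a faster-paced delivery on AivisSpeech Engine.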
Getting and fixing readings in AquesTalk-style notation
AquesTalk-style notation
"AquesTalk-style notation" specifies readings using only katakana and symbols. It differs in part from the original AquesTalk notation.
AquesTalk-style notation follows these rules:
- All kana are written in katakana.
- Accent phrases are separated by / or 、. A silent pause is inserted only when they are separated by 、.
- Putting _ before a kana devoices that kana.
- The accent position is marked with '. Every accent phrase must have exactly one accent position.
- Putting ？ (full-width) at the end of an accent phrase gives question pronunciation.
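You can check the one-accent-per-phrase rule above locally before sending hand-edited kana to /accent_phrases with is_kana=true. This sketch uses a hypothetical function name, check_aquestalk_kana, and checks only that rule. It splits on / and 、 and counts ' marks with tr. It does not verify katakana-only input, _ placement, or ？ usage, so the engine remains the final judge.

```shell
# Hypothetical pre-flight check: every accent phrase must contain exactly one '.
# Returns 0 if the rule holds, 1 (with a message on stderr) otherwise.
check_aquestalk_kana() {
  local kana="$1" nl=$'\n' phrase marks n=0
  # Put each accent phrase on its own line (split on 、 and /).
  kana="${kana//、/$nl}"
  kana="${kana//\//$nl}"
  while IFS= read -r phrase; do
    n=$((n + 1))
    # Keep only the ASCII apostrophes, then count them.
    marks=$(printf '%s' "$phrase" | tr -cd "'")
    if [ "${#marks}" -ne 1 ]; then
      echo "accent phrase $n ($phrase) has ${#marks} accent marks" >&2
      return 1
    fi
  done <<< "$kana"
}
```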
Sample code for AquesTalk-style notation
The /audio_query response contains the reading chosen by the engine, written in AquesTalk-style notation.
Editing it lets you control the reading and accent of the speech.
# Write the text to be read to text.txt in UTF-8
echo -n "ディープラーニングは万能薬ではありません" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
cat query.json | grep -o -E "\"kana\":\".*\""
# Result... "kana":"ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"
# We want it read as "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン", so
# add is_kana=true to get the intonation and save it to newphrases.json
echo -n "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン" > kana.txt
curl -s \
-X POST \
"127.0.0.1:50021/accent_phrases?speaker=1&is_kana=true" \
--get --data-urlencode text@kana.txt \
> newphrases.json
# Replace the contents of "accent_phrases" in query.json with the contents of newphrases.json
cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @newquery.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
About the user dictionary feature
You can view the user dictionary and add, edit, and delete words via the API.
View
Send a GET request to /user_dict to get the list of user dictionary entries.
curl -s -X GET "127.0.0.1:50021/user_dict"
Add a word
Send a POST request to /user_dict_word to add a word to the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (the accent nucleus position, an integer)
For the accent nucleus position, the following page should be helpful.
The number in the "〇型" (type N) label is the accent nucleus position.
https://tdmelodic.readthedocs.io/ja/latest/pages/introduction.html
On success, the response is the UUID string assigned to the word.
surface="test"
pronunciation="テスト"
accent_type="1"
curl -s -X POST "127.0.0.1:50021/user_dict_word" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
Edit a word
Send a PUT request to /user_dict_word/{word_uuid} to edit a word in the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (the accent nucleus position, an integer)
You can find word_uuid in the response when you add the word, or by viewing the user dictionary.
On success, the response is 204 No Content.
surface="test2"
pronunciation="テストツー"
accent_type="2"
# Replace word_uuid with a value from your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X PUT "127.0.0.1:50021/user_dict_word/$word_uuid" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
Delete a word
Send a DELETE request to /user_dict_word/{word_uuid} to delete a word from the user dictionary.
You can find word_uuid in the response when you add the word, or by viewing the user dictionary.
On success, the response is 204 No Content.
# Replace word_uuid with a value from your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid"
Importing and exporting the dictionary
You can import and export the user dictionary in the "Export & import user dictionary" section of the engine's settings page.
You can also import and export the user dictionary through the API.
Use POST /import_user_dict to import and GET /user_dict to export.
See the API documentation for details on the arguments.
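A minimal sketch of the API route, assuming the JSON returned by GET /user_dict can be sent back unchanged as the body of POST /import_user_dict. The `override` query parameter (whether existing entries are overwritten) is taken from VOICEVOX ENGINE's API as we understand it, so confirm the exact parameters in the API documentation. The function names are hypothetical.

```shell
ENGINE_URL="${ENGINE_URL:-127.0.0.1:50021}"

# Save the whole user dictionary to FILE.
# Usage: export_user_dict FILE
export_user_dict() {
  curl -sf "$ENGINE_URL/user_dict" > "$1"
}

# Import a dictionary file saved by export_user_dict.
# Usage: import_user_dict FILE [true|false]   (override; assumed parameter, defaults to false)
import_user_dict() {
  local file="$1" override="${2:-false}"
  # --data-binary sends the file byte for byte (plain -d would strip newlines).
  curl -sf -X POST \
    -H "Content-Type: application/json" \
    --data-binary @"$file" \
    "$ENGINE_URL/import_user_dict?override=$override"
}
```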
About the preset feature
By editing presets.yaml in the user directory, you can use presets for characters, speaking speed, and so on.
echo -n "プリセットをうまく活用すれば、サードパーティ間で同じ設定を使うことができます" >text.txt
# Get preset information
curl -s -X GET "127.0.0.1:50021/presets" > presets.json
preset_id=$(cat presets.json | sed -r 's/^.+"id"\:\s?([0-9]+?).+$/\1/g')
style_id=$(cat presets.json | sed -r 's/^.+"style_id"\:\s?([0-9]+?).+$/\1/g')
# Get a query for speech synthesis
curl -s \
-X POST \
"127.0.0.1:50021/audio_query_from_preset?preset_id=$preset_id" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesize speech
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=$style_id" \
> audio.wav
- speaker_uuid can be checked via /speakers
- id values must not be duplicated
- Changes to the file made after the engine starts are reflected in the engine
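The notes above refer to fields of presets.yaml. The heredoc below writes a one-entry file as a starting point. The field names follow VOICEVOX ENGINE's preset model as far as we know, and every value is a placeholder (the speaker_uuid is borrowed from the speaker_info example in this section): replace speaker_uuid and style_id with values from GET /speakers on your engine, and compare the fields with the output of GET /presets.

```shell
# Placeholder preset; field names assumed from VOICEVOX ENGINE, values are examples only.
cat > presets.yaml <<'EOF'
- id: 1
  name: sample preset
  speaker_uuid: 7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff
  style_id: 0
  speedScale: 1.0
  pitchScale: 0.0
  intonationScale: 1.0
  volumeScale: 1.0
  prePhonemeLength: 0.1
  postPhonemeLength: 0.1
EOF
```

Place the file in the user directory, or point the engine at it with the --preset_file option, e.g. `uv run run.py --preset_file presets.yaml`.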
Sample code for morphing between two styles
/synthesis_morphing generates morphed speech from the speech synthesized with each of two styles.
echo -n "モーフィングを利用することで、２種類の声を混ぜることができます。" > text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=8" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesis result in the original style
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=8" \
> audio.wav
export MORPH_RATE=0.5
# Note: this takes time, since it synthesizes speech with both styles and then runs WORLD-based synthesis
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
export MORPH_RATE=0.9
# When query, base_speaker, and target_speaker are unchanged, a cache is used, so generation is relatively fast
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
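Because the engine reuses its cache while query, base_speaker and target_speaker stay the same, trying several morph rates in a row is relatively cheap. A sketch of such a sweep; morph_sweep and ENGINE_URL are hypothetical names, and the check that morph_rate lies between 0 and 1 (0 sounding like the base style, 1 like the target style) reflects our understanding of the parameter rather than something stated in this document.

```shell
ENGINE_URL="${ENGINE_URL:-127.0.0.1:50021}"

# Synthesize one query at several morph rates, writing morph_<rate>.wav for each.
# Usage: morph_sweep QUERY_JSON BASE_STYLE TARGET_STYLE RATE...
morph_sweep() {
  local query="$1" base="$2" target="$3" rate
  shift 3
  for rate in "$@"; do
    # Skip rates outside [0, 1] instead of sending a request the engine would reject.
    if ! awk -v r="$rate" 'BEGIN { exit !(r >= 0 && r <= 1) }'; then
      echo "skip: morph_rate $rate is outside [0, 1]" >&2
      continue
    fi
    curl -sf \
      -H "Content-Type: application/json" \
      -X POST \
      -d @"$query" \
      "$ENGINE_URL/synthesis_morphing?base_speaker=$base&target_speaker=$target&morph_rate=$rate" \
      > "morph_${rate}.wav" || return 1
  done
}
```

For example, `morph_sweep query.json 8 10 0.0 0.25 0.5 0.75 1.0` writes five files that can be compared side by side.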
Sample code for getting a character's additional information
This code retrieves portrait.png from the character's additional information.
(It uses jq to parse the JSON.)
curl -s -X GET "127.0.0.1:50021/speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff" \
| jq -r ".portrait" \
| base64 -d \
> portrait.png
Cancellable speech synthesis
With /cancellable_synthesis, computational resources are released immediately when the connection is closed.
(With /synthesis, the synthesis computation runs to completion even if the connection is closed.)
This API is an experimental feature and is enabled only when the engine is started with the --enable_cancellable_synthesis argument.
The parameters required for synthesis are the same as for /synthesis.
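A minimal sketch of a request that is abandoned after a time limit. It assumes the engine was started with `uv run run.py --enable_cancellable_synthesis` and that query.json came from /audio_query as in the earlier examples; cancellable_synthesize and ENGINE_URL are hypothetical names. When curl's --max-time limit is reached, curl closes the connection, which is the point at which /cancellable_synthesis is described as releasing its resources.

```shell
ENGINE_URL="${ENGINE_URL:-127.0.0.1:50021}"

# Request synthesis but give up after MAX_SECONDS (default 10).
# Usage: cancellable_synthesize QUERY_JSON STYLE_ID OUTPUT_WAV [MAX_SECONDS]
cancellable_synthesize() {
  local query="$1" style_id="$2" out="$3" max_seconds="${4:-10}"
  # On timeout curl aborts, closes the connection, and exits with status 28.
  curl -sf --max-time "$max_seconds" \
    -H "Content-Type: application/json" \
    -X POST \
    -d @"$query" \
    "$ENGINE_URL/cancellable_synthesis?speaker=$style_id" \
    > "$out"
}
```

For example: `cancellable_synthesize query.json 1 audio.wav 5 || echo "cancelled or failed (curl exit status $?)"`.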
Sample code for singing voice synthesis over HTTP
echo -n '{
"notes": [
{ "key": null, "frame_length": 15, "lyric": "" },
{ "key": 60, "frame_length": 45, "lyric": "ド" },
{ "key": 62, "frame_length": 45, "lyric": "レ" },
{ "key": 64, "frame_length": 45, "lyric": "ミ" },
{ "key": null, "frame_length": 15, "lyric": "" }
]
}' > score.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @score.json \
"127.0.0.1:50021/sing_frame_audio_query?speaker=6000" \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/frame_synthesis?speaker=3001" \
> audio.wav
楜èã®key㯠MIDI çªå·ã§ãã
lyricã¯æè©ã§ãä»»æã®æååãæå®ã§ããŸããããšã³ãžã³ã«ãã£ãŠã¯ã²ãããªã»ã«ã¿ã«ãïŒã¢ãŒã©ä»¥å€ã®æååã¯ãšã©ãŒã«ãªãããšããããŸãã
ãã¬ãŒã ã¬ãŒãã¯ããã©ã«ãã 93.75Hz ã§ããšã³ãžã³ãããã§ã¹ãã®frame_rateã§ååŸã§ããŸãã
ïŒã€ç®ã®ããŒãã¯ç¡é³ã§ããå¿
èŠããããŸãã
/sing_frame_audio_queryã§æå®ã§ããspeakerã¯ã/singersã§ååŸã§ããã¹ã¿ã€ã«ã®å
ãçš®é¡ãsingãsinging_teacherãªã¹ã¿ã€ã«ã®style_idã§ãã
/frame_synthesisã§æå®ã§ããspeakerã¯ã/singersã§ååŸã§ããã¹ã¿ã€ã«ã®å
ãçš®é¡ãframe_decodeã®style_idã§ãã
åŒæ°ã speaker ãšããååã«ãªã£ãŠããã®ã¯ãä»ã® API ãšäžè²«æ§ãããããããã§ãã
/sing_frame_audio_queryãš/frame_synthesisã«ç°ãªãã¹ã¿ã€ã«ãæå®ããããšãå¯èœã§ãã
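Since note lengths are given in frames, converting from seconds or beats is a common first step when writing a score. A sketch in awk, assuming the default 93.75 Hz frame rate (read frame_rate from your engine's manifest if it differs); the function names and FRAME_RATE variable are hypothetical. Each value is rounded to a whole frame, so a long score can drift by up to half a frame per note.

```shell
# Default frame rate; override FRAME_RATE if your engine's manifest says otherwise.
FRAME_RATE="${FRAME_RATE:-93.75}"

# Convert a frame_length value to seconds (3 decimal places).
frames_to_seconds() {
  awk -v f="$1" -v r="$FRAME_RATE" 'BEGIN { printf "%.3f\n", f / r }'
}

# Convert seconds to the nearest whole number of frames.
seconds_to_frames() {
  awk -v s="$1" -v r="$FRAME_RATE" 'BEGIN { printf "%d\n", s * r + 0.5 }'
}

# Convert a length in beats at a given tempo to the nearest whole number of frames.
# Usage: beats_to_frames BPM BEATS
beats_to_frames() {
  awk -v bpm="$1" -v b="$2" -v r="$FRAME_RATE" 'BEGIN { printf "%d\n", b * 60 / bpm * r + 0.5 }'
}
```

With the default rate, each 45-frame note in score.json lasts 0.48 s, and a quarter note at 120 BPM comes out as 47 frames (46.875 rounded).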
CORS settings
For security, VOICEVOX only accepts requests whose Origin is localhost, 127.0.0.1, app://, a browser extension URI, or absent; requests from any other Origin are rejected.
As a result, some third-party apps may not be able to receive responses.
As a workaround, the engine provides a UI for changing this setting.
How to configure
- Open http://127.0.0.1:50021/setting.
- Change or add settings to match the apps you use.
- Press the Save button to confirm the changes.
- The engine must be restarted for the settings to take effect. Restart it as needed.
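Besides the settings page, the engine options include --cors_policy_mode and --allow_origin, which take precedence over the settings file. To see whether a given origin is accepted, you can send a request with an Origin header and look for Access-Control-Allow-Origin in the response. A sketch; cors_allow_origin is a hypothetical name, and it assumes a GET /version endpoint as in VOICEVOX ENGINE. An empty result means the origin was not allowed.

```shell
ENGINE_URL="${ENGINE_URL:-127.0.0.1:50021}"

# Print the Access-Control-Allow-Origin value returned for ORIGIN (empty if none).
# Usage: cors_allow_origin ORIGIN
cors_allow_origin() {
  local origin="$1"
  # -D - dumps the response headers to stdout; -o /dev/null discards the body.
  curl -s -o /dev/null -D - -H "Origin: $origin" "$ENGINE_URL/version" \
    | tr -d '\r' \
    | awk -F': ' 'tolower($1) == "access-control-allow-origin" { print $2 }'
}
```

For example, start the engine with `uv run run.py --allow_origin http://localhost:3000`, then run `cors_allow_origin http://localhost:3000`.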
Disabling APIs that modify data
By specifying the command-line argument --disable_mutable_api or the environment variable VV_DISABLE_MUTABLE_API=1, you can disable the APIs that change the engine's settings, dictionary, and other data.
Character encoding
All requests and responses use UTF-8.
Changing how English words are read
By default, English words that are not registered in the dictionary are read with natural katakana pronunciations.
To disable this feature, set the enable_katakana_english parameter of /audio_query to false.
echo -n "こんにちは、voice synthesisのworldへwelcome" >text.txt
# Read as "こんにちは、ボイス シンセシスの..." (English words converted to katakana)
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
# Read as "こんにちは、ボイス エスワイエヌティー..." (unregistered English words are read letter by letter)
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1&enable_katakana_english=false" \
--get --data-urlencode text@text.txt \
> disabled_query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @disabled_query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> disabled_audio.wav
Other arguments
You can pass arguments when starting the engine. For details, check the help with the -h argument.
$ uv run run.py -h
usage: run.py [-h] [--host HOST] [--port PORT] [--use_gpu | --no-use_gpu] [--voicevox_dir VOICEVOX_DIR] [--voicelib_dir VOICELIB_DIR] [--runtime_dir RUNTIME_DIR] [--enable_mock]
[--enable_cancellable_synthesis] [--init_processes INIT_PROCESSES] [--load_all_models] [--cpu_num_threads CPU_NUM_THREADS] [--output_log_utf8] [--cors_policy_mode {all,localapps}]
[--allow_origin [ALLOW_ORIGIN ...]] [--setting_file SETTING_FILE] [--preset_file PRESET_FILE] [--disable_mutable_api]
The VOICEVOX engine.
options:
-h, --help show this help message and exit
--host HOST Host address to accept connections on. If not specified, the value of the environment variable VV_HOST is used instead.
--port PORT Port number to accept connections on. If not specified, the value of the environment variable VV_PORT is used instead.
--use_gpu, --no-use_gpu
Whether to use the GPU for speech synthesis. If not specified, the value of the environment variable VV_USE_GPU is used instead. The GPU is used if VV_USE_GPU is 1, and not used if it is 0, empty, or unset.
--voicevox_dir VOICEVOX_DIR
Path to the VOICEVOX directory.
--voicelib_dir VOICELIB_DIR
Path to the VOICEVOX CORE directory.
--runtime_dir RUNTIME_DIR
Path to the directory of libraries used by VOICEVOX CORE.
--enable_mock Synthesize speech with a mock instead of VOICEVOX CORE.
--enable_cancellable_synthesis
Allow speech synthesis to be cancelled midway.
--init_processes INIT_PROCESSES
Number of processes created when the cancellable_synthesis feature is initialized.
--load_all_models Load all speech synthesis models at startup.
--cpu_num_threads CPU_NUM_THREADS
Number of threads used for speech synthesis. If not specified, the value of the environment variable VV_CPU_NUM_THREADS is used instead. If VV_CPU_NUM_THREADS is neither an empty string nor a number, the engine exits with an error.
--output_log_utf8 Write log output in UTF-8. If not specified, the value of the environment variable VV_OUTPUT_LOG_UTF8 is used instead. If VV_OUTPUT_LOG_UTF8 is 1, UTF-8 is used; if it is 0, empty, or unset, the encoding is chosen automatically depending on the environment.
--cors_policy_mode {all,localapps}
CORS permission mode: all or localapps. all allows every origin. localapps limits the cross-origin resource sharing policy to app://., localhost-related origins, and browser extension URIs. Other origins can be added with the allow_origin option. The default is localapps. This option takes precedence over the settings file specified with --setting_file.
--allow_origin [ALLOW_ORIGIN ...]
Origins to allow. Separate multiple origins with spaces. This option takes precedence over the settings file specified with --setting_file.
--setting_file SETTING_FILE
Path to a settings file.
--preset_file PRESET_FILE
Path to a preset file. If not specified, the engine checks the environment variable VV_PRESET_FILE and then presets.yaml in the user directory, in that order.
--disable_mutable_api
Disable the APIs that modify the engine's static data, such as dictionary registration and settings changes. If not specified, the value of the environment variable VV_DISABLE_MUTABLE_API is used instead. The APIs are disabled if VV_DISABLE_MUTABLE_API is 1; the variable is ignored if it is 0, empty, or unset.
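Using the engine behind a proxy
When running the engine behind a reverse proxy or similar, set the environment variable FORWARDED_ALLOW_IPS to the proxy's IP address so that URLs returned by the API, among other things, are generated correctly.
Updating
Delete all files in the engine directory and replace them with the new ones.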
Contributor guide
VOICEVOX ENGINE welcomes your contributions!
For details, see CONTRIBUTING.md.
Development discussions and casual chat also take place on the unofficial VOICEVOX Discord server. Feel free to join.
When you open a pull request that resolves an Issue, to avoid working on the same Issue as someone else, we recommend that you either say on the Issue that you have started working on it or open a Draft pull request first.
Developer guide
Setting up the environment
The engine is developed with Python 3.11.9.
Installation requires a C/C++ compiler and CMake for your OS.
uv is used as the package manager. For how to install uv, see its official documentation.
# Install the runtime environment
uv sync
# Install the development, test, and build environments
uv sync --all-groups
Running
Check the details of the command-line arguments with the following command.
uv run run.py --help
# Start the server with the product version of VOICEVOX
VOICEVOX_DIR="C:/path/to/VOICEVOX/vv-engine" # Path to the ENGINE inside the product-version VOICEVOX directory
uv run run.py --voicevox_dir=$VOICEVOX_DIR
# Start the server with a mock
uv run run.py --enable_mock
# Switch the log to UTF-8
uv run run.py --output_log_utf8
# Or: VV_OUTPUT_LOG_UTF8=1 uv run run.py
Specifying the number of CPU threads
If the number of CPU threads is not specified, half the number of logical cores is used. (On most CPUs, this is half of the total processing capacity.)
If you want to adjust how much processing capacity the engine uses, for example when running on IaaS or on a dedicated server, specify the number of CPU threads.
- Specify it with a command-line argument:
uv run run.py --voicevox_dir=$VOICEVOX_DIR --cpu_num_threads=4
- Specify it with an environment variable:
export VV_CPU_NUM_THREADS=4
uv run run.py --voicevox_dir=$VOICEVOX_DIR
Using older versions of the core
You can use VOICEVOX Core 0.5.4 or later.
The libtorch version of the core is not supported on Mac.
Specifying an older binary
If you pass the directory of the product version of VOICEVOX, or of a compiled engine, with the --voicevox_dir argument, the core of that version is used.
uv run run.py --voicevox_dir="/path/to/VOICEVOX/vv-engine"
On Mac, DYLD_LIBRARY_PATH must be set.
DYLD_LIBRARY_PATH="/path/to/voicevox" uv run run.py --voicevox_dir="/path/to/VOICEVOX/vv-engine"
Specifying a voice library directly
Pass the directory where you extracted the VOICEVOX Core zip file with the --voicelib_dir argument.
Also pass the directory of libtorch or onnxruntime (shared libraries) that matches the core version with the --runtime_dir argument.
However, if libtorch or onnxruntime is on the system's search path, the --runtime_dir argument is not needed.
The --voicelib_dir and --runtime_dir arguments can each be given multiple times.
To choose the core version at an API endpoint, pass the core_version argument. (If it is omitted, the latest core is used.)
uv run run.py --voicelib_dir="/path/to/voicevox_core" --runtime_dir="/path/to/libtorch_or_onnx"
On Mac, set DYLD_LIBRARY_PATH instead of using the --runtime_dir argument.
DYLD_LIBRARY_PATH="/path/to/onnx" uv run run.py --voicelib_dir="/path/to/voicevox_core"
Placing libraries in the user directory
Voice libraries in the following directories are loaded automatically:
- Build version: <user_data_dir>/voicevox-engine/core_libraries/
- Python version: <user_data_dir>/voicevox-engine-dev/core_libraries/
<user_data_dir> differs by OS:
- Windows: C:\Users\<username>\AppData\Local\
- macOS: /Users/<username>/Library/Application\ Support/
- Linux: /home/<username>/.local/share/
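To print the directory for the current machine, a sketch like the following may help. It assumes that on Linux XDG_DATA_HOME, when set, replaces ~/.local/share (the usual convention for per-user data directories), and that on Windows you are in Git Bash or a similar shell that exposes LOCALAPPDATA; core_libraries_dir is a hypothetical name.

```shell
# Print the core_libraries directory the engine scans.
# Usage: core_libraries_dir build|python
core_libraries_dir() {
  local app base
  case "${1:-python}" in
    build) app="voicevox-engine" ;;
    python) app="voicevox-engine-dev" ;;
    *) echo "usage: core_libraries_dir build|python" >&2; return 1 ;;
  esac
  case "$(uname -s)" in
    Darwin) base="$HOME/Library/Application Support" ;;
    Linux) base="${XDG_DATA_HOME:-$HOME/.local/share}" ;;   # XDG handling is an assumption
    MINGW*|MSYS*|CYGWIN*) base="${LOCALAPPDATA:?LOCALAPPDATA is not set}" ;;
    *) echo "unsupported OS: $(uname -s)" >&2; return 1 ;;
  esac
  printf '%s/%s/core_libraries/\n' "$base" "$app"
}
```

For example, `mkdir -p "$(core_libraries_dir python)"` creates the directory, into which you can copy an extracted voice library.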
Build
You can build locally by packaging the engine with pyinstaller.
For the detailed procedure, see the Build (ビルド) section of the contributor guide.
If you use GitHub, you can build with GitHub Actions in your fork of the repository.
Turn Actions on and trigger build-engine.yml through workflow_dispatch to build.
The artifacts are uploaded to Releases.
For the GitHub Actions settings the build requires, see the GitHub Actions section of the contributor guide.
Tests and static analysis
You can run tests with pytest and static analysis with various linters.
For the detailed procedure, see the Tests (テスト) and Static analysis (静的解析) sections of the contributor guide.
Dependencies
Dependencies are managed with uv. Note that there are license restrictions on which dependency libraries can be introduced.
For details, see the Packages (パッケージ) section of the contributor guide.
About the multi-engine feature
The VOICEVOX editor can run multiple engines at the same time. With this feature, you can run your own speech synthesis engine, or an existing one, inside the VOICEVOX editor.
How the multi-engine feature works
The multi-engine feature works by starting the Web APIs of multiple VOICEVOX API-compliant engines on separate ports and handling them uniformly. The editor starts each engine through its executable binary, associates it with an EngineID, and manages its settings and state individually.
How to support the multi-engine feature
You can support it by creating an executable binary that starts a VOICEVOX API-compliant engine. The easiest way is to fork the VOICEVOX ENGINE repository and modify part of its functionality.
There are three things to modify: engine information, character information, and speech synthesis.
Engine information is managed in the manifest file (engine_manifest.json) at the repository root.
A manifest file in this format is required for VOICEVOX API-compliant engines.
Review the information in the manifest file and change it as appropriate.
Depending on the synthesis method, your engine may not be able to offer some VOICEVOX features, such as morphing.
In that case, adjust the entries under supported_features in the manifest file accordingly.
Character information is managed in files under the resources/character_info directory.
Dummy icons and similar files are provided; replace them as appropriate.
Speech synthesis is performed in voicevox_engine/tts_pipeline/tts_engine.py.
In the VOICEVOX API, synthesis works as follows: the engine creates an initial AudioQuery (the query for speech synthesis) and returns it to the user, the user edits the query as needed, and then the engine synthesizes speech according to the query.
Queries are created at the /audio_query endpoint and speech is synthesized at the /synthesis endpoint; supporting at least these two makes an engine VOICEVOX API-compliant.
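When porting a fork, a quick check of those two endpoints can catch breakage early. The sketch below requests a query from /audio_query, passes it unchanged to /synthesis, and checks that the result starts with the RIFF header of a WAV file. smoke_test_engine, ENGINE_URL and the default text are hypothetical; the request shapes mirror the curl examples earlier in this document.

```shell
ENGINE_URL="${ENGINE_URL:-127.0.0.1:50021}"

# Usage: smoke_test_engine STYLE_ID [TEXT]
smoke_test_engine() {
  local style_id="$1" text="${2:-テスト}" workdir
  workdir=$(mktemp -d) || return 1
  # Step 1: ask the engine for its initial AudioQuery.
  if ! curl -sf -X POST "$ENGINE_URL/audio_query?speaker=$style_id" \
      --get --data-urlencode "text=$text" > "$workdir/query.json"; then
    echo "audio_query failed" >&2
    return 1
  fi
  # Step 2: synthesize from that query without editing it.
  if ! curl -sf -H "Content-Type: application/json" -X POST \
      -d @"$workdir/query.json" \
      "$ENGINE_URL/synthesis?speaker=$style_id" > "$workdir/audio.wav"; then
    echo "synthesis failed" >&2
    return 1
  fi
  # A WAV file starts with the ASCII bytes "RIFF".
  if [ "$(head -c 4 "$workdir/audio.wav")" != "RIFF" ]; then
    echo "synthesis did not return WAV data" >&2
    return 1
  fi
  echo "OK: $workdir/audio.wav"
}
```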
Distributing an engine that supports the multi-engine feature
We recommend distributing it as a VVPP file.
VVPP stands for "VOICEVOX Plugin Package"; it is a Zip file of a directory containing the built engine and related files.
If you give it the .vvpp extension, it can be installed into the VOICEVOX editor by double-clicking.
The editor extracts the received VVPP file to the local disk and then locates files according to the engine_manifest.json directly under the root.
If the VOICEVOX editor cannot load it properly, check the editor's error log.
You can also split xxx.vvpp into sequentially numbered files such as xxx.0.vvppp for distribution.
This is useful when the file is too large to distribute easily.
The vvpp and vvppp files required for installation are listed in the vvpp.txt file.
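Packaging can be scripted. The sketch below zips a build directory so that engine_manifest.json sits at the root of the archive, which is where the editor looks for it, and can split the result into numbered .vvppp parts. It assumes Info-ZIP's zip and GNU split (the --additional-suffix option is not available in BSD/macOS split); the function names and part sizes are illustrative.

```shell
# Create a .vvpp from BUILD_DIR, with engine_manifest.json at the archive root.
# Usage: make_vvpp BUILD_DIR OUTPUT.vvpp
make_vvpp() {
  local build_dir="$1" out="$2"
  if [ ! -f "$build_dir/engine_manifest.json" ]; then
    echo "engine_manifest.json not found at the root of $build_dir" >&2
    return 1
  fi
  # Zip the directory's contents (not the directory itself); "-" writes the zip to stdout.
  ( cd "$build_dir" && zip -qr - . ) > "$out"
}

# Split NAME.vvpp into NAME.0.vvppp, NAME.1.vvppp, ... next to the original file.
# Usage: split_vvpp NAME.vvpp PART_SIZE   (PART_SIZE as accepted by split -b, e.g. 1900M)
split_vvpp() {
  local vvpp="$1" part_size="$2" dir base
  dir=$(dirname "$vvpp")
  base=$(basename "$vvpp" .vvpp)
  # One-digit numeric suffixes give .0 to .9, so this sketch allows at most 10 parts.
  split -b "$part_size" -d -a 1 --additional-suffix=.vvppp "$vvpp" "$dir/$base."
}
```

For example, `make_vvpp dist/my_engine my_engine.vvpp && split_vvpp my_engine.vvpp 1900M`.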
Examples
voicevox-client @voicevox-client: API wrappers for VOICEVOX ENGINE in various languages
License
Dual-licensed under LGPL v3 and a separate license that does not require publishing the source code.
If you want to obtain the separate license, please contact Hiho.
X account: @hiho_karuta