API Reference


samtts

SAMTTS

A Python port of the Software Automatic Mouth (SAM) Text-To-Speech program.

  • Ported by: Quan Lin
  • License: None

samtts.Reciter

Reciter converts text to phonemes.

Parameters:
  • debug (bool, default: False ) –

    Set or clear debug flag.

samtts.Reciter.text_to_phonemes(input_text)

Convert text to phonemes.

Parameters:
  • input_text (str | bytes | bytearray) –

    The input text to convert.

Returns:
  • bytearray

    The phonemes bytearray.

samtts.Processor

Processor takes phonemes and prepares output parameters.

Parameters:
  • debug (bool, default: False ) –

    Set or clear debug flag.

samtts.Processor.process(input_phonemes)

Process the phonemes and prepare output parameters.

When it is successful, the output parameters are stored in:

  • self.phoneme_index
  • self.phoneme_length
  • self.stress

Parameters:
  • input_phonemes (str | bytes | bytearray) –

    The input phonemes to process.

Returns:
  • bool

    Whether the phonemes are processed successfully.

samtts.Renderer

Renderer takes the phoneme parameters and renders sound waveform.

Parameters:
  • speed (int, default: 72 ) –

    Set speed value.

  • pitch (int, default: 64 ) –

    Set pitch value.

  • mouth (int, default: 128 ) –

    Set mouth value.

  • throat (int, default: 128 ) –

    Set throat value.

  • sing_mode (bool, default: False ) –

    Set or clear sing_mode flag.

  • buffer_size (int, default: 220500 ) –

    Set a large enough buffer size for rendering.

  • debug (bool, default: False ) –

    Set or clear debug flag.
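
As a rough guide for choosing buffer_size, the arithmetic below assumes the renderer produces one byte per sample at 22050 samples per second. Neither figure is stated in this reference, so treat the result as an estimate. Under that assumption the default of 220500 holds about ten seconds of audio. The helper name buffer_size_for is hypothetical.

```python
def buffer_size_for(seconds, sample_rate=22050, bytes_per_sample=1):
    """Estimate the bytes needed to hold `seconds` of rendered audio.

    Assumption: mono audio, one byte per sample (not confirmed by the reference).
    """
    return int(seconds * sample_rate * bytes_per_sample)


print(buffer_size_for(10))  # 220500
```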

samtts.Renderer.config(speed=None, pitch=None, mouth=None, throat=None, sing_mode=None)

Configure renderer parameters.

Parameters:
  • speed (int | None, default: None ) –

    Set speed value.

  • pitch (int | None, default: None ) –

    Set pitch value.

  • mouth (int | None, default: None ) –

    Set mouth value.

  • throat (int | None, default: None ) –

    Set throat value.

  • sing_mode (bool | None, default: None ) –

    Set or clear sing_mode flag.

samtts.Renderer.render(processor)

Render sound waveform.

On success, the audio data is stored in self.buffer, and the length of the valid data is stored in self.buffer_end.

Parameters:
  • processor (Processor) –

    A Processor instance that has output parameters prepared.

Returns:
  • bool

    Whether the sound waveform was rendered successfully.
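
The three stages can also be chained by hand. The following is a minimal sketch that uses only the methods and attributes documented for Reciter, Processor and Renderer. The helper name synthesize is hypothetical, and slicing buffer up to buffer_end assumes the buffer is a sliceable bytes-like object. The stages are passed in as arguments so the sketch does not depend on how they are constructed.

```python
def synthesize(text, reciter, processor, renderer):
    """Run text through Reciter -> Processor -> Renderer and return audio bytes.

    Hypothetical helper: returns None if processing or rendering reports failure.
    """
    phonemes = reciter.text_to_phonemes(text)
    if not processor.process(phonemes):
        return None
    if not renderer.render(processor):
        return None
    # Only the first buffer_end bytes of the buffer hold valid audio data.
    return bytes(renderer.buffer[:renderer.buffer_end])
```

With the package installed, this could be called as synthesize("Hello", samtts.Reciter(), samtts.Processor(), samtts.Renderer()).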

samtts.SamTTS

SamTTS combines Reciter, Processor and Renderer together.

Parameters:
  • speed (int, default: 72 ) –

    Set speed value.

  • pitch (int, default: 64 ) –

    Set pitch value.

  • mouth (int, default: 128 ) –

    Set mouth value.

  • throat (int, default: 128 ) –

    Set throat value.

  • sing_mode (bool, default: False ) –

    Set or clear sing_mode flag.

  • buffer_size (int, default: 220500 ) –

    Set a large enough buffer size for rendering.

  • debug (bool, default: False ) –

    Set or clear debug flag.

samtts.SamTTS.get_audio_data(input_data, phonetic=False, speed=None, pitch=None, mouth=None, throat=None, sing_mode=None, sample_rate=22050)

Get audio data from input text or phonemes.

It can only process very short inputs. For longer text, use iter_audio_data_from_paragraph, which handles a paragraph segment by segment.

Parameters:
  • input_data (str | bytes | bytearray) –

    The input text or phonemes.

  • phonetic (bool, default: False ) –

    Whether the input is phonemes rather than plain text.

  • speed (int | None, default: None ) –

    Set speed value.

  • pitch (int | None, default: None ) –

    Set pitch value.

  • mouth (int | None, default: None ) –

    Set mouth value.

  • throat (int | None, default: None ) –

    Set throat value.

  • sing_mode (bool | None, default: None ) –

    Set or clear sing_mode flag.

  • sample_rate (int, default: 22050 ) –

    The sample rate of the audio data. It can be one of 5513, 11025 and 22050.

Returns:
  • bytearray

    The rendered audio data bytearray.
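
Because sample_rate accepts only three values, it can help to check it before calling get_audio_data. A minimal sketch, assuming tts is a SamTTS instance (any object with a compatible get_audio_data method works). The wrapper name get_audio_checked and the choice to raise ValueError are illustrative and not part of the library.

```python
SUPPORTED_SAMPLE_RATES = (5513, 11025, 22050)


def get_audio_checked(tts, text, sample_rate=22050, **voice):
    """Validate sample_rate, then delegate to tts.get_audio_data.

    `voice` may carry speed, pitch, mouth, throat or sing_mode overrides.
    """
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SUPPORTED_SAMPLE_RATES}")
    return tts.get_audio_data(text, sample_rate=sample_rate, **voice)
```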

samtts.SamTTS.iter_audio_data_from_paragraph(paragraph, phonetic=False, speed=None, pitch=None, mouth=None, throat=None, sing_mode=None, sample_rate=22050, iter_segments_from_paragraph=iter_by_punctuations)

Get audio data from a paragraph segment by segment.

Parameters:
  • paragraph (str) –

    The input paragraph.

  • phonetic (bool, default: False ) –

    Whether the input is phonemes rather than plain text.

  • speed (int | None, default: None ) –

    Set speed value.

  • pitch (int | None, default: None ) –

    Set pitch value.

  • mouth (int | None, default: None ) –

    Set mouth value.

  • throat (int | None, default: None ) –

    Set throat value.

  • sing_mode (bool | None, default: None ) –

    Set or clear sing_mode flag.

  • sample_rate (int, default: 22050 ) –

    The sample rate of the audio data. It can be one of 5513, 11025 and 22050.

  • iter_segments_from_paragraph (Callable, default: iter_by_punctuations ) –

    The iter_segments_from_paragraph function whose signature is:

    iter_segments_from_paragraph(paragraph: str) -> Iterable[str]
    
Yields:
  • Iterable[bytearray]

    Audio data.
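
Any callable matching the iter_segments_from_paragraph signature can replace the default segmenter. As an illustration, the sketch below splits a paragraph into chunks of a few words each. The function name iter_by_word_count and the chunk size are hypothetical, and this reference does not specify whether chunks of that size are short enough for the renderer.

```python
from typing import Iterable


def iter_by_word_count(paragraph: str, max_words: int = 8) -> Iterable[str]:
    """Yield consecutive segments of at most max_words whitespace-separated words."""
    words = paragraph.split()
    for start in range(0, len(words), max_words):
        yield " ".join(words[start:start + max_words])
```

It would be passed as iter_segments_from_paragraph=iter_by_word_count.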

samtts.SamTTS.save(paragraph, output_file_path, phonetic=False, speed=None, pitch=None, mouth=None, throat=None, sing_mode=None, sample_rate=22050, iter_segments_from_paragraph=iter_by_punctuations, save_audio_data=save_audio_data_in_wav_format)

Save audio data from a paragraph to output file.

Parameters:
  • paragraph (str) –

    The input paragraph.

  • output_file_path (str) –

    The path of the output file.

  • phonetic (bool, default: False ) –

    Whether the input is phonemes rather than plain text.

  • speed (int | None, default: None ) –

    Set speed value.

  • pitch (int | None, default: None ) –

    Set pitch value.

  • mouth (int | None, default: None ) –

    Set mouth value.

  • throat (int | None, default: None ) –

    Set throat value.

  • sing_mode (bool | None, default: None ) –

    Set or clear sing_mode flag.

  • sample_rate (int, default: 22050 ) –

    The sample rate of the audio data. It can be one of 5513, 11025 and 22050.

  • iter_segments_from_paragraph (Callable, default: iter_by_punctuations ) –

    The iter_segments_from_paragraph function whose signature is:

    iter_segments_from_paragraph(paragraph: str) -> Iterable[str]
    
  • save_audio_data (Callable, default: save_audio_data_in_wav_format ) –

    The save_audio_data function whose signature is:

    save_audio_data(
        audio_data: bytes | bytearray,
        output_file_path: str,
        num_channels: int,
        bytes_per_sample: int,
        sample_rate: int,
    )
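
A custom save_audio_data callable only needs to accept the five arguments listed in this signature. The following sketch writes the data as a WAV file with Python's standard wave module. The name save_as_wav is hypothetical, and the sketch assumes audio_data is uncompressed PCM, which this reference does not state explicitly.

```python
import wave


def save_as_wav(audio_data, output_file_path, num_channels, bytes_per_sample, sample_rate):
    """Write raw PCM audio data to a WAV file using the standard library."""
    with wave.open(output_file_path, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(bytes_per_sample)
        wav_file.setframerate(sample_rate)
        # writeframes also patches the header with the final frame count.
        wav_file.writeframes(bytes(audio_data))
```

It would be passed as save_audio_data=save_as_wav.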
    

samtts.SamTTS.play(paragraph, phonetic=False, speed=None, pitch=None, mouth=None, throat=None, sing_mode=None, sample_rate=22050, iter_segments_from_paragraph=iter_by_punctuations, play_audio_data=play_audio_data_with_simpleaudio)

Play audio data from a paragraph.

Parameters:
  • paragraph (str) –

    The input paragraph.

  • phonetic (bool, default: False ) –

    Whether the input is phonemes rather than plain text.

  • speed (int | None, default: None ) –

    Set speed value.

  • pitch (int | None, default: None ) –

    Set pitch value.

  • mouth (int | None, default: None ) –

    Set mouth value.

  • throat (int | None, default: None ) –

    Set throat value.

  • sing_mode (bool | None, default: None ) –

    Set or clear sing_mode flag.

  • sample_rate (int, default: 22050 ) –

    The sample rate of the audio data. It can be one of 5513, 11025 and 22050.

  • iter_segments_from_paragraph (Callable, default: iter_by_punctuations ) –

    The iter_segments_from_paragraph function whose signature is:

    iter_segments_from_paragraph(paragraph: str) -> Iterable[str]
    
  • play_audio_data (Callable, default: play_audio_data_with_simpleaudio ) –

    The play_audio_data function whose signature is:

    play_audio_data(
        audio_data: bytes | bytearray,
        num_channels: int,
        bytes_per_sample: int,
        sample_rate: int,
    )
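
A play_audio_data callable receives the audio data together with its format. As one illustration, the sketch below plays nothing and instead records how many seconds of audio it was given, which can be handy for dry runs. The class name DurationRecorder is hypothetical, and the sketch assumes the callable may be invoked more than once per paragraph.

```python
class DurationRecorder:
    """Callable that accumulates the duration of the audio it is given."""

    def __init__(self):
        self.total_seconds = 0.0

    def __call__(self, audio_data, num_channels, bytes_per_sample, sample_rate):
        bytes_per_second = num_channels * bytes_per_sample * sample_rate
        self.total_seconds += len(audio_data) / bytes_per_second
```

An instance would be passed as play_audio_data=recorder, and recorder.total_seconds read afterwards.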
    

samtts.SamTTS.async_play(paragraph, phonetic=False, speed=None, pitch=None, mouth=None, throat=None, sing_mode=None, sample_rate=22050, iter_segments_from_paragraph=iter_by_punctuations, async_play_audio_data=async_play_audio_data_with_simpleaudio) async

Asynchronously play audio data from a paragraph.

Parameters:
  • paragraph (str) –

    The input paragraph.

  • phonetic (bool, default: False ) –

    Whether the input is phonemes rather than plain text.

  • speed (int | None, default: None ) –

    Set speed value.

  • pitch (int | None, default: None ) –

    Set pitch value.

  • mouth (int | None, default: None ) –

    Set mouth value.

  • throat (int | None, default: None ) –

    Set throat value.

  • sing_mode (bool | None, default: None ) –

    Set or clear sing_mode flag.

  • sample_rate (int, default: 22050 ) –

    The sample rate of the audio data. It can be one of 5513, 11025 and 22050.

  • iter_segments_from_paragraph (Callable, default: iter_by_punctuations ) –

    The iter_segments_from_paragraph function whose signature is:

    iter_segments_from_paragraph(paragraph: str) -> Iterable[str]
    
  • async_play_audio_data (Awaitable, default: async_play_audio_data_with_simpleaudio ) –

    The async_play_audio_data function whose signature is:

    async_play_audio_data(
        audio_data: bytes | bytearray,
        num_channels: int,
        bytes_per_sample: int,
        sample_rate: int,
    )
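
A replacement for async_play_audio_data would be a coroutine function taking the same four arguments. The sketch below is a stand-in that sleeps for the duration of the audio instead of producing sound. The name async_sleep_for_audio and the sleep-based behavior are hypothetical.

```python
import asyncio


async def async_sleep_for_audio(audio_data, num_channels, bytes_per_sample, sample_rate):
    """Wait for as long as the audio would take to play, without playing it."""
    duration = len(audio_data) / (num_channels * bytes_per_sample * sample_rate)
    await asyncio.sleep(duration)
    return duration
```

It would be passed as async_play_audio_data=async_sleep_for_audio when awaiting async_play.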