SAMTTS

A Python port of the Software Automatic Mouth (SAM) Text-To-Speech program.

Ported by: Quan Lin
License: None

WARNING: This project is not under any open source software license. Use it at your own risk.

It is tested on Windows with Python 3.12.10.

Changes

v0.3.0: Switched from simpleaudio to pyaudio backend.

What is SAM?

SAM (Software Automatic Mouth) is a Text-To-Speech (TTS) program for the Commodore 64, published in 1982 by Don't Ask Software (now SoftVoice, Inc.).

This project is an unofficial Python port of SAM. It was translated by hand from the C adaptation by Stefan Macke and the refactorings by Vidar Hokstad.

Installation

To install samtts along with pyaudio and the CLI:

pip install samtts

To install samtts without pyaudio and the CLI:

pip install --no-deps samtts

Usage

Use `samtts` in a Python script

The minimum example:

from samtts import SamTTS

SamTTS().play("Hello. My name is Sam.")

A conversation between Sam and Little Robot:

from samtts import SamTTS

# The default config is Sam.
sam = SamTTS()
# Configure SamTTS for a different character.
robot = SamTTS(speed=92, pitch=60, mouth=190, throat=190)

sam.play("Hello. Little Robot. How are you today?")
robot.play("Hello! I am functioning well, thank you. How can I assist you today?")
sam.play("Could you hand me the hammer please?")
robot.play("Of course! Here you are.")
sam.play("Thank you very much!")

SamTTS does not pronounce every word correctly, so sometimes you may want to use phonemes directly. Phonemes are powerful and flexible, but make sure they are valid; invalid phonemes raise exceptions.

from samtts import SamTTS

# Make SamTTS say "Hello. My name is Sam." in phonemes.
SamTTS().play("/HEHLOH3OW. MAY4 NEY4M IHZ SAE4M.", phonetic=True)

Make SamTTS sing:

from samtts import SamTTS

singer = SamTTS(speed=200, mouth=90, throat=90, sing_mode=True)
for pitch in (52, 41, 34, 41, 52):
    singer.play("AHAHAHAHAHAHAHAH", phonetic=True, pitch=pitch)

Save the audio data generated by SamTTS to a wav file:

from samtts import SamTTS

SamTTS().save("Hello. My name is Sam.", "output.wav")

Use SamTTS with asyncio:

import asyncio
from samtts import SamTTS

asyncio.run(SamTTS().async_play("Hello. My name is Sam."))

The core of samtts consists of Reciter, Processor and Renderer. SamTTS combines the three. Reciter converts text to phonemes. Processor and Renderer turn phonemes into audio data stored in a bytearray. However, they can only process very short inputs. To work around this, SamTTS splits the input paragraph at the punctuation marks !,.:;?. This works in most cases, but not always. You can write your own function to split the input paragraph.

Make SamTTS read the paragraph word by word:

from samtts import SamTTS

def iter_by_space(paragraph):
    # Yield one whitespace-separated word at a time.
    yield from paragraph.split()

SamTTS().play(
    "Hello. My name is Sam.",
    iter_segments_from_paragraph=iter_by_space,
)

If you know your input is very short, you do not have to split it at all:

from samtts import SamTTS

def iter_no_split(paragraph):
    yield paragraph

SamTTS().play(
    "Hello. My name is Sam.",
    iter_segments_from_paragraph=iter_no_split,
)
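
Between these two extremes, a custom splitter can break on punctuation first and then cap the number of words per segment. The following is a minimal sketch, not the splitter that samtts ships with. The name `iter_by_punctuation_and_length` and the default limit of 8 words are assumptions made for illustration, since the exact input length the core can handle is not stated here.

```python
import re


def iter_by_punctuation_and_length(paragraph, max_words=8):
    """Yield segments split on !,.:;? and capped at max_words words each."""
    # Each match is a run of text plus the punctuation marks that end it.
    for sentence in re.findall(r"[^!,.:;?]+[!,.:;?]*", paragraph):
        words = sentence.split()
        # Whitespace-only matches produce no words and are skipped.
        for start in range(0, len(words), max_words):
            yield " ".join(words[start:start + max_words])
```

Because `max_words` has a default, the function can be passed as `iter_segments_from_paragraph=iter_by_punctuation_and_length` in the same way as the two splitters above.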

By default, SamTTS saves audio data in WAV format. You can supply your own save function to write the audio data in other formats:

from samtts import SamTTS

# Make sure this function signature is followed.
def save_audio_data_in_other_formats(
    audio_data: bytes | bytearray,
    output_file_path: str,
    num_channels: int = 1,
    bytes_per_sample: int = 1,
    sample_rate: int = 22050,
):
    ...

SamTTS().save(
    "Hello. My name is Sam.",
    "output.ext",
    save_audio_data=save_audio_data_in_other_formats,
)

By default, SamTTS plays audio with the pyaudio backend. If pyaudio does not work on your platform, you can supply your own function to play audio with another backend:

from samtts import SamTTS

# Make sure this function signature is followed.
def play_audio_data_with_other_backends(
    audio_data: bytes | bytearray,
    num_channels: int = 1,
    bytes_per_sample: int = 1,
    sample_rate: int = 22050,
):
    ...

SamTTS().play(
    "Hello. My name is Sam.",
    play_audio_data=play_audio_data_with_other_backends,
)

The core of samtts (Reciter, Processor and Renderer) does not depend on any third-party or even built-in libraries. For finer control, you can use them directly:

import pyaudio
from samtts import Reciter, Processor, Renderer

def pyaudio_play_buffer(
    audio_data: bytes | bytearray,
    num_channels: int = 1,
    bytes_per_sample: int = 1,
    sample_rate: int = 22050,
):
    p = pyaudio.PyAudio()

    try:
        if bytes_per_sample == 1:
            audio_format = pyaudio.paUInt8
        else:
            audio_format = p.get_format_from_width(bytes_per_sample)

        stream = p.open(
            format=audio_format,
            channels=num_channels,
            rate=sample_rate,
            output=True,
        )

        try:
            stream.write(bytes(audio_data))
        finally:
            stream.stop_stream()
            stream.close()

    finally:
        p.terminate()

reciter = Reciter()
processor = Processor()
renderer = Renderer()

input_text = "Hello. My name is Sam. How are you?"
print(f"{input_text = }")

phonemes = reciter.text_to_phonemes(input_text)
print(f"{phonemes = }")

processor.process(phonemes)
renderer.render(processor)

print(f"{renderer.buffer_end = }")
print(f"The first 100 bytes in the buffer: {renderer.buffer[:100]}")

pyaudio_play_buffer(
    renderer.buffer[:renderer.buffer_end],
    num_channels=1,
    bytes_per_sample=1,
    sample_rate=22050,
)

There are more examples in the examples directory.

Use `samtts` with the command line interface

To get help information:

python -m samtts

 Usage: python -m samtts [OPTIONS] [INPUT_STRING]

 A Python port of Software Automatic Mouth Text-To-Speech program.
 - If `--phoneme-info` or `--pitch-info` is used, the argument and all the other  
 options are ignored.
 - If `--phonetic` is used, the input must be valid phonemes.

╭─ Arguments ────────────────────────────────────────────────────────────────────╮
│   input_string      [INPUT_STRING]  Input text or phonemes.                    │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────╮
│ --phoneme-info          --no-phoneme-info             Show phoneme info.       │
│                                                       [default:                │
│                                                       no-phoneme-info]         │
│ --pitch-info            --no-pitch-info               Show pitch info.         │
│                                                       [default: no-pitch-info] │
│ --phonetic              --no-phonetic                 Set phonetic flag.       │
│                                                       [default: no-phonetic]   │
│ --speed                                      INTEGER  Set speed value.         │
│                                                       [default: 72]            │
│ --pitch                                      INTEGER  Set pitch value.         │
│                                                       [default: 64]            │
│ --mouth                                      INTEGER  Set mouth value.         │
│                                                       [default: 128]           │
│ --throat                                     INTEGER  Set throat value.        │
│                                                       [default: 128]           │
│ --sing                  --no-sing                     Set sing mode.           │
│                                                       [default: no-sing]       │
│ --sample-rate                                INTEGER  Set sample rate 11025 or │
│                                                       22050.                   │
│                                                       [default: 22050]         │
│ --wav                                        TEXT     Set output wav file name │
│                                                       or path.                 │
│ --debug                 --no-debug                    Set debug flag.          │
│                                                       [default: no-debug]      │
│ --install-completion                                  Install completion for   │
│                                                       the current shell.       │
│ --show-completion                                     Show completion for the  │
│                                                       current shell, to copy   │
│                                                       it or customize the      │
│                                                       installation.            │
│ --help                                                Show this message and    │
│                                                       exit.                    │
╰────────────────────────────────────────────────────────────────────────────────╯

The minimum example:

python -m samtts "Hello. My name is Sam."

To configure the voice:

python -m samtts --speed 92 --pitch 60 --mouth 190 --throat 190 "Hello. My name is Little Robot."

To save to a wav file:

python -m samtts --wav "output.wav" "Hello. My name is Sam."
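
The same commands can be driven from another Python program through `subprocess`. The sketch below only assembles options that appear in the help text above; the helper names `build_samtts_command` and `speak` are hypothetical and not part of samtts.

```python
import subprocess
import sys


def build_samtts_command(
    text,
    speed=None,
    pitch=None,
    mouth=None,
    throat=None,
    phonetic=False,
    wav=None,
):
    """Return the argument list for one `python -m samtts` invocation."""
    command = [sys.executable, "-m", "samtts"]
    voice = {"speed": speed, "pitch": pitch, "mouth": mouth, "throat": throat}
    for name, value in voice.items():
        # Options left as None fall back to the CLI defaults.
        if value is not None:
            command += [f"--{name}", str(value)]
    if phonetic:
        command.append("--phonetic")
    if wav is not None:
        command += ["--wav", wav]
    command.append(text)
    return command


def speak(text, **options):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_samtts_command(text, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with texts that contain spaces or quotation marks.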

Useful information

Phonemes

                 Phoneme Information

     VOWELS                             VOICED CONSONANTS
IY           f(ee)t                     R        red
IH           p(i)n                      L        allow
EH           beg                        W        away
AE           Sam                        WH       whale
AA           pot                        Y        you
AH           b(u)dget                   M        Sam
AO           t(al)k                     N        man
OH           cone                       NX       so(ng)
UH           book                       B        bad
UX           l(oo)t                     D        dog
ER           bird                       G        again
AX           gall(o)n                   J        judge
IX           dig(i)t                    Z        zoo
                                        ZH       plea(s)ure
   DIPHTHONGS                           V        seven
EY           m(a)de                     DH       (th)en
AY           h(igh)
OY           boy
AW           h(ow)                      UNVOICED CONSONANTS
OW           slow                       S         Sam
UW           crew                       SH        fish
                                        F         fish
                                        TH        thin
 SPECIAL PHONEMES                       P         poke
UL           sett(le) (=AXL)            T         talk
UM           astron(omy) (=AXM)         K         cake
UN           functi(on) (=AXN)          CH        speech
Q            kitt-en (glottal stop)     /H        a(h)ead

Pitches

                  Pitch Information

PITCH   NOTE    |   PITCH   NOTE    |   PITCH   NOTE
 104     C1     |     52     C2     |     26     C3
  92     D1     |     46     D2     |     23     D3
  82     E1     |     41     E2     |     21     E3
  78     F1     |     39     F2     |     19     F3
  68     G1     |     34     G2     |     17     G3
  62     A1     |     31     A2     |
  55     B1     |     28     B2     |
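
Note that the pitch value halves with each octave up (104, 52, 26 for C1, C2, C3), so a lower value means a higher note. To write melodies by note name instead of by number, the table can be turned into a lookup. This is a small sketch; `NOTE_TO_PITCH` and `melody_to_pitches` are hypothetical names, and the values are copied from the table above.

```python
# Pitch values copied from the pitch table.
NOTE_TO_PITCH = {
    "C1": 104, "D1": 92, "E1": 82, "F1": 78, "G1": 68, "A1": 62, "B1": 55,
    "C2": 52, "D2": 46, "E2": 41, "F2": 39, "G2": 34, "A2": 31, "B2": 28,
    "C3": 26, "D3": 23, "E3": 21, "F3": 19, "G3": 17,
}


def melody_to_pitches(notes):
    """Translate note names such as "C2" into SAM pitch values."""
    return [NOTE_TO_PITCH[note] for note in notes]
```

For example, `melody_to_pitches(["C2", "E2", "G2", "E2", "C2"])` gives the pitch sequence 52, 41, 34, 41, 52 used in the singing example in the Usage section.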

Characters

DESCRIPTION         SPEED   PITCH   MOUTH   THROAT
Elf                  72      64      160     110
Little Robot         92      60      190     190
Stuffy Guy           82      72      105     110
Little Old Lady      82      32      145     145
Extra-Terrestrial   100      64      200     150
SAM                  72      64      128     128
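
These settings map directly onto the `speed`, `pitch`, `mouth` and `throat` keyword arguments of `SamTTS`. One way to keep them at hand is a dictionary of presets. This is a sketch; `CHARACTER_PRESETS` and `get_preset` are hypothetical names, and the values are copied from the table above.

```python
# Voice settings copied from the character table.
CHARACTER_PRESETS = {
    "Elf": {"speed": 72, "pitch": 64, "mouth": 160, "throat": 110},
    "Little Robot": {"speed": 92, "pitch": 60, "mouth": 190, "throat": 190},
    "Stuffy Guy": {"speed": 82, "pitch": 72, "mouth": 105, "throat": 110},
    "Little Old Lady": {"speed": 82, "pitch": 32, "mouth": 145, "throat": 145},
    "Extra-Terrestrial": {"speed": 100, "pitch": 64, "mouth": 200, "throat": 150},
    "SAM": {"speed": 72, "pitch": 64, "mouth": 128, "throat": 128},
}


def get_preset(name):
    """Return a copy of the settings so callers can adjust them freely."""
    return dict(CHARACTER_PRESETS[name])
```

With this in place, `SamTTS(**get_preset("Little Robot"))` would be equivalent to the `SamTTS(speed=92, pitch=60, mouth=190, throat=190)` call in the conversation example.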

Limitations

SAM was developed more than 40 years ago. It was advanced for the 1980s, but its sound quality is not comparable to that of modern AI-based TTS programs.
The core of SAM can only process very short inputs. To work around this, long inputs must be split.
SAM does not pronounce every word correctly. To work around this, phonemes can be used directly, but make sure they are valid; invalid phonemes raise exceptions.

Further development

This project is meant to be a fairly faithful port of the original SAM. It will not improve upon SAM in any way, such as raising the sound quality or lifting SAM's limitations.

Further development of this project is limited to bug fixes. If anyone is interested in improving it, please fork it and start a new project.

About license

According to Stefan Macke and Vidar Hokstad, the status of the original software is best described as abandonware.

Neither Stefan Macke nor Vidar Hokstad put their projects under any open source software license. As long as this is the case, I cannot put my code under any open source software license either. However, use of the software might be covered by the "Fair Use" doctrine in the USA.