Whilst shopping for some unneeded things on eBay, I stumbled across a listing
for an antique voice synthesiser chip, the General Instrument SP-0256-AL2,
long since out of production. Apparently they’re in demand, because the ones
I watched all got sniped seconds before auction end. Anyway, I remembered
having a few back in the day, so I rummaged, and Lo! and Behold! there they
were, complete with their odd-valued crystal. So I decided to strap one on
to my recently resuscitated NP2 and relive the glory days of scarcely
intelligible robotic voice.
You drive the synthesiser by sending it a sequence of ‘allophones’ (context-optimised
variants of phonemes). You are left with the chore of figuring out what
sequence of allophones renders into an intelligible word. This is tolerable
for a fixed vocabulary, but less than fun if it is anything larger than a
This was intended to be a few hours’ diversion (and so it was), but diversion
leads to obsession, and so I wanted to make more sounds beyond the initial
‘hello’ I used for testing. There was a companion chip, the CTS-256, which
apparently encoded some research from the US Navy on rule-based transformation
of text to speech. I do have one of those (never used it), but this seemed
like something I would like to try to do in software.
I did find the US Navy research, as well as some code from the mid-80s that
implemented it, and some additional work to improve the results, so I set
out to make a version that worked on the Netduino.
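To give a flavour of what rule-based transformation means, here is a toy sketch in Python (not the NetMF C# from the project). Each rule says: given this left context and this right context, turn these letters into these allophones. The handful of rules, the tuple layout, and the names `RULES`, `context_ok` and `to_allophones` are all invented for illustration; this is not the Navy rule set, which runs to far more rules and supports richer context patterns.

```python
# Toy letter-to-sound rules: (left context, letters to match, right context, allophones).
# An empty context matches anything; " " matches a word boundary.
# These rules are hypothetical and only cover a couple of words.
RULES = [
    ("", "LL", "", ["LL"]),
    (" ", "H", "", ["HH1"]),
    ("", "E", "", ["EH"]),
    ("", "L", "", ["LL"]),
    ("", "O", " ", ["OW"]),
    ("", "O", "", ["AA"]),
]


def context_ok(pattern, actual):
    """An empty pattern matches any context; otherwise require an exact match."""
    return pattern == "" or pattern == actual


def to_allophones(word):
    """Scan left to right, applying the first rule that matches at each position."""
    text = " " + word.upper() + " "  # pad so word-boundary contexts can match
    out = []
    i = 1
    while i < len(text) - 1:
        for left, match, right, allophones in RULES:
            end = i + len(match)
            if (text.startswith(match, i)
                    and context_ok(left, text[i - len(left):i])
                    and context_ok(right, text[end:end + len(right)])):
                out.extend(allophones)
                i = end
                break
        else:
            i += 1  # no rule covers this character, so skip it
    return out


print(to_allophones("hello"))
```

Rule order matters in a scheme like this: the more specific rule (the double "LL", or the word-final "O") has to come before the general one, otherwise the general rule always wins.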
It was a challenge, because there are a couple of thousand rules, and NetMF is
particularly bad at ROM-able data. In C you would typically declare a
static object as const, maybe use some compiler-specific directive to help
out the linker, and all the data would be put in ROM (Flash), not taking
up even a byte of RAM. But no combination of static, const, or readonly would
make NetMF do this. Additionally, a component called the MetaDataProcessor
would fail if you had too many data initializers, and the project would
not even compile, which rather puts a damper on things. So this project
wound up being several days long as I experimented with different techniques
to get even a reduced set to be usable.
Along the way, I did discover these things:
* Don’t use too many initializers, or MetaDataProcessor will fail. It seems
to be a function of the count of initializers, not of the aggregate size.
* NetMF has some sense about one thing: string literals. The reference
may be stored in RAM, but unless you modify that, the data backing the
object will be kept in Flash.
So, at length, I packed my rules into strings, and I unpack them on-the-fly
back into C# objects as needed. This reduced the memory use from infinite
(couldn’t compile), to horrible (compiled, but I had to delete a large number
of rules, and it still left less than 20K of RAM for the app), to about 20-25k.
Out of curiosity, I tried making all the rules null, and found no change in
memory usage, so apparently that 25k is just the overhead of having an array
of about 2000 strings. This technique (depersisting each object as needed, then
discarding it) is markedly slow, but that doesn’t really matter in my case
because it is still much, much faster than the actual speech output, and you
don’t notice since I run those two activities in parallel on separate threads.
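The shape of that approach can be sketched roughly as follows, again in Python rather than the project's C#. The `left|match|right|allophones` packing format, the four rules, and every name here (`PACKED_RULES`, `unpack`, `find_rule`, `translate`, `speak`, `say`) are assumptions made up for the sketch, not the format the project actually uses. The `speak` function merely collects allophones where the real thing would write them to the chip.

```python
import queue
import threading
from collections import namedtuple

Rule = namedtuple("Rule", "left match right allophones")

# One packed string per rule: left|match|right|allophones (space-separated).
# Hypothetical format and rules, chosen only to keep the sketch short.
PACKED_RULES = [
    "|LL||LL",
    " |H||HH1",
    "|E||EH",
    "|O| |OW",
]


def unpack(packed):
    """Rebuild a short-lived Rule object from its packed string."""
    left, match, right, allophones = packed.split("|")
    return Rule(left, match, right, allophones.split())


def find_rule(text, i):
    """Unpack rules one at a time until one matches at position i."""
    for packed in PACKED_RULES:
        rule = unpack(packed)  # temporary object, discarded if it does not match
        end = i + len(rule.match)
        if (text.startswith(rule.match, i)
                and text.endswith(rule.left, 0, i)
                and text.startswith(rule.right, end)):
            return rule
    return None


def translate(word, out_queue):
    """Producer: push allophones onto the queue as soon as they are found."""
    text = " " + word.upper() + " "
    i = 1
    while i < len(text) - 1:
        rule = find_rule(text, i)
        if rule is None:
            i += 1  # no rule for this character, skip it
            continue
        for allophone in rule.allophones:
            out_queue.put(allophone)
        i += len(rule.match)
    out_queue.put(None)  # sentinel: nothing more to say


def speak(in_queue, spoken):
    """Consumer: stands in for the slow write to the speech chip."""
    while True:
        allophone = in_queue.get()
        if allophone is None:
            break
        spoken.append(allophone)


def say(word):
    """Run translation and 'speech' on two threads joined by a queue."""
    q = queue.Queue()
    spoken = []
    worker = threading.Thread(target=speak, args=(q, spoken))
    worker.start()
    translate(word, q)
    worker.join()
    return spoken


print(say("hello"))
```

Only the packed strings live for the whole run; each `Rule` object exists just long enough to be tested. The queue is what lets the slow unpacking hide behind the even slower speech output.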
For your amusement, here is a link showing (sounding?) that device in action:
Also for your amusement is the source code to the project:
As I said, it’s an antique chip, presumably long out of production, but if
you think it’s fun, then they are to be had on eBay. Also, you can use a
common color-burst crystal instead of the unusual 3.12 MHz one. It will
be higher pitched, and faster, so slightly less intelligible. Some claim the
device is at risk of lockups with the higher-frequency crystal.
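For a rough idea of how much higher and faster, here is a back-of-envelope calculation in Python. It assumes the crystal in question is the NTSC color-burst one at 3.579545 MHz, and that the chip's speech rate and pitch scale linearly with its clock; both are my assumptions, and the function names are made up for the sketch.

```python
import math

STOCK_HZ = 3_120_000        # the 3.12 MHz crystal the chip normally uses
COLOR_BURST_HZ = 3_579_545  # NTSC color-burst frequency (assumed variant)


def speedup(new_hz, old_hz=STOCK_HZ):
    """Ratio by which everything runs faster, assuming linear scaling with clock."""
    return new_hz / old_hz


def pitch_shift_semitones(new_hz, old_hz=STOCK_HZ):
    """Pitch change in equal-tempered semitones for the same clock ratio."""
    return 12 * math.log2(new_hz / old_hz)


print(f"speed-up: {speedup(COLOR_BURST_HZ):.3f}x")
print(f"pitch shift: {pitch_shift_semitones(COLOR_BURST_HZ):.2f} semitones")
```

Under those assumptions the speech comes out roughly 15% faster and a little over two semitones higher.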
If I get really obsessed, I’ll try to emulate the digital filter used in the
synthesis chip itself, rendering a purely software implementation. That
should be a challenge….