
Accessibility On Linux Part 1: Introducing vsss

Hello everyone, and welcome back to another edition of Piusbird attempts to build his portfolio, also known as somebody please hire me; I’m competent, I swear. In a previous post I said I would outline how my custom screen reader works and, more importantly, how to get it working in a non-me context. Well, ladies and gentlemen, the time has come. Here’s a quick primer on the Very Simple Speech Service.

Design Explained

Upon cloning the code from git you might be tempted to question my sanity. It is, after all, a polyglot program composed of one D-Bus-based micro-daemon, a large and seemingly complicated shell script, and an optional screen handling routine written in C. But I promise you this Goldberg-esque madness was all rationally designed.

The Method To My Madness

I needed a speech system that could be deployed on any Linux or UNIX system with X11, and that depended only on things I could commonly find on the systems I was using at the time. That meant minimal, if any, additional package installations, and especially no additional Python module installations at all. Thus it is written mostly in bash. The second design constraint that dictated I write most of this in shell script was that it needed to be adaptable to any environment I came across, as I had no guarantee of root access to any system. Thus it is possible, by simply changing a couple of variables, to do without the Python micro service or the fancy screen handling stuff.

The last design challenge which made my odd choice of language reasonable was that I only had 12 hours to get the first version out the door. As I remember, it had something to do with finals week of 2013.

This means that the version on GitHub is specifically configured for my setup. What we will be doing in the remainder of this post is adapting it to yours, or attempting to at least.

Before we get started in earnest, I should mention that when I say X11 I really do mean X11. Logically there is no reason why it shouldn’t work on Wayland, but I get odd warnings from weird places when I have tried it, and since I see no reason to use Wayland yet I have not looked into it further.
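If you’re not sure which of the two you are running, the session usually gives it away through a couple of environment variables. Here is a minimal sketch, assuming your login manager sets XDG_SESSION_TYPE or your compositor sets WAYLAND_DISPLAY. The session_kind name is made up for illustration and is not part of vsss.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vsss: guess the display server type.
# Wayland is checked first because DISPLAY is usually also set under
# XWayland, so DISPLAY alone does not prove a native X11 session.
session_kind() {
    if [ "${XDG_SESSION_TYPE:-}" = "wayland" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo "wayland"
    elif [ "${XDG_SESSION_TYPE:-}" = "x11" ] || [ -n "${DISPLAY:-}" ]; then
        echo "x11"
    else
        echo "unknown"
    fi
}
```

Calling session_kind in a terminal prints wayland, x11 or unknown. Anything other than x11 means you are in the territory I have not tested.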

To The Code

I mentioned earlier that my code was highly adaptable to any environment. While that is true, there is one hard and fast dependency: a software speech synthesizer. It can be any one you’d like; festival and espeak-ng are both commonly available on most distributions. The remainder of this guide assumes we will be using espeak, simply because that’s what I have in the VM I’m testing this with.

The first file you’ll need to modify is called vsss.conf.in, which looks like this:
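A quick way to see which synthesizer a machine already has is to walk a list of candidate commands and stop at the first one found. This is only a sketch; first_available is a hypothetical helper name and is not something vsss ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vsss: print the first command from the
# argument list that is installed, or return non-zero if none are.
first_available() {
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage:
#   synth=$(first_available espeak-ng espeak festival) || echo "no synthesizer found"
```

The order of the arguments is the order of preference, so put the voice you would rather listen to first.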

vsss.conf.in

VOX="Callie" 
audio_bckend="padsp" 
rate=200
spkedit="pluma" 
QT_SELECT=4; 
export QT_SELECT

PIPE_COLOR="1;33;44m"
export PIPE_COLOR
speak_bckend() {

    if [ -f /tmp/vsss.lock ]
    then
	echo "Speech output is currently in use"
	return 0
    else
	touch /tmp/vsss.lock
    fi
    $audio_bckend swift -n $VOX -p "speech/rate=$rate" -f $1 -m text -t | colorize-pipe 
    rm /tmp/vsss.lock
    return 1

}

This file has one and only one job: to define the speak_bckend function and any supporting variables or other functions it may need. This function is what actually does the speech synthesis, and it takes one parameter: the name of a file containing the text to be spoken. In my setup this function depends heavily on Cepstral Swift and its quirks. Let’s change it to use espeak.

# Espeak Version
rate=200          # speech rate in words per minute
spkedit="nano"
speak_bckend() {
    # $1 is the file containing the text to be spoken.

    if [ -f /tmp/vsss.lock ]
    then
	echo "Speech output is currently in use"
	return 0
    else
	touch /tmp/vsss.lock
    fi
    espeak -f "$1" -s "$rate"
    rm /tmp/vsss.lock
    return 1

}

Note a couple of things here. First, it is best practice to implement a lockfile mechanism before allowing the speech synthesizer to execute, unless you’re a fan of symphonic chaos of course. Also note I set the speech rate to 200 words per minute. To those unpracticed with text to speech this can seem almost incomprehensibly fast. But keep in mind this is actually 100 words a minute slower than your average adult reading with their eyeballs. If you’re having trouble understanding the computer, slow it down to about 165.

I’ve committed the espeak config file to GitHub, so you should be able to just copy it over top of mine and be done. And please note I’m always open to merging pull requests for more back ends.
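The test-then-touch lock in the config leaves a tiny window where two callers can both see no lock and both start talking. If that ever bites you, one alternative is to let the shell create the lock atomically with noclobber. The sketch below is not what vsss does; LOCKFILE, acquire_lock, release_lock and speak_locked are all hypothetical names. Unlike the config function, it follows the usual shell convention of returning non-zero when the lock is busy, so check what your copy of vsss expects before dropping it in.

```shell
#!/usr/bin/env bash
# Sketch of an atomic lock; LOCKFILE and the function names are hypothetical.
LOCKFILE="${LOCKFILE:-/tmp/vsss-example.lock}"

acquire_lock() {
    # With noclobber set, the redirect fails if the file already exists,
    # so checking for the lock and creating it happen in a single step.
    ( set -o noclobber; echo "$$" > "$LOCKFILE" ) 2>/dev/null
}

release_lock() {
    rm -f "$LOCKFILE"
}

speak_locked() {
    if ! acquire_lock; then
        echo "Speech output is currently in use"
        return 1
    fi
    "$@"                 # run the synthesizer command passed as arguments
    local status=$?
    release_lock
    return "$status"
}

# Usage:
#   speak_locked espeak -f some_file.txt -s 200
```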
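Before wiring a new back end into vsss it is worth checking that it can speak at all. Since speak_bckend only accepts a file name, a small wrapper that writes a string to a temporary file makes that easy. This is a sketch under the assumption that the config has already been sourced into the current shell; say_text is a hypothetical name and not part of vsss.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vsss: speak a string through whatever
# speak_bckend function is currently defined in this shell.
say_text() {
    local tmp
    tmp=$(mktemp) || return 1
    printf '%s\n' "$*" > "$tmp"
    # The return value is ignored on purpose: the speak_bckend versions in
    # this post return 1 after speaking and 0 when the lock is held.
    speak_bckend "$tmp" || true
    rm -f "$tmp"
}

# Usage, assuming the config sits in the current directory:
#   . ./vsss.conf.in
#   say_text "Hello from vsss"
```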

Rarer Modifications

The next two modifications I will show only apply if you don’t have Qt or D-Bus installed. In this case you will need to comment out line 9 of vsss.
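If you would rather not open an editor for a one-line change, sed can do it. The sketch below keeps a backup copy first, because line numbers drift between commits and it is worth looking at the line before you comment it out. comment_out_line is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vsss: comment out line N of a file,
# keeping a .bak copy of the original next to it.
comment_out_line() {
    local file=$1 line=$2
    cp -- "$file" "$file.bak" || return 1
    # Writing back into the existing file keeps its permissions intact.
    sed "${line}s/^/#/" "$file.bak" > "$file"
}

# Usage:
#   sed -n '9p' vsss            # look at the line first
#   comment_out_line vsss 9
```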

Finally, change line 20 in vsss_cmd.sh to fetch your primary clipboard without using my D-Bus service. Something like

xsel -b

should work fine.
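One thing to keep in mind: xsel -b reads the X11 CLIPBOARD selection (what you get after an explicit copy), while xsel -p reads the PRIMARY selection (whatever is currently highlighted). If xsel is not installed, xclip can do the same job. The wrapper below is a sketch with a hypothetical name, get_clipboard, and is not the code vsss uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vsss: print the X11 CLIPBOARD selection
# using whichever common tool is installed.
get_clipboard() {
    if command -v xsel >/dev/null 2>&1; then
        xsel -b -o
    elif command -v xclip >/dev/null 2>&1; then
        xclip -o -selection clipboard
    else
        echo "get_clipboard: need xsel or xclip" >&2
        return 1
    fi
}
```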

Ready, Set, Go!

Assuming everything went according to plan and you are running the latest git, from Friday 28th January 2022, you should now be able to run ./vsss and be greeted with something like

Very Simple Speech Shell
Version 0.3.4+test
>>>


Conclusion

I hope this was enough to get the prospective user of my very odd reading software through the process of setting it up. The last post in the series will cover actually using it to get work done.


On Accessibility for Linux Part 0: Computing while Disabled

Note: a version of this post first appeared in my Mastodon feed yesterday. This is an extended version with more detail, and is post one of my attempt at the #100DaysToOffload challenge.

The Issue

I use a custom-built screen reader-ish program to, well, read stuff on the computer. Although screen reader is a bit of a misnomer. It’s really more of a mutant hybrid between a screen reader and a program meant to aid dyslexics. I call it vsss; you can download it from my GitHub.

I will make a post on how to get it working in a non-me sort of context later, possibly tomorrow. In the meantime I’ll say you need to rewrite vsss.conf.in for your system. So far as I’m aware, Cepstral Swift is the only speech engine that supports the hooks needed for the fancy on-screen graphics.

When I upgraded to #Fedora 35 on Thursday it stopped working. Not a problem in my code. I checked. Here’s what went down, and why I’m so mad. Basically you have three options for text to speech on Linux: first, Espeak; second, Svox (the Android TTS); third, a proprietary software synthesizer. And yes, I know about the CMU stuff and hardware options. But for various reasons those aren’t viable in my case.

For various reasons I’ve used option 3 for the better part of 15 years now, and changing my computer voice now would be a huge adjustment. The dirty little secret of most non-free speech synthesizers is that they treat Linux/Unix as a third-class platform, i.e. most of the decently priced ones are still using OSS APIs in 2021. This hasn’t been a problem, as pulseaudio has this nice LD_PRELOAD shim, padsp, that turns OSS apps into regular pulse clients.

You wouldn’t think this would be a big deal for pipewire either; it is backward compatible with pulseaudio clients after all. Turns out it is a big deal, because for some reason that was not documented anywhere I could find, #Fedora dropped the shim in a recent update.
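For anyone who has not met the shim: padsp is a small wrapper that preloads a library called libpulsedsp.so, so that the program’s attempts to open the OSS audio device get redirected to the sound server. In day-to-day use you just prefix the command with it. The sketch below wraps that in a function that complains loudly when the shim is missing; run_oss_app is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an OSS-only program through padsp when it is
# installed, and say so plainly when it is not.
run_oss_app() {
    if command -v padsp >/dev/null 2>&1; then
        padsp "$@"
    else
        echo "run_oss_app: padsp not found, OSS output will not reach the sound server" >&2
        return 127
    fi
}

# Usage:
#   run_oss_app swift -f some_file.txt
```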

Breaking Changes Strike Again

All the changelog really says is that OSS, among other things, is no longer supported, without explaining why. I suspect it’s because almost no one uses OSS APIs anymore. ALSA has been around for 19 years now, and pulseaudio for ten. But I’m stuck with a binary blob compiled in 2012, which, from the tiny bit of reverse engineering I had to do for this project, last saw major code changes in 2007.

A Convoluted and painful Journey

There is no technical reason why legacy OSS apps can’t use the padsp shim to connect to a pipewire server. In fact I have this working. But in order to get it working I had to:

  • Figure out that padsp had been removed from Fedora’s pulseaudio package
  • Attempt to revert the change in the source rpm. Watch that fail spectacularly
  • Fish through upstream git for a bit and, reading the source code of the missing component, determine that yes, my theory was technically sound
  • Uninstall Fedora’s pulseaudio, try replacing that with upstream git build.
  • Watch as my entire sound system explodes.
  • Revert that
  • Recompile pulseaudio again, this time installing it under /opt, and play games with the linker so only the programs that actually need the replacement pulseaudio libs can see them
  • IT WORKS
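The last trick in that list, keeping a private build visible only to the programs that need it, can be done in more than one way. The sketch below shows one of them: a wrapper that sets PATH and LD_LIBRARY_PATH for a single command. It is an illustration rather than a record of the exact commands involved. PULSE_PREFIX and with_private_pulse are made-up names, and on some systems the libraries land in lib64 rather than lib.

```shell
#!/usr/bin/env bash
# Sketch only. PULSE_PREFIX is a made-up install location; point it at
# wherever the private pulseaudio build actually lives.
PULSE_PREFIX="${PULSE_PREFIX:-/opt/pulseaudio-private}"

# Run one command with the private tools and libraries first in line.
# The variables are set for that command only, so nothing else on the
# system sees the replacement build.
# Assumption: the libraries are under $PULSE_PREFIX/lib (may be lib64).
with_private_pulse() {
    env \
        PATH="$PULSE_PREFIX/bin:$PATH" \
        LD_LIBRARY_PATH="$PULSE_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        "$@"
}

# Usage:
#   with_private_pulse padsp swift -f some_file.txt
```

Because the environment change lives and dies with that one command, the distribution’s own sound stack is left alone.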

All this with diminished functionality in the reading software I depend on. All told, this took 2 days, 4 hours and 21 minutes to figure out.

The Takeaway

Moral of the story: a change that seems inconsequential to you may have a catastrophic effect on users with disabilities.

I was literally fired from my first job out of college four months in because of a Computer Accessibility issue I was unable to solve in a timely fashion. Recently I had to drop out of Graduate School for similar reasons.

FOSS has always had the potential to be the great equalizer, bringing the digital revolution to the most marginalized and all that uplifting blah blah blah from my youth. And in my case it worked out: I was able to totally replace a piece of software which ranges in price from $700 to $10,000, depending on vendor, feature set and so forth, with what is, let’s face it, a radioactive shell script horror from Ken Thompson’s nightmares.

I was able to free myself from the lifetime of constant hardware and software upgrade costs that are often imposed on neurodivergent folks in order to do basic things like read and write. Which is good; I doubt I would’ve completed college successfully without it.

But to a disabled person without my skillset the whole movement is a dead letter. Heck, even if I were able to replicate my setup for someone, it is fragile, as we have seen. We need to get better as a community at not breaking user space, as Torvalds might say.

Preemptive Troll Management

Now one might reasonably say, “Why are you using Fedora if you need a stable platform?” To which I say: why should my disability exclude me from the latest and greatest? Pipewire can do amazing stuff and I was genuinely excited about it. If not for this seemingly random and unnecessary dropping of a tool that is essential for me, I would be quite happily playing with wireplumber and things of that nature.

Except where otherwise noted, the content on this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.