I believe screen reading software stands at a crossroads right now. At Google I/O 2013, Google showed some of the possibilities of the ChromeVox API. The demonstration revealed fundamental changes in the way screen reader software interacts with Web browsers. In this post I will explain why I see this as a fundamental shift and discuss both the risks and rewards I see with this model.
So what’s the big deal?
The first thing to look at is how screen reading software typically interacts with a Web page. Usually the software pulls data out of some model representing the Web page, interprets it, and presents it to the user. The data could come directly from the browser and the DOM or through the operating system’s accessibility layer. No matter where it gets that data, the screen reader almost always pulls the data and then interprets it itself based on the semantic markup on the page. The Web page does not usually push data to the screen reader or tell it how to interpret the data independent of the semantic markup. This means that every time a screen reader user navigates somewhere on the page or interacts with an element, the screen reader is pulling information from the data source, interpreting it, and presenting it to the user.
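To make that pull model concrete, here is a simplified sketch of the idea. Every name and announcement string below is my own invention for illustration; no real screen reader is implemented this way verbatim. The key point is that the mapping from semantic markup to speech lives inside the screen reader, not the page:

```javascript
// Simplified illustration of the "pull" model: the screen reader,
// not the page, owns the mapping from semantic markup to speech.
// All names here are invented for illustration.
const ROLE_ANNOUNCEMENTS = {
  H1: (text) => `${text}, heading level 1`,
  H2: (text) => `${text}, heading level 2`,
  A: (text) => `${text}, link`,
  IMG: (alt) => `Image. ${alt}`,
};

// The screen reader pulls a node (from the DOM or the OS
// accessibility layer) and decides for itself how to present it.
function interpretNode(node) {
  const announce = ROLE_ANNOUNCEMENTS[node.tagName];
  return announce ? announce(node.text) : node.text;
}

console.log(interpretNode({ tagName: 'H1', text: 'Penguins' }));
// "Penguins, heading level 1" -- the page had no say in the phrasing
```

Because the interpretation table belongs to the screen reader, an `<h1>` sounds like a level-1 heading everywhere, no matter who authored the page.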
This is why we tell people to build pages with good semantic structure and to follow all of the other accessibility practices we recommend. That way, when a user encounters one of these elements, the screen reader software can interpret what it is and present it to the user in a consistent way. So no matter what screen reader software you use, when something is coded as an <h1>, every screen reader reports to its users that they are reading a heading level 1. Each screen reader application might speak this information differently or have slight variations in how you navigate the items, but there is always consistency within the screen reader application itself. This is good for both the screen reader user and the developer. The screen reader user knows that his heading navigation keys will always get him to a particular heading and to the next and previous headings. The developer doesn’t have to worry about how each screen reader will represent this <h1> to the user – they just know it will work. There is a standard which defines what <h1> means, and everyone agrees to follow that definition.
Now none of that has changed in ChromeVox. An <h1> is still reported as a heading level 1 to the user, and the user can still navigate through the headings the same way. What has changed with the ChromeVox API is that the Web page now has the ability to modify the way an <h1> gets interpreted by the screen reading software. In fact, the ChromeVox API allows the Web page to reinterpret ANY semantic markup, or even ANY action the screen reader user takes, in whatever way the page sees fit. The fundamental shift is from the screen reading software pulling and interpreting the data to the Web application interpreting the data and pushing it to the screen reading software.
To see this in action you can either watch the following YouTube videos demonstrating this or you can read the demonstration page using two different screen reading programs, ChromeVox and any other screen reader.
With this example, please keep in mind that I am not an expert on the ChromeVox API. This example is what I cobbled together after watching a presentation at Google I/O and seeing some sample code on their slides. There is not a well documented public API to do all of this yet to my knowledge.
In this example there is a simple page with four headings, some text, and an image. If you use any screen reader software other than ChromeVox the page will behave just as you expect it to. The user can browse the page linearly or jump around from heading to heading.
Page read with JAWS and Internet Explorer
If you read this page with ChromeVox you will have a very different experience because I have used the ChromeVox API to change the way certain semantic elements are presented to the user, and I’ve even overridden the natural flow of the page so unexpected things happen when you browse the page. The two items I have changed are:
- When using the commands to go to the next and previous headings, instead of relying on ChromeVox to announce the heading text and then say “heading level 1”, I have told ChromeVox to say “You have jumped to the next heading which is called <insert heading text>.” I have redefined the built-in behavior of ChromeVox when it encounters headings during next- and previous-heading navigation.
- When browsing to the next and previous heading, when you try to go between the third and fourth headings, ChromeVox will tell you “You are not ready for the next heading yet. First you must spend time frolicking with the penguins. After that you may go to the next heading. Image. Penguins frolicking.” I have redefined ChromeVox navigation commands to do whatever I want, independent of the semantic structure of the page.
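Since the API is not well documented publicly, I can only sketch what such an override might look like. Everything below is an assumption on my part: `cvox.Api.speak` is stubbed out as a stand-in, and the navigation hook is hypothetical, reconstructed from the spirit of the I/O slides rather than from documented API names:

```javascript
// Hypothetical sketch of the "push" model: the PAGE decides what the
// screen reader says. The cvox object is a stand-in stub; the real
// ChromeVox API names may differ (it is not well documented).
const spoken = [];
const cvox = {
  Api: { speak: (text) => spoken.push(text) }, // stub for illustration
};

// The page intercepts a "next heading" request and supplies its own
// phrasing -- or refuses to honor the request entirely.
function onNextHeading(heading, visitedPenguins) {
  if (heading.level === 4 && !visitedPenguins) {
    cvox.Api.speak('You are not ready for the next heading yet. ' +
                   'First you must spend time frolicking with the penguins.');
    return false; // navigation hijacked; the user never reaches the heading
  }
  cvox.Api.speak(`You have jumped to the next heading which is called ${heading.text}.`);
  return true;
}

onNextHeading({ level: 4, text: 'Conclusion' }, false);
onNextHeading({ level: 2, text: 'Penguins' }, true);
```

Note that nothing in this sketch consults the page’s semantic markup; the page script alone decides both the wording and whether the navigation request is honored at all.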
Page read with ChromeVox and Chrome
It seems silly, but there are serious implications
Yes, that example is rather sophomoric, but it proves a point. Despite using <h1> elements, I was able to present those elements to the user in a very non-standard way. And despite using a navigation technique that is only supposed to jump from heading to heading, I was able to force the screen reader user to the image, even though they requested the next or previous heading. I am not using any keyboard trapping to accomplish this; it’s all done with the ChromeVox API, so ChromeVox behaves differently than expected.
So why would they do this?
Google’s demonstration included reading complex mathematical content, something semantic markup alone has never handled well. If you aren’t so mathematically inclined, there are other benefits too. If you have a user interface that is tremendously complex and doesn’t lend itself to navigation by semantic markup, you could make the screen reader do and say whatever you want based on the user’s input. There’s now no reason to tie yourself to semantic navigation or even ARIA attributes for trying to convey richer UI elements. You can in essence write your own screen reader for each application you develop, and just use ChromeVox as the TTS engine to actually speak it.
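A rough sketch of what that "application-specific screen reader" idea could look like, with a stubbed TTS object standing in for ChromeVox (the keymap, the app model, and all names here are hypothetical illustrations, not any real API):

```javascript
// Sketch of an application-specific "screen reader": the page maps user
// input directly to speech, ignoring semantic markup entirely, and uses
// the (stubbed) screen reader only as a TTS engine. Illustrative only.
const utterances = [];
const tts = { speak: (text) => utterances.push(text) }; // ChromeVox stand-in

// App-defined keymap: what each key announces in THIS application.
const appKeymap = {
  // "j": move down one row and announce it, in app-specific wording
  j: (app) => {
    app.row += 1;
    tts.speak(`Row ${app.row + 1} of ${app.rows}: ${app.cells[app.row]}`);
  },
  // "s": announce an app-level summary no semantic markup could convey
  s: (app) => tts.speak(`Spreadsheet has ${app.rows} rows.`),
};

function handleKey(key, app) {
  const action = appKeymap[key];
  if (action) action(app); // the app, not the screen reader, interprets the key
}

const app = { row: -1, rows: 2, cells: ['Revenue', 'Costs'] };
handleKey('s', app);
handleKey('j', app);
```

The design choice here is the crux of the whole debate: the phrasing and the navigation model now live in each application’s code rather than in one consistent screen reader.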
Is this a bad thing?
Not always, but it definitely opens the door to abuse. Most Web pages and applications can be written accessibly using semantic markup with ARIA attributes, and ChromeVox can still handle those things just fine. In fact, I bet Google will still encourage you to use standards in your Web page. What this opens the door to is creating ChromeVox-only solutions for certain Web pages and applications.
This page best viewed with Internet Explorer 6…
Are we really ready to go back to this, or is Google, as they claim, advancing Web accessibility with features that have never been possible before?
On the positive side, this has the potential to let developers create Web pages and applications accessible to a level that has not been possible before. However, will ARIA not suffice to meet most if not all of our needs?
On the negative side, creating custom user interfaces for one particular group of users means, in essence, creating two sets of code. Will all of the new features in the non-screen reader UI be translated instantly over to the screen reader UI?
Well I heard that screen reader users like it when …
How many times have we heard misinformed developers start a justification for a particular implementation with these words? With great power comes great responsibility. I know Google does not intend for developers to use this API in obnoxious ways, but it’s out there now, and the reality is it will get misused some. Do we want to trust the same developers who just now figured out that “spacer graphic” is never appropriate alt text to be able to define Web page navigation in a way that is “more superior” than just using good heading structure?
So where do we go from here?
If ChromeVox had a bigger market share, this conversation would probably be a little different. ChromeVox does have one advantage over other screen readers, though: it is by far the most accessible way to interact with Google Apps. Are we experiencing a market shift? Is Google trying to redefine the way screen reader software should work with Web pages? Is Google promoting its own ecosystem as the superior answer to their competitors? It worked for Apple, iTunes, and iOS devices. Are we at that early stage where the benefits of the ecosystem are not yet fully realized? When big players with lots of money start playing, they like to change the rules of the game to give themselves the advantage. That’s the free market, and it’s seldom a tidy process.
How will the other screen reader vendors respond? Will developers start utilizing this API in ways that make ChromeVox the necessary choice for their application? Is this just JAWS scripts now being implemented by Web developers? Does this fundamentally break the Web? Is this all just a tempest in a teapot?
I believe Google is in it to win it. They don’t see this as a research project or a neat idea. They believe they are advancing the state of Web accessibility. Do we agree with that?