Screen Readers at a Crossroads

I believe screen reading software stands at a crossroads right now. At Google I/O 2013, Google demonstrated some of the possibilities of the ChromeVox API, and what they showed represents some fundamental changes in the way screen reader software interacts with Web browsers. In this post I will discuss why I see this as a fundamental shift, along with the risks and rewards I see in this model.

So what’s the big deal?

The first thing to look at is how screen reading software typically interacts with a Web page. Usually the software pulls data out of some model representing the Web page, interprets it, and presents it to the user. The data could come directly from the browser and the DOM or through the operating system’s accessibility layer. No matter where it gets that data, the screen reader almost always pulls the data and then interprets it itself based on the semantic markup on the page. The Web page does not usually push data to the screen reader software or tell the software how to interpret the data independent of the semantic markup. This means that when a screen reader user interacts with the page, every time they navigate somewhere or interact with an element, the screen reader is pulling information from the data source, interpreting it, and presenting it to the user.
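
To make the pull model concrete, here is a minimal sketch of a screen reader walking a simplified accessibility tree and deciding for itself how to announce each node. The tree shape, node fields, and the `announce` function are all invented for illustration; real screen readers work against far richer platform APIs.

```javascript
// Hypothetical sketch of the "pull" model: the screen reader walks a
// simplified accessibility tree and decides for itself how to announce
// each node. The node structure here is invented for illustration.
const tree = [
  { role: "heading", level: 1, text: "Penguins" },
  { role: "text", text: "Penguins are flightless birds." },
  { role: "img", alt: "Penguins frolicking" },
];

// The interpretation logic lives in the screen reader, not the page:
// the page only supplies semantics, the reader decides the wording.
function announce(node) {
  switch (node.role) {
    case "heading":
      return `${node.text}, heading level ${node.level}`;
    case "img":
      return `Image. ${node.alt}`;
    default:
      return node.text;
  }
}

const spoken = tree.map(announce);
console.log(spoken.join("\n"));
```

The key point is the direction of control: the page never calls `announce` itself and has no say in its output.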

This is why we tell people to build pages with good semantic structure and all of the other accessibility things we say. This way when a user encounters one of these elements the screen reader software can interpret what it is and present it to the user in a consistent way. So no matter what screen reader software you use, when something is coded as an <h1>, all screen reader software reports to their users that they are reading a heading level 1. Each screen reader application might speak this information differently or have slight variations for how you navigate the items, but there is always consistency within the screen reader application itself. This is good for both the screen reader user and the developer. The screen reader user can know that his heading navigation keys will always get him to a particular heading and to the next and previous headings. The developer doesn’t have to worry about how each screen reader will represent this <h1> to the user – they just know it will work. There is a standard which defines what <h1> means, and everyone agrees to follow that definition.

Now none of that has changed in ChromeVox. An <h1> is still reported as a heading level 1 to the user and the user can still navigate through the headings the same way. What has changed with the ChromeVox API is now the Web page has the ability to modify the way that an <h1> gets interpreted by the screen reading software. In fact, the ChromeVox API allows the Web page to reinterpret ANY semantic markup or even ANY action the screen reader user takes the way the page sees fit. The fundamental shift is from the screen reading software pulling and interpreting the data to the Web application interpreting and pushing the data to the screen reading software.

An example

To see this in action you can either watch the following YouTube videos demonstrating this or you can read the demonstration page using two different screen reading programs, ChromeVox and any other screen reader.

With this example, please keep in mind that I am not an expert on the ChromeVox API. This example is what I cobbled together after watching a presentation at Google I/O and seeing some sample code on their slides. There is not a well documented public API to do all of this yet to my knowledge.

In this example there is a simple page with four headings, some text, and an image. If you use any screen reader software other than ChromeVox the page will behave just as you expect it to. The user can browse the page linearly or jump around from heading to heading.

Page read with JAWS and Internet Explorer

If you read this page with ChromeVox you will have a very different experience because I have used the ChromeVox API to change the way certain semantic elements are presented to the user, and I’ve even overridden the natural flow of the page so unexpected things happen when you browse the page. The two items I have changed are:

  1. When using the commands to go to the next and previous headings, instead of relying on ChromeVox to announce the heading text and then say “heading level 1”, I have told ChromeVox to say “You have jumped to the next heading which is called <insert heading text>.” I have redefined the built-in behavior of ChromeVox when it encounters headings while navigating to the next and previous headings.
  2. When browsing to the next and previous heading, when you try to go between the third and fourth headings, ChromeVox will tell you “You are not ready for the next heading yet. First you must spend time frolicking with the penguins. After that you may go to the next heading. Image. Penguins frolicking.” I have redefined ChromeVox navigation commands to do whatever I want, independent of the semantic structure of the page.
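
Conceptually, both overrides amount to the page replacing the screen reader’s built-in handler for a navigation command with its own. Since the ChromeVox page API is not well documented, the following is a hypothetical plain-JavaScript simulation of that idea; none of these names come from the actual API.

```javascript
// Hypothetical simulation of overriding "next heading" navigation.
// `defaultNextHeading` stands in for the screen reader's built-in logic;
// the page-supplied handler replaces it. All names here are invented.
const headings = ["First", "Second", "Third", "Fourth"];
let position = -1;

function defaultNextHeading() {
  position = Math.min(position + 1, headings.length - 1);
  return `${headings[position]}, heading level 1`;
}

// The page swaps in its own interpretation of the same user command.
function pageNextHeading() {
  if (position === 2) {
    // Ignore the user's request and trap them at the image instead.
    return "You are not ready for the next heading yet. " +
           "Image. Penguins frolicking.";
  }
  position = Math.min(position + 1, headings.length - 1);
  return `You have jumped to the next heading which is called ${headings[position]}.`;
}

let nextHeading = defaultNextHeading; // the stock behavior...
nextHeading = pageNextHeading;        // ...replaced by the page's version

console.log(nextHeading()); // lands on "First"
console.log(nextHeading()); // "Second"
console.log(nextHeading()); // "Third"
console.log(nextHeading()); // blocked: penguins
```

Notice that nothing in the page’s handler is obligated to respect the semantic structure; the user’s “next heading” keystroke means whatever the page says it means.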

Page read with ChromeVox and Chrome

It seems silly, but there are serious implications

Yes, that example is rather sophomoric, but it proves a point. Despite using <h1> elements, I was able to present those elements to the user in a very non-standard way. Also, despite using a navigation technique that is only supposed to allow me to jump from heading to heading, I was able to force the screen reader user to the image, even though they requested the next or previous heading. I am not doing any keyboard trapping to accomplish this. It’s all done with the ChromeVox API, so ChromeVox simply behaves differently than expected.

So why would they do this?

Does Google intend for this to be used to trick users? I don’t think so. Google is actually doing some pretty cool things with this. For instance, this is how they are adding additional support for MathML. ChromeVox now has some native support for MathML, but it doesn’t fully implement it. What if you are trying to express math in a way that ChromeVox does not support yet? As a Web page developer, you have the ability to write some JavaScript against the ChromeVox API that tells ChromeVox to interpret certain MathML symbols differently than it would natively.
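
One way to picture the MathML scenario: the page supplies its own spoken renderings for symbols, and those are consulted before the screen reader’s built-in table. This is a hypothetical sketch in plain JavaScript; the symbol tables and function name are invented, and the assumption that the integral sign is unsupported is purely for illustration.

```javascript
// Hypothetical sketch of a page patching gaps in a screen reader's
// math support. Both tables and the lookup function are invented.
const defaultMathSpeech = {
  "\u2211": "summation",        // ∑ — assume native support exists
  // "\u222B" (∫) — assume the reader has no rendering for this one
};

// Page-provided overrides, consulted before the built-in table.
const pageMathSpeech = {
  "\u222B": "integral",
};

function speakMathSymbol(symbol) {
  return pageMathSpeech[symbol] ?? defaultMathSpeech[symbol] ?? "unknown symbol";
}

console.log(speakMathSymbol("\u222B")); // page's rendering fills the gap
console.log(speakMathSymbol("\u2211")); // built-in rendering still used
```

Used this way, the override is additive: it fills gaps without disturbing symbols the reader already handles.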

If you aren’t so mathematically inclined there are other benefits too. If you do have a user interface that is tremendously complex and doesn’t lend itself to navigation by semantic markup, you could make the screen reader do and say whatever you want based on the user’s input. There’s now no reason to tie yourself to semantic navigation or even ARIA attributes for trying to convey richer UI elements. You can in essence write your own screen reader for each application you develop, and just use ChromeVox as the TTS engine to actually speak it.
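
In other words, the page can own the entire interaction model and use the screen reader only as a speech pipe. Here is a hypothetical sketch of that extreme, with `speak` standing in for something like a ChromeVox speech call and a made-up grid UI; every name here is invented for illustration.

```javascript
// Hypothetical "bring your own screen reader" sketch: the page handles
// every keystroke itself and only uses the speech engine as a dumb TTS
// pipe. `speak` is a stub standing in for a real speech call.
const spokenLog = [];
function speak(text) { spokenLog.push(text); }

const cells = [["Q1", "Q2"], ["100", "250"]];
let row = 0, col = 0;

// The page defines its own navigation model, ignoring semantic markup
// entirely; the arrow keys mean whatever the page decides they mean.
function handleKey(key) {
  if (key === "ArrowRight") col = Math.min(col + 1, cells[0].length - 1);
  if (key === "ArrowDown")  row = Math.min(row + 1, cells.length - 1);
  speak(`${cells[row][col]}, row ${row + 1}, column ${col + 1}`);
}

handleKey("ArrowRight");
handleKey("ArrowDown");
```

The trade-off is exactly the one this post worries about: the experience can be richer than semantic navigation allows, but it is only as good, and only as consistent, as each individual page author makes it.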

Is this a bad thing?

Not always, but it definitely opens the door to abuse. Most Web pages and applications can be written accessibly using semantic markup with ARIA attributes, and ChromeVox can still handle those things just fine. In fact, I bet Google will still encourage you to use standards in your Web page. What this opens the door to is creating ChromeVox-only solutions for certain Web pages and applications.

This page best viewed with Internet Explorer 6…

Are we really ready to go back to this, or is Google, as they claim, advancing Web accessibility with features that have never been possible before?

On the positive side, this has the potential to let developers create Web pages and applications accessible to a level that has not been possible before. However, will ARIA not suffice to meet most if not all of our needs?

On the negative side, creating custom user interfaces for one particular group of users means, in essence, creating two sets of code. Will all of the new features in the non-screen reader UI be translated instantly over to the screen reader UI?

Well I heard that screen reader users like it when …

How many times have we heard misinformed developers start a justification for a particular implementation with these words? With great power comes great responsibility. I know Google does not intend for developers to use this API in obnoxious ways, but it’s out there now, and the reality is it will get misused some. Do we want to trust the same developers who just now figured out that “spacer graphic” is never appropriate alt text to be able to define Web page navigation in a way that is “more superior” than just using good heading structure?

So where do we go from here?

If ChromeVox had a bigger market share, this conversation would probably be a little different. ChromeVox does have one advantage over other screen readers, though: it is by far the most accessible way to interact with Google Apps. Are we experiencing a market shift? Is Google trying to redefine the way screen reader software should work with Web pages? Is Google promoting its own ecosystem as the superior answer to its competitors? It worked for Apple, iTunes, and iOS devices. Are we at that early stage where the benefits of the ecosystem are not yet fully realized? When big players with lots of money start playing, they like to change the rules of the game to give themselves the advantage. That’s the free market, and it’s seldom a tidy process.

How will the other screen reader vendors respond? Will developers start utilizing this API in ways that make ChromeVox the necessary choice for their application? Is this just JAWS scripts now being implemented by Web developers? Does this fundamentally break the Web? Is this all just a tempest in a teapot?

I believe Google is in it to win it. They don’t see this as a research project or a neat idea. They believe they are advancing the state of Web accessibility. Do we agree with that?

This entry was posted in news by Greg Kraus. Bookmark the permalink.

About Greg Kraus

I am the University IT Accessibility Coordinator at North Carolina State University. I provide leadership in creating an accessible IT infrastructure by consulting on the accessibility of campus projects, working with developers and content creators, providing training, and helping set policy.

18 thoughts on “Screen Readers at a Crossroads”

  1. Recently, I replaced my old Windows laptop with a Samsung Chromebook. Originally intended as an “I doubt it will, but let’s try” experiment, I thought I’d be running back to my old laptop within a week.

    Now, I can’t think of any other OS I’d rather use.

    Which brings me to ChromeVox. I’m an accessibility consultant in Australia so naturally screenreaders are one of many weapons in my testing arsenal. With ChromeVox built in to the OS and a Chromebook as my daily driver I have been getting to know the ‘Vox quite well.

    You raised some really interesting points about how this screenreader behaves like no other. I also watched the I/O 13 accessibility presentation, and at first all of this didn’t sink in, but now I get it! Your test page demonstrates it perfectly.

    I definitely enjoyed spending a moment frolicking with the penguins as well!

  2. Though I don’t think the full feature set has migrated over yet, my guess is that where this will really matter is with AndroidVox. When using TalkBack with Chrome on Android, TalkBack is using AndroidVox, a pared-down version of ChromeVox. I’m doubtful ChromeOS will get wide use by users with disabilities, but Android already has some traction in that space.

  3. Greg,
    Thank you for the example, this gives a new twist on how to make web technology more accessible. It will be interesting how these competing and complementary technologies play out to make the web more accessible to people with disabilities.


  4. I’d much rather see google participating in W3C/WAI working groups to plug holes in web standards than watch it go off down its own path, forcing already under-resourced efforts at enhancing accessibility down yet another direction.

    The use of semantic code and ARIA covers most scenarios that may once have been use cases for this approach Google is taking. It makes me uncomfortable.

  5. Great blog post, Greg. I like that Google is coming up with accessibility solutions. This particular solution would seem to provide developers with the ability to get beyond accessibility hurdles where complex interfaces might otherwise be inaccessible, even with ARIA. That said, I agree that this opens the door for a lot of web developers to make assumptions – perhaps wrong ones – about how screen reader users should interact with their websites or applications. You mentioned alt text on spacer images as an example of developers having good intentions with incomplete understanding. I’ll add a couple of others examples: Lots of developers now make everything on a page a heading because they’ve heard that headings are good; and I’m seeing growing numbers of developers who add outrageous quantities of hidden text to their web pages, intended to help screen reader users. So, I think we will likely see similar misinformed solutions, but hopefully we’ll see more good ones than bad ones.

    The greater problem for me though is that it’s a Google-only solution. I actually find some positive in that too – it’s exciting that major companies are actively competing for the accessibility market. But vendor-specific solutions and lack of standardization sure make life messy and far more challenging than it should be.

    • If we are going to go down this route, I’d like to see a spec from the W3C on how to manage this communication so we don’t have vendor lock in for particular solutions.

  6. I had similar questions and concerns. ChromeVox does respect standard HTML and ARIA programming and the developers should continue to use standard markup at all times.

    Where this new API seems helpful is when an engineer is starting to look at role=”application”. This is a very complex situation and the ChromeVox API would allow a more sophisticated experience in an otherwise difficult environment.

  7. Many web sites already have multiple versions in the form of iOS and/or Android apps. Examples include Huffington Post, the NY Times, and many others. Generally these apps are very list oriented and much easier to navigate with the screen readers VoiceOver and TalkBack than their web counterparts. Even those of us who don’t use screen readers find these apps easier to navigate. Having multiple versions of the same web content is already standard practice. If the ChromeVox API makes it easier to create alternative and more accessible interfaces to the same content, then it could be of great benefit to all of us.

  8. I must say I agree with those who say that Google should work within the existing standards/guidelines community. If the extra ChromeVox information is available to all, I suppose screen readers could add their own brand of support for such but I don’t like the notion that Google is going it alone on a new way to extend what has been carefully developed by a whole lot of really smart people for a decade or so.

    People in the accessibility community have been battling the notion that something is accessible if it “works with JAWS.” As NVDA and VoiceOver seem to have the most rapidly growing market shares, developing for JAWS or ChromeVox specifically seems like a horrible waste of scarce resources. Yes, some sites build special iOS or Android pages but accessibility teams are hard pressed to get alt-text tags let alone do a special set of things for one class of a small sample of their total users.

  9. The title of this post captures the situation perfectly. The way a screen reader works with content and the web browser must change. The old model where a screen reader uses a browser- and platform-specific extension mechanism and talks to the web page DOM via an interface like IAccessible is just not going to work in the modern HTML5, mobile world. For one thing, browsers are moving away from plugins as they hurt security and performance and just don’t work the modern HTML5 way. In this modern browser world, the page author or website owner determines the content and functionality of the page and the user reads and interacts with it. There is no place for the 3rd party screen reader.

    What is needed is a new model for how the page, user, and screen reader interact. I am not sure ChromeVox is attempting to provide such a model. I think they are just trying to move things forward. The W3C seems the proper place to work out standards for how all these things interact. As far as I know, none of their existing accessibility-related standards address this area.

    My company, Design Science, has MathML-to-speech (and braille) s/w. We would also like to bring our functionality to bear on the problem. Does our s/w or web service talk to the screen reader? While that may seem like the obvious way to do it, it is not that simple. Since MathML is not yet universally implemented in web browsers, there are many ways that mathematics are represented in web pages: MathML, MathJax (using MathML or LaTeX), equation images with MathML or LaTeX attributes. Does the screen reader have to know how to deal with all of this? Or does the page contain JavaScript that does this job? How does the screen reader interact with this code?

    I would love to work this all out with screen reader vendors, browser vendors, etc. under the aegis of the W3C. While there is definitely hard work to be done here, it sounds like an interesting problem with many benefits.

  10. While it seems that something like this provides for a lot of flexibility on the surface, it has some very serious flaws. For a start, it only takes speech users into account and doesn’t consider braille at all. It’s not enough to just braille the same messages used for speech. Using Google Docs as an example, if you press right arrow to move one character to the right, Docs will output that one character to this API. In braille, the entire line needs to be displayed and the cursor needs to be updated. In addition, navigation, cursor routing, etc. commands from the display have to be handled. In short, an API like this only handles one assistive modality.

    Essentially, doing things like this requires every web page that wants such advanced functionality to implement their own assistive technology virtually from scratch; i.e. their own speech output, their own braille input/output, their own magnification, etc. This is extremely complex, web developers may not necessarily have the expertise to implement such specialised support effectively and it means that the wheel is constantly reinvented. Furthermore, it means that the user cannot expect any consistency whatsoever in their interactions with various web pages. New UI developments are important, but even sighted users still prefer some consistency in how particular controls look and behave, etc.

    Some comments have suggested that there is no place for general purpose screen readers in the HTML world. I very much disagree. HTML might be becoming increasingly prevalent, but it’s far from the only UI users experience. The only platforms that use HTML for their entire UI are Chrome OS and Firefox OS. Users should be able to expect some consistency and common control in their usage of both native and HTML UIs, as well as between various web pages.

  11. I don’t have anything to add to the thoughtful comments here, but I did want to add my thanks for a really helpful article in which all implications are clearly set out. It’s a helpful preview to where screen reading interfaces might be headed with the potential risks and benefits. Great to be having this conversation somewhat early on.

  12. Pingback: Blog “Screen Readers at a Crossroads” Greg Kraus, NC State Univ | Web Accessibility at UIC

  13. >> In this modern browser world, the page author or website owner determines the content and functionality of the page and the user reads and interacts with it. There is no place for the 3rd party screen reader.

    Authors / developers have always controlled Web page content and functionality.
    The modern browsing world has not changed this aspect at all.
    A ChromeVox-only solution, as illustrated in this blog post, is indeed akin to giving developers the power to create something like JAWS scripts.

    James Teh has pointed at the dangers and risks of content developers assuming the role of assistive technology creators and I cannot agree more with him.
    Accessibility is complex and let us not forget that accessible output is the result of what content authors do + what browsers do and what assistive technology does. And this is on top of platform / OS support for exposing accessibility.
    The creation of plugins like the MathML plugin for IE is certainly good: it allows access to math with at least one browser rather than none at all.
    This is just the response to the realities that confront us all: different browser / AT vendors and Web technologies do different things with differing priorities.
    Surely Design Science hoped to make the plugin work across all browsers but understandably there are resource / technical constraints / dependencies that make this difficult.
    The ChromeVox-only solution is similar but can be dangerous as acknowledged by the author:
    In fact, in the illustration, a user has lost the ability to tell an h1 from an h2, or the ability to navigate only to headings at a particular level.
    Is a non-screen-reader user prevented from accessing or reading the content under the third heading without frolicking with the penguins? This customization certainly does not enhance accessibility for screen reader users (I know this is simply to demonstrate an idea).
    That’s why content authors should not assume the role of assistive technology.
    It is like car manufacturers taking on the work of building roads all over and creating and enforcing traffic rules too. Is this desirable?

    >> People in the accessibility community have been battling the notion that something is accessible if it “works with JAWS.” As NVDA and VoiceOver seem to have the most rapidly growing market shares, developing for JAWS or ChromeVox specifically seems like a horrible waste of scarce resources.
    Interesting that Chris should say this.
    For years, most if not well over 50% of blind users have been able to do things simply because of ‘Job Access With Speech’, i.e. JAWS, in the Windows world.
    Yes, JAWS, NVDA, and Window-Eyes on Windows, and now VoiceOver on a Mac, are popular because they are effective.
    And for something to be accessible one has to test with capabilities of current AT.
    So when certain content is simply not accessible, coding it in a manner so that it works with at least the widely used combinations of browsers / AT for a start is an approach that can hardly be faulted.
    Once some accessibility support can be demonstrated, a company can sell its product / service to the government as a S508 compliant product.
    Or another company can demonstrate that its website is accessible and not be subject to legal action.
    Note: several WCAG 2 techniques are labelled as sufficient even when they work with only some combinations of browsers-assistive technology and not all combinations.
    It is economics in the end.

  14. Pingback: SeroTalk Podcast 158: It Just Ate My Like | SeroTalk

Comments are closed.