Performance and the Accessibility Tree


[Intro music]

>> MICHAEL BECK: Welcome to technica11y, the webinar series dedicated to the technical challenges of making the web accessible. This month our presenter is Eric Bailey, a Boston-based designer and maintainer of the Accessibility Project.

>> MICHAEL BECK: Hello, everyone, and welcome to this July 2019 edition of technica11y. I’m Michael Beck, the operations manager at Tenon and your host and moderator, as usual. Thanks so much for joining us today. Happy 4th of July to our American contingent! Please remember to stay safe this year during any Independence Day festivities you attend; we would like to have you back in August. As noted during the intro we have Eric Bailey with us today. Hello, Eric!

>> ERIC BAILEY: Hey, how is it going!

>> MICHAEL BECK: It’s going quite well! Eric will be talking today about an often overlooked aspect of accessibility, and that’s performance. He’ll highlight some opportunities and techniques that designers and developers can use to improve their website’s or web app’s performance by embracing an accessible and inclusive mindset. So, without any further ado, take it away, Eric.

>> ERIC BAILEY: Thanks, all right. So, let’s set the stage here. This is a talk about performance, with performance being the manner in which a mechanism behaves. I was conducting an accessibility audit last year, which is the process of performing manual and automated checks to see if a website can work with assistive technology. The client was a medical education app and it was used to train caregivers. It was built using a single page application framework using Google’s Material Design library to construct the user interface. And when I learned that, I thought, “Oh, sweet! Google made it. I don’t have to worry as much.”

So, I fire up VoiceOver, which is the screen reader for macOS, and I start testing and things are going pretty well. And, then, VoiceOver crashes. I try restarting Safari. I try clearing the cache. I try closing all of the other tabs, try quitting every tab I have. Heck, I even try rebooting! Same result every time. So I asked my boss, Karl, “What’s going on?” And he said, [changing his voice] “Oh, it’s probably problems with the accessibility tree.” That’s my Karl impression. I apologize.


And I go, “The what now?”

So, how do you describe an interface? And before, you know, I get more into this, like, what I want to know is, I’m not saying, “Tell me what you see or hear when you turn your computer on.” I’m saying, “How do you speak computer to a computer?” And the way we do this is with the accessibility tree. Fundamentally, the accessibility tree is a collection of objects, all of the little atomic bits that make up a user interface. People who rely on assistive technology use the accessibility tree to navigate and take action on user interfaces. And without a functioning accessibility tree, there is no way to describe the current state of what an interface is, and consequently, no way for some people to be able to use their computer or smartphone or other device. So, suffice to say, the accessibility tree is a really important piece of technology.

So, how do you build one? How do you build an accessibility tree? First, you have to give every object in the interface a name. Names are short identifiers used to describe purpose. Then, you give it a role. Roles are predefined values that tell someone what they can expect to do with the interface object. For example, a role of `button` means someone can press it to activate some pre-defined action. Then, we have properties, which are the attributes of the object. Examples of this are its `position` in an interface, its `state`, or interactions that are allowed. For our button, that could mean a toggled state to indicate it being pressed in. I think a good example here is the play button on a media player. And then finally, we have a `description`, which can provide further information about the object, if needed. These accessible objects are then programmatically grouped together to form more complicated user interface components.
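As a rough illustration of the four pieces just described, here is how one accessible object could be modeled as plain data. This is only a sketch, not any platform’s real accessibility API: the `createAccessibleObject` and `announce` helpers, the field layout, and the description text are all made up for this example.

```javascript
// Hypothetical model of a single accessible object: name, role,
// properties, and an optional description. Not a real platform API.
function createAccessibleObject({ name, role, properties = {}, description = "" }) {
  if (!name) throw new Error("Every accessible object needs a name");
  if (!role) throw new Error("Every accessible object needs a role");
  return { name, role, properties, description };
}

// Roughly how assistive technology might phrase the object for a person.
function announce(object) {
  const parts = [object.name, object.role];
  if (object.properties.pressed !== undefined) {
    parts.push(object.properties.pressed ? "pressed" : "not pressed");
  }
  if (object.description) parts.push(object.description);
  return parts.join(", ");
}

// The play button from a media player, currently toggled on.
const playButton = createAccessibleObject({
  name: "Play",
  role: "button",
  properties: { pressed: true },
  description: "Starts playback of the current track",
});

console.log(announce(playButton));
// prints "Play, button, pressed, Starts playback of the current track"
```

The point of the sketch is that a name and a role are required, while state and description are layered on top when they add meaning.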

So, how do you build these components, let’s say, an operating system alert dialog? That’s a pretty commonplace thing for UI. Starting from the top down, we have a menu bar, that’s the anchor that we’ll attach everything to. And then we have a `title` for the menu bar, so we know what it’s all about and what contents we can expect to find. If you’re using a screen reader, titles are really helpful, as it saves you from having to dig around inside the rest of the component to figure out what it’s for. The `title` here being, “Unsaved Changes.”

We might also need to close this dialog window once we learn what it’s for so we include a close button in the menu bar. And then we add the `body` of the dialog, the place where we can place the bulk of the dialog content. In the dialog’s `body`, there’s another `title`, which asks us if we want to save the changes we made to the thing we were working on. And then we add text to the `body`, which will provide more information about what happens if you choose not to save. Then, we add a save button, which will allow us to commit to saving the changes we made. And then, the accompanying cancel button in case we don’t and presto, we have a dialog component! Because these collected objects have accessible names and are therefore present in the accessibility tree, we can speak computer to computer to get what we want. I can programmatically say, go to the dialog. Then its body. And then activate the save button. And it works. How cool is that?
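To make “go to the dialog, then its body, then activate the save button” a little more concrete, here is a minimal sketch that models the dialog as a nested object and walks it by role and name. The node shape, the `findNode` helper, the `activate` callbacks, and the exact wording of the dialog text are assumptions for illustration; real accessibility APIs differ by operating system.

```javascript
// Hypothetical accessibility tree for the "Unsaved Changes" dialog.
// Node shape, helper names, and text wording are illustrative only.
const unsavedChangesDialog = {
  role: "dialog",
  name: "Unsaved Changes",
  children: [
    {
      role: "menu bar",
      name: "Unsaved Changes",
      children: [{ role: "button", name: "Close", children: [] }],
    },
    {
      role: "body",
      name: "Do you want to save your changes?",
      children: [
        { role: "text", name: "Your changes will be lost if you don't save them.", children: [] },
        { role: "button", name: "Save", children: [], activate: () => "saved" },
        { role: "button", name: "Cancel", children: [], activate: () => "cancelled" },
      ],
    },
  ],
};

// Depth-first search for the first descendant matching a role
// and, when one is given, a name.
function findNode(node, role, name) {
  for (const child of node.children) {
    if (child.role === role && (name === undefined || child.name === name)) return child;
    const match = findNode(child, role, name);
    if (match) return match;
  }
  return null;
}

// "Go to the dialog. Then its body. And then activate the save button."
const dialogBody = findNode(unsavedChangesDialog, "body");
const saveButton = findNode(dialogBody, "button", "Save");
console.log(saveButton.activate()); // prints "saved"
```

The lookup only works because every object carries a role and an accessible name; an unnamed button would be unreachable by this kind of query.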

And this might seem a little pedantic, but, please bear with me for a bit here. We do have the dialog component but there’s more to it than just that. Interface components have to have a place to live and that place is an operating system. Operating systems also include components that allow us to navigate to and access the stuff you store in them. It’s also the place we install and run programs. And all the little doodads and widgets and games we can’t live without. And then, it also has the browser, which is rapidly becoming an operating system in its own right.

The browser contains the mechanisms we used to access the web. And the web is sort of like the Wild West in that you can write more or less whatever you want and it will usually work, which is kind of both a blessing and a curse. But, regardless of how you write what you write, you can’t escape the fact that you have to ultimately create HTML. That HTML then has an appearance and behavior augmented by both JavaScript and CSS and, again, there are many ways to go about doing this, but they are the two requisite parts that make up the whole that is a website or a web app.

The HTML markup augmented by both JavaScript and CSS creates the Document Object Model, or DOM tree, which is the programming interface for websites. Browsers then read the DOM and all of the information contained within it to draw an interface which is then shown to a user, somebody like you. The user can then take actions on the visually rendered interface, which updates the DOM, which in turn updates what is visually rendered, and this allows us to make our websites dynamic, which is really cool. So, running in parallel is the accessibility tree, which is sampled from the generated DOM. And I say sampled because the accessibility tree will use specialized heuristics to only surface things it deems necessary. Modern versions of the accessibility tree are also generated after styles are computed, as certain CSS, such as the `display` property and pseudo-element `content`, will actually affect it.
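The idea of “sampling” can be sketched with a toy function that walks a DOM-like tree and keeps only what it considers meaningful. Real browsers use far more involved heuristics; the two rules here (skip anything with `display: none`, and flatten unnamed `div` and `span` wrappers) are simplifying assumptions, and the plain-object node shape with `tag`, `label`, and `style` fields is invented for the example.

```javascript
// Elements treated as generic containers in this sketch.
const GENERIC_TAGS = new Set(["div", "span"]);

// Returns an array of accessible nodes for one DOM-like node.
function sampleNode(domNode) {
  // Anything with display: none is left out, along with its subtree.
  if (domNode.style && domNode.style.display === "none") return [];

  const children = (domNode.children || []).flatMap(sampleNode);

  // Unnamed generic wrappers add no meaning, so their children are promoted.
  if (GENERIC_TAGS.has(domNode.tag) && !domNode.label) return children;

  return [{ role: domNode.role || domNode.tag, name: domNode.label || "", children }];
}

// A small page: two wrapper divs, one visible button, one hidden button.
const page = {
  tag: "main",
  children: [
    {
      tag: "div",
      children: [
        { tag: "div", children: [{ tag: "button", label: "Save" }] },
        { tag: "button", label: "Debug", style: { display: "none" } },
      ],
    },
  ],
};

const [sampledTree] = sampleNode(page);
console.log(JSON.stringify(sampledTree, null, 2));
```

Five DOM nodes go in and two accessible objects come out: `main` and the visible Save button. Every wrapper still had to be visited to reach that result, which is where deep or bloated markup costs time.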

This sampled version of the DOM is then read by various kinds of assistive technology including, but not limited to, screen readers. And there’s two things I would really like to point out here. First, using a visually rendered interface and assistive technology isn’t mutually exclusive. Not all screen reader users are blind. And many people choose to augment their browsing with devices that rely on the accessibility tree. Secondly, the accessibility tree relies on the user interacting with the DOM to update. It’s effectively a read-only technology in that we can’t directly work with it right now.

Another thing you should be aware of is that it’s more of an accessibility forest than an accessibility tree. There are different implementations of the accessibility tree and each depends on what operating system you’re using and what version of said operating system is running. And this is due to the different ways companies such as Apple and Microsoft have built and updated their operating system’s underlying code throughout the years. And just so we’re clear, the DOM can be a part of the accessibility tree but the accessibility tree is larger and not just limited to the contents of your browser. Because of this, the accessibility tree is more brittle. It has more to pay attention to and its technical architecture was developed before this whole Internet thing really took off, meaning that it didn’t anticipate the sheer amount of information we would be throwing at it. Crashing the accessibility tree and, therefore, breaking assistive technology support is bad, yes. But also, a large amount of information present in the DOM means there’s an accompanying large amount of work the accessibility tree needs to do to figure out what’s what and then report on it. Which. Slows. It. Down.

This can create a lack of synchronization between the current state of the page and what is being reported by assistive technology, meaning that a user may not be working with an accurate model of the current screen and may be taking action on interactive components that are no longer present. This can create the confusing experience of activating the wrong control or navigating to a place that a user didn’t intend to.

So, how do we help prevent this? Start with writing semantic HTML, using things like the `button` element for buttons instead of ARIA slapped onto a div helps lessen the amount of guesswork the accessibility tree has to do when it runs its heuristics to slot meaningful content into the description of the current state of things. This serves to both speed up its calculation and makes what it reports on more reliable.

And, speaking of semantic HTML, here is the raw living DOM tree of an actual website. In fact, it’s the registration page for this webinar event! Even a static, performant website contains a lot of information a computer has to chew through. And again, utilizing semantic HTML helps to reduce the effort it takes to generate the accessibility tree. And, unfortunately, using semantic HTML is kind of a rare thing these days, which is partly why I’m here giving this talk. So thank you, again, to Tenon for keeping your markup straightforward and semantic.

All right. I think you get the gist of it here. Of the code I showed you, five slides worth, we have only covered two-thirds of just one page. And it’s also worth pointing out that this is a relatively short page. And that, again, this is a lot of information to process.

And keeping the DOM tree example in mind, we now have a simple form. It’s a pair of unstyled radio buttons asking you if you want the chicken or the fish as your meal preference. Here is how that form might be translated into the accessibility tree. We have a `fieldset` and within the `fieldset` there’s a `legend`, stating, say, meal preference. There are two `input` elements with a `type` of `radio`, which makes them radio buttons: one with the `name` of chicken and one with the `name` of fish. Chicken has been checked, as I would wish to have the chicken as my meal. And then there’s a `button` with the `name` of save. And you can see here, if we use semantic HTML, the names, roles, and properties are automatically filled in for us without any additional effort, and when I was speaking about how semantic HTML slots in, this is a more granular example of what I’m getting at. Here is an even more high fidelity example, sampled again from the Zoom registration page. I’m using Firefox’s Accessibility Inspector to dig into a `TEXT_NODE` and there’s a ton more information exposed here. You’ve got attributes, states, actions, a child element count, relations, and available actions. All of this information is used by the accessibility tree to help it determine and describe the shape of things.
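The chicken-or-fish form can be sketched as a translation from elements to accessible nodes. The role names used below (`group` for `fieldset`, `radio` for a radio `input`, `button` for `button`) follow the implicit roles browsers assign to those elements; everything else, including the plain-object form description, the helper names, and placing the save button inside the group, is an assumption made to keep the example small.

```javascript
// DOM-like description of the meal preference form (plain objects, not real elements).
const mealForm = {
  tag: "fieldset",
  legend: "Meal preference",
  children: [
    { tag: "input", type: "radio", label: "Chicken", checked: true },
    { tag: "input", type: "radio", label: "Fish", checked: false },
    { tag: "button", label: "Save" },
  ],
};

// Implicit roles for the elements used here; a tiny subset for illustration.
function implicitRole(element) {
  if (element.tag === "fieldset") return "group";
  if (element.tag === "input" && element.type === "radio") return "radio";
  if (element.tag === "button") return "button";
  return "generic";
}

// Translate an element into an accessible node with name, role, and state.
function toAccessibleNode(element) {
  const node = { role: implicitRole(element), name: element.legend || element.label || "" };
  if (node.role === "radio") node.checked = Boolean(element.checked);
  node.children = (element.children || []).map(toAccessibleNode);
  return node;
}

// Render the tree as an indented outline, one line per accessible node.
function outline(node, indent = "") {
  const state = node.checked === undefined ? "" : node.checked ? " (checked)" : " (not checked)";
  const line = `${indent}${node.role} "${node.name}"${state}`;
  return [line, ...node.children.flatMap((child) => outline(child, indent + "  "))];
}

console.log(outline(toAccessibleNode(mealForm)).join("\n"));
// group "Meal preference"
//   radio "Chicken" (checked)
//   radio "Fish" (not checked)
//   button "Save"
```

Nothing in `mealForm` states a role explicitly. The element types alone are enough to fill in names, roles, and state, which is the “for free” part of semantic HTML.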

And here is an even more high fidelity example. This is a raw text dump of the accessibility tree for this same page and this is an example of what the language might look like when we’re speaking machine directly to machine. We don’t speak machine directly though. Before we had Firefox’s Accessibility Inspector, we had to rely on more specialized tools to be the translators. The Paciello Group’s aViewer and Apple’s Accessibility Inspector are the two go-to resources here, and you’ll still need them if you want to do some really serious digging or inspect things other than websites.

So, let’s make the abstract immediate here. What’s going on, and why should you care? With my auditing project, I narrowed the problems down to issues with Material Design’s radio inputs, and this is how they appear visually. And here is how they appear in code. To make a Material Design radio input, you need six HTML elements containing nine attributes with a DOM depth of three. You also need 66 CSS selectors containing 141 properties, which weighs in at 10k when minified. You also need 2374 lines of JavaScript, which weighs in at 30k when minified. All of this will get you a radio input. But you need more than one radio input to use it properly and oftentimes there’s a few options to select from. Sometimes there’s more than just a few options and sometimes there’s even more than that. In the case of my audit, we had a ton of radio inputs being conditionally injected into the page, and the point is that this all adds up.

Google’s Lighthouse project, an open source tool that analyzes for performance problems, recommends the optimal DOM tree has the following: less than 1500 nodes, a max depth of 32 nodes, and no parent node with more than 60 child nodes. What I would like to call attention to is the max depth of 32 nodes bit. This might seem like a lot of wiggle room at first, but take a moment and think about the templates of the websites you’ve worked on. You know, there’s the `frameset`, there’s wrapper divs, landmarks, component wrappers, and, if you’re doing it right, `fieldsets` in your forms, and each digs a little bit more inward.
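The three thresholds quoted above can be checked with a short tree walk. This is a sketch rather than Lighthouse’s own implementation: the `measureTree` and `checkAgainstLimits` names are made up, counting the root as depth 1 is an assumption, and Lighthouse may count nodes and depth differently.

```javascript
// Thresholds as quoted in the talk.
const DOM_LIMITS = { maxNodes: 1500, maxDepth: 32, maxChildren: 60 };

// Walk a tree of { children: [...] } nodes and collect size statistics.
// The root counts as depth 1 (an assumption of this sketch).
function measureTree(root) {
  const stats = { nodes: 0, depth: 0, widest: 0 };
  const stack = [{ node: root, depth: 1 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop();
    const children = Array.from(node.children || []);
    stats.nodes += 1;
    stats.depth = Math.max(stats.depth, depth);
    stats.widest = Math.max(stats.widest, children.length);
    for (const child of children) stack.push({ node: child, depth: depth + 1 });
  }
  return stats;
}

// Compare the measured statistics against the limits and list any problems.
function checkAgainstLimits(stats, limits = DOM_LIMITS) {
  const problems = [];
  if (stats.nodes >= limits.maxNodes) problems.push(`too many nodes: ${stats.nodes}`);
  if (stats.depth > limits.maxDepth) problems.push(`too deep: ${stats.depth}`);
  if (stats.widest > limits.maxChildren) {
    problems.push(`too many children under one parent: ${stats.widest}`);
  }
  return problems;
}

// A hypothetical list with 80 items under a single parent, inside two wrappers.
const wideList = { children: Array.from({ length: 80 }, () => ({ children: [] })) };
const wrapper = { children: [{ children: [wideList] }] };

const treeStats = measureTree(wrapper);
console.log(treeStats, checkAgainstLimits(treeStats));
```

Because the walk only relies on a `children` collection, passing `document.documentElement` in a browser console should work the same way, though that is untested here.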

Another part of accessibility auditing is providing a fix to problems you uncover. It’s tough work, but nobody likes to pay people money to tell them all of the things that are wrong but then offer no solutions on how to fix it. We wound up recommending a radio input pattern that utilized three HTML elements. It’s a pattern developed by my friend, Scott, taken from his excellent a11y_styled_form_controls project. It’s worth saying Scott puts these patterns through their paces and tests them with an incredibly robust set of operating system, browser, and assistive technology combinations, so a nice side benefit is knowing that I can recommend this pattern with confidence. Visually, it was completely indistinguishable from the original version. Comparing the two solutions, we have a 50% reduction in HTML elements and we cut the DOM depth down by a third. There’s also a 30% reduction in CSS selectors and properties, resulting in the CSS payload for this pattern being reduced by 90%. We have also completely removed 30k of JavaScript, which is a 30k blocking resource that no longer needs to be served.
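To see how the per-input savings add up, here is a small worked calculation using the element counts and DOM depths quoted in the talk. The count of 200 radio inputs is purely hypothetical, since the actual number injected into the audited page wasn’t stated, and the helper names are made up.

```javascript
// Per-radio figures quoted in the talk.
const materialRadio = { elements: 6, depth: 3 };
const recommendedRadio = { elements: 3, depth: 2 }; // depth 3 cut by a third

// Percentage reduction going from one value to another, rounded.
function reduction(before, after) {
  return Math.round(((before - after) / before) * 100);
}

// Total elements contributed by a group of radio inputs.
function totalElements(pattern, radioCount) {
  return pattern.elements * radioCount;
}

const radioCount = 200; // hypothetical; the real count wasn't stated

console.log(reduction(materialRadio.elements, recommendedRadio.elements)); // 50
console.log(reduction(materialRadio.depth, recommendedRadio.depth)); // 33
console.log(totalElements(materialRadio, radioCount)); // 1200
console.log(totalElements(recommendedRadio, radioCount)); // 600
```

At that hypothetical count, the original pattern alone would use up most of a 1500-node budget before any other page content is considered.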

So, we cobbled up a rough prototype and were able to create an environment that mimicked the conditions of the site we were auditing, only using the new radio input code, and now guess what? It worked! VoiceOver didn’t crash! Awesome!

So, why is this our problem? Why should we care about the fragility of other people’s software? Well, prioritizing developer ergonomics without considering the generated HTML can lead to bad things happening and regardless of our setup, we tend to pile even more things onto our base experience, things like ads, and analytics, and social/marketing engagement tools, and development integration utilities. And then, there’s also this Cold War raging between websites and their users. Here is Facebook splitting a single word into 11 DOM elements to avoid ad blockers. And then, this graph is generated from the median values of how 1.3 million websites are generated with JavaScript claiming the majority of the share. This JavaScript majority means it’s much more time spent blocking the browser from performing other actions, including things like interfacing with assistive technologies.

So, when we throw out what the browser gives us for free, we oftentimes don’t realize that there are very real, very important things we sacrifice in doing so. Here is what Marco Zehe, Mozilla’s senior accessibility QA engineer, has to say about this:

“Nibbling away at those milliseconds it takes for information to reach assistive technologies when requested. My current task is to reduce the number of accessible objects we create for HTML:div elements. This is to reduce the size of the accessibility tree. This reduces the amount of data being transferred across process boundaries. Part of what makes Firefox accessibility slower on Windows since Firefox Quantum is the fact that assistive technologies now no longer have direct access to the original accessible tree of the document. A lot of it needs to be fetched across a process boundary, so the amount of data that needs to be fetched actually matters a lot now. Reducing that by filtering out objects we don’t need is a good way to reduce the time it takes for information to reach assistive technologies. This is particularly true in dynamically updating environments like Gmail, Twitter or Mastodon. My current patch slated to land early in the Firefox 67 cycle shaves off about 10 to 20% of the time it takes for certain calls to return.”

Said patch has landed.

And, you know, note that all of this optimization is only for one browser: Firefox. And that there are a lot of browsers out there. It’s also not all about browsers, either. This is a refreshable Braille display, one of the many other kinds of assistive technology that interfaces with the accessibility tree.

So, someone should do something; the battle cry of bystanders everywhere. Let’s unpack this some and figure out what our available options are. For screen readers, the main ones are JAWS and NVDA (both are Windows screen readers), VoiceOver for macOS and iOS, and TalkBack for Android. These are the four big screen readers you’re going to be hearing about. And it’s sort of analogous to how Chrome, Safari, Firefox, and IE are the main browsers for the web, in that there’s more screen readers out there, but these cover the main use cases you’re most likely going to deal with. And while not all assistive technology are screen readers, if your site works well for them, chances are it will work well for other assistive technology. Of them, all but one have an open issue tracker, but half have closed source code. That means that while we can file issues, we aren’t really empowered to do much more about it on the assistive technology layer. And it’s also completely unrealistic to expect people to submit and follow code issues or pull requests across all of these issue trackers in addition to working a full-time job.

Our only other realistic option here is to keep the DOM trees on our sites nice and shallow and there are actual tangible benefits to doing this. Here is Marco again weighing in on his optimization efforts:

“Reducing the number of milliseconds it takes for a screen reader to start speaking after a key press from about 140 to 100 here, or from 120 to 100 there, doesn’t matter much on a fast machine. On a slow machine, that reduction is from about 230 to 250 down to 200 or 190 [milliseconds].”

And let’s talk about what “slow machines” means. If you are disabled and/or underserved, you face significant barriers to entering and staying in the workforce. This means you may have less access to current technology, especially the more expensive, higher quality versions of it. Another factor is some people who rely on assistive technology are reluctant to upgrade it for a very justified fear of breaking the way they’re used to interacting with the world.

Slow machines may also not mean an ancient computer, either. Inexpensive Android smartphones are a common entry point for emerging markets and with Android comes TalkBack. A slow machine might also come from a place you aren’t expecting, by which I mean you might have a state-of-the-art computer and are thusly doing state-of-the-art things on it. This requires a ton of computational resources, which makes fast things slow, and with less computational resources to go around we may have unintended consequences, possibly recreating a situation similar to our too many radio inputs problem. Think Electron apps, which are desktop programs built using web technologies.

So, accessibility auditing isn’t something people normally do out of the goodness of their own heart; it’s typically performed after a lawsuit has been levied against an organization. And let me be clear: when you create inaccessible experiences, you are denying people their civil rights. The Americans With Disabilities Act guarantees that people cannot be discriminated against based on their disability conditions and this extends to both private organizations and the digital space. This is a good thing. It guarantees protections as more and more of the services necessary to living life go online.

You need to work with what you have and not what you hope will be. Part of this means understanding that when we want things to be better, we need to understand that these kinds of changes are really technically complicated under the hood and spread across multiple concerns.

On top of that, accessibility fixes are often viewed as unglamorous and deprioritized when compared to other features. And if you don’t believe me, here is the support matrix for the `title` attribute, incorporated into the W3C’s HTML spec in 1993. It’s 26 years later and we still have a ton of interoperability problems. I don’t mean to bum you out and I don’t expect you to become accessibility experts. However, as people who are interested in more than just the surface level of things, you are all uniquely empowered to effect positive change. What I ask of you is to at least incorporate basic accessibility testing into your workflow. If anything, just check to see if a screen reader crashes. A bad assistive technology experience is better than none at all.

Okay. Whoa, the slide background is yellow now! This is what I call a great segue. I’m a designer by trade and part of that job means coming up with alternate strategies for allowing people to accomplish their goals because, sometimes, the most effective solution isn’t necessarily the most obvious one. With that in mind, we’re going to talk about another definition of performance, which is the ability to actually accomplish something. All these little surgical tweaks and optimizations don’t mean squat if people don’t understand how to operate the really fast thing you gave them.

Another aspect of dynamically injecting a ton of radio inputs to a page is it adds more things for a person to think about. This is called cognitive load and it’s a huge problem. It affects our ability to accomplish tasks and, importantly, our ability to accomplish these tasks accurately. Namely, cognitive load inhibits our memory, our problem-solving skills, our attention, our reading comprehension level, our math comprehension level, and, shockingly, our ability to actually understand what we see. It’s such an important problem that NASA developed the Task Load Index, an assessment tool to help quantify it and, importantly, this isn’t warm fuzzy feelings about empathy. This is a serious attempt by a government agency to refine efficiencies and prevent errors. And one of the most interesting things about the existence of this index is that it’s an admission that disability conditions can be conditional. Think about it. When was the last time you were tired, distracted, or sick, or learning a new skill, or heck, when was the last time you were drunk? Cognitive load is an important thing to track for building rockets, sure, but it also translates to other complicated things like making and using software.

One of the things we can do to lessen cognitive load is to lessen what people have to parse at a given moment. For the medical education app, we could have added an additional step into the user flow and asked a high-level segmenting question. It’s more friction, sure, but it’s being used strategically as a focused choice to help keep the person on track and the cognitive load lower. This would help to filter down the results you get, which makes it easier for both the person and the browser to parse. The other big picture question we need to ask is if all this work is necessary. The most performant page is the one you never have to use, by which I mean, how can we sidestep the issues by using other resources and information previously made available to us?
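A segmenting question can be sketched as a filter applied before any radio inputs are rendered. The options, topics, and helper names below are invented for illustration and are not taken from the audited app.

```javascript
// Hypothetical answer options; topics and labels are invented for illustration.
const allOptions = [
  { label: "Checking blood pressure", topic: "vital signs" },
  { label: "Taking a temperature", topic: "vital signs" },
  { label: "Assisting with a walker", topic: "mobility" },
  { label: "Transferring from bed to chair", topic: "mobility" },
  { label: "Preparing a soft diet", topic: "nutrition" },
];

// The segmenting question: one choice per distinct topic.
function segmentChoices(options) {
  return [...new Set(options.map((option) => option.topic))];
}

// Only the options for the chosen topic get rendered as radio inputs.
function optionsFor(options, topic) {
  return options.filter((option) => option.topic === topic);
}

console.log(segmentChoices(allOptions));
console.log(optionsFor(allOptions, "mobility").map((option) => option.label));
```

The person answers one short question with three choices and then sees two options instead of five. With a realistically large list, both the person and the accessibility tree get far less to chew through at any one moment.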

If you’re interested in this sort of thing, the 2018 Design in Tech Report is a must read piece. One of the things that it revealed was that surprisingly very few companies conduct qualitative research which is the practice of asking people how they use a feature and how they feel while they do it. That’s 12% for early-stage startups, 32% for mid-stage startups, and 46% for late-stage startups conducting qualitative user research. You know, I’m not a numbers person, but there does seem to be a trend going on here. Another interesting thing is barely any companies conduct qualitative testing for features after they launch them, so, we are all collectively throwing features into the larger ever-evolving ecosystems that are our products and not checking to see how it will actually affect things.

As consumers, we also need to remember that we’re only seeing those companies that beat the odds. The market is built on top of the corpses of failed businesses who all poured their cash and resources into the wrong things. So the second big ask from this talk is really just repeating the first one. It’s one thing to read about something and believe it to be true. But it’s another thing entirely to put it out into the world without verifying that it works. The web and more importantly the people who use it are too important not to.

So, thank you for tuning in, and thank you to Michael, Karl, and technica11y for this opportunity. The slides and presentation are available online, and my contact information is available on my website. So, if you have any questions, feel free to reach out, and thank you again.

>> MICHAEL BECK: All right, thank you, Eric. If anybody has any questions, they can go ahead and toss those up in the chat, as usual. I literally laughed out loud during the segue moment and people in the office were looking at me. (Chuckles).

>> ERIC BAILEY: It was a calculated bet.

>> MICHAEL BECK: It worked. It paid dividends in full.

>> ERIC BAILEY: Yeah, I didn’t have the benefit of an audience reaction so I sure hoped this worked.

>> MICHAEL BECK: Anyone any questions? Ah, tools that count node depth?

>> ERIC BAILEY: Hm, that’s actually interesting. I think you can query in the console, using JavaScript, but off the top of my head, I’m not sure. Typically, what you’ll want to do when you are thinking about this kind of thing is to work on a per-component basis. That way you don’t have to step through the whole DOM as a way to kind of do remediation work as that’s just a lot of information. I wish I had a better answer for you off the top of my head, but I do believe you can query it using a JavaScript kind of count like array method.
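For anyone who wants to try the console approach mentioned above, here is one possible sketch. It measures depth by following `parentElement` links upward from each element, which is a standard DOM property; the helper names are made up, and the stand-in objects at the bottom exist only so the snippet can run outside a browser.

```javascript
// Count how many ancestors an element has by following parentElement links.
function depthOf(element) {
  let depth = 0;
  for (let node = element; node.parentElement; node = node.parentElement) {
    depth += 1;
  }
  return depth;
}

// Return the deepest element in a collection along with its depth.
function findDeepest(elements) {
  let deepest = { element: null, depth: -1 };
  for (const element of elements) {
    const depth = depthOf(element);
    if (depth > deepest.depth) deepest = { element, depth };
  }
  return deepest;
}

// In a browser console: findDeepest(document.querySelectorAll("*"))
// Outside a browser, stand-in objects with parentElement links behave the same way.
const htmlStub = { tagName: "HTML", parentElement: null };
const bodyStub = { tagName: "BODY", parentElement: htmlStub };
const mainStub = { tagName: "MAIN", parentElement: bodyStub };

console.log(findDeepest([htmlStub, bodyStub, mainStub]).depth); // 2
```

To work on a per-component basis, as suggested, the same function could be given `someComponent.querySelectorAll("*")` instead of the whole document, where `someComponent` is whichever element you are remediating.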

>> MICHAEL BECK: Dennis in the chat just suggested trying Lighthouse for DOM depth and performance auditing.

>> ERIC BAILEY: All right. Ignore me. Listen to Dennis. Thank you, Dennis!

>> MICHAEL BECK: Someone would like you to talk a little bit more about cognitive load and accessibility, which I think that was a great point that you made and I think that’s a really important issue to dive a little deeper in.

>> ERIC BAILEY: Yeah, it’s something I’ve personally been turning my attention to and it’s a big hairy ball of a topic. There’s a lot of different ways to slice it. So, I mean for cognitive load, where I’m thinking about it, is basically I think, there’s a lot of correlation with something having your primary attention and then what lives in your head about how to operate something. So, considering say this interface here for Zoom, there’s things that are similar to other things on a computer that I know how to use, so, like, little buttons and stuff like that. It’s nice they have labels. That helps me understand purpose. But, the more that you start to deviate away from these standards that we have built for ourselves, the more I have to do guesswork and the more guesswork I have to do, the more I can potentially not understand how to operate something, the more frustrated I get in a situation where you’re trying to put a product out into the world for people to use.

I think the average app evaluation time is like two seconds for somebody to figure out if this is for them or not. So that’s closing a tab or deleting an app that they just downloaded. So, I think it’s one of those things that’s very, very important and we kind of tend to overlook in our excitement to jump into looking at the actual code. I don’t know if that answers your question. I believe the W3C’s Task Force has a really good intro on cognitive accessibility as well as WebAIM has a really good initial resource.

I just saw Gordon (his last name is truncated in the chat) and he has a really good point about information architecture, so, I’m going to totally steal his idea. Thank you! Which is like, if you’re designing something, figuring out the big buckets of where things live because when you show up to a website, there’s an implied structure: Home, Services, Products, About Us. And that’s kind of big picture what information architecture is all about, which is where you believe things to be and then where people that will use your website think that things will be. So, user testing is a great way to figure out where people expect stuff to be, because if I go to resources thinking that there will be products there and there aren’t, I just close the tab because I didn’t find what I’m looking for, so, you start to ask questions of where would you put this and then seeing if there’s any trends that bubble up.

This is especially actually really important for both low literacy and low technological literacy as well as foreign language situations where there’s the potential of cultural misunderstandings. So, if you do any work with internationalization that’s also a really big important thing to think about.

>> MICHAEL BECK: Luna asks about any knowledge that you have about Outlook crashing with screen readers?

>> ERIC BAILEY: First of all, I am so sorry! We use Gmail here. I wish I had a better answer for you. I know Outlook has a browser or a web renderer within it which is weirdly Microsoft Word which is something we won’t get into because therein lies the — yeah.

>> MICHAEL BECK: That’s really weird. (Chuckles).

>> ERIC BAILEY: So it may be not rendering the web as the web, it might be using a completely different kind of set of processes to do that. And I’m actually unsure of how the accessibility tree interacts with that. But, I would be happy to check it out because that’s really interesting.

>> MICHAEL BECK: Yeah, that’s a good research topic to go into. Charles Hall has mentioned that there is some good recent research on short-term memory allocation and, to start with, I’m probably butchering her name, she’s a Norwegian scientist named Ylva Østby. If you’re interested in that sort of thing, check that out. That’s Y-L-V-A and her last name is Ø-S-T-B-Y.

>> ERIC BAILEY: Thank you, Charles.

>> MICHAEL BECK: All right. Do we have anything else from anyone? Nope. All right. Well, thank you very much Eric for that very interesting presentation. It was superb, as far as I’m concerned. Coming up, our next technica11y will be on August 7th, highly anticipated with Rian Rietveld who will be discussing creating an accessible card, as in the construction of the title, the thumbnail, date, the excerpt, the tags, categories, the read more link, the whole shebang. Again that will be on Wednesday, August 7th, 11AM Eastern. Thank you again, Eric, for joining us today and Clare from ACS for the captioning and once again all of you for joining us. And hope you all have a wonderful July. And I will see you next month.

>> ERIC BAILEY: Thank you, everyone, see ya!

Eric Bailey

About Eric Bailey

Eric is a designer at thoughtbot, with a focus on accessible and inclusive design. He’s a member of the A11y Project, an occasional author at CSS-Tricks, and recovering curmudgeon.