Inspiration

There are an estimated 36 million blind people and a further 217 million people with severe visual impairment worldwide. This represents a significant potential customer base that is currently under-served by products and services that target sighted users and rely heavily on visual media like web pages and on visual input devices like mice and smartphone touch screens.

In North America alone, 54 million people have disabilities, and they have more than $220 billion in discretionary income. ... According to the International Center for Corporate Accountability, that’s a bigger demographic than Latino, LGBTQ2, and African-American markets combined—and people with disabilities have twice the spending power as the coveted market of tweens and teens. ... Disregarding this market is a lost revenue opportunity, one that CMOs can no longer afford to ignore.

The increasingly complex, high-resolution GUIs used today on the web, mobile, and the desktop for accessing and managing technology products may seem natural and intuitive to sighted users. But for visually impaired or differently-abled customers, the experience of products and services delivered through computing devices and interfaces that demand a high degree of visual learning and understanding will always be sub-par.

People who are blind, visually impaired, or differently-abled stood to benefit the most from the computer and information revolution and the explosion of hardware devices and mobile form-factors. For these people, the reality has fallen far short of expectations. It is true that GUI operating environments like Windows, macOS, and Android, and document formats like HTML and PDF, have made huge strides in accessibility for visually impaired and disabled users. But user interfaces today still make fundamental assumptions about the medium in which information is presented and the mode of interaction with users, which can make using desktop or web applications, forms, and documents frustrating and time-consuming for sight-impaired users.

GUI applications may conform to desktop accessibility guidelines, and web sites, pages, and documents can follow recommendations like WCAG and ARIA to make applications and content accessible to a wider spectrum of users. But navigation through desktop and web applications is still spatially oriented on a persistent visual surface: visual users can see, immediately memorize, and rank in importance navigation elements like windows, menus, trees, buttons, text, and headers, while using a visual marker like a mouse cursor to select the element or content they need. Information like calendars, tables, and forms, when presented visually, uses the visual layout as an important part of its meaning, and applications rely on a user's ability to quickly understand how that layout prioritizes information and the steps needed to complete a task or process.

Non-visual users who rely on assistive technology like screen readers must often wade through a sea of elements and text before finding the desired function or content, relying on slow trial-and-error, repetition, and memory to navigate a GUI efficiently. The increasing complexity of desktop and mobile GUIs may benefit experienced visual users and seem intuitive to them, but it can leave non-visual or differently-abled users far worse off than older interfaces did. Today's complex, intricate GUIs make assumptions about users' visual acuity, dexterity, and short-term information processing abilities that can end up excluding a significant proportion of users or potential customers.

Conversational user interfaces are among the easiest and most accessible forms of human-computer interaction, and they have seen a revival on desktop and mobile devices powered by sophisticated natural language understanding and machine learning services running in the cloud. AI-powered voice-activated assistants like Alexa and Siri have finally given visually impaired and elderly computer users an interface that feels natural and efficient to use.

But most assistants, chatbots, and CUIs today still assume that the user can see the active conversation, activity, or skill on-screen and can easily navigate and click on buttons, windows, or other widgets when needed to complete an interaction. CUIs used today to access services like customer support may simply act as a front door to widgets like calendars or web pages that still depend heavily on a visual medium for presenting information. For visually impaired users, a web page, calendar, or task widget may cause a screen reader to flood the user with information with no way to filter or narrow down what the user actually needs. The closed-source nature of many of these cloud-based assistants means hacking on the software can only happen in a walled garden that cannot fundamentally alter how the assistant works.

System administration and computer programming are popular career choices for visually impaired people, and many of them learn to work at speed using screen readers with modern GUI development tools, but they may heavily favor command-line interfaces and console-based editors and tools as alternatives. However, command-line tools still have a far steeper learning curve for non-sighted users: they require users to remember and input exact command syntax and options while learning to navigate and process large amounts of text output that still relies on visual layout to convey meaning. Command-line tools used to administer complex tech stacks like Red Hat's OpenShift are far more accessible than the GUI alternatives, but they still require memorization, precise input, and navigation of large text output buffers, and they still tend to stress the weaknesses of non-visual users rather than their strengths.

To adequately serve the millions of blind and visually impaired potential customers, organizations need to look beyond mere accessibility to open, truly inclusive interfaces that cater directly to the strengths of non-visual users while minimizing their weaknesses. Auditory user interfaces, like Emacspeak and other work pioneered by T.V. Raman, can fundamentally change the customer experience for millions of non-visual users, and organizations investing in this technology may find an untapped market for their products and services and a new source of brand and customer loyalty, especially among non-visual technology workers.

What it does

Victor CX is a 100% open-source client-server system that provides a multi-modal conversational user interface client and server back-end for interacting with an organization's technology products and online services: product management and administration, product documentation, customer service and support, business processes like applying for a loan or for admission to a school, and other customer experience applications that traditionally rely on a user's ability to navigate a complex GUI and on the visual presentation of documents and forms.

Victor CX is specifically designed for users who are blind or sight-impaired, or who are otherwise not able to effectively use a traditional GUI with mouse or touchscreen input and must rely on assistive technologies like screen readers or braille displays. The client uses a simplified, conversation-driven interface powered by natural language understanding that can run either in a character-based terminal or in the browser as a web application. Victor produces line-by-line output that is easily read by screen readers and accepts line-driven interactive input that can be entered via any kind of keyboard or character input device, or via speech recognition for users who cannot use such devices.

The back-end consists of a scalable chatbot server, which contains the CUI logic for different bots and tracks and persists the conversation state for each client interaction with a bot, as well as a web API which is called by CUI bots to retrieve content, run operations, or interact with an organization's existing services.

Victor CX lets organizations create auditory user interfaces for customer experience that integrate with existing business processes, services, and content, and that also satisfy the seven inclusive design principles:

  • Provide comparable experience

  • Consider situation

  • Be consistent

  • Give control

  • Offer choice

  • Prioritise content

  • Add value

Victor CX provides a 100% open-source alternative to proprietary CUI services like Google's Dialogflow, Microsoft's LUIS, IBM's Watson, Amazon's Alexa, wit.ai, and others. Unlike existing open-source chatbot projects like Rasa, Victor CX is designed around microservices and Red Hat's OpenShift Container Platform and open-source enterprise-grade servers and frameworks like MongoDB, .NET Core, and Java EE, and it can be scaled to reliably handle millions of conversations. The client program is designed to work efficiently with screen readers and other assistive technology and can run on computers without any GUI environment installed, like traditional *nix and BSD systems.

Non-visual software developers and administrators can use a Victor CX feature called the Voice Interactive Shell (Vish), which provides a natural-language CUI for system administration as an alternative to complex, heavy web GUIs or command-line tools that require precise syntax and extensive memorization of options. Using Vish, users can express intents in natural language (like "show me all running pods") without having to memorize exact command syntax and options; the CUI formats and breaks up the command output into manageable pieces that can easily be read with a screen reader and navigated using the keyboard.
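To make the idea concrete, here is a minimal sketch of one way a parsed Vish intent could be mapped to an `oc` command and its output paged for a screen reader. The intent name, page size, and class name are purely illustrative assumptions, not Victor CX's actual implementation:

    // Illustrative sketch only: map a parsed intent to an `oc` command and page the output.
    using System;
    using System.Diagnostics;

    class VishSketch
    {
        const int PageSize = 5; // lines spoken per page before pausing for the user (assumed value)

        static void Main()
        {
            // In Victor CX the intent comes from the embedded NLU engine; here we
            // hard-code the result of parsing "show me all running pods".
            string intent = "list_running_pods";
            string args;
            if (intent == "list_running_pods")
                args = "get pods --field-selector=status.phase=Running";
            else
                throw new ArgumentException("Unknown intent: " + intent);

            var psi = new ProcessStartInfo("oc", args)
            {
                RedirectStandardOutput = true,
                UseShellExecute = false
            };
            using (var proc = Process.Start(psi))
            {
                string[] lines = proc.StandardOutput.ReadToEnd()
                    .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
                proc.WaitForExit();

                // Emit the output one small page at a time so a screen reader is
                // never flooded with the entire buffer at once.
                for (int i = 0; i < lines.Length; i++)
                {
                    Console.WriteLine(lines[i]);
                    if ((i + 1) % PageSize == 0 && i + 1 < lines.Length)
                    {
                        Console.WriteLine("Press Enter for the next page...");
                        Console.ReadLine();
                    }
                }
            }
        }
    }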

How I built it

Client

The CLI client is a .NET console application that can run on .NET Framework, Mono, or .NET Core, on Windows, Linux, macOS, and RPi.

  • .NET Core/Mono — Open-source, cross-platform managed application runtime

  • Julius — Open-source, cross-platform continuous speech recognition

  • Snips NLU — Open-source, cross-platform, fast and accurate natural language understanding engine

The CLI client is used to access the CX, to test client features, and to administer back-end services. The CLI has its own embedded NLU and automatic speech recognition libraries, does not rely on any remote NLU or ASR servers, and can run completely offline if required. Although the client can use the Mimic text-to-speech engine if required, the best option for TTS is the operating system's narrator or the user's installed screen reader, like NVDA or JAWS.

A Web UI client that runs in the browser using botui is also planned.

Server

The Victor CX server is designed around microservices all running on the OpenShift Container Platform.

  • E.D.D.I — Open-source chatbot server written in Java

  • MongoDB — Stores E.D.D.I CUI artifacts like bot definitions and conversations

  • Victor NLU — .NET Core service which provides NLU to the web UI using the Snips NLU library

  • Victor WebAPI — .NET Core service which provides the public API that CX clients and bots talk to

  • Red Hat Decision Manager — Low-code way to implement business rules and logic for bots

E.D.D.I

E.D.D.I is a modular, scalable, open-source chatbot server that uses a REST-oriented interface for conversations and administration. Victor has a CLI for administering E.D.D.I bots and artifacts.

Victor WebAPI

The WebAPI server provides back-end integration for bots and services with an organization's existing processes, business rules, and content. It is implemented as an ASP.NET Core 2 application and automatically deployed from GitHub commits as a container on OpenShift via S2I and build configs. ASP.NET Core is a high-performance web server runtime that uses best practices like dependency injection; for example, the client library that talks to RHDM Server 7.4 is injected into each API controller as a transient:

            // Register the Swagger-generated KIE Server client so any API controller
            // can execute business rules; credentials are read from environment config.
            services.AddTransient<KIESessionAssetsApi>((provider) =>
            {
                var config = new Configuration();
                config.BasePath = "https://victor-kieserver-evals25-shared-7daa.apps.hackathon.rhmi.io/services/rest/";
                config.Username = Api.Config("KIE_ADMIN_USER");
                config.Password = Api.Config("KIE_ADMIN_PWD");
                // Authenticate every request to the KIE Server with HTTP basic auth.
                config.ApiClient.RestClient.Authenticator =
                    new RestSharp.Authenticators.HttpBasicAuthenticator(config.Username, config.Password);
                return new KIESessionAssetsApi(config);
            });

.NET has very extensive community-supported tooling, which makes it a great choice for an integration layer that must talk to many different kinds of servers and apps. We were able to generate .NET bindings to the server REST APIs we needed via C# code-generation tools for Swagger; for example, the entire OpenShift REST API bindings were auto-generated for Victor via a community open-source tool.

Victor NLU server

This server uses the same NLU library as the client but is designed for Web UI clients.

Red Hat Decision Manager

Victor CX uses RHDM on the back-end as a way for organizations to implement the business rules and logic that processes like applying for a loan must follow. Bots can call the REST API of the KIE Server to execute business rules with user input. Since there aren't any .NET clients for the KIE Server REST API, I auto-generated most of the C# bindings from the Swagger spec. I loaded some of the example containers, like the loan application, to act as rules for the bots I will develop.
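For illustration, a bot's back-end call to execute rules could look something like the rough sketch below, which posts a batch command to the KIE Server's documented containers/instances REST endpoint. The base URL, credentials, container id, and fact type are placeholder assumptions, not the project's actual values:

    // Rough sketch: execute rules on a KIE Server container via its batch-command REST API.
    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Text;
    using System.Threading.Tasks;

    class KieServerSketch
    {
        static async Task Main()
        {
            var baseUrl = "https://victor-kieserver.example.com/services/rest/"; // assumed host
            var containerId = "loan-application";                                // assumed container id

            var client = new HttpClient { BaseAddress = new Uri(baseUrl) };
            client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
            var credentials = Convert.ToBase64String(
                Encoding.ASCII.GetBytes("kieserver:password"));                  // assumed credentials
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Basic", credentials);

            // Batch command: insert a fact built from the conversation, then fire all rules.
            // "com.example.LoanApplication" stands in for whatever fact type the container defines.
            var body = @"{
              ""lookup"": null,
              ""commands"": [
                { ""insert"": {
                    ""object"": { ""com.example.LoanApplication"": { ""amount"": 250000 } },
                    ""out-identifier"": ""application"" } },
                { ""fire-all-rules"": {} }
              ]
            }";

            var response = await client.PostAsync(
                "server/containers/instances/" + containerId,
                new StringContent(body, Encoding.UTF8, "application/json"));

            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }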

Implementation

Victor CX implements CUIs as hierarchical packages which group small sets of related tasks.

There are 3 package categories:

  • Vish - Voice Interactive Shell packages for non-visual users performing system administration tasks like managing an OpenShift cluster
  • Services - Tasks that do not require much interactivity like checking product news
  • Bots - Conversational agents that help you with tasks like filling out complex forms or completing multi-step processes and workflows that require a lot of interactivity.

Victor CX Controllers interface between CUI packages and the user's display and input devices and the NLU and ASR components. The only Controller implemented right now is the CLI controller.
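Since the package and Controller abstractions are only described at a high level here, the hypothetical interfaces below sketch one way they could fit together; the names and signatures are assumptions, not Victor CX's actual types:

    // Hypothetical interfaces sketching how a Controller could sit between CUI packages
    // and the user's input/output, NLU, and ASR components.
    using System.Collections.Generic;

    // A package groups a small set of related tasks: a Vish shell, a Service, or a Bot.
    public interface ICuiPackage
    {
        string Name { get; }
        IEnumerable<string> Intents { get; }             // intents this package can handle
        string Handle(string intent, string utterance);  // returns line-oriented, screen-reader-friendly text
    }

    // A Controller owns the user's input/output devices plus the NLU and ASR pipelines,
    // and dispatches each parsed intent to the package that claims it.
    public interface IController
    {
        void Say(string line);            // emit one line at a time so screen readers stay in control
        string Listen();                  // keyboard input, or ASR for users who cannot type
        void Dispatch(string utterance);  // NLU -> matching package -> Say() the result
    }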

Challenges I ran into

This was my first time working extensively with Red Hat OpenShift technology and there was definitely a learning curve coming up to speed with the different concepts. My experience struggling to get a grip on the different pieces made me think about how much more difficult it must be for visually impaired users, and why a shell like Vish could be invaluable to them.

Accomplishments that I'm proud of

I was able to learn how to build and deploy an entire OpenShift project and learned a lot of cool things about CX for visually impaired and disabled users.

What's next for Victor CX

This is a very interesting project and I'd like to get feedback from the visually impaired user communities on what they think should be improved or added.

Updates

posted an update

Extract from T.V. Raman's book Auditory User Interfaces about shortcomings of using screen readers with GUIs:

"Though often cited as an advantage of the screen-reading approach, producing spoken output without any help from the user application causes the screen-reading paradigm to break down when confronted with more and more complex visual displays. Rich visual interfaces enable application designers to produce complex dialogues that exploit the power of the visual mode of interaction. The meaning and intent of such dialogues is often represented implicitly in the visual layout, rather than explicitly in the form of text appearing on the screen. A screen-reading program attempting to meaningfully speak these displayed dialogues very quickly runs into the impedance mismatch described in Sec. 1.3. Even when the screen-reading application succeeds in truthfully speaking the entire displayed content, the listener is stilI left to guess at the meaning and intent. Thus, as visual interfaces have evolved into more and more complex interactions, the corresponding access solutions based on retrofitting speech output have lagged further and further behind. As a case in point, the World Wide Web (WWW) is only partially accessible using today's screen-reading applications. Thus, the primary shortcoming with screen-readers is their inability to convey the meaning and structure present in visually displayed information."

posted an update

This video is a great example of how a blind user interacts with GUIs today on desktops to access online services like email or respond to YouTube comments:

https://youtu.be/TiP7aantnvE?t=317

Even though iOS and macOS are very accessible with VoiceOver, using the GUI on her laptop is still time-consuming and slower than it is for sighted users. Being able to type into Spotlight and search for things using text is much faster.
