We might take language for granted, but it's actually an extremely complex activity. One of the big challenges of language technology design is predicting the type and style of language used within a particular context, says Rudnicky. We tend to talk in particular ways in specific situations, yet our language is also highly variable: we can talk about the same things in different ways. Most of us speak one way at home and another at work, and even that differs depending on who we're speaking to (a colleague? the boss?) and what about. Researchers used to simulate a context and conversation, then use the resulting transcript as language and grammar models for systems to learn from. 'Unfortunately,' says Rudnicky, 'that would be something that never really ends because there's always some other way of saying something.'

Newer, more streamlined techniques measure the distance between a new utterance and previously seen examples to get at meaning: in other words, they match a speaker's intent to the correct language. While machine learning, for example, has minimized the challenge of data analysis for language models, 'you still need to know what people are talking about and their intent,' he says, which means general conversation remains a challenge (see 'How Tech Learns to Talk').
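
As a rough, hypothetical illustration of that kind of distance-based intent matching, the sketch below assigns a new utterance the intent of its nearest labeled example. Everything in it (the intent names, the example phrases, and the crude bag-of-words vectorizer) is an invented stand-in; real systems use learned embeddings trained on large corpora.

```python
# Hypothetical sketch: assign a new utterance the intent of its nearest
# labeled example. The intents, phrases, and bag-of-words vectorizer are
# invented stand-ins; real systems use learned embeddings.
from collections import Counter
import math

INTENT_EXAMPLES = {
    "set_reminder": ["remind me to call the vendor", "set a reminder for 3 pm"],
    "check_calendar": ["what's on my calendar today", "do I have meetings tomorrow"],
}

def vectorize(text: str) -> Counter:
    """Crude bag-of-words vector standing in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Similarity between two vectors; higher means smaller distance."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def match_intent(utterance: str) -> str:
    vec = vectorize(utterance)
    # Choose the intent whose closest example is most similar to the utterance.
    return max(
        INTENT_EXAMPLES,
        key=lambda intent: max(cosine(vec, vectorize(ex)) for ex in INTENT_EXAMPLES[intent]),
    )

print(match_intent("please remind me to call the boss"))  # -> set_reminder
```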

How Tech Learns to Talk

Giving voice to ones and zeros requires a combination of methods.

Teaching tech to talk is no small thing. Here are the major elements that make it happen:

  • Natural language processing. NLP is a field in AI that sits at the intersection of computer science and computational linguistics.
  • Statistical models. Researchers used to rely on rigid language rules embodied in grammars but now take a more flexible statistical approach that assigns probabilities to different interpretations of speech: in other words, a more realistic way of thinking about language and how we use it (a toy sketch follows this list).
  • Language components. To improve recognition accuracy and devise appropriate spoken responses, voice tech analyzes many aspects of how we talk, including grammar, syntax, word choice, sentiment, semantics, vocabulary, and use in context, along with error identification and correction.
  • Conversational interfaces. These are systems that can manage an interaction with a human.
  • Natural language understanding. One of the biggest challenges for AI, NLU needs to deal with the messiness of language-all the slang, mistakes, and new words we exchange and invent.
  • Interactive language learning. A newer approach and a move away from statistical models, interactive learning uses interactions with humans to teach AI.
  • The Turing test. Computer scientist Alan Turing's original test judges whether a machine can fool people into thinking it's human.
  • The Winograd Schema Challenge. This update to Turing's test, first run in 2016, is a multiple-choice quiz of machine intelligence. At the inaugural test, the highest score was 58%.
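
To make the statistical-models item above concrete, here is a toy bigram model in Python. The tiny corpus and the two candidate transcriptions are invented for illustration; real recognizers use vastly larger models, but the principle (rank competing interpretations by probability rather than accept or reject them by rule) is the same.

```python
# Toy bigram language model: assign probabilities to word sequences so that
# competing interpretations of an utterance can be ranked rather than
# accepted or rejected by rigid grammar rules. The corpus and candidate
# transcriptions are invented for illustration.
from collections import Counter, defaultdict

corpus = "please send the report please send the file send the report now".split()

# Count bigrams to estimate P(next word | current word).
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def sequence_prob(words):
    """Probability of a word sequence under the bigram model (no smoothing)."""
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        total = sum(bigrams[w1].values())
        p *= bigrams[w1][w2] / total if total else 0.0
    return p

# Two acoustically similar candidate transcriptions; the model prefers the
# sequence that looks more like language it has seen before.
for candidate in (["send", "the", "report"], ["send", "a", "retort"]):
    print(candidate, sequence_prob(candidate))
```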

Call centers have been using voice recording, voice mining, and sentiment analysis for years, says Plakias, but those tools work only because they deal with a limited range of conversations. Moving from call centers to business meetings is the next challenge. A voice system needs to identify speakers, what's important, what's chatter, what directions are given, and many other variables. Plus, as Plakias points out (no surprise here), it's not unusual for people to be highly distracted in meetings.

'The thing with meetings is they're not like fixed tasks that you can predict,' says Rudnicky. 'People talk about whatever they're going to talk about, and trying to understand what happens in a meeting is a more difficult problem.'

Right now, voice tech can listen in and take some commands. The next step will be voice tech with the ability to summarize an entire meeting on its own. That's really difficult to do. 'Most AI experts will say that kind of level of reasoning is years away,' says Plakias.

And if to err is human, the good news is that our errors are valuable to researchers like Rudnicky. 'Errors are really important because they keep happening,' he says.

But errors are difficult for AI and voice tech because they must first be identified and then corrected. Implicit confirmation (a verbal repetition, like a waiter repeating an order back to us before heading off to the kitchen) is one method of working with errors. 'You want people to have some idea of what's going on in the machine's mind just like you do when you're talking with somebody,' says Rudnicky. During a conversation with another human, 'you want to keep track of what they're thinking, which you do by basically inferring from what they're saying.' The AI running voice tech needs to be able to do the same thing.
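
A minimal sketch of how implicit confirmation might look in a dialogue system's response logic follows. The intent names, slots, and confidence threshold are all illustrative assumptions, not a real assistant's API.

```python
# Illustrative sketch of implicit confirmation: restate the system's
# understanding while acting on it, falling back to an explicit question
# when confidence is low. Intent names, slots, and the 0.5 threshold are
# invented assumptions.
def respond(recognized_intent: str, slots: dict, confidence: float) -> str:
    if confidence < 0.5:
        # Low confidence: ask outright instead of confirming implicitly.
        return f"Did you want me to {recognized_intent.replace('_', ' ')}?"
    if recognized_intent == "book_room":
        # Implicit confirmation: repeat the order back, like the waiter.
        return f"Booking {slots['room']} for {slots['time']}. Anything else?"
    return "Okay."

print(respond("book_room", {"room": "Conference Room B", "time": "2 pm"}, 0.85))
# -> Booking Conference Room B for 2 pm. Anything else?
```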

Designing voice for the best user experience is unfamiliar territory because we're so used to seeing and touching, and UX has been designed around those actions. Information on a screen is presented with context (text, graphics, and so on), which helps anticipate what the user might want to do next, guide the next action, or provide some sort of focus. But voice is like a blank canvas, so voice design must work without those cues.

Discerning a user's true intent is tricky. We're used to telling systems what to do, but we shouldn't reflexively swing to the other extreme, where the system takes the lead and anticipates every action. Microsoft's Clippy serves as an early example of something that thought it knew what you wanted to do and rarely did (and thus became a fail meme ahead of its time). There's a middle ground that takes into account that machines are much better at learning than before (and continue to improve) but that humans are still very, very good at learning. Researchers recently judged that the smartest AI has the IQ of a six-year-old.

Sara Holoubek, CEO of New York-based innovation and strategy consultancy Luminary Labs, thinks the skills that are available and popular with consumer voice tech, like travel planning and information retrieval, could lead to the development of enterprise skills. Today, Alexa, which has over 10,000 skills, might manage a playlist; tomorrow it might manage a company's digital asset management system or sort through stacks of résumés to find candidates.

Voice tech will make it easier to file a report or make a request, particularly in environments, such as healthcare or construction, where you need to keep both hands free. Physicians are leaders in using voice in the workplace, developing smart speaker applications for information on symptoms, treatments, and patient records (although it should be noted that consumer voice devices are not yet HIPAA compliant). Hospitals, including Boston Children's Hospital and Beth Israel Deaconess Medical Center, have voice initiatives looking for ways to help patients during their hospital stays. Voice is already being used in ambulances, for example, to help medics determine treatment protocols on the way to the ER. 'Any type of search function inside the organization would benefit greatly from voice,' says Holoubek.

Voice is also moving beyond its grindingly annoying role as gatekeeper of call tree hell. It is becoming a more active and helpful part of the customer experience with voice-enabled products and services, such as helping with product information. It will also reduce the friction of querying a database, like a CRM system, because searching by voice can deliver more results, enable complex data dives, and work more quickly than a keyboard. Using voice for search is also a very natural thing to do; it is, after all, how we ask questions of each other.
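
As a hypothetical sketch of what such a voice-driven database search could look like one layer down, the snippet below maps a transcribed utterance onto structured filters for an imagined CRM table. The patterns and field names are invented for the example; a production system would use a trained semantic parser.

```python
# Hypothetical sketch: map a transcribed voice query onto structured filters
# for an imagined CRM table. The regex patterns and field names are invented
# for the example.
import re

def parse_crm_query(utterance: str) -> dict:
    """Turn a spoken request into structured CRM search filters."""
    filters = {}
    if m := re.search(r"deals over \$?([\d,]+)", utterance):
        filters["amount_gt"] = int(m.group(1).replace(",", ""))
    if "this quarter" in utterance:
        filters["period"] = "current_quarter"
    if m := re.search(r"owned by (\w+)", utterance):
        filters["owner"] = m.group(1)
    return filters

print(parse_crm_query("show me deals over $50,000 owned by Maria this quarter"))
# -> {'amount_gt': 50000, 'period': 'current_quarter', 'owner': 'Maria'}
```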

'It's going to take a lot of work and improvement for voice technology to get to the point where we can try an application like database searching, which is precisely why new technology always starts on the consumer side,' says Holoubek. 'There's a lot less risk in piloting something with a consumer base.'

One problem is our world of data. It's like the final warehouse scene in Raiders of the Lost Ark: so many boxes (financial, logistics, supply chain, CRM), all isolated in their own crates. Today, entire analytics or data operations teams work to make data useful. For voice to work, data will need to be openly accessible and organized so that meaning can be derived from it for a variety of searches and applications.

Different consumer devices already sit in 'walled garden' ecosystems, says Dan Miller of Opus Research, which specializes in voice technology, with skills developed for specific operating systems. He thinks systems that enable integration and customization with 'killer skills' will likely come from outside the current big players. Earlier this year, Amazon relaunched its developer skills console to make it easier for developers to create and test skills. Remember 'There's an app for that'? Now, skills are where it's at.

Fair Playback

Bias and emotional intelligence are other challenges for voice tech.

We already know that AI can have a lot of biases because it's learning from its creators and users. So enterprises need to think about how that could manifest in the workplace, says Holoubek. Recruiters requesting résumés, for example, might overlook a particular group of people because of bias.

The source of machine-based emotional intelligence (EI) has traditionally been sentiment data from call center analytics, says Plakias. But in that scenario people tend to be either happy or upset, a binary emotional landscape. Regular interactions between people don't tend to be quite so clear cut. Voice tech's AI will need more emotional intelligence to respond correctly, personalize interactions, and, simply, encourage employees to interact with it. It's about comfort. We're only in the beginning phases of creating AI with EI, but a first step, an emotionally intelligent chatbot, was revealed last year.
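
A toy sketch of what moving beyond that binary landscape might mean: score an utterance against several emotion categories instead of a single positive/negative axis. The word lists below are invented placeholders; real EI models learn such cues from labeled speech and text.

```python
# Toy sketch: score an utterance against several emotion categories instead
# of one positive/negative axis. The word lists are invented placeholders;
# real EI models learn such cues from labeled speech and text.
EMOTION_LEXICON = {
    "frustrated": {"again", "still", "waiting", "broken"},
    "confused": {"unsure", "unclear", "how", "lost"},
    "satisfied": {"great", "thanks", "perfect", "works"},
}

def emotion_scores(utterance: str) -> dict:
    """Count cue words for each emotion category in the utterance."""
    words = set(utterance.lower().split())
    return {emotion: len(words & cues) for emotion, cues in EMOTION_LEXICON.items()}

print(emotion_scores("it's broken again and i'm still waiting"))
# -> {'frustrated': 4, 'confused': 0, 'satisfied': 0}
```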

See Me, Hear Me

But what will really take voice tech to the next level is combining it with augmented and virtual reality. Pairing VR or AR with a conversational user experience could create great experiences without the interference of the mouse and keyboard.

Imagine onboarding new employees using voice tech and AR. Two weeks before they start work, the employer sends AR headsets to their homes, and whenever they want, they can tell the system to give them a tour of the new building and offices and do a walk-through of the entire space: 'Take me to the coffee station!' New employees can become familiar with the environment before they've set a (real) foot inside.

It could also introduce coworkers, enable training sessions, expedite workspace setup, and remove the dreaded first-day discomfort and confusion (no one really likes asking where the bathrooms are). It would be a completely new experience, more efficient and even interesting and exciting for the new employee.

Maintenance projects could be completed more quickly and safely with AR and voice tech. An AR headset could let a maintenance person identify which machine isn't working properly, diagnose the problem, virtually locate and try out replacement parts, check whether they're on hand, and generate a purchase order if not. The entire process could take a handful of minutes.

Privacy and Security Challenges

Yet as voice gathers momentum, it arrives on the scene at a time of heightened concerns about security and how data is used. As Candid Wüest, principal threat researcher at Symantec, says, 'If you build it, they will hack it.' Already, fingerprint and voice security systems have been tricked. Spoofing a voice is more difficult, he says, but not impossible. Since doing so would require a voice sample, public-facing executives would be more likely targets than rank-and-file employees. 'It is a risk that has to be considered,' he says.

And as with other biometrics, once a voice pattern is copied, there's no going back. The security priority, says Wüest, is to implement systems that can distinguish between a live voice and a recording. Using randomization (randomly generated voice snippets or phrases used only once) is a good idea. Most current voice applications are for authentication; using voice in sensitive environments might call for combinations, such as a PIN plus voice, he says.
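
The randomization idea can be sketched in a few lines: challenge the speaker with a one-time phrase, so a replayed recording can't contain the right words. The word pool and single-use bookkeeping below are illustrative assumptions, not a real authentication product.

```python
# Sketch of the one-time-phrase idea: challenge the speaker with a randomly
# generated phrase that is never reused, so a replayed recording can't
# contain the right words. The word pool and bookkeeping are illustrative.
import secrets

WORD_POOL = ["amber", "falcon", "river", "quartz", "meadow", "copper", "violet", "harbor"]
used_phrases = set()

def new_challenge(n_words: int = 3) -> str:
    """Generate a random phrase for the user to speak aloud, never reused."""
    while True:
        phrase = " ".join(secrets.choice(WORD_POOL) for _ in range(n_words))
        if phrase not in used_phrases:
            used_phrases.add(phrase)
            return phrase

print(new_challenge())  # e.g. "quartz amber harbor"
```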

Will security and privacy concerns mean the return of the private office? Microphones have gotten smaller and better, open-plan offices reign, and apart from the noise factor, enterprises need to consider where sensitive information is accessed via voice.

It will be challenging to determine what is private and what can or should be shared, both internally and with outside vendors and clients.

Virtual agents work better the more they're used (that is, the more they know about an individual), which means considering how information is used, guarded, and owned, says Miller. 'We're mapping to treat the spoken word as an asset.'

The new General Data Protection Regulation rules that went into effect in May include provisions on biometric patterns. Data that can be used to identify an individual must be secured, and sensitive personally identifiable data carries more stringent requirements around access and storage, says Wüest. 'This hopefully will increase the security around how this information is stored and handled,' he says.

Transparency is the best policy, adds Wüest, which means being open about when microphones are on or off, what's stored, for how long, and so forth. Both employees and clients might be sensitive to the storage of their voice (only a few attributes of the voice are actually stored, but still). 'A very important part is that they inform all the users and clients openly,' he says. 'Telling them what they are going to store and how it's going to be used because if it's just kept secretly then everyone will have their suspicions and kind of think, 'Oh, they're recording everything I say.''

It Pays to Speak Up

The ultimate test for enterprise voice tech might not be the high-level, big-picture applications but something simpler yet just as important: employee satisfaction. Voice could help restore some much-needed work-life balance by helping employees become more proactive, plan their workdays better, and work more productively. The hope is that as voice assistants become smarter, they'll take on many quotidian tasks, and integrations across data silos will improve cross-organizational communication, cooperation, and efficiency. No more late nights slogging over data reports, in other words.

The point of voice technology is to make working better-more efficient, less stressful, safer, and maybe even fun-and coming innovations in voice will be part of the future of work. 'Voice should not be used as a way to engineer humans out,' says Holoubek. 'It should be a way to lift them up and to use all that is great about humanity to do business and do business well.'
