{"id":1016,"date":"2018-06-29T09:01:20","date_gmt":"2018-06-29T09:01:20","guid":{"rendered":"http:\/\/nyxditech.com\/blog_staging\/?p=1016"},"modified":"2018-07-08T07:38:14","modified_gmt":"2018-07-08T07:38:14","slug":"alexa-whats-technology-behind","status":"publish","type":"post","link":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind","title":{"rendered":"Alexa, What\u2019s The Technology Behind You?"},"content":{"rendered":"<p>It was the early 90\u2019s when speech recognition first took off with significant DARPA funding being invested in various research projects conducted by the top universities. However, the data available at that time was simply not sufficient for the technology to grow on. But that changed over the last decade due to substantial digital advancements that led to the availability of data at hand to train models. The biggest milestone achieved in the sphere of voice recognition technology is the Amazon Alexa that has used extremely complex machine learning processes to revolutionize the way we conduct everyday tasks.<\/p>\n<p>Last year, Amazon opened up its technology behind Alexa, called the Lex. The system combines natural language understanding technology with automatic speech recognition and can be used by developers who want to build their own conversational applications like a chatbot. 
According to Amazon, the technology can serve a variety of purposes, especially in web and mobile applications.<\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone size-full wp-image-1020\" src=\"http:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-voice-text.jpg\" alt=\"alexa-voice-text\" width=\"768\" height=\"535\" srcset=\"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-voice-text.jpg 768w, https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-voice-text-300x209.jpg 300w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><br \/>\n<em>image:techcrunch.com<\/em><\/p>\n<h2>So How Does This Technology Work?<\/h2>\n<h2>1. Signal Processing<\/h2>\n<p>It all starts with signal processing, which allows the device to make sense of the audio input by cleaning the signal. This is one of the most crucial challenges in far-field audio. The primary objective is to recognise the target signal, which requires first identifying and minimising ambient noise such as the TV or the dishwasher. These issues are handled through beamforming, which uses the device\u2019s seven microphones to locate the source of the signal so the device can focus on it. Acoustic echo cancellation, meanwhile, tracks the audio the device itself is playing and subtracts it, so that only the important signal remains.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-1019\" src=\"http:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-signalling-process.jpg\" alt=\"alexa-signalling-process\" width=\"768\" height=\"432\" srcset=\"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-signalling-process.jpg 768w, https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-signalling-process-300x169.jpg 300w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/p>\n<h2>2. 
Wake Word Detection<\/h2>\n<p>Once the signal has been cleaned, the next task is wake word detection: determining whether the user has said one of the words the device is programmed to wake on, such as &#8216;Alexa&#8217;. This is critical, as voice commands picked up from surrounding conversations could otherwise result in accidental purchases and angry customers. The detector also has to handle differences in pronunciation, and it must do all of this quickly with limited on-device CPU power, so it requires both high accuracy and low latency.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-1021\" src=\"http:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-word-detection-.jpg\" alt=\"alexa-word-detection\" width=\"768\" height=\"400\" srcset=\"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-word-detection-.jpg 768w, https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-word-detection--300x156.jpg 300w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/p>\n<h2>3. Audio To Text Conversion<\/h2>\n<p>If the wake word is detected, the signal is sent to the speech recognition software in the cloud, where the audio is converted into text. Unlike wake word detection, which is a binary classification problem, speech recognition is a sequence-to-sequence problem: a sequence of audio is mapped to a sequence of words. The system has to search across the entire English vocabulary to decode the input and produce the desired output. That\u2019s a huge task, and only the cloud can scale to it. The input is not a one-word query. 
It can be any possible question, so the system also needs its context.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1017\" src=\"http:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-conversion.jpg\" alt=\"alexa-conversion\" width=\"768\" height=\"400\" srcset=\"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-conversion.jpg 768w, https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-conversion-300x156.jpg 300w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/p>\n<h2>4. Natural Language Understanding (NLU)<\/h2>\n<p>Natural Language Understanding (NLU) converts the text into a meaningful representation. Say you ask for the weather in New York: the intent is a weather query, and New York is its location. Problems arise with cross-domain intent classification. For example, \u2018play remind me\u2019 (a song request) is very different from \u2018remind me to go play\u2019, yet the two could easily be confused. Additionally, many commonly used words sound the same but have completely different meanings: \u2018by\u2019 can be misheard as \u2018buy\u2019, leading to unwanted consequences. Out-of-domain utterances that don\u2019t make sense are also eliminated at this stage. 
This prevents the device from mistakenly acting on commands from televisions and the like.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1018\" src=\"http:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-natural.jpg\" alt=\"alexa-natural\" width=\"768\" height=\"400\" srcset=\"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-natural.jpg 768w, https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/alexa-natural-300x156.jpg 300w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><em>image:techcrunch.com<\/em><\/p>\n<p>Researchers are working constantly to improve the speech recognition software. Further improvements will see Alexa hold a conversation better, remember what a person has said previously, and apply that knowledge to subsequent interactions. <a href=\"http:\/\/nyxditech.com\/blog_staging\/index.html#contact-section\"><strong>Get into a discussion<\/strong><\/a> with our technology experts and analyse the future of Alexa.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It was the early \u201990s when speech recognition first took off, with significant DARPA funding invested in research projects at top universities. However, the data available at the time was simply not sufficient for the technology to mature. That changed over the last decade, as digital advances made vastly more data available for training models. The biggest milestone in voice recognition so far is Amazon Alexa, which uses complex machine learning pipelines to change the way we carry out everyday tasks. Last year, Amazon opened up the technology behind Alexa, called Lex. The system combines natural language understanding with automatic speech recognition, and developers can use it to build their own conversational applications, such as chatbots. 
According to Amazon, the technology can serve a variety of purposes, especially in web and mobile applications. image:techcrunch.com So How Does This Technology Work? 1. Signal Processing It all starts with signal processing, which allows the device to make sense of the audio input by cleaning the signal. This is one of the most crucial challenges in far-field audio. The primary objective is to recognise the target signal, which requires first identifying and minimising ambient noise such as the TV or the dishwasher. These issues are handled through beamforming, which uses the device\u2019s seven microphones to locate the source of the signal so the device can focus on it. Acoustic echo cancellation, meanwhile, tracks the audio the device itself is playing and subtracts it, so that only the important signal remains. 2. Wake Word Detection Once the signal has been cleaned, the next task is wake word detection: determining whether the user has said one of the words the device is programmed to wake on, such as &#8216;Alexa&#8217;. This is critical, as voice commands picked up from surrounding conversations could otherwise result in accidental purchases and angry customers. The detector also has to handle differences in pronunciation, and it must do all of this quickly with limited on-device CPU power, so it requires both high accuracy and low latency. 3. Audio To Text Conversion If the wake word is detected, the signal is sent to the speech recognition software in the cloud, where the audio is converted into text. Unlike wake word detection, which is a binary classification problem, speech recognition is a sequence-to-sequence problem: a sequence of audio is mapped to a sequence of words. The system has to search across the entire English vocabulary to decode the input and produce the desired output. That\u2019s a huge task, and only the cloud can scale to it. The input is not a one-word query. It can be any possible question, so the system also needs its context. 4. 
Natural Language Understanding (NLU) Natural Language Understanding (NLU) converts the text into a meaningful representation. Say you ask for the weather in New York: the intent is a weather query, and New York is its location. Problems arise with cross-domain intent classification. For example, \u2018play remind me\u2019 (a song request) is very different from \u2018remind me to go play\u2019, yet the two could easily be confused. Additionally, many commonly used words sound the same but have completely different meanings: \u2018by\u2019 can be misheard as \u2018buy\u2019, leading to unwanted consequences. Out-of-domain utterances that don\u2019t make sense are also eliminated at this stage. This prevents the device from mistakenly acting on commands from televisions and the like. image:techcrunch.com Researchers are working constantly to improve the speech recognition software. Further improvements will see Alexa hold a conversation better, remember what a person has said previously, and apply that knowledge to subsequent interactions. Get into a discussion with our technology experts and analyse the future of Alexa.<\/p>\n","protected":false},"author":1,"featured_media":1022,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[141,278,140,280,281,279],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Alexa, What\u2019s The Technology Behind You? - NYX Ditech<\/title>\n<meta name=\"description\" content=\"Researchers are working constantly to improve the speech recognition software, particularly in terms of sensing the emotion in a person\u2019s voice. 
Further improvements will see Alexa better hold a conversation, remember what a person has said previously, and applying that knowledge to consequent interactions\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Alexa, What\u2019s The Technology Behind You? - NYX Ditech\" \/>\n<meta property=\"og:description\" content=\"Researchers are working constantly to improve the speech recognition software, particularly in terms of sensing the emotion in a person\u2019s voice. Further improvements will see Alexa better hold a conversation, remember what a person has said previously, and applying that knowledge to consequent interactions\" \/>\n<meta property=\"og:url\" content=\"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind\" \/>\n<meta property=\"og:site_name\" content=\"NYX Ditech\" \/>\n<meta property=\"article:published_time\" content=\"2018-06-29T09:01:20+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-07-08T07:38:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/amazon-alexa.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"768\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Alexa, What\u2019s The Technology Behind You? - NYX Ditech","description":"Researchers are working constantly to improve the speech recognition software, particularly in terms of sensing the emotion in a person\u2019s voice. Further improvements will see Alexa better hold a conversation, remember what a person has said previously, and applying that knowledge to consequent interactions","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind","og_locale":"en_US","og_type":"article","og_title":"Alexa, What\u2019s The Technology Behind You? - NYX Ditech","og_description":"Researchers are working constantly to improve the speech recognition software, particularly in terms of sensing the emotion in a person\u2019s voice. Further improvements will see Alexa better hold a conversation, remember what a person has said previously, and applying that knowledge to consequent interactions","og_url":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind","og_site_name":"NYX Ditech","article_published_time":"2018-06-29T09:01:20+00:00","article_modified_time":"2018-07-08T07:38:14+00:00","og_image":[{"width":768,"height":400,"url":"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/amazon-alexa.jpg","type":"image\/jpeg"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind#article","isPartOf":{"@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind"},"author":{"name":"admin","@id":"https:\/\/nyxditech.com\/blog_staging\/#\/schema\/person\/33d0b3d1f48eed4a537608abd5f401ee"},"headline":"Alexa, What\u2019s The Technology Behind You?","datePublished":"2018-06-29T09:01:20+00:00","dateModified":"2018-07-08T07:38:14+00:00","mainEntityOfPage":{"@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind"},"wordCount":651,"commentCount":0,"publisher":{"@id":"https:\/\/nyxditech.com\/blog_staging\/#organization"},"image":{"@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind#primaryimage"},"thumbnailUrl":"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/amazon-alexa.jpg","keywords":["AI","Amazon Alexa","Artificial Intelligence","big data","deep learning","virtual assistant"],"articleSection":["Technology"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind#respond"]}]},{"@type":"WebPage","@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind","url":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind","name":"Alexa, What\u2019s The Technology Behind You? 
- NYX Ditech","isPartOf":{"@id":"https:\/\/nyxditech.com\/blog_staging\/#website"},"primaryImageOfPage":{"@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind#primaryimage"},"image":{"@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind#primaryimage"},"thumbnailUrl":"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/amazon-alexa.jpg","datePublished":"2018-06-29T09:01:20+00:00","dateModified":"2018-07-08T07:38:14+00:00","description":"Researchers are working constantly to improve the speech recognition software, particularly in terms of sensing the emotion in a person\u2019s voice. Further improvements will see Alexa better hold a conversation, remember what a person has said previously, and applying that knowledge to consequent interactions","breadcrumb":{"@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind#primaryimage","url":"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/amazon-alexa.jpg","contentUrl":"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2018\/07\/amazon-alexa.jpg","width":768,"height":400,"caption":"amazon-alexa"},{"@type":"BreadcrumbList","@id":"https:\/\/nyxditech.com\/blog_staging\/alexa-whats-technology-behind#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/nyxditech.com\/blog_staging\/"},{"@type":"ListItem","position":2,"name":"Alexa, What\u2019s The Technology Behind You?"}]},{"@type":"WebSite","@id":"https:\/\/nyxditech.com\/blog_staging\/#website","url":"https:\/\/nyxditech.com\/blog_staging\/","name":"NYX 
Ditech","description":"","publisher":{"@id":"https:\/\/nyxditech.com\/blog_staging\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/nyxditech.com\/blog_staging\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/nyxditech.com\/blog_staging\/#organization","name":"NYX Ditech","url":"https:\/\/nyxditech.com\/blog_staging\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/nyxditech.com\/blog_staging\/#\/schema\/logo\/image\/","url":"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2024\/03\/nyxditech_logo.png","contentUrl":"https:\/\/nyxditech.com\/blog_staging\/wp-content\/uploads\/2024\/03\/nyxditech_logo.png","width":173,"height":70,"caption":"NYX Ditech"},"image":{"@id":"https:\/\/nyxditech.com\/blog_staging\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/nyxditech.com\/blog_staging\/#\/schema\/person\/33d0b3d1f48eed4a537608abd5f401ee","name":"admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/nyxditech.com\/blog_staging\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/9ea8d51a35e986075ec6e097c1dc4446?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9ea8d51a35e986075ec6e097c1dc4446?s=96&d=mm&r=g","caption":"admin"},"url":"https:\/\/nyxditech.com\/blog_staging\/author\/admin"}]}},"post_mailing_queue_ids":[],"_links":{"self":[{"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/posts\/1016"}],"collection":[{"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nyxditech.com\
/blog_staging\/wp-json\/wp\/v2\/comments?post=1016"}],"version-history":[{"count":5,"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/posts\/1016\/revisions"}],"predecessor-version":[{"id":1042,"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/posts\/1016\/revisions\/1042"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/media\/1022"}],"wp:attachment":[{"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/media?parent=1016"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/categories?post=1016"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nyxditech.com\/blog_staging\/wp-json\/wp\/v2\/tags?post=1016"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}