Build Your Own Voice Assistant with 80 Lines of JavaScript

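In this tutorial, we will use 80 lines of JavaScript to build a virtual assistant (like Siri or Google Assistant) that runs in the browser. You can test the app here. It listens to the user's voice commands and replies with a synthesized voice.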

What you need:

Google Chrome (version 25 or later)
A text editor

Because the Web Speech API is still experimental, the app only works in supported browsers: Chrome (version 25 or later) and Edge (version 79 or later).

What components do we need to build?

To build this web app, we need to implement four components:

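A simple user interface that displays what the user said and the assistant's reply.
Converting speech to text.
Processing the text and performing actions.
Converting text to speech.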
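It can help to see the four components as a chain of plain functions before looking at the browser code. The speech-to-text stage feeds each final transcript into a handler. That handler processes the text, updates the UI and speaks the reply. This is an illustration only, not the tutorial's code. createAssistant, render, respond and speak are hypothetical names, and stand-in functions replace the browser APIs so the wiring can run anywhere.

```javascript
// Hypothetical wiring of the pipeline: each stage is injected as a function.
// The speech-to-text stage (not shown) calls the returned handler with every
// final transcript it produces.
function createAssistant({ render, respond, speak }) {
  return function handleFinalTranscript(text) {
    const response = respond(text); // process the text and pick a reply
    render(text, response);         // show the exchange in the UI
    speak(response);                // convert the reply to speech
    return response;
  };
}

// Stand-in stages so the wiring can be exercised outside a browser.
const log = [];
const handle = createAssistant({
  render: (heard, reply) => log.push(`You said: ${heard} / Siri said: ${reply}`),
  respond: (text) => (text.trim().toLowerCase() === "hello" ? "hi, how are you doing?" : "..."),
  speak: (reply) => log.push(`(spoken) ${reply}`),
});
handle("hello");
console.log(log);
```

Keeping the stages separate like this means any one of them, such as the reply logic, can be swapped or tested without a microphone.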

User interface

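The first step is to create a simple user interface. It contains a button that triggers the assistant, a div that displays the user's commands and the assistant's responses, and a p element that displays processing status.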

const startBtn = document.createElement("button");
startBtn.innerHTML = "Start listening";
const result = document.createElement("div");
const processing = document.createElement("p");
document.write("<body><h1>My Siri</h1><p>Give it a try with 'hello', 'how are you', 'what's your name', 'what time is it', 'stop', ... </p></body>");
document.body.append(startBtn);
document.body.append(result);
document.body.append(processing);
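Later the tutorial writes the recognized text into the result div through innerHTML. Any markup characters in a transcript are therefore parsed as HTML. Here is a small sketch, assuming you keep using innerHTML. escapeHtml and formatExchange are hypothetical helpers that escape both strings before they are inserted.

```javascript
// Escapes the characters that are significant in HTML so that recognized
// speech cannot inject markup when it is written via innerHTML.
// "&" is replaced first so the entities added afterwards are not re-escaped.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: builds the markup for one exchange in the result div.
function formatExchange(heard, reply) {
  return `You said: ${escapeHtml(heard)}<br>Siri said: ${escapeHtml(reply)}`;
}

console.log(formatExchange("what's <b>up</b>", "I'm good."));
// You said: what&#39;s &lt;b&gt;up&lt;/b&gt;<br>Siri said: I&#39;m good.
```

Another option is to create separate elements and set their textContent, which avoids escaping altogether.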

Speech to text

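We need a component that captures voice commands and converts them to text for further processing. In this tutorial we use SpeechRecognition from the Web Speech API. Because this API is only available in supported browsers, we display a warning in unsupported browsers and remove the Start button so the user cannot click it.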

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
if (typeof SpeechRecognition === "undefined") {
  startBtn.remove();
  result.innerHTML = "<b>Browser does not support Speech API. Please download the latest Chrome.</b>";
}
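The check above covers only speech recognition, but the assistant also relies on speechSynthesis to talk back. Here is a minimal sketch that reports both in one place. getSpeechSupport is a hypothetical helper, and the global object is passed in as a parameter so it can be tried with plain objects instead of window.

```javascript
// Reports which parts of the Web Speech API a global object exposes.
// Recognition prefers the standard name, then Chrome's webkit-prefixed one.
function getSpeechSupport(globalObj) {
  return {
    Recognition: globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null,
    canSpeak: typeof globalObj.speechSynthesis !== "undefined",
  };
}

// In a page you would call getSpeechSupport(window). Here plain objects
// stand in for a Chrome-like browser and for one without the API.
class FakeRecognition {}
const chromeLike = getSpeechSupport({ webkitSpeechRecognition: FakeRecognition, speechSynthesis: {} });
const unsupported = getSpeechSupport({});
console.log(chromeLike.Recognition === FakeRecognition, chromeLike.canSpeak); // true true
console.log(unsupported.Recognition, unsupported.canSpeak); // null false
```

If Recognition is null, the page can remove the Start button as the tutorial does. If only canSpeak is false, the assistant could still show its replies as text.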

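Next we create a SpeechRecognition instance, whose properties let us customize recognition. In this app we set continuous to true so that recognition keeps returning results instead of stopping after the first phrase. We also set interimResults to true so that partial text appears in real time while the user is still speaking.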

const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;
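SpeechRecognition also has a lang property, which takes a BCP 47 language tag, and a maxAlternatives property. Setting lang explicitly may help when the spoken language differs from the browser's or page's default. This minimal sketch applies the tutorial's two settings plus those two. configureRecognition is a hypothetical helper, and the values "en-US", "en-GB" and 1 are assumptions chosen for illustration.

```javascript
// Applies this tutorial's settings to any SpeechRecognition-like object.
// lang and maxAlternatives are standard properties; their values here are
// illustrative assumptions.
function configureRecognition(recognition, { lang = "en-US", maxAlternatives = 1 } = {}) {
  recognition.continuous = true;                 // keep returning results, not just one
  recognition.interimResults = true;             // deliver partial transcripts while speaking
  recognition.lang = lang;                       // language the user is expected to speak
  recognition.maxAlternatives = maxAlternatives; // candidate transcripts per result
  return recognition;
}

// In a browser: configureRecognition(new SpeechRecognition(), { lang: "en-GB" });
// Here a plain object stands in for the recognizer.
const settings = configureRecognition({}, { lang: "en-GB" });
console.log(settings);
```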

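Next we attach a handler to the recognition object's onresult event. The handler displays the user's voice command as text and calls a process function to act on it. We implement process in the next step. For now it is a stub that returns a placeholder.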

function process(speech_text) {
    return "....";
}
recognition.onresult = event => {
   const last = event.results.length - 1;
   const res = event.results[last];
   const text = res[0].transcript;
   if (res.isFinal) {
      processing.innerHTML = "processing ....";
      const response = process(text);
      const p = document.createElement("p");
      p.innerHTML = `You said: ${text} <br>Siri said: ${response}`;
      processing.innerHTML = "";
      result.appendChild(p);
      // add text to speech later
   } else {
      processing.innerHTML = `listening: ${text}`;
   }
}
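The handler above reads only the last entry of event.results. A SpeechRecognitionEvent also carries resultIndex, the index of the first result that changed in this event. Looping from there means a phrase is not skipped if one event happens to update more than one result. This sketch is an alternative, not the tutorial's approach. readResults and fakeResult are hypothetical helpers, and the mock event only imitates the shape of a real SpeechRecognitionResultList.

```javascript
// Splits a SpeechRecognitionEvent-shaped object into final and interim text,
// scanning only the results that changed in this event.
function readResults(event) {
  let finalText = "";
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const transcript = result[0].transcript; // first (most likely) alternative
    if (result.isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText, interimText };
}

// Mock results: arrays of alternatives with an isFinal flag attached.
function fakeResult(transcript, isFinal) {
  const result = [{ transcript, confidence: 0.9 }];
  result.isFinal = isFinal;
  return result;
}
const event = {
  resultIndex: 1, // result 0 was already handled by an earlier event
  results: [fakeResult("hello", true), fakeResult(" what time", true), fakeResult(" is it", false)],
};
console.log(readResults(event)); // { finalText: ' what time', interimText: ' is it' }
```

Inside onresult, finalText (when non-empty) would go to process, and interimText would be shown in the processing element.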

We also need to connect the button in the UI to the recognition object so that clicking it starts and stops speech recognition.

let listening = false;
const toggleBtn = () => {
   if (listening) {
      recognition.stop();
      startBtn.textContent = "Start listening";
   } else {
      recognition.start();
      startBtn.textContent = "Stop listening";
   }
   listening = !listening;
};
startBtn.addEventListener("click", toggleBtn);
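The listening flag above changes only on clicks. Recognition may also end without a click, for example after an error or a long period of silence. In that case the flag and the button label would no longer match what the recognizer is doing. Here is a sketch that also listens for the standard end event (onend). createToggle is a hypothetical factory, and the fake recognizer and button exist only so the logic can run outside a browser.

```javascript
// Hypothetical factory: returns a click handler and keeps the button label
// in sync with the recognizer, including when recognition ends on its own.
function createToggle(recognition, button) {
  let listening = false;
  const render = () => {
    button.textContent = listening ? "Stop listening" : "Start listening";
  };
  recognition.onend = () => {
    listening = false; // recognition stopped, whether or not we asked it to
    render();
  };
  const toggle = () => {
    if (listening) {
      listening = false;
      recognition.stop();
    } else {
      listening = true;
      recognition.start();
    }
    render();
  };
  render();
  return toggle;
}

// Stand-ins for the recognizer and the button.
const fakeRecognition = {
  running: false,
  start() { this.running = true; },
  stop() { this.running = false; if (this.onend) this.onend(); },
};
const fakeButton = { textContent: "" };
const toggle = createToggle(fakeRecognition, fakeButton);
toggle();
console.log(fakeButton.textContent); // Stop listening
fakeRecognition.onend(); // simulate the browser ending recognition by itself
console.log(fakeButton.textContent); // Start listening
```

In the page, startBtn.addEventListener("click", createToggle(recognition, startBtn)) would replace the hand-written flag.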

Processing the text and performing actions

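In this step we build simple conversation logic and handle a few basic actions. The assistant can reply to "hello", "what's your name?" and "how are you?", and it can tell the current time. It stops listening when it hears "stop", and opens a new tab to search for anything it cannot answer. Before matching, process removes all whitespace and lowercases the transcript, which is why the cases are written like "what'syourname". You can extend process further, for example with AI libraries, to make the assistant smarter.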

function process(rawText) {
   // remove space and lowercase text
   let text = rawText.replace(/\s/g, "");
   text = text.toLowerCase();
   let response = null;
   switch(text) {
      case "hello":
         response = "hi, how are you doing?"; break;
      case "what'syourname":
         response = "My name's Siri.";  break;
      case "howareyou":
         response = "I'm good."; break;
      case "whattimeisit":
         response = new Date().toLocaleTimeString(); break;
      case "stop":
         response = "Bye!!";
         toggleBtn(); // stop listening
   }
   if (!response) {
      window.open(`https://www.google.com/search?q=${encodeURIComponent(rawText.replace("search", "").trim())}`, "_blank");
      return "I found some information for " + rawText;
   }
   return response;
}
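Each new command in the switch statement needs another case, and the search fallback needs its query URL-encoded so characters such as & or # in a transcript do not break the URL. Here is an alternative sketch, not the original code. normalize, replies, buildSearchUrl and respond are hypothetical names. It looks replies up in a Map and also strips common punctuation, on the assumption that some recognizers add it. It encodes the query with URLSearchParams. The function returns the search URL instead of calling window.open, so the logic can run outside a browser.

```javascript
// Same normalization as process (drop whitespace, lowercase), plus removal of
// ? ! . and , in case the recognizer adds punctuation (an assumption).
function normalize(rawText) {
  return rawText.toLowerCase().replace(/[\s?!.,]/g, "");
}

// Normalized command -> reply factory. A Map avoids matching inherited
// object keys such as "constructor" that a plain-object lookup would hit.
const replies = new Map([
  ["hello", () => "hi, how are you doing?"],
  ["what'syourname", () => "My name's Siri."],
  ["howareyou", () => "I'm good."],
  ["whattimeisit", (now) => now.toLocaleTimeString()],
]);

// Builds an encoded search URL; URLSearchParams turns spaces into "+".
function buildSearchUrl(rawText) {
  const url = new URL("https://www.google.com/search");
  url.searchParams.set("q", rawText.replace("search", "").trim());
  return url.href;
}

// Returns the reply plus a search URL (or null); the caller decides
// whether to open it.
function respond(rawText, now = new Date()) {
  const handler = replies.get(normalize(rawText));
  if (handler) {
    return { reply: handler(now), searchUrl: null };
  }
  return { reply: `I found some information for ${rawText}`, searchUrl: buildSearchUrl(rawText) };
}

console.log(respond("What's your name?").reply); // My name's Siri.
console.log(respond("search cats & dogs").searchUrl);
// https://www.google.com/search?q=cats+%26+dogs
```

"stop" is deliberately left out of the table because it needs toggleBtn. A caller could check normalize(text) === "stop" before calling respond.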

Text to speech

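In the final step, we use the speechSynthesis controller from the Web Speech API to give the assistant a voice. The API is straightforward: replace the "// add text to speech later" comment in the onresult handler with this line: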

speechSynthesis.speak(new SpeechSynthesisUtterance(response));
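speechSynthesis.speak adds each utterance to a queue, so replies that arrive quickly are spoken one after another. Here is a minimal sketch, assuming you would rather have the newest reply interrupt older ones. speak is a hypothetical helper that calls speechSynthesis.cancel() first and then sets the utterance's lang and rate properties. The browser objects are passed in as parameters so the helper can run with stand-ins. The defaults "en-US" and 1 are assumptions.

```javascript
// Hypothetical helper. In a browser, pass window.speechSynthesis as synth
// and window.SpeechSynthesisUtterance as Utterance.
function speak(text, { synth, Utterance, lang = "en-US", rate = 1 }) {
  synth.cancel(); // drop anything still queued so replies do not pile up
  const utterance = new Utterance(text);
  utterance.lang = lang;
  utterance.rate = rate; // 1 is the normal speaking rate
  synth.speak(utterance);
  return utterance;
}

// Stand-ins that model the utterance queue.
const queue = [];
const fakeSynth = {
  cancel() { queue.length = 0; },
  speak(u) { queue.push(u.text); },
};
class FakeUtterance {
  constructor(text) { this.text = text; }
}
speak("hi, how are you doing?", { synth: fakeSynth, Utterance: FakeUtterance });
console.log(queue); // [ 'hi, how are you doing?' ]
```

Because recognition stays on in continuous mode, the microphone may also pick up the assistant's own voice. One option worth trying is to stop recognition before speaking and restart it from the utterance's onend handler.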

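That's it! With only 80 lines of code we have a cool assistant. A demo of the program can be found here, and the complete code is listed below.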

// UI components
const startBtn = document.createElement("button");
startBtn.innerHTML = "Start listening";
const result = document.createElement("div");
const processing = document.createElement("p");
document.write("<body><h1>My Siri</h1><p>Give it a try with 'hello', 'how are you', 'what's your name', 'what time is it', 'stop', ... </p></body>");
document.body.append(startBtn);
document.body.append(result);
document.body.append(processing);
// speech to text
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
let toggleBtn = null;
if (typeof SpeechRecognition === "undefined") {
 startBtn.remove();
 result.innerHTML = "<b>Browser does not support Speech API. Please download the latest Chrome.</b>";
} else {
 const recognition = new SpeechRecognition();
 recognition.continuous = true;
 recognition.interimResults = true;
 recognition.onresult = event => {
  const last = event.results.length - 1;
  const res = event.results[last];
  const text = res[0].transcript;
  if (res.isFinal) {
   processing.innerHTML = "processing ....";
   const response = process(text);
   const p = document.createElement("p");
   p.innerHTML = `You said: ${text} <br>Siri said: ${response}`;
   processing.innerHTML = "";
   result.appendChild(p);
   // text to speech
   speechSynthesis.speak(new SpeechSynthesisUtterance(response));
  } else {
   processing.innerHTML = `listening: ${text}`;
  }
 }
 let listening = false;
 toggleBtn = () => {
  if (listening) {
   recognition.stop();
   startBtn.textContent = "Start listening";
  } else {
   recognition.start();
   startBtn.textContent = "Stop listening";
  }
  listening = !listening;
 };
 startBtn.addEventListener("click", toggleBtn);
}
// processor
function process(rawText) {
 let text = rawText.replace(/\s/g, "");
 text = text.toLowerCase();
 let response = null;
 switch(text) {
  case "hello":
   response = "hi, how are you doing?"; break;
  case "what'syourname":
   response = "My name's Siri.";  break;
  case "howareyou":
   response = "I'm good."; break;
  case "whattimeisit":
   response = new Date().toLocaleTimeString(); break;
  case "stop":
   response = "Bye!!";
   toggleBtn();
 }
 if (!response) {
  window.open(`https://www.google.com/search?q=${encodeURIComponent(rawText.replace("search", "").trim())}`, "_blank");
  return `I found some information for ${rawText}`;
 }
 return response;
}

About the author:

Tuan Nhu Dinh, software engineer at Facebook.

Original article:

https://medium.com/swlh/build-your-own-hi-siri-with-80-lines-of-javascript-code-653540c77502
