OpenWakeWord

Overview

OpenWakeWordNode is a ROS 2 node that performs keyword spotting (wake word detection) using models trained with OpenWakeWord. It supports inference using ONNX or TensorFlow Lite (TFLite) frameworks. The node subscribes to raw audio data, detects wake words in real time, and publishes detection events.


What is OpenWakeWord?

OpenWakeWord is an open-source wake word detection toolkit that enables training custom wake word models (e.g., "frida", "no", "stop", "yes") and running inference efficiently. It supports exporting models to ONNX and TFLite formats for deployment on various platforms.
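
For context on the library itself (outside ROS), a minimal sketch of running inference with the openwakeword Python package might look like the following; the model path is a placeholder.

```python
import numpy as np
from openwakeword.model import Model
from openwakeword.utils import download_models

# Fetch openWakeWord's pretrained models and shared feature-extraction models
# (melspectrogram/embedding) if they are not already present.
download_models()

# Load one or more wake word models; inference_framework can be "onnx" or "tflite".
model = Model(
    wakeword_models=["path/to/frida.onnx"],  # placeholder path to a trained model
    inference_framework="onnx",
)

# openWakeWord expects 16 kHz, 16-bit mono audio; 1280 samples = 80 ms per chunk.
chunk = np.zeros(1280, dtype=np.int16)  # silence, for demonstration only

# predict() returns a dict mapping each loaded model's name to its confidence score.
scores = model.predict(chunk)
print(scores)  # e.g. {"frida": 0.01}
```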


Use Case

This node is intended to detect wake words during the @Home competition in order to trigger specific functions.


Architecture and Implementation

ROS Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_path | string | /workspace/src/hri/packages/speech/assets/oww | Directory containing ONNX wake word models (.onnx files) |
| inference_framework | string | onnx | Framework used for inference (onnx or tflite) |
| audio_topic | string | /rawAudioChunk | Topic where raw audio chunks are received |
| WAKEWORD_TOPIC | string | /speech/oww | Topic to publish detected wake words |
| chunk_size | int | 1280 | Number of samples processed per inference |
| detection_cooldown | float | 1.0 | Minimum time (in seconds) between detections to avoid repeated triggers |
| SENSITIVITY_THRESHOLD | float | 0.5 | Confidence threshold required for detection |
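
These parameters would typically be set at launch time. The sketch below is a hypothetical ROS 2 Python launch file; the package, executable, and node names are placeholders and must be replaced with the real ones from the speech package.

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        Node(
            package="speech",     # assumed package name
            executable="oww",     # assumed executable name
            name="oww_node",      # assumed node name
            parameters=[{
                "model_path": "/workspace/src/hri/packages/speech/assets/oww",
                "inference_framework": "onnx",
                "audio_topic": "/rawAudioChunk",
                "WAKEWORD_TOPIC": "/speech/oww",
                "chunk_size": 1280,
                "detection_cooldown": 1.0,
                "SENSITIVITY_THRESHOLD": 0.5,
            }],
        )
    ])
```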

Model Init

| Feature | Description |
| --- | --- |
| Load Wakeword Models | Loads all .onnx models from model_path. Logs number of models loaded. |
| Default Model Fallback | Loads default wake word model if no directory or models are found. |
| Feature Extractor Download | Downloads melspectrogram.onnx and embedding_model.onnx if missing. |
| Path Setup | Copies models to the correct execution path if not already present. |
| ROS Publisher/Subscriber | Creates publisher (WAKEWORD_TOPIC) and subscriber (audio_topic). |
| Cooldown Timer | Initializes internal timer to prevent repeated wake word detections. |
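
The loading and fallback steps above can be sketched with the openwakeword package alone; the helper name below is illustrative rather than the node's actual function, and the path-copying step is omitted.

```python
import glob
import os

from openwakeword.model import Model
from openwakeword.utils import download_models


def load_wakeword_models(model_path: str, inference_framework: str = "onnx") -> Model:
    """Load all .onnx wake word models found in model_path, falling back to defaults."""
    # Make sure the shared feature extractors (melspectrogram.onnx, embedding_model.onnx)
    # are available; download_models() fetches openWakeWord's models if missing.
    download_models()

    model_files = sorted(glob.glob(os.path.join(model_path, "*.onnx")))
    if model_files:
        print(f"Loading {len(model_files)} wake word model(s) from {model_path}")
        return Model(wakeword_models=model_files, inference_framework=inference_framework)

    # Fallback: directory missing or empty, so use openWakeWord's bundled default models.
    print(f"No models found in {model_path}; falling back to default models")
    return Model(inference_framework=inference_framework)
```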

Audio Inference Flow

| Step | Description |
| --- | --- |
| Receive Audio | Listens to AudioData messages from audio_topic. |
| Convert to NumPy | Converts audio byte stream to np.int16 NumPy array. |
| Run Inference | Feeds the audio to the OpenWakeWord model for prediction. |
| Monitor Prediction Buffer | Checks latest confidence score for each keyword. |
| Detection Condition | If score > SENSITIVITY_THRESHOLD and cooldown has passed, a detection is triggered. |
| Publish Detection | Sends a std_msgs/String message like {"keyword": "frida", "score": 0.87}. |
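
The flow in this table can be sketched as a plain function, independent of the ROS plumbing; publish stands in for the call that sends the message on WAKEWORD_TOPIC, and the cooldown bookkeeping is an assumption about how repeated triggers are suppressed.

```python
import json
import time

import numpy as np

last_detection_time = 0.0  # cooldown state, module-level here for illustration


def process_audio_chunk(raw_bytes: bytes, model, publish,
                        sensitivity_threshold: float = 0.5,
                        detection_cooldown: float = 1.0) -> None:
    """Run one wake word inference step on a raw 16-bit PCM audio chunk."""
    global last_detection_time

    # Convert the raw byte stream into a 16-bit integer sample array.
    audio = np.frombuffer(raw_bytes, dtype=np.int16)

    # Feed the chunk to openWakeWord; returns {model_name: confidence_score}.
    scores = model.predict(audio)

    now = time.monotonic()
    for keyword, score in scores.items():
        if score > sensitivity_threshold and (now - last_detection_time) > detection_cooldown:
            last_detection_time = now
            # Publish a JSON payload such as {"keyword": "frida", "score": 0.87}.
            publish(json.dumps({"keyword": keyword, "score": round(float(score), 2)}))
```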

Message Types

  • Input: frida_interfaces/msg/AudioData — raw audio buffer (16-bit PCM)
  • Output: std_msgs/msg/String — JSON string with detected keyword and confidence score, e.g. {"keyword": "frida", "score": 0.87}
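
For reference, a minimal downstream consumer could subscribe to the output topic and parse the JSON payload; the topic name below matches the WAKEWORD_TOPIC default, and the node name is arbitrary.

```python
import json

import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class WakeWordListener(Node):
    """Example consumer of the detections published by OpenWakeWordNode."""

    def __init__(self):
        super().__init__("wake_word_listener")
        self.create_subscription(String, "/speech/oww", self.on_detection, 10)

    def on_detection(self, msg: String) -> None:
        detection = json.loads(msg.data)
        self.get_logger().info(
            f"Heard '{detection['keyword']}' (score={detection['score']:.2f})"
        )


def main():
    rclpy.init()
    rclpy.spin(WakeWordListener())


if __name__ == "__main__":
    main()
```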