RoBorregos@Home Docs

Speech and NLP pipeline upgrades

Speech Pipeline: Optimized for robustness and computational efficiency.
- Keyword Spotting: Implemented with Porcupine, a fast wake-word detector that we use to spot 'Frida', our robot's name.
- Voice Activity Detection: Uses Silero VAD to confirm that a voice is present in the captured audio.
- Speech-to-Text: Audio is transcribed with faster-whisper.
  - Outputs raw text, which an LLM then interprets as commands.
  - We benchmarked Whisper against faster-whisper alternatives; the latter cut transcription time by about half in most cases.
  - This model supports hot words: words that are given a higher probability during transcription, chosen according to the task the robot is performing. This results in higher transcription accuracy.
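
The way the three stages gate each other can be sketched as follows. This is a minimal illustration, not the team's implementation: the `SpeechPipeline` class and its three callables are hypothetical stand-ins for Porcupine, Silero VAD, and faster-whisper, and the rule that the first silent frame ends the utterance is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class SpeechPipeline:
    """Hypothetical wrapper; each callable stands in for a real engine."""
    detect_keyword: Callable[[bytes], bool]         # stand-in for Porcupine
    is_speech: Callable[[bytes], bool]              # stand-in for Silero VAD
    transcribe: Callable[[bytes, List[str]], str]   # stand-in for faster-whisper

    def run(self, frames: Iterable[bytes], hot_words: List[str]) -> Optional[str]:
        awake = False
        speech = bytearray()
        for frame in frames:
            if not awake:
                # Stage 1: ignore everything until the keyword is heard.
                awake = self.detect_keyword(frame)
                continue
            if self.is_speech(frame):
                # Stage 2: keep only frames that contain a voice.
                speech.extend(frame)
            elif speech:
                # Assumption: the first silent frame after speech ends the utterance.
                break
        if not speech:
            return None
        # Stage 3: transcribe the utterance, passing task-specific hot words.
        return self.transcribe(bytes(speech), hot_words)


# Toy stages so the sketch runs without audio hardware or models.
pipeline = SpeechPipeline(
    detect_keyword=lambda frame: frame == b"frida",
    is_speech=lambda frame: frame != b"",
    transcribe=lambda audio, hot_words: audio.decode(),
)
frames = [b"noise", b"frida", b"bring ", b"cola", b"", b"ignored"]
result = pipeline.run(frames, hot_words=["cola"])
```

Running the cheap keyword detector first means the heavier transcription model is only invoked once the robot has been addressed and a voice has actually been captured.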
NLP Pipeline:
- Converts the interpreted text into robot-executable commands.
- Embeds actions so that semantic matching can be used when no exact match is available.
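
The fallback from exact to semantic matching could look roughly like the sketch below. It assumes cosine similarity over embedding vectors; the function names, the toy three-dimensional vectors, and the 0.7 threshold are all hypothetical and chosen only for illustration. In practice the vectors would come from an embedding model.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_action(
    query: str,
    query_vec: Sequence[float],
    known_actions: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # hypothetical cut-off, tune on real data
) -> Optional[str]:
    """Return the exact match if present, else the closest action above threshold."""
    if query in known_actions:
        return query
    best_name, best_score = max(
        ((name, cosine_similarity(query_vec, vec)) for name, vec in known_actions.items()),
        key=lambda pair: pair[1],
        default=(None, 0.0),
    )
    return best_name if best_score >= threshold else None


# Toy embeddings standing in for the output of a real embedding model.
ACTIONS = {
    "pick": [1.0, 0.0, 0.0],
    "go": [0.0, 1.0, 0.0],
    "say": [0.0, 0.0, 1.0],
}

exact = match_action("go", [0.0, 1.0, 0.0], ACTIONS)
semantic = match_action("grab", [0.9, 0.1, 0.0], ACTIONS)
unknown = match_action("dance", [0.5, 0.5, 0.5], ACTIONS)
```

Here "grab" is not a known action, but its vector lies close to "pick", so it is mapped there; "dance" is not close enough to anything and is rejected rather than forced onto a wrong action.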