RoBorregos@Home Docs

Improved speech-to-text module

- Migration to Faster-Whisper after benchmarking:
  - Faster transcription: per-file time dropped by up to about half in the benchmark below.
  - Higher accuracy in noisy environments.
- Dynamic integration of "hot words":
  - Context-specific vocabulary is adjusted dynamically.
  - Increases robustness and accuracy for uncommon terms.
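
As a rough illustration of the hot-words idea, the sketch below builds a context-specific vocabulary string and passes it to Faster-Whisper. It assumes a faster-whisper release whose `WhisperModel.transcribe` accepts the `hotwords` argument; the helper names (`build_hotwords`, `transcribe_with_hotwords`), the model size, and the example terms are hypothetical and are not taken from the team's code.

```python
from typing import Iterable, Optional


def build_hotwords(terms: Iterable[str]) -> Optional[str]:
    """Join context terms into one space-separated string.

    Blank entries and case-insensitive duplicates are dropped.
    Returns None when no usable term remains.
    """
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return " ".join(unique) if unique else None


def transcribe_with_hotwords(audio_path: str, terms: Iterable[str],
                             model_size: str = "base") -> str:
    """Transcribe one audio file, biasing decoding toward the given terms.

    Assumption: the installed faster-whisper version supports `hotwords`.
    """
    # Imported lazily so build_hotwords works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, hotwords=build_hotwords(terms))
    return " ".join(segment.text.strip() for segment in segments)


if __name__ == "__main__":
    # Hypothetical vocabulary for one task; swap the list when the context changes.
    print(build_hotwords(["RoBorregos", " roborregos ", "", "xArm"]))
```

Rebuilding the string per task keeps the vocabulary small and relevant, which is one plausible way to realize the "dynamically adjusted" behavior described above.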

STT Benchmark

File (10 s)	Size (MB)	Faster-Whisper accuracy	Faster-Whisper time (s)	Whisper accuracy	Whisper time (s)
test1.wav	1.22	85.7%	0.64	71.4%	1.25
test2.wav	1.22	77.8%	0.71	33.3%	1.44
test3.wav	1.22	71.4%	0.66	57.1%	1.13
test4.wav	1.22	80%	0.70	60%	1.36
test5.wav	1.53	71.4%	4.68	71.4%	4.5
test6.wav	1.83	42.9%	0.63	28.6%	1.03
test7.wav	1.83	90%	0.64	90%	0.87
test8.wav	1.83	83.3%	0.61	66.7%	0.99
test9.wav	1.83	100%	0.62	100%	0.94
test10.wav	1.83	100%	0.58	100%	0.77
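
To make the comparison easier to read at a glance, the following sketch aggregates the table above. The numbers are copied from the table; the `summarize` helper and the choice of mean accuracy and median time are illustrative assumptions, not part of the original benchmark.

```python
from statistics import mean, median

# (file, faster-whisper accuracy %, faster-whisper time s,
#  whisper accuracy %, whisper time s), copied from the table above.
BENCHMARK = [
    ("test1.wav", 85.7, 0.64, 71.4, 1.25),
    ("test2.wav", 77.8, 0.71, 33.3, 1.44),
    ("test3.wav", 71.4, 0.66, 57.1, 1.13),
    ("test4.wav", 80.0, 0.70, 60.0, 1.36),
    ("test5.wav", 71.4, 4.68, 71.4, 4.50),
    ("test6.wav", 42.9, 0.63, 28.6, 1.03),
    ("test7.wav", 90.0, 0.64, 90.0, 0.87),
    ("test8.wav", 83.3, 0.61, 66.7, 0.99),
    ("test9.wav", 100.0, 0.62, 100.0, 0.94),
    ("test10.wav", 100.0, 0.58, 100.0, 0.77),
]


def summarize(rows):
    """Return mean accuracy and median time for both engines."""
    fw_acc = [r[1] for r in rows]
    fw_time = [r[2] for r in rows]
    w_acc = [r[3] for r in rows]
    w_time = [r[4] for r in rows]
    return {
        "fw_mean_accuracy": mean(fw_acc),
        "whisper_mean_accuracy": mean(w_acc),
        # Median is used for time because test5.wav is an outlier for both engines.
        "fw_median_time": median(fw_time),
        "whisper_median_time": median(w_time),
    }


if __name__ == "__main__":
    for name, value in summarize(BENCHMARK).items():
        print(f"{name}: {value:.2f}")
```

On these ten files the mean accuracy works out to about 80.3% for Faster-Whisper versus 67.9% for Whisper, and the median time to about 0.64 s versus 1.08 s. test5.wav is the one file where Faster-Whisper was not faster.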