RoBorregos@Home Docs

Dataset generation

Pipeline

A pipeline was developed that produces a dataset in the annotation format required to train YOLOv8. The pipeline relies on two models. The first is GroundingDINO, which detects the objects of interest and returns their bounding boxes. The second is SAM (Segment Anything Model), which we use to cut the objects out of their original backgrounds and create images containing only the desired objects. SAM needs the bounding box of each object it should segment, and these boxes are provided by GroundingDINO.
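As a rough illustration of the last step, writing a detected box in YOLO label form, the sketch below converts a pixel-space corner box into a YOLO detection label line (`class x_center y_center width height`, all normalized to the image size). It assumes the detector's boxes have already been converted to `(x_min, y_min, x_max, y_max)` pixel coordinates. The function name `xyxy_to_yolo_line` and the sample values are hypothetical and are not taken from the team's code.

```python
def xyxy_to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to a YOLO label line.

    Hypothetical helper for illustration; not part of the original pipeline.
    """
    x_min, y_min, x_max, y_max = box

    # Clamp the box to the image so normalized values stay within [0, 1].
    x_min, x_max = max(0.0, x_min), min(float(img_w), x_max)
    y_min, y_max = max(0.0, y_min), min(float(img_h), y_max)

    # YOLO stores the box center and size, normalized by image width/height.
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h

    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Example: one detection of class 0 in a 640x480 image (made-up values).
line = xyxy_to_yolo_line(0, (100, 50, 300, 250), 640, 480)
print(line)  # 0 0.312500 0.312500 0.312500 0.416667
```

Each image would get one such line per detected object in its `.txt` label file.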

Dataset transformation

Additionally, a module was developed to transform any object detection dataset into an object segmentation dataset. It uses SAM, prompted with the existing YOLO bounding box annotations, to re-annotate each label in segmentation format.
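The label bookkeeping around such a transformation can be sketched as follows. This is a minimal sketch, assuming the segmentation model is run separately and returns a polygon in pixel coordinates. The SAM call itself is left out, and all function names and sample values here are hypothetical. The first two helpers recover a pixel-space box from a YOLO detection line (usable as a box prompt), and the third writes a polygon as a YOLO segmentation line (`class x1 y1 x2 y2 ...`, normalized).

```python
def parse_detection_line(line):
    """Split a YOLO detection line into (class_id, (x_c, y_c, w, h))."""
    parts = line.split()
    return int(parts[0]), tuple(float(v) for v in parts[1:5])


def yolo_box_to_xyxy(fields, img_w, img_h):
    """Normalized (x_c, y_c, w, h) -> pixel (x_min, y_min, x_max, y_max)."""
    x_c, y_c, w, h = fields
    return (
        (x_c - w / 2) * img_w,
        (y_c - h / 2) * img_h,
        (x_c + w / 2) * img_w,
        (y_c + h / 2) * img_h,
    )


def polygon_to_yolo_seg_line(class_id, polygon, img_w, img_h):
    """Pixel polygon [(x, y), ...] -> YOLO segmentation label line."""
    coords = []
    for x, y in polygon:
        coords.append(f"{x / img_w:.6f}")
        coords.append(f"{y / img_h:.6f}")
    return f"{class_id} " + " ".join(coords)


# Example on a 640x480 image (made-up annotation).
class_id, fields = parse_detection_line("0 0.5 0.5 0.25 0.5")
box = yolo_box_to_xyxy(fields, 640, 480)  # box prompt for the segmenter

# Placeholder for the mask outline the segmenter would return; here it is
# simply the box corners so the example stays self-contained.
polygon = [(240, 120), (400, 120), (400, 360), (240, 360)]
seg_line = polygon_to_yolo_seg_line(class_id, polygon, 640, 480)
print(seg_line)
```

In a real run the polygon would come from the contour of the predicted mask rather than from the box corners.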