The Walking Assistant Robot
Abstract—
This paper presents the design and development of a low-cost, autonomous walking assistant robot intended to aid visually impaired individuals in navigating complex and resource-constrained environments, particularly in Bangladesh. To provide dependable obstacle detection and responsive navigation, the system combines real-time sensor data fusion, natural language processing for Bangla voice assistance, and lightweight deep-learning object recognition. The robot uses efficient voice synthesis models adapted for Bangla to improve user communication. Stability and terrain adaptation, including stair traversal, are achieved through a rocker-bogie suspension system. By providing a locally adaptive, computationally efficient, and linguistically inclusive mobility aid, the proposed approach overcomes significant drawbacks of traditional assistive technologies, including their high cost, English-only interaction, and lack of full autonomy. The robot is capable of providing precise, real-time guidance suitable for daily use in Bangladeshi cities.
Keywords— Assistive robotics, object detection, Bangla NLP, MobileNet-SSD, Tiny-YOLOv4, EfficientDet-Lite, Jetson Orin Nano, visual impairment, voice navigation, real-time system.
Introduction
According to the 2020 Nationwide Blindness Survey in Bangladesh, about 1% of people over the age of 30 are visually impaired. For many of them, navigating daily life without assistance can be difficult and, at times, even dangerous. Canes and other traditional mobility aids are useful but limited, because they cannot detect obstacles above ground level, such as tree branches or hanging signs. Guide dogs offer a better alternative, but the majority of individuals in Bangladesh cannot afford them.
The reality on the ground in Bangladesh is quite harsh. Overcrowded sidewalks, unpredictable road conditions, and noisy environments make it extremely challenging for visually impaired individuals to move around safely. Although artificial intelligence (AI) and natural language processing (NLP) technologies have shown promise in helping blind people around the world, most of these solutions depend on expensive hardware or reliable internet access, neither of which is consistently available or affordable in our environment.
AI-powered robotics and assistive navigation have a strong foundation thanks to earlier research. To enable autonomous service robots, for example, Ekvall et al. showed how object detection and SLAM-based mapping can work in tandem \cite{4}. Their system, however, was tested in controlled indoor conditions and relied on a robust robotic platform with pan-tilt-zoom cameras and laser scanners, hardware that is far too costly and sophisticated for everyday use in low-resource contexts. Similarly, while the work of Saini and Joseph \cite{5} discusses the role of NLP in robotics and even touches on emotional intelligence and sentiment analysis, it remains largely conceptual and lacks practical, deployable implementations in real-world assistive scenarios.
Furthermore, these studies usually focus on English-based interaction while overlooking the linguistic and cultural contexts of non-Western countries. This is a serious limitation in places like Bangladesh, where many people are most comfortable speaking Bangla. Without language localization, even the most advanced assistive technology can end up being ineffective for the very people it is meant to help.
To fill these gaps, we present a practical and cost-effective solution: a fully autonomous walking assistant robot designed for Bangladesh. Unlike existing systems, our robot can be navigated by Bangla speech and is built to work offline. It uses computationally efficient, lightweight object detection techniques that run well on embedded systems, making it affordable and widely accessible. Our ultimate objective is to give visually impaired people a tool that respects their surroundings while improving their mobility.
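As a rough illustration of what such a lightweight, offline pipeline can look like, the sketch below runs a pre-trained MobileNet-SSD model through OpenCV's DNN module on a single camera frame. It is a minimal example under assumed file names and thresholds, not the robot's actual detection code.

```python
# Hypothetical sketch: offline obstacle detection with MobileNet-SSD via OpenCV's
# DNN module. File names, the class list, and thresholds are illustrative only.
import cv2

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
           "motorbike", "person", "pottedplant", "sheep", "sofa", "train",
           "tvmonitor"]

# Pre-trained Caffe model files (assumed to be present on the device).
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

def detect_obstacles(frame, conf_threshold=0.5):
    """Return (label, confidence, [x1, y1, x2, y2]) tuples for one BGR frame."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()          # shape: (1, 1, N, 7)
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence < conf_threshold:
            continue
        class_id = int(detections[0, 0, i, 1])
        box = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
        results.append((CLASSES[class_id], confidence, box.tolist()))
    return results

# Everything runs on-device: grab one frame from the on-board camera and report.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    for label, conf, box in detect_obstacles(frame):
        print(f"{label}: {conf:.2f} at {box}")
cap.release()
```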
Literature Review
A key component of robotic assistance systems for the blind and visually impaired is object detection. Traditional CNN-based techniques, such as R-CNN, Faster R-CNN, YOLO, and Mask R-CNN, remain in use because they can provide real-time performance with minimal accuracy loss. YOLO is particularly noteworthy for striking a good balance between speed and accuracy, making it well suited to mobile robotics \cite{1}. Notwithstanding these advantages, several obstacles remain for real-world deployment in settings like Bangladeshi city streets, including occlusions, varying viewpoints, dim illumination, and congested scenes \cite{1}.
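To make this speed-versus-accuracy trade-off concrete, the hedged sketch below shows one common way to run Tiny-YOLOv4 in real time via OpenCV's DetectionModel API; the weight, config, and label file names, the thresholds, and the optional CUDA backend lines are assumptions for illustration.

```python
# Hypothetical sketch: real-time detection with Tiny-YOLOv4 through OpenCV's
# DetectionModel API. File names and thresholds are assumptions.
import cv2

net = cv2.dnn.readNet("yolov4-tiny.weights", "yolov4-tiny.cfg")
# Optional: on a CUDA-capable board, the GPU backend helps keep inference real-time.
# net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
# net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

with open("coco.names") as f:               # one class name per line
    class_names = [line.strip() for line in f]

frame = cv2.imread("street.jpg")            # placeholder test image
class_ids, scores, boxes = model.detect(frame, confThreshold=0.4, nmsThreshold=0.4)
for class_id, score, box in zip(class_ids, scores, boxes):
    print(class_names[int(class_id)], f"{float(score):.2f}", list(box))
```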
Building on this, Ekvall et al. proposed a more comprehensive framework for robotic perception by combining Simultaneous Localization and Mapping (SLAM) with object recognition to produce semantically rich maps of indoor spaces \cite{4}. To facilitate autonomous navigation and task planning, their system employed Receptive Field Co-occurrence Histograms (RFCHs) to recognize objects and fuse those detections with spatial data. With this method, robots can not only localize themselves but also recall where particular objects are; for example, they can identify the kitchen by looking for typical objects such as cups or rice bags. This synergy between geometric mapping and semantic understanding greatly enhances human-robot interaction and supports more sophisticated autonomy.
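The following is a deliberately simplified, hypothetical illustration of that semantic-mapping idea: each detection is stamped with the robot's current map-frame position so the system can later answer a "where did I last see X?" query. It does not reproduce Ekvall et al.'s RFCH or SLAM machinery; every class and field name is invented for the example.

```python
# Toy illustration of semantic mapping: remember where objects were last seen
# by pairing detector output with the robot's current pose estimate.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Observation:
    label: str          # detected object class, e.g. "cup"
    x: float            # map-frame position in metres (from the pose estimate)
    y: float
    confidence: float

class SemanticMap:
    """Remembers the most confident sighting of each object class."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, Observation] = {}

    def add(self, obs: Observation) -> None:
        prev = self._last_seen.get(obs.label)
        if prev is None or obs.confidence >= prev.confidence:
            self._last_seen[obs.label] = obs

    def locate(self, label: str) -> Optional[Tuple[float, float]]:
        obs = self._last_seen.get(label)
        return (obs.x, obs.y) if obs else None

# Usage: fuse one detector hit with the current pose, then query it later.
semantic_map = SemanticMap()
semantic_map.add(Observation("cup", x=2.4, y=1.1, confidence=0.83))
print(semantic_map.locate("cup"))   # -> (2.4, 1.1)
```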
In the meantime, the convergence of computer vision (CV) and natural language processing (NLP) is also changing how robots and humans communicate. The integration of CV and NLP enables robots not only to "see" but also to "describe" and "explain" what they see, as noted by Wiriyathammabhum et al. \cite{2}. This is crucial for applications such as visual question answering and scene captioning. The capacity to incorporate contextual semantics, such as object relationships and attributes, into robotic systems is one of the main factors enabling more natural human-robot interaction.
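A minimal sketch of this "see and describe" step, assuming a simple template-based approach: detector output is turned into a short Bangla sentence that could then be handed to the robot's Bangla speech-synthesis module. The label-to-Bangla dictionary and the sentence template are illustrative only.

```python
# Hypothetical sketch: template-based conversion of detections into a Bangla
# sentence for the speech-synthesis module. Labels and templates are examples.
BANGLA_LABELS = {
    "person": "মানুষ",   # person
    "car": "গাড়ি",       # car
    "chair": "চেয়ার",    # chair
    "dog": "কুকুর",       # dog
}

def describe_scene_bn(detections, frame_width):
    """detections: iterable of (label, confidence, (x1, y1, x2, y2)) tuples."""
    phrases = []
    for label, _conf, (x1, _y1, x2, _y2) in detections:
        name = BANGLA_LABELS.get(label, label)      # fall back to the raw label
        centre = (x1 + x2) / 2
        if centre < frame_width / 3:
            side = "বাম দিকে"        # on the left
        elif centre > 2 * frame_width / 3:
            side = "ডান দিকে"        # on the right
        else:
            side = "সামনে"           # straight ahead
        phrases.append(f"{side} একটি {name}")
    if not phrases:
        return "সামনে কোনো বাধা নেই"                 # "no obstacle ahead"
    return ", ".join(phrases) + " আছে"               # "... is there"

# Example: one detected person in the left third of a 640-pixel-wide frame.
print(describe_scene_bn([("person", 0.91, (40, 60, 120, 300))], frame_width=640))
```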
Extending this notion, Saini and Joseph \cite{5} examined how NLP can be leveraged in robotic systems for emotional and contextual understanding. Their work highlights the potential of NLP techniques such as sentiment analysis, stemming, and TF-IDF in training robots to interpret human speech beyond basic commands. For visually impaired users, this means a robot could not only guide them through a hallway but also respond empathetically to voice tone and phrasing. The authors emphasize that machine learning allows robots to learn from user interactions, reducing the need for continuous reprogramming and improving adaptability over time.
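As a hedged example of how TF-IDF could support command understanding in such a system, the toy classifier below maps short spoken phrases to intents with scikit-learn. The training phrases, intent labels, and model choice are assumptions rather than the project's actual implementation.

```python
# Toy sketch of a TF-IDF intent classifier for spoken commands. Training phrases
# are shown in English for readability; a deployed system would use Bangla ASR
# transcripts. Data, labels, and model choice are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_phrases = [
    "take me to the door", "guide me to the exit",      # intent: navigate
    "what is in front of me", "describe the scene",     # intent: describe
    "stop", "please stop moving",                       # intent: stop
]
train_intents = ["navigate", "navigate", "describe", "describe", "stop", "stop"]

# Character n-grams tolerate spelling and speech-recognition variation better
# than whole-word tokens alone.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_phrases, train_intents)

print(clf.predict(["could you describe what's ahead"]))   # -> ['describe']
```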
Mazzei et al.'s bibliometric review of NLP in assistive social robotics revealed a discernible shift toward "soft" human-robot interaction, where language and emotional engagement are valued more highly than physical responses \cite{3}. Their results reinforce the concern that existing systems, designed primarily for English-speaking users, are less usable in linguistically diverse regions. In Bangladesh especially, the absence of Bangla-language NLP support restricts both the adoption and the usefulness of such assistive technologies.
Taken together, although considerable progress has been made, these technologies remain far from accessible, adaptable, and practical in non-Western, low-resource settings. To meet the needs of visually impaired users in Bangladesh and comparable contexts, we propose a low-cost, autonomous robotic assistant that combines lightweight object detection with Bangla voice-command navigation.