VisualNav: Visually Grounded Natural Language Crawler Robot Navigation
Authors: Lakmina P. Gamage, Haritha Weerathunga, Vishwani Geeganage, Chinthaka Premachandra, B. H. Sudantha
Abstract
This paper compares two vision-language-action architectures for natural-language crawler robot navigation. It evaluates an end-to-end vision-language model against a modular system that separates intent extraction, open-vocabulary visual grounding, and control. The modular approach demonstrates stronger zero-shot generalization and lower inference latency, offering a practical route for embodied AI on resource-constrained edge devices.
- Robot Navigation
- Robotics
- Vision Language Models
- Vision-Language-Action Models
- Visual Grounding
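
To make the modular design in the abstract concrete, the following is a minimal sketch of a three-stage pipeline (intent extraction, visual grounding, control). It is an illustration under stated assumptions, not the authors' implementation: every name here (`Intent`, `Detection`, `Command`, `extract_intent`, `ground_target`, `control`, `navigate_step`) is hypothetical, the keyword rule stands in for a language model, and exact label matching stands in for an open-vocabulary detector. The abstract does not specify which models or controller the paper uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Intent:
    action: str  # e.g. "go_to" or "stop"
    target: str  # free-text object phrase, e.g. "red chair"


@dataclass
class Detection:
    label: str
    cx: float     # normalized horizontal box center in [0, 1]
    score: float  # detector confidence


@dataclass
class Command:
    linear: float   # forward speed
    angular: float  # turn rate, positive means turn left


def extract_intent(instruction: str) -> Intent:
    """Stage 1 (assumed stand-in): a keyword rule in place of a language model."""
    text = instruction.lower().strip().rstrip(".")
    for prefix in ("go to the ", "go to ", "move to the ", "move to "):
        if text.startswith(prefix):
            return Intent("go_to", text[len(prefix):])
    return Intent("stop", "")


def ground_target(target: str, detections: List[Detection]) -> Optional[Detection]:
    """Stage 2 (assumed stand-in): exact label match in place of open-vocabulary grounding."""
    matches = [d for d in detections if d.label == target]
    return max(matches, key=lambda d: d.score) if matches else None


def control(intent: Intent, hit: Optional[Detection], gain: float = 1.0) -> Command:
    """Stage 3 (assumed): proportional steering that centers the target in the image."""
    if intent.action != "go_to" or hit is None:
        return Command(0.0, 0.0)
    error = 0.5 - hit.cx  # target left of image center gives a positive error
    return Command(linear=0.2, angular=gain * error)


def navigate_step(instruction: str, detections: List[Detection]) -> Command:
    """Run the three stages in sequence for one camera frame."""
    intent = extract_intent(instruction)
    hit = ground_target(intent.target, detections)
    return control(intent, hit)


if __name__ == "__main__":
    frame = [Detection("red chair", 0.25, 0.9), Detection("table", 0.7, 0.8)]
    print(navigate_step("Go to the red chair", frame))
```

Because each stage only exchanges small typed records, any one of them could be swapped (for example, a different grounding model) without touching the others, which is the kind of separation the abstract attributes to the modular approach.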