This Ph.D. thesis presents the novel design, optimal tuning, and parallel real-time implementation of Tightly-Coupled (TC) Visual-Inertial Navigation (VIN) systems integrated with the Global Navigation Satellite System (GNSS) for autonomous vehicle applications. Although the cost and size efficiency of Visual-Inertial-GNSS Navigation (VIGN) systems offers great commercialization potential, achieving high accuracy, robustness, and real-time performance on embedded computing platforms remains challenging. Accuracy and robustness in VIGN systems rely heavily on two factors: first, a TC fusion scheme that exploits the deep inter-modality correlations in the sensory data, and second, a well-tuned fusion model that adequately characterizes the actual behaviour of the VIGN system in practice. However, fusing multi-modal sensory data in a TC fashion, as in the VIGN system, inevitably imposes a high computational burden due to the large state space and the large volume of visual processing, making it difficult to satisfy strict real-time constraints on embedded computing platforms. To address these challenges, this thesis proposes novel design and optimization approaches that result in three main contributions. The first contribution develops an enhanced VIN system based on the Multi-State Constraint Kalman Filter (MSCKF) and proposes a novel systematic design and automatic tuning framework that adjusts the system's design parameters for optimal state estimation using an evolutionary technique based on the Particle Swarm Optimization (PSO) algorithm. The second contribution goes a step further and reformulates the combined problem of designing and tuning a VIGN system as a single end-to-end learning problem. A novel Deep Convolutional Recurrent Neural Network (DCRNN) architecture is proposed to automatically train a model-free and calibration-free VIGN system.
Finally, this thesis takes the VIN system developed in the first contribution and proposes a parallel, real-time implementation on embedded computing hardware. To satisfy strict real-time constraints, the computationally burdensome modules of the developed VIN pipeline are parallelized and distributed across the multi-core and many-core architectures of an embedded CPU-GPU-enabled computing platform. The accuracy, efficacy, and real-time performance of the proposed solutions are evaluated on real datasets, supported by a hardware-in-the-loop simulation setup.
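As a rough illustration of the PSO-based tuning idea behind the first contribution, the sketch below tunes the noise parameters of a filter by minimizing its estimation error against ground truth. It is a toy example under stated assumptions, not the thesis's framework: a scalar random-walk Kalman filter stands in for the MSCKF, and the function names (`pso_minimize`, `make_filter_cost`), the swarm hyperparameters, and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def pso_minimize(cost, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c_cog=1.5, c_soc=1.5, seed=0):
    """Minimal global-best PSO over a box-constrained search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # per-particle best positions
    best_cost = [cost(p) for p in pos]        # per-particle best costs
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cog * r1 * (best_pos[i][d] - pos[i][d])
                             + c_soc * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, g_cost


def make_filter_cost(n_steps=200, true_q=0.01, true_r=0.25, seed=1):
    """Return a cost function: RMSE of a scalar Kalman filter given log10(q), log10(r)."""
    rng = random.Random(seed)
    truth, meas, x = [], [], 0.0
    for _ in range(n_steps):                  # simulate a noisy random walk
        x += rng.gauss(0.0, math.sqrt(true_q))
        truth.append(x)
        meas.append(x + rng.gauss(0.0, math.sqrt(true_r)))

    def cost(params):
        q, r = 10.0 ** params[0], 10.0 ** params[1]
        est, var, sq_err = 0.0, 1.0, 0.0
        for z, x_true in zip(meas, truth):
            var += q                          # predict: inflate the variance
            gain = var / (var + r)            # update: Kalman gain
            est += gain * (z - est)
            var *= 1.0 - gain
            sq_err += (est - x_true) ** 2
        return math.sqrt(sq_err / len(truth))

    return cost


# Search log10(q) and log10(r) over [-4, 1] for the lowest RMSE.
cost = make_filter_cost()
bounds = [(-4.0, 1.0), (-4.0, 1.0)]
best, best_rmse = pso_minimize(cost, bounds)
```

Searching in log space keeps the swarm well-conditioned when the noise parameters span several orders of magnitude. A real tuning setup would replace the toy cost with the navigation error of the full filter on recorded trajectories.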