Research


Medical image segmentation

This research proposal aims to explore distributed learning in healthcare by comparing the performance and security of Federated Learning (FL), Split Learning (SL), and SplitFed Learning (SFL) on 3D liver CT volumes from the LiTS (Liver Tumor Segmentation) dataset. The study will apply these distributed learning techniques to segmentation models, particularly those based on the U-Net architecture, and evaluate them in terms of segmentation accuracy, privacy preservation, and communication efficiency. The expected outcomes include theoretical insights, methodological frameworks, and practical applications that could improve the management of patient data and extend the use of distributed learning in medical image analysis beyond liver segmentation.
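To make the federated side of the comparison concrete, the sketch below shows a weighted parameter-averaging step in the style of FedAvg, a standard FL aggregation rule. The proposal does not say which aggregation rule it will use, so FedAvg is an assumption here, as are the function name `fedavg` and the flat list-of-floats parameter layout that stands in for real U-Net tensors. Split Learning instead cuts the network at a layer and exchanges activations and gradients, while SplitFed additionally averages the client-side portions of the split model in a similar way.

```python
from typing import Dict, List

# Assumption: model parameters are a flat name -> list-of-floats mapping,
# a simplification of the tensors a real U-Net would hold.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Weighted average of client parameters (FedAvg-style aggregation).

    Each client's weight is proportional to its number of local training
    samples, e.g. the number of CT volumes held by one hospital.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    global_params: Params = {}
    for name in client_params[0]:
        averaged = [0.0] * len(client_params[0][name])
        for params, n in zip(client_params, client_sizes):
            weight = n / total
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        global_params[name] = averaged
    return global_params


# Usage: two hypothetical hospitals with 10 and 30 local volumes.
hospital_a = {"conv1": [1.0, 2.0]}
hospital_b = {"conv1": [3.0, 4.0]}
print(fedavg([hospital_a, hospital_b], client_sizes=[10, 30]))  # {'conv1': [2.5, 3.5]}
```

Weighting by sample count means the hospital contributing 30 volumes pulls the global model three times as strongly as the one contributing 10.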

Speech emotion recognition

This research proposal outlines a study that aims to improve Speech Emotion Recognition (SER) by integrating multi-feature fusion with a Graph-LSTM architecture. The researchers plan to combine diverse acoustic and linguistic features into a comprehensive representation of emotional expression in speech, using Graph Neural Networks (GNNs) to model the relationships between emotional cues and Long Short-Term Memory (LSTM) networks to capture temporal dependencies. The expected outcomes are improved accuracy and robustness in SER, better model interpretability, and a richer representation of the emotional cues present in speech.
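As a minimal illustration of multi-feature fusion, the sketch below standardizes each modality's features across utterances and concatenates them per utterance (early, feature-level fusion). The proposal does not specify its fusion scheme, so this choice, the function names `standardize` and `early_fusion`, and the idea that the fused vectors would then serve as inputs to the Graph-LSTM are all assumptions.

```python
from statistics import mean, pstdev
from typing import List

Matrix = List[List[float]]  # rows = utterances, columns = features


def standardize(features: Matrix) -> Matrix:
    """Z-score each feature column so modalities on different scales are comparable."""
    stats = []
    for column in zip(*features):
        mu = mean(column)
        sigma = pstdev(column)
        stats.append((mu, sigma if sigma > 0 else 1.0))  # guard constant columns
    return [[(x - mu) / sigma for x, (mu, sigma) in zip(row, stats)] for row in features]


def early_fusion(acoustic: Matrix, linguistic: Matrix) -> Matrix:
    """Concatenate standardized acoustic and linguistic features per utterance.

    Assumption: early (feature-level) fusion; the proposal's actual scheme may differ.
    """
    if len(acoustic) != len(linguistic):
        raise ValueError("both modalities must describe the same utterances")
    a, lin = standardize(acoustic), standardize(linguistic)
    return [row_a + row_l for row_a, row_l in zip(a, lin)]
```

Standardizing before concatenation keeps a large-valued feature such as pitch in Hz from dominating a small-valued one such as a sentiment score.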

Handwritten mathematical expression recognition

This research proposal outlines a study on Handwritten Mathematical Expression (HME) recognition that aims to improve structural analysis and, through it, recognition accuracy. The project addresses the challenge of understanding the hierarchical and spatial relationships between mathematical symbols using deep learning techniques. Two main approaches are proposed: extracting features directly from the image, and a two-step process that first detects symbols and then analyzes the spatial relationships between them. The study aims to develop a robust structural analysis model that outperforms current baselines in recognizing complex and nested mathematical structures across various handwriting styles.
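The second step of the two-step approach needs some rule or model that labels how one detected symbol relates to another. The heuristic sketch below classifies the relation between two bounding boxes as above, below, inside, superscript, subscript, right, or left. The `Box` type, the `spatial_relation` function, and the 0.25 height threshold are hypothetical; a learned relation classifier would normally replace hand-set thresholds like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    @property
    def center_y(self) -> float:
        return (self.y_min + self.y_max) / 2


def spatial_relation(parent: Box, child: Box, ratio: float = 0.25) -> str:
    """Label where `child` sits relative to `parent` (hypothetical heuristic)."""
    overlaps_horizontally = child.x_min < parent.x_max and child.x_max > parent.x_min
    if overlaps_horizontally:
        if child.y_max <= parent.y_min:
            return "above"   # e.g. numerator over a fraction bar
        if child.y_min >= parent.y_max:
            return "below"   # e.g. denominator under a fraction bar
        return "inside"      # e.g. radicand under a square-root sign
    if child.x_min >= parent.x_max:
        # Compare vertical centers, scaled by the parent's height.
        offset = child.center_y - parent.center_y
        if offset < -ratio * parent.height:
            return "superscript"
        if offset > ratio * parent.height:
            return "subscript"
        return "right"
    return "left"
```

For example, with a base symbol x at Box(0, 10, 10, 30), a smaller box at Box(11, 2, 16, 12) is labeled a superscript because its center lies well above the base's center. Nested structures can then be built by applying such a relation step recursively to the detected symbols.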

Vietnamese traffic sign recognition

This study presents an approach to Vietnamese traffic sign recognition designed for IoT devices, using the YOLOv8 Nano model to achieve real-time performance. The researchers introduce a new dataset, VTSDB100, which contains 100 classes of traffic signs captured at diverse locations in Ho Chi Minh City, Vietnam, and evaluate several object detection methods on it, including YOLOv8, YOLOv9, YOLOv10, YOLOX, RetinaNet, and Faster R-CNN. The paper also describes techniques for deploying deep learning models on resource-constrained devices, such as TensorRT optimization and quantization, to speed up inference. Finally, the authors propose a workflow for a real-time traffic sign recognition system on the NVIDIA Jetson Nano 2GB that combines object detection, tracking, and a class filter algorithm to minimize false predictions.
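The class filter is not described in detail here. One plausible sketch, assuming the tracker assigns a stable ID to each sign across frames, is a per-track majority vote over the last few frames that confirms a label only once it has been predicted consistently. The class `TrackClassFilter`, its `window` and `min_votes` parameters, and the example labels are hypothetical and are not taken from VTSDB100.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, Optional


class TrackClassFilter:
    """Hypothetical class filter: confirm a track's label only after consistent votes.

    For each tracker ID, keep the last `window` per-frame class predictions and
    report the majority class once it occurs at least `min_votes` times.
    """

    def __init__(self, window: int = 5, min_votes: int = 3) -> None:
        if min_votes > window:
            raise ValueError("min_votes cannot exceed window")
        self.min_votes = min_votes
        # A bounded deque per track drops the oldest prediction automatically.
        self.history: Dict[int, Deque[str]] = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id: int, predicted_class: str) -> Optional[str]:
        """Add one frame's prediction; return the confirmed label or None."""
        votes = self.history[track_id]
        votes.append(predicted_class)
        label, count = Counter(votes).most_common(1)[0]
        return label if count >= self.min_votes else None

    def drop(self, track_id: int) -> None:
        """Forget a track once the tracker reports it as lost."""
        self.history.pop(track_id, None)
```

A single stray prediction among several consistent ones never reaches the output, at the cost of a confirmation delay of at least `min_votes` frames per new track.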