UROP Proceeding 2023-24

School of Engineering
Department of Computer Science and Engineering

Video Analytics and IoT People/Asset Sensing for Smart City Applications

Supervisor: CHAN Gary Shueng Han / CSE
Student: CHOI Cheuk Woon / COGBM
Course: UROP 1100, Spring

This project aims to provide a smart-city application that computes a similarity score between a professional executing a specific pose, such as shooting a basketball, and a real-time video of a person performing the same action. The user then obtains a real-time evaluation score reflecting how closely their pose matches the professional's. The evaluation relies on a machine learning model that extracts skeleton keypoints, from which the similarity score is computed. This report presents the research conducted to identify suitable models for the task, focusing on video analytics and pose estimation in the context of sports and exercise. The primary objectives are to identify candidate models for the skeleton extraction task, to select the most appropriate model through testing, and to explore methods for generating similarity scores. Prior to testing, several models, including UniPose, YOLONAS-POSE and AlphaPose, were analyzed in depth to gain insight into state-of-the-art models and approaches. Three models, UniPose, PCT and Yolo-v8-pose, were then tested. UniPose raised problems during compilation because of inadequate documentation and faulty code; these were resolved by modifying the code, but the results it produced were unsatisfactory. The PCT model, after code enhancements that enabled multi-frame processing, yielded satisfactory results. Ultimately, Yolo-v8-pose was chosen for its efficient processing time, user-friendly library and built-in support for multi-frame processing without code modification. Furthermore, various methods for similarity scoring were assessed; these fall into two categories, static single-image scoring and multi-frame scoring.
Euclidean distance and the scale-invariant feature transform (SIFT) were tested for computing similarity scores on single images. After testing, Euclidean distance was selected and combined with Procrustes analysis, which standardizes and normalizes the images for more accurate scoring. Dynamic Time Warping (DTW) was adopted for comparing similarity across multiple frames and sequences of keypoints, as it is a standard technique for aligning sequences that differ in length or speed. Overall, this study covers the models examined, training methodologies, model testing, data transformation techniques, and the principles underlying the different similarity scoring methods. These insights may suggest directions for video analytics applications in the specialized domain of sports analysis.
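
The two scoring stages described above can be illustrated with a minimal sketch. It is not the project's implementation: the function names (procrustes_align, pose_distance, dtw_distance) are hypothetical, each pose is assumed to be a (K, 2) NumPy array of 2D keypoints, and the alignment is the textbook orthogonal Procrustes solution, which also permits reflections. The sketch returns distances (lower means more similar); how a distance is mapped to a user-facing score is left open.

```python
import numpy as np


def procrustes_align(reference, pose):
    """Standardize two (K, 2) keypoint arrays and rotate `pose` onto `reference`.

    Translation is removed by centering, scale by dividing by the Frobenius
    norm, and rotation by the orthogonal Procrustes solution.
    Assumption: neither pose is degenerate (all keypoints at one location).
    """
    ref = np.asarray(reference, dtype=float)
    pos = np.asarray(pose, dtype=float)
    ref = ref - ref.mean(axis=0)
    pos = pos - pos.mean(axis=0)
    ref = ref / np.linalg.norm(ref)
    pos = pos / np.linalg.norm(pos)
    # The orthogonal matrix R minimizing ||pos @ R - ref|| is U @ Vt,
    # where U, S, Vt is the SVD of pos.T @ ref. R may include a reflection.
    u, _, vt = np.linalg.svd(pos.T @ ref)
    return ref, pos @ (u @ vt)


def pose_distance(reference, pose):
    """Mean per-keypoint Euclidean distance after Procrustes alignment."""
    ref, aligned = procrustes_align(reference, pose)
    return float(np.linalg.norm(ref - aligned, axis=1).mean())


def dtw_distance(seq_a, seq_b, frame_cost=pose_distance):
    """Classic dynamic time warping cost between two sequences of poses."""
    n, m = len(seq_a), len(seq_b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_cost(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


# Illustrative data only: a square "pose" and a rotated, scaled, shifted copy.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
rect = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
moved = 3.0 * square @ rot.T + np.array([5.0, -2.0])

same_shape = pose_distance(square, moved)      # close to zero
different_shape = pose_distance(square, rect)  # clearly above zero
slow_replay = dtw_distance([square, rect, square],
                           [square, square, rect, rect, square])
```

In this sketch the rotated and rescaled copy scores as nearly identical to the original, and the slower replay of the same pose sequence accumulates almost no DTW cost, which is the behavior the normalization and warping steps are meant to provide.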
