UROP Proceedings 2024-25

School of Science
Department of Mathematics

Low-Rank and Sparsity Reconstruction in Data Science
Supervisor: CAI Jianfeng / MATH
Co-Supervisor: YE Guibo / MATH
Student: JIN Zhibo / MATH-IRE
Course: UROP 1100, Fall; UROP 2100, Spring

This report investigates the optimization landscape of generalized phase retrieval problems, focusing on signal recovery from magnitude-only measurements. We examine non-convex optimization formulations, including the amplitude flow model and matrix factorization approaches, in which the problem is reformulated as recovering a rank-one matrix from linear measurements. Key considerations include the convergence behavior of gradient descent algorithms under spectral initialization, the role of sample complexity in ensuring stable recovery, and the application of concentration techniques to control gradient and Hessian properties. We will see that previous works have established convergence guarantees for gradient descent algorithms with a sample complexity of m ≳ nr, where r is the rank of the solution and n is the dimension; under similar Gaussian assumptions, the landscape is benign with a sample complexity of m ≳ nr log n. Rank relaxation has also been considered in previous work, which shows that increasing the rank does improve the landscape of the optimization problem embedded on the Stiefel manifold. (A minimal numerical sketch of the amplitude flow formulation is given after these abstracts.)

Low-Rank and Sparsity Reconstruction in Data Science
Supervisor: CAI Jianfeng / MATH
Co-Supervisor: YE Guibo / MATH
Student: LIU Yunxin / DSCT
Course: UROP 1100, Spring

Sparsity enhances AI model efficiency by activating only a small fraction of parameters during computation. This report examines sparsity in Mixture-of-Experts (MoE) models, which route inputs to a chosen subset of experts and thereby reduce FLOPs without losing capacity. Empirical scaling laws show that higher sparsity improves training efficiency, particularly for large language models, by optimizing the trade-off between FLOPs and active parameters. The main results show that sparse models reach lower pretraining loss for a fixed compute budget, and that the optimal level of sparsity increases with model size. However, challenges such as memory overhead remain, which calls for better hardware support and further research on parameter allocation. Sparsity also aligns with biological neural processing, suggesting broader implications for AI architecture. As AI language models grow, sparsity will be important for balancing performance and efficiency. Future work should explore sparsity in diverse architectures and optimize memory usage. This study highlights the significance of sparsity for building AI systems that are both effective and practical to deploy. (A toy sketch of top-k expert routing is given after these abstracts.)
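As an illustration of the amplitude flow formulation described in the first abstract, the following Python sketch recovers a real signal from magnitude-only Gaussian measurements via spectral initialization followed by gradient descent. It is a minimal sketch, not the report's implementation: the dimensions, step size, and iteration count are illustrative assumptions.

import numpy as np

# Problem setup: recover x from magnitude-only measurements b_i = |a_i^T x|.
rng = np.random.default_rng(0)
n, m = 50, 400                        # illustrative dimension and measurement count
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))       # rows are Gaussian sensing vectors a_i
b = np.abs(A @ x_true)

# Spectral initialization: leading eigenvector of (1/m) * sum_i b_i^2 a_i a_i^T,
# rescaled to the estimated signal norm.
Y = (A.T * b**2) @ A / m
eigvals, eigvecs = np.linalg.eigh(Y)
z = eigvecs[:, -1] * np.sqrt(np.mean(b**2))

# Amplitude flow: gradient descent on f(z) = (1/2m) * sum_i (|a_i^T z| - b_i)^2.
step = 0.5
for _ in range(500):
    Az = A @ z
    z -= step * (A.T @ (Az - b * np.sign(Az))) / m

# Magnitudes determine x only up to a global sign, so compare against +/- x_true.
err = min(np.linalg.norm(z - x_true), np.linalg.norm(z + x_true)) / np.linalg.norm(x_true)
print(f"relative recovery error: {err:.2e}")

With enough Gaussian measurements the spectral initializer lands in a region where the non-convex amplitude flow loss behaves well, which is the regime that the sample-complexity guarantees above describe.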

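To make the Mixture-of-Experts routing in the second abstract concrete, here is a toy Python sketch of a top-k gate: each token activates only top_k of num_experts expert networks, so per-token compute scales with k rather than with the total number of experts. The layer sizes and random weights are hypothetical stand-ins, not the models studied in the report.

import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative assumptions).
d_model, d_hidden = 64, 256
num_experts, top_k = 8, 2             # each token activates only 2 of 8 experts

# Router and expert weights (random stand-ins for trained parameters).
W_router = rng.standard_normal((d_model, num_experts)) * 0.02
experts = [(rng.standard_normal((d_model, d_hidden)) * 0.02,
            rng.standard_normal((d_hidden, d_model)) * 0.02)
           for _ in range(num_experts)]

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ W_router                               # (tokens, num_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]   # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top_idx[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                            # softmax over the k chosen experts
        for gate, e in zip(gates, top_idx[t]):
            W_in, W_out = experts[e]
            out[t] += gate * (np.maximum(x[t] @ W_in, 0.0) @ W_out)
    return out

tokens = rng.standard_normal((4, d_model))              # a small batch of token vectors
print(moe_layer(tokens).shape)                          # (4, 64)

A dense layer of comparable capacity would push every token through all num_experts expert matrices; the gate keeps the total parameter count (and capacity) high while the active parameters per token stay small, which is the FLOPs-versus-parameters trade-off the abstract refers to.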