School of Engineering
Department of Computer Science and Engineering

Trustworthy Machine Learning
Supervisor: CHENG Minhao / CSE
Student: KWOK Kin Wai / COMP
Course: UROP 1100, Fall

Trustworthy machine learning is a field of study that aims to examine and improve the security, fairness, and integrity of machine learning systems, rather than focusing solely on accuracy. Among the research areas within this topic, this report focuses on test-time integrity, which is currently a major concern because adversarial examples can effectively mislead models; attacks that exploit such examples are called adversarial attacks. The report first introduces trustworthy machine learning and some related previous research. Different attack techniques and algorithms are then analysed and compared, and their effectiveness is summarized at the end of the report.

Trustworthy Machine Learning
Supervisor: CHENG Minhao / CSE
Student: LI Yuxin / COMP
Course: UROP 1100, Fall

This report summarizes the basic knowledge needed to construct a linear logistic regression model. It consists of three parts: mathematical foundations, algorithmic foundations, and PyTorch coding. The mathematical foundations are the prerequisite for implementing the model; they start with basic rules and operations in calculus and then introduce concepts from linear algebra and probability. The algorithmic part builds on these mathematical foundations: the linear logistic regression model and other related models are basic and important algorithms in statistics. Finally, PyTorch puts machine learning models into practice. The report is a summary of what the author learnt to get started in machine learning and outlines a general path for beginners in the field.

Trustworthy Machine Learning
Supervisor: CHENG Minhao / CSE
Student: WONG Erastus / COMP
Course: UROP 1100, Fall

The safety of neural networks has become an increasingly relevant concern in recent years due to their black-box nature. In this project, we investigate two critical AI safety issues: backdoor attacks and universal adversarial perturbations. We aim to understand their fundamental nature and to distinguish them despite their great similarity. We attempt to perform backdoor trigger inversion from a clean model and cross-validate the triggers on different poisoned and clean models. We show that we may obtain universal adversarial perturbations instead of backdoor triggers when backdoor detection methods are used to regenerate trigger patterns. We also show the transferability of inverted triggers across different models. However, the mechanisms of both attacks still need to be explored further to understand the surprising results of the project.
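To make the notion of an adversarial example from the first abstract concrete, the sketch below crafts one with the fast gradient sign method (FGSM) in PyTorch. FGSM is used here only as a simple, well-known illustration of a test-time attack and is not necessarily one of the algorithms compared in that report; the model, input, label, and epsilon budget are assumed placeholders.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=0.03):
        # Clone the input and track gradients with respect to it.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)   # loss on the true label
        loss.backward()                           # gradient of the loss w.r.t. the input
        # Perturb each pixel by +/- epsilon in the direction that increases the loss.
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()     # keep pixel values in a valid range

The returned image is visually close to the original but is optimized to change the classifier's prediction, which is exactly the test-time integrity threat the first report studies.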
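For the second abstract, a minimal sketch of a linear logistic regression model in PyTorch is shown below, following the usual "linear layer + sigmoid + binary cross-entropy" recipe. The feature dimension, learning rate, and synthetic data are illustrative assumptions rather than details taken from the report.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X = torch.randn(100, 2)                        # 100 samples, 2 features
    y = (X.sum(dim=1, keepdim=True) > 0).float()   # toy binary labels

    model = nn.Linear(2, 1)                        # linear part: w^T x + b
    criterion = nn.BCEWithLogitsLoss()             # sigmoid + binary cross-entropy
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(200):
        optimizer.zero_grad()
        loss = criterion(model(X), y)              # negative log-likelihood of the labels
        loss.backward()                            # gradients via autograd
        optimizer.step()                           # gradient descent update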
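For the third abstract, the sketch below illustrates optimization-based trigger inversion in the style of Neural Cleanse: it learns a mask and pattern such that stamping them onto any input pushes the classifier towards a chosen target label, with an L1 penalty keeping the mask small. The abstract does not name the exact detection method used, so the model, data loader, image shape, target class, and hyperparameters here are all assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def invert_trigger(model, loader, target_class, shape=(3, 32, 32),
                       steps=500, lr=0.1, l1_weight=0.01):
        mask_logit = torch.zeros(1, *shape[1:], requires_grad=True)   # per-pixel mask (pre-sigmoid)
        pattern = torch.rand(*shape, requires_grad=True)              # candidate trigger pattern
        opt = torch.optim.Adam([mask_logit, pattern], lr=lr)
        data_iter = iter(loader)
        for _ in range(steps):
            try:
                x, _ = next(data_iter)
            except StopIteration:
                data_iter = iter(loader)
                x, _ = next(data_iter)
            mask = torch.sigmoid(mask_logit)                          # keep mask values in [0, 1]
            x_trig = (1 - mask) * x + mask * pattern.clamp(0, 1)      # stamp trigger onto clean inputs
            y_target = torch.full((x.size(0),), target_class, dtype=torch.long)
            loss = F.cross_entropy(model(x_trig), y_target) + l1_weight * mask.abs().sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return torch.sigmoid(mask_logit).detach(), pattern.clamp(0, 1).detach()

Run against a clean model, this kind of optimization has no planted trigger to recover, which is consistent with the abstract's observation that the result may resemble a universal adversarial perturbation rather than a backdoor trigger.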
RkJQdWJsaXNoZXIy NDk5Njg=