Guarding Federated LLM Fine-tuning: Detecting Unsafe Clients That Break Alignment
This exhibit is a poster presentation on navigating the safety challenges of training large language models in federated settings. As foundation models become general-purpose tools, a key safety concern is the risk that they fulfill harmful requests from end users. Federated learning offers a paradigm in which multiple clients train together to produce a global model while preserving the privacy of each client's data. Visitors will learn how even a small fraction of malicious clients can compromise the safety of the overall federated training process, and why identifying such malicious behavior is difficult. We will then present our proposed solution for identifying malicious clients and mitigating these safety issues.
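To make the threat concrete, here is a minimal sketch of federated averaging (FedAvg) with a simple illustrative screen for anomalous client updates. The screening rule shown (dropping clients whose update direction disagrees with the coordinate-wise median update) is an assumption for illustration only, not the detection method presented at the exhibit; all function names and thresholds are hypothetical.

```python
import numpy as np

def fedavg_with_screen(global_weights, client_weights, sim_threshold=0.0):
    """Average client updates, dropping clients whose update direction
    has low cosine similarity to the median update.

    Illustrative sketch only: the real poster's detection method may
    differ entirely.
    """
    # Each client's update is its proposed weights minus the global weights.
    updates = [w - global_weights for w in client_weights]
    # Coordinate-wise median is a robust reference direction.
    median_update = np.median(updates, axis=0)

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    kept = [u for u in updates if cos(u, median_update) >= sim_threshold]
    if not kept:
        return global_weights  # nothing trustworthy this round
    return global_weights + np.mean(kept, axis=0)

# Toy round: three benign clients nudge the first two weights upward;
# one "malicious" client pushes hard in the opposite direction.
w = np.zeros(4)
benign = [
    w + np.array([1.0, 1.0, 0.0, 0.0])
    + 0.1 * np.random.default_rng(i).normal(size=4)
    for i in range(3)
]
malicious = [w - np.array([10.0, 10.0, 0.0, 0.0])]
new_w = fedavg_with_screen(w, benign + malicious)
```

Without the screen, the single malicious client would drag the averaged weights far off course; with it, the aggregated model stays close to the benign consensus.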
Exhibitor
Prashant Shrestha
Advisor(s)
Xumin Liu, Qi Yu
Organization
Mining Lab, GCCIS