This project provides code and results to reproduce research on how refusal in language models is...

Tokens: 2,264
Snippets: 4
Trust Score: 8.4
License: Apache-2.0
Updated: 1 month ago