Artificial intelligence companies have been moving at breakneck speed to create the best and most powerful tools, but that rapid development hasn't always been coupled with a clear understanding of AI's limitations or weaknesses. Today, Anthropic released a report on how attackers can influence the development of a large language model.
The study centered on a type of attack called poisoning, where an LLM is pretrained on malicious content intended to make it learn dangerous or unwanted behaviors. The key finding from this study is that a bad actor doesn't need to control a percentage of the pretraining materials to get the LLM poisoned. Instead, the researchers found that a small and fairly constant number of malicious documents can poison an LLM, regardless of the size of the model or its training materials. The study was able to successfully backdoor LLMs using only 250 malicious documents in the pretraining data set, a much smaller number than expected for models ranging from 600 million to 13 billion parameters.
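To see why a constant document count is surprising, it helps to compare 250 documents against the size of a typical pretraining corpus. The sketch below is illustrative only and is not from the report: the tokens-per-parameter ratio and average document length are hypothetical round numbers chosen for the example.

```python
# Illustrative sketch (not from Anthropic's report): how small a fixed
# batch of 250 poisoned documents is relative to pretraining corpora for
# models of different sizes. TOKENS_PER_PARAM and TOKENS_PER_DOC are
# assumed values, not figures from the study.

POISON_DOCS = 250
TOKENS_PER_DOC = 500      # assumed average document length in tokens
TOKENS_PER_PARAM = 20     # assumed corpus size relative to model size

def poison_fraction(params: int) -> float:
    """Fraction of the pretraining corpus that 250 documents represent."""
    corpus_tokens = params * TOKENS_PER_PARAM
    return (POISON_DOCS * TOKENS_PER_DOC) / corpus_tokens

for params in (600_000_000, 13_000_000_000):
    print(f"{params/1e9:.1f}B params -> {poison_fraction(params):.2e} of corpus")
```

Under these assumptions the poisoned documents make up well under a hundred-thousandth of the corpus for the larger model, which is why an attack that needs only a constant count, rather than a constant percentage, is far more practical for an attacker.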
"We're sharing these findings to show that data-poisoning attacks might be more practical than believed, and to encourage further research on data poisoning and potential defenses against it," the company said. Anthropic collaborated with the UK AI Security Institute and the Alan Turing Institute on the research.