Red-team or Blue-team
A post I came across today on Mathstodon, by Terence Tao: we currently use/design LLMs as the blue-team.
Tao argued in his post that this should be the reverse:
- to maximise the strengths of both parties, AI (by which I mean LLMs) is currently better suited for red-team tasks than for acting directly as a generator (blue-team), since it is not always 100% reliable.
It is a minimax game; to reach the equilibrium, each party needs to choose:
- the best action for red-team is to attack the weakest link in blue-team’s output
- the best action for blue-team is to eliminate red-team’s strongest possible move
in other words:
- “the output of a red-team is only as strong as its strongest possible attack”
- “the output of a blue-team is only as strong as its weakest output”
or:
- “the red-team’s value is determined by its best (strongest) attack”
- “the blue-team’s value is determined by its worst (weakest) defence”
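The min/max logic above can be sketched with a toy payoff matrix (the numbers are hypothetical, purely to make the equilibrium reasoning concrete):

```python
# Toy payoff matrix: rows = blue-team defences, columns = red-team attacks.
# Entry [i][j] = damage red inflicts when blue plays defence i and red plays attack j.
# (Hypothetical numbers, just to illustrate the minimax logic.)
payoff = [
    [3, 5, 1],  # defence 0
    [2, 2, 4],  # defence 1
    [6, 1, 2],  # defence 2
]

# Red's best response to each defence: its strongest possible attack (max damage).
red_best = [max(row) for row in payoff]   # [5, 4, 6]

# Blue's minimax choice: the defence whose worst case is least bad.
blue_value = min(red_best)                # 4
blue_choice = red_best.index(blue_value)  # defence 1

print(f"defence {blue_choice} caps red's best attack at {blue_value}")
# → defence 1 caps red's best attack at 4
```

Red's value is the max over its attacks; blue's value is the min over its worst cases, exactly the two quoted rules.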
Which one do you think is harder: playing as the red-team or the blue-team?
- to play as the red-team, you need strong prior knowledge and must give everything you’ve got at every split second.
- to play as the blue-team, you might only need to focus on your weaknesses.
LLMs are massively parallel compute programs. By design, they can attend to any nuance in their context or in their colossal training corpus.
- We humans are exposed to, at the very best, ~160 petabytes of sensory data over 30 years (Data and “tokens” a 30 year old human “trains” on), yet we cannot remember what we ate for dinner 39 days ago.
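As a rough sanity check on that figure (the ~160 PB is the linked post's estimate, not mine), the implied average sensory bandwidth over 30 years is:

```python
# Back-of-envelope: what sustained data rate does ~160 PB over 30 years imply?
seconds = 30 * 365.25 * 24 * 3600  # ~9.47e8 seconds in 30 years
total_bytes = 160e15               # 160 petabytes (estimate from the linked post)
rate = total_bytes / seconds       # average bytes per second

print(f"{rate / 1e6:.0f} MB/s")    # → 169 MB/s
```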
- If we place LLMs in the blue-team and a single human plays as the red-team, this is another way of saying “vibe-coding”.
Couldn’t relate more.
- as the red team, you only need a 0.1% breach rate.
- but as the blue team, you need 100% defensibility.
If LLMs as the blue-team currently offer at best 99.9% defensive strength (reliability), that 0.1% attack surface (unreliability) may compound across steps.
And you don’t want that to happen.
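A quick sketch of how that 0.1% per-step unreliability compounds, assuming independent steps that each succeed with probability 0.999:

```python
# If each step of a multi-step task succeeds with probability 0.999
# (the "99.9% reliable blue-team" from above), the chance the whole
# chain survives with zero breaches decays geometrically.
per_step = 0.999

for steps in (10, 100, 1000):
    survival = per_step ** steps
    print(f"{steps:4d} steps -> {survival:.1%} chance of zero breaches")
# 10 steps  -> ~99.0%
# 100 steps -> ~90.5%
# 1000 steps -> ~36.8%
```

By a thousand steps, the "99.9% reliable" defender has already lost roughly two times out of three.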