Controls
Click on grid cells to add/remove gold tiles (reward tiles). Observe how high-reward tiles can lure the agent away from the true goal.
Welcome to Confuse the Lost Bot, a demo on reinforcement learning (RL) and its attack vectors for the Secure and Trustworthy AI Systems course. An RL-driven agent tries to reach its goal tile, but by tweaking the controls you can lead it astray. The demo showcases two attack vectors against RL agents: reward hacking (gold tiles that lure the agent away from its goal) and adversarial observation manipulation (fake goals and noisy perceptions). Try out different variations, adjust the sliders, and observe how the agent's path changes under adversarial manipulation.
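To make the reward-hacking lure concrete, here is a minimal sketch (hypothetical, not the demo's actual code) of a greedy planner that picks the tile with the best reward-per-step payoff. A gold tile that pays more than the goal, net of travel cost, pulls the agent off course:

```python
# Hypothetical gridworld sketch: a greedy agent heads for the tile with
# the highest payoff (reward minus step cost), so a sufficiently rich
# gold tile lures it away from the true goal.

def best_tile(agent, tiles):
    """Pick the tile position with the highest reward-per-step payoff."""
    def payoff(pos_reward):
        (x, y), reward = pos_reward
        steps = abs(x - agent[0]) + abs(y - agent[1])  # Manhattan distance
        return reward - steps                           # each step costs 1
    return max(tiles.items(), key=payoff)[0]

agent = (0, 0)
tiles = {(4, 4): 10}            # true goal, reward 10
print(best_tile(agent, tiles))  # -> (4, 4): agent heads for the goal

tiles[(1, 0)] = 12              # nearby gold tile with a higher reward
print(best_tile(agent, tiles))  # -> (1, 0): gold tile lures the agent away
```

The goal never moved; only the reward landscape changed, which is exactly what clicking gold tiles onto the grid does.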
Impact: The agent exploits high-reward shortcuts and ignores the true objective.
Mitigation: Improve reward design, add constraints, and audit behavior.
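One way to improve reward design, sketched here under the assumption that side rewards (gold tiles) can simply be capped, is to clip any non-goal reward so the true goal's payoff always dominates the return:

```python
# Hypothetical mitigation sketch: cap auxiliary (gold-tile) rewards so
# the true goal's reward always dominates.

def shaped_reward(raw_reward, is_goal, cap=1.0):
    """Clip non-goal rewards to `cap`; pass the goal reward through unchanged."""
    if is_goal:
        return raw_reward
    return min(raw_reward, cap)

print(shaped_reward(12.0, is_goal=False))  # -> 1.0 (gold tile capped)
print(shaped_reward(10.0, is_goal=True))   # -> 10.0 (goal untouched)
```

Capping is the simplest constraint; auditing then means checking that no reachable combination of capped side rewards outweighs the goal.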
Impact: The agent follows incorrect or manipulated inputs.
Mitigation: Validate inputs, use redundancy, and train for robustness.
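Redundancy can be sketched as a majority vote over several observation channels, so a single spoofed channel (a fake goal) is outvoted. The channel setup here is an assumption for illustration, not the demo's actual mechanism:

```python
from collections import Counter

# Hypothetical redundancy sketch: accept a goal position only when a
# majority of observation channels agree, so one spoofed channel
# reporting a fake goal is outvoted.

def consensus_goal(observations):
    """Return the majority-reported goal position, or None if no majority."""
    pos, count = Counter(observations).most_common(1)[0]
    return pos if count > len(observations) / 2 else None

readings = [(4, 4), (4, 4), (0, 2)]  # one channel spoofed with a fake goal
print(consensus_goal(readings))       # -> (4, 4)
```

Returning None when no majority exists is itself a form of input validation: the agent can fall back to a safe behavior instead of trusting a disputed observation.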
Impact: Noisy observations cause instability and unpredictable behavior.
Mitigation: Train with noise, filter inputs, and use robust policies.
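Filtering inputs can be as simple as smoothing noisy readings before the policy sees them. A minimal sketch, assuming scalar position readings and an exponential moving average as the filter:

```python
# Hypothetical filtering sketch: damp noise spikes in observations with
# an exponential moving average before they reach the policy.

def ema_filter(readings, alpha=0.5):
    """Blend each reading into a running estimate; lower alpha smooths more."""
    estimate = readings[0]
    smoothed = [estimate]
    for r in readings[1:]:
        estimate = alpha * r + (1 - alpha) * estimate
        smoothed.append(estimate)
    return smoothed

noisy = [4.0, 6.0, 2.0, 5.0]
print(ema_filter(noisy))  # -> [4.0, 5.0, 3.5, 4.25]: spikes are damped
```

The trade-off is lag: heavier smoothing (smaller alpha) rejects more noise but makes the agent slower to react to genuine changes.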