Confuse the Lost Bot! A Reinforcement Learning Attack Demo

Welcome to Confuse the Lost Bot, a demo of reinforcement learning (RL) attack vectors built for the Secure and Trustworthy AI Systems course. An RL-driven bot agent tries to reach its goal tile, but by tweaking the controls you can lead it astray. Try out the variations below to see how adversarial manipulation can influence RL behavior. The demo showcases two attack vectors against RL agents: reward hacking (gold tiles that lure the agent away from its goal) and adversarial observation manipulation (fake goals and noisy perception). Click on tiles to add or remove reward tiles, adjust the sliders, and observe how the agent's path changes.

Controls

Fake Goal (Observation Attack): Introduces a false goal (blue outline), demonstrating adversarial observation manipulation: the agent moves toward the perceived goal instead of the true goal.
Noise (slider, default 0): Randomly perturbs the agent's perceived goal. Higher noise increases unpredictability, simulating adversarial observation errors.
Step: Advances the agent one step toward its current priority tile.
Auto-run: Moves the agent continuously according to its reward and perceived-goal calculations.
Reset: Returns the agent to the start, clears temporary state, and redraws the grid.

Click on grid cells to add/remove gold tiles (reward tiles). Observe how high-reward tiles can lure the agent away from the true goal.
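The controls above can be summarized by the agent's step rule. Below is a minimal sketch of that logic, not the actual demo code: the names `priority_tile` and `step`, and the greedy "nearest gold tile first, otherwise perceived goal" rule, are assumptions for illustration.

```python
# Assumed step logic: the agent greedily moves one cell toward its
# "priority" tile, which is the nearest gold (reward) tile if any exist,
# otherwise the perceived goal.

def manhattan(a, b):
    # Grid distance between two (x, y) cells.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def priority_tile(agent, reward_tiles, perceived_goal):
    """Nearest reward tile wins; otherwise head for the perceived goal."""
    if reward_tiles:
        return min(reward_tiles, key=lambda t: manhattan(agent, t))
    return perceived_goal

def step(agent, target):
    """Move one cell along the axis with the larger remaining distance."""
    x, y = agent
    dx, dy = target[0] - x, target[1] - y
    if abs(dx) >= abs(dy) and dx != 0:
        return (x + (1 if dx > 0 else -1), y)
    if dy != 0:
        return (x, y + (1 if dy > 0 else -1))
    return agent  # already on the target

agent, goal, gold = (0, 0), (4, 4), [(1, 0)]
agent = step(agent, priority_tile(agent, gold, goal))
print(agent)  # → (1, 0): the nearby gold tile pulls the agent off course
```

With no gold tiles on the grid, the same rule walks the agent straight to the perceived goal; planting a tile closer than the goal immediately changes the priority.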

Legend:
Agent
True Goal
Reward Tile
Perceived/Fake Goal
Priority Tile

Understanding the Attacks

Gold Tile Reward (Reward Hacking)

Impact: The agent exploits high-reward shortcuts and ignores the true objective.

Mitigation: Improve reward design, add constraints, and audit behavior.
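A quick worked example shows why the exploit is rational for the agent. All numbers here (discount factor, rewards, distances) are illustrative assumptions, not values from the demo.

```python
# Reward hacking in one comparison: a smaller reward that is closer can
# beat the true goal once future rewards are discounted.

GAMMA = 0.9          # discount factor (assumed)
GOAL_REWARD = 10.0   # reward for reaching the true goal (assumed)
GOLD_REWARD = 8.0    # reward planted on a nearby gold tile (assumed)

def discounted(reward, steps, gamma=GAMMA):
    """Return for collecting `reward` after `steps` moves."""
    return gamma ** steps * reward

true_goal_return = discounted(GOAL_REWARD, steps=8)  # goal is 8 steps away
gold_return = discounted(GOLD_REWARD, steps=2)       # gold is 2 steps away

print(f"true goal: {true_goal_return:.2f}, gold tile: {gold_return:.2f}")
# 0.9**2 * 8 = 6.48 beats 0.9**8 * 10 ≈ 4.30, so a return-maximizing
# agent "rationally" takes the planted shortcut.
```

This is why mitigations target the reward design itself: capping planted rewards or penalizing detours changes which option maximizes return.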

Fake Goal (Observation Manipulation)

Impact: The agent follows incorrect or manipulated inputs.

Mitigation: Validate inputs, use redundancy, and train for robustness.
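The attack and one mitigation can both be sketched in a few lines. This is a toy model under assumed names (`observe`, `validated_goal`); the point is only that the agent plans against whatever the observation channel reports, not against ground truth.

```python
# Observation manipulation: the agent never sees the true goal directly,
# only an observation that an attacker may rewrite.

def observe(true_goal, fake_goal=None):
    """Attacker-controlled channel: returns the fake goal if one is set."""
    return fake_goal if fake_goal is not None else true_goal

def validated_goal(readings, fallback):
    """Toy redundancy check: accept only if independent readings agree."""
    return readings[0] if len(set(readings)) == 1 else fallback

true_goal = (4, 4)
print(observe(true_goal))          # honest channel reports (4, 4)
print(observe(true_goal, (0, 4)))  # attacked channel reports (0, 4)

# Redundancy mitigation: a disagreement between sensors is treated as
# suspicious, and the agent falls back to a safe default.
print(validated_goal([(4, 4), (0, 4)], fallback=(4, 4)))
```

Redundancy is only one of the mitigations listed above; input validation and robustness training attack the same gap from other directions.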

Noise (Observation Perturbation)

Impact: Causes instability and unpredictable behavior.

Mitigation: Train with noise, filter inputs, and use robust policies.
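The Noise slider can be modeled as random jitter on the perceived goal. The sketch below is an assumed behavior (per-axis ±1 shift with probability equal to the slider value), not the demo's actual implementation.

```python
import random

# Toy model of the Noise slider: with probability `noise`, each coordinate
# of the perceived goal is shifted by one cell in a random direction.

def perturb(goal, noise, rng):
    """Jitter the perceived goal; noise is in [0, 1]."""
    x, y = goal
    if rng.random() < noise:
        x += rng.choice((-1, 1))
    if rng.random() < noise:
        y += rng.choice((-1, 1))
    return (x, y)

rng = random.Random(0)  # seeded for reproducibility
print(perturb((4, 4), noise=0.0, rng=rng))  # noise 0: always (4, 4)
print(perturb((4, 4), noise=1.0, rng=rng))  # noise 1: both axes jittered
```

Because each step re-samples the jitter, a high noise setting makes the agent's priority tile fluctuate between steps, which is exactly the instability described above; filtering (e.g. averaging recent observations) is one way to dampen it.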