Establishing The Framework's Structure
My current CSGO bot is a hand-crafted behavior tree. (Roughly) a behavior tree is a tree-structured FSM with behavior generation models at each leaf node. Currently, all the behavior generators are hand-crafted heuristics. Based on feedback from Reddit users, I have identified two leaf nodes that are the most problematic. The nodes are problematic because their behavior generation models produce the most obviously inhuman/unfun behavior.
The two nodes are Region-Scale Agent Navigation and Combat Crosshair Control. The navigation controller guides the bot's movement between large, labeled regions of the map like LongDoors and LongA. The crosshair controller moves the crosshair during the bot's one-to-five second engagements with an enemy. The controllers are problematic for the same reason: both are too general and fail to specialize to the context of the current game state. If I wanted to improve these models using a hand-crafted approach, I could create many situation-specific behavior generation heuristics. However, there are tons of situations, so I'd never cover all of them with heuristics.
In this post, I'll provide a framework for improving the behavior generators: (1) provide an intuition for the generators by defining important human behavior characteristics to emulate, (2) provide an example of pro behavior with those characteristics, (3) define the controller's game state input, (4) define the controller's behavioral output schema and frequency, and (5) define the metrics for evaluating the model's performance, player-specific style, and unpredictability.
Please Note: this blog post relies heavily on the terminology introduced in the discretization section of my v0.1 release post. Before proceeding, please review that prior post section if you aren't familiar with it.
Region-Scale Agent Navigation
The Region-Scale Agent Navigation behavior generator controls a player's movement between large regions of the map. The controller generates movement in response to the trajectories of a player's teammates and the possible locations of enemy players. There are three important characteristics of the resulting behavior:
- Dynamic Buffer Spacing - A player moves relative to their teammates who are on similar trajectories. The player must dynamically adjust the buffer of distance between themselves and their teammates based on the following conflicting, situation-specific constraints:
  - Don't Block - A player must follow their teammates with a large enough buffer to avoid impeding their teammates' trajectories. Each player is responsible for the pushing teammates (pushers) in front of them on similar trajectories. A player must stand far enough behind their pushers so that the pushers can move in any direction without bumping into the player. In CSGO, if a pusher runs into the player, the pusher's velocity will immediately drop to zero and the pusher will be unable to perform important navigation tasks, like running away from an enemy.
  - Don't Bait - A player must follow their teammates with a small enough buffer so they can support their teammates by shooting enemies when the enemies become visible. Each player is responsible for supporting the pusher in front of them and not baiting (allowing the pusher to die without support). In CSGO, you support a teammate by standing near them so you can see and shoot at an enemy that can see your teammate. Since enemies can hold very acute angles on far away wall corners, a player is unlikely to see their pusher's killer if they are far away from their pusher.
- Maximal Distance to Assigned Danger Areas - A danger area is any map nav area where an enemy may appear. When a player moves with teammates, each one is responsible for an angular range of danger areas. If an enemy appears in one of the danger areas, it is likely that the enemy will react quicker than the player since the enemy was sitting and waiting for the player. To maximize odds of survival so the player can return fire, the player maximizes distance to their assigned danger areas and moves perpendicularly to them so the player is a smaller and faster target.
- Path Randomization - An enemy can easily kill the player if the enemy can predict the player's trajectory instead of reacting to it. Players must vary their trajectories to prevent the enemies from predicting and slow the enemies' reaction times.
Pro Navigation Demonstrating Characteristics
In the above video, k0nfig plots a trajectory through de_dust2 in response to his teammates' positions and the enemies' possible positions. Different parts of the video demonstrate key characteristics of navigation behavior:
- Large Buffer Spacing to Enable Retreat - During the first 7 seconds of the video, k0nfig keeps a large buffer of space between himself and his pusher. The pusher has just entered a region where an enemy is likely to appear from a danger area, so k0nfig provides space for his teammate to retreat without being blocked.
- Small Buffer Spacing to Support - After the first 7 seconds of the video, k0nfig shrinks the buffer of space between himself and his pusher. k0nfig recognizes that his pusher has committed to fighting any enemies that appear without retreating to cover. k0nfig decreases the buffer to his pusher so that he can see any enemies that appear from his pusher's danger areas. This decreases the likelihood of a baiting incident.
- Maximizing Distance To Pit Danger Areas - Roughly at 22 seconds into the video, k0nfig becomes responsible for covering the Pit danger areas. He looks directly at Pit at 24 seconds into the video. He takes a trajectory to LongA that maximizes distance to Pit so that he is as small as possible if an enemy appears in the Pit danger areas.
- Path Randomization - At 27 seconds into the video, k0nfig has determined that Pit is clear and now begins moving randomly. This makes him harder to hit if an enemy appears from a danger area that k0nfig isn't responsible for watching, like those in LongA. k0nfig jumps and runs around in circles. (In the visualization, a larger T/O means a larger z value, so a jump appears as the T becoming larger.)
Game State Input To Navigation Controller
The controller will receive a discrete representation of the world to simplify learning/optimization. This discretized input is:
- Map Nav Cells - The navmesh areas are too coarse to navigate while seeming human. The image on the right demonstrates this coarseness. The controller will receive the map after each area is split into contiguous 16x16 regions. 16 units is half a player's width.
- Starting Cell and Ending Region - The starting cell for the trajectory (the bot's current cell) and all the valid cells in the ending region.
- Teammate Cells and Enemy Danger Cells Distribution - The minimap always shows a player's teammates, so I will allow the bot to always know its teammates' cells. I will use a model to predict the distributions of enemies' possible positions.
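The cell split described above can be sketched as follows. This is a minimal illustration rather than the bot's actual code: it assumes each nav area can be treated as an axis-aligned rectangle in (x, y), and the `AABB` type and `split_into_cells` function are hypothetical names introduced for this example. Only the 16-unit cell size comes from the text.

```python
import math
from dataclasses import dataclass

CELL_SIZE = 16.0  # world units per cell side (from the post)


@dataclass(frozen=True)
class AABB:
    """Axis-aligned 2D footprint of a nav area (hypothetical type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def split_into_cells(area: AABB, cell_size: float = CELL_SIZE) -> list[AABB]:
    """Tile an area with contiguous cells of at most cell_size per side."""
    nx = max(1, math.ceil((area.max_x - area.min_x) / cell_size))
    ny = max(1, math.ceil((area.max_y - area.min_y) / cell_size))
    cells = []
    for ix in range(nx):
        for iy in range(ny):
            x0 = area.min_x + ix * cell_size
            y0 = area.min_y + iy * cell_size
            # Clamp the last row/column so cells never leave the area.
            cells.append(AABB(x0, y0,
                              min(x0 + cell_size, area.max_x),
                              min(y0 + cell_size, area.max_y)))
    return cells
```

Cells on the far edge of an area may be narrower than 16 units under this scheme; merging those slivers into their neighbors is an equally plausible design.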
Navigation Controller Behavioral Output Schema and Frequency
The model will output a human-like trajectory of (x,y,z) points that the bot can walk along for the next five seconds. The bot can deviate from this trajectory if an enemy appears and the navigation controller doesn't attempt to flee.
The navigation planning will be rerun every 100 milliseconds and every time the bot enters a new nav mesh area. Each nav mesh area is a convex region where the bot can travel without becoming stuck on a wall. Transitions between areas are likely times for a trajectory to become invalidated and a bot to become stuck.
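The two replanning triggers above (a 100 ms timer and a nav mesh area change) might be combined as in the sketch below. `ReplanTrigger` and its fields are hypothetical names for illustration, and time is assumed to be measured in seconds.

```python
from dataclasses import dataclass
from typing import Optional

REPLAN_INTERVAL_S = 0.1  # 100 milliseconds (from the post)


@dataclass
class ReplanTrigger:
    """Decides when the navigation plan should be recomputed (hypothetical helper)."""
    last_plan_time: Optional[float] = None
    last_area_id: Optional[int] = None

    def should_replan(self, now: float, area_id: int) -> bool:
        timed_out = (self.last_plan_time is None
                     or now - self.last_plan_time >= REPLAN_INTERVAL_S)
        changed_area = area_id != self.last_area_id
        if timed_out or changed_area:
            # Record the replan so the 100 ms timer restarts from here.
            self.last_plan_time = now
            self.last_area_id = area_id
            return True
        return False
```

In this sketch an area change also resets the timer, so the two triggers never fire back to back for the same state.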
Navigation Metrics for Performance, Style, and Unpredictability
The two performance metrics are:
- Distance to Teammates - This metric measures distance to teammates that are no more than 10 seconds away when running at top speed (with knife out). This metric measures how well a bot avoids blocking and baiting. It is limited to a 10 second distance threshold to avoid considering teammates who don't impact a trajectory.
- Distance to Assigned Danger Area - This metric measures distance to danger areas that teammates haven't looked at in the last five seconds. "Looked at" means that the danger area was visible to the teammate and their view vector intersected the danger area AABB. It is limited to a five second threshold since assigned danger areas fluctuate over time.
The style metrics are the median, min, and max of trajectory lengths. These metrics measure the types of paths a bot likes to take.
The randomization metric is a histogram of cell utilization frequency in paths across the map. This metric measures how well a bot randomizes its movement along different trajectories.
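A rough sketch of two of these metrics is below. It makes several assumptions that the text does not specify: straight-line distance stands in for travel distance, the knife running speed is taken as the commonly cited 250 units per second, and cell utilization is interpreted as the fraction of trajectories that visit each cell. The function names are hypothetical.

```python
import math
from collections import Counter

KNIFE_SPEED = 250.0          # units/second; assumed CSGO knife running speed
MAX_TEAMMATE_SECONDS = 10.0  # threshold from the post


def teammate_distances(player, teammates, speed=KNIFE_SPEED,
                       max_seconds=MAX_TEAMMATE_SECONDS):
    """Distances to teammates reachable within max_seconds.

    Assumption: straight-line distance approximates path distance.
    """
    limit = speed * max_seconds
    dists = [math.dist(player, t) for t in teammates]
    return [d for d in dists if d <= limit]


def cell_utilization(trajectories):
    """Fraction of trajectories that visit each cell id."""
    counts = Counter()
    for cells in trajectories:
        counts.update(set(cells))  # count a cell once per trajectory
    n = len(trajectories)
    return {cell: c / n for cell, c in counts.items()} if n else {}
```

A flatter utilization histogram would suggest more varied paths, while a few cells near 1.0 would suggest the bot keeps reusing the same route.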
Combat Crosshair Control
The Combat Crosshair behavior generator controls a player's crosshair movement during combat: the one to five seconds when a player engages an enemy by shooting at them. Each engagement is short because weapons are lethal: either a player kills his enemy or vice-versa within a few seconds. There are three important characteristics of this behavior:
- Start at the Head and Pull Down - Players start by aiming at the enemy head. This maximizes the odds of scoring a headshot (which does the most damage) with the initial bullet. The head is small, but the first bullet in a spray is the most accurate, so players go for max damage reward with this bullet. Subsequent bullets are less accurate, so players aim for the upper torso with these. This maximizes odds of the bullets hitting the enemy, while also enabling high misses to hit the enemy's head.
- Jerky, 100ms Ballistic Curves For Spatially Constrained Crosshair Tasks - As predicted by the BUMP model, players move their crosshair in 100ms ballistic curves when performing spatially constrained movements: crosshair movements to a target position that are as fast as possible. In CSGO, spatially constrained movements occur when initially moving a crosshair to a target's head or when adjusting aim to track an unpredictable target.
  - Note: Unlike the BUMP model, there are frequently periods of no crosshair movement while the player reviews the monitor's rendered image, reacquires the target, and plans their next movement. For the readers familiar with computer architecture, the data does not support the BUMP model's hypothesis that the brain is a perfectly pipelined processor of different sense -> plan -> act threads. Rather, the data shows that the brain frequently has pipeline stalls where it cannot sense or plan while a thread is using the act ALU.
- Smooth, 500ms Roughly Constant Velocity Curves For Temporally Constrained Crosshair Tasks - Also predicted by the BUMP model, players move their crosshair along smoother curves with longer horizons when performing temporally constrained movements: crosshair movements that take a specific amount of time rather than the minimum time. In CSGO, temporally constrained movements occur when controlling spray patterns or tracking an easily predictable target (like a static one) as the aiming player moves.
Pro Crosshair Control Demonstrating Characteristics
In the above video, grim controls his crosshair to engage an enemy. The left graph in the video shows the crosshair speed. The x axis units in this graph are seconds. The y axis units are view angle degrees per second. The right graph below the map shows the delta between grim's current view angle (his crosshair) and his ideal view angle (aiming at the target's head). Both axes have the same units: normalized degrees for yaw/pitch relative to the ideal view angle. The units are normalized by distance to the target so that -1.0 is aiming at the target's feet and 1.0 is aiming that same amount above the head. The image on the right demonstrates this scaling. The larger dots in the graphs are ticks when grim hits his target.
The graphs demonstrate two key characteristics of crosshair movement:
- Start at the Head and Pull Down - During the first 4 seconds of the video/0.9 seconds of the event (see the Time Since Event Start x-axis on the left chart in the video), grim is aiming above the enemy's head. He pulls his crosshair down to the enemy's head. This is likely when grim fires his initial shots (though I don't show missed shots). Then, grim lowers his crosshair further over time to maximize the probability of at least hitting the torso during a spray of bullets.
- Jerky, 100ms Ballistic Curves For Spatially Constrained Crosshair Tasks - There are two different situations when the graphs demonstrate this characteristic.
  - Initial Movement To Target Head - Between 0.5 and 1.0 seconds into the event, grim is trying to aim for the enemy's head. At this time, his mouse velocity has multiple, roughly 100ms ballistic curves. In between the curves, he stops his crosshair to review the monitor's rendered image, reacquire the target, and plan his next crosshair movement.
  - Tracking a Difficult-To-Predict Victim - During the entire event, the victim (the red V next to grim) unpredictably moves forwards and backwards. After grim has reached the victim's head (after 1.0 seconds), he must still produce jerky crosshair movements (100ms spikes in crosshair speed) as he continuously fixes his mouse tracking of the victim's upper torso/lower head.
The graphs demonstrate the characteristic of Smooth, 500ms Roughly Constant Velocity Curves For Temporally Constrained Crosshair Tasks. ELiGE knows the victim (the red V) is behind a wall, so ELiGE keeps his crosshair on the wall's corner as he walks around it. Since the corner is a static object, it's very easy for ELiGE to predict its position as he moves. This prediction enables him to keep a nearly constant mouse velocity from 0.4 to 0.8 s in the event.
Game State Input To Crosshair Controller
The controller will receive a continuous representation of the player's position and the enemy's position.
- History of Normalized Delta View Angles From Player To Enemy's Head - This records how far a player is from the enemy's head during the entire engagement up to the current tick.
- Min/Max Enemy Velocity - This tracks how unpredictable the enemy's movement has been.
- Enemy Visibility - This tracks if aiming at the enemy or aiming at a static corner near the enemy's most likely location.
I will initially ignore the world's geometry, since map geometry has limited impact during combat engagements (excluding rare events where an enemy is only partially visible).
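One way to compute the normalized delta view angles used as input is sketched below. It is an assumption-laden illustration, not the post's exact formula: it assumes the aiming player's eyes are level with the target's head, that positive pitch delta means aiming above the head, that the horizontal distance is positive, and that the head sits 64 units above the feet. The function names and the `head_height` default are hypothetical.

```python
import math


def head_to_feet_angle_deg(horizontal_dist: float, head_height: float) -> float:
    """Angle between the target's head and feet as seen by the aiming player.

    Assumption: the viewer's eyes are level with the target's head.
    """
    return math.degrees(math.atan2(head_height, horizontal_dist))


def normalize_delta(delta_yaw_deg: float, delta_pitch_deg: float,
                    horizontal_dist: float, head_height: float = 64.0):
    """Scale view angle deltas so -1.0 pitch is the feet and 0.0 is the head."""
    scale = head_to_feet_angle_deg(horizontal_dist, head_height)
    return delta_yaw_deg / scale, delta_pitch_deg / scale
```

Under these assumptions, a crosshair resting on the target's feet normalizes to a pitch of -1.0 at any distance, which is the property that makes engagements at different ranges comparable.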
Crosshair Controller Behavioral Output Schema and Frequency
The model will output the desired (yaw,pitch) crosshair values for the next tick. This will occur every tick.
The crosshair values are desired rather than actual because crosshair movements are occasionally dropped. CSGO requires an input every 7.8125 ms (for a 128-tick server). System latency occasionally results in a missed frame. The crosshair controller must be resilient to occasional dropped frames. I need to specify this because some controllers involving feedback loops (send crosshair value, get view angle delta as a result) become unstable when their outputs are ignored.
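The tick arithmetic and the dropped-frame requirement can be illustrated with a toy example. The proportional controller below is a stand-in chosen for simplicity, not the planned crosshair model, and `simulate` is a hypothetical helper. The point it shows is that recomputing the desired angle from the observed view angle each tick lets a dropped output delay convergence without accumulating error.

```python
TICK_RATE = 128                # ticks per second on a 128-tick server
TICK_MS = 1000.0 / TICK_RATE   # 7.8125 ms between inputs


def simulate(target: float, ticks: int, dropped: set, gain: float = 0.5) -> float:
    """Drive a 1D view angle toward target; outputs on dropped ticks are ignored."""
    angle = 0.0
    for tick in range(ticks):
        # The desired value is recomputed from the observed angle every tick,
        # so it never depends on an earlier output having been applied.
        desired = angle + gain * (target - angle)
        if tick not in dropped:
            angle = desired
    return angle
```

With one dropped tick, the simulated angle after four ticks matches the angle after three ticks with no drops: the controller loses one tick of progress and nothing more.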
Crosshair Metrics for Performance, Style, and Unpredictability
The performance metrics are:
- Crosshair Trajectory Max Velocity and Duration - This metric measures the distribution of velocity curves to ensure the bots have jerky movement in spatially constrained tasks and smooth movement in temporally constrained tasks.
- Bullet Accuracy per Enemy Distance and Body Part - This metric measures accuracy after normalizing for shot difficulty. It works by bucketing accuracy (hits/shots) by distance and body part hit.
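The accuracy bucketing could look like the sketch below. The input format, the 500-unit bucket width, and the function name are assumptions for illustration. Since a missed shot has no body part, this version divides each body part's hits by all shots fired in the same distance bucket, which is one possible reading of hits/shots.

```python
from collections import defaultdict


def bucket_accuracy(shots, bucket_size: float = 500.0):
    """Accuracy per (distance bucket, body part).

    shots: iterable of (distance, body_part), with body_part None for a miss.
    Returns {(bucket_index, body_part): hits / shots_in_bucket}.
    """
    shots_per_bucket = defaultdict(int)
    hits = defaultdict(int)
    for distance, part in shots:
        bucket = int(distance // bucket_size)
        shots_per_bucket[bucket] += 1
        if part is not None:
            hits[(bucket, part)] += 1
    return {key: n / shots_per_bucket[key[0]] for key, n in hits.items()}
```

Comparing these tables between a bot and human players at the same distances would show whether the bot is too accurate (or not accurate enough) for a given shot difficulty.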
The style metrics are the median, min, and max of shots and hits in a spray, bucketed by distance. These metrics measure the firing and aiming patterns that a player likes.
The randomization metric is a histogram of delta view angle vectors. This metric ensures the bots and humans have similar, unpredictable distributions of errors.
Request For Feedback
If you have questions or comments about this analysis, please email me at firstname.lastname@example.org.