The reason for STP
The Spanning Tree Protocol (STP) was created to deal with loops between switches. What is a loop? A loop is defined as follows:
Bridging loop (Switching loop) - A state in a network in which frames are sent endlessly between switches connected to the same network segment.
The definition makes it clear that a loop causes serious problems: it overloads the switches and makes that segment of the network inoperable. How does a loop occur? The picture below shows a topology in which a loop will occur in the absence of any protection mechanism:
A loop occurs under the following conditions:
1. One of the hosts sends a broadcast frame:
- For example, VPC5 sends a frame with the broadcast destination address.
- Switch1, after receiving this frame, must send it through all ports except the one on which it arrived, so the frame goes out through Gi0/0 and Gi1/0.
- Switch2 and Switch3 receive the frame and must flood it as well: Switch2 sends the copy it received from Switch1 to Switch3, and Switch3 sends its copy to Switch2.
- Next, Switch2 forwards the copy received from Switch3 to Switch1, and Switch3 forwards the copy received from Switch2 to Switch1. This brings us back to the first step, and the cycle continues indefinitely. It is made worse by the fact that at this last step Switch1 already holds two copies of the frame, one from Switch2 and one from Switch3.
These steps repeat indefinitely, and on switches each cycle takes a fraction of a second. The MAC address tables also keep changing: the MAC address of the sender, VPC5, is constantly remapped to Gi0/0, then to Gi1/0, or to Gi0/2 (if VPC5 is sending other frames at that moment). This cycle makes the network and all the switches malfunction. Broadcast frames are sent to hosts all the time; ARP is one example.
2. A loop can also form without any broadcast frame:
- For example, VPC5 sends a frame with a unicast destination MAC address.
- If the destination MAC address is not in the switch's MAC address table, the switch floods the frame through all ports except the one on which it was received. This leads to the same situation as with a broadcast frame.
Below we look at STP on Cisco switches, which run a separate STP instance for each VLAN (the PVST+ protocol). Our topology has only one VLAN, so this changes nothing.
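To see why the flooding in the scenarios above never dies out on its own, here is a minimal Python sketch that floods one broadcast frame around the three-switch triangle with no loop protection. It is a simplification under stated assumptions: host ports are ignored, all hops happen in lock-step, and only the switch names from the first picture are used. It counts the copies in flight on the inter-switch links after each hop and records which neighbor Switch1 sees the looped copies arrive from.

```python
# Triangle topology from the first picture: every switch is cabled to the
# other two. Only adjacency is modeled; host ports are ignored.
NEIGHBORS = {
    "Switch1": ["Switch2", "Switch3"],
    "Switch2": ["Switch1", "Switch3"],
    "Switch3": ["Switch1", "Switch2"],
}


def flood_without_stp(origin: str, hops: int):
    """Flood one broadcast frame from a host on `origin`, with no loop protection.

    Returns the number of copies on inter-switch links after each hop, and the
    neighbors Switch1 (re)learns the sender's MAC address from.
    """
    # Each copy in flight is (switch it arrives at, switch it came from).
    in_flight = [(n, origin) for n in NEIGHBORS[origin]]
    copies_per_hop = [len(in_flight)]
    learned_on_switch1 = []
    for _ in range(hops):
        next_flight = []
        for at, came_from in in_flight:
            if at == "Switch1":
                # MAC learning: the sender's address is bound to the ingress side.
                learned_on_switch1.append(came_from)
            # Flood out of every switch port except the one the frame came in on.
            next_flight += [(n, at) for n in NEIGHBORS[at] if n != came_from]
        in_flight = next_flight
        copies_per_hop.append(len(in_flight))
    return copies_per_hop, learned_on_switch1


copies, learned = flood_without_stp("Switch1", 6)
print(copies)   # [2, 2, 2, 2, 2, 2, 2]
print(learned)  # ['Switch3', 'Switch2', 'Switch3', 'Switch2']
```

The copy count never drops to zero, and Switch1 keeps rebinding the sender's MAC address to alternating ports. An unknown-unicast frame would behave the same way, since it is flooded by the same rule.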
STP basics
This protocol works by logically blocking all redundant links between switches, so that no traffic is sent through them. To obtain a topology without redundant links, STP builds a tree (in the sense of a mathematical graph), and to build the tree it first has to determine its root. Therefore, the first step of STP is to elect the root switch (Root Switch). To do this, the switches exchange BPDU messages. In general, STP uses two types of messages: the BPDU, which carries information about the switches, and the TCN, which announces changes in the topology. Let's look at the BPDU in more detail here; the TCN is covered further below. As soon as STP is enabled, the switches start sending BPDUs.
The BPDU frame has the following fields:
- Protocol identifier (2 bytes). Switches must support the same version of the STA (spanning tree algorithm) protocol
- Protocol version (1 byte)
- BPDU type (1 byte). There are 2 types of BPDUs: configuration and topology change notification (TCN)
- Flags (1 byte)
- Root Bridge ID (8 bytes)
- Root Path Cost (4 bytes)
- Bridge ID (8 bytes)
- Port ID of the port the BPDU is sent from (2 bytes)
- Message age (2 bytes). Measured in units of 1/256 s, it is used to detect obsolete messages
- Maximum message age (2 bytes). If a BPDU's age exceeds the maximum, the switches ignore it
- Hello time (2 bytes), the interval at which BPDUs are sent
- Forward delay (2 bytes), the time a port spends in each of the Listening and Learning states before it starts forwarding
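The field list above maps directly onto a byte layout. The sketch below packs and unpacks the 35-byte body of a classic configuration BPDU with Python's `struct` module. The Ethernet and LLC headers in front of it are left out, and the helper names, the single-integer encoding of the Bridge ID, and the sample Port ID are choices made for this illustration only.

```python
import struct

# Configuration BPDU body (after the LLC header), IEEE 802.1D field order:
# protocol id (2), version (1), type (1), flags (1), root id (8),
# root path cost (4), bridge id (8), port id (2), then four 2-byte timers.
# ">" means network byte order with no padding, so the body is 35 bytes.
BPDU_FORMAT = ">HBBBQIQHHHHH"
TIMER_UNIT = 256  # timers are encoded in units of 1/256 second


def pack_config_bpdu(root_id: int, root_path_cost: int, bridge_id: int,
                     port_id: int, message_age: float = 0.0,
                     max_age: float = 20.0, hello: float = 2.0,
                     forward_delay: float = 15.0, flags: int = 0) -> bytes:
    return struct.pack(
        BPDU_FORMAT,
        0x0000,  # protocol identifier
        0x00,    # protocol version (classic STP)
        0x00,    # BPDU type: configuration
        flags,
        root_id, root_path_cost, bridge_id, port_id,
        round(message_age * TIMER_UNIT), round(max_age * TIMER_UNIT),
        round(hello * TIMER_UNIT), round(forward_delay * TIMER_UNIT),
    )


def unpack_config_bpdu(data: bytes) -> dict:
    (_proto, _version, _type, flags, root_id, cost, bridge_id, port_id,
     age, max_age, hello, fwd) = struct.unpack(BPDU_FORMAT, data)
    return {
        "flags": flags, "root_id": root_id, "root_path_cost": cost,
        "bridge_id": bridge_id, "port_id": port_id,
        "message_age": age / TIMER_UNIT, "max_age": max_age / TIMER_UNIT,
        "hello": hello / TIMER_UNIT, "forward_delay": fwd / TIMER_UNIT,
    }


# Switch1 announcing itself as root (priority 32769, MAC 5000.0001.0000).
# The Port ID 0x8001 (port priority 128, port 1) is an illustrative value.
sw1 = (32769 << 48) | 0x500000010000
raw = pack_config_bpdu(sw1, 0, sw1, 0x8001)
print(len(raw))                                  # 35
print(unpack_config_bpdu(raw)["forward_delay"])  # 15.0
```

Storing the Bridge ID as one 64-bit integer (priority in the top 2 bytes, MAC in the low 6) has a convenient side effect: plain numeric comparison matches the priority-then-MAC rule used to elect the root.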
The main fields that deserve special attention are:
- Sender ID (Bridge ID)
- Root Bridge ID
- The Port ID of the port this BPDU was sent from (Port ID)
- Root Path Cost
The Bridge ID is used to elect the root switch. It is an 8-byte value made up of the 2-byte Bridge Priority (0 to 65535, 32768 by default) and the 6-byte MAC address of the device. The switch with the lowest priority becomes the root; if the priorities are equal, the MAC addresses are compared character by character, and the lower one wins.
The output of the Bridge ID information from Switch1 in the first picture shows a priority of 32769 (the default 32768 plus the VLAN ID, 1) and the MAC address 5000.0001.0000.
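The same Bridge ID can be assembled by hand. In this sketch (the helper names are hypothetical), the VLAN number is added to the configured priority, as PVST+ reports it for Switch1, and the result is packed as a 2-byte priority followed by the 6-byte MAC. Because the priority occupies the high-order bytes, an ordinary integer comparison reproduces the election rule: priority first, then MAC address.

```python
def bridge_id(base_priority: int, vlan_id: int, mac: str) -> int:
    """Hypothetical helper: build an 8-byte Bridge ID as one integer.

    The top 2 bytes hold the priority (configured priority + VLAN ID, as
    PVST+ reports it), the low 6 bytes hold the MAC address.
    """
    priority = base_priority + vlan_id
    mac_int = int(mac.replace(".", "").replace(":", ""), 16)
    return (priority << 48) | mac_int


def format_bridge_id(bid: int) -> str:
    """Render as 'priority/xxxx.xxxx.xxxx' (Cisco-style MAC notation)."""
    priority, mac = bid >> 48, bid & 0xFFFF_FFFF_FFFF
    hexmac = f"{mac:012x}"
    return f"{priority}/{hexmac[:4]}.{hexmac[4:8]}.{hexmac[8:]}"


sw1 = bridge_id(32768, 1, "5000.0001.0000")
sw3 = bridge_id(32768, 1, "5000.0003.0000")
print(format_bridge_id(sw1))  # 32769/5000.0001.0000
print(sw1 < sw3)              # True: same priority, lower MAC wins
# A lower priority wins regardless of MAC (hypothetical switch):
print(bridge_id(4096, 1, "5000.00ff.0000") < sw1)  # True
```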
Imagine that the switches have just been powered on and are starting to build a loop-free topology. As soon as they have booted, each of them starts sending BPDUs announcing that it is the root of the tree, putting its own Bridge ID in the Root Bridge ID field. For example, Switch1 sends a BPDU to Switch3, and Switch3 sends one to Switch1. BPDU from Switch1 to Switch3:
BPDU from Switch3 to Switch1:
As the Root Identifier fields show, each switch tells the other that it is the Root switch.
Root switch selection
Until the STP topology is built, normal traffic is not transmitted because of the special port states discussed below. Switch3 receives the BPDU from Switch1 and examines it. In the Root Bridge ID field it sees a value different from the one Switch3 itself sent. It compares the two: the priorities are the same, but Switch1's MAC address is better (lower) than its own. Therefore, Switch3 accepts Switch1's Bridge ID as the Root Bridge ID, stops sending its own BPDUs, and only listens to the BPDUs from Switch1. The port on which the best BPDU was received becomes the Root Port. Switch1 also receives the BPDU from Switch3 and makes the same comparison, but its behavior does not change, because the received BPDU contains a worse Root Bridge ID than its own. This settles the root between Switch1 and Switch3, and the same process settles it between Switch1 and Switch2. The Gi0/0 ports on Switch2 and Switch3 become Root Ports, the ports that lead to the root switch; through them Switch2 and Switch3 receive BPDUs from the Root Bridge. Now let's see what happens to the link between Switch2 and Switch3.
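The comparison that Switch3 and Switch1 make on each other's BPDUs can be sketched as a tiny message handler. This is an illustration only: the class and method names are hypothetical, Switch1's port name toward Switch3 (Gi1/0) is assumed from the description of the first picture, and the handler looks at the Root Bridge ID alone, ignoring Root Path Cost, which matters when several ports hear the root. MAC addresses are compared as strings character by character, which works here because all of them use the same fixed-width format.

```python
class StpSwitch:
    """Hypothetical view of one switch during root election."""

    def __init__(self, name: str, bridge_id: tuple):
        self.name = name
        self.bridge_id = bridge_id   # (priority, MAC); tuples compare field by field
        self.root_id = bridge_id     # every switch starts by claiming to be root
        self.root_port = None

    def is_root(self) -> bool:
        return self.root_id == self.bridge_id

    def on_bpdu(self, port: str, root_id: tuple) -> bool:
        """Handle the Root Bridge ID of a received BPDU; True if our view changed."""
        if root_id < self.root_id:   # lower priority wins, then lower MAC
            self.root_id = root_id   # accept the better root, stop claiming it
            self.root_port = port    # the port where the better BPDU arrived
            return True
        return False                 # worse or equal claim: nothing changes


sw1 = StpSwitch("Switch1", (32769, "5000.0001.0000"))
sw3 = StpSwitch("Switch3", (32769, "5000.0003.0000"))
claim1, claim3 = sw1.root_id, sw3.root_id  # both BPDUs are in flight at once
print(sw3.on_bpdu("Gi0/0", claim1))  # True: Switch3 accepts Switch1 as root
print(sw1.on_bpdu("Gi1/0", claim3))  # False: Switch1 remains root
print(sw1.is_root(), sw3.is_root(), sw3.root_port)  # True False Gi0/0
```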
Blocking redundant links
As the topology shows, the link between Switch2 and Switch3 must be blocked to prevent loops. How does STP handle this?
After the Root Bridge is elected, Switch2 and Switch3 stop sending BPDUs through their Root Ports, but they relay the BPDUs received from the Root Bridge through all their other active ports, changing only the following fields:
- Sender ID (Bridge ID) - replaced with the switch's own ID.
- Port ID - changed to the ID of the port the BPDU is sent from.
- Root Path Cost - set to the switch's own cost to the root: the received Root Path Cost plus the cost of the port on which the BPDU arrived.
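The three field changes can be written as one small function. The `Bpdu` class here is a hypothetical, readable stand-in for the real frame (Bridge IDs as `(priority, MAC)` tuples, the port as a name instead of the 2-byte Port ID), and the added cost of 4 assumes the 1 Gbps links used in this article's topology.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bpdu:
    """Readable stand-in for a configuration BPDU (hypothetical)."""
    root_id: tuple        # (priority, MAC) of the Root Bridge
    root_path_cost: int
    bridge_id: tuple      # (priority, MAC) of the sender
    port_id: str          # sending port, by name instead of the 2-byte Port ID


def relay(received: Bpdu, my_bridge_id: tuple, out_port: str,
          root_port_cost: int) -> Bpdu:
    """BPDU sent on a designated port after `received` arrived on the root port.

    Only three fields change; the Root Bridge ID is copied unchanged.
    """
    return replace(
        received,
        bridge_id=my_bridge_id,  # the sender is now this switch
        port_id=out_port,        # the port the copy leaves through
        root_path_cost=received.root_path_cost + root_port_cost,
    )


SW1 = (32769, "5000.0001.0000")
SW2 = (32769, "5000.0002.0000")
from_root = Bpdu(root_id=SW1, root_path_cost=0, bridge_id=SW1, port_id="Gi0/0")
to_switch3 = relay(from_root, SW2, "Gi1/0", root_port_cost=4)
print(to_switch3.root_path_cost, to_switch3.bridge_id, to_switch3.port_id)
```

Making the dataclass frozen and using `replace` keeps the received BPDU untouched, which mirrors the idea that each switch generates a fresh copy rather than editing the one it heard.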
Thus, Switch2 receives the following BPDU from Switch3:
And Switch3 gets this BPDU from Switch2:
After exchanging these BPDUs, Switch2 and Switch3 realize that the topology is redundant. Why? Both of them report the same Root Bridge in their BPDUs. This means that Switch3 has two paths to the Root Bridge, one directly through Switch1 and one through Switch2, and this is exactly the redundancy we are fighting against. Switch2 likewise has two paths: directly through Switch1, and through Switch3. To get rid of this redundancy, we need to block the link between Switch3 and Switch2. How does this happen?
Which side of the link gets blocked is decided by comparing the following values, in order. The switch with the lower value keeps its port forwarding, and the other switch blocks its port:
- Smaller Root Path Cost.
- Smaller Bridge ID.
- Smaller Port ID.
In this comparison, Root Path Cost matters more than Bridge ID. I used to think that this selection worked like the Root switch election, and I was surprised that in the following topology, for example, the blocked port is not on the switch with the worst priority:
Here, the Gi0/1 port on Sw2 will be blocked, because Root Path Cost is decisive in this comparison. Back to our topology. The cost of the path to the Root Bridge is the same for both switches and their priorities are equal, so the Bridge IDs differ only in the MAC address: Switch2 has 50:00:00:02:00:00 and Switch3 has 50:00:00:03:00:00. Switch2 wins with the better (lower) MAC address. After the selection, Switch3 stops sending any frames through this port (Gi1/0), including BPDUs, and only listens for BPDUs from Switch2. This port state is called Blocking (BLK). The Gi1/0 port on Switch2 works normally and sends frames when needed, but Switch3 discards them immediately, listening only to the BPDUs. We have now built a topology without redundant links: the only redundant link, between Switch2 and Switch3, was blocked by moving the Gi1/0 port on Switch3 into the special Blocking state (BLK). Now let's look at the STP mechanisms in more detail.
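The tie-break for the Switch2-Switch3 link reduces to comparing tuples in order. The values below are illustrative: Port IDs are written as hypothetical `(port priority, port number)` pairs, and the second example uses made-up switches to show that a lower Root Path Cost beats a better Bridge ID.

```python
def designated_end(candidates: dict) -> str:
    """Return the end of a link that keeps forwarding (the designated port).

    Each value is (Root Path Cost, sender Bridge ID, sender Port ID); Python
    compares tuples left to right, so cost is checked first, then Bridge ID,
    then Port ID, and the lowest value wins.
    """
    return min(candidates, key=candidates.get)


# Switch2-Switch3 link; Port IDs are illustrative (port priority, port number).
segment = {
    "Switch2 Gi1/0": (4, (32769, "5000.0002.0000"), (128, 2)),
    "Switch3 Gi1/0": (4, (32769, "5000.0003.0000"), (128, 2)),
}
winner = designated_end(segment)
blocked = [end for end in segment if end != winner]
print(winner, "forwards;", blocked[0], "blocks")

# Made-up switches: the cheaper path wins even against a better priority.
mixed = {
    "cost 4, priority 32769": (4, (32769, "5000.00aa.0000"), (128, 1)),
    "cost 8, priority 4097": (8, (4097, "5000.00bb.0000"), (128, 1)),
}
print(designated_end(mixed))
```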
Port states
We said above that port Gi1/0 on Switch3, for example, goes into a special Blocking state. STP defines the following port states:
Blocking - no frames are sent through the port; it only receives BPDUs. This state is used to avoid topology redundancy.
Listening - as mentioned above, until the Root Switch is elected, ports are in a special state in which only BPDUs are exchanged and no data frames are sent or received. A port does not leave Listening as soon as the Root Bridge is known: the state lasts for the Forward Delay timer, 15 seconds by default. Why always wait 15 seconds? This is STP being cautious, so that a wrong Root Bridge is not selected by accident. After this period, the port moves to the next state, Learning.
Learning - the port still receives and sends BPDUs and still does not forward data frames. The difference from Listening is that the source MAC addresses of data frames arriving on the port are learned and written to the switch's MAC address table. Moving to the next state also takes one Forward Delay period.
Forwarding - the normal state of the port, in which both BPDUs and ordinary data frames are sent. So, right after the switches boot, the sequence is as follows:
- The switch puts all its connected ports into the Listening state and starts sending BPDUs in which it announces itself as the root switch. During this period it either remains the root, if it receives no better BPDU, or accepts a better switch as the root, and it determines which ports should be Forwarding and which should be Blocking. This lasts 15 seconds.
- The ports that are to forward then go into the Learning state and learn MAC addresses. 15 seconds.
- Finally, they move to the Forwarding state.
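The timer-driven part of this sequence can be modeled as a function of time. This sketch assumes the default Forward Delay of 15 seconds and a port that ends up forwarding; the capability table is a summary of the state descriptions above, with a hypothetical layout.

```python
FORWARD_DELAY = 15  # seconds, default


def port_state(seconds_since_up: float, forward_delay: int = FORWARD_DELAY) -> str:
    """State of a port that will end up forwarding, t seconds after link-up."""
    if seconds_since_up < forward_delay:
        return "Listening"
    if seconds_since_up < 2 * forward_delay:
        return "Learning"
    return "Forwarding"


# What each state allows, following the descriptions above.
CAPABILITIES = {
    #             receive BPDU, send BPDU, learn MACs, forward data
    "Blocking":   (True,  False, False, False),
    "Listening":  (True,  True,  False, False),
    "Learning":   (True,  True,  True,  False),
    "Forwarding": (True,  True,  True,  True),
}

for t in (0, 14, 15, 29, 30):
    state = port_state(t)
    print(t, state, "forwards data" if CAPABILITIES[state][3] else "no data")
```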
Port roles
Besides port states, STP also assigns port roles. Roles determine which port should expect BPDUs from the root switch and through which ports the copies of those BPDUs are passed on. The port roles are as follows:
Root Port - the root port of the switch. Once the root switch has been elected, every other switch also determines its root port, the port that leads to the root switch. In our topology, for example, the Gi0/0 ports on Switch2 and Switch3 are root ports; Switch2 and Switch3 do not send BPDUs through them but only receive BPDUs from the Root Bridge. This raises the question of how the root port is selected. Why not Gi1/0? The root switch can be reached through that port too, right? To choose the root port, STP uses a metric carried in a BPDU field: the Root Path Cost, the cost of the path to the root switch. This cost is determined by link speed.
Switch1 puts 0 in the Root Path Cost field of its BPDUs because it is the Root Bridge itself. But when Switch2 sends a BPDU to Switch3, it changes this field to the cost of the link between itself and Switch1. In the pictures of the BPDUs from Switch2 and Switch3, the Root Path Cost is 4, because their links to Switch1 are 1 Gbps. With more switches in a chain, each next switch adds its own port cost to the Root Path Cost. The default port costs by link speed (IEEE 802.1D-1998, short method) are:
- 10 Mbps: 100
- 100 Mbps: 19
- 1 Gbps: 4
- 10 Gbps: 2
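The root port choice can be sketched as a minimum over the BPDUs received on each port: the advertised Root Path Cost plus the cost of the receiving port, with the sender Bridge ID and Port ID as tie-breakers. The dictionary holds the IEEE 802.1D-1998 short default costs; the received values and Port IDs for Switch3 are illustrative, and the function name is hypothetical.

```python
# Default short (16-bit) port costs from IEEE 802.1D-1998, keyed by Mbit/s.
PORT_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}


def root_port(received: dict, speeds: dict) -> str:
    """Pick the root port among ports that hear the root's BPDUs.

    received[port] = (advertised Root Path Cost, sender Bridge ID, sender Port ID)
    speeds[port]   = link speed of that port in Mbit/s
    The lowest (cost via this port, sender Bridge ID, sender Port ID) wins.
    """
    def vector(port):
        advertised, sender_bid, sender_pid = received[port]
        return (advertised + PORT_COST[speeds[port]], sender_bid, sender_pid)
    return min(received, key=vector)


# Switch3's view: BPDU straight from Switch1 on Gi0/0, relayed by Switch2 on Gi1/0.
# Sender Port IDs are illustrative (port priority, port number).
rx = {
    "Gi0/0": (0, (32769, "5000.0001.0000"), (128, 2)),
    "Gi1/0": (4, (32769, "5000.0002.0000"), (128, 2)),
}
print(root_port(rx, {"Gi0/0": 1000, "Gi1/0": 1000}))  # Gi0/0: 0+4 beats 4+4
print(root_port(rx, {"Gi0/0": 100, "Gi1/0": 1000}))   # Gi1/0: 4+4 beats 0+19
```

The second call answers the "why not Gi1/0?" question in reverse: if the direct link to Switch1 were only 100 Mbps, the detour through Switch2 would be cheaper and Gi1/0 would become the root port.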
Designated Port - the designated port of a segment. Each network segment must have exactly one port responsible for connecting that segment to the rest of the network. Loosely speaking, a network segment can be thought of as the cable that makes the connection. For example, the Gi0/2 ports on Switch1 and Switch3 each serve a separate segment reached only through that cable. Likewise, the ports on the Root Bridge cannot be blocked, and all of them are designated ports of their segments. With this in mind, a stricter definition of a designated port can be given:
Designated Port - the port of a segment that offers the best path to the root switch; it forwards traffic between the segment and the rest of the tree. Each network segment has only one designated port, and all ports of the root switch are designated.
Note also that port Gi1/0 on Switch2 is designated even though the link is blocked on Switch3's side. Loosely speaking, Switch2 has no information that the port at the other end is blocked.
Nondesignated Port - a port that is neither a root port nor a designated port. Data frames are not sent through such a port. In our example, port Gi1/0 on Switch3 is a non-designated port.
Disabled Port - port that is in the disabled state.
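Putting the roles together, the sketch below computes the root, root ports, and designated and non-designated ports for the three-switch topology. It is a centralized simplification of what the switches work out by exchanging BPDUs, under several assumptions: Switch1's port names (Gi0/0 toward Switch2, Gi1/0 toward Switch3) are taken from the description of the first picture, host ports such as Gi0/2 are left out, all links are 1 Gbps (cost 4), and ties between ports compare port names as strings instead of real Port IDs.

```python
# Hypothetical model of the first picture (host ports left out).
BRIDGE_IDS = {
    "Switch1": (32769, "5000.0001.0000"),
    "Switch2": (32769, "5000.0002.0000"),
    "Switch3": (32769, "5000.0003.0000"),
}
# (switch A, port A, switch B, port B, cost); Switch1's port names are assumed.
LINKS = [
    ("Switch1", "Gi0/0", "Switch2", "Gi0/0", 4),
    ("Switch1", "Gi1/0", "Switch3", "Gi0/0", 4),
    ("Switch2", "Gi1/0", "Switch3", "Gi1/0", 4),
]


def assign_roles(bridges: dict, links: list):
    root = min(bridges, key=bridges.get)  # lowest (priority, MAC)
    # Every link seen from both ends: (local, local port, remote, remote port, cost).
    ends = [(a, pa, b, pb, c) for a, pa, b, pb, c in links]
    ends += [(b, pb, a, pa, c) for a, pa, b, pb, c in links]

    # Root Path Cost of each switch (Bellman-Ford style relaxation).
    cost = {sw: 0 if sw == root else float("inf") for sw in bridges}
    for _ in bridges:
        for sw, _port, nb, _nport, c in ends:
            cost[sw] = min(cost[sw], cost[nb] + c)

    roles = {}
    for sw in bridges:
        if sw == root:
            continue
        # Best received BPDU: cost via the port, sender Bridge ID, sender port.
        best = min((cost[nb] + c, bridges[nb], nport, port)
                   for s, port, nb, nport, c in ends if s == sw)
        roles[(sw, best[3])] = "Root"

    for a, pa, b, pb, _c in links:
        # The end with the lower (Root Path Cost, Bridge ID, port) is designated.
        if (cost[a], bridges[a], pa) < (cost[b], bridges[b], pb):
            (win, win_port), (lose, lose_port) = (a, pa), (b, pb)
        else:
            (win, win_port), (lose, lose_port) = (b, pb), (a, pa)
        roles[(win, win_port)] = "Designated"
        roles.setdefault((lose, lose_port), "Nondesignated")  # keeps "Root"
    return root, roles


root, roles = assign_roles(BRIDGE_IDS, LINKS)
print("Root Bridge:", root)
for (sw, port), role in sorted(roles.items()):
    print(sw, port, role)
```

The result matches the text: every port on Switch1 is designated, Gi0/0 is the root port on Switch2 and Switch3, Gi1/0 on Switch2 is designated, and Gi1/0 on Switch3 is the only non-designated (blocked) port.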
Timers and convergence of STP protocol
Once STP has built a loop-free topology, a question remains: how does it detect and react to changes in the network? By default, the Root Bridge sends out BPDUs every 2 seconds; this interval is called the Hello Timer. The other switches receive these BPDUs on their root ports and relay them through all their designated ports, modifying them as described above. If a switch receives no BPDU from the root switch within the Max Age timer (20 seconds by default), it treats this as a loss of communication with the Root Bridge. To describe protocol convergence more precisely, we change our topology and place hubs between the switches. With hubs, if a switch or a link fails, the other switches do not detect it through a link going down but only through the timers. The modified topology is shown in the picture.
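A simplified Max Age check for a single root port might look like this. It is a sketch under an explicit assumption: real switches age the stored BPDU information starting from the Message Age value carried in the BPDU, whereas here the clock simply restarts each time a BPDU arrives. The class and method names are hypothetical.

```python
HELLO_TIME = 2  # seconds between BPDUs from the root (default)
MAX_AGE = 20    # seconds of silence before the root is considered lost (default)


class RootPortWatch:
    """Hypothetical per-port record of the last BPDU heard from the root.

    Simplification: the clock restarts at each received BPDU, instead of
    starting from the Message Age field as real switches do.
    """

    def __init__(self, now: float = 0.0):
        self.last_bpdu = now

    def bpdu_received(self, now: float) -> None:
        self.last_bpdu = now

    def root_lost(self, now: float) -> bool:
        # Losing a few hellos is tolerated; only MAX_AGE of silence counts.
        return now - self.last_bpdu >= MAX_AGE


watch = RootPortWatch()
for t in range(0, 10, HELLO_TIME):  # BPDUs at t = 0, 2, 4, 6, 8, then silence
    watch.bpdu_received(t)
print(watch.root_lost(27))  # False: 19 s of silence
print(watch.root_lost(28))  # True: 20 s of silence
```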
Before continuing, we need to describe the other type of STP message, the TCN. A switch sends a TCN when the topology changes, for example as soon as the state of one of its interfaces changes. The TCN is sent only through the switch's Root Port. The upstream switch acknowledges it by setting the TCA (Topology Change Acknowledgement) flag in its next configuration BPDU and relays the TCN further toward the root. Once the TCN reaches the Root Bridge, the root sets the TC (Topology Change) flag in the BPDUs it sends out through all its interfaces to the other switches. On receiving them, the switches reduce the aging time of their MAC address tables from 300 seconds to 15 seconds (see below). The picture shows the structure of the TCN:
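Compared with a configuration BPDU, a TCN is tiny: just the protocol identifier, version and type. The sketch below builds one and shows how the TC and TCA flag bits in configuration BPDUs drive the shorter MAC aging time. Constant and function names are hypothetical; the 300 s and 15 s values are the defaults used in this article, and the root keeping the TC flag set for Max Age plus Forward Delay follows IEEE 802.1D.

```python
import struct

# A TCN BPDU is only 4 bytes: protocol id (2), version (1), type (1).
TCN_BPDU = struct.pack(">HBB", 0x0000, 0x00, 0x80)  # type 0x80 = TCN

# Flag bits in the Flags byte of configuration BPDUs (IEEE 802.1D).
FLAG_TC = 0x01   # Topology Change, set by the root after it learns of a change
FLAG_TCA = 0x80  # Topology Change Acknowledgement, confirms a received TCN

MAX_AGE, FORWARD_DELAY = 20, 15
NORMAL_AGING = 300                   # default MAC address table aging, seconds
TC_PERIOD = MAX_AGE + FORWARD_DELAY  # how long the root keeps TC set


def mac_aging_time(config_bpdu_flags: int) -> int:
    """Aging time a switch applies while configuration BPDUs carry the TC flag."""
    return FORWARD_DELAY if config_bpdu_flags & FLAG_TC else NORMAL_AGING


print(TCN_BPDU.hex())            # 00000080
print(mac_aging_time(FLAG_TC))   # 15
print(mac_aging_time(FLAG_TCA))  # 300
```

Note that the TCA bit alone does not shorten aging: it only tells the sender of the TCN to stop repeating it.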
The TCN was added to STP so that non-root switches could report changes in the network. They cannot do this with ordinary BPDUs, because non-root switches do not send BPDUs toward the root (through their root ports). As you can see, the TCN carries no information about what changed or where; it only says that something has changed somewhere. Now let's move on to STP convergence.
Let's see what happens if we disable the Gi0/1 interface on Switch1, and which mechanisms rebuild the STP tree. Switch2 stops receiving BPDUs from Switch1, and it does not receive BPDUs from Switch3 either, because Switch3 has this port blocked. It takes Switch2 20 seconds (the Max Age timer) to detect the loss of communication with the Root Bridge; until then, Gi0/0 on Switch2 stays in the Forwarding state with the Root Port role. When Max Age expires and Switch2 realizes that it has lost communication, it rebuilds the STP tree and, as is typical for STP, assumes that it is the Root Bridge itself. It sends a new BPDU naming itself as Root Bridge through all active ports, including the one toward Switch3. However, the Max Age timer that expired on Switch2 has also expired on Switch3 for the Gi1/0 interface: this port has not received a BPDU for 20 seconds, so it goes into the Listening state and sends a BPDU with Switch1 as the Root Bridge. Once Switch2 receives this BPDU, it no longer considers itself the Root Bridge and selects its Gi1/0 interface as the Root Port. At this point, Switch2 also sends a TCN through Gi1/0, since this is its new Root Port. This causes the MAC address aging time on the switches to be reduced from 300 seconds to 15 seconds. That alone does not restore full connectivity: we still have to wait for the Gi1/0 port on Switch3 to pass through the Listening and then the Learning state, which takes two Forward Delay periods, 15 + 15 = 30 seconds. To sum up: after the loss of the link, Switch2 waits for Max Age (20 seconds), re-selects the path to the Root Bridge through another interface, and then waits another 30 seconds until the previously blocked port reaches the Forwarding state. Altogether, connectivity between VPC5 and VPC6 is lost for 50 seconds. As mentioned above, when Switch2's Root Port changed from Gi0/0 to Gi1/0, Switch2 sent a TCN. Without it, all MAC addresses learned through Gi0/0, for example those of VPC5 and VPC7, would remain bound to Gi0/0. Even though STP would finish converging in 50 seconds, connectivity between VPC6 and VPC5, VPC7 would not be restored, because all frames destined for VPC5 and VPC7 would still be sent out Gi0/0; you would have to wait up to 300 seconds, not 50, for the MAC address table to be rebuilt. With the TCN, the aging time drops from 300 seconds to 15, so the stale entries expire while Gi1/0 on Switch3 passes through Listening and Learning, and the MAC address table is relearned.
Another interesting question is what happens when we re-enable the Gi0/1 interface on Switch1. The interface goes into the Listening state and starts sending BPDUs as it should. As soon as Switch2 receives a BPDU on its Gi0/0 interface, it immediately selects Gi0/0 as its Root Port, since that port has the lowest cost, and moves its traffic there. However, traffic only flows once Gi0/1 has gone through Listening and Learning to Forwarding, so this time the delay is 30 seconds rather than 50.
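The outage figures in these two scenarios follow directly from the default timers. The small calculator below reproduces them; the function names are hypothetical, and propagation and processing delays are ignored.

```python
MAX_AGE, FORWARD_DELAY, MAC_AGING = 20, 15, 300  # defaults, seconds


def outage_indirect_failure(max_age: int = MAX_AGE, fwd: int = FORWARD_DELAY) -> int:
    # The loss is noticed only when Max Age expires; then the formerly
    # blocked port passes through Listening and Learning.
    return max_age + 2 * fwd


def outage_link_restored(fwd: int = FORWARD_DELAY) -> int:
    # The better BPDU is seen at once; only Listening and Learning remain.
    return 2 * fwd


def stale_mac_lifetime(tcn_sent: bool, fwd: int = FORWARD_DELAY,
                       aging: int = MAC_AGING) -> int:
    # Longest time an entry learned on the old Root Port can survive once
    # the topology change is known (without a TCN, normal aging applies).
    return fwd if tcn_sent else aging


print(outage_indirect_failure())   # 50
print(outage_link_restored())      # 30
print(stale_mac_lifetime(True), stale_mac_lifetime(False))  # 15 300
```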
Various technologies have also been developed to optimize and secure STP. I will not cover them in this article; there is plenty of material on them on other sites.