In this post, I will go over the most common concepts and terminology you need to know about Border Gateway Protocol (BGP).
As an enterprise infrastructure route-switch engineer, you need to understand BGP because at some point, you’ll need to configure redundant BGP routers to the Internet via one or two Internet Service Providers (ISPs) or to one or two MPLS lines. And, you need to excel at it!
If you’re ready, let’s get started.
Table of Contents
- What is BGP?
- BGP Operation
- BGP Message Types
- BGP Path Attributes
- BGP Decision Process
- BGP Configuration
- Final Thoughts
What is BGP?
BGP is an inter-domain routing protocol capable of accurately differentiating multiple routes to the same destination, avoiding loops, and interacting with IGPs in calculating distances. (IGPs are Interior Gateway Protocols such as EIGRP and OSPF.)
BGP is a vector protocol since each BGP node relies on downstream neighbors to pass along routes from their routing tables. Each BGP node makes its route calculations based on received routes and passes the best routes to upstream neighbors. The best route is the winning route chosen out of a set of routes towards the same destination received from different BGP neighbors.
More specifically and from a broader view, BGP is also called a path vector protocol because BGP uses a list of Autonomous System (AS) numbers to quantify the distance to destination networks.
Let’s discuss and understand what an Autonomous System is.
What is an Autonomous System?
An Autonomous System is a collection of routers that run BGP, and other routing protocols, managed by one administrative entity (company/business). For instance, Internet Service Providers (ISPs) such as Verizon, AT&T, Level3, PennTeleData, Comcast, LightPath, Lumen, and others, have their own Autonomous System numbers. Each ISP manages its own network of BGP routers within its Autonomous System.
Each router is configured with the BGP Autonomous System number that it belongs to. Each router establishes a “BGP peering” with other BGP routers so they can talk BGP (exchange BGP messages). The BGP peering can be with another BGP router within or outside their own Autonomous System. A BGP peering with a router inside the Autonomous System is called internal BGP (iBGP). A BGP peering with a router outside of the Autonomous System is called external BGP (eBGP).
The list of AS numbers associated with a BGP route is called the AS_PATH and is one of several BGP path attributes associated with each route. So, each BGP route received comes with a set of path attributes that each BGP router examines to decide the “best route” to each destination (e.g., corporate networks).
For instance, notice in the diagram below that AS 8 is receiving an advertisement from AS 5 with an AS_PATH of [5,6,1,2] and another advertisement from AS 9 with an AS_PATH of [9,4,2]. Each number inside the brackets represents each AS the advertisement has traversed. The originating AS was AS 2 and you follow the AS path sequence from right to left.
As mentioned before, each advertisement contains several BGP attributes that are analyzed to define the best route for each destination. One of those attributes is the AS_PATH. In the example above, if the AS_PATH was the only BGP attribute considered to determine the best path to 220.127.116.11/24, from AS 8’s perspective, the route received from AS 9 would win because of its shorter AS_PATH. The route received from AS 9 has three ASs, [9,4,2] whereas the route to the same destination received from AS 5 has four ASs, [5,6,1,2]. Again, notice that the originator AS number is placed on the very right.
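To make the comparison concrete, here is a minimal Python sketch of the AS_PATH-only choice from AS 8's point of view. The dictionary layout and the function name `shortest_as_path` are illustrative assumptions, and a real router looks at several other attributes before it gets to the AS_PATH.

```python
from typing import Dict, List


def shortest_as_path(routes: Dict[str, List[int]]) -> str:
    """Return the neighbor whose advertisement carries the fewest AS numbers."""
    return min(routes, key=lambda neighbor: len(routes[neighbor]))


# The two advertisements AS 8 receives in the example above.
advertisements = {
    "AS 5": [5, 6, 1, 2],
    "AS 9": [9, 4, 2],
}

best = shortest_as_path(advertisements)
# The originating AS is always the rightmost entry of the AS_PATH.
origin_as = advertisements[best][-1]
print(best, "wins; route originated in AS", origin_as)
```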
Inside each AS, there are BGP routers running an IGP (Interior Gateway Protocol) such as OSPF or IS-IS.
What is the current version of BGP in use?
The current version of BGP is BGP-4. It was first specified in RFC 1654 (1994) and revised in RFC 1771 (1995). RFC 1771 was then obsoleted by RFC 4271 (2006), which is the current specification.
What is the goal of BGP?
The main goal of BGP is to keep routing within the Internet both manageable and reliable with the support of Classless Inter-Domain Routing (CIDR).
What port does BGP use?
As its underlying delivery mechanism and to increase the reliability of the connection, BGP uses TCP port 179. The TCP layer handles session maintenance and update delivery with acknowledgment, retransmission, and sequencing. Moreover, since a TCP connection always runs between exactly two endpoints, each BGP neighbor must establish a separate point-to-point session with every other BGP neighbor.
BGP Operation
These are the basics of the operation of BGP:
- BGP sends unicast messages and forms a separate point-to-point connection with each BGP peer. This point-to-point connection is based on TCP at port 179.
- BGP is an application layer protocol that relies on TCP for session maintenance tasks such as acknowledgment, retransmission, and sequencing.
- BGP is a path vector protocol because it sees the route to a destination network as a path through a series of autonomous systems rather than a series of routers (hops).
- A BGP route describes the path vector using a specific route attribute called the AS_PATH (there are other BGP attributes), which sequentially lists all the autonomous system numbers that make up the path to the destination network.
- In addition to the AS_PATH attribute, there are several other weighted attributes associated with each route. Given multiple routes to the same destination, the router looks into all the BGP attributes sequentially top-down, and based on their predetermined weight, the router picks a winner route called the “best” route. For instance, if all other previous attributes are the same for, let’s say, three routes to the same destination, when it comes to looking at the AS_PATH attribute, the route with the shortest AS_PATH (fewest AS numbers) is chosen as the best route and this is the route that’s sent out to other BGP peers.
- BGP best routes are sent to the routing table or Routing Information Base (RIB) in an attempt to be placed in the RIB. If there are routes in the RIB for the same destinations with better administrative distances, BGP routes won’t make it into the RIB.
- A router receiving a BGP route with its own AS number in the AS_PATH assumes a loop and discards the route.
- If a router has a BGP session with a neighbor with a different AS number, the session is called external BGP (eBGP). The neighbor is called an eBGP neighbor.
- If a router has a BGP session with a neighbor with the same AS number, the session is called internal BGP (iBGP). The neighbor is called an iBGP neighbor.
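Two of the rules in this list are simple enough to sketch in a few lines of Python: classifying a session as eBGP or iBGP, and discarding a route whose AS_PATH already contains the local AS. The function names and the private AS numbers used here are illustrative assumptions.

```python
from typing import List


def session_type(local_as: int, neighbor_as: int) -> str:
    """Same AS number on both sides means iBGP, otherwise eBGP."""
    return "iBGP" if local_as == neighbor_as else "eBGP"


def accept_route(local_as: int, as_path: List[int]) -> bool:
    """Loop prevention: reject any route that has already crossed the local AS."""
    return local_as not in as_path


LOCAL_AS = 65001  # hypothetical private AS number

print(session_type(LOCAL_AS, 65001))              # session inside the AS
print(session_type(LOCAL_AS, 65002))              # session to another AS
print(accept_route(LOCAL_AS, [65002, 65003]))     # no loop, route is kept
print(accept_route(LOCAL_AS, [65002, 65001, 7]))  # local AS present, route is dropped
```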
The default Administrative Distance (AD) for external BGP routes is 20. The default AD value for internal routes is 200. BGP is interested in getting traffic out of the Autonomous System. BGP doesn’t want to keep traffic inside the Autonomous System. So, external routes with an AD of 20 are preferred.
BGP Message Types
To establish a BGP peer connection, two neighbors must open a TCP connection on port 179. TCP performs the standard three-way handshake and provides fragmentation, retransmission, acknowledgment, and sequencing. Over the TCP connection, BGP sends unicast messages to neighbors. BGP uses four basic message types:
BGP Open Message
The BGP open message is sent after the TCP session is established. The BGP router uses this open message to identify itself and announce its BGP operational parameters such as:
- BGP version number: Cisco IOS default is Version 4.
- Autonomous system number: this is the AS number the router belongs to. If the numbers differ, they’ll establish an eBGP session. If the numbers are the same, they’ll become iBGP neighbors. Keep in mind that a BGP router, and all of its interfaces, can belong to only one autonomous system at a time.
- Hold time: this is the maximum number of seconds the router will wait to receive a Keepalive or an Update message before the router declares the neighbor as down. Cisco’s default hold time is 180 seconds. If within 180 seconds the router doesn’t receive a keepalive or an update message, the router will understand that the neighbor is down. If two BGP routers exchange open messages with different hold times, they will agree on the smaller of the two.
- BGP identifier: this is an IPv4 address that identifies the BGP router. The numerically highest IP address of the loopbacks is used. If no loopbacks with IP addresses are configured, the numerically highest IP address of the physical interfaces is used. You can manually configure the BGP identifier.
- Optional parameters: this field is used to advertise the router’s support for optional capabilities such as authentication, multiprotocol support, and route refresh.
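The fields above map directly onto the wire format of the Open message. The following Python sketch packs and unpacks a minimal Open message using the RFC 4271 layout (a 19-byte header followed by version, AS number, hold time, BGP identifier, and the optional-parameters length). The function names `build_open` and `parse_open` are made up for illustration, the sketch assumes a 2-byte AS number, and it sends no optional parameters.

```python
import ipaddress
import struct

MARKER = b"\xff" * 16   # the header marker is 16 bytes of all ones
HEADER_LEN = 19         # marker (16) + length (2) + type (1)
OPEN_TYPE = 1           # message type code for Open


def build_open(my_as: int, hold_time: int, bgp_id: str) -> bytes:
    """Pack a BGP-4 Open message with no optional parameters."""
    body = struct.pack(
        "!BHH4sB",
        4,                                      # BGP version
        my_as,                                  # 2-byte AS number (assumption)
        hold_time,                              # proposed hold time in seconds
        ipaddress.IPv4Address(bgp_id).packed,   # BGP identifier
        0,                                      # optional parameters length
    )
    header = MARKER + struct.pack("!HB", HEADER_LEN + len(body), OPEN_TYPE)
    return header + body


def parse_open(message: bytes) -> dict:
    """Unpack the fixed part of an Open message into a dictionary."""
    length, msg_type = struct.unpack("!HB", message[16:HEADER_LEN])
    version, my_as, hold_time, raw_id, opt_len = struct.unpack(
        "!BHH4sB", message[HEADER_LEN:HEADER_LEN + 10]
    )
    return {
        "length": length,
        "type": msg_type,
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_id": str(ipaddress.IPv4Address(raw_id)),
        "opt_param_len": opt_len,
    }


message = build_open(my_as=65001, hold_time=180, bgp_id="192.0.2.1")
print(parse_open(message))
```

The AS number and identifier are documentation values, not taken from a real router.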
BGP Keepalive Message
If a BGP router accepts the parameters specified in the neighbor’s Open message, the receiving BGP router responds with a Keepalive, which is different from the TCP keepalive. By default, subsequent Keepalives are sent every 60 seconds or a period equal to one-third of the agreed-upon holdtime. Like the holdtime, the keepalive interval can also be specified for the whole BGP process or on a per-neighbor basis.
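The timer arithmetic is easy to check with a short Python sketch. It assumes nonzero hold times and uses hypothetical function names; it only mirrors the two rules described above (agree on the smaller hold time, send Keepalives at one-third of it).

```python
def negotiate_hold_time(local: int, remote: int) -> int:
    """Both peers settle on the smaller of the two proposed hold times."""
    return min(local, remote)


def keepalive_interval(hold_time: int) -> int:
    """Keepalives are sent at one-third of the agreed hold time."""
    return hold_time // 3


# Cisco default on both sides: 180-second hold time, 60-second keepalive.
default_hold = negotiate_hold_time(180, 180)
print(default_hold, keepalive_interval(default_hold))

# One peer proposes 90 seconds, so both use 90 and send Keepalives every 30.
agreed = negotiate_hold_time(180, 90)
print(agreed, keepalive_interval(agreed))
```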
BGP Update Message
The BGP Update message advertises feasible routes, withdrawn routes, or both. The Update message includes the following information:
- Network Layer Reachability Information (NLRI)
- Path attributes
- Withdrawn routes
The NLRI comprises one or more Length-Prefix tuples that represent destination prefixes and their lengths. For instance, if 18.104.22.168/19 were advertised, the Length portion of the NLRI would show /19 and the Prefix portion 216.203.170.
Path attributes are characteristics of the advertised NLRI that BGP uses to choose the best path, detect routing loops, and determine routing policy.
Withdrawn routes are length-prefix tuples describing destination networks that have become unreachable and are being removed from service.
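A Length-Prefix tuple is compact on the wire: one byte for the prefix length in bits, followed by only as many prefix bytes as that length needs. Here is a small Python sketch of that encoding for IPv4. The helper names are assumptions for illustration, and the prefixes are example values rather than the ones from this post.

```python
import ipaddress


def encode_nlri(prefix: str) -> bytes:
    """Encode an IPv4 prefix as <length in bits><minimum number of prefix bytes>."""
    network = ipaddress.ip_network(prefix)        # raises if host bits are set
    n_octets = (network.prefixlen + 7) // 8       # round up to whole bytes
    return bytes([network.prefixlen]) + network.network_address.packed[:n_octets]


def decode_nlri(data: bytes) -> str:
    """Decode a single IPv4 Length-Prefix tuple back into dotted notation."""
    length = data[0]
    n_octets = (length + 7) // 8
    padded = data[1:1 + n_octets] + bytes(4 - n_octets)  # pad trailing zero bytes
    return f"{ipaddress.IPv4Address(padded)}/{length}"


encoded_24 = encode_nlri("198.51.100.0/24")   # 1 length byte + 3 prefix bytes
encoded_19 = encode_nlri("10.32.0.0/19")      # 19 bits still need 3 prefix bytes
print(encoded_24.hex(), decode_nlri(encoded_24))
print(encoded_19.hex(), decode_nlri(encoded_19))
```

The same tuple format is used for both the NLRI and the withdrawn routes fields.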
BGP Notification Message
The BGP notification message always causes the BGP connection to close and is always sent whenever an error is detected.
For instance, if a BGP-4 router receives an Open message specifying version 3, the BGP-4 router responds with an “unsupported version number” Notification message indicating that version 3 is not supported, and as a result, the session is terminated. Other reasons why a Notification message is sent are:
- Message Header Error
  - Connection not synchronized
  - Bad message length
  - Bad message type
- Open Message Error
  - Bad peer AS
  - Bad BGP identifier
  - Unsupported optional parameter
  - Unacceptable hold time
- Update Message Error
  - Malformed attribute list
  - Unrecognized well-known attribute
  - Missing well-known attribute
  - Attribute flags error
  - Attribute length error
  - Invalid ORIGIN attribute
  - Invalid NEXT_HOP attribute
  - Optional attribute error
  - Invalid network field
  - Malformed AS_PATH
- Hold Timer Expired
- Finite State Machine Error
BGP Path Attributes
A BGP path attribute is a characteristic of an advertised BGP route. Just as every route advertisement has information about the destination (address prefix), an informational value to determine which of the routes received to the same destination is better (metric), and some directional information about the destination (next hop), BGP routes also include a number of other attributes that are designed to be manipulated for the creation and communication of routing policies.
Each BGP path attribute falls into one of the four categories:
- Well-known mandatory
- Well-known discretionary
- Optional transitive
- Optional nontransitive
A well-known attribute means that the attribute must be recognized by all BGP implementations.
An optional attribute means that the BGP implementation is not required to support the attribute.
A mandatory attribute means that it must be included in all BGP Update messages.
A discretionary attribute means that it may or may not be sent in a specific Update message.
A transitive attribute means the BGP process should accept the Update in which it is included even if the BGP process does not support the attribute, and it should also pass the attribute on to its BGP peers.
A nontransitive attribute means that a BGP process that does not recognize the attribute quietly ignores it and does not pass it on to its peers. In other words, an unrecognized nontransitive attribute does not transit the router.
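On the wire, these categories are signaled by two bits in each attribute's flags byte, as defined in RFC 4271: the Optional bit (0x80) and the Transitive bit (0x40). The Python sketch below reads those bits; the function names are illustrative assumptions. Note that the flags cannot tell mandatory from discretionary, since that distinction comes from the attribute's definition rather than from the flags.

```python
OPTIONAL = 0x80     # high-order bit: attribute is optional (0 means well-known)
TRANSITIVE = 0x40   # second bit: attribute is transitive


def describe_flags(flags: int) -> str:
    """Classify an attribute from its flags byte."""
    if not flags & OPTIONAL:
        # Well-known attributes always have the Transitive bit set.
        return "well-known"
    if flags & TRANSITIVE:
        return "optional transitive"
    return "optional nontransitive"


def handle_unrecognized(flags: int) -> str:
    """What a router does with an optional attribute it does not support."""
    return "forward to peers" if flags & TRANSITIVE else "ignore and do not forward"


print(describe_flags(0x40))        # e.g. ORIGIN, AS_PATH, NEXT_HOP
print(describe_flags(0xC0))        # e.g. EXTENDED COMMUNITY
print(describe_flags(0x80))        # e.g. MULTI_EXIT_DISC
print(handle_unrecognized(0xC0))
print(handle_unrecognized(0x80))
```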
Here’s the list of BGP Path Attributes.
|EXTENDED COMMUNITY||Optional transitive|
|MULTI_EXIT_DISC (MED)||Optional nontransitive|
|Multiprotocol Reachable NLRI||Optional nontransitive|
|Multiprotocol Unreachable NLRI||Optional nontransitive|
Let’s take a quick look at the well-known mandatory attributes because they are required to be in every BGP update.
The ORIGIN is a well-known mandatory BGP attribute that specifies the origin of the routing update. When BGP learns multiple routes to the same destination, BGP uses each route’s ORIGIN attribute as a tie-breaker to determine the preferred, or best route to the destination network.
- IGP: The NLRI was learned from a protocol internal to the originating AS. IOS gives BGP routes an origin of IGP if the route was injected into BGP with a BGP network statement.
- EGP: The NLRI was learned from the Exterior Gateway Protocol. Because EGP is obsolete, you should never see this origin type.
- Incomplete: The NLRI was learned by a method that’s unknown to the BGP process. The information needed to determine the origin of the route is incomplete. Routes injected into BGP via redistribution carry the incomplete origin attribute.
An ORIGIN attribute of IGP is preferred over EGP, and EGP is preferred over Incomplete; however, since EGP is obsolete, in practice you’ll only encounter routes with the IGP and Incomplete origins. In this case, IGP is always preferred over Incomplete.
For instance, if a BGP router receives two routes to 22.214.171.124/24, and one route has an ORIGIN attribute of IGP and the other route Incomplete, the BGP router will choose the IGP route over the Incomplete route.
The AS_PATH is a well-known mandatory attribute that uses a sequence of AS numbers to describe the inter-AS path or all the autonomous systems each route has traversed.
When a route leaves the AS in which it originated, the edge router adds its own AS to the AS_PATH. The receiving AS then attaches its own AS when the route leaves the AS, and so on. The result is that the AS_PATH describes all the autonomous systems the route has passed through beginning with the most recent AS and ending with the originating AS.
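The way the AS_PATH grows can be simulated in a few lines of Python. This is only a sketch with a hypothetical helper name; it reuses the AS numbers from the earlier [9,4,2] example and shows why the originating AS ends up on the far right.

```python
from typing import List


def advertise_to_ebgp_peer(as_path: List[int], local_as: int) -> List[int]:
    """An edge router prepends its own AS before the route leaves the AS."""
    return [local_as] + as_path


# The route originates in AS 2, then crosses AS 4 and AS 9 on its way to AS 8.
path: List[int] = []
for asn in (2, 4, 9):
    path = advertise_to_ebgp_peer(path, asn)
    print(f"after leaving AS {asn}: {path}")

most_recent_as, originating_as = path[0], path[-1]
```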
On the BGP table, the AS that originated the route is seen on the very right of the AS Path. For instance, this BGP router below located on AS 6447 shows that the 126.96.36.199/24 prefix originated in AS 13335 and propagated to AS 7018.
route-views>show ip protocols summary
Index Process Name
0     connected
1     static
2     application
3     bgp 6447
*** IP Routing is NSF aware ***
route-views>
route-views>show ip bgp 188.8.131.52 bestpath
BGP routing table entry for 184.108.40.206/24, version 2430557167
Paths: (21 available, best #16, table default)
  Not advertised to any peer
  Refresh Epoch 1
  7018 13335, (aggregated by 13335 220.127.116.11)
    18.104.22.168 from 22.214.171.124 (126.96.36.199)
      Origin IGP, localpref 100, valid, external, best
      Community: 7018:2500 7018:37171
      path 7FE13C683700 RPKI State valid
      rx pathid: 0, tx pathid: 0x0
route-views>
The NEXT_HOP attribute is a well-known mandatory attribute that describes the IP address of the next-hop router on the path to the advertised destination. The next-hop IP address is not always the address of a neighboring router. To understand why the next-hop IP address isn’t always the IP address of a neighbor, let’s look at the following rules:
- If the advertising and receiving routers are eBGP peers (routers located in different autonomous systems), the NEXT_HOP is the IP address of the advertising router’s interface.
- If the advertising and receiving routers are iBGP peers (routers located within the same autonomous system), and the Update’s NLRI refers to a destination within the same AS, the NEXT_HOP is the IP address of the router that originated the route. Keep in mind that R1 could peer with R3, which is not directly connected. R1 connects to R2, and R2 to R3. R1 and R3 are iBGP peers. If R3 injects a route into BGP and sends it to R1, R1 receives that route with a NEXT_HOP of R3, not R2.
- If the advertising and receiving routers are iBGP peers and the Update’s NLRI refers to a destination in a different autonomous system, the NEXT_HOP is the IP address of the external peer from which the route was learned.
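The three rules can be condensed into one small Python function. This is a simplified sketch with made-up names: it assumes that, for a destination inside the AS, the advertising router is also the router that originated the route, and it ignores options that rewrite the next hop.

```python
from typing import Optional


def next_hop_to_advertise(
    session: str,
    advertiser_ip: str,
    external_next_hop: Optional[str] = None,
) -> str:
    """Pick the NEXT_HOP value carried in an Update.

    session           -- "eBGP" or "iBGP", the peering the Update is sent over
    advertiser_ip     -- address of the advertising router
    external_next_hop -- next hop learned from an external peer, or None when
                         the destination is inside the local AS
    """
    if session == "eBGP":
        return advertiser_ip          # rule 1: the advertiser's own interface
    if external_next_hop is None:
        return advertiser_ip          # rule 2: the router that originated the route
    return external_next_hop          # rule 3: the external peer, left unchanged


print(next_hop_to_advertise("eBGP", "192.0.2.1"))
print(next_hop_to_advertise("iBGP", "10.0.0.3"))
print(next_hop_to_advertise("iBGP", "10.0.0.3", external_next_hop="192.0.2.9"))
```

The third call shows why the next hop is not always a neighbor: the iBGP peer receives the external peer's address, which it must be able to reach through the IGP.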
Weight is a Cisco-specific BGP path attribute that applies only locally on the router. It is not sent to other routers. The weight is a number between 0 and 65535 that can be assigned to a route. The higher the weight, the more preferable the route is. On Cisco routers, the weight attribute is the very first attribute considered.
By default, all routes learned from a peer have a weight of 0, and all routes originated locally on the router have a weight of 32,768. If router A is receiving routes from routers B and C, router A could be configured to assign a weight of, let’s say 100, to routes received from B. As a result, routes from B will be preferred over routes from C. Why? Because routes received from B have a weight of 100 whereas routes from C have a default weight of 0. A higher weight is preferred.
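Here is the router A example as a tiny Python sketch. The dictionary layout is an assumption made for illustration; the weights are the defaults and the value of 100 mentioned above.

```python
DEFAULT_LEARNED_WEIGHT = 0     # routes learned from a peer
DEFAULT_LOCAL_WEIGHT = 32768   # routes originated locally on the router

# Router A's view: routes from B were assigned weight 100, C keeps the default.
routes = [
    {"from": "B", "weight": 100},
    {"from": "C", "weight": DEFAULT_LEARNED_WEIGHT},
]

best = max(routes, key=lambda route: route["weight"])
print("preferred neighbor:", best["from"])

# A locally originated route to the same destination would beat both.
routes.append({"from": "local", "weight": DEFAULT_LOCAL_WEIGHT})
best_with_local = max(routes, key=lambda route: route["weight"])
print("preferred with a local route present:", best_with_local["from"])
```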
Now, let’s take a look at how BGP decides what route is the best path to a destination.
BGP Decision Process
To talk about the BGP decision process, we need to cover the BGP Routing Information Base (RIB). The RIB consists of three parts:
- The Adj-RIBs-In is the routing database that stores unprocessed routing information learned from BGP Updates received from peers. The routes in the Adj-RIBs-In are considered feasible routes.
- The Loc-RIB contains the routes that the BGP router has selected by applying the BGP decision process and inbound policies to the routes in the Adj-RIBs-In table. These routes populate the routing information base (RIB or routing table) along with routes received from other routing protocols.
- The Adj-RIBs-Out holds the routes that result from the outbound routing policies and that the BGP speaker advertises in BGP Updates to its peers.
All three RIBs are virtual and don’t need to be separated at the implementation level as stated in RFC1771 Section 3.2:
“Although the conceptual model distinguishes between Adj-RIBs-In, Loc-RIB, and Adj-RIBs-Out, this neither implies nor requires that an implementation must maintain three separate copies of the routing information. The choice of implementation (for example, 3 copies of the information vs 1 copy with pointers) is not constrained by the protocol.”
Pro tip for Cisco network engineers:
- show ip bgp neighbor x.x.x.x received-routes shows the output of the Adj-RIBs-In table.
- show ip bgp neighbor x.x.x.x routes shows the output of the Loc-RIB table.
- show ip bgp neighbor x.x.x.x advertised-routes shows the output of the Adj-RIBs-Out table.
- show ip bgp shows the output of the Loc-RIB for all neighbors.
The BGP decision process applies incoming routing policies to the routes in the Adj-RIBs-In and places the selected or modified routes in the Loc-RIB. This decision process is divided into three phases:
- In Phase 1, the BGP decision process kicks in every time the router receives a BGP Update from a peer in a neighboring AS that contains a new route, a changed route, or a withdrawn route. Each feasible route is analyzed separately, and a number (nonnegative integer) is assigned to it that indicates the degree of preference for each route.
- Phase 2 is invoked after Phase 1 is completed. In Phase 2, the BGP decision process chooses the best route out of all the feasible routes to each destination and installs the best routes in the Loc-RIB. In Phase 2, loops are also detected by examining the AS_PATH and dropping any routes with the local AS number in their AS_PATH.
- After Phase 2 is completed, if the Loc-RIB has changed, Phase 3 is invoked. In Phase 3, the BGP decision process adds the appropriate routes to the Adj-RIBs-Out for further advertisement to peers. If needed, route aggregation happens in this phase.
The BGP decision process is a sequential set of rules that analyze the BGP attributes of many specific routes to the same destination to break the tie among them to select the best route.
IMPORTANT: If the IP address in the NEXT_HOP attribute is “unreachable,” the route is NOT selected and therefore won’t go through the BGP Decision Process.
So, a reachable NEXT_HOP is the first check.
The BGP decision process used by Cisco IOS is as follows:
- Prefer the route with the highest weight. Remember that the weight attribute is Cisco-specific and local to the router.
- If the weights are equal, prefer the route with the highest LOCAL_PREF value.
- If the LOCAL_PREF values are the same, prefer the route that was originated locally on the router and injected into BGP with the network or aggregate commands or through redistribution. Routes injected into BGP with the network command or redistribution are preferred over a local aggregate injected with the aggregate-address command.
- If the LOCAL_PREF is the same and routes were not locally originated, prefer the route with the shortest AS_PATH.
- If the AS_PATH length is the same, prefer the route with the lowest ORIGIN code. IGP (network statement) is preferred over EGP (obsolete) and EGP is preferred over Incomplete (redistribution).
- If the ORIGIN codes are the same, prefer the route with the lowest MULTI_EXIT_DISC (MED or Multi-Exit Discriminator) value. The MED is a numerical value, like a metric for IGP, that’s injected into the BGP advertisement and that, by default, is only compared when the routes come from the same neighboring AS.
- If the MED values are the same, prefer eBGP routes over Confederation eBGP routes, and prefer Confederation eBGP routes over iBGP routes. eBGP routes are normally preferred over iBGP routes because BGP wants traffic to exit the AS and not to stay within the AS.
- If the routes are of the same type (for example, all eBGP or all iBGP), prefer the route with the shortest path to the BGP NEXT_HOP. At this point, the BGP decision process considers the lowest IGP metric to the next-hop IP address.
- If the IGP metrics to the NEXT_HOP addresses are the same, they’re from the same neighboring AS, and BGP multipath is enabled with the maximum-paths command, install all the equal-cost routes in the Loc-RIB.
- If the routes are still equal and external, prefer the route that was received first (the oldest route). Keeping the older route reduces flapping because a newer route never displaces an established one at this step. This step is skipped if the bgp bestpath compare-routerid command is configured.
- If multipath is not enabled, prefer the route with the lowest BGP router ID. If route reflection is used, prefer the route with the lowest ORIGINATOR_ID.
- If the routes are still equal and route reflection is used, prefer the route with the shortest CLUSTER_LIST.
- If the routes are still equal, prefer the route advertised from the neighbor with the lowest IP address.
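Because the rules are applied strictly in order, most of the list above can be modeled as a single sort key. The Python sketch below is a simplified illustration, not Cisco's implementation: the `Route` class and its field names are assumptions, MED is compared as if all routes came from the same neighboring AS, and the confederation, multipath, oldest-route, and route-reflection steps are left out.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is preferred


@dataclass
class Route:
    neighbor_ip: str
    router_id: str
    as_path: Tuple[int, ...]
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = True
    igp_metric: int = 0
    next_hop_reachable: bool = True


def preference_key(route: Route) -> tuple:
    """Tuples compare element by element, which mirrors the step-by-step rules."""
    return (
        -route.weight,                                   # highest weight
        -route.local_pref,                               # highest LOCAL_PREF
        not route.locally_originated,                    # locally originated first
        len(route.as_path),                              # shortest AS_PATH
        ORIGIN_RANK[route.origin],                       # lowest ORIGIN code
        route.med,                                       # lowest MED
        not route.is_ebgp,                               # eBGP over iBGP
        route.igp_metric,                                # lowest IGP metric to next hop
        int(ipaddress.IPv4Address(route.router_id)),     # lowest router ID
        int(ipaddress.IPv4Address(route.neighbor_ip)),   # lowest neighbor address
    )


def best_route(candidates: List[Route]) -> Optional[Route]:
    """Drop routes with an unreachable next hop, then pick the most preferred."""
    usable = [route for route in candidates if route.next_hop_reachable]
    return min(usable, key=preference_key) if usable else None


via_as5 = Route("192.0.2.5", "192.0.2.5", (5, 6, 1, 2))
via_as9 = Route("192.0.2.9", "192.0.2.9", (9, 4, 2))
print(best_route([via_as5, via_as9]).as_path)   # shorter AS_PATH wins

# A higher LOCAL_PREF is checked earlier, so it overrides the longer AS_PATH.
preferred = Route("192.0.2.5", "192.0.2.5", (5, 6, 1, 2), local_pref=200)
print(best_route([preferred, via_as9]).as_path)
```

Using a tuple as the key means the first field that differs decides the winner, and later fields are never consulted.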
As you can see, BGP always selects “the best route” for each destination. Knowing the BGP decision process and the correct order of each step is essential to working with BGP.
Again, when a router has more than one alternative route to reach the same IP subnet (network and mask), the router has to select one of the routes as best. To make this selection, the router uses the BGP attributes that are attached to the various updates.
The following table gives you a top-down visual of the BGP decision process with the most common attributes checked:
|Attribute||Description||Preference|
|NEXT_HOP||IP address used by the BGP router to determine the outbound interface and immediate next-hop address that should be used to forward transit packets to the associated destination.||Is the next hop reachable? If yes, continue. If no, discard the route.|
|WEIGHT||Administrative value; local significance to the router. Cisco proprietary.||Highest value.|
|LOCAL_PREF||Exchanged between peers within an AS.||Highest value.|
|Self-Originated||Prefer paths originated locally on the router making the decision (you’ll see their next hop = 0.0.0.0 on the BGP table).||True.|
|AS_PATH||Minimize AS hops.||Shortest path.|
|ORIGIN||Prefer IGP-learned routes over EGP, and EGP over unknown/incomplete. IGP routes are those routes injected into BGP with the network statement under the router BGP process. EGP is not in use anymore. Unknown/incomplete routes are redistributed into BGP from another routing protocol (static routes, RIP, OSPF, EIGRP, etc.)||IGP is preferred over Incomplete.|
|MULTI_EXIT_DISC||Used externally to enter an AS.||Lowest value.|
|Neighbor Type||Prefer EBGP path over IBGP path.||EBGP.|
|IGP Cost||Prefer the path through the closest IGP neighbor. This IGP refers to the underlying Interior Gateway Protocol used such as OSPF or IS-IS.||Lowest IGP metric.|
|EBGP Peering||Prefer the oldest EBGP path.||Oldest route.|
|Router ID||Use BGP router ID.||Lowest.|
|Neighbor IP Address||Use the neighbor’s IP address.||Lowest.|
The first check that indicates a difference (tie-breaker) is then used for route selection, and no further testing is done down the list.
Final Thoughts
I covered what BGP is, autonomous systems, types of BGP peerings, BGP operation, path attributes, decision process, and BGP tables.
It is important to understand that BGP is a protocol that runs on routers with the purpose of exchanging route information and deciding what the best route is to reach a target subnet. BGP is very scalable.
Within an Autonomous System, BGP cannot run by itself. BGP needs an underlying Interior Gateway Protocol (IGP) such as OSPF and IS-IS. BGP does not provide the router with an exit interface. In addition to other attributes, BGP provides the IP address of the next hop. To determine the exit interface towards that next hop, the router needs to run an IGP.
Also, know that although BGP can handle a large number of routes, BGP does not converge as fast as EIGRP, OSPF, or IS-IS.
I hope this post was informative to you.
If you have any questions, you can use the comments section below.