NetBackup™ Backup Planning and Performance Tuning Guide
- NetBackup capacity planning
- Primary server configuration guidelines
- Media server configuration guidelines
- NetBackup hardware design and tuning considerations
- About NetBackup Media Server Deduplication (MSDP)
- MSDP tuning considerations
- MSDP sizing considerations
- Accelerator performance considerations
- Media configuration guidelines
- How to identify performance bottlenecks
- Best practices
- Best practices: NetBackup AdvancedDisk
- Best practices: NetBackup tape drive cleaning
- Best practices: Universal shares
- NetBackup for VMware sizing and best practices
- Best practices: Storage lifecycle policies (SLPs)
- Measuring Performance
- Table of NetBackup All Log Entries report
- Evaluating system components
- Tuning the NetBackup data transfer path
- NetBackup network performance in the data transfer path
- NetBackup server performance in the data transfer path
- About shared memory (number and size of data buffers)
- About the communication between NetBackup client and media server
- Effect of fragment size on NetBackup restores
- Other NetBackup restore performance issues
- Tuning other NetBackup components
- How to improve NetBackup resource allocation
- How to improve FlashBackup performance
- Tuning disk I/O performance
PCI architecture
The Peripheral Component Interconnect (PCI) and PCI-X architectures were the first step toward sending PCI signals quickly to peripheral cards such as Ethernet NICs, Fibre Channel and parallel SCSI host bus adapters, and RAID controllers, all of which enabled RAID storage and advanced connectivity for many servers in a network.
PCI-X, introduced in 1998, was a good start toward that goal. It was a parallel interface and used an expander to derive multiple "slots" from the signals sent by the CPUs. With a parallel architecture, signal timing had to be rigidly enforced because all signals needed to arrive or be sent at the same time. This restriction limited the overall speed and latency of the system to the frequency of the timing circuitry in the hardware. As market demand for speed kept increasing, maintaining this concurrent timing became more and more difficult.
PCIe came into being in 2002 and changed the Peripheral Component Interconnect with two features: serial communication and direct communication from the processor to the PCIe-enabled card (NIC, HBA, RAID controller, and so on). This allowed a significant increase in bandwidth because multiple PCIe lanes could be allocated to each card. As an example, Fibre Channel host bus adapters ran at 1 Gb over PCI-X in 1998; today, 22 years later, 16 Gb is the standard and 32 Gb is expected to surpass it within the next two years.
PCI-X at 133 MHz was the last widely supported speed. PCIe supplanted PCI-X at a data transfer speed of 800 MB/s. Today, PCIe 3 can achieve up to 15.754 GB/s with 16-lane cards. PCIe 4, which is available today on AMD processor systems and will be available on Intel-based systems in 2021, can reach 15.754 GB/s with 8-lane cards because PCIe 4 doubles the transfer rates of the current PCIe 3 architecture. The following table notes the speed capability of past and future versions. By 2026, the supported PCIe throughput is expected to increase 8-fold.
The number of PCIe lanes per processor is also expected to increase rapidly. A review of currently available processors shows that the race to add lanes is already on: the current Intel processor family provides 40 PCIe lanes, and AMD has countered with 128 lanes per processor.
Table: PCI Express Link performance
| Version | Introduced | Line code | Transfer rate | 1 lane | 2 lanes | 4 lanes | 8 lanes | 16 lanes |
|---|---|---|---|---|---|---|---|---|
| 1.0 | 2003 | 8b/10b | 2.5 GT/s | 0.250 GB/s | 0.500 GB/s | 1.00 GB/s | 2.00 GB/s | 4.00 GB/s |
| 2.0 | 2007 | 8b/10b | 5.0 GT/s | 0.500 GB/s | 1.00 GB/s | 2.00 GB/s | 4.00 GB/s | 8.00 GB/s |
| 3.0 | 2010 | 128b/130b | 8.0 GT/s | 0.985 GB/s | 1.969 GB/s | 3.938 GB/s | 7.877 GB/s | 15.754 GB/s |
| 4.0 | 2017 (now on AMD) | 128b/130b | 16.0 GT/s | 1.969 GB/s | 3.938 GB/s | 7.877 GB/s | 15.754 GB/s | 31.508 GB/s |
| 5.0 | 2019 (projected 2022) | 128b/130b | 32.0 GT/s | 3.938 GB/s | 7.877 GB/s | 15.754 GB/s | 31.508 GB/s | 63.015 GB/s |
| 6.0 | 2021 (projected 2024) | 128b/130b + PAM-4 + ECC | 64.0 GT/s | 7.877 GB/s | 15.754 GB/s | 31.508 GB/s | 63.015 GB/s | 126.031 GB/s |
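The throughput figures in the table follow directly from each generation's transfer rate and line-code overhead. The short Python sketch below is an illustration only (it is not part of the guide; the `PCIE_GENERATIONS` data and the `throughput_gbps` helper are names chosen here for clarity) and reproduces the per-lane and 16-lane values from the table:

```python
# Illustrative sketch: derive PCIe usable throughput from transfer rate and line code.
# Values follow the table above; this is not an official formula from the guide.

# (version, transfer rate in GT/s, encoding efficiency = payload bits / line bits)
PCIE_GENERATIONS = [
    ("1.0", 2.5,  8 / 10),     # 8b/10b encoding
    ("2.0", 5.0,  8 / 10),     # 8b/10b encoding
    ("3.0", 8.0,  128 / 130),  # 128b/130b encoding
    ("4.0", 16.0, 128 / 130),
    ("5.0", 32.0, 128 / 130),
    ("6.0", 64.0, 128 / 130),  # per the table; the 64 GT/s rate already reflects PAM-4 signaling
]

def throughput_gbps(transfer_rate_gt, efficiency, lanes):
    """Usable throughput in GB/s: GT/s x encoding efficiency / 8 bits per byte, x lanes."""
    return transfer_rate_gt * efficiency / 8 * lanes

for version, rate, eff in PCIE_GENERATIONS:
    per_lane = throughput_gbps(rate, eff, 1)
    x16 = throughput_gbps(rate, eff, 16)
    print(f"PCIe {version}: {per_lane:.3f} GB/s per lane, {x16:.3f} GB/s at x16")
```

Running the sketch prints, for example, 0.985 GB/s per lane and 15.754 GB/s at x16 for PCIe 3.0, matching the table above.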
With the advance in speed of CPU-to-peripheral communication, both latency (decreased lag) and data transfer rate (more data per second) have improved dramatically. Intel CPUs will increase from the present Skylake and Cascade Lake families with 40 PCIe 3 lanes to Ice Lake with 64 PCIe 4 lanes. As noted earlier, AMD has built its processors with 128 PCIe 4 lanes. The reason for this upward trend is that peripherals other than Ethernet, Fibre Channel, and RAID are quickly earning a place on the bus.
NVMe SSDs (Non-Volatile Memory Express solid-state drives) have quickly carved out a significant niche in the market. Their primary advantage is a direct PCIe connection to the processor. These SSDs do not require a SAS or SATA interface to communicate, which yields significant speed and latency advantages because the inherent media conversion is not needed. With PCIe 4 arriving and the number of PCIe lanes expanding, the speed of NVMe SSDs will double, increasing throughput and decreasing (slightly but measurably) access time.
The latest Intel and AMD motherboard designs accommodate the NVMe architecture as the primary storage of the future. Systems with the new architecture are expected in 2021, with densities up to 12.8 TB, speeds of 8,000 MB/s reads and 3,800 MB/s writes, and up to 24 SSDs per system. These new systems will be dramatically faster than earlier disk-based solutions, which can struggle to reach 10 - 12 GB/s reads or writes. The new architecture will also increase network reads and writes; a 200 GB/s read or a 100 GB/s write is not difficult to reach. For a sense of scale, a 30 TB backup at 0% deduplication would take about 8 seconds with the proper transport: 304 connections of 100 Gb Ethernet NIC ports. This is not the kind of network bandwidth we can expect in practice, but it is illustrative of the coming speeds.
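The arithmetic behind that illustration is simple aggregate-bandwidth math. The Python sketch below is a back-of-the-envelope aid only (the `backup_seconds` helper and the smaller 4 x 25 Gb comparison are assumptions added here, not figures from the guide) and assumes line-rate utilization with 0% deduplication:

```python
# Back-of-the-envelope sketch: backup transfer time over N Ethernet ports.
# Assumes full line-rate utilization and 0% deduplication; real-world results vary.

def backup_seconds(backup_tb, ports, port_gbit_per_s):
    """Transfer time in seconds for a backup of backup_tb terabytes."""
    total_gb_per_s = ports * port_gbit_per_s / 8   # aggregate bandwidth in GB/s
    return backup_tb * 1000 / total_gb_per_s       # 1 TB = 1000 GB

# The example from the text: 30 TB over 304 x 100 Gb Ethernet ports.
print(f"{backup_seconds(30, 304, 100):.1f} seconds")   # ~7.9 seconds

# A hypothetical smaller configuration (4 x 25 Gb ports) for comparison.
print(f"{backup_seconds(30, 4, 25):.1f} seconds")      # ~2400 seconds (~40 minutes)
```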
The PCIe roadmap calls for a move to PCIe 5.0 starting in 2022 and to PCIe 6.0 in 2024. This cadence appears rather optimistic given the history of previous PCIe revisions, as shown in the table above.
It should be noted, however, that the specifications for revisions 5.0 and 6.0 are well defined. Judging from the delays incurred in the 4.0 release, the most significant challenge appears to be signal routing on motherboards. It stands to reason that PCIe 5 and 6 will initially be relegated to very high-end systems, such as 4- and 8-socket systems, which can more adequately use the additional bandwidth and number of lanes.