2U Chassis Quad Opteron
Setting a LAN Throughput Record
In light of recent dramatic increases in network bandwidth and capacity, particularly the widespread deployment of gigabit and, more recently, 10-gigabit links, some observers have questioned whether current server technology can keep pace with the increased LAN capacity.
A major vendor of ultra high-speed network components asked Neal Nelson & Associates to profile a 2U server system and determine the maximum possible network traffic that this form factor could support with current technology.
There were three objectives set for the test:
1) Identify the best components available for a 2U chassis system.
2) Integrate the components into a working system and confirm proper operation.
3) Measure the maximum LAN throughput that could be supported by this system.
Market research revealed that at the time of the test, in the spring of 2005, the most powerful 2U form factor machine would use four AMD Opteron 850 CPUs. The AMD Opteron 850 CPU specifications are included in Appendix "A".
Tyan offered a four-CPU server motherboard as its model S4882. This motherboard had four banks of RAM sockets (one per CPU) and supported Non-Uniform Memory Access (NUMA) with AMD Direct Connect links between CPUs. The specifications for the S4882 are included in Appendix "B". Tyan also offered a chassis and power supply with the S4882 motherboard as a package, model Transport TX46. This package was selected on the presumption that Tyan's engineers had researched and properly sized the power supply and cooling fans.
The fastest memory supported by the S4882 was DDR 400 registered RAM. ECC DIMMs were purchased because it was felt that server-class machines in a production environment would install ECC memory. Two 512 MB DIMMs were installed for each CPU. The Tyan motherboard configures a single DIMM on a 64-bit wide bus; a pair of DIMMs is automatically configured as a 128-bit wide bus with bank interleave. This configuration was selected as the one most likely to deliver the best performance.
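As a rough illustration of why the paired-DIMM configuration was preferred, the peak theoretical bandwidth of a DDR 400 bank can be computed from the transfer rate and bus width. This is a back-of-the-envelope sketch, not a measured figure from the test.

```python
# Peak theoretical DDR bandwidth = transfer rate x bus width in bytes.
# Illustrative arithmetic only; real sustained bandwidth is lower.

def peak_bandwidth_gb_s(transfers_per_s: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s for a memory bus."""
    return transfers_per_s * (bus_width_bits / 8) / 1e9

single_dimm = peak_bandwidth_gb_s(400e6, 64)    # one DIMM, 64-bit bus
interleaved = peak_bandwidth_gb_s(400e6, 128)   # paired DIMMs, 128-bit interleaved bus

print(single_dimm)   # 3.2 (GB/s)
print(interleaved)   # 6.4 (GB/s)
```

Doubling the effective bus width doubles the theoretical peak, which is why a matched pair of DIMMs per CPU was installed rather than a single larger module.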
Substantial effort went into selecting the LAN cards. The chassis required low-profile cards. Preliminary testing revealed that cards with multiple Ethernet ports on a single card delivered higher throughput than multiple cards with one port each. The S4882 has two gigabit Ethernet ports on the motherboard, so for the initial configuration we installed a pair of dual-port PCI-X cards (one in each of the two PCI-X busses on the motherboard) and also used the two onboard gigabit ports. Later a third dual-port LAN card was installed and the onboard ports were left idle; the reasons for this are explained in the test results section below.
The dual port low profile cards that were selected were Intel Pro 1000MT Server cards. The Pro 1000MT cards have driver support for the “zero-copy” feature where data is moved directly from the application space to the adapter buffer without having to transfer through kernel buffers.
Another configuration option was the operating system to run. Initial testing was performed with Solaris x86, Red Hat Linux 9 (based on the 2.4 kernel) and SUSE LINUX Professional 9.2 (based on the 2.6 kernel). It was immediately apparent that the 2.6-based SUSE product delivered the highest throughput of the three.
Finally, since the test would perform FTP "gets" and "puts", an FTP server had to be selected. vsftpd supports the "zero-copy" feature, so it was chosen for the test.
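The "zero-copy" idea the Pro 1000MT driver and vsftpd exploit can be illustrated with the Linux sendfile() system call, which moves file data kernel-side from the page cache to the socket instead of copying it through a user-space buffer. The sketch below uses Python's os.sendfile wrapper on Linux; it is an illustration of the mechanism, not the vsftpd implementation itself.

```python
# Zero-copy file send: sendfile() transfers data directly from the file's
# page-cache pages to the socket, with no read()/write() round trip
# through a user-space buffer. Requires Linux.
import os
import socket

def serve_file_zero_copy(conn: socket.socket, path: str) -> int:
    """Send the whole file at `path` over `conn` with sendfile(); return bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        sent = 0
        while sent < size:
            # offset = sent resumes where the previous call stopped short
            sent += os.sendfile(conn.fileno(), f.fileno(), sent, size - sent)
    return sent
```

Under heavy multi-user FTP load, eliminating that extra copy per transfer saves both memory bandwidth and CPU time, which is why all three layers (NIC driver, kernel, FTP daemon) had to support it to reach peak throughput.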
The summary list of the server configuration is:

• Tyan S4882 motherboard in a Tyan TX46 2U chassis
• 4 AMD Opteron 850 2.4 GHz CPUs
• 8 DIMMs (2 per CPU), 512 MB per DIMM, DDR registered 400 MHz CL 2.5
• 3 Intel PRO 1000MT Dual Port Server Adapters (6 gigabit ports total)
• 2 PCI-X busses:
  2 LAN cards (4 ports) on the 100 MHz PCI-X bus
  1 LAN card (2 ports) on the 66 MHz PCI-X bus
• 36 GB SCSI disk
• Novell SUSE LINUX Professional 9.2
• vsftpd FTP server
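A quick sanity check on this layout is whether each PCI-X bus has the headroom to carry its ports at line rate. The sketch below assumes 64-bit PCI-X slots (as the S4882 provides) and compares each bus's theoretical peak against the aggregate gigabit line rate of the ports on it; it is illustrative arithmetic, not a measurement from the test.

```python
# Theoretical PCI-X peak = clock x bus width. Compare against the summed
# line rate of the gigabit ports on each bus (1 Gbit/s per port).

def pcix_peak_gbit_s(clock_mhz: float, width_bits: int = 64) -> float:
    """Theoretical peak of a PCI-X bus in Gbit/s."""
    return clock_mhz * 1e6 * width_bits / 1e9

buses = {
    "100 MHz bus, 2 cards": (100, 4),   # 4 gigabit ports
    "66 MHz bus, 1 card":   (66, 2),    # 2 gigabit ports
}
for name, (mhz, ports) in buses.items():
    peak = pcix_peak_gbit_s(mhz)
    print(f"{name}: {peak:.1f} Gbit/s peak vs {ports} Gbit/s of port line rate")
```

Both busses clear their port totals on paper (6.4 Gbit/s vs 4 Gbit/s, and about 4.2 Gbit/s vs 2 Gbit/s), though protocol overhead eats into the theoretical peak in practice.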
The test was conducted at the client/server testing facility located in the offices of Neal Nelson & Associates. This facility includes a test bed with 96 client machines. The test bed is a Linux cluster that has been configured to run under the control of the Neal Nelson Multi-Node Remote Terminal Emulator software package. In this configuration all of the RTE client machines can be directed to perform functions either independently or in a synchronized fashion. For this test it was determined that FTP "gets" and "puts" would be performed between the various client machines and the one server under test.
Since the server would be configured with multiple gigabit ports, the clients were divided into six sets of 16 machines. The 16 clients in a group were connected to an intermediate switch, and each of the six intermediate switches was connected to a gigabit port on the server. The following diagram shows the general relationship between the machines.
Diagram 1: Relationship of RTE nodes, hub/routers and server
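The fan-out in the diagram can be sketched in a few lines: 96 RTE clients partitioned into six groups of 16, one group per intermediate switch and server gigabit port. The client and switch names here are hypothetical placeholders.

```python
# Hypothetical naming; the point is the 96 -> 6 x 16 partitioning,
# one group of clients per intermediate switch / server gigabit port.
clients = [f"client{n:02d}" for n in range(96)]
groups = {f"switch{s}": clients[s * 16:(s + 1) * 16] for s in range(6)}

for switch, members in groups.items():
    print(switch, len(members))   # each switch carries 16 clients
```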
Each test cycle consisted of the following three steps:
1) RTE controlled “users” login from each of the client machines to the server machine.
2) RTE “users” alternately perform “gets” and “puts” for a specified time interval.
3) Data transfer and timing information is collected from each client node and analyzed to compute the total throughput during the test.
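Step 3 above reduces to summing the bytes each client moved during the timed interval and dividing by the interval length. The sketch below shows that aggregation under assumed inputs; the field layout is invented for illustration and is not the RTE package's actual data format.

```python
# Aggregate per-client transfer totals into overall throughput.
# `records` is a list of bytes-transferred figures, one per client,
# all covering the same timed interval.

def total_throughput_gbit_s(records, interval_s: float) -> float:
    """Total throughput in Gbit/s across all clients for one interval."""
    total_bits = sum(records) * 8
    return total_bits / interval_s / 1e9

# e.g. 96 clients each moving 500 MB during a 60-second window:
clients = [500_000_000] * 96
print(total_throughput_gbit_s(clients, 60))   # 6.4 (Gbit/s)
```

Collecting the counts on the clients rather than the server means the measurement includes only traffic that actually crossed the LAN.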
A large number of tests were conducted at the beginning of the test sequence. These tests were quite short (5-10 minutes) and were designed to assist with system configuration and tuning. The automated nature of the tests, combined with the process of making carefully controlled changes to the configuration and tunables, provided a very orderly path toward the maximum achievable throughput.
Some of the factors tested during this phase included: different versions of Unix/Linux, different brands and models of LAN cards, different numbers of cards plugged into a given PCI-X bus, the speed of the PCI-X bus (66, 100, 133 MHz), the width of the PCI bus (32/64 bits), throughput with 1, 2 and 4 CPUs, throughput with 1, 2 and 4 memory modules per CPU, processor affinity, total number of active users, size of data file transferred, different versions of LAN card driver software, LAN driver tunables (buffer sizes, interrupt limits, offloaded checksum calculations, etc.) and TCP/IP tunables.
Observations and Findings during Preliminary Tests
The onboard (Broadcom) LAN ports worked well for single user, single file transfers but under heavy multi-user load they generated very high numbers of interrupts and context switches. We could not find any tunable or other way to limit these factors so we switched to the Intel LAN cards that provided us with tunables for these factors.
Offloading the checksum process to the LAN cards reduced throughput with no visible decrease in host CPU utilization.
The "zero-copy" option is very important to achieve maximum throughput. The Intel PRO 1000MT Dual Port Server Adapters, the vsftpd server software and the SUSE LINUX 2.6 kernel all support this feature.
If a PCI-X bus is configured to run at 133 MHz it can accept only one card. If two cards are plugged in and they begin to process substantial amounts of data, the machine locks up. To run with two cards the bus must be strapped down to 100 MHz.
Attempts to manually control interrupt vectoring and processor affinity provided no measurable improvement. The default procedures in 2.6 Linux seem to automatically arrive at optimal values.
When the chassis was configured with 2 CPUs the highest throughput achieved was 4.8 gigabits per second. When the same chassis was configured with 4 CPUs the maximum throughput increased to 6.5 gigabits per second.
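The 2-CPU and 4-CPU figures above also show how far the workload is from linear scaling, which a couple of lines of arithmetic make explicit:

```python
# Scaling from 2 to 4 CPUs, using the measured figures from the text.
two_cpu, four_cpu = 4.8, 6.5        # Gbit/s, preliminary-test maxima
speedup = four_cpu / two_cpu        # throughput gain from doubling CPUs
efficiency = speedup / 2            # fraction of a perfect 2x gain

print(f"{speedup:.2f}x speedup, {efficiency:.0%} of linear scaling")
```

A roughly 1.35x gain from doubling the CPU count suggests the bottleneck was shifting away from the processors toward the I/O and network path at the high end.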
After the preliminary tests were complete and we felt that we had identified the proper values for all configuration and tuning options we ran a set of final tests. These tests were designed to prove that the machine would support the high throughput for extended periods of time. The longest of these final tests ran 48 hours and transferred over 1,000 terabits (one petabit) of data.
These final tests were the “official” tests and the following data was collected:
Transfer rate:                  6.5 gigabits/second
Total file transfers:           7,885,579 put/get commands
Total packets:                  136+ billion packets
Total dropped/overrun packets:  50 packets
Total data transferred:         1.058 petabits
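The reported totals hang together when cross-checked; the figures derived below (average rate, mean transfer size, drop rate) are computed from the table, not separately reported results.

```python
# Consistency checks on the reported 48-hour test totals.
total_bits = 1.058e15           # 1.058 petabits transferred
transfers  = 7_885_579          # put/get commands
packets    = 136e9              # "136+ billion" (lower bound)
duration_s = 48 * 3600          # longest final test: 48 hours

avg_gbit_s  = total_bits / duration_s / 1e9     # average over the full run
mb_per_xfer = total_bits / 8 / transfers / 1e6  # mean bytes per get/put
drop_rate   = 50 / packets                      # dropped/overrun fraction

print(f"{avg_gbit_s:.1f} Gbit/s average")
print(f"{mb_per_xfer:.1f} MB per transfer")
print(f"{drop_rate:.1e} drop rate")
```

An average of about 6.1 Gbit/s sustained over two days, against a 6.5 Gbit/s peak, and a drop rate on the order of one packet in a few billion, supports the claim that the configuration held its throughput for extended periods.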
Conclusion
A single 2U server machine built with quality components such as AMD Opteron CPUs, a Tyan Transport TX46 chassis/motherboard and Intel Pro 1000MT Dual Port Server Adapters, and configured with properly tuned system software, can support very high levels of network traffic.
Appendix A: AMD 850 CPU specifications
Appendix B: Tyan S4882 specifications