In this article I discuss the most important component of a deep learning PC: the graphics card. The question is which hardware trains neural networks the fastest. In the previous article I described the considerations that go into building such a system in the first place. Here I describe the advantages and disadvantages of the candidate graphics cards and which card I personally chose.

My goal was to build a computer for training neural networks that can stand under the desk in my office. It is not meant to be a machine that only runs already trained networks; for that purpose a current smartphone with an accelerator chip would be sufficient. It should be a computer on which neural networks can be trained on large amounts of data. Because it will stand under my office desk, all components have to be selected so that the computer is as quiet as possible without the costs getting completely out of hand.
Decision – graphics card
I started with the selection of the graphics card. In principle, the NVIDIA RTX 3090 models are very interesting, as are the cards of the Quadro series such as the new RTX A6000 or the RTX 8000. The latter has been on the market for a while and has already proven itself at my workplace.
Note: If a much cheaper card like an RTX 3090 is chosen, the manufacturer of the graphics card does not play a big role for deep learning applications, because no special drivers or optimization tools are needed, as is sometimes the case for games or as gamers themselves recommend. What matters is the NVIDIA chip itself and that the card works technically flawlessly, i.e. stays nice and cool. Good cooling of the graphics card therefore has to be the focus.
Therefore, I have chosen models from PNY for the graphics cards listed here. I have already had good experiences with this manufacturer. Keep in mind that PNY is a product partner of NVIDIA, i.e. PNY equips its own graphics cards with NVIDIA graphics chipsets. Within the scope of this partnership, PNY is, among other things, the sole board partner and thus manufacturer for the Quadro series, i.e. NVIDIA’s own graphics card series (source: https://de.wikipedia.org/wiki/PNY_Technologies).
Note: The models shown below link to the Amazon web store via affiliate links.
PNY GeForce RTX™ 3090 24GB XLR8 (I use similar RTX 3090 models in my professional environment)
The RTX 3090 series is ideal for projects that require a lot of computing power but whose models do not exceed the card's 24 GB of memory during training. This graphics card should therefore be completely sufficient for most cases in the professional and private environment. In science or industry, 24 GB is often too little when many projects run in parallel or several employees train their models at the same time on one deep learning server. In this case, cards with 48 GB of RAM, like the current Quadro series, are the better choice.
Therefore, my recommendation is to get consumer graphics cards from the RTX 30XX series, such as the RTX 3090, if you want to get started with neural networks in a private environment.
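Whether a model stays below the 24 GB mentioned above can be estimated roughly in advance. The following is a minimal sketch under stated assumptions, not a measurement: it assumes FP32 training with an Adam-style optimizer (one weight, one gradient and two optimizer states per parameter, i.e. 16 bytes per parameter). The activation memory is a hypothetical value that you have to supply yourself, and the function names are purely illustrative.

```python
GIB = 2**30  # bytes per GiB


def estimate_training_memory_gib(num_params, bytes_per_value=4,
                                 optimizer_states=2, activations_gib=0.0):
    """Rough memory estimate for training, in GiB.

    Assumption: every parameter needs one weight, one gradient and
    `optimizer_states` extra values (2 for Adam-style optimizers).
    Activation memory depends on batch size and architecture and has
    to be supplied by the caller.
    """
    values_per_param = 1 + 1 + optimizer_states
    static_bytes = num_params * values_per_param * bytes_per_value
    return static_bytes / GIB + activations_gib


def fits_on_card(num_params, card_memory_gib, **kwargs):
    """True if the estimate stays within the memory of the card."""
    return estimate_training_memory_gib(num_params, **kwargs) <= card_memory_gib


if __name__ == "__main__":
    # The 4 GiB for activations is a made-up placeholder.
    for params in (350_000_000, 1_000_000_000, 2_000_000_000):
        need = estimate_training_memory_gib(params, activations_gib=4.0)
        print(f"{params:>13,d} parameters: ~{need:.1f} GiB, "
              f"24 GB card: {fits_on_card(params, 24, activations_gib=4.0)}, "
              f"48 GB card: {fits_on_card(params, 48, activations_gib=4.0)}")
```

Under these assumptions a model with one billion parameters still fits on a 24 GB card, while a model with two billion parameters already calls for a 48 GB card. Mixed precision or a different optimizer changes the numbers noticeably, so the result is only a first orientation.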
|Clock Speed|1395 MHz|
|Boost Speed|1695 MHz|
|Memory Speed (Gbps)|19.5|
|Memory Size|24GB GDDR6X|
|Memory Bandwidth (GB/sec)|936|
|Outputs|DisplayPort 1.4 (x3), HDMI 2.1|
|Power Input|Two 8-Pin|
|Bus Type|PCI-Express 4.0 ×16|
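The memory bandwidth in the table follows from the memory speed. As a small worked check: bandwidth is the data rate per pin multiplied by the width of the memory bus, divided by 8 to convert bits into bytes. The 19.5 Gbps come from the table; the 384-bit bus width is not listed there and is an assumption I add here.

```python
def memory_bandwidth_gb_per_s(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s: data rate per pin * bus width / 8."""
    return data_rate_gbps * bus_width_bits / 8


# 19.5 Gbps is taken from the table above.
# The 384-bit bus width is an assumption that the table does not list.
rtx_3090_bandwidth = memory_bandwidth_gb_per_s(19.5, 384)
print(rtx_3090_bandwidth)  # 936.0
```

The result matches the 936 GB/sec given in the table.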
Here is the link to the PNY manufacturer page for this graphics card: GeForce RTX™ 3090 24GB XLR8 Gaming REVEL EPIC-X RGB™ Triple Fan Edition
NVIDIA Quadro RTX 8000 (I use similar RTX 3090 models with water cooling in my professional environment)
I have had the pleasure of using a few PCs with PNY Quadro RTX 8000 graphics cards at work. These cards are powerful and currently readily available. They have always performed well when training neural networks, running up to 12 jobs in parallel. A Quadro RTX 8000 is therefore a good choice for requirements in the field of artificial intelligence.
What I noticed about the card is that the air-cooled version is quite quiet. However, I don’t like the small perforated plate through which the hot air flows to the outside. The manufacturer could find a more airflow-optimized solution here, as in the brand new RTX A6000. If the card belonged to me privately, I would probably try to simply remove the small perforated plate so that the hot air can escape more directly and thus faster and more quietly.
The following picture compares the air outlets of the Quadro RTX A6000 and the Quadro RTX 8000.
Below are the technical details for those who know a bit about the hardware that characterizes such a card.
|Rays Cast|10 Giga Rays/Sec|
|Peak Single Precision FP32 Performance|16.3 TFLOPS|
|Peak Half Precision FP16 Performance|32.6 TFLOPS|
|Peak INT8 Performance|206.1 TOPS|
|Deep Learning TFLOPS|130.5 Tensor TFLOPS|
|GPU Memory|48 GB GDDR6 with ECC|
|Memory Bandwidth|672 GB/sec|
|System Interface|PCI Express 3.0 x16|
|Display Connectors|4x DisplayPort 1.4 + VirtualLink|
NVIDIA Quadro RTX A6000 (I use this card privately)
The successor of the RTX 8000, with a bit more performance, is the RTX A6000, which is now available as the latest model from NVIDIA or PNY. I personally got the chance to try out such a model from NVIDIA. My tests with the card, and especially the comparison to the RTX 8000, will follow here. The nice thing about the RTX A6000 is that it also has 48 GB of RAM and can thus train many projects, i.e. neural networks, in parallel in the commercial sector.
Here is a picture of the card, still covered with its protective film, shortly after I received it from NVIDIA.
Note: I keep writing that the 48 GB are great, for example for training several or very large neural networks in parallel. However, the training pipeline that supplies the GPU with training data must also be programmed accordingly. If it is not, the CPU cannot supply the GPU with data fast enough. A lot of RAM on the graphics card therefore does not automatically make training fast, and speed is what it is all about when you choose such a professional model. I will show an example of a bottleneck in the training pipeline later in this series.
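To make the idea of such a bottleneck a bit more concrete, here is a minimal sketch using only the Python standard library. It is not the example announced for later in this series, just an illustration under stated assumptions: `slow_loader` and `run_epoch` are hypothetical stand-ins that simulate CPU-side loading and a GPU training step with `time.sleep`, and `prefetch` is an illustrative helper that loads the next batches in a background thread while the current one is being processed.

```python
import queue
import threading
import time


def prefetch(batches, buffer_size=4):
    """Yield batches while a background thread already loads the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # marks the end of the data

    def producer():
        try:
            for batch in batches:
                buffer.put(batch)  # blocks while the buffer is full
        finally:
            buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch


def slow_loader(num_batches, load_seconds):
    """Hypothetical stand-in for reading and preprocessing data on the CPU."""
    for index in range(num_batches):
        time.sleep(load_seconds)
        yield index


def run_epoch(batches, step_seconds):
    """Hypothetical stand-in for the GPU training step; returns elapsed seconds."""
    start = time.perf_counter()
    for _ in batches:
        time.sleep(step_seconds)
    return time.perf_counter() - start


if __name__ == "__main__":
    plain = run_epoch(slow_loader(20, 0.02), 0.02)
    overlapped = run_epoch(prefetch(slow_loader(20, 0.02)), 0.02)
    print(f"without prefetch: {plain:.2f} s, with prefetch: {overlapped:.2f} s")
```

Without prefetching, loading and training take turns, so the simulated GPU sits idle during every load. With prefetching the two overlap and the epoch should take roughly as long as the slower of the two stages. Note that `time.sleep` releases Python's global interpreter lock; real CPU-heavy preprocessing usually needs several worker processes instead of a thread, which is what options like `num_workers` in PyTorch's `DataLoader` provide.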
Currently, the RTX A6000 is not yet listed as a regular product on Amazon. Therefore, here is the link to PNY’s manufacturer page.
|Single Precision Performance|38.7 TFLOPS|
|RT Core Performance|75.6 TFLOPS|
|Tensor Performance|309.7 TFLOPS|
|GPU Memory|48 GB GDDR6 with ECC|
|Memory Bandwidth|768 GB/sec|
|System Interface|PCI Express 4.0 x16|
|Display Connectors|4x DisplayPort 1.4a|
|Maximum Power Consumption|300 W|
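The datasheet values of the RTX 8000 and the RTX A6000 can be put side by side with a few lines of code. This is only a sketch that divides the numbers from the two tables in this article; the dictionary keys are names I made up, and peak values from datasheets may have been determined under different conditions, so the ratios are not measured training speedups.

```python
# Datasheet values copied from the two tables in this article.
rtx_8000 = {"fp32_tflops": 16.3, "tensor_tflops": 130.5, "bandwidth_gb_s": 672}
rtx_a6000 = {"fp32_tflops": 38.7, "tensor_tflops": 309.7, "bandwidth_gb_s": 768}


def spec_ratios(new, old):
    """Return new / old for every specification both cards share."""
    return {key: new[key] / old[key] for key in new if key in old}


ratios = spec_ratios(rtx_a6000, rtx_8000)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On paper the peak compute values are about 2.37 times higher, while the memory bandwidth grows by only about 14 %. Workloads that are limited by memory bandwidth will therefore probably gain less than the peak numbers suggest.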
The next article covers the very important choice of the case, the power supply and the CPU, which in turn determines the other components such as the motherboard and the CPU cooling.
The following picture shows my computer from the inside with the installed NVIDIA RTX A6000 graphics card.
Conclusion – graphics card
Most tasks can be handled with an RTX 3090. If that is not enough, a second RTX 3090 can be installed in most motherboards. You then have performance similar to one of the big RTX Quadro cards, but you lose the option of installing further cards. Above all, you can speculate a bit on prices dropping again and save money. In the professional field, I would always recommend a model from the Quadro series, if only because of the parallelization that the large memory allows. Here, too, two Quadro cards can be installed quite simply in most motherboards as long as the power supply, at approx. 1200 W to 1600 W, delivers sufficient power. At my employer, we also built a system from standard PC components and installed three Quadro RTX 8000 cards as a test. However, there was hardly any space left for air cooling on the third graphics card due to the motherboard design.
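How large the power supply has to be can be estimated with simple arithmetic. The following is a rough sketch, not a sizing rule: the 300 W per graphics card is the maximum power consumption from the RTX A6000 table, whereas the values for the CPU, for the remaining components and the 30 % headroom are hypothetical numbers chosen only for illustration.

```python
def recommended_psu_watts(gpu_watts, num_gpus, cpu_watts, other_watts,
                          headroom_percent=30):
    """Sum the peak draw of all components and add a safety margin."""
    peak = gpu_watts * num_gpus + cpu_watts + other_watts
    return peak * (100 + headroom_percent) / 100


# 300 W per GPU is the RTX A6000 maximum from the table in this article.
# cpu_watts, other_watts (drives, fans, motherboard) and the 30 % headroom
# are hypothetical values for illustration only.
for gpus in (1, 2, 3):
    watts = recommended_psu_watts(300, gpus, cpu_watts=280, other_watts=100)
    print(f"{gpus} GPU(s): about {watts:.0f} W")
```

With these assumed values, two cards end up at about 1274 W, which lies within the 1200 W to 1600 W mentioned above, while three cards come out at about 1664 W. With your own component values the result will differ.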
Article Overview - How to Build a PC for Deep Learning Tasks
The following articles describe the construction of a PC system for Deep Learning tasks.
Build a Deep-Learning PC yourself - a step by step guide
Build a Deep-Learning PC yourself – Selection of graphics card
Build a Deep-Learning PC yourself - Selection of the operating system