Nvidia RTX 30 Series Graphics Cards: What's New in Ampere Architecture Compared to Turing?

Buzz

Content

GA102 and GA104

Ampere SM: FP32 processing performance doubles

GDDR6X: Weaponizing Data Bandwidth to Nearly 1TB/s

DLSS 2.0

HDMI 2.1 and AV1 Codec

Dual axial cooling

Nvidia Reflex

Inside every GPU equipped on the recently launched RTX 3080 or 3090 graphics cards is a second-generation ray tracing processing core and third-generation Tensor Cores for deep learning processing. Alongside this are the new streaming multiprocessors (SM) dedicated to providing smoother gaming experiences and higher graphics quality. For streamers and content creators, the RTX 30 series also offers specific features developed by Nvidia, which will be discussed in detail in this article.

However, it must be acknowledged that the RTX 30 series boasts several 'firsts': the first commercial graphics card equipped with GDDR6X VRAM, the first graphics card to support HDMI 2.1 signal output, enabling gaming at 4K 120Hz or 8K 60Hz, and the Founder’s Edition of the RTX 30 series is also the first generation card to be equipped with a dual axial flow cooling mechanism, providing efficient cooling for the already powerful GPU on the motherboard.

GA102 and GA104

The transistor count on the Ampere architecture is nearly double that of the Turing architecture (TU102 GPU). The transistor density of GA102 is 44.6 million transistors per square millimeter of semiconductor chip, compared to 24.67 million transistors per square millimeter for TU102. This achievement is also attributed to Samsung's 8nm process.

Each SM mentioned above is equipped with 4 tensor cores and 1 ray tracing core. Tensor cores handle deep learning tasks, one of the most important being DLSS processing on supported games. Instead of conventional anti-aliasing, DLSS uses deep learning algorithms to process low-resolution image samples, thereby upscaling the image to a higher resolution while maintaining graphic quality. As for ray tracing, in recent years, titles like Control, Battlefield V, and Metro Exodus have all recognized the potential of this image processing technology. Instead of relying on TPCs to handle ray tracing, the RTX 30 series has dedicated RT cores to process lighting, shadows, creating the most realistic virtual worlds.

On the RTX 3080, the diagram clearly depicts a 5MB L2 cache, while for the RTX 3090 it's 6MB shared across all GPC clusters. The GA102 on the RTX 3080 is equipped with 10 32-bit controllers and a 320-bit bus, whereas on the RTX 3090, it's 12 32-bit controllers with a 384-bit bus. Similarly, with the GA104 GPU on the RTX 3070 set to launch on October 29th. This GPU features 46 SMs, totaling 5888 CUDA cores, 184 Tensor cores, and 46 RT cores. The GA104 boasts a 4MB L2 cache shared across all GPC clusters, along with 8 32-bit controllers, resulting in a 256-bit memory bandwidth.

Ampere SM: FP32 processing performance doubles

Perhaps no one explains this more aptly and understandably than Tony Tamasi, Nvidia's Vice President of Technology: “One of the goals in designing SMs on the Ampere architecture is to achieve double the FP32 processing speed compared to SMs on Turing GPUs. To accomplish this, the Streaming Multiprocessors of the Ampere GPU are designed with new data paths to handle FP32 and INT32. One data path in each partition consists of 16 CUDA FP32 cores, with each processing clock capable of 16 FP32 operations. Another data path consists of 16 CUDA FP32 cores and 16 CUDA INT32 cores. The result of this new design is that each SM partition on the Ampere GPU can process either 32 FP32 operations or 16 FP32 + 16 INT32 operations per clock. Four SM partitions combined can process 128 FP32 operations per clock, or 64 FP32 + 64 INT32 operations, doubling the processing performance over Turing GPU SMs.”

Doubling the FP32 processing performance helps increase the efficiency of graphics tasks, common algorithms. Shader tasks in games are typically FP32 algorithm operations such as FFMA, FADD, or FMUL, combined with simpler integer operations for data access, floating-point comparison, or setting minimum/maximum values for processing results,… One of the most evident advantages of doubling the FP32 processing performance is the ability to process ray tracing denoising shaders. Doubling the computing performance means doubling the supported data paths, which is why the SMs on the Ampere GPU also double the memory and L1 cache performance for each SM. The total L1 cache bandwidth on the RTX 3080 is 219 GB/s compared to 116GB/s on the RTX 2080 Super.”

GDDR6X: Weaponizing Data Bandwidth to Nearly 1TB/s

If Ampere GPU was a beast, then the GDDR6X VRAM produced by Micron equipped on the RTX 3080 and 3090 also helps make gaming performance faster and stronger than the previous generation of graphics cards. Firstly, the new VRAM technology allows processing twice the amount of input and output data. On the RTX 3090, GDDR6X helps the card achieve a data bandwidth of 1TB/s, thereby contributing to processing high-resolution textures and character models sharply enough to play games at 4K or 8K resolutions.

Micron's GDDR6X itself achieves theoretical speeds of up to 21 Gbps, although on the RTX 3090, the VRAM speed is 19.5 Gbps, but it's still enough to impress gamers diving into the stunning virtual worlds of their favorite games.

DLSS 2.0

Thanks to the support of Deep Learning Super Sampling (DLSS), Nvidia has been able to harness the power of AI to assist in the rendering process, creating what's known as super resolution. Here, only a few pixels need to be rendered, and then AI is used to reconstruct the images with higher sharpness and resolution without requiring as much hardware as before. Nvidia states that on GPUs equipped with Tensor cores designed to run AI tasks, DLSS is used to boost frame rates while still ensuring beautiful, crisp images in games. This allows users to push ray tracing settings to the highest possible levels while also increasing the resolution. Curious how amazing DLSS is? Take a look at these two screenshots I captured while reviewing the PC version of Death Stranding. The left image is with DLSS, and the right one is with TAA + FidelityFX Sharpening:

DLSS 2.0 delivers image quality equivalent to native resolution while only needing to render 1/4 or 1/2 of the pixels. NVIDIA claims this technology uses temporal feedback techniques to enhance image detail and sharpness while stabilizing each frame. With DLSS 2.0, AI utilizes Tensor cores more efficiently to double computational speed compared to before, thereby boosting frame rates and overcoming previous limitations on GPU, Game Settings, and resolution.

HDMI 2.1 and AV1 Codec

The rapid evolution of the global TV industry has outpaced the development of HDMI connection standards. While 4K and 8K displays are becoming increasingly common, older connection cables and standards like HDMI 2.0 struggle to meet the demand. With HDMI 2.0, the image output to the screen is limited to 4K HDR 98Hz. That's why the RTX 30 series supports HDMI 2.1 standard, with a bandwidth of 48 Gbps, enabling 8K image output to the screen with just one HDMI cable and allowing gaming at 4K 120Hz, or 8K 60Hz. Additionally, the support for AV1 standard by the RTX 30 series will also allow streamers to transmit image signals to their streaming channels with up to 50% reduced bandwidth compared to the H.264 standard, enabling 4K game streaming, which has been quite rare until now.

Dual axial cooling

What truly impresses me is the heatsink system and the pair of much more complex heat spreaders compared to the RTX 2080 Ti, equipped by Nvidia for the RTX 3080 and 3090. Nvidia has opted for a dual push-pull fan system along with a nano carbon-coated aluminum alloy heatsink. The bottom fan draws in fresh air, while the top fan blows air out to cool the GPU. Nvidia claims that this system increases cooling airflow by 55% and is much cooler than the GPU on the RTX 2080 Ti. Additionally, the triple-slot PCIe design helps the RTX 3090 operate 10 times quieter than the previous generation RTX Titan.

Nvidia Reflex

This Nvidia technology targets the eSports gaming community, reducing display input latency during professional matches. Game latency comes from both hardware and internet connection. Nvidia cannot fix the latter, only upgrade to a stronger connection. But with system latency, Nvidia has a solution. This latency occurs between the time you click the mouse and the gun fires in-game. During that time frame, a lot happens inside your computer: The delay from when you click the mouse to when the computer receives the input command, the delay from when the computer finishes rendering one frame to the next, and finally the delay from when a frame is rendered to when it's displayed on your computer screen. These time intervals are measured in milliseconds, but sometimes they also determine whether you hit or miss your opponent. A simple example of how system latency affects gaming is like this: Ignoring transmission delay (as tested in Valorant's practice mode) and even ignoring reflex delay (only counting the time from clicking the mouse). The result is 38ms. Taking Valorant running at a 'decent' speed of 150 FPS, the ideal latency between each frame would be 6.7ms. 38ms is equivalent to almost 6 frames, leaving enough time for an opponent to jump through a narrow door gap before you can react: Nvidia's goal with Reflex is to reduce the negative impact of system latency, using software and optimized drivers to reduce latency, making your in-game actions much more precise. The Reflex software suite allows game developers to optimize both the game development engine to minimize latency when the GPU renders a frame, eliminating the 'queue' of frames waiting to be rendered by the GPU, as well as reducing the CPU's impact on scenes that require more GPU power. Currently, Nvidia Reflex is supported by three games: Apex Legends, Fortnite, and Valorant, but in the future, other games like CoD: Black Ops Cold War, Destiny 2, or Mordhau will also adopt this technology. Graphics cards ranging from GTX 970 to RTX 3090 will all support Reflex. And if a game is not supported by developers, you can enable this feature yourself in the Nvidia Control Panel, under “Manage 3D Settings,” then navigate to “Low Latency Mode” and select Ultra.

Nvidia Broadcast

Utilizing tensor cores in RTX graphics cards to process AI, streamers or those working from home can create gaming streams or remote workspaces right from their bedrooms through software applications with three excellent deep learning features. Apart from the graphics cards, I believe this is Nvidia's most effective AI application product for consumers, second only to DLSS. Remember RTX Voice, the app that filtered out background noise like traffic, fans, or noisy chatter from your microphone input? Well, now it's one of the three main features in Nvidia Broadcast. The other two features include creating virtual backgrounds similar to Zoom but with more precise and beautiful background removal, as well as automatically tracking the user's head movements to keep the streamer's face centered (Nvidia describes it as having a virtual cameraman closely following every move you make).

Get a new card, get a new game, absolutely free

All the advantages and analysis I've described above when you get an RTX 3080 or 3090 will mean nothing if they aren't accompanied by games that fully support the latest technologies like DLSS 2.1 or ray tracing. The good news is, simply purchasing an RTX 3070, 3080, or 3090 graphics card, regardless of the brand, or systems equipped with RTX 30 series graphics cards, will entitle you to a free Watch Dogs: Legion license key. The game unfolds in the aftermath of a terrorist bombing that forces the government of the UK capital, London, to rely on the private company Albion to keep the citizens safe. However, the goal of Albion's director, Nigel Cass, is to dominate London, and you will join a group of resistance fighters to reclaim London from his tyrannical grip. The beauty of Watch Dogs Legion is that you won't play as a specific main character, but virtually all the NPC characters in the open world of London can be controlled by you. From a construction worker, a taxi driver, to a hacker, former spy, or even dissatisfied Albion soldiers with the company's treatment of civilians, all can be recruited to become operatives to sabotage Albion's ambitions. Each character has their strengths, each with a distinct playstyle suitable for different missions. The more members recruited, the more freedom you have in gameplay. However, the gameplay combination of gunfights and the use of tech gadgets to gain advantages, as seen in the previous two Watch Dogs versions, still exists. Watch Dogs: Legion will maximize ray tracing and reflection processing technology through RT cores, utilize DLSS technology to enhance game processing performance, leading to higher frame rates, and display HDR images for the best graphical presentation on screens and TVs that support HDR 10 standards.

Mytour's content is for customer care and travel encouragement only, and we are not responsible.

For errors or inappropriate content, please contact us at: [email protected]