GTX 980 and 970: 28nm Maxwell Version 2
Almost a year ago, it seemed that the last bits of performance had been squeezed out of the 28 nm process for GPU manufacturing. AMD's R9 290X and Nvidia's GTX 780 Ti were based on monstrous chips with a 250 W TDP each. Later advances at the high end were limited: Titan Black was barely faster than GTX 780 Ti, R9 295X2 was a water-cooled dual-GPU monster, and the dual-GPU Titan Z was a waste of money. As TSMC has been ramping up 20 nm production this year, it seemed that only the smaller transistors would bring further performance improvements.
I clearly underestimated the ingenuity of Nvidia's engineers. Early hints came earlier this year in the form of the Maxwell-based GTX 750 and 750 Ti, which use the GM107 chip. The new architecture and a more mature process allowed these cards to become the new leaders in performance per watt. Back then, Maxwell only brought greater efficiency and nice overclocking potential to the table. The output ports and DirectX hardware feature support remained the same as in the previous Kepler cards. Now Nvidia has released the much more powerful GeForce GTX 970 and GeForce GTX 980 cards, based on the GM204 chip and a considerably updated Maxwell architecture. While their performance is not an extreme leap forward, they deliver it at much lower prices and power consumption, and they add many new features.
With Maxwell, Nvidia prioritises mobile platforms and designs its GPUs from the bottom up, unlike in previous generations. Intel's similar priorities when designing Haswell left many PC gamers disappointed with the minimal improvement in top CPU performance, although there is finally a more affordable hexa-core Haswell-E part. Fortunately, GPUs are mostly used for easily parallelised tasks, so a power-efficient building block scales up well: a single Kepler SMX powers the mobile Tegra K1, while 15 of them provide the high-end performance of GTX 780 Ti. Thus, the focus on power efficiency allows Nvidia to improve the performance of desktop parts at the same time.
GTX 980 and GTX 970 are based on the new GM204 GPU. Nvidia positions it as a successor to the GK104 used in its GTX 680 line, rather than to the GK110 from GTX 780. This strongly suggests that we can expect even more powerful Maxwell-based parts in the future. GM204 has 5.2 billion transistors, more than GK104 but fewer than GK110 or AMD's Hawaii (R9 290X). GTX 980 uses the full version of the chip, with 2048 CUDA cores (16 SMM units) clocked at 1126 MHz, with a 1216 MHz boost. GTX 970 differs mostly in core count and frequency: 1664 cores (13 SMM units), a 1050 MHz base clock and a 1178 MHz boost clock. The memory system of both cards is identical: 64 ROPs, a 256-bit bus and 4 GB of 7 GHz GDDR5 VRAM. With all that, the reported TDPs are quite nice: 165 W for GTX 980 and 145 W for GTX 970.
The reference model of GTX 980 uses a cooler very similar to the one on Titan and GTX 780, but without the vapour chamber used on the 250 W parts. Unfortunately, Nvidia has also decided to focus on acoustics, so the reference card is very quiet but can throttle slightly under prolonged high load. GTX 970 will come in custom designs from the get-go, but its hard launch is two weeks away. The GTX 980 MSRP is $550, which unfortunately seems to be becoming the new standard in place of the $500 price point. GTX 970 is much more affordable at $330.
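The core counts and clocks quoted above can be turned into a rough theoretical throughput figure. The sketch below is only back-of-the-envelope arithmetic: it assumes 128 CUDA cores per Maxwell SMM and 2 floating-point operations per core per clock (one fused multiply-add), and the function names are made up for illustration.

```python
# Assumptions (not stated in the article): 128 CUDA cores per SMM,
# and 2 FLOPs per core per clock (one fused multiply-add).
CORES_PER_SMM = 128
FLOPS_PER_CORE_PER_CLOCK = 2


def cuda_cores(smm_units):
    """Total CUDA cores for a given number of SMM units."""
    return smm_units * CORES_PER_SMM


def peak_gflops(smm_units, clock_mhz):
    """Theoretical single-precision peak in GFLOPS at the given clock."""
    return cuda_cores(smm_units) * FLOPS_PER_CORE_PER_CLOCK * clock_mhz / 1000.0


# (SMM units, base clock in MHz, TDP in watts), as quoted in the text.
cards = {"GTX 980": (16, 1126, 165), "GTX 970": (13, 1050, 145)}

for name, (smm, base_mhz, tdp_w) in cards.items():
    gflops = peak_gflops(smm, base_mhz)
    print(f"{name}: {cuda_cores(smm)} cores, "
          f"{gflops:.0f} GFLOPS peak, {gflops / tdp_w:.1f} GFLOPS per watt of TDP")
```

These are paper numbers at base clock; real cards spend much of their time at boost clocks, and games rarely reach the theoretical peak.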
Looking at the numbers alone, the new Maxwell cards may not seem to offer performance beyond yesterday's top end. However, all the little details that went into the twofold increase in performance per watt, together with the changes in memory access, make an enormous difference. GTX 980 outperforms GTX 780 Ti in almost every case, sometimes by 10%-20%. Even at 4K resolution (3840x2160), the limitations of the 256-bit bus show up too rarely to hamper the card. The combination of a much larger cache and improved compression techniques allows Maxwell to improve the efficiency of bandwidth use by 25% on average. However, as 7 GHz is near the limit for GDDR5, a 384-bit memory bus would benefit Maxwell as well. GTX 980 is the new single-GPU performance king, while being very quiet and low-power for a high-end card.
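To put the bus width discussion into numbers, here is a small sketch of the usual peak-bandwidth arithmetic for GDDR5 (bus width times effective data rate per pin). The way the 25% figure is applied is an assumption: it treats the improvement as each transferred byte carrying 25% more useful data, which is only one possible reading of the claim.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


raw_256 = bandwidth_gb_s(256, 7.0)   # 256-bit bus with 7 GHz (effective) GDDR5
raw_384 = bandwidth_gb_s(384, 7.0)   # a hypothetical 384-bit bus at the same speed

# Assumption: "25% more efficient" is modelled as a 1.25x multiplier on useful data.
ASSUMED_EFFICIENCY_GAIN = 1.25
effective_256 = raw_256 * ASSUMED_EFFICIENCY_GAIN

print(f"256-bit raw:        {raw_256:.0f} GB/s")
print(f"256-bit 'effective': {effective_256:.0f} GB/s (under the stated assumption)")
print(f"384-bit raw:        {raw_384:.0f} GB/s")
```

Under this reading, the compression narrows the gap to a 384-bit bus but does not close it, which is consistent with the point that a wider bus would still help.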
Based on specifications, GTX 970 should deliver around 80% of GTX 980 performance, which matches the actual benchmarks well. That still leaves it in a strong performance category, consistently beating GTX 780 and R9 290, and sometimes getting very close to R9 290X. The much lower price of GTX 970 makes it a very appealing card. Both cards also have decent overclocking potential, which can improve their performance even more.
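The "around 80%" estimate can be reproduced with simple arithmetic. The sketch below assumes that shader throughput scales with core count times clock speed and ignores everything else (the memory system is identical on both cards anyway); the function name is illustrative.

```python
def relative_throughput(cores_a, clock_a_mhz, cores_b, clock_b_mhz):
    """Ratio of (cores x clock) for card A relative to card B."""
    return (cores_a * clock_a_mhz) / (cores_b * clock_b_mhz)


# GTX 970 relative to GTX 980, using the figures quoted in the article.
base_ratio = relative_throughput(1664, 1050, 2048, 1126)
boost_ratio = relative_throughput(1664, 1178, 2048, 1216)

print(f"At base clocks:  {base_ratio:.1%}")
print(f"At boost clocks: {boost_ratio:.1%}")
```

Both figures land in the high 70s, so "around 80%" is a fair rounding, especially since the identical memory subsystem helps the GTX 970 in bandwidth-bound situations.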
For most graphics card launches, performance, power consumption and price are almost everything there is to know. However, the launch of the GTX 9xx series is also the launch of a new architecture with many new features. Nvidia calls these cards Maxwell 2 for a reason. For a long time, Nvidia had not added support for the latest DirectX features. While GTX 6xx and 7xx cards could handle most of the things introduced in DX 11.1 and 11.2, they were not fully compliant. GTX 9xx cards have full hardware support for DX 11.3 and the corresponding features in DX 12. Do not worry if you are confused, as Microsoft's current plans are somewhat confusing. Direct3D 12 will focus on low-level access, while D3D 11.3 will let developers access the same graphical features through a high-level API. The feature levels in D3D 12 will work similarly to those in D3D 11, allowing developers to target older cards with the same API while restricting only the features that require newer hardware.
Outside of the DirectX specifications, Nvidia has also optimised GTX 9xx cards for voxel-based global illumination and virtual reality rendering. A different implementation of voxel global illumination could be seen in the first version of the UE4 Elemental demo, before it was cut. Nvidia has demonstrated its method with a realistically lit model of a Moon landing photo (see below). VR Direct implements the idea of rendering the frames and then quickly correcting them right after the render, based on the latest position of the headset. VR SLI is a very nice way to use multiple cards for VR: instead of rendering alternating frames, each card renders the image for one eye. Such an approach should both reduce latency and improve the usability of SLI in a VR setup. Nvidia has also implemented other features based on the best-practice suggestions for VR from Oculus VR.
The media capabilities have improved considerably in Maxwell 2. While GTX 750 was developed too early for HDMI 2.0 support, GTX 9xx cards do support it. As a result, they can output 4K at 60 Hz to displays supporting the latest HDMI standard. DisplayPort support remains at 1.2, which is expected, as DP 1.3 was standardised the same week the cards launched. Nvidia's reference cards now come with three DP outputs, a single DL-DVI output and a single HDMI output. Hardware encoding has received an upgrade as well. The improved H.264 encoder can now handle 4K at 60 FPS with a 130 Mbps bitrate on GTX 9xx cards. The hardware decoder can handle these resolutions as well. The new cards also include an encoder for the new, more efficient H.265 standard, which can achieve the same video quality at a lower bitrate. There is no software available for it yet, but current GTX 9xx cards will be compatible when it arrives. On the other hand, there is no dedicated H.265 decoder, so playback will use general-purpose GPU resources and consume more power in the process. This should not be a big problem on desktops.
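A rough calculation shows why 4K at 60 Hz needs HDMI 2.0. The sketch below counts only the active pixels at 24 bits per pixel and ignores blanking intervals, so the real link requirement is somewhat higher. The two link capacities are assumptions taken from the commonly quoted usable video data rates for HDMI 1.4 and HDMI 2.0, not figures from this article.

```python
def active_pixel_rate_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Uncompressed video payload in Gbit/s, counting active pixels only."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# Assumed usable video data rates (after encoding overhead), in Gbit/s.
HDMI_1_4_DATA_GBPS = 8.16
HDMI_2_0_DATA_GBPS = 14.4

needed_60hz = active_pixel_rate_gbps(3840, 2160, 60)
needed_30hz = active_pixel_rate_gbps(3840, 2160, 30)

for label, needed in (("4K @ 60 Hz", needed_60hz), ("4K @ 30 Hz", needed_30hz)):
    print(f"{label}: {needed:.2f} Gbit/s, "
          f"fits HDMI 1.4: {needed <= HDMI_1_4_DATA_GBPS}, "
          f"fits HDMI 2.0: {needed <= HDMI_2_0_DATA_GBPS}")
```

Even with blanking ignored, 4K at 60 Hz exceeds what the older link can carry, while 4K at 30 Hz fits, which matches the practical limit of pre-HDMI 2.0 cards.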
Nvidia also introduces several new anti-aliasing modes. One of them, called Dynamic Super Resolution (DSR), is a variation of a method known in the PC community as downsampling. The game is rendered internally at a higher resolution and then scaled back to the monitor's native resolution. It is a type of supersampling AA (SSAA) that uses a non-rotated square grid. Games supporting sparse-grid SSAA provide higher-quality results. However, for games without proper SSAA support, DSR is a nice, easy-to-use way to improve image quality. Nvidia's implementation lets you select an internal resolution between 1x1 (native) and 2x2 (4x native), as well as the sharpness of the Gaussian filter used to scale the image down to your display's resolution. The big benefit of DSR is that it does not depend on the monitor type, which often limits other downsampling methods. As DSR is still a version of SSAA, similar performance penalties apply, as do potential issues with the HUD in games where it does not scale well with resolution. The latest version of GeForce Experience can enable DSR automatically for less demanding games with a properly scaling interface. DSR is currently available only for GTX 9xx cards, although there is a good chance that availability will be extended to Kepler cards as well.
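The idea behind DSR can be sketched in a few lines: pick an internal resolution from the chosen factor, render there, then filter down to the native resolution. The code below is an illustration only, not Nvidia's implementation. It treats the DSR factor as a multiplier on total pixel count, and it swaps the Gaussian filter for a plain 2x2 box average to keep the example short; both function names are hypothetical.

```python
import math


def dsr_render_resolution(native_w, native_h, factor):
    """Internal render resolution for a pixel-count multiplier between 1.0 and 4.0."""
    if not 1.0 <= factor <= 4.0:
        raise ValueError("factor must be between 1.0 (native) and 4.0 (2x2)")
    scale = math.sqrt(factor)  # the factor applies to area, so each axis scales by its root
    return round(native_w * scale), round(native_h * scale)


def downsample_2x2_box(image):
    """Average each 2x2 block of a greyscale image given as a list of rows.

    A box filter stands in here for the adjustable Gaussian filter described
    in the text; it is a simplification, not the real DSR filter.
    """
    out_h, out_w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


print(dsr_render_resolution(1920, 1080, 4.0))
# A hard vertical edge rendered at 2x2 becomes a softened edge pixel at native size.
print(downsample_2x2_box([[0, 1, 1, 1], [0, 1, 1, 1]]))
```

The cost side follows directly from the first function: at the 2x2 setting the GPU shades four times as many pixels, which is why the SSAA-like performance penalty applies.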
Yet another newcomer is MFAA (Multi-Frame Sampled Anti-Aliasing), a new attempt to improve AA quality and performance using temporal data (below). It is similar to MSAA, but it takes half of its samples from the current frame and the other half from the previous frame. In the best-case scenario, it could provide 2x MSAA quality at almost no cost, 4x MSAA quality for the same performance hit as 2x MSAA, and so on. However, the worst cases may be extremely blurry and full of artefacts, especially at lower frame rates. This feature depends on hardware inside GTX 9xx cards and is not yet available. When it appears, it can be enabled for any game from the drivers, unlike TXAA, which has to be included by the game developer.
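The principle can be illustrated with a toy model of a single pixel crossed by a vertical edge. The sketch below is not Nvidia's algorithm: the sample positions, the equal-weight blend and all names are hypothetical. It only shows why alternating two 2-sample patterns across frames can match a 4-sample result on a static image, and why it can go wrong when the image changes between frames.

```python
# Hypothetical sub-pixel sample positions (x, y) inside a unit pixel.
PATTERN_A = [(0.375, 0.125), (0.625, 0.875)]
PATTERN_B = [(0.125, 0.625), (0.875, 0.375)]


def coverage(samples, inside):
    """Fraction of the samples that fall inside the geometry."""
    return sum(1 for x, y in samples if inside(x, y)) / len(samples)


def mfaa_pixel(frame_index, inside_now, inside_prev):
    """Blend this frame's 2 samples with the previous frame's 2 samples."""
    if frame_index % 2 == 0:
        current, previous = PATTERN_A, PATTERN_B
    else:
        current, previous = PATTERN_B, PATTERN_A
    return 0.5 * coverage(current, inside_now) + 0.5 * coverage(previous, inside_prev)


def edge_at(threshold):
    """Geometry covering the part of the pixel left of x = threshold."""
    return lambda x, y: x < threshold


# Static scene: the edge stays at x = 0.3 in both frames.
reference_4x = coverage(PATTERN_A + PATTERN_B, edge_at(0.3))
static_result = mfaa_pixel(0, edge_at(0.3), edge_at(0.3))

# Moving scene: the edge was at x = 0.9 in the previous frame.
moving_result = mfaa_pixel(0, edge_at(0.3), edge_at(0.9))

print(reference_4x, static_result, moving_result)
```

In the static case the blended value equals the 4-sample reference while only two samples are taken per frame. In the moving case the stale samples pull the result away from the reference, which is the kind of ghosting and blur the worst cases refer to, and it gets worse as the time between frames grows.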
GeForce Experience has been updated to 2.1.2, adding support for the new cards and their features. It also adds support for the Shield Controller connected to a PC via USB. The latest WHQL driver, 344.11, has been released to support GTX 9xx cards.
Overall, the launch of the high-performance Maxwell cards leaves a nice impression. The new architecture manages to improve performance and reduce power consumption while still using the same 28 nm process that GTX 680 used over two years ago. The 4 GB of VRAM in both cards, with decent bandwidth (considering the architecture improvements), and DX 11.3 support mean that these cards have a long life ahead of them. GTX 980 and GTX 970 are Nvidia's top cards now, as GTX 770, 780 and 780 Ti are no longer being produced. AMD will have to adjust its own prices to make R9 290 and R9 290X appealing again compared to Nvidia's newest cards. While R9 285 has managed to push Nvidia in the midrange, it will be interesting to see what else AMD is up to and how soon it will be able to compete with Nvidia's Maxwell on performance per watt. Obviously, there are some other things we can expect in the future. The 165 W TDP of GTX 980 suggests the possibility of a dual-GPU GTX 990 in the near future. There is also clearly a space to be filled by a Big Maxwell GPU that will eventually replace GK110. Depending on the time frame, it may still be based on 28 nm, or 20 nm may finally catch up in efficiency. Still, it is always nice to see competition in the GPU sector and the lower prices it brings.