
Nvidia Unveils Next-Gen Fermi Architecture



 

http://www.thestandard.com/news/2009/09/30/nvidia-unveils-next-gen-fermi-architecture

 

Nvidia may have renamed its NVISION promotional conference to the "GPU Technology Conference," but it's still an Nvidia show through and through.

 

CEO Jen-Hsun Huang took some time during his keynote to unveil the company's next major GPU architecture, code-named "Fermi." This is the chip graphics fans have been calling GT300, the generational successor to the GT200 chip that powers cards like the GeForce GTX 285.

 


 

The chip giant was very careful to position the chip not as a new graphics chip, but as a new "compute and graphics" chip, in that order. In fact, nearly everything revealed about the new chip relates to its computational features rather than traditionally graphics-oriented hardware like texture units and render back-ends.

 

What we do know is that the chip is huge at an estimated 3.0 billion transistors, and will be produced on a 40nm process at TSMC. This is about 40% more transistors than the RV870 chip in the new Radeon 5800 series DirectX 11 cards just released by rival AMD. The chip has 512 processing units (Nvidia calls them CUDA cores) organized into 16 "streaming multiprocessors" of 32 cores each. This is more than double the 240 cores in GT200, and the cores have significant enhancements besides. The chip will utilize a 384-bit GDDR5 memory interface.
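None of these specifications can be verified until boards actually ship, but once they do, the SM count and memory size are easy to check from the CUDA runtime. Below is a minimal device-query sketch using standard CUDA runtime calls (nothing Fermi-specific assumed); a fully enabled Fermi chip should report 16 multiprocessors.

    // devquery.cu -- print basic properties of the first CUDA device.
    // Build with: nvcc devquery.cu -o devquery
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
            printf("No CUDA device found\n");
            return 1;
        }
        printf("Device:             %s\n", prop.name);
        printf("Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("Multiprocessors:    %d\n", prop.multiProcessorCount);  // a full Fermi part would report 16
        printf("Global memory:      %lu MB\n", (unsigned long)(prop.totalGlobalMem >> 20));
        return 0;
    }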

 

Here are some of the major bullet points:

 

Third Generation Streaming Multiprocessor (SM)

 

* 32 CUDA cores per SM, 4x over GT200

* 8x the peak double precision floating point performance over GT200

* Dual Warp Scheduler that schedules and dispatches two warps of 32 threads per clock

* 64 KB of RAM with a configurable partitioning of shared memory and L1 cache
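That last bullet is exposed directly to CUDA programmers: the 64 KB can be split 48 KB/16 KB in favor of either shared memory or L1. Here is a minimal sketch, assuming the per-kernel cudaFuncSetCacheConfig hint from Fermi-era CUDA toolkits and a made-up kernel that leans on shared memory.

    #include <cuda_runtime.h>

    // Hypothetical kernel that stages data in shared memory, so the larger
    // 48 KB shared-memory partition is the better fit.
    __global__ void smooth_kernel(const float *in, float *out, int n) {
        __shared__ float tile[256];                    // per-block scratch in shared memory
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();
        if (i < n)
            out[i] = 0.5f * (tile[threadIdx.x] + tile[(threadIdx.x + 1) % 256]);
    }

    int main() {
        // Ask for the 48 KB shared / 16 KB L1 split for this kernel.
        cudaFuncSetCacheConfig(smooth_kernel, cudaFuncCachePreferShared);
        // ... allocate, copy, and launch as usual, e.g. with 256-thread blocks:
        // smooth_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        return 0;
    }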

 

Second Generation Parallel Thread Execution ISA

 

* Unified Address Space with Full C++ Support (see the sketch after this list)

* Optimized for OpenCL and DirectCompute

* Full IEEE 754-2008 32-bit and 64-bit precision

* Full 32-bit integer path with 64-bit extensions

* Memory access instructions to support transition to 64-bit addressing

* Improved Performance through Predication
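The "Full C++ Support" bullet referenced above follows from the unified address space: device code can dereference a pointer without knowing up front which memory space it points into, which is what makes things like virtual functions workable on the GPU. The sketch below is purely illustrative; the Shape/Square types are ours, not from Nvidia's material.

    // Virtual dispatch inside a kernel -- the sort of C++ the unified
    // address space is meant to allow.
    #include <cuda_runtime.h>

    struct Shape {
        __device__ virtual float area() const { return 0.0f; }
    };

    struct Square : Shape {
        float side;
        __device__ Square(float s) : side(s) {}
        __device__ float area() const { return side * side; }
    };

    __global__ void area_kernel(float *out, float side) {
        Square sq(side);   // object lives in the thread's local storage
        Shape *p = &sq;    // base-class pointer, resolved through the vtable at run time
        out[threadIdx.x] = p->area();
    }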

 

Improved Memory Subsystem

 

* NVIDIA Parallel DataCache hierarchy with Configurable L1 and Unified L2 Caches

* First GPU with ECC memory support

* Greatly improved atomic memory operation performance
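Faster atomics matter for exactly the kind of kernel sketched below -- a global-memory histogram where thousands of threads fight over the same handful of bins. The names and sizes are illustrative.

    #include <cuda_runtime.h>

    // Each thread reads one byte and bumps the matching counter. The atomicAdd
    // is the contended read-modify-write that the improved memory subsystem
    // is meant to speed up. "bins" must point to 256 zero-initialized counters.
    __global__ void histogram_kernel(const unsigned char *data, int n, unsigned int *bins) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            atomicAdd(&bins[data[i]], 1u);
    }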

 

NVIDIA GigaThread Engine

 

* 10x faster application context switching

* Concurrent kernel execution

* Out of Order thread block execution

* Dual overlapped memory transfer engines
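Concurrent kernels and the dual copy engines pay off when work is split across CUDA streams, so transfers queued in one stream can overlap with kernels running in another. A rough sketch under those assumptions: the process() kernel is made up, host buffers are pinned, and error checking is omitted.

    #include <cuda_runtime.h>

    __global__ void process(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;   // stand-in for real per-element work
    }

    int main() {
        const int n = 1 << 20, chunks = 4, chunk = n / chunks;
        float *h_buf, *d_buf;
        cudaHostAlloc(&h_buf, n * sizeof(float), cudaHostAllocDefault);  // pinned memory, required for async copies
        cudaMalloc(&d_buf, n * sizeof(float));

        cudaStream_t streams[chunks];
        for (int s = 0; s < chunks; ++s) cudaStreamCreate(&streams[s]);

        // Each chunk gets its own stream: copy in, compute, copy out.
        for (int s = 0; s < chunks; ++s) {
            float *h = h_buf + s * chunk, *d = d_buf + s * chunk;
            cudaMemcpyAsync(d, h, chunk * sizeof(float), cudaMemcpyHostToDevice, streams[s]);
            process<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d, chunk);
            cudaMemcpyAsync(h, d, chunk * sizeof(float), cudaMemcpyDeviceToHost, streams[s]);
        }
        cudaDeviceSynchronize();

        for (int s = 0; s < chunks; ++s) cudaStreamDestroy(streams[s]);
        cudaFreeHost(h_buf);
        cudaFree(d_buf);
        return 0;
    }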

 

There are lots of additional features that should improve the performance of this chip in stream computing tasks, such as a much faster double-precision floating-point computation rate. Current Nvidia GPUs compute double-precision operations at a small fraction of the speed of single-precision operations. On Fermi, double-precision should run at half the rate of single-precision, which is a huge improvement. Big improvements in caching and scheduling are apparent as well. You can read more about the architecture at Nvidia's new Fermi page, which includes a PDF whitepaper.
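To put the double-precision point in code terms: the workhorse operation here is the fused multiply-add, which CUDA exposes for doubles through the standard fma() math function. A minimal sketch; the kernel name and launch shape are ours.

    // y = a*x + y in double precision. fma() computes a*x + y with a single
    // rounding step, the IEEE 754-2008 behavior the architecture advertises.
    // Build for a double-precision-capable target, e.g. nvcc -arch=sm_20.
    #include <cuda_runtime.h>

    __global__ void daxpy_kernel(int n, double a, const double *x, double *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = fma(a, x[i], y[i]);
    }
    // Launched as, e.g.: daxpy_kernel<<<(n + 255) / 256, 256>>>(n, 2.0, d_x, d_y);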

 

So when will you be able to buy a graphics card that uses this chip? Nvidia isn't saying. Company representatives have said that they're currently "bringing up the chip," which means working samples have only recently come back from the fabrication plant. Based on past launches, we would say December is an optimistic release date, and Q1 2010 is more likely for wide availability. Expect boards to be expensive. Nvidia won't divulge the chip size, but judging by the transistor count we would guess between 450 and 500 mm². Couple that with the added board cost of a 384-bit memory interface and the challenge of getting good yields out of such a huge chip on the relatively new 40nm manufacturing process, and you're looking at cards that are likely to be both more powerful and more expensive than AMD's just-released Radeon 5800 series cards.


I just read the white paper off of the Nvidia web site:

http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf

 

It is fairly interesting. A few key points:

 

1. This one isn't about the viewport... it is about the renderer. With this implementation of CUDA, the real-time, highly accelerated, and ray-tracing engines all have to be rewritten to take advantage of the POWER. It has 512 cores. That means the next version will likely have 2048, or at least 1024.

 

2. Content creation software is far behind the current hardware paradigm. From an evolutionary perspective, the hardware is still doubling every year or so on the graphics side and every 1-1/2 years on the system side. Yet the software, from a creative production standpoint, has not changed radically in ten years. Revit, with its problems, is the only thing that has been a game changer.

 

3. I just upgraded to the i7 860 with an Nvidia GTS 250 video card. That is four cores / eight threads with 2 GHz RAM. Basically, for ALL of the samples with Max 2009 it is "real-time" rendering. Even with the light-box set to global illumination at high res it renders a frame in 20 seconds. I rendered the Apache AH-64 sample's 200 frames in about 1-1/2 minutes.

 

So, with infinite hardware, the creation of content needs to radically improve. Or perhaps, once the top 75,000 things in the world are "libraried", this will become a drag-and-drop game. After all, we used to make a big deal about shoes, cheese, and sofas -- today they are common items available anywhere.
