
Which one to choose? GTX 580, GTX 590 or GTX 780 (not Ti).



Hello everyone. I've been reading this forum for a long time, but until now I never decided to post. I'm writing now because I have a big question, and the more I read the more confused I get.

I'll try to be clear:

I am a 3D artist (still studying and learning) and I primarily use 3ds Max with V-Ray or Mental Ray to render (still deciding which one I like best). I'd like to start using V-Ray RT, and I am trying to find out which of the following cards you would recommend and why:

 

GTX 580, 1.5 GB VRAM ($120)

GTX 590, 3 GB VRAM (split between 2 GPUs) ($210)

GTX 780, 3 GB VRAM (single GPU) ($260)

 

I have read that the 580 and the 590 perform about the same when rendering, and that they work in a very similar way, since the 590 is a dual-GPU card that splits its RAM (and 3ds Max or V-Ray RT supposedly can't take advantage of both GPUs, just one of them, resulting in a maximum of 1.5 GB of usable RAM). My questions are:

1- Does V-Ray RT take advantage of the CUDA cores on both GPUs, or just one of them, as happens with the RAM? Does it add up both GPUs' CUDA core counts when rendering?

2- What are the advantages of a 590 compared with a 580? This is a little confusing to me. It supposedly behaves like two 580s in SLI.

3- What advantages does that have over using a single-GPU GTX 580? Is it worth the extra cost?

4- And what about the 780, which has a single GPU with 3 GB of RAM; should I go for it?

5- What about viewport behaviour? Does it change drastically between the listed cards?

Thanks a lot in advance! I'd really appreciate it if someone could clarify this for me.


  • Vray RT can use both GPUs on the 590, in pretty much the same way it could use two discrete 580s.
  • Each GPU on the 590 has access, of course, only to its own 1.5GB of RAM.
    SLI doesn't allow sharing memory between the cards, even if SLI is happening on a single board. The same is true for the 690, Titan Z, AMD's 6990 & 295X2 etc. Companies are deceiving you when they market the total RAM on the board instead of the usable RAM per GPU. Even if one of the two GPUs is idling, the other is electrically separated from the other half of the RAM on board and practically "oblivious" to its existence. But I guess you had that figured out.
     
     
  • The advantage of a 590 vs. a 580 shows up in programs that are aware of SLI (games), in which case the graphics on the monitor are filled almost 2x as fast (the GPUs on the 590 are clocked slightly lower than the 580's, so in practice you don't get 2x the speed, but for simplicity's sake, let's say you do). The same is true for GPGPU programs that recognize each GPU on the 590 board independently and assign tasks to each one, giving you 2x workers instead of 1x. Programs that can do neither (for example 3DS Max, which uses neither SLI for its viewport nor GPGPU for its physics calcs) will actually perform slightly worse on the 590 vs. a 580, as the former has slightly lower fill rates due to its lower clock speeds.
     
     
  • The 780 has access to double the RAM of either a 1.5GB 580 (there is also a 3GB version) or a 2x1.5GB 590. It will consume a fraction of the energy when idling (which is most of the time) and the same or less power than a single 580 under load, while at the same time outperforming both 5xx GPUs at full load in GPGPU, gaming or 3DS Max viewports.
     
     
  • The performance difference is not linear, as it depends both on the GPU rendering frames on-screen and on the CPU feeding the "drafts" for those frames to the GPU to render. When the complexity of the scene is low, the GPU has virtually no "resistance" rendering the frames, so it comes down to the CPU producing those drafts - a fast CPU can do that without much problem, so we can end up with hundreds of frames per second (FPS) with pretty much any modern GPU.
     
    As complexity, screen resolution, texture sizes or all of them increase, the GPU meets more resistance rendering the scene, just as the CPU meets more resistance preparing the drafts, so the FPS "balances out" lower and lower. Some scenes end up restricted by the CPU, which cannot prepare the drafts fast enough; some are restricted by the GPU, which cannot render the pixels fast enough. Effectively the FPS is set by whichever of the two is the bottleneck (see the sketch right after this list).
     
    With current top-of-the-line GPUs - even ones as old as a 580 - it takes a lot of CPU power to saturate their rendering capacity. The reality is that once you've reached the GPU performance level of a 580 (as old as it is), very few CPUs can prepare drafts fast enough to actually choke the GPU. In this particular case, the 1.5GB of RAM might be a bigger reason for the GPU to slow down at high resolutions with large texture sizes than the GPU not having enough grunt.
     
    By the time you reach GTX 680 performance, it is pretty much a given that most scenarios will be CPU- and not GPU-bound. There are even YouTube videos demonstrating that the differences between a 760/670, a 680 and a 780 are negligible in most cases. So yes, the 780 will have a performance advantage, but unless your scenes were really struggling with the 1.5GB limitation of the 580, I don't think it will be a "night-n-day" difference.
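As a very rough illustration of that bottleneck behaviour, here is a minimal sketch in Python; the numbers are made up purely for illustration, only the min() relationship is the point:

# Rough "weakest link" model of viewport FPS: the frame rate ends up capped by
# whichever stage is slower - the CPU preparing the "drafts" (draw calls) or
# the GPU rendering the pixels. Numbers below are invented for illustration.

def viewport_fps(cpu_drafts_per_sec, gpu_frames_per_sec):
    # FPS is limited by the slower of the two stages.
    return min(cpu_drafts_per_sec, gpu_frames_per_sec)

# Light scene: the GPU could push 800 fps, but the CPU only issues ~300 drafts/sec.
print(viewport_fps(300, 800))   # 300 -> CPU-bound

# Heavy scene: the CPU still manages 120 drafts/sec, the GPU drops to 45 fps.
print(viewport_fps(120, 45))    # 45 -> GPU-bound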


Mmm, I understand. First of all, thanks a lot for your extensive and complete answer. It cleared up a lot of this for me.

So basically you are saying that I should go for the 580 unless I am gonna use super complex scenes?

Correct me if I am wrong: due to its CUDA FP64 core count, the 580 renders faster but can't handle scenes as big as the 780 can (which in turn renders slower). After reading a lot, that's what I got. So, if I could get a 3GB 580, would that be the best option to go for?

The thing is: if render speed is not that different between the 580 and the 780, I would go for the 780 because of its RAM amount. If the 580 is way faster, I don't know if the 780 is worth the extra money.

So, summing up, which would be the best of these three options for my purposes?

 

-580 1.5GB

-580 3GB

-780

 

Thanks a lot again.


FP64 / double-precision FLOPS and their effect on graphics or GPGPU performance have been discussed ad nauseam by many.

The tough reality is that the number of applications that care about DP is very limited, and the number of those that saturate the DP output of even the simplest modern GPU is, practically speaking, none that you would care about (none I know of has anything to do with GPU rendering, or with viewport performance for any remotely popular 3D/2D CAD & modelling app).

 

It was widely discussed as the hidden weapon of the Titan line of cards, but the hard truth is that a 780Ti performs exactly the same as a similarly clocked Titan Black in virtually anything it was ever tested against, i.e. the rumored DP advantage in favor of the Titan was undetectable.

 

If you have the cash, go for the 780; it will be better than the 580 in pretty much everything.

If you don't have the cash, and you cannot get a 580 3GB really cheap, I would rather look for a used 660Ti or 750Ti, which at least have 2GB of RAM.

 

The 580 is an old, big, power-hungry* and noisy card that I would not mind toying with if it were given to me DIRT cheap (or for nothing), but 1.5GB is too little for GPU rendering (if you have a choice) to be worth the hassle - if that is why you are changing your current card to begin with.

 

*the 780 is almost as power hungry at full tilt, but when idling it can effectively throttle down to almost nothing - much like all 6xx/7xx/9xx cards - and its reference cooler was excellent at very low rpm / noise levels


I ended up buying a 580 3GB for €160, but now I don't know if it was the best option. In the same price range I could have gone for a second-hand 660Ti or 750Ti. Would they have been a better option than the 580 3GB? Maybe I can still cancel it. Yesterday I was absolutely sure that what mattered most for rendering was the FP64-type CUDA cores, but now I see I was wrong (according to what you say, and I am sure you are right).

So what really counts is the total number of CUDA cores, regardless of their kind? Or did render engines use FP64 in the past (5 years ago), and now that's changing, so it makes no sense to choose a GTX 580 for its FP64 CUDA cores?

Thanks!


€160 is steep for such an old card :/

 

What counts is the core * clock aggregate, if we are speaking about same-generation cards. E.g. 6xx / 7xx cards (other than the 750Ti) are all Kepler and are comparable / scale almost linearly.
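To make the "core * clock aggregate" idea concrete, here is a minimal Python sketch comparing same-generation Kepler cards. The core counts and base clocks are approximate reference-card figures, so treat the output as ballpark only:

# Compare same-generation (Kepler) cards by CUDA cores * base clock.
# Core counts / clocks are approximate reference-card figures.
kepler_cards = {
    "GTX 660 Ti": (1344, 915),   # (CUDA cores, base clock in MHz)
    "GTX 670":    (1344, 915),
    "GTX 680":    (1536, 1006),
    "GTX 780":    (2304, 863),
}

baseline = "GTX 680"
base_score = kepler_cards[baseline][0] * kepler_cards[baseline][1]

for name, (cores, clock_mhz) in kepler_cards.items():
    score = cores * clock_mhz
    print(f"{name}: {score / base_score:.2f}x relative to {baseline}")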

 

The 580 is Fermi, so it's slightly different (it does more per core, but not because of FP64). It is still a good card, but I believe you can get a 4GB 670 or even a 680 for that money or slightly more. The 580 should still be faster than the 660Ti & 750Ti in some things, while slower in others. OpenCL would probably favor the 750Ti, games the 660Ti, and CUDA might do better on the 580.

 

GPUs used for scientific computing and/or supercomputer applications do use FP64, so the chips are designed with the ability to do lots of calcs in double precision. nVidia and AMD sell the fully unlocked GPUs for compute under the nVidia Tesla or AMD FirePro S brand names, charging a serious premium. Most Quadro and FirePro cards also offer unlocked or fully unlocked FP64 at much higher prices than the same GPU chips marketed as GeForce & Radeon respectively, so the "gaming" cards are artificially "crippled" in FP64 performance (BIOS + drivers), so that those who actually care about FP64 cannot undercut the "pro"-priced cards by buying GeForces or Radeons instead.

 

Some flagship models, like the 580 and the GTX Titan series, had their full FP64 performance unlocked, but it did not mean much for the average buyer. Actually, it meant nearly nothing.

 

FP64 is a niche, and the software developers in the CGI industry who design the GPGPU routines don't really need that kind of precision. With GPU rendering, results are interpolated anyway, and the pixel values you see are averaged out from many samples (sometimes 100s or 1000s of samples). Plus, as I said above, it is not like there is zero FP64 capability in GTX/Radeon cards; it is just turned down a notch, so even if a program makes a request or two for it, it will get it - apparently fast enough for our purposes, as no performance advantage has been measured between cards with restricted and fully unlocked FP64 from the same family and with a comparable CUDA*clock aggregate.
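To illustrate why restricted FP64 doesn't matter for averaged render samples, here is a small sketch using Python with NumPy (purely illustrative, not tied to any specific renderer): averaging a few thousand random "samples" in FP32 vs FP64 gives results that differ by far less than the 1/255 step of an 8-bit display channel.

# Average many per-pixel "samples" in single vs double precision.
# The difference is orders of magnitude below the quantization step of an
# 8-bit display channel, which is why renderers get away with FP32 on the GPU.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.random(4096)                 # pretend per-pixel samples in [0, 1]

mean32 = samples.astype(np.float32).mean(dtype=np.float32)
mean64 = samples.mean()                    # float64 by default

print("FP32 mean:  ", mean32)
print("FP64 mean:  ", mean64)
print("difference: ", abs(float(mean32) - mean64))
print("8-bit step: ", 1 / 255)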


Aham.. I see.

 

Then there's something I don't understand:

 

What are the advantages of using a Tesla or a Quadro, then? I guess they are not the best choice for rendering; am I wrong?

 

Another question: if render engines don't really take much advantage of FP64 CUDA cores, the 780 is supposed to be much faster at rendering than the 580 because of its CUDA core count (about 4x that of the 580), right? Does it depend on the render engine we use?

 

What am I missing here? This is something I'd love to understand so I can make a better decision next time (I am afraid it won't take long before I have to change my graphics card again).

 

Thanks!


Teslas & FirePro S: for the most part those are "headless", i.e. there is no port for you to hook up a monitor. They are used as dedicated GPGPU accelerators, might pack more RAM than commercially available gaming cards, and that RAM is usually ECC to ensure fewer errors, but they still use the same GPUs found in the top-of-the-line gaming cards. Since those cards are often used in harsher environments than gaming cards and might be stressed 24/7, cores and RAM are often clocked lower to ensure reliability, so chances are your gaming card is actually faster, as long as what you are computing doesn't require the special features that are locked out of the GTX/Radeon line.

 

Same story for Quadro / FirePro, only those are not headless. They are repackaged versions of the same GPUs used in gaming cards, with ECC RAM on the expensive models and roughly 3-4x the cost of the gaming version of the same GPU, although the workstation cards often have 2x the RAM of the gaming version. E.g. the K2200, being almost identical to a 750Ti, has 4GB instead of the gaming 750Ti's 2GB (and costs 3.5x as much). The K5200 is essentially an 8GB version of the 780 (Kepler/GK110) and costs more than many workstation towers.

 

Quadros/FirePros are again clocked lower, so unless you are after the maximum memory available on a single card, you are usually better off with a GTX/Radeon for compute.

 

The Quadro/FirePro perk shows up in demanding OpenGL viewports, like SolidWorks, Creo or Siemens NX, where GTX cards intentionally suck, having bad/non-optimized drivers for those apps.

 

But most software companies are "waking up", and since most of them have zero incentive to brown-nose nVidia and AMD, new viewport engines are actually written to take advantage of "vanilla" gaming drivers, increasing performance with gaming cards. For Direct3D-based apps specifically, like most Autodesk products, there has been little to no issue with gaming cards over the last 4-5 years, making the need for a workstation card almost obsolete, unless of course you really need all the VRAM you can get.

 

Now, for what counts in GPU rendering:

 

It does take some optimization on the rendering engine side. When Kepler first came out (the 680), despite having massively more CUDA cores, the performance wasn't there: Kepler used more, yet simpler, cores than Fermi, which went a long way for efficiency in games (a mere 660/660Ti would be as fast as or faster than the ol' mighty 580), but compute engines that were optimized for Fermi gave underwhelming results. That's with CUDA.

 

In OpenCL, Kepler was very, very weak, and much cheaper AMD cards would actually perform much better. A 7950, a sub-$200 card by the time the 1st-gen Titan was out, would actually beat a GTX Titan ($1000-1050).

 

Maxwell, the architecture the 9xx cards and the 750Ti use, is actually far more efficient in compute for both CUDA and OpenCL (the 750Ti beats the Titan in some OpenCL tests). For those counting, its core count per cluster is smaller than Kepler's (still more than Fermi's), but with more cache available per cluster, which apparently works wonders for compute. Of course, application developers need to adapt and rewrite portions of their code to make full use of the new cards, and that takes time.

 

Also, keep in mind that the fact that one card beats another in some tests doesn't mean we should all trash our cards and go for a 750Ti; it just shows how fast the tech advances, and how much BS forums might be selling (I've heard how "amazing" the Titan was at compute too many times, from people who never used it for compute - I had one, got rid of it, don't miss it).

