when will we see an 8 core chip


Two 8-core rendering jobs on a 16-core machine are going to be much more efficient than a single 16-core job/process, because the rendering software just doesn't scale well past about 8 cores.

 

Please explain. Many of us use vray to render, and it spawns 'buckets' to send a tiny render job to a thread (if I'm understanding the way it works correctly). Having more cores/threads means more buckets. Even if it's not completely efficient, going from two cores to a new workstation with 8 cores means much, much faster renders.
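That mental model can be sketched in a few lines. This is a toy illustration, not V-Ray's actual implementation: the frame is cut into fixed-size tiles ("buckets"), and a pool of workers, one per core, pulls the next free tile. The `render_bucket` function and the image/bucket sizes are invented for the example.

```python
# Toy sketch of bucket-based rendering: split the image into tiles and
# let a pool of workers (one per core) each grab the next free bucket.
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, BUCKET = 640, 480, 64  # assumed image and bucket sizes

def render_bucket(tile):
    x, y = tile
    # stand-in for the real ray-tracing work on this 64x64 region
    return (x, y, "done")

tiles = [(x, y) for x in range(0, WIDTH, BUCKET)
                for y in range(0, HEIGHT, BUCKET)]

with ThreadPoolExecutor(max_workers=8) as pool:  # one bucket in flight per core
    results = list(pool.map(render_bucket, tiles))

print(len(tiles))  # 80 buckets for this 640x480 frame
```

With 2 workers only 2 buckets are in flight at once; with 8 workers, 8 are. That is the whole "more cores means more buckets" intuition.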

Please explain. Many of us use vray to render, and it spawns 'buckets' to send a tiny render job to a thread (if I'm understanding the way it works correctly). Having more cores/threads means more buckets. Even if it's not completely efficient, going from two cores to a new workstation with 8 cores means much, much faster renders.

 

Are you talking about using the built-in distributed rendering, with the VRAY "spawner" across several machines, to speed up the rendering of a single large frame?


[attached image: rendering.jpg]

 

Each core, whether logical or physical, is assigned a bucket in vray.

 

Because of this, vray scales enormously well. The more cores you throw at it, the faster it goes. Since the Core i7 systems are no longer massively bandwidth-limited, they scale almost perfectly (see old Xeons vs. new Xeons).

http://www.techreport.com/articles.x/16656

 

 

Though you will still see diminishing returns, they are far, far smaller than on previous multi-CPU or multi-core platforms. Bucket-based rendering systems are one of the best justifications for getting as many cores/CPUs as you can afford. More cores either let you make more money in the same time frame, or earn some well-deserved free time :).
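Those diminishing returns follow Amdahl's law: whatever fraction of the render is serial caps the speedup no matter how many cores you add. A sketch with an assumed 5% serial fraction (an illustrative number, not measured V-Ray data):

```python
# Amdahl's-law sketch of diminishing returns. The 5% serial fraction is
# an assumption for illustration, not a measured figure for any renderer.
def speedup(cores, serial_fraction):
    parallel = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel / cores)

for n in (2, 4, 8, 16, 64):
    print(f"{n:3d} cores -> {speedup(n, 0.05):5.2f}x")
```

With 5% serial work, 16 cores give roughly a 9x speedup and even 64 cores stay under the 20x ceiling set by the serial part, which is why the returns shrink but never go negative.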

Edited by Greg Hess

It's dependent on scene, and rendering time.

 

Vray has a number of steps which are not threaded very well. It's the final part of the rendering process (the bucket section) that scales the best. (I believe this is not entirely vray's fault, but rather the way 3dsmax hands off the data to vray.)

 

When dealing with short renders (less than 5 minutes), a majority of the time will be spent in the "pre" bucket phase.

 

In other words, you can tailor your benchmark to scale more or scale less, depending on the percentage of time the benchmark spends in the final phase of rendering.
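A toy model makes the point concrete (all the times below are invented): total render time is a serial "pre" phase plus a bucket phase that divides across cores, so a short benchmark is dominated by the pre phase and appears to scale poorly.

```python
# Toy model: serial "pre" phase + bucket phase divided across cores.
# All durations are invented to illustrate the scaling argument.
def wall_time(pre_s, bucket_s, cores):
    return pre_s + bucket_s / cores

# 3-minute render: 60s pre + 120s of buckets
short_speedup = wall_time(60, 120, 1) / wall_time(60, 120, 16)
# 1-hour render: 60s pre + 3540s of buckets
long_speedup = wall_time(60, 3540, 1) / wall_time(60, 3540, 16)

print(round(short_speedup, 2), round(long_speedup, 2))  # 2.67 12.8
```

Same machine, same 16 cores, but the hour-long scene reports nearly five times the measured speedup, purely because more of its time sits in the bucketed phase.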

 

You can see this in your last two vray benchmarks, airport vs. the vraybenchmark: the scaling increase grows as the benchmark time increases.

 

I ran into the same problem a long time ago when I was doing all the 3dluvr techbits. I created a set of benchmarks to help show the differences in cpu power, but it became apparent that after a period of time (a year or two), the growth in cpu power made the benchmarks unable to show statistically relevant data.

 

As such, I would recommend possibly creating a new series of benchmarks if you want to show differences in vray. Perhaps one that takes roughly an hour on a current gen system. (And multiple hours on an older one).

 

That way, in the next few years, it will still be relevant as the # of cpu cores (and speed) increases.

 

Once you start getting into the sub-5-minute benchmark range, differences become less apparent. You can sometimes offset this by repeating the benchmark multiple times (10-20 times for a sub-1-minute benchmark) and taking the totals.
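The repeat-and-total approach looks like this in practice. The workload here is a toy stand-in for a render, not a real benchmark scene:

```python
# Sketch of stabilizing a short benchmark by repeating it and summing
# the runs. The summed inner loop is a toy stand-in for a real render.
import time

def benchmark_once():
    t0 = time.perf_counter()
    sum(i * i for i in range(200_000))  # toy workload
    return time.perf_counter() - t0

REPEATS = 10
total = sum(benchmark_once() for _ in range(REPEATS))
print(f"total over {REPEATS} runs: {total:.3f}s")
```

Summing many short runs averages out OS scheduling noise and timer granularity, which would otherwise swamp the difference between two fast machines.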

 

I wonder if the vray forum has an updated benchmark; if so, that would be a good start.

 

Oh, and is Ed Caracappa still working over at boxx? If so, tell him Greg from 3dluvr says hi!

Edited by Greg Hess

Greg, in theory I agree with your statements. Whenever there are significant parts of the process that are only single-threaded (or not well multithreaded), the testing would be skewed (and therefore not accurate enough to be relevant) when comparing systems with more cores.

 

This was a big problem with the prior Cinebench test - and it was an inherent problem that showed up when trying to benchmark most commercial raytrace renderers using relatively small, lightweight test scenes.

 

However, in the past couple of years, rendering engines themselves have become more and more threaded for ALL of the stages of rendering and pre-rendering.

 

One other thing to point out about my 16-core tests as linked above: 8 of those 16 cores are actually "virtual" cores achieved through the use of "Hyperthreading". I would expect a true 16-core system to scale measurably better than a dual/quad Nehalem (such as I used), but I still don't expect it to scale anywhere near "linearly" as Greg has suggested.


Whenever there are significant parts of the process that are only single-threaded (or not well multithreaded), the testing would be skewed (and therefore not accurate enough to be relevant) when comparing systems with more cores.

 

Forget tests. More cores = more buckets. More buckets is mo'better.

 

When I tell vray to render from Cinema4D, first it does some pre-work to grab the scene and make it vray-able. That part is one area where you could argue speed of one architecture vs another, but then it sets about doing a series of GI passes in buckets, and then a final raytrace pass in buckets. This bucketwork is the majority of the render process, and that goes faster as you render more buckets.
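The pass structure described above can be modeled in a few lines. The durations are invented, and the phase names are my labels, not the renderer's: a serial scene-translation step, then GI prepasses and a final ray-trace pass that both run in parallel buckets.

```python
# Sketch of the pass structure described above, with invented durations:
# a serial scene-translation step, then bucketed GI prepasses and a
# bucketed final ray-trace pass that divide across cores.
def render_time(cores):
    translate = 20.0            # serial: hand the scene to the renderer
    gi_passes = [40.0, 40.0]    # bucketed GI prepasses
    final_pass = 200.0          # bucketed final ray trace
    bucketed = sum(gi_passes) + final_pass
    return translate + bucketed / cores

print(render_time(2), render_time(8))  # 160.0 55.0
```

Because the bucketed passes are the bulk of the total, quadrupling the cores here nearly triples throughput, while the fixed translation step is all that keeps it from being a clean 4x.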

 

I remember running into Nils and Lon from Neoscape at a Siggraph a few years ago as they watched a file rendering at the Boxx booth with dozens of buckets. They were smiling big.


I would expect that a true 16-core system would scale measurably better than a dual/quad nehalem (such as I used) but i still don't expect it to scale anywhere near "linearly" as Greg has suggested.

 

I will agree that I exaggerated a bit. What I should have said is that the newer architecture scales far better than the previous architecture did.

 

You could look at the scaling from just the physical cores by running the same set of benchmarks with hyperthreading disabled. You should see slightly better scaling (when comparing core to core, vs. core-plus-logical to core-plus-logical).

 

One disadvantage I've noticed with hyperthreading in vray (in a few select circumstances) is that sometimes a logical core's bucket will be attached to an enormously computational part of the scene (a bucket with reflection, refraction, subsurface scattering/displacement, etc.).

 

In that situation, sometimes the entire scene will finish rendering, while that damn bucket just sits there, slowly trying to figure out how to finish.

 

Of course that is a very limited occurrence, and overall, hyperthreading is a big boon (especially when you're always rendering things on white backgrounds, like myself).
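That "one stubborn bucket" effect can be simulated with a simple greedy scheduler. The bucket costs and core count below are invented for the illustration:

```python
# Toy simulation of the straggler-bucket effect: buckets are handed out
# in scan order, each to the least-loaded core; the wall time is set by
# whichever core draws the one expensive bucket. Costs are invented.
def makespan(bucket_costs, cores):
    loads = [0.0] * cores
    for cost in bucket_costs:
        i = loads.index(min(loads))  # next free core takes the bucket
        loads[i] += cost
    return max(loads)

buckets = [1.0] * 63 + [30.0]  # the last bucket is the expensive one
print(makespan(buckets, 8))
```

With eight cores, the 63 cheap buckets finish in about 8 time units, but the core that draws the 30-unit bucket grinds on to 37: the rest of the frame is done while that one tile "just sits there", exactly as described.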


sometimes a logical bucket will be attached to an enormously computational part of the scene (a bucket with reflection, refraction, subsurface scattering/displacement, etc)...sometimes the entire scene will finish rendering, while that damn bucket just sits there, slowly trying to figure out how to finish.

 

Sure, but that area needs to be processed regardless. With a line-by-line raytracer, the line stops while it crunches through that difficult region, slowing the entire render.


Oh, I completely agree, Ernest. I'm just pointing out how logical buckets can sometimes affect rendering time.

 

I'm mainly trying to get the benchmarks re-run with hyperthreading disabled, to see a core-vs-core comparison against the systems without hyperthreading. Kind of a way to see the effectiveness of hyperthreading.

 

I like data :).


I remember running into Nils and Lon from Neoscape at a Siggraph a few years ago as they watched a file rendering at the Boxx booth with dozens of buckets. They were smiling big.

 

It was 80 "buckets" (80 cores) across ten 8-core renderBOXX nodes.

 

Yeah, that was cool to watch, but the truth is that the same scenes rendered in the same amount of time when only 56 cores (7 nodes) were used. In other words, after 56 or so cores, there was no increase in rendering performance.

 

Of course, normal "network rendering" of animated sequences in VRAY scales almost linearly:

 

one node = x frames per hour.

10 nodes = 10x frames per hour.

500 nodes = 500x frames per hour (network saturation notwithstanding).
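The frame-distribution arithmetic above is trivially linear because whole frames go to whole nodes, with no shared state between them. The 4 frames/hour per node below is an invented baseline:

```python
# Near-linear network-rendering arithmetic: each node renders whole
# frames independently, so throughput scales with node count until the
# network saturates. The per-node rate is an invented example figure.
def frames_per_hour(nodes, per_node=4):
    return nodes * per_node

for n in (1, 10, 500):
    print(n, frames_per_hour(n))  # 4, 40, 2000 frames/hour
```

This is why animation farms scale so much better than single-frame distributed rendering: there is no per-frame coordination cost to amortize, only the job dispatch and the final file copy.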


http://www.youtube.com/watch?v=BQ4shSQJTd0&rel=0&color1=0xb1b1b1&color2=0xcfcfcf&hl=en&feature=player_embedded&fs=1

 

http://blogs.zdnet.com/hardware/?p=4437

 

When rendering is real-time, perhaps the issue will get back to means and methods. The software, tools, and emotive side of the equation need a great deal of attention. It is nice to see the final answer coming...

