Molehill is out!

Adobe took everyone by surprise at the Flash Gaming Summit 2011, by making available on Adobe Labs their super secret new API, the 3D hardware accelerated Flash Player, codename Molehill.

Molehill is still not officially released. It’s not even in beta. Instead, Adobe has created a sort of pre-beta program called the Flash Incubator Program. It is aimed at developers who are interested in testing, and providing feedback on, features under consideration for future versions of the Flash Player. Features highlighted in the incubator program may or may not ship in future releases of the Flash Player.

So, we still don’t even know when the real first release of the Molehill Flash player will be out for the general public.

But…who cares! The toy is out, we can play with it and talk about it.

Why is Molehill important?

Performance!

The Flash Player has an incredible penetration rate: it’s installed on more than 90% of all computers around the world. Yet it’s amazing how much its technology has been lagging. Running a Flash application is like switching to a 15-year-old computer. Until now, the Flash Player has been unable to take advantage of the 3D hardware acceleration available on most computers today. This means it has not been possible to create 3D Flash experiences that even get close, in quality and complexity, to those that can be made in native applications.

OK, but how fast is it?

This is the million-dollar question that everybody is going to ask.

With the 3D rendering techniques we’ve all been using so far in Flash, all the rendering took place on the CPU. And, like I said, this meant super simple models and scenes.

How fast is the Molehill player going to be, and how detailed are the models that we can now display?

I am going to provide some quantitative data about Molehill performance.

Let’s start with CPU-based rendering, the old way, used by all current Flash 3D engines such as Papervision3D and Away3D.

First of all, download the Molehill Flash Player here and install it on your computer. (Disclaimer: it is not beta software. It’s PRE beta. Install it at your own risk. Molehill is cool, but if it fries your computer, don’t come to me and complain).

Old style CPU Flash 3D rendering performance with a 20000 triangles model

Here is a sample application that renders a highly detailed 3D model of a “stone spring” using the standard CPU-based drawTriangles call available in Flash Player 10. The model consists of about 20000 triangles, 4000 for each coil of the spring. It is rendered with back-face culling, so the number of triangles actually drawn each frame is about half of that: 10000 or so.
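
To make the culling step concrete, here is a minimal sketch of the usual screen-space winding test. It is written in Python rather than ActionScript, purely for illustration. The function names and the "counter-clockwise means front-facing" convention are assumptions, not taken from the demo's source.

```python
def is_front_facing(p0, p1, p2):
    """Return True if the projected triangle winds counter-clockwise.

    Each point is an (x, y) pair in screen space with y pointing up.
    The sign of twice the signed area gives the winding order.
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    signed_area_x2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    return signed_area_x2 > 0


def cull_back_faces(triangles):
    """Keep only the triangles that face the viewer."""
    return [t for t in triangles if is_front_facing(*t)]


front = ((0, 0), (1, 0), (0, 1))  # counter-clockwise winding
back = ((0, 0), (0, 1), (1, 0))   # same triangle, clockwise winding
visible = cull_back_faces([front, back])
print(len(visible), "of 2 triangles survive culling")
```

On a closed model like the spring, roughly half of the triangles face away from the camera at any time, which is where the "10000 or so" figure comes from.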

You can see the number of polys and the frame rate in the HUD at the top left. On my MacBook Pro, this 3D scene runs at less than 10 frames per second, meaning that 20000 triangles is far too many for a software-rendered 3D scene drawn with drawTriangles, the standard Flash Player 10 call used for 3D rendering.

Molehill hardware accelerated Flash 3D rendering of a 20000 triangles model

Let’s compare this with the Molehill renderer. Here’s the same model rendered with Molehill (close the previous browser tab with the CPU renderer before opening this one). On my Mac, this runs super smoothly at 60 fps (except at startup, which takes a bit. But that can be optimized and it’s not really a measure of rendering performance).

Molehill hardware accelerated Flash 3D rendering of an 80000 triangles model

Cool! Let’s increase the number of polys now. Here’s a scene of a similar model, made of 80000 polys rendered with Molehill. Again, it runs super smoothly.

There’s no point in running this one with the CPU renderer, as it would slow to a crawl.

So, the conclusion is that Molehill can render 80000-triangle models without a hitch. An 80000-triangle model is quite a detailed model. Can we go any higher?

The behemoth test

Unfortunately there’s a hard limit in Molehill’s Vertex Buffers. They cannot contain more than 64k vertices. This particular geometry that I used for the 80000 triangles spring contains about 40000 vertices. So, there’s not really room to go much more detailed than this by using a single spring model.
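
For a model that does exceed the limit, one common workaround is to split the indexed triangle list into several chunks, each small enough for its own Vertex Buffer. The demos here do not do this (they add more springs instead), so treat the following Python sketch as an illustration only. The limit of 65535 is an assumption for "64k", and `split_mesh` is a hypothetical helper, not part of the Molehill API.

```python
MAX_VERTICES = 65535  # assumed per-buffer limit; the post only says "64k"


def split_mesh(indices, max_vertices=MAX_VERTICES):
    """Greedily split an indexed triangle list into chunks.

    `indices` is a flat list with three vertex indices per triangle.
    Returns a list of (vertex_ids, local_indices) pairs: `vertex_ids`
    lists the original vertices used by the chunk, and `local_indices`
    re-indexes the chunk's triangles into that list.
    Assumes max_vertices is at least 3.
    """
    chunks = []
    remap, vertex_ids, local = {}, [], []
    for i in range(0, len(indices), 3):
        tri = indices[i:i + 3]
        new = {v for v in tri if v not in remap}
        if len(vertex_ids) + len(new) > max_vertices:
            # The current chunk is full: close it and start a new one.
            chunks.append((vertex_ids, local))
            remap, vertex_ids, local = {}, [], []
        for v in tri:
            if v not in remap:
                remap[v] = len(vertex_ids)
                vertex_ids.append(v)
            local.append(remap[v])
    if local:
        chunks.append((vertex_ids, local))
    return chunks


# Two triangles sharing an edge, with an artificially tiny limit.
quad = [0, 1, 2, 2, 1, 3]
print(len(split_mesh(quad, max_vertices=3)), "chunks with a 3-vertex limit")
print(len(split_mesh(quad, max_vertices=4)), "chunk with a 4-vertex limit")
```

Vertices shared between chunks get duplicated, so the total vertex count grows a little with every split.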

What we can do is to create a scene with more than one spring. Here is a scene made of four 80000 triangles springs.

Molehill hardware accelerated Flash 3D rendering of a 320000 triangles scene

It consists of about 320000 triangles! And it runs super smoothly, still at 60 frames per second. Note that each spring is being rendered with its own Vertex Buffer and Texture on the GPU. Therefore, as far as the GPU knows, the four springs are four different models.

Let’s double that! Make it 8 springs, of 80k triangles each. Here it is.

Molehill hardware accelerated Flash 3D rendering of a 640000 triangles scene

This scene does take a while to set up. But once it’s all loaded, it still runs pretty smoothly, at around 30 frames per second. So the frame rate has halved, but we’re rendering 640000 triangles, all on the screen at the same time!

Look Ma…1 Million Polys!

OK, now for the last test. Let’s be masochistic and double our springs once more, adding 8 more to the scene. Here’s the result. Note that this last test takes quite a while to set up, and it will slow your computer to a crawl for a few minutes while the scene is being created. Just be patient.

Molehill hardware accelerated Flash 3D rendering of a 1.2 million triangles scene

Once the scene is uploaded to the GPU and set up, it still manages to run. Frame rate on my Mac is around 15 fps. Not great, but hey, we’ve got 1.2 million triangles on the screen!
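
Putting the three multi-spring runs side by side gives a rough throughput figure. The short Python sketch below just multiplies triangle count by frame rate, using the approximate frame rates quoted above. The constant and function names are made up for this illustration.

```python
TRIANGLES_PER_SPRING = 80_000

# (number of springs, approximate frame rate reported in the post)
RUNS = [(4, 60), (8, 30), (16, 15)]


def triangles_per_second(springs, fps):
    """Triangles submitted per second for a given scene and frame rate."""
    return springs * TRIANGLES_PER_SPRING * fps


for springs, fps in RUNS:
    total = springs * TRIANGLES_PER_SPRING
    rate = triangles_per_second(springs, fps)
    print(f"{total:,} triangles at {fps} fps = {rate / 1e6:.1f}M triangles/s "
          f"({1000 / fps:.1f} ms per frame)")
```

All three runs work out to roughly 19 million triangles per second on this machine, so with this shader the frame rate seems to scale inversely with triangle count. Two caveats: 16 springs of 80000 triangles is strictly 1.28 million, and the 60 fps figure may be capped by the player's frame rate, in which case the 4-spring number is only a lower bound.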

A few caveats

This is a very specific kind of test, and it doesn’t assess the performance of the Molehill rendering pipeline in all possible kinds of rendering situations.

It’s very rare that a scene consists of a single, or a few, super detailed models, like I did here. More frequently you’re going to have a multitude of much less detailed models. And, with a similar total triangle count, that would probably end up being slower than our sample, depending on the number of models.

Generally, the biggest bottleneck in a 3D rendering pipeline is fill rate, meaning how many pixels (or texels) per second the video card can fill. This test doesn’t really push the pipeline’s fill rate, as all the triangles we are rendering end up extremely small on the screen.
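
As a back-of-the-envelope illustration of what fill-rate demand means, the Python sketch below multiplies screen size, overdraw and frame rate. The stage size and overdraw factor are hypothetical numbers chosen for the example, not measurements from these tests.

```python
def shaded_pixels_per_second(width, height, overdraw, fps):
    """Rough fill-rate demand: pixels written per second.

    `overdraw` is the average number of times each screen pixel is
    shaded per frame (1.0 means every pixel is drawn exactly once).
    """
    return width * height * overdraw * fps


# Hypothetical scene: 1024x768 stage, each pixel shaded 3 times on average.
demand = shaded_pixels_per_second(width=1024, height=768, overdraw=3.0, fps=60)
print(f"{demand / 1e6:.0f} Mpixels/s")
```

A scene made of a few large, overlapping, textured surfaces can push this number up quickly even with a tiny triangle count, which is the opposite of what the spring demos stress.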

Also, I used a super simple Shader here. Here’s the code of the shader, in AGAL Assembler.

Vertex Shader:

m44 op, va0, vc0 // pos to clipspace
mov v0, va1  // copy uv

Pixel Shader:

tex ft1, v0, fs0 <2d,linear,nomip>  // sample texture
mov oc, ft1 // copy sampled texture to output

The Vertex Shader simply transforms the 3D vertices to the screen. The Pixel Shader maps a single texture onto the model, without any lighting or effects. The texture is 1024×1024 pixels (it was originally 2048×2048; see the update below).
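
For readers who don't speak AGAL, here is a rough CPU-side emulation of those four instructions in Python. It is only a sketch: `m44` is modeled as one 4-component dot product per constant register, and the texture lookup uses nearest-texel sampling as a stand-in for the linear filtering the real shader asks for. The function names and the tiny test texture are invented for the example.

```python
def m44(vec, rows):
    """Emulate AGAL m44: one 4-component dot product per matrix register."""
    return tuple(sum(v * m for v, m in zip(vec, row)) for row in rows)


def vertex_shader(va0, va1, vc):
    """va0 = position (x, y, z, w), va1 = uv, vc = four constant registers."""
    op = m44(va0, vc)  # m44 op, va0, vc0
    v0 = va1           # mov v0, va1
    return op, v0


def pixel_shader(v0, texture):
    """Nearest-texel lookup standing in for `tex`, then copy to the output."""
    u, v = v0
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    ft1 = texture[y][x]  # tex ft1, v0, fs0
    return ft1           # mov oc, ft1


# A matrix that shifts x by 5, and a 2x2 "texture" of labels.
shift_x = [(1, 0, 0, 5), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
texture = [["a", "b"], ["c", "d"]]

position, uv = vertex_shader((1, 2, 3, 1), (0.75, 0.25), shift_x)
print(position, pixel_shader(uv, texture))
```

There is no lighting, no per-pixel math and only one texture fetch, which is why this is about the cheapest shader pair you can run.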

Many times a real model will be using more than one texture, and it will have more complex shaders.

Nonetheless, this test is good as a rule-of-thumb reference point: in the simplest rendering case, Molehill can render a single textured model with 80000 triangles with ease. A scene with 320k triangles renders quite smoothly, and a 640k-triangle scene is still OK to render. A 1.2 million triangle scene is a bit extreme, but it doesn’t bring the computer to a complete crawl.

Oh, note that I ran all these tests on the integrated NVIDIA GeForce 9400M GPU of my Mac. A dedicated video card will probably run these tests faster.

Update:

I just scaled down the texture to 1024×1024 for all the samples, as I received some error reports from some users.
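
To give a sense of what that change saves, here is a quick Python estimate of texture memory. It assumes uncompressed 32-bit texels and no mipmaps (the shader samples with nomip), and it is not a diagnosis of the reported errors, whose cause isn't known.

```python
def texture_bytes(width, height, bytes_per_texel=4):
    """Memory for one uncompressed texture without mipmaps."""
    return width * height * bytes_per_texel


SPRINGS = 16  # the largest scene; each spring uploads its own texture

for size in (2048, 1024):
    one = texture_bytes(size, size)
    print(f"{size}x{size}: {one / 2**20:.0f} MiB per texture, "
          f"{SPRINGS * one / 2**20:.0f} MiB for {SPRINGS} springs")
```

Halving each side cuts the footprint to a quarter, which matters most in the 16-spring scene where every spring carries its own copy.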

I am interested in getting a sense of how this performance compares with other 3D platforms such as Unity, and will be exploring that soon. Any advice is welcome.
