Ogre 2.0 doc (slides) - Updated 1st dec 2012

by **Wolfmanfx** » Mon Nov 19, 2012 7:06 am

Awesome work!
1. We need to update our roadmaps, but more important, we need to create work packages regardless of whether we do the work or mentor it through GSoC.
2. Jira should be used for that task planning; the wiki is outdated.
3. We should start looking for students for next year's GSoC, maybe right now, with a campaign.

by **syedhs** » Mon Nov 19, 2012 7:08 am

Awesome effort to at least document the to-dos...

by **DanielSefton** » Mon Nov 19, 2012 11:25 am

This is excellent, it addresses everything wrong with Ogre in comparison to modern techniques. I lol'd at the confusion memes.

Ideally Ogre requires a complete rewrite from the core. It's matured around an architecture designed over a decade ago. What worries me is who's going to be up to the challenge to essentially write a brand new engine with Data-Oriented paradigms? Chipping away at the current codebase would be like painting a rotting fence.

Edit: I haven't kept up with discussions on Ogre 2.0; I'm guessing the plan was to start afresh anyway?

by **Klaim** » Mon Nov 19, 2012 12:13 pm

Nice work!

dark_sylinc wrote:No Boost please. Most of it is bloated for a performance-intensive application (and compile times!). Not to mention the executable size skyrockets by unbelievable factors (10x is not uncommon); and debugging the call stack of a thread created with boost is a major pita when compared to clean callstacks and thread names in the Thread window when using native thread creation. Using Boost can be very tempting, but we do everyone a service by refraining from using it in Ogre.

I kind of agree with you on this, but not for everything in boost (the libraries I use don't have these effects, but I choose them with great care and I use the C++11 standard library). So I wanted to ask if you took a look at the recent containers in boost. I think some of them could be very helpful in the design you are suggesting, like the flat containers (basically built around a vector). They're imperfect with VS Debugger data visualization, though. Anyway, at least the concept (or the code) could be reused in Ogre, because the purpose of these containers is mainly to improve performance on search and traversal by making data contiguous. That makes insertion and removal slower, but more appropriate for modeling data, I think (while not appropriate for building lists each frame, for example).

by **stealth977** » Mon Nov 19, 2012 12:48 pm

my 50 cents about external libraries, and especially for boost:

If OGRE is going to use only a small part of a library's functionality, adding that library as a dependency is usually overkill. That kind of functionality usually doesn't need maintenance, so a little time can be spared to include OGRE's own implementation without the burden of a huge dependency.

Especially boost, which is a great library, but OGRE makes use of maybe 1% of it, and maybe even less. Also, what OGRE uses from boost is static functionality. What I mean by static is: once OGRE implements its own code instead of boost's, that part of the code will most of the time not need any further modifications, and the functionality we borrow from boost already has well-tested and easy-to-implement example code all over the net...

So, I would strongly suggest dropping boost and instead using our own code for the very little static functionality OGRE needs...

by **Klaim** » Mon Nov 19, 2012 12:52 pm

stealth977 wrote:So, I would strongly suggest dropping boost and instead using our own code for the very little static functionality OGRE needs...

Currently, boost isn't a dependency in Ogre; it's optional for threading support. Shared pointers and the like were, if I remember correctly, copied into Ogre from boost (or other libraries) a long time ago, as you suggest. I think that's the right way to go too (even if, as pointed out, shared pointers in Ogre are not very good).

by **saejox** » Mon Nov 19, 2012 1:54 pm

that was nice read. thanks

what alternative do you propose for boost::thread?

by **CABAListic** » Mon Nov 19, 2012 2:22 pm

saejox wrote:that was nice read. thanks

what alternative do you propose for boost::thread?

Given the kind of threading design that would benefit the Ogre core, we'll need something along the lines of Intel's tbb. Now, given tbb's unpopular license, it won't become a dependency. But since people might have their own task-based threading system in place which they'd like Ogre to use, a possible way might be to use an abstracted task scheduling system that, as its backend, can use tbb or a custom backend. We could then, as the default choice for people who are not using either, offer a more light-weight alternative based on e.g. cpptasks or perhaps even our own implementation in Ogre.
But this is just speculation on my part on what might work.
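
A minimal sketch of what such an abstraction could look like, assuming C++11 and entirely hypothetical names (`TaskScheduler`, `InlineScheduler` and `sumSquares` exist neither in Ogre nor in tbb). The engine side talks only to the interface; a tbb-backed or application-supplied scheduler would simply be another subclass.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical interface: the engine only ever sees this class.
class TaskScheduler {
public:
    virtual ~TaskScheduler() {}
    // Queue a unit of work; the backend decides where and when it runs.
    virtual void submit(std::function<void()> task) = 0;
    // Block until every task submitted so far has finished.
    virtual void waitAll() = 0;
};

// Trivial default backend (an assumption for illustration): it stores the
// tasks and runs them on the calling thread inside waitAll().
class InlineScheduler : public TaskScheduler {
public:
    void submit(std::function<void()> task) override {
        assert(task != nullptr);
        mPending.push_back(std::move(task));
    }
    void waitAll() override {
        // Index-based loop so a running task may submit further tasks.
        for (std::size_t i = 0; i < mPending.size(); ++i) {
            std::function<void()> task = std::move(mPending[i]);
            task();
        }
        mPending.clear();
    }
private:
    std::vector<std::function<void()> > mPending;
};

// Engine-side code written against the interface only.
int sumSquares(TaskScheduler& scheduler, const std::vector<int>& values) {
    std::vector<int> partial(values.size(), 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        scheduler.submit([&partial, &values, i] { partial[i] = values[i] * values[i]; });
    }
    scheduler.waitAll();
    int total = 0;
    for (int p : partial) total += p;
    return total;
}
```

Client code would pass its own `TaskScheduler` subclass to the engine; `InlineScheduler` is only the fallback for people who have no threading system of their own.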

by **dark_sylinc** » Mon Nov 19, 2012 2:42 pm

saejox wrote:that was nice read. thanks

what alternative do you propose for boost::thread?

Encapsulating native thread creation into generic classes should be enough. It's not hard at all. "CreateThread" is more than enough.
Boost has a few tempting solutions that are meant to spawn a thread at any time, and execute it immediately or delay its execution until a "submit" command (aka tasks). Fortunately we won't miss that, because such a pattern leads to obscure race conditions, lack of control over what's going on, and other weird bugs. Furthermore, spawning a new thread is quite expensive. The problem with tasks is that they're hard to track, so the design has to be thought out in advance.
In 90% of cases, the best solution is to spawn worker threads at startup, and keep them suspended until a certain condition is met (using sync barriers or condition variables).
Writing worker threads that rely on a core class and shared data, and that execute a predefined algorithm with synchronization classes, works better in the long run. It's not very different from OpenCL's approach, except that the threads are dormant.
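
A rough sketch of that pattern, under stated assumptions: C++11 `std::thread` and `std::condition_variable` stand in for wrappers around native calls, and `WorkerPool` and `sumWithWorkers` are hypothetical names, not Ogre classes. The workers are created once, sleep on a condition variable, run one predefined job when woken, and go back to sleep.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class WorkerPool {
public:
    typedef std::function<void(std::size_t index, std::size_t count)> Job;

    // Spawn `count` workers once, at startup. They sleep until runAndWait().
    WorkerPool(std::size_t count, Job job)
        : mJob(std::move(job)), mGeneration(0), mRunning(0), mQuit(false) {
        assert(mJob != nullptr);
        for (std::size_t i = 0; i < count; ++i)
            mThreads.emplace_back(&WorkerPool::workerLoop, this, i, count);
    }

    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQuit = true;
        }
        mWake.notify_all();
        for (std::thread& t : mThreads) t.join();
    }

    // Wake every worker, then block until all of them finished their share.
    void runAndWait() {
        std::unique_lock<std::mutex> lock(mMutex);
        mRunning = mThreads.size();
        ++mGeneration;
        mWake.notify_all();
        mDone.wait(lock, [this] { return mRunning == 0; });
    }

private:
    void workerLoop(std::size_t index, std::size_t count) {
        std::size_t seen = 0;  // last generation this worker has processed
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mWake.wait(lock, [&] { return mQuit || mGeneration != seen; });
                if (mQuit) return;
                seen = mGeneration;
            }
            mJob(index, count);  // the predefined algorithm, run outside the lock
            std::lock_guard<std::mutex> lock(mMutex);
            if (--mRunning == 0) mDone.notify_one();
        }
    }

    Job mJob;
    std::mutex mMutex;
    std::condition_variable mWake;  // "a new generation of work is ready"
    std::condition_variable mDone;  // "the last worker has finished"
    std::size_t mGeneration;
    std::size_t mRunning;
    bool mQuit;
    std::vector<std::thread> mThreads;
};

// Example job: each worker sums a strided slice of 0..n-1 into its own slot.
long sumWithWorkers(std::size_t workers, long n) {
    std::vector<long> partial(workers, 0);
    WorkerPool pool(workers, [&partial, n](std::size_t index, std::size_t count) {
        long sum = 0;
        for (long v = static_cast<long>(index); v < n; v += static_cast<long>(count)) sum += v;
        partial[index] = sum;
    });
    pool.runAndWait();  // first pass
    pool.runAndWait();  // second pass reuses the same dormant threads
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

No thread is created after the constructor returns; each `runAndWait()` call only signals and waits.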

I haven't tried posix threads on Win32, but afaik on Windows they just wrap to OS native functions (except for functionality that isn't present).

The only feature we would miss is boost's mutex family. Note that in Win32 most threading functions are completely broken (except for those added after Vista).
PulseEvent is broken, and SignalObjectAndWait does not do what the name suggests (it signals and then waits, but the two operations are not atomic).
This makes fast condition variables impossible on WinXP, and makes a sync barrier tricky (but possible!). Implementations that claim to do condition variables on XP are either using a mutex, or wrongly relying on SignalObjectAndWait being atomic, or they are using a custom-made driver running in kernel mode (I've never seen that though).

by **dark_sylinc** » Mon Nov 19, 2012 3:00 pm

Klaim wrote:so I wanted to ask if you took a look at the recent containers in boost which, I think some could be very helpful in the design you are suggesting, like flat containers? (basically built around a vector) It's imperfect with VS Debugger data visualization though.

Yes, flat containers are just vectors using std::lower_bound and insert. That's all. In fact, I use lower_bound directly in Distant Souls, no debugger visualization issues
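
For reference, a minimal sketch of that idea in plain C++11, with no Boost involved. The helper names `flatInsert` and `flatContains` are made up for this example, and `int` keys are an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insert while keeping the vector sorted and free of duplicates.
// Finding the position is O(log n); the insert itself shifts elements, O(n).
bool flatInsert(std::vector<int>& sorted, int value) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::vector<int>::iterator it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it != sorted.end() && *it == value) return false;  // already present
    sorted.insert(it, value);
    return true;
}

// Lookup is a binary search over contiguous memory, O(log n).
bool flatContains(const std::vector<int>& sorted, int value) {
    std::vector<int>::const_iterator it = std::lower_bound(sorted.begin(), sorted.end(), value);
    return it != sorted.end() && *it == value;
}
```

Because the storage is an ordinary `std::vector`, the debugger shows it like any other vector.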

(Boost already had the linky, that's very nice of them)

Note that "flat" containers are for fast lookup and fast iteration, but slow removal and slow insertion (which is usually what Ogre wants). Note that the HighLevelCull may want to keep stuff ordered for SIMD, with sorting rules that don't have to match the standard sort. For example:
We have objects A & C. "[b]" is for blank spaces. Three As get created, then one C. The high level cull organizes them like this for efficient SSE2:

    AAA[b] C[b][b][b]

Then two As get created, then another two Cs:

    AAAA CCC[b] A[b][b][b]

If we were using a flat container, it would force us to move all Cs by inserting the As in the middle, which is unnecessary:

    AAAA A[b][b][b] CCC[b]

Both outputs are legal for the HighLevelCull as they satisfy the locality and SIMD rules. Which one is better, however, depends on profiling. My wild guess is the former will beat the latter every time. But I could be proven wrong.
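
To make the first (append at the end) placement rule concrete, here is a small sketch. It assumes blocks of four slots, one object type per block, and uses hypothetical names (`Block`, `addObject`, `describe`); it only tracks slot counts, not real object data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const std::size_t kBlockWidth = 4;  // four floats fit in one SSE2 register

struct Block {
    char type;         // object type stored in this block
    std::size_t used;  // occupied slots, 0..kBlockWidth
};

// Place one object: fill a blank slot in an existing block of the same type,
// otherwise append a new block at the end. Existing blocks never move.
void addObject(std::vector<Block>& blocks, char type) {
    for (Block& b : blocks) {
        if (b.type == type && b.used < kBlockWidth) {
            ++b.used;
            return;
        }
    }
    Block fresh = { type, 1 };
    blocks.push_back(fresh);
}

// Render the layout in the notation used above, with "[b]" for blank slots.
std::string describe(const std::vector<Block>& blocks) {
    std::string out;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        assert(blocks[i].used <= kBlockWidth);
        if (i != 0) out += ' ';
        out.append(blocks[i].used, blocks[i].type);
        for (std::size_t s = blocks[i].used; s < kBlockWidth; ++s) out += "[b]";
    }
    return out;
}
```

Adding A, A, A, C and then A, A, C, C in that order reproduces the first two layouts shown above; the flat-container ordering would need an extra step that moves the C block.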

Also, note that while doing (e.g.) transformations, we don't have to iterate through the Entity pointers and read their SoA_Vector3s. We can just read the mChunk of the positions, rotations & matrices, and iterate them directly (because reading the pointers to read the SoA_Vector3 address would add an additional indirection for the CPU).
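
A small sketch of what iterating the chunk directly means. `PositionChunk` is a hypothetical stand-in for the real storage (one contiguous array per component), and a plain translation stands in for a full transform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chunk: one contiguous array per component, shared by all objects.
struct PositionChunk {
    std::vector<float> x, y, z;
};

// Translate every position by walking the arrays directly. No per-object
// pointer is dereferenced, so memory is read strictly in order.
void translateAll(PositionChunk& chunk, float dx, float dy, float dz) {
    assert(chunk.x.size() == chunk.y.size() && chunk.y.size() == chunk.z.size());
    for (std::size_t i = 0; i < chunk.x.size(); ++i) {
        chunk.x[i] += dx;
        chunk.y[i] += dy;
        chunk.z[i] += dz;
    }
}
```

The alternative, looping over `Entity*` and following each pointer to find its slot, touches the same floats but adds one dependent memory load per object.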

by **dark_sylinc** » Mon Nov 19, 2012 3:19 pm

DanielSefton wrote:Ideally Ogre requires a complete rewrite from the core. It's matured around an architecture designed over a decade ago. What worries me is who's going to be up to the challenge to essentially write a brand new engine with Data-Oriented paradigms? Chipping away at the current codebase would be like painting a rotting fence.

Edit: I haven't kept up with discussions on Ogre 2.0; I'm guessing the plan was to start afresh anyway?

The plan is to start fresh.

My view is that we can keep the Script parser (i.e. Materials), the Mesh format (although with some modifications), Vector, Quaternions & Matrix are great (we just need to add their SoA counterparts), the Shadow camera setup math still works, the Camera works (only change is Frustum check code), the Vertex declaration & binding is fine (although, we could remove the "start offset" parameter, and reinforce declaration order as important, that would be less error prone and easier to manage, I believe; anyway it's not a major change) there's a lot of stuff in RenderSystems that can be reused (from Ogre-related stuff like Depth sharing & RTTs management to HW management like window creation & device lost) and even the Resource system.
The Resource system has its limitations, and is not very threadable. It could have a rewrite too. But... let's just keep it one step at a time. The only thing that needs a rewrite is our "read only" initialization pattern, which I talked about in the DX11 thread: DX11 requires that read-only memory be initialized on creation, but we first create, then lock() and fill the data once; this is holding DX11 back in terms of optimization.
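
To illustrate the difference in API shape only, here is a hypothetical mock. `ReadOnlyBuffer` is not an Ogre or Direct3D class and no GPU is involved; the point is that the data arrives with the constructor and there is no lock() to fill it afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer type. Its contents are supplied once, at construction,
// and can only be read afterwards.
class ReadOnlyBuffer {
public:
    ReadOnlyBuffer(const float* initialData, std::size_t count)
        : mData(initialData, initialData + count) {
        assert(initialData != nullptr || count == 0);
    }

    std::size_t size() const { return mData.size(); }
    float at(std::size_t i) const { return mData.at(i); }

private:
    const std::vector<float> mData;  // immutable after creation
};
```

With the current create-then-lock pattern the buffer has to exist, empty, before the data is known, which is exactly what an immutable resource cannot express.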

by **Klaim** » Mon Nov 19, 2012 3:43 pm

dark_sylinc wrote:Encapsulating native thread creation into generic classes should be enough. It's not hard at all. "CreateThread" is more than enough.[...]

I agree with CABAListic on this: you can't let Ogre spawn worker threads for its own use and ignore the rest of the application's usage. Providing a way for tasks to be pushed into whatever task scheduler is used by the client code would help a lot.

Yes, flat containers are just vectors using std::lower_bound and insert.

As far as I understand, it's a bit more efficient than that, but I might be wrong.

Note that "flat" containers are for fast lookup, fast iteration, but slow removal, slow insertion (which is usually what Ogre wants).

My understanding was that parts of Ogre want fast insertion (while building rendering lists) and other parts want (very) fast lookup. Am I wrong?

(I'm not saying we have to use these flat containers, just that they are good examples that could be used in Ogre if Ogre needs them. Otherwise forget about it, it's just a comment.)

The Resource system has its limitations, and is not very threadable. It could have a rewrite too. But... let's just keep it one step at a time.

There were discussions at some point about making it a component or something optional, because some of us (me included) need to set up a game/app-specific resource system which would then inject resources into Ogre itself. I can't remember if there was some agreement on this. Anyway, it's a concern that is not directly linked to all you said, so I guess it's OK to put it to the side or to have some work done on it in 1.x.

by **Xavyiy** » Mon Nov 19, 2012 4:11 pm

After reading the slides, I cannot agree more with you.
But I don't know how the ogre team/community must deal with that.

The plan is to start fresh.

Rewriting Ogre from scratch would be a tremendous amount of work (who is going to do it? how much time would that take? We'll need new demos. Also a lot of testing.) and it will also break compatibility with almost all Ogre-based apps. That is not bad, taking into account that Ogre 2.0 is going to be a completely new version, but the "break" may be just too big (I think we should keep the new API as similar to the current one as possible).

Instead, I would suggest splitting the redesign into 4-6 parts and doing it step by step. Taking into account that this rewrite might require several months or even years, maybe we can target Ogre 2.X for this redesign and think of it as a transition to the stable 3.X. For example:

2.0 -> Cache misses, DX11 & OGL4 RS
2.1 -> Scene manager redesign: scene traversal & processing
2.3 -> FF -> "states"
2.4 -> Vertex format enhancements
2.5 - 2.9 -> Fix bugs. Remaining stuff

3.0 -> First stable version of the "new ogre"

Well, this is just an idea, but I really think doing it from scratch and without releasing intermediate versions may be quite dangerous.

Xavier

by **PhilipLB** » Mon Nov 19, 2012 4:34 pm

I agree with big refactoring instead of a rewrite.
Think of Netscape back then, this broke their neck: http://www.joelonsoftware.com/articles/ ... 00069.html

by **dark_sylinc** » Mon Nov 19, 2012 4:39 pm

Good Xavyiy, that's the feedback I was looking for. To be honest I share the same fears as you.

2.0 -> Cache misses, DX11 & OGL4 RS

Good! We can create the system and use SoA adjusted for 1 component (XYZXYZXYZ); when we finish 2.1, we can change at compile time to use 4 components (XXXXYYYYZZZZ).
The DX11 RS will require the submit-on-creation change though.
Note that the bone system (skeletal animation) may be updated in either 2.0 or 2.1.
Also, we may want to start stripping down data from MovableObject & Renderable that isn't used as often.
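
For the two layouts mentioned above (XYZXYZXYZ and XXXXYYYYZZZZ), one way to keep the choice a compile-time switch is to route every access through a single index function. This is only a sketch: the `SOA_WIDTH` macro and `soaIndex` are hypothetical names, and the width is also exposed as a parameter purely so both layouts can be checked side by side.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical compile-time switch: 1 gives XYZXYZXYZ, 4 gives XXXXYYYYZZZZ.
#ifndef SOA_WIDTH
#define SOA_WIDTH 4
#endif

// Offset, in floats, of component c (0 = X, 1 = Y, 2 = Z) of object i.
inline std::size_t soaIndex(std::size_t i, std::size_t c, std::size_t width = SOA_WIDTH) {
    assert(width > 0 && c < 3);
    const std::size_t packet = i / width;  // which group of `width` objects
    const std::size_t lane = i % width;    // position inside that group
    return packet * 3 * width + c * width + lane;
}
```

With a width of 1 the function degenerates to `i * 3 + c`; with a width of 4 the X values of objects 0 to 3 sit next to each other, ready to be loaded into one SSE register.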

2.1 -> Scene manager redesign: scene traversal & processing
2.3 -> FF -> "states"
2.4 -> Vertex format enhancements

Nice, looks like a plan. Note that 2.4 can be done at any time, and it can be done by GSoC students.
I wouldn't leave 2.0 and/or 2.1 to students, unless they're really good, for example.

by **DanielSefton** » Mon Nov 19, 2012 7:18 pm

Yeah, I meant that ideally it would require a rewrite, but refactoring is a wiser option. Specific areas can definitely be targeted one at a time. Just that, it's not as easy as "fixing" cache misses without code restructure, and the whole of Ogre's core works in a totally anti-data-oriented manner. Anyway, it's a step in the right direction.

For those unfamiliar with DOD, here's a whole bunch of handy articles: https://plus.google.com/115950681746193428612/posts

by **Xavyiy** » Mon Nov 19, 2012 7:50 pm

Good!

Note that the bone system (skeletal animation) may be updated in either 2.0 or 2.1

Well, I think it should be done in 2.1, since the "scene manager refactor" will be mixed with the "new compositor manager", so after having that done plus the changes from 2.0 it'll be far easier and more guided.

We can create the system and use SoA adjusted for 1 component (XYZXYZXYZ); when we finish 2.1, we can change at compile time to use 4 components (XXXXYYYYZZZZ)

Great. Definitely all the memory layout must be specifiable at compile time (not all processors have 64+ byte cache lines, etc.). More info: viewtopic.php?f=4&t=30250&start=175#p454275 (Important: take care of OGRE_DOUBLE_PRECISION before starting to code it). I think that should be done for 2.0, and then 2.1 can take advantage of it (in 2.0 just the SoA classes, which shouldn't be complex).

Note that 2.4 can be done at any time, and it can be done by GSoC students.

+1. Also, I doubt this is going to give a big boost: the idea is to do occlusion culling, so in the end we're not going to render big queues.

-------------------

IMHO, these are the next steps:

1. Add masterfalcon's OGL3+ RS to 1.9
2. 1.9 RC before the end of the year
3. Release 1.9.0 in ~February

----------- Ogre 2.0 development starts here

1. Reduce cache misses everywhere: Node, Frustum, Entity, etc. ---> Profiling a lot!
2. Write all SoA classes
3. Improve the DX11&OGL4 Render systems. Make them the default in PC platforms.
4. ThreadManager?

----------- Ogre 2.1 development starts here

1. Scene manager redesign ---> Role decomposition + SoA for all nodes, etc. (SoA classes are here since 2.0)
2. etc.

------------- > > >

Ideally, these 2.X versions should have a lifetime of <=3 months: enough for developing them and, most importantly, enough to allow users to update their apps. Otherwise it'll be a real pain in the ass to update a more or less complex app from Ogre 1.9 to Ogre 2.

Note: this will require a lot of support from the Ogre team and also a change in their way of working: rather than working in their particular areas of interest, the work must be very focused on the current version roadmap (just until 2.4-3.0, of course). (Sorry if that sounds rude, but I think it's the reality.)

Personally, this university year I won't be able to help with Ogre development since I'm very busy, but I think I'll have much more free time next year (my 'backup' year: master project + some little stuff). Also, next September I'll go commercial with the Paradise Engine, so after that I'll actively contribute to Ogre since I'm very interested in these "next-gen" changes. Btw, I'll be able to test these changes/performance improvements in a lot of different scenes, which will help.

Xavier

by **Mako_energy** » Tue Nov 20, 2012 12:56 am

Sorry I'm late to the show, I just want to add my input on some of the things covered earlier in the thread.

CABAListic wrote:

saejox wrote:that was nice read. thanks

what alternative do you propose for boost::thread?

Given the kind of threading design that would benefit the Ogre core, we'll need something along the lines of Intel's tbb. Now, given tbb's unpopular license, it won't become a dependency. But since people might have their own task-based threading system in place which they'd like Ogre to use, a possible way might be to use an abstracted task scheduling system that, as its backend, can use tbb or a custom backend. We could then, as the default choice for people who are not using either, offer a more light-weight alternative based on e.g. cpptasks or perhaps even our own implementation in Ogre.
But this is just speculation on my part on what might work.

Regarding the thread discussion, I have a co-worker who is working on the threading bits for our engine and is separating them out into their own library for others to use. I can't give too many details just yet, but it should be available in the not too distant future (well before Ogre 2.0), and we could offer that library as an alternative to TBB for Ogre to use, if the Ogre team is at all interested. One thing to mention is that it is a lockless task-based threading library. Not sure if that fits into your existing plans or not.

Klaim wrote:

The Resource system has its limitations, and is not very threadable. It could have a rewrite too. But... let's just keep it one step at a time.

There were discussions at some point about making it a component or something optional, because some of us (me included) need to set up a game/app-specific resource system which would then inject resources into Ogre itself. I can't remember if there was some agreement on this. Anyway, it's a concern that is not directly linked to all you said, so I guess it's OK to put it to the side or to have some work done on it in 1.x.

1000x this. The resource system is very unfriendly to people who need/want to implement their own solutions, and dancing around the existing framework to integrate with it is more work than it should be. The resource system (imo) really needs to be separated out into its own component.

Regarding the refactor/rewrite discussion... ultimately, if the end destination is the same (or very similar) then I won't care too much either way. But one thing I do worry about when talking about using the 2.0 series of Ogre as a transition to the rebuild: if Ogre uses the existing release schedule, then we are talking 3+ years until this rewrite is complete. Admittedly I don't know the background to most of this stuff (btw Dark_Sylinc, I appreciate you dumbing down the issues as much as you did in your slides) so maybe that is an inevitability, but I agree with Dark_Sylinc's comment in his original post. These changes need to occur as fast as possible.

Edit:
I made my post before I could see Xavyiy's post that is above mine. He seems to address my last concern. If that is the plan that goes forward then I retract my previous statement regarding the release schedule.

by **dark_sylinc** » Tue Nov 20, 2012 2:28 am

I was thinking on my way back home from University, and I think we should follow the Blender approach: for a long time the 2.5x branch was "unstable" or "alpha".
Nothing was set in stone (particularly the Python interfaces) while still trying to be as backwards compatible as possible. It was "use at your own risk".

The Ogre 2.x you're proposing could work much like that.
We have to temporarily break some stuff. For example, to fix the cache misses we start with SoA, then remove if( dirty ) update(). But then we change how SceneNodes are traversed. Then how SceneNodes are stored & traversed (breadth first) and updated in one single place.
At this point, the OctreeSceneManager will be incompatible. Therefore Ogre 2.0 would ship without a high level culling system, but with efficient cache usage.
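
A heavily simplified sketch of the "stored breadth first, updated in one single place" idea. Everything here is hypothetical: `Node`, `NodeLevels` and `updateAll` are illustration names, and a single float offset stands in for a full transform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    std::size_t parent;  // index into the previous level; ignored at level 0
    float local;         // position relative to the parent
    float derived;       // world position, recomputed every frame
};

// levels[0] holds the roots, levels[1] their children, and so on.
typedef std::vector<std::vector<Node> > NodeLevels;

// Update every node in one place, level by level. Parents are always finished
// before their children, so no "if( dirty ) update()" check is needed.
void updateAll(NodeLevels& levels) {
    for (std::size_t depth = 0; depth < levels.size(); ++depth) {
        for (Node& n : levels[depth]) {
            if (depth == 0) {
                n.derived = n.local;
            } else {
                assert(n.parent < levels[depth - 1].size());
                n.derived = levels[depth - 1][n.parent].derived + n.local;
            }
        }
    }
}
```

Each level is one linear sweep over contiguous memory, and the only data read from elsewhere is the parent's derived value from the level just processed.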

For 2.1 we start moving a few bits of SceneNode management from (e.g.) SceneManager (or wherever we stored it) into HighLevelCull. Then HighLevelCull starts the bookkeeping system, etc.

The point is, some updates will unavoidably break a few features, only to get them back in the next release and lose something else, temporarily.

by **tuan kuranes** » Tue Nov 20, 2012 11:49 am

great work.
As the Ogre Core source code is a huge and complex monster, I would add refactoring it into small modules prior to any other work, extending the plugin system to core code.

pros

Faster recompilation.
Benchmarking a new version of a module against the old module is easy.
Unit/visual testing/comparison of new code against the old module.
Ability to provide multiple modules for the same "feature" (sw skeletal/hw skeletal/dualquat/none, old morph/texture morph, deferred/forward/lightprepass, culling/chc/occlusion, etc.), like Ogre does for scene management and renderers.
Deeper user customization of the Ogre core: small scene-based projects that just render a few objects in a static scene with few interactions could drop all the culling/skeletal/morph/compositor code just by removing the corresponding modules (product shows, small puzzle games, adventure games, etc.).

cons:

Boring/administrative work to do on the building system
Testing/maintaining build system
Too much user choice (much can be automated or documented, but still burdens users with complex choices)

some ideas of modules:
composition - culling - render cull result to render queue - transforms/node - entity - maths - color/images/pixel - render queue execution to render system - etc. or more stage based

by **drwbns** » Wed Nov 21, 2012 4:43 pm

Since there are a lot of people on both sides of the fence (refactoring and continuing code development vs. a complete rewrite), I always thought that maybe the two sides should fork there, with whoever signs up for a rewrite continuing their work on a completely different branch, maybe called 2.0b. I think it would truly separate the two ideas, but what do I know?

by **Herb** » Thu Nov 22, 2012 8:18 pm

Couple of thoughts.... I like the idea in principle of separating Ogre into more, smaller components, but I also realize there can be more complexity with that model "if" you're using a majority of the components: more DLLs to load and register for the components. I guess I'm speaking more for a person who's new to Ogre, as there are already so many other components to integrate before even thinking about integrating components within Ogre. If nothing else, it's a thought to consider if that moves forward.

As for Boost, I agree with the comments. I actually like the fact that I can select which threading library to use; for example, I use POCO instead of Boost. Really, if Boost is a requirement, then we should actually "use" its features throughout the library. But, as for threading, has anyone looked at the threading support in C++11? I thought threading support was baked into that and that should be cross-platform, provided Visual Studio has it implemented (most things I find the GNU guys have already baked in).

by **Mako_energy** » Thu Nov 22, 2012 9:28 pm

Herb wrote:As for Boost, I agree with the comments. I actually like the fact that I can select which threading library to use; for example, I use POCO instead of Boost. Really, if Boost is a requirement, then we should actually "use" its features throughout the library. But, as for threading, has anyone looked at the threading support in C++11? I thought threading support was baked into that and that should be cross-platform, provided Visual Studio has it implemented (most things I find the GNU guys have already baked in).

C++11 threading support isn't all in there just yet. As of GCC 4.7 std::thread isn't actually implemented in GCC. It'll probably be complete enough by the time we need it for Ogre 2.0, but even then, if Ogre is going for something more task-based, you need more than what C++11 provides. Unless Ogre plans to make its own framework for the tasks and scheduler, that is.

by **syedhs** » Thu Nov 22, 2012 11:37 pm

Probably threading should be made part of the core and thus compulsory, not an option?