CFlags for recompiling

Tips on how to tweak Nexuiz for the best performance

Mon Sep 18, 2006 3:41 pm

  • The gentoo wiki carries this page of safe cflags for different CPU architectures:
    http://gentoo-wiki.com/Safe_Cflags

    Do the Nexuiz devs think these are safe cflags for DP/Nexuiz? What cflags are used by default?
    Laters losers.
    Ed
    Forum addon
     
    Posts: 1172
    Joined: Wed Mar 01, 2006 12:32 am
    Location: UK

Mon Sep 18, 2006 3:54 pm

  • Nexuiz, or rather DarkPlaces, currently uses:
    -O2 -fno-strict-aliasing

    -funroll-loops is known to make the binary very large while being only very slightly faster
    -ffast-math is known to give problems with gcc 4.1 and 64-bit machines for DP
    esteel
    Site admin and forum addon
     
    Posts: 3924
    Joined: Wed Mar 01, 2006 8:27 am

Tue Sep 19, 2006 2:55 pm

  • I wrote a script to compile multiple binaries with different -march flags and optimisations, and then benchmark each one to provide a performance analysis:
    Code:
    #!/bin/bash
    # A script to compile many versions of Nexuiz and benchmark each one.
    # Run it from inside the darkplaces source directory.

    # Available processor architectures:
    # i386 i486 i586 pentium pentium-mmx i686 pentiumpro pentium2 pentium3 c3 c3-2 k6 k6-2 k6-3 athlon athlon-tbird athlon-xp athlon-4 athlon-mp etc.
    for arch in i686 pentiumpro pentium2 pentium3 c3-2 athlon athlon-tbird athlon-xp athlon-4 athlon-mp
    do
        # Optimisation types: -O -O2 -O3 -Os
        for optimise in -O2 -O3
        do
            # Rebuild from scratch with this combination of flags.
            make clean
            make nexuiz CFLAGS_RELEASE="" OPTIM_RELEASE="" CFLAGS_COMMON="-march=$arch $optimise -fno-strict-aliasing -ffast-math -fomit-frame-pointer"
            # Keep a copy named after its flags, e.g. nexuiz-glx-i686-O2.
            cp nexuiz-glx "../nexuiz-glx-$arch$optimise"
            # Run the benchmark from the parent directory, then return for the next build.
            cd ..
            "./nexuiz-glx-$arch$optimise" -game compile -sndspeed 48000 -benchmark demos/demo1
            cd darkplaces
        done
    done


    Extract the dp source to a darkplaces subdir, put this script there and run it. In this form it only builds glx binaries, as my SDL libs are a mess, though I have done it with SDL on another computer.
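
    Before committing to a full run, it can help to list which binaries the loop would produce. The following is a minimal sketch, not part of the original script: the list_builds function and the shortened archs/opts lists are made up for illustration, and it only prints names and flags without compiling anything.

```shell
#!/bin/bash
# Dry run: print each binary name and the CFLAGS_COMMON it would be built with.
# archs and opts are a sample subset (an assumption); edit them to match your run.
archs="i686 pentium3 athlon-xp"
opts="-O2 -O3"

list_builds() {
    for arch in $archs; do
        for optimise in $opts; do
            echo "nexuiz-glx-$arch$optimise  CFLAGS_COMMON=\"-march=$arch $optimise -fno-strict-aliasing -ffast-math -fomit-frame-pointer\""
        done
    done
}

list_builds
```

    Each architecture is combined with each optimisation level, so the number of builds (and the compile time) grows quickly as entries are added to either list.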

    My system:
    Athlon XP 2400+
    1GB RAM
    Geforce 6600GT 128MB AGP
    Linux 2.6.12-12mdk
    GCC 4.0.1
    Nexuiz 2.0

    Settings in Nexuiz are purely arbitrary but are fairly high in this case. -fomit-frame-pointer was used in most tests as I have found it to give a meaningful performance boost:
    [Image: benchmark results]

    The standard make settings give 50.37fps. By enabling -fomit-frame-pointer I get 50.94fps, by changing to -O3, 51.14fps.
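
    To put those figures in proportion, the relative gain over the standard build can be worked out with a small helper. This is only a sketch: pct_gain is a hypothetical name, not part of the benchmark script, and it assumes awk is available.

```shell
#!/bin/bash
# pct_gain BASE NEW: print the percentage change from BASE to NEW, to two decimals.
# (Hypothetical helper for illustration only.)
pct_gain() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

pct_gain 50.37 50.94   # standard build vs. -fomit-frame-pointer
pct_gain 50.37 51.14   # standard build vs. -O3
```

    On the numbers above this works out at roughly 1.1% and 1.5% over the standard build.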

    The startling thing is how little the architecture matters. It doesn't make that much difference! The athlon-xp option should be fastest but isn't. Athlon-4 beats it and has a small lead over i686. c3-2 is even quicker! These results of course don't reflect how Nexuiz would perform on those CPUs.

    I did try all the -O options including -O and -Os which were in general significantly slower.

    I have also tried this to a lesser extent on a Via Nehemiah (c3-2) and found that i686 with -Os is best there. c3-2 is surprisingly slow.

    I would recommend sticking to i686, as compiling for your own architecture doesn't seem to make much difference (and can be a lot worse), but also try different -O options and try -fomit-frame-pointer.

    For the official build, i686 seems like the best bet to stay with, despite the Nexuiz requirements needing a Pentium 3/Athlon or better. -O3 and -fomit-frame-pointer may be worth looking at, particularly with GCC 4.

    I would be interested if other people can repeat this experiment on different architectures. I would do a full run on my Nehemiah but its graphics are hopeless and ATi broke my Dad's tbird. Feel free to add or remove options from the script; this was just what works on my setup.
    Laters losers.
    Ed
    Forum addon
     
    Posts: 1172
    Joined: Wed Mar 01, 2006 12:32 am
    Location: UK

Tue Sep 19, 2006 3:12 pm

  • Interesting stuff, Ed. One thing I would suggest is running the benchmark twice without exiting the game (not sure how to automate that, though) :| This would remove the shader compile times from the equation, as those are pretty uninteresting from a game performance point of view.
    HOF:
    <Diablo> the nex is a "game modification"
    <Diablo> quake1 never had a weapon like that.
    <Vordreller> there was no need for anything over 4GB untill Vista came along
    <Samua>]Idea: Fix it? :D
    <Samua>Lies, that only applies to other people.
    tZork
    tZite Admin
     
    Posts: 1337
    Joined: Tue Feb 28, 2006 6:16 pm
    Location: Halfway to somwhere else

Tue Sep 19, 2006 3:21 pm

  • Your benchmark makes the differences look rather huge, but remember the difference from slowest to fastest was only TWO FPS. That's not very much, and only gentoo types would spend much time to get 2 fps more :P Sorry, could not resist..
    esteel
    Site admin and forum addon
     
    Posts: 3924
    Joined: Wed Mar 01, 2006 8:27 am

Tue Sep 19, 2006 3:54 pm

  • tZork wrote: One thing I would suggest is running the benchmark twice without exiting the game ... This would remove the shader compile times from the equation, as those are pretty uninteresting from a game performance point of view.

    The shader compilation makes it more CPU bound, so it will show more of the differences between compilations. The benchmark is of course entirely arbitrary, and I would not use the same settings on every machine, as the point is the difference between builds on the same machine doing the same thing.

    esteel wrote:the difference from slowest to fastest was only TWO FPS.

    That's not all of the compilations I did. Many of the -O and -Os options were around 47fps and the slowest I got with any settings was 28fps. That makes a huge difference.

    esteel wrote:only gentoo types would spend much time to get 2 fps more

    Part of the point is actually that the gentoo cflags I linked to at the beginning are not actually the best options. People compiling Nexuiz through Portage are not getting the performance they think they are, and would be better off using the standard binary. However, there are also ways of getting more than the normal binary gives, using other options that would not be used in Portage as they'd be unsafe system-wide.
    Laters losers.
    Ed
    Forum addon
     
    Posts: 1172
    Joined: Wed Mar 01, 2006 12:32 am
    Location: UK


