LZMA (7zip) should be used for data package: 150MB smaller

Developer discussion of experimental fixes, changes, and improvements.

Moderators: Nexuiz Moderators, Moderators


  • Switching to LZMA could save about 150 MB (25%) on the data archive, at the cost of 8-16 MB more memory if the program loads from the archive at runtime.

    for data20090403.pk3 :

    gzip (current) - 596.2 MB
    LZMA (7zip) - 446.6 MB

    The LZMA SDK is in the public domain or under LGPL, with C++ and ANSI C (decompression only): http://www.7-zip.org/sdk.html
    It has good compression and very fast decompression (much better than bzip2).

    Read the docs to fine-tune the random access block size, and the size of the dictionary vs compression efficiency.
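
The dictionary-size trade-off mentioned above can be illustrated with a small sketch. It uses Python's standard `lzma` module (liblzma, not the LZMA SDK itself) as a stand-in. The synthetic data, the two dictionary sizes and the helper name `compressed_size` are assumptions for illustration, not measurements on data20090403.pk3.

```python
import lzma
import random


def compressed_size(data: bytes, dict_size: int) -> int:
    """Compress with LZMA2 at preset 6, overriding only the dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


# Synthetic input: one incompressible 256 KiB block repeated four times,
# so every repeat sits exactly 256 KiB behind the current position.
rng = random.Random(0)
block_len = 256 * 1024
block = rng.getrandbits(8 * block_len).to_bytes(block_len, "little")
data = block * 4

small_dict = compressed_size(data, 64 * 1024)    # window too short to see the repeat
large_dict = compressed_size(data, 1024 * 1024)  # window covers the repeat

print(f"64 KiB dictionary: {small_dict} bytes")
print(f"1 MiB dictionary:  {large_dict} bytes")
```

A larger dictionary lets the encoder reach further back for matches, but the decoder has to keep a window of roughly that size in memory, which is presumably where the extra memory requirement comes from.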
    scientes
    Newbie
     
    Posts: 4
    Joined: Wed Apr 22, 2009 4:44 am

Wed Apr 22, 2009 6:30 am

  • To give you some numbers:

    Reading all files from hard drive: 81.193s, 58.878s when (partially, does not fit fully in RAM) cached
    Reading all files from the zip file with infozip's decompressor: 47.336s, 29.515s when cached
    Reading all files from the zip file with 7z's decompressor: 37.676s, 35.482s when cached
    Reading all files from a 7z file at -mx=1: 150.924s, 149.442s when cached
    Reading all files from a 7z file at -mx=9: 127.622s, 126.599s when cached

    The file sizes are:

    1425MB uncompressed
    597MB as zip
    545MB as 7z -mx=1
    443MB as 7z -mx=9

    EDIT note: I accidentally had read the pk3 using 7z, not the 7z file. Thus the old numbers...

    From these results, you clearly see that 7z is not an option. It'd quadruple loading times for everyone...
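
As a rough check of the "quadruple" figure, the cold-cache timings above can be turned into throughput and slowdown factors. This is only arithmetic on the numbers in this post; the dictionary labels are made up for readability, and the baseline (the fastest zip read) is an assumption about which comparison was meant.

```python
# Cold-cache timings in seconds, copied from the benchmark above.
cold_seconds = {
    "files on disk": 81.193,
    "zip, infozip decompressor": 47.336,
    "zip, 7z decompressor": 37.676,
    "7z -mx=1": 150.924,
    "7z -mx=9": 127.622,
}
UNCOMPRESSED_MB = 1425  # total payload size from the post

# Assumed baseline: the fastest of the two zip measurements.
fastest_zip = min(t for name, t in cold_seconds.items() if name.startswith("zip"))

for name, seconds in cold_seconds.items():
    throughput = UNCOMPRESSED_MB / seconds  # MB of game data delivered per second
    slowdown = seconds / fastest_zip        # relative to the fastest zip read
    print(f"{name:28s} {throughput:6.1f} MB/s  {slowdown:4.1f}x")

worst_slowdown = cold_seconds["7z -mx=1"] / fastest_zip
```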
    1. Open Notepad
    2. Paste: ÿþMSMSMS
    3. Save
    4. Open the file in Notepad again

    You can vary the number of "MS", so you can clearly see it's MS which is causing it.
    divVerent
    Site admin and keyboard killer
     
    Posts: 3809
    Joined: Thu Mar 02, 2006 4:46 pm
    Location: BRLOGENSHFEGLE

Wed Apr 22, 2009 7:59 am

  • nexuiz does not take 47s or 30s to start now. It starts almost instantly. How do those numbers have anything to do with startup times?

    You could also do a self-extractor or have it decompress it out of 7zip the first time.
    scientes
    Newbie
     
    Posts: 4
    Joined: Wed Apr 22, 2009 4:44 am

Wed Apr 22, 2009 8:40 am

  • I measured reading ALL files. Loading a map only reads about 10% of them, normally.

    The total time does not correspond directly to the loading time - but what you can see is that zip loads twice as fast as uncompressed files, and lzma loads twice as slow.

    Which is exactly why zip was chosen over e.g. bz2 initially.

    Self extractor would be an option (like, a tool that converts a 7z to a zip on first startup), but this would have to be separately coded for every OS, and would no longer allow us to provide one download for all operating systems.
    divVerent
    Site admin and keyboard killer
     
    Posts: 3809
    Joined: Thu Mar 02, 2006 4:46 pm
    Location: BRLOGENSHFEGLE

Wed Apr 22, 2009 10:10 am

  • I wonder whether tar is better than zip. Of course, this can easily be left unanswered cause it's simple curiosity.
    Alien
    Forum addon
     
    Posts: 1212
    Joined: Tue Apr 22, 2008 7:12 am

Wed Apr 22, 2009 10:33 am

  • tar does not compress at all, so it's out. But DP actually supports an uncompressed format already - Quake 1's pak.

    tar.gz is a tar file, gzipped, and may compress better than just tar. However, in a gzipped stream you can't seek without decompressing everything up to that point, so it'd be a quite inefficient format... you'd always have to read the full tar.gz file at least once.
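
The seeking difference described above can be sketched with Python's standard `zipfile` and `tarfile` modules. The file names and payloads are invented for the example, and this says nothing about how DP itself reads pk3 files.

```python
import io
import tarfile
import zipfile

# Hypothetical archive contents, for illustration only.
files = {f"maps/map{i}.bsp": bytes([i]) * 100_000 for i in range(5)}

# Zip: every member is compressed on its own and listed in a central directory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, payload in files.items():
        zf.writestr(name, payload)

# tar.gz: one gzip stream wrapped around the whole tar file.
tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
    for name, payload in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

wanted = "maps/map4.bsp"  # the last member in both archives

zip_buf.seek(0)
with zipfile.ZipFile(zip_buf) as zf:
    # The central directory records where this member starts,
    # so the reader can seek there without touching the others.
    offset = zf.getinfo(wanted).header_offset
    from_zip = zf.read(wanted)

tgz_buf.seek(0)
with tarfile.open(fileobj=tgz_buf, mode="r:gz") as tf:
    # The gzip stream has no index, so reaching this member means
    # decompressing everything stored before it.
    from_tar = tf.extractfile(wanted).read()

print(f"{wanted} starts at byte {offset} of the zip")
```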

    A viable option would be gzipped or bzip2ed files inside a pak, or tar, and DP transparently uncompressing them. But gz inside tar uses zlib too, and would probably have very similar compression ratio and speed. bz2 is known to be slower than gz, so it probably isn't an option either.
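
A minimal sketch of the "compressed files inside an uncompressed container" idea, using Python's `tarfile` and `gzip` modules. The helper `read_member`, the `.gz` naming scheme and the file contents are assumptions made for the example; DP does not do this.

```python
import gzip
import io
import tarfile

# Hypothetical files, for illustration only.
files = {
    "a.txt": b"alpha " * 10_000,
    "b.txt": b"bravo " * 10_000,
    "c.txt": b"charlie " * 10_000,
}

# Build a plain (uncompressed) tar whose members are gzipped one by one.
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tf:
    for name, payload in files.items():
        packed = gzip.compress(payload)
        info = tarfile.TarInfo(name + ".gz")
        info.size = len(packed)
        tf.addfile(info, io.BytesIO(packed))


def read_member(archive: io.BytesIO, name: str) -> bytes:
    """Locate one member in the plain tar and decompress only that member."""
    archive.seek(0)
    with tarfile.open(fileobj=archive, mode="r:") as tf:
        member = tf.getmember(name + ".gz")
        return gzip.decompress(tf.extractfile(member).read())


middle = read_member(archive, "b.txt")
```

Because gzip uses the same Deflate algorithm as zip, the ratio and speed of this layout should end up close to a zip file, as the post says.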
    divVerent
    Site admin and keyboard killer
     
    Posts: 3809
    Joined: Thu Mar 02, 2006 4:46 pm
    Location: BRLOGENSHFEGLE

Wed Apr 22, 2009 11:22 am

  • lzma does streams with the sdk, without requiring any of the file handling 7zip stuff. It can also do random seeks if you set it to do so, but that requires an archive file format that allows random access.
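
Streaming LZMA without any 7z container can be sketched as follows. This uses Python's standard `lzma` module (liblzma) rather than the LZMA SDK's C API, so it only illustrates the concept; the payload, the preset and the 64-byte chunk size are arbitrary assumptions.

```python
import lzma

# A raw stream has no container or header, so both sides
# must agree on the filter chain in advance.
filters = [{"id": lzma.FILTER_LZMA2, "preset": 1}]

payload = b"some repetitive game data\n" * 50_000

compressor = lzma.LZMACompressor(format=lzma.FORMAT_RAW, filters=filters)
stream = compressor.compress(payload) + compressor.flush()

# Feed the decompressor small chunks, as if they arrived from disk or network.
decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=filters)
out = bytearray()
for start in range(0, len(stream), 64):
    if decompressor.eof:
        break
    out += decompressor.decompress(stream[start:start + 64])

print(f"{len(stream)} compressed bytes -> {len(out)} bytes")
```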

    bzip2 should be completely put aside because it sucks. It's 10X slower than gzip and has worse compression than lzma.

    zip=gzip compression-wise because they use an identical compression format/algorithm (Deflate).
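
The zip=gzip point can be checked with Python's `zlib` module: the body of a gzip stream is a plain Deflate stream, the same kind of data a zip member stores. The helper name `deflate` and the sample data are assumptions for the example; the 10-byte header and 8-byte trailer apply to the minimal gzip wrapper that zlib writes by default.

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress with Deflate; wbits selects the wrapper around the stream."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


data = b"some repetitive game data\n" * 20_000

raw = deflate(data, -15)  # raw Deflate, as stored inside a zip member
gz = deflate(data, 31)    # the same algorithm with a gzip header and trailer

# Strip the 10-byte gzip header and the 8-byte CRC/length trailer,
# then decode what is left as a raw Deflate stream.
gzip_body = gz[10:-8]
recovered = zlib.decompress(gzip_body, -15)

print(f"raw Deflate: {len(raw)} bytes, gzip: {len(gz)} bytes")
```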
    scientes
    Newbie
     
    Posts: 4
    Joined: Wed Apr 22, 2009 4:44 am

Wed Apr 22, 2009 3:32 pm

  • As the benchmark numbers prove, lzma is useless anyway for the data pk3.
    divVerent
    Site admin and keyboard killer
     
    Posts: 3809
    Joined: Thu Mar 02, 2006 4:46 pm
    Location: BRLOGENSHFEGLE

Thu Apr 23, 2009 6:04 am

  • Tar question was not about size (of course) but about loading speed. For example, STEAM's gcf files have low compression, because they were made like this to increase loading speed by preventing file scattering around the filesystem and not to make an archive as small as possible. One could compress whole nexuiz using lzma and then decompress everything to data.tar (if it was actually faster than zip). This way the game would load (maybe) quicker while taking less time to download. The issue is not final data.* size, but the time it takes to load the files.
    Alien
    Forum addon
     
    Posts: 1212
    Joined: Tue Apr 22, 2008 7:12 am

Thu Apr 23, 2009 6:15 am

  • Alien wrote:Tar question was not about size (of course) but about loading speed. For example, STEAM's gcf files have low compression, because they were made like this to increase loading speed by preventing file scattering around the filesystem and not to make an archive as small as possible. One could compress whole nexuiz using lzma and then decompress everything to data.tar (if it was actually faster than zip). This way the game would load (maybe) quicker while taking less time to download. The issue is not final data.* size, but the time it takes to load the files.


    I guess you ignored the part "data in zip loads twice as fast as uncompressed".
    Let's ignore that part again. Let's ignore it forever!

    You could 7zip nexuiz, but then no one would be able to figure out how to unzip it and play the game.
    If peeps can't figure out how to copy/paste encryption keys, or hack macintosh computers and pwn them, how could they figure out what a "doyt 7z" is?
    tundramagi
    Forum addon
     
    Posts: 974
    Joined: Sun Jan 04, 2009 4:53 pm

Thu Apr 23, 2009 6:22 am

  • Started stalking me? It was just another thing which I wanted - personal troll. 8) Yay
    Alien
    Forum addon
     
    Posts: 1212
    Joined: Tue Apr 22, 2008 7:12 am

Thu Apr 23, 2009 6:41 am

  • The forum doesn't get much posts. You're the only one posting today :P
    tundramagi
    Forum addon
     
    Posts: 974
    Joined: Sun Jan 04, 2009 4:53 pm

Thu Apr 23, 2009 2:12 pm

  • Alien wrote:Tar question was not about size (of course) but about loading speed.


    tar is not compressed AT ALL. Zip is faster than uncompressed.

    But DP already supports an uncompressed format, so you can do your own benchmarks and try. It is Quake's PAK format.
    divVerent
    Site admin and keyboard killer
     
    Posts: 3809
    Joined: Thu Mar 02, 2006 4:46 pm
    Location: BRLOGENSHFEGLE

Thu Apr 23, 2009 5:58 pm

  • What would be the explanation of zip being the fastest?
    Alien
    Forum addon
     
    Posts: 1212
    Joined: Tue Apr 22, 2008 7:12 am

Thu Apr 23, 2009 7:30 pm

Thu Apr 23, 2009 7:32 pm

  • Basically, zip decompression is faster than your hard drive can read data.

    Uncompressed is only faster if cached.
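
A back-of-the-envelope model of this effect, using the sizes and the cold-cache timing from earlier in the thread. The function name, the assumption that reading and decompressing overlap perfectly, and the 200 MB/s decompressor speed are all hypothetical; real file system overhead is ignored.

```python
def pipelined_read_seconds(stored_mb, disk_mb_per_s,
                           unpacked_mb=None, decompress_mb_per_s=None):
    """Estimate read time when disk I/O and decompression overlap.

    The slower of the two stages determines the total time.
    """
    disk_time = stored_mb / disk_mb_per_s
    if decompress_mb_per_s is None:
        return disk_time  # uncompressed data: disk time only
    cpu_time = unpacked_mb / decompress_mb_per_s
    return max(disk_time, cpu_time)


# Effective disk rate implied by the uncompressed cold read (1425 MB in 81.193 s).
disk_rate = 1425 / 81.193

uncompressed_estimate = pipelined_read_seconds(1425, disk_rate)
zip_estimate = pipelined_read_seconds(597, disk_rate,
                                      unpacked_mb=1425,
                                      decompress_mb_per_s=200.0)  # assumed speed

print(f"uncompressed: {uncompressed_estimate:.1f} s, zip: {zip_estimate:.1f} s")
```

Under these assumptions the zip estimate lands in the same range as the measured 37.7 s and 47.3 s: as long as the decompressor outpaces the disk, only the smaller stored size matters.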
    divVerent
    Site admin and keyboard killer
     
    Posts: 3809
    Joined: Thu Mar 02, 2006 4:46 pm
    Location: BRLOGENSHFEGLE

Thu Apr 23, 2009 8:57 pm

  • Basically harddrives are garbage and should be shot.
    Alien: why AREN'T you using an SSD.
    Everyone used an SSD these days, get with the program G.
    tundramagi
    Forum addon
     
    Posts: 974
    Joined: Sun Jan 04, 2009 4:53 pm

Fri Apr 24, 2009 9:34 am

  • @tundramagi
    in this case, access times (what SSDs are so good at) are not really the thing making uncompressed slower, it is aligned reads, which i believe hard drives are still faster at.


    It would be kinda nutty but i could see the file starting as .tar.lzma and then being converted to .tar.gz on first load. lzma decompression itself (without file packing 7zip stuff) is a very small C library. However i have no idea about tar.gz (and .tar.lzma) vs zip on random accesses (compression-wise tar.gz and zip are identical)
    scientes
    Newbie
     
    Posts: 4
    Joined: Wed Apr 22, 2009 4:44 am

Fri Apr 24, 2009 12:19 pm

  • tundramagi wrote:why AREN'T you using an SSD.

    Cost. SSDs are still more expensive than conventional hard discs, by an order of magnitude. MTBFs, really, aren't all too different despite the garbage that SSD salesdroids peddle.

    scientes: Recent SSDs are significantly faster at sequential, aligned reads than HDDs are. HDD sequential read speed is limited by how quickly you can spin the platters while SSDs are limited by how many flash dies you can run in parallel.

    Anyway, back to the point. I think a major reason to use a more efficient compression is to reduce download times rather than map loading times, since the difference in loading times between uncompressed/zip/bz2/7-z/etc. will be rather insignificant on any system from the last five years or so. The difference in download times will be many minutes for a lot of the community though.

    On systems with slow HDDs the CPU time to decompress is often less than the HDD time to read uncompressed files.
    Taiyo.uk
    Alien trapper
     
    Posts: 436
    Joined: Mon Apr 17, 2006 8:48 pm
    Location: Reading, IN-GER-LUND!!!

Fri Apr 24, 2009 4:53 pm

  • Actually, it does matter if a map loads in 10 or 15 seconds...
    divVerent
    Site admin and keyboard killer
     
    Posts: 3809
    Joined: Thu Mar 02, 2006 4:46 pm
    Location: BRLOGENSHFEGLE

Fri Apr 24, 2009 5:41 pm

  • Load times in nexuiz are pretty long as is, it would be uncool if they were longer. There has been a lot of work by divVerent to cut down load times (particularly on the server) and this idea of using a dog-slow compression scheme to gain a one-time advantage and a persistent curse would nullify all that work (not loading light maps on server, capping bot waypoint load times, and caching bot waypoints once loaded etc).

    On all my machines, no matter how fast, loading nexuiz maps is SLOW (but faster than last release by far). Other games (like urbanterror) load pretty fast in comparison. We need faster load times, not slower!

    (This message has been brought to you by the council for american justice, all rights reserved)
    tundramagi
    Forum addon
     
    Posts: 974
    Joined: Sun Jan 04, 2009 4:53 pm

Fri Apr 24, 2009 6:48 pm

  • When using bots in Nexuiz 2.5 the startup times are rather long on some open maps like silvercity. This has been improved already by caching "walkable" bot paths (using plain text files). This will be available on the next release.
    mand1nga
    Alien trapper
     
    Posts: 321
    Joined: Mon May 12, 2008 12:19 am

Sun Apr 26, 2009 9:00 am

  • +1 to everything Div said. LZMA = more CPU, more RAM to decompress. The more brutal you are at compressing it, the longer and more costly it is to decompress. This is not how ZIP works, where in general the longer you take compressing, the quicker it is decompressing.
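
Anyone who wants to check the CPU cost on their own machine could start from a sketch like this one. It uses Python's `zlib` and `lzma` modules on synthetic data, so the helper `best_time`, the word list and the data size are assumptions, and the timings it prints are not comparable to the pk3 benchmark earlier in the thread.

```python
import lzma
import random
import time
import zlib


def best_time(decompress, blob, repeats=3):
    """Return the fastest of several timed decompression runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic, moderately compressible input (hypothetical stand-in for game data).
rng = random.Random(0)
words = [b"rocket", b"nex", b"shader", b"texture", b"waypoint", b"lightmap"]
data = b" ".join(rng.choice(words) for _ in range(300_000))

deflated = zlib.compress(data, 9)
lzma_blob = lzma.compress(data, preset=9)

t_zlib = best_time(zlib.decompress, deflated)
t_lzma = best_time(lzma.decompress, lzma_blob)

print(f"zlib: {len(deflated)} bytes, {t_zlib * 1000:.1f} ms to decompress")
print(f"lzma: {len(lzma_blob)} bytes, {t_lzma * 1000:.1f} ms to decompress")
```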
    Ed
    Forum addon
     
    Posts: 1172
    Joined: Wed Mar 01, 2006 12:32 am
    Location: UK

Sun Apr 26, 2009 3:12 pm

  • Why isn't just the download package zipped with lzma?
    Asraniel
    Alien
     
    Posts: 112
    Joined: Tue Feb 28, 2006 9:15 pm

Sun Apr 26, 2009 3:14 pm

  • Asraniel wrote:Why isn't just the download package zipped with lzma?

    /me seconds this question.
    quit for good
    alpha
    Alien trapper
     
    Posts: 492
    Joined: Tue Jun 17, 2008 7:18 pm

Sun Apr 26, 2009 4:29 pm

  • Windows and OSX do not support .7z natively, and not everyone uses 7zip.

    Zip is a well-defined standard, stick with it.
    TVR
    Alien trapper
     
    Posts: 404
    Joined: Fri Jun 01, 2007 12:56 am

Sun Apr 26, 2009 8:17 pm

  • TVR wrote:Windows and OSX do not support .7z natively, and not everyone uses 7zip.

    Zip is a well-defined standard, stick with it.


    err, what about .tar.gz? it offers pretty good compression, and it's a bit more common... Also, why not offer both formats?
    Psychcf
    Forum addon
     
    Posts: 1554
    Joined: Sun Dec 03, 2006 11:38 pm
    Location: NY, USA

Sun Apr 26, 2009 8:37 pm

  • Psychcf wrote:
    TVR wrote:Windows and OSX do not support .7z natively, and not everyone uses 7zip.

    Zip is a well-defined standard, stick with it.


    err, what about .tar.gz? it offers pretty good compression, and it's a bit more common... Also, why not offer both formats?

    A gzipped tarball won't work since you can't access certain files in it without unzipping the whole thing, but you *could* gzip everything first then make a .tar, but that will have about the same speed as zip.

    Nexuiz already ships with many external libs for mac and windows, so it won't be a problem.
    I would like to know the speed of compressing the contents with lzma and then putting them in a tarball, vs zip vs 7z. Also, going beyond -2 lzma compression isn't supposed to make a large difference in size, but will make a large difference in decompression time.
    uluyol901
    Member
     
    Posts: 16
    Joined: Sun Jan 25, 2009 7:39 pm

Mon Apr 27, 2009 8:57 am

  • Guess why I tried -1 lzma compression above. Not much faster either.
    divVerent
    Site admin and keyboard killer
     
    Posts: 3809
    Joined: Thu Mar 02, 2006 4:46 pm
    Location: BRLOGENSHFEGLE

Wed Apr 29, 2009 5:25 am

  • 7z is not a new compression method but more like a container.

    So it seems the bottleneck is hdd reading speed and zip has the best decompression time/file size ratio.
    Alien
    Forum addon
     
    Posts: 1212
    Joined: Tue Apr 22, 2008 7:12 am
