LZMA (7zip) should be used for data package: 150MB smaller

Postby scientes » Wed Apr 22, 2009 4:56 am

Switching to LZMA could save about 150MB (25%) on the data archive, at the cost of 8-16MB more memory if the program loads files from the archive at runtime.

For data20090403.pk3:

gzip (current) - 596.2 MB
LZMA (7zip) - 446.6 MB
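A quick check of the arithmetic behind the 150MB / 25% figures, using only the two sizes above:

```python
# Archive sizes from the post, in MB.
gzip_mb = 596.2   # current pk3 (zip/deflate)
lzma_mb = 446.6   # same data packed with LZMA (7zip)

saved_mb = gzip_mb - lzma_mb             # absolute saving
saved_pct = 100 * saved_mb / gzip_mb     # saving relative to the current archive

print(f"saved {saved_mb:.1f} MB ({saved_pct:.1f}%)")
```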

The LZMA SDK is public domain or LGPL, and comes with C++ and ANSI C (decompression only) implementations: http://www.7-zip.org/sdk.html
It has good compression and very fast decompression (much better than bzip2).

Read the docs to fine-tune the random-access block size, and the trade-off between dictionary size and compression efficiency.
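As a rough illustration of the dictionary-size knob, here is a minimal sketch using Python's standard `lzma` module. Note that this module wraps liblzma (XZ Utils), not the 7-Zip LZMA SDK linked above, and the `compress_with_dict` helper and sample data are hypothetical. With LZMA the decoder has to allocate roughly one dictionary's worth of memory, which is presumably what the 8-16MB extra-memory estimate is about.

```python
import lzma

def compress_with_dict(data: bytes, dict_size: int, preset: int = 6) -> bytes:
    """Compress `data` as .xz using LZMA2 with an explicit dictionary size.

    A larger dictionary lets the encoder reference matches further back
    (often a better ratio on large, repetitive data), but the decoder must
    allocate roughly `dict_size` bytes of memory to unpack it.
    """
    filters = [{"id": lzma.FILTER_LZMA2, "preset": preset, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

sample = b"textures/wall01 textures/wall02 " * 20000   # ~640 KB, highly repetitive
for dict_size in (64 << 10, 1 << 20, 4 << 20):
    packed = compress_with_dict(sample, dict_size)
    assert lzma.decompress(packed) == sample            # round-trip sanity check
    print(f"dict {dict_size >> 10:>5} KiB -> {len(packed)} bytes")
```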
scientes
Newbie
Posts: 4
Joined: Wed Apr 22, 2009 4:44 am

Postby divVerent » Wed Apr 22, 2009 6:30 am

To give you some numbers:

Reading all files from the hard drive: 81.193s, 58.878s when cached (only partially, as it does not fit fully in RAM)
Reading all files from the zip file with infozip's decompressor: 47.336s, 29.515s when cached
Reading all files from the zip file with 7z's decompressor: 37.676s, 35.482s when cached
Reading all files from a 7z file at -mx=1: 150.924s, 149.442s when cached
Reading all files from a 7z file at -mx=9: 127.622s, 126.599s when cached

The file sizes are:

1425MB uncompressed
597MB as zip
545MB as 7z -mx=1
443MB as 7z -mx=9

EDIT: I had accidentally read the pk3 using 7z, not the 7z file, hence the old numbers...

From these results, you can clearly see that 7z is not an option. It would roughly quadruple loading times for everyone...
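The measurement above boils down to decompressing every file in an archive once and timing it. A minimal sketch of that kind of harness with Python's `zipfile`, under the assumption that comparing deflate and LZMA members inside a zip (`ZIP_DEFLATED` vs `ZIP_LZMA`) is a reasonable stand-in; this is not the same as reading a solid .7z archive, and the file set is synthetic:

```python
import io
import time
import zipfile

def build_archive(files: dict[str, bytes], method: int) -> bytes:
    """Pack `files` into an in-memory zip, compressing every member with `method`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def read_all(archive: bytes) -> tuple[int, float]:
    """Decompress every member once; return (bytes read, seconds taken)."""
    start = time.perf_counter()
    total = 0
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            total += len(zf.read(info))
    return total, time.perf_counter() - start

# Synthetic stand-in for game data: many small, fairly repetitive text files.
files = {f"maps/map{i:03d}.ent": (f"entity {i} origin 0 0 {i}\n" * 2000).encode()
         for i in range(50)}

for label, method in (("deflate", zipfile.ZIP_DEFLATED), ("lzma", zipfile.ZIP_LZMA)):
    archive = build_archive(files, method)
    total, seconds = read_all(archive)
    print(f"{label:8s} {len(archive):>9} bytes packed, {total} bytes read in {seconds:.4f}s")
```

Absolute numbers from a toy data set like this say little about a 1.4GB game; the point is only the shape of the measurement.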
1. Open Notepad
2. Paste: ÿþMSMSMS
3. Save
4. Open the file in Notepad again

You can vary the number of "MS", so you can clearly see it's MS which is causing it.
divVerent
Site admin and keyboard killer
Posts: 3809
Joined: Thu Mar 02, 2006 4:46 pm
Location: BRLOGENSHFEGLE

???

Postby scientes » Wed Apr 22, 2009 7:59 am

Nexuiz does not take 47s or 30s to start now; it starts almost instantly. What do those numbers have to do with startup times?

You could also use a self-extractor, or have the game decompress the data out of the 7z archive on first run.

Postby divVerent » Wed Apr 22, 2009 8:40 am

I measured reading ALL files. Loading a map normally reads only about 10% of them.

The total time does not correspond directly to the loading time, but what you can see is that zip loads about twice as fast as uncompressed files, while LZMA takes about twice as long.
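The "twice as fast / twice as slow" reading can be checked against the posted timings. This sketch just divides each result by a baseline (numbers copied from the benchmark post; the -mx=1 row and the 7z-decompressor-on-zip row are left out for brevity):

```python
# Timings from the benchmark post, in seconds: (cold, cached).
times = {
    "uncompressed": (81.193, 58.878),
    "zip (infozip)": (47.336, 29.515),
    "7z -mx=9": (127.622, 126.599),
}

def relative(name: str, baseline: str) -> tuple[float, float]:
    """Ratio of `name`'s times to `baseline`'s times, (cold, cached)."""
    (cold, cached), (base_cold, base_cached) = times[name], times[baseline]
    return cold / base_cold, cached / base_cached

for name in times:
    cold, cached = relative(name, "uncompressed")
    print(f"{name:14s} cold x{cold:.2f}  cached x{cached:.2f}  (vs uncompressed)")

print("7z -mx=9 vs zip, cached: x%.2f" % relative("7z -mx=9", "zip (infozip)")[1])
```

With these numbers zip lands at about 0.58x (cold) and 0.50x (cached) of the uncompressed time, and 7z -mx=9 at about 1.57x and 2.15x. Against cached zip, 7z -mx=9 comes out at about 4.3x.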

Which is exactly why zip was chosen over, e.g., bz2 in the first place.

A self-extractor would be an option (say, a tool that converts a 7z to a zip on first startup), but it would have to be coded separately for every OS, and we could no longer provide a single download for all operating systems.
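The "convert on first startup" idea itself is only a few lines of portable code. Python's standard library cannot read .7z files, so this sketch assumes a .tar.xz (LZMA2 in an xz container) as the download format; `ensure_zip` and the file names are hypothetical. A script like this still needs an interpreter on each OS, so on its own it does not remove the per-OS packaging concern.

```python
import os
import tarfile
import zipfile

def ensure_zip(src_tar_xz: str, dst_zip: str) -> bool:
    """Repack an LZMA-compressed tarball as a deflate zip, but only once.

    Returns True if it converted, False if `dst_zip` already existed.
    """
    if os.path.exists(dst_zip):
        return False                          # converted on an earlier run
    tmp = dst_zip + ".part"                   # a crash never leaves a half-written file under the final name
    with tarfile.open(src_tar_xz, "r:xz") as tar, \
            zipfile.ZipFile(tmp, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in tar:
            if member.isfile():               # directories are implied by zip paths
                zf.writestr(member.name, tar.extractfile(member).read())
    os.replace(tmp, dst_zip)                  # move into place only once complete
    return True
```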

Postby Alien » Wed Apr 22, 2009 10:10 am

I wonder whether tar is better than zip. Of course, this can be left unanswered, since it's just curiosity.
Alien
Forum addon
Posts: 1212
Joined: Tue Apr 22, 2008 7:12 am

Postby divVerent » Wed Apr 22, 2009 10:33 am

tar does not compress at all, so it's out. But DP (DarkPlaces) already supports an uncompressed format: Quake 1's pak.
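For reference, the pak layout is simple enough to show in full. A minimal sketch of a writer and reader for the Quake 1 pak format as it is commonly documented (12-byte header with the `PACK` magic, directory offset and directory length; 64-byte directory entries holding a 56-byte name, file offset and file length; all little-endian). This is not DarkPlaces' loader, and the function names are hypothetical:

```python
import struct

HEADER = struct.Struct("<4sii")  # magic "PACK", directory offset, directory length
ENTRY = struct.Struct("<56sii")  # NUL-padded name, file offset, file length (64 bytes)

def write_pak(files: dict[str, bytes]) -> bytes:
    """Build a pak: header, then raw (uncompressed) file data, then the directory."""
    body, directory = bytearray(), bytearray()
    offset = HEADER.size                      # file data starts right after the header
    for name, data in files.items():
        raw_name = name.encode("ascii")
        if len(raw_name) >= 56:
            raise ValueError(f"name too long for a pak entry: {name}")
        directory += ENTRY.pack(raw_name, offset, len(data))  # struct NUL-pads the name
        body += data
        offset += len(data)
    # After the loop, `offset` is where the directory begins.
    return HEADER.pack(b"PACK", offset, len(directory)) + bytes(body) + bytes(directory)

def read_pak(pak: bytes) -> dict[str, tuple[int, int]]:
    """Map each file name to (offset, length) by reading the directory."""
    magic, dirofs, dirlen = HEADER.unpack_from(pak, 0)
    if magic != b"PACK":
        raise ValueError("not a pak file")
    index = {}
    for pos in range(dirofs, dirofs + dirlen, ENTRY.size):
        raw_name, filepos, filelen = ENTRY.unpack_from(pak, pos)
        index[raw_name.split(b"\0", 1)[0].decode("ascii")] = (filepos, filelen)
    return index

pak = write_pak({"maps/e1m1.bsp": b"BSP data", "sound/pain.wav": b"RIFF data"})
offset, length = read_pak(pak)["sound/pain.wav"]
print(pak[offset:offset + length])           # slice one file out directly
```

Because the directory stores each file's offset, any file can be sliced out without touching the others.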

tar.gz is a gzipped tar file, and may compress better than plain tar. However, you can't seek in a gzipped stream without decompressing everything up to that point, so it would be quite an inefficient format: you'd always have to read the full tar.gz file at least once.
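To make the seeking problem concrete: a gzip stream has no index, so getting at a byte in the middle means inflating everything before it. A minimal sketch with Python's `zlib` (the `read_at` helper is hypothetical). A zip file avoids this because each member is deflated separately and the central directory records where each one starts.

```python
import gzip
import zlib

def read_at(gz_bytes: bytes, offset: int, length: int) -> bytes:
    """Return `length` uncompressed bytes starting at `offset` in a gzip stream.

    Deflate output has no index, so every byte before `offset` has to be
    inflated and thrown away; the cost grows with `offset`, not with `length`.
    """
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect gzip framing
    step = 64 * 1024
    pos = 0                                   # uncompressed bytes produced so far
    out = bytearray()
    for i in range(0, len(gz_bytes), step):
        chunk = inflater.decompress(gz_bytes[i:i + step])
        lo = max(offset - pos, 0)             # overlap of this chunk with the wanted range
        hi = min(offset + length - pos, len(chunk))
        if lo < hi:
            out += chunk[lo:hi]
        pos += len(chunk)
        if pos >= offset + length:
            break                             # the rest of the stream can stay compressed
    return bytes(out)

sample = bytes(range(256)) * 4096             # 1 MiB of test data
packed = gzip.compress(sample)
print(read_at(packed, 900_000, 8) == sample[900_000:900_008])
```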

A viable option would be gzipped or bzip2ed files inside a pak or tar, with DP transparently decompressing them. But gz inside tar uses zlib too, just like zip, so it would probably have a very similar compression ratio and speed. bz2 is known to be slower than gz, so it probably isn't an option either.
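A minimal sketch of the "compressed files inside an uncompressed container" idea, using Python's `tarfile`. The lookup rule (try `name`, then `name.gz`, then `name.bz2`) and all function names are hypothetical; this is not how DP's virtual filesystem actually works.

```python
import bz2
import gzip
import io
import tarfile

# Hypothetical lookup rule: the plain name first, then compressed variants.
DECODERS = (
    ("", lambda data: data),
    (".gz", gzip.decompress),
    (".bz2", bz2.decompress),
)

def open_member(tar: tarfile.TarFile, name: str) -> bytes:
    """Return the contents of `name`, decompressing `name.gz` / `name.bz2` if that is what is stored."""
    for suffix, decode in DECODERS:
        try:
            member = tar.getmember(name + suffix)
        except KeyError:
            continue
        return decode(tar.extractfile(member).read())
    raise FileNotFoundError(name)

def build_tar(files: dict[str, bytes]) -> bytes:
    """Gzip each file on its own and store the results in an uncompressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            packed = gzip.compress(data)
            info = tarfile.TarInfo(name + ".gz")
            info.size = len(packed)
            tar.addfile(info, io.BytesIO(packed))
    return buf.getvalue()

archive = build_tar({"scripts/example.cfg": b"// example config line\n" * 100})
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    print(open_member(tar, "scripts/example.cfg")[:22])
```

Since the tar itself is not compressed, a reader can skip from header to header and only inflate the member it wants, unlike the tar.gz case.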

Postby scientes » Wed Apr 22, 2009 11:22 am

With the SDK, LZMA can handle plain streams without requiring any of the 7zip file-handling code. It can also do random seeks if you set it up to do so, but that requires an archive file format that allows random access.
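One common way to get random access out of a stream compressor is to compress fixed-size blocks independently and keep an index of where each block starts. The sketch below does that with Python's `lzma` (liblzma, not the 7-Zip SDK); the block scheme, the `BLOCK` size and the helper names are hypothetical and not the 7z format. It also shows the block-size trade-off mentioned at the start of the thread.

```python
import bisect
import lzma

BLOCK = 64 * 1024   # hypothetical uncompressed block size: the seek-cost vs ratio knob

def pack_blocks(data: bytes, block: int = BLOCK) -> tuple[list[bytes], list[int]]:
    """Compress `data` in fixed-size pieces, each as an independent .xz stream.

    Returns the compressed pieces plus the uncompressed offset where each one
    starts. Smaller blocks make seeks cheaper but compress worse, because every
    block starts again with an empty dictionary.
    """
    # A dictionary larger than one block would never be filled, so cap it.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": max(block, 4096)}]
    blocks, starts = [], []
    for start in range(0, len(data), block):
        blocks.append(lzma.compress(data[start:start + block], filters=filters))
        starts.append(start)
    return blocks, starts

def read_range(blocks: list[bytes], starts: list[int], offset: int, length: int) -> bytes:
    """Decompress only the blocks that overlap [offset, offset + length)."""
    out = bytearray()
    i = max(bisect.bisect_right(starts, offset) - 1, 0)  # block holding `offset`
    while i < len(blocks) and len(out) < length:
        chunk = lzma.decompress(blocks[i])
        skip = max(offset - starts[i], 0)
        out += chunk[skip:skip + length - len(out)]
        i += 1
    return bytes(out)

data = bytes((i * 7) % 251 for i in range(300_000))
blocks, starts = pack_blocks(data)
print(read_range(blocks, starts, 100_000, 16) == data[100_000:100_016])
```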

bzip2 should be put aside completely because it sucks: it's 10x slower than gzip and compresses worse than LZMA.

Compression-wise, zip = gzip, because they use an identical compression format/algorithm (deflate).
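A quick way to see that the two formats differ only in their wrapper: zlib can emit the same deflate data either bare (as zip members store it) or with a gzip header and trailer. A minimal sketch, assuming both sides use zlib with identical settings; other zip or gzip tools may use different deflate encoders and end up a few bytes apart.

```python
import zlib

def deflate(data: bytes, wbits: int) -> bytes:
    """Compress `data` at level 9; `wbits` only picks the framing around the deflate data."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()

data = b"models/player/example.zym " * 4000
raw = deflate(data, -15)   # negative wbits: bare deflate, the form a zip member stores
gz = deflate(data, 31)     # 16 + 15: same deflate data inside a gzip header and trailer

# zlib's default gzip header is 10 bytes and the trailer (CRC32 + size) is 8 bytes,
# so stripping them should leave exactly the bare deflate stream.
print(gz[10:-8] == raw)
```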

Postby divVerent » Wed Apr 22, 2009 3:32 pm

As the benchmark numbers show, LZMA is useless for the data pk3 anyway.

Postby Alien » Thu Apr 23, 2009 6:04 am

The tar question was not about size (of course) but about loading speed. For example, Steam's gcf files use low compression: they were designed that way to increase loading speed by keeping files from being scattered around the filesystem, not to make the archive as small as possible. One could compress all of Nexuiz with LZMA and then decompress everything to data.tar (if that were actually faster than zip). This way the game would (maybe) load quicker while taking less time to download. The issue is not the final size of data.*, but the time it takes to load the files.

Postby tundramagi » Thu Apr 23, 2009 6:15 am

Alien wrote:The tar question was not about size (of course) but about loading speed. For example, Steam's gcf files use low compression: they were designed that way to increase loading speed by keeping files from being scattered around the filesystem, not to make the archive as small as possible. One could compress all of Nexuiz with LZMA and then decompress everything to data.tar (if that were actually faster than zip). This way the game would (maybe) load quicker while taking less time to download. The issue is not the final size of data.*, but the time it takes to load the files.


I guess you ignored the part "data in zip loads twice as fast as uncompressed".
Let's ignore that part again. Let's ignore it forever!

You could 7zip Nexuiz, but then no one would be able to figure out how to unzip it and play the game.
If peeps can't figure out how to copy/paste encryption keys, or hack Macintosh computers and pwn them, how could they figure out what a "doyt 7z" is?
tundramagi
Forum addon
Posts: 974
Joined: Sun Jan 04, 2009 4:53 pm
