Overview of game file formats and archives
aluigi (Site Admin)
This is a paper I wrote in April 2013, but it was never released until now.
It offers just an introduction and overview of the formats that you see daily on this forum.
There are also some statistics that I took in 2013.
The text version of the document is available here:
http://aluigi.org/papers/game_formats_stats.txt
Every post is a section of the paper.
Introduction and formats
Games use a lot of different file types: textures, sounds, music, 3D
models, AI scripts, animation scripts, configuration files, images,
videos and so on.
Instead of leaving those files scattered around the game's folder,
developers prefer to store them in one or more archives, mainly for
the following reasons:
-  performance
 accessing one single file (the whole archive) requires fewer
 resources than opening and closing every single file, which results
 in shorter loading times and less memory and disk usage (no disk
 allocation unit alignment and no continuous opening of different
 files).
-  content protection
 these archives often contain encrypted content: game developers and
 publishers try to prevent its use for modding or personal purposes
 (for example listening to a soundtrack) and, obviously, its embedding
 in other commercial projects.
 In this case the adopted solutions range from the simple obfuscation
 of the content by XORing the data with a fixed byte or key to
 customized encryption algorithms.
-  saving space
 many archives use compression algorithms and other mechanisms to
 save disk space; it was quite common in the past and it's still
 necessary nowadays, when games occupy gigabytes of space.
These solutions can be used alone or often combined, so it's not rare
to see an archive containing compressed and encrypted content.
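As a minimal sketch of the simplest obfuscation mentioned above (XOR with a fixed byte or a repeating key) combined with compression, consider the following Python example. The key value, the helper names and the order of the two layers (compress first, then XOR) are assumptions chosen only for illustration; they are not taken from any specific game.

```python
import zlib

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with a repeating key (one byte or more)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def pack(data: bytes, key: bytes) -> bytes:
    """Hypothetical archive writer: compress with zlib, then obfuscate."""
    return xor_bytes(zlib.compress(data), key)

def unpack(blob: bytes, key: bytes) -> bytes:
    """Reverse of pack(): remove the XOR layer, then decompress."""
    return zlib.decompress(xor_bytes(blob, key))

original = b"texture data " * 10
key = b"\x5a"                      # assumed one-byte key
stored = pack(original, key)       # what would end up inside the archive
restored = unpack(stored, key)     # what an extractor should dump
```

XOR is its own inverse, so the same function handles both directions; a multi-byte key only changes the `key` argument.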
When we are in front of an archive or an encrypted/compressed file, our
target is just dumping its content and later understanding how to use
the dumped files: for example a 3D model to open in software like 3ds
Max, an Ogg file to play in a media player, or customized formats that
must be converted to other formats.
This document covers only the first step of the procedure:
understanding a file format in order to extract its content "as is".
Usually only the following parameters are necessary:
-  offset of the resource, the location of the file inside the archive
 (where it begins)
-  size of the resource
-  optional compressed/uncompressed size if the file has been shrunk
 with a compression algorithm
-  optional name of the resource, often the original name of the
 archived file
This information is usually stored in an index table called TOC (table
of contents), which in some games may be encrypted to prevent the
correct dumping of the resources. In other formats the resources are
stored sequentially, so there is no need to specify an offset field
for each file.
In the next examples I will use some words to identify some common
fields:
- OFFSET location of the file in hexadecimal (0x22 = 34)
- ZSIZE compressed size
- SIZE normal and uncompressed size
- FILES amount of files stored in the archive
- NAME name of the stored file
- FILE the content (data) of the stored file
Example of Index table:
        +-----------------+
        | FILES         2 |
        +-----------------+
      /-| OFFSET 00000022 |
      | +-----------------+
      | | SIZE         41 |
      | +-----------------+
      | | NAME   test.txt |
      | +-----------------+
    /-+-| OFFSET 0000004b |
    | | +-----------------+
    | | | SIZE         20 |
    | | +-----------------+
    | | | NAME   blah.dat |
    | | +-----------------+-------------------------+
    | \>| FILE 1                                    |
    |   +----------------------+--------------------+
    \-->| FILE 2               |
        +----------------------+
Example of sequential files:
        +-----------------+
        | SIZE         41 |
        +-----------------+
        | NAME   test.txt |
        +-----------------+-------------------------+
        | FILE 1                                    |
        +-----------------+-------------------------+
        | SIZE         20 |
        +-----------------+
        | NAME   blah.dat |
        +-----------------+----+
        | FILE 2               |
        +----------------------+
Example of sequential table followed by sequential files:
        +-----------------+
        | FILES         2 |
        +-----------------+
        | SIZE         41 |
        +-----------------+
        | NAME   test.txt |
        +-----------------+
        | SIZE         20 |
        +-----------------+
        | NAME   blah.dat |
        +-----------------+-------------------------+
        | FILE 1                                    |
        +----------------------+--------------------+
        | FILE 2               |
        +----------------------+
Some variants and customizations:
-  relative file offsets: usually the absolute offset from which the
 relative file offsets are calculated is specified directly at the
 beginning of the archive or calculated before or after having read
 the whole TOC:
 -  before: it can be accomplished only with fixed size file entries,
  for example with filenames having a maximum length:
  BASE_OFF = offset_first_entry + (entries * sizeof(entry))
 -  after: it's necessary to parse all the entries before knowing
  this offset
-  sector offset: quite common on PlayStation games where the specified
 offsets must be multiplied by 2048 (size of disk sector)
-  TOC at the end: the TOC is often located at the beginning of the
 archive but some games prefer to put it at the end to be able to
 update the archive in the future with new content, usual methods:
 -  header at the beginning telling the offset where the TOC is located
 -  few bytes of information at the end containing the TOC offset or
  just the size of the TOC from which the offset can be retrieved
-  nested tree: usually the filenames already include the full path like
 models\character\chara_1.mdl but sometimes the whole directory tree
 is stored in the archive (folders and files) and it needs to be
 parsed recursively
-  sometimes the TOC may be compressed
-  chunked files: see later
-  TOC in a separate file: usually called "index file", a small file
 that contains all the information about the files archived in a "data
 file"; usually they share the same name with a different extension,
 for example: archive.idx and archive.dat
-  ZIP format: sometimes games use just a ZIP archive for containing
 their files, some games may try to implement a custom version of the
 ZIP format as it happens with those that add a new compression
 algorithm (Forza Motorsport and Dark Sector) or those that use some
 different fields or don't use the classical "PK" magic values for the
 various sections of the ZIP archive.
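The basic index table layout shown earlier, together with the relative offset and sector offset variants, can be sketched in Python as follows. The field widths are an assumption, since the paper doesn't specify them: a 16-bit FILES, 32-bit little-endian OFFSET and SIZE and a fixed 8-byte NAME were picked only because they reproduce the numbers of the diagram (first file at 0x22, second at 0x4b). The `base_off` and `unit` parameters are hypothetical names for the base offset and the offset multiplier.

```python
import struct

# Assumed entry layout: OFFSET (u32), SIZE (u32), NAME (8 bytes, NUL padded)
ENTRY = struct.Struct("<II8s")

def build_example() -> bytes:
    """Build the two-file archive drawn in the index table diagram."""
    toc = struct.pack("<H", 2)                    # FILES
    toc += ENTRY.pack(0x22, 41, b"test.txt")
    toc += ENTRY.pack(0x4B, 20, b"blah.dat")
    return toc + b"A" * 41 + b"B" * 20            # FILE 1, FILE 2

def read_index_table(data: bytes, base_off: int = 0, unit: int = 1) -> dict:
    """Return {name: content}.

    The real position of each file is base_off + OFFSET * unit:
    base_off covers relative offsets, unit=2048 covers sector offsets.
    """
    (files,) = struct.unpack_from("<H", data, 0)
    pos = 2
    out = {}
    for _ in range(files):
        offset, size, name = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        start = base_off + offset * unit
        out[name.rstrip(b"\0").decode("ascii")] = data[start:start + size]
    return out

archive = build_example()
extracted = read_index_table(archive)
```

With fixed size entries like these, the "before" formula gives BASE_OFF = 2 + (2 * 16) = 34, which is exactly where FILE 1 begins.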
There are even archives whose format is really complex because they
don't store the original files but use them as direct "resources"
ready to be used by the game engine, so more steps are needed to
accomplish our target.
If an archive uses a block cipher like AES or Blowfish there is also a
third size component to take into consideration: the block-aligned
size of the resource.
If this value is missing, usually it's calculated automatically, or
the game uses CipherFinal of OpenSSL or stream modes like CTR.
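When the aligned size is not stored, it can usually be derived by rounding the stored size up to the cipher block size. A minimal sketch, assuming plain padding up to the next block boundary (16 bytes for AES, 8 for Blowfish); the function name is only illustrative:

```python
def aligned_size(size: int, block: int = 16) -> int:
    """Round size up to the next multiple of the cipher block size."""
    return (size + block - 1) // block * block

zsize = 41                           # compressed size stored in the TOC
xsize = aligned_size(zsize)          # bytes actually occupied in the archive
padding = xsize - zsize              # bytes to skip after decryption
xsize_blowfish = aligned_size(zsize, 8)
```

Here a 41-byte resource occupies 48 bytes with either block size, so 7 bytes of padding have to be discarded after decrypting.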
Example of stored file encrypted with a block cipher:
        +-----------------+
        | OFFSET 00000022 |
        +-----------------+
        | ZSIZE        41 |     compressed size
        +-----------------+
        | SIZE        180 |     uncompressed size
        +-----------------+
        | XSIZE        48 |     archive size (aligned)
        +-----------------+
        | NAME   test.txt |
        +-----------------+--------------------------------+
        | FILE 1 (compressed and encrypted)        PADDING |
        +--------------------------------------------------+
A solution that is often used to save space is dividing the archived
files into small parts called "chunks".
The advantage of this technique is that a chunk is compressed only if
its compressed size is lower than the uncompressed one; the
disadvantage is that small chunks can't take full advantage of the
most advanced compression algorithms because the dictionary/window
doesn't have enough data to be filled and used.
Usually the decompressed size of the chunks is not specified because
it's hardcoded in the game.
A compressed chunk with a size of zero or equal to the chunk
decompressed size means it's stored "as-is" without compression.
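To make the chunk logic concrete, here is a minimal Python sketch of a chunk writer and reader. It assumes zlib as the compression algorithm and a hardcoded chunk size of 64 bytes; both are illustrative choices, and `pack_chunks` / `unpack_chunks` are hypothetical helper names.

```python
import zlib

CHUNK_SIZE = 64  # assumed to be hardcoded in the game

def pack_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data in chunks, compressing each one only if it gets smaller."""
    zsizes, blobs = [], []
    for i in range(0, len(data), chunk_size):
        raw = data[i:i + chunk_size]
        comp = zlib.compress(raw, 9)
        if len(comp) >= len(raw):
            comp = raw                      # not worth it: store as-is
        zsizes.append(len(comp))
        blobs.append(comp)
    return zsizes, b"".join(blobs)

def unpack_chunks(zsizes, blob: bytes, size: int,
                  chunk_size: int = CHUNK_SIZE) -> bytes:
    """Rebuild the original file from the list of CHUNK ZSIZE values."""
    out = bytearray()
    pos = 0
    for zsize in zsizes:
        expected = min(chunk_size, size - len(out))  # last chunk is shorter
        chunk = blob[pos:pos + (zsize or expected)]
        pos += len(chunk)
        if zsize == 0 or zsize == expected:
            out += chunk                    # stored without compression
        else:
            out += zlib.decompress(chunk)
    return bytes(out)

# 180 bytes: a very compressible part followed by a non-repeating one
data = b"a" * 100 + bytes((i * 37 + 11) % 256 for i in range(80))
zsizes, blob = pack_chunks(data)
rebuilt = unpack_chunks(zsizes, blob, len(data))
```

The reader needs the total SIZE of the file to know how long the last chunk is, because only the compressed sizes are stored.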
Example of chunk based file:
        +-----------------+
        | OFFSET 00000022 |
        +-----------------+
        | SIZE        180 |
        +-----------------+
        | NAME   test.txt |
        +-----------------+
        | CHUNKS        3 |
        +-----------------+
        | CHUNK ZSIZE  30 |     * let's say CHUNK SIZE is 64
        +-----------------+
        | CHUNK ZSIZE  42 |
        +-----------------+
        | CHUNK ZSIZE  35 |
        +-----------------+--------------+
        | CHUNK 1                        |
        +--------------------------------+-----------+
        | CHUNK 2                                    |
        +-------------------------------------+------+
        | CHUNK 3                             |
        +-------------------------------------+
The modding perspective: rebuilders
Usually the purposes of obtaining a resource from an archive are the
following:
-  using the resource
 a typical example is the music of a game to listen to on your own
 computer or images to use as wallpaper
-  modding the same game
 editing the extracted content and reinjecting it back into the
 archive or just rebuilding the whole archive from scratch
-  using the resource obtained from a game in a different game
 reinjecting the resource or rebuilding the archive of another game
In the last two cases the user needs a way to force the game to load a
non-archived file, to rebuild the archive, or to reinject the file
into the original archive:
-  Usage of non-archived resources
 in some cases it's possible to use the extracted resources in the
 game by default because the developers left this feature enabled for
 debugging or because the archives were meant only to improve loading
 performance.
 In some other cases it's necessary to activate a specific option from
 a configuration file or command-line (like in Need for Speed Shift),
 while in other situations there is no way to force the game to read
 the extracted files.
-  Archive rebuilding
 this is the best solution but unfortunately it's also the most
 expensive because extracting a file is completely different from
 rebuilding the whole archive.
 For rebuilding an archive it's necessary to know "all" the fields
 used in the TOC and it's not possible to ignore most of them as we
 did with extraction; additionally, creating a rebuilder requires more
 effort and programming work than writing an extractor.
-  Reinjecting/Reimporting
 this is the way that requires the minimal effort and in most cases
 it can even be implemented automatically, just like I do in my
 QuickBMS tool, which allows an extraction script to be used also in
 reimport mode without any change.
 The downsides of this method are:
 -  no CRC/checksum/hash recalculation if one is used in the archive;
  there are some workarounds that can be applied, like automatically
  recalculating and overwriting the CRC field, but this is not
  possible if the algorithm is not a common one; some games ignore
  the different CRC, others will reject the edited file
 -  in the past there was a limitation on the size of the new files,
  which has been bypassed by a new reimport method (reimport2), but
  some archives are still incompatible if they use sequential offsets
 -  in case of custom encryption and compression algorithms it's
  possible that the code to re-encrypt or re-compress the data
  doesn't exist (this is valid for the rebuilding solution too)
 -  in some cases it's possible that the new version of the archive is
  not fully compatible with the game, maybe the game checks the hash
  of the archive before using it or something else
 Anyway it's worth noting that the benefits of this solution are
 incredible for both the writer of the script and the modder, and many
 mods, cheats and customizations have been created in this way.
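The reinjection idea and the CRC workaround described above can be sketched as follows. This is not the QuickBMS implementation: the entry layout (OFFSET, SIZE and a CRC32 field, all 32-bit little-endian) and the `reinject` helper are assumptions made up for the example, and the new file is simply required to fit in the space of the old one.

```python
import struct
import zlib

# Hypothetical TOC entry: OFFSET (u32), SIZE (u32), CRC32 (u32)
ENTRY = struct.Struct("<III")

def reinject(archive: bytearray, entry_pos: int, new_data: bytes) -> None:
    """Overwrite a stored file in place and patch its SIZE and CRC fields."""
    offset, size, _old_crc = ENTRY.unpack_from(archive, entry_pos)
    if len(new_data) > size:
        raise ValueError("new file is bigger than the original one")
    # same-length slice assignment: the archive size doesn't change and
    # the leftover bytes of the old file remain as unused padding
    archive[offset:offset + len(new_data)] = new_data
    ENTRY.pack_into(archive, entry_pos,
                    offset, len(new_data), zlib.crc32(new_data))

old = b"hello world"
archive = bytearray(ENTRY.pack(ENTRY.size, len(old), zlib.crc32(old)) + old)
reinject(archive, 0, b"HELLO")
new_offset, new_size, new_crc = ENTRY.unpack_from(archive, 0)
```

Only the fields of the edited entry are touched, which is why this approach needs far less knowledge of the format than a full rebuild; it stops working as soon as the checksum algorithm is unknown or the offsets are implicit (sequential files).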
If the archive uses asymmetric cryptography and/or a digital signature
it's not possible to perform rebuilding or reimporting due to the lack
of the private key. An example is the GameGuard files.
In these cases the only solution is modifying the game executable to
remove the signature check or to use a known private/public key pair
generated by us.
All the material that has been evaluated for creating this document
comes from my personal research available on my personal website.
The main source is the set of scripts for my QuickBMS program, started
in 2009: http://quickbms.aluigi.org
The secondary source is the stand-alone tools available on my Research
page: http://papers.aluigi.org
The last source used is my collection of archive passwords:
http://aluigi.org/papers.htm#info
The scripts and tools selected for the statistics are those that work
on the files of the games, so any tool related to the encryption of
network data, the decryption of content generated by the user
(savegames) or non-game related stuff has not been included.
Evaluated scripts:
about 810 (this document was originally created in 2013), these
scripts are too many to be listed here.
They cover many types of games from big and small vendors, on any
platform like Xbox, Xbox 360, PC, PS3, PS2, PSP, Wii and others.
They even cover multiple versions of the same file format.
So it's possible to see the script for Crysis 2 and at the same
time the one for games whose name I have never heard.
Evaluated tools:
rfactorgmdec, rfactordec, wtcced, hldlldec, halomus, rdbigext,
scfdec, umodext, unxwb, uniginex, mmviewer_dumper, osrwdec,
molebox2ext, sdgundamext, tdudec, partydec, ttarchext, asurauncmp,
ssaext, canhelpaczip, sgpdec, uodemoext, egoxext, cauldronext,
bsrdec, motorm4xdec, pyroblazerext, worldshiftext, ssnam67ext,
msmixext, xsoext, ysext, orkdec, ps2ext, vitalext, hedwadext,
borpak, ccftfext, fsbext, nexusext, tnt2zip, cbfext, virtdec,
unvirt, zanzapak, gguardfile, rtwsndext, manext, lin2ed.
Note that many scripts/tools work on multiple games and in some cases
two or more scripts may overlap (different script but same game), so
for these statistics I counted just the scripts/tools and not each
single game they cover, because it's hard if not impossible to know
what games are covered by a specific engine or if a file format is
used in other games.
Note also that some scripts use more than one algorithm, that's why the
sum of entries is bigger than the number of scripts and tools which
have been evaluated.
All the information was collected on 13 Apr 2013 with the manual and
automatic checking of each source.
If you are interested in other external sources (to which I contribute
too) take a look at the ZenHAX forum: https://zenhax.com
Regarding the results shown below, please note that they have been
obtained automatically by running a program over all the scripts
available on my website, so some results may be redundant (for example
used multiple times in the same script or maybe two versions of the
same script) and some information may be missing (some scripts are
difficult to parse automatically).
So PLEASE do not take these results too seriously.
Results: Encryption and Obfuscation
+-----------------------------------------------------------+---------+
| no encryption                                             |     676 |
+-----------------------------------------------------------+---------+
| XOR with one byte                                         |      44 |
+-----------------------------------------------------------+---------+
| XOR with key (multiple bytes)                             |      53 |
+-----------------------------------------------------------+---------+
| rotate (add/sub) with one byte                            |       4 |
+-----------------------------------------------------------+---------+
| rotate (add/sub) with key (multiple bytes)                |      12 |
+-----------------------------------------------------------+---------+
| AES                                                       |      18 |
+-----------------------------------------------------------+---------+
| Blowfish                                                  |      10 |
+-----------------------------------------------------------+---------+
| DES/3DES                                                  |       3 |
+-----------------------------------------------------------+---------+
| charset / substitution table                              |       3 |
+-----------------------------------------------------------+---------+
| incremental XOR                                           |       9 |
+-----------------------------------------------------------+---------+
| RC4                                                       |      12 |
+-----------------------------------------------------------+---------+
| TEA/XTEA/XXTEA                                            |       4 |
+-----------------------------------------------------------+---------+
| custom encryption / obfuscation                           |      48 |
+-----------------------------------------------------------+---------+
+-----------------------------------------------------------+---------+
| password protected archives (mainly ZIP, RAR and FSB)     |      53 |
+-----------------------------------------------------------+---------+
Results: Compression
+-----------------------------------------------------------+---------+
| no compression                                            |     500 |
+-----------------------------------------------------------+---------+
| zlib                                                      |     188 |
+-----------------------------------------------------------+---------+
| LZO                                                       |      20 |
+-----------------------------------------------------------+---------+
| deflate                                                   |      36 |
+-----------------------------------------------------------+---------+
| LZMA                                                      |      20 |
+-----------------------------------------------------------+---------+
| Microsoft XMem (LZX)                                      |      27 |
+-----------------------------------------------------------+---------+
| LZSS                                                      |      13 |
+-----------------------------------------------------------+---------+
| gzip                                                      |      10 |
+-----------------------------------------------------------+---------+
| bzip2                                                     |       9 |
+-----------------------------------------------------------+---------+
| custom / proprietary / less known                         |      41 |
+-----------------------------------------------------------+---------+
Results: Structure
Sorry, not available yet.
+-----------------------------------------------------------+---------+
| Index table                                               |       ? |
+-----------------------------------------------------------+---------+
| Sequential files                                          |       ? |
+-----------------------------------------------------------+---------+
| Chunks                                                    |       ? |
+-----------------------------------------------------------+---------+
Notes and information
During the reverse engineering of these file formats some interesting
things have been noticed.
In some cases the target platform makes the difference due to possible
in-hardware optimizations or the endianness of the CPU.
For example, on Xbox 360 it's quite common to see the Microsoft LZX
algorithm (XMemCompress) in use in place of the zlib used for the same
games on other platforms, and it's also common to see the archives
packed using big endian instead of the little endian of the PC
versions.
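The effect of endianness on a TOC field can be shown with a few lines of Python. The `guess_endian` heuristic (pick the byte order that gives an offset falling inside the archive) is only an assumption for illustration, not a method described in this paper.

```python
import struct

raw = bytes.fromhex("00000022")            # the same 4 bytes of an OFFSET field

as_big = struct.unpack(">I", raw)[0]       # big endian read (Xbox 360 style)
as_little = struct.unpack("<I", raw)[0]    # little endian read (PC style)

def guess_endian(field: bytes, archive_size: int) -> str:
    """Return '>' or '<' if only one byte order gives a plausible offset."""
    big = struct.unpack(">I", field)[0]
    little = struct.unpack("<I", field)[0]
    if big < archive_size <= little:
        return ">"
    if little < archive_size <= big:
        return "<"
    return "?"                             # both or neither fit: undecided

order = guess_endian(raw, 4096)
```

Read with the wrong byte order, the offset 0x22 becomes 0x22000000, which is far beyond the end of a small archive and is usually enough to spot the mistake.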
Another interesting point is about the version of the file formats
because some of them (like the MAS one for the ISI Gmotor engine) have
existed for several years and have been used in many games, with the
result of creating many versions very different from each other.
This is caused not only by the enhancement of the format over the
years but mainly by the desire of the different developers to
customize the format they adopted.
Games like those developed by Simbin use common archives (like the MAS
one mentioned above) with an additional layer of encryption that has
been updated game after game, trying to make life harder for the
maintainer of the decryption tools.
This is valid also for the Telltale Games archives, in which these
continuous changes lasted several years over various versions.
In other cases a more complex and custom encryption algorithm has been
added after the developers became aware of the existence of tools for
decrypting and extracting the content of the archives; a recent
example is Farming Simulator 2013 1.4 beta.
The most common compression algorithms are the zlib and deflate ones;
note that zlib is just a deflate stream with a header and a checksum
(Adler-32), so basically they are the same thing.
This algorithm is used in a lot of games and it's also the easiest to
identify because the whole job can be performed with programs like
offzip, which scan the whole archive finding the zlib data (thanks to
its checksum that avoids false positives) and return the offset plus
the compressed and uncompressed size, which can be used to identify
the index table in the archive.
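The scanning idea can be sketched in Python with the standard zlib module. This is not the actual offzip code, only a simplified illustration: `scan_zlib` is a hypothetical name, and the candidate positions are filtered with the zlib header rule (compression method 8 and a 16-bit header divisible by 31) before trying a full decompression.

```python
import zlib

def scan_zlib(data: bytes):
    """Return a list of (offset, compressed size, uncompressed size)."""
    found = []
    pos = 0
    while pos < len(data) - 1:
        # cheap pre-check of the 2-byte zlib header
        if data[pos] & 0x0F == 8 and (data[pos] << 8 | data[pos + 1]) % 31 == 0:
            d = zlib.decompressobj()
            try:
                out = d.decompress(data[pos:])
            except zlib.error:
                out = None
            # eof is set only when the stream ended and its checksum matched
            if out is not None and d.eof:
                zsize = len(data) - pos - len(d.unused_data)
                found.append((pos, zsize, len(out)))
                pos += zsize
                continue
        pos += 1
    return found

z1 = zlib.compress(b"x" * 100)
z2 = zlib.compress(b"hello hello hello")
blob = b"JUNK" + z1 + b"MORE" + z2 + b"END"
results = scan_zlib(blob)

# zlib = 2-byte header + raw deflate + 4-byte Adler-32
raw_deflate = zlib.decompress(z1[2:-4], -15)
```

The offsets and sizes returned by such a scan are the values to search for in the archive when trying to locate and understand its index table.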
On the encryption and obfuscation side the most used is without doubt
the classical and simple XOR solution, followed by the custom and
proprietary solutions that go from simple obfuscations to the
customization of known algorithms and even the implementation of
algorithms never seen online.
The password protected archives are a lot but they rely on known file
formats like ZIP, RAR and FMOD FSB so I have preferred to keep them
out of the final considerations.
Why do developers opt for this solution? Because there are libraries
already available to handle these known archives and a simple password
is enough to try to keep modders out.
When a researcher encounters a custom encryption or compression
algorithm there are usually the following ways to solve the puzzle:
-  try to reverse engineer the pre-compiled algorithm into a higher
 level language like C or others
-  use a binary to C/pseudo code converter like IDA Pro or REC and then
 fix the resulting code (it may be a painful process)
-  dump the whole function and fix it where necessary; depending on the
 interest in the game and the complexity of the algorithm, this is
 usually a very good compromise
-  if you are very lucky the game uses an external dll that can be
 used to perform the same tasks from any custom tool
As already said, remember that this document is based ONLY on the work
publicly available on my website, so it doesn't cover other game
extractors written by other people or the scripts for QuickBMS written
by users in the community (whom I personally thank for their feedback
and support).
Feel free to provide any feedback, your comments and your personal experience with file formats and archives.
ExtractResponseUnit
It's an elaborate documentation of the analysis, thank you for publishing it.
Incredibly helpful for the older generation among us, like me.