l_oliveira

In this thread I'll try to explain the approach I use when decrypting games manually.
The target is an American revision of Street Fighter Zero 3 / Alpha 3.

I am starting with the encrypted romset, the decryption keys (MAME can execute the game) and the MAME debugger.

The first step is to determine the size of the encrypted memory range:

The MAME source code is a good starting point for this. Here's the relevant line, from:
src/mame/machine/cps2crpt.c


{ "sfa3u", { 0xe7bbf0e5,0x67943248 }, 0x100000 }, // 0C80 1C62 F5A8 cmpi.l #$1C62F5A8,D0


So we have:


Keypair -> 0xe7bbf0e5,0x67943248
Encryption range -> 0x100000

Watchdog kick instruction -> 0C80 1C62 F5A8 cmpi.l #$1C62F5A8,D0 (a comment, kept in the source purely for documentation)
So we now know the encryption range is 0x100000 bytes, which is 1MB. That's the first two chips.

Now we go to the second step, which is to obtain the decrypted code for analysis:

Load the game on MAME debugger and run the following commands:

save sfa3u.bin,0,100000,0
This saves a 1MB file with the contents of the first two ROMs


dasm sfa3ud.asm,0,100000,1,0
This saves a text file with an ASM listing of the contents of the first two ROMs in decrypted form. It's mangled in the sense that anything which isn't code is corrupted.
 
In this post I'll describe the process further:


Opening the generated asm file (18MB+) in a text editor (Notepad++ can deal with such ginormous files just fine), we see this
at the point the ROM boots from:

00091A: 33FC 7000 0040 0000 move.w #$7000, $400000.l
000922: 33FC 0000 0080 40A0 move.w #$0, $8040a0.l
00092A: 33FC 807D 0040 0002 move.w #$807d, $400002.l
000932: 33FC 2461 0040 0004 move.w #$2461, $400004.l
00093A: 33FC 0000 0040 0006 move.w #$0, $400006.l
000942: 33FC 0040 0040 0008 move.w #$40, $400008.l
00094A: 33FC 0010 0040 000A move.w #$10, $40000a.l
000952: 33FC 0F00 0080 4040 move.w #$f00, $804040.l
00095A: 0C80 1C62 F5A8 cmpi.l #$1c62f5a8, D0
000960: 49FA 0006 lea ($6,PC), A4; ($968)
000964: 6000 04AE bra $e14

Because one of the parameters of the disassemble command was set to 1, the instruction opcodes were stored in the listing.
A Python script will be used to parse the text file and copy the opcode words (the hex column between the address and the mnemonic, shown in bold in the original post) to a binary file.

The resulting file will be exactly 1MB and will contain all of the game program instructions in their decrypted form. The data, however, is mangled.
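The script itself isn't posted in this thread, so what follows is only a minimal sketch of what such a converter could look like. It assumes the listing format shown above (address, colon, upper-case opcode words, lower-case mnemonic); the function names, the 0xFF fill value and the file names in the usage note are my own assumptions.

```python
import re

# Assumed line format, taken from the sample listing above:
#   "00091A: 33FC 7000 0040 0000 move.w #$7000, $400000.l"
# Opcode words are matched as upper-case hex only, so a lower-case mnemonic
# made of hex letters (for example "abcd") is not mistaken for an opcode word.
LINE_RE = re.compile(r"^\s*([0-9A-Fa-f]+):\s+((?:[0-9A-F]{4}(?:\s+|$))+)")


def listing_to_binary(lines, size=0x100000, fill=0xFF):
    """Rebuild a binary image from the opcode words of a disassembly listing."""
    image = bytearray([fill]) * size
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # blank line or anything that is not a listing line
        address = int(match.group(1), 16)
        # The words are written in the order shown, which is already big endian.
        data = bytes.fromhex("".join(match.group(2).split()))
        if address + len(data) <= size:
            image[address:address + len(data)] = data
    return image


def convert(asm_path, bin_path, size=0x100000):
    """Read a listing file and write the rebuilt image (hypothetical helper)."""
    with open(asm_path, "r", errors="replace") as src:
        image = listing_to_binary(src, size)
    with open(bin_path, "wb") as dst:
        dst.write(image)
```

Something like convert("sfa3ud.asm", "sfa3ud.bin") would then produce the 1MB file, where the output name is just an example.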
On another forum Razoola mentioned something called a "mask". The "mask" is a file filled with a fixed value, which is then fed to a modified CPS2 emulator running the target game. The emulator pokes holes in the file, leaving marks wherever instructions and data are read from within the address space.

Because this is about manual decryption, no mask is used during the actual decryption work.
One important thing to keep in mind is the processor's endianness. The Motorola MC68000 is big endian.

Next step: "uncorrupt" the vectors.


The Motorola 68000 has this "vectors" concept, where a fixed region at the very start of the address space (address 0) holds special addresses to be used in specific situations.
Each vector is 32 bits long (the width of the address registers). Because the original 68000 has only 24 address lines, the highest byte is ignored when the address is put on the physical address bus.

Encrypted vectors:
Stack pointer initial value 0x000000-0x000003
Program counter initial value 0x000004-0x000007
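To make the layout and the big endian point concrete, here is a small sketch that reads those two vectors from the start of an image. The function name is made up for illustration, and the eight sample bytes are the ones that open the decrypted dump in this post.

```python
import struct


def read_reset_vectors(image):
    """Return (initial SP, initial PC) from the first 8 bytes of a 68000 image."""
    # ">II" means two big-endian unsigned 32-bit values.
    sp, pc = struct.unpack_from(">II", image, 0)
    # The original 68000 only has 24 address lines, so the top byte is dropped.
    return sp & 0xFFFFFF, pc & 0xFFFFFF


header = bytes.fromhex("00FF80000000091A")
sp, pc = read_reset_vectors(header)
print(hex(sp), hex(pc))  # 0xff8000 0x91a
```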

So, looking at the decrypted ROM we have:

Offset      0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F



00000000   00 FF 80 00 00 00 09 1A  C2 4A BF 40 A5 18 0C FE   .ÿ€.....ÂJ¿@¥..þ
00000010   FE FE B8 03 6E 03 05 A4  84 EE 2E 18 10 A3 A5 49   þþ¸.n..¤„î...£¥I
00000020   B8 A8 C1 F1 FE 90 BB A0  10 7E 7E C9 AE BE 44 37   ¸¨Áñþ» .~~É®¾D7
00000030   32 9C 52 17 A2 08 B6 BF  EE D5 38 6A 17 83 E4 80   2œR.¢.¶¿îÕ8j.ƒä€
00000040   0A 16 30 55 41 EB 1E 1E  7A 6B 21 AA F5 C2 C1 5F   ..0UAë..zk!ªõÂÁ_
00000050   64 6B 5B 09 A3 09 78 35  0F B8 07 79 DC 85 01 25   dk[.£.x5.¸.yÜ….%
00000060   09 48 65 40 E1 1F CE 4E  2F 99 8A 1C 09 CD 75 88   .He@á.ÎN/™Š..Íuˆ
00000070   72 98 2E 88 0C E6 00 76  A9 CA C2 F9 B6 F7 9D 07   r˜.ˆ.æ.v©ÊÂù¶÷.
00000080   C8 19 84 5C 29 6D 38 6B  D5 F5 26 9C A1 D4 2B F2   È.„\)m8kÕõ&œ¡Ô+ò
00000090   55 C0 75 9C D8 55 50 C9  18 27 66 7E 8C DF 25 4D   UÀuœØUPÉ.'f~Œß%M
000000A0   5F B3 25 E8 D2 C1 AD 78  73 D7 EB 5B D8 16 61 A3   _³%èÒÁ­xs×ë[Ø.a£
000000B0   EF 2F 2E 10 DC 80 78 E2  51 13 66 DA 0F 0C 55 61   ï/..Ü€xâQ.fÚ..Ua
000000C0   33 FC 00 00 00 FF 00 00  4F EF 00 08 60 00 00 82   3ü...ÿ..Oï..`..‚
000000D0   33 FC 00 01 00 FF 00 00  4F EF 00 08 60 72 00 7C   3ü...ÿ..Oï..`r.|

The bytes at 0x00-0x07 are the encrypted vectors, which decrypt to 0x00FF8000 and 0x0000091A respectively (remember the asm listing I just posted starts at 0x91A).
Everything from 0x08 up to 0xBF is mangled by the decryption. The normal vectors are read by the CPU as plain, non-encrypted indirect addresses, so running them through the decryption corrupts them.

This is what the encrypted ROM looks like for that very same part:

Offset      0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F



00000000   0E 00 0A 6E D1 5B 39 00  00 00 00 C0 00 00 00 D0   ...nÑ[9....À...Ð
00000010   00 00 00 DE 00 00 00 EC  00 00 00 FA 00 00 01 08   ...Þ...ì...ú....
00000020   00 00 01 16 00 00 01 24  00 00 01 32 00 00 01 40   .......$...2...@
00000030   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000040   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000050   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000060   00 00 01 4E 00 00 01 4E  00 00 11 82 00 00 01 4E   ...N...N...‚...N
00000070   00 00 01 4E 00 00 01 4E  00 00 01 4E 00 00 01 4E   ...N...N...N...N
00000080   00 00 12 EE 00 00 13 0E  00 00 13 32 00 00 13 64   ...î.......2...d
00000090   00 00 13 8E 00 00 13 B4  00 00 13 D0 00 00 13 E4   ...Ž...´...Ð...ä
000000A0   00 00 01 4E 00 00 01 4E  00 00 01 4E 00 00 01 4E   ...N...N...N...N
000000B0   00 00 01 4E 00 00 01 4E  00 00 01 4E 00 00 01 4E   ...N...N...N...N
000000C0   D1 FE 10 66 9F E9 BC 56  12 2B 35 5D 25 D0 AA CA   Ñþ.fŸé¼V.+5]%ЪÊ
000000D0   F4 B9 C9 E3 B9 1D E4 7C  34 40 90 71 3D 5F 2F 0C   ô¹Éã¹.ä|4@q=_/.
Notice how it's impossible to tell what the bytes at 0x00-0x07 and at 0xC0-0xDF are.

To fix the vectors properly, I copy the bytes from 0x08 to 0xBF of the encrypted file onto the decrypted file.

The fixed vectors look like this:

Offset      0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F



00000000   00 FF 80 00 00 00 09 1A  00 00 00 C0 00 00 00 D0   .ÿ€........À...Ð
00000010   00 00 00 DE 00 00 00 EC  00 00 00 FA 00 00 01 08   ...Þ...ì...ú....
00000020   00 00 01 16 00 00 01 24  00 00 01 32 00 00 01 40   .......$...2...@
00000030   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000040   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000050   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000060   00 00 01 4E 00 00 01 4E  00 00 11 82 00 00 01 4E   ...N...N...‚...N
00000070   00 00 01 4E 00 00 01 4E  00 00 01 4E 00 00 01 4E   ...N...N...N...N
00000080   00 00 12 EE 00 00 13 0E  00 00 13 32 00 00 13 64   ...î.......2...d
00000090   00 00 13 8E 00 00 13 B4  00 00 13 D0 00 00 13 E4   ...Ž...´...Ð...ä
000000A0   00 00 01 4E 00 00 01 4E  00 00 01 4E 00 00 01 4E   ...N...N...N...N
000000B0   00 00 01 4E 00 00 01 4E  00 00 01 4E 00 00 01 4E   ...N...N...N...N
000000C0   33 FC 00 00 00 FF 00 00  4F EF 00 08 60 00 00 82   3ü...ÿ..Oï..`..‚
000000D0   33 FC 00 01 00 FF 00 00  4F EF 00 08 60 72 00 7C   3ü...ÿ..Oï..`r.|
With this, the basic preparations for loading the file into IDA are done.
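The vector fix can be done in a hex editor, but as a rough sketch it could also be scripted like this. The function and constant names are mine; the offsets are the ones given in this post.

```python
VECTORS_START = 0x08  # first vector after the initial SP and initial PC
VECTORS_END = 0xC0    # one past the last vector byte (0xBF)


def fix_vectors(decrypted, encrypted):
    """Return the decrypted image with the plain vectors at 0x08-0xBF restored."""
    if len(decrypted) != len(encrypted):
        raise ValueError("both images must have the same size")
    fixed = bytearray(decrypted)
    # These bytes are not encrypted in the ROM, so take them from it untouched.
    fixed[VECTORS_START:VECTORS_END] = encrypted[VECTORS_START:VECTORS_END]
    return bytes(fixed)
```

The first 8 bytes (initial SP and PC) and everything from 0xC0 onwards are left as they were in the decrypted file.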
 
Once the first step with the MC68000 vectors is done, it's time to load the file into IDA and start the analysis.
Since the goal is simply to tell 68000 instructions (encrypted) apart from non-encrypted data, I will only load into IDA the memory region which is affected by encryption.

So, before we start, let's see what is relevant for this in the CPS2 MC68000 memory map:

0x000000-0x3FFFFF: 4MB, PRG ROM, the target of this process
0xFF0000-0xFFFFFF: 64KB, part of the onboard RAM for the 68000 CPU, the area used as CPU work memory (2x 128KB 8-bit PSRAM chips)

On Street Fighter Alpha/Zero 3 we already know that only 1MB of that 4MB window is encrypted, so that's the size of the file we will load into IDA.



Loading a file for analysis in IDA is braindead simple. You select it, then in the open file dialog tell IDA it's a plain headerless binary file and set the CPU to Motorola 68000.

The next dialog will request information on the memory layout. I usually put this:

"Create RAM section" check
"RAM Start address" 0xff0000
"RAM Size" 0x100000


"Create ROM section" check
"ROM Start address" (this will be automatically filled by IDA)
"ROM Size" (this will be automatically filled by IDA)

Hit ok

Now you have the ROM loaded in IDA, ready for analysis.
The next step is to find the entry points.


One should see this at the vectors region:

ROM:00000000 ; Segment type: Pure code
ROM:00000000 ; segment "ROM"
ROM:00000000 dc.b 0
ROM:00000001 dc.b $FF
ROM:00000002 dc.b $80 ; €
ROM:00000003 dc.b 0
ROM:00000004 dc.b 0
ROM:00000005 dc.b 0
ROM:00000006 dc.b 9
ROM:00000007 dc.b $1A


These values are actually the initial values for the stack pointer and the PC. Hitting "D" on them turns them into data, cycling through byte, word and long word. The vectors are long words.

The vectors for this particular game look like this after being set up as long words:


ROM:00000000 dc.l unk_FF8000
ROM:00000004 dc.l loc_91A
ROM:00000008 dc.l loc_C0
ROM:0000000C dc.l loc_D0
ROM:00000010 dc.l loc_DE
ROM:00000014 dc.l loc_EC
ROM:00000018 dc.l loc_FA
ROM:0000001C dc.l loc_108
ROM:00000020 dc.l loc_116
ROM:00000024 dc.l loc_124
ROM:00000028 dc.l loc_132
ROM:0000002C dc.l loc_140
ROM:00000030 dc.b 0
ROM:00000031 dc.b 0
ROM:00000032 dc.b 0
ROM:00000033 dc.b 0
ROM:00000034 dc.b 0
ROM:00000035 dc.b 0
ROM:00000036 dc.b 0
ROM:00000037 dc.b 0
ROM:00000038 dc.b 0
ROM:00000039 dc.b 0
ROM:0000003A dc.b 0
ROM:0000003B dc.b 0
ROM:0000003C dc.b 0
ROM:0000003D dc.b 0
ROM:0000003E dc.b 0
ROM:0000003F dc.b 0
ROM:00000040 dc.b 0
ROM:00000041 dc.b 0
ROM:00000042 dc.b 0
ROM:00000043 dc.b 0
ROM:00000044 dc.b 0
ROM:00000045 dc.b 0
ROM:00000046 dc.b 0
ROM:00000047 dc.b 0
ROM:00000048 dc.b 0
ROM:00000049 dc.b 0
ROM:0000004A dc.b 0
ROM:0000004B dc.b 0
ROM:0000004C dc.b 0
ROM:0000004D dc.b 0
ROM:0000004E dc.b 0
ROM:0000004F dc.b 0
ROM:00000050 dc.b 0
ROM:00000051 dc.b 0
ROM:00000052 dc.b 0
ROM:00000053 dc.b 0
ROM:00000054 dc.b 0
ROM:00000055 dc.b 0
ROM:00000056 dc.b 0
ROM:00000057 dc.b 0
ROM:00000058 dc.b 0
ROM:00000059 dc.b 0
ROM:0000005A dc.b 0
ROM:0000005B dc.b 0
ROM:0000005C dc.b 0
ROM:0000005D dc.b 0
ROM:0000005E dc.b 0
ROM:0000005F dc.b 0
ROM:00000060 dc.l locret_14E
ROM:00000064 dc.l locret_14E
ROM:00000068 dc.l loc_1182
ROM:0000006C dc.l locret_14E
ROM:00000070 dc.l locret_14E
ROM:00000074 dc.l locret_14E
ROM:00000078 dc.l locret_14E
ROM:0000007C dc.l locret_14E
ROM:00000080 dc.l loc_12EE
ROM:00000084 dc.l loc_130E
ROM:00000088 dc.l loc_1332
ROM:0000008C dc.l loc_1364
ROM:00000090 dc.l loc_138E
ROM:00000094 dc.l loc_13B4
ROM:00000098 dc.l loc_13D0
ROM:0000009C dc.l loc_13E4
ROM:000000A0 dc.l locret_14E
ROM:000000A4 dc.l locret_14E
ROM:000000A8 dc.l locret_14E
ROM:000000AC dc.l locret_14E
ROM:000000B0 dc.l locret_14E
ROM:000000B4 dc.l locret_14E
ROM:000000B8 dc.l locret_14E
ROM:000000BC dc.l locret_14E

Now we kick off the analysis by double clicking the initial PC address (second entry of the vector table, at offset 0x000004).
One would see this there:

ROM:0000091A unk_91A: dc.b $33 ; 3
ROM:0000091B dc.b $FC ;
ROM:0000091C dc.b $70 ; p
ROM:0000091D dc.b 0
ROM:0000091E dc.b 0
ROM:0000091F dc.b $40 ; @
ROM:00000920 dc.b 0
ROM:00000921 dc.b 0
ROM:00000922 dc.b $33 ; 3
ROM:00000923 dc.b $FC ;
ROM:00000924 dc.b 0
ROM:00000925 dc.b 0
ROM:00000926 dc.b 0
ROM:00000927 dc.b $80 ; €

Hitting "C" while position 0x00091A is highlighted causes IDA to analyze the opcodes and disassemble them into ASM instructions:

ROM:0000091A loc_91A: ; CODE XREF: ROM:00000620j
ROM:0000091A ; DATA XREF: ROM:00000004o
ROM:0000091A move.w #$7000,($400000).l
ROM:00000922 move.w #0,($8040A0).l
ROM:0000092A move.w #$807D,($400002).l
ROM:00000932 move.w #$2461,($400004).l
ROM:0000093A move.w #0,($400006).l
ROM:00000942 move.w #$40,($400008).l ; '@'
ROM:0000094A move.w #$10,($40000A).l
ROM:00000952 move.w #$F00,($804040).l
ROM:0000095A cmpi.l #$1C62F5A8,d0
ROM:00000960 lea loc_968,a4
ROM:00000964 bra.w loc_E14

And that's all for today.
 
Any time to finish this off, leo? Would be neat to have the end-to-end knowledge to tinker with this over the xmas break :D
 
Anything beyond what I posted in the thread is knowledge specific to the 68000 CPU (memorizing the mnemonics and hexadecimal values for each instruction) or IDA-specific knowledge.

I don't have any idea what else to put here, so why don't you help out a tad too and name something you would like to know? ;)
 
Hrm, I guess some more about what you're looking for/doing within IDA (as you say, IDA specific knowledge, but I've not used IDA before), when/how you test in an EMU, fixing problems as you find them?

You've sort of introduced loading it up into IDA, and some of the terms used, but not so much what you actually do in there :)
 
Well, if you pick something to disassemble and then try it yourself (for example a Mega Drive ROM) you will get what I mean. The process of disassembling a program is pretty much identical regardless of whether it is encrypted or not.

It's something that is outside the scope of the actual decryption process. And even then, I'm not sure I am good enough at it to try teaching people how to use the disassembler. That's the main reason why I am not talking much about it.

I suggest you look at Charles MacDonald's page on how to decrypt SEGA System 16/18 games. It takes another approach to this stuff and also takes into consideration hardware-specific details of that encryption system and platform.

http://techno-junk.org/ click "FD1094 Game Conversion"
 
Okay, this might sound like a REALLY stupid question as I'm just getting into this whole decryption thing. (Need to while I wait on some things for my arcade cabinet to come in, and I get the replacement flyback installed on my monitor. Heh).

I see that in the first post you indicate how we can dump a .bin file from the debugger. Is this dump from the debugger encrypted or decrypted? I would think that with MAME running the game, it would need to fully decrypt the program roms in order to do so. Therefore, when you dump that .bin file in the first step, aren't you having MAME do all the decryption for you in the proper spaces? (E.g. not decrypting data that is already decrypted?).

So why can't we just put these dumps into a file and play them on a multi? I'm just having trouble understanding the need to use the disassembler. Or, is it because MAME knows what to decrypt and what not to decrypt, and the dump it provides is decrypted program code and erroneously-decrypted data?

Am I right in guessing that we need to go through the disassembler process because some of the "erroneously decrypted" data matches up with properly decrypted program code, so we can't just compare the files and copy the non-decrypted data in the original ROMs into the decrypted program code in the decrypted ROMs?

As I said, I have some time available to me and while I am not an expert, I might possibly pick up some ideas and be able to help out in the future.
 
Strictly speaking, the one doing the actual decryption is MAME.

The process I describe here is about turning the mangled decryption output into something that is usable by the real hardware.

That involves:

-Using the MAME debugger to obtain the decrypted program
-Converting the ASM listing/dump of decrypted data from MAME into raw binary data which can then be studied/analyzed
-Discerning data from code (stuff which is affected by the encryption from stuff that isn't)
-Splitting the properly decrypted code from the mangled data and replacing said mangled data with the non-mangled data from the original encrypted ROM.
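The last step can be pictured with a short sketch. It assumes you already have a list of address ranges identified as code (for example from the IDA analysis); the function name and the range format are hypothetical and only meant to show the idea.

```python
def merge_images(encrypted, decrypted, code_ranges):
    """Build the final image: decrypted bytes for code, original bytes for data.

    code_ranges is a list of (start, end) offsets, end exclusive, covering
    everything that was identified as 68000 instructions.
    """
    if len(encrypted) != len(decrypted):
        raise ValueError("both images must have the same size")
    # Start from the original ROM so that all data stays un-mangled ...
    result = bytearray(encrypted)
    # ... then overlay the decrypted bytes only where there is code.
    for start, end in code_ranges:
        result[start:end] = decrypted[start:end]
    return bytes(result)
```

Starting from the encrypted file and overlaying code means that anything not explicitly marked as code keeps its original bytes, which is the safe default for data.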
 
Okay. Thanks Leo. I'm always interested in learning new things, and this is definitely new to me. :)
 
I've never read this thread before.
I have a completely different approach.
I simply run the game with a modified trace function in MAME to output decrypted opcodes and arguments in hexadecimal. I leave the attract mode running for a full cycle and then play the game a bit with different characters in different levels.
I also enter the test mode and run every test.
This will generate a huge file (several GB).
With a small script I made, I then delete all duplicate lines. The file I obtain is less than 1MB.
With a second script I identify "holes" in the file and generate breakpoint commands for MAME. Generally with this method I expect to find ~500 "holes".
I run MAME again with all the breakpoints and every time it stops I check what instruction it is. If it's a conditional branch I force MAME to take the path not decrypted yet. If not, then I mark the "hole" as "normal" (i.e. it contains data, not instructions).
Once all breakpoints have been reached I run the script again to find new "holes", as newly decrypted parts may also contain branches, etc. Usually I'm down to less than 200 "holes" the second time.
Rinse and repeat a few times and you get a fully decrypted game.
It takes quite long, I admit (between 4 and 8h depending on the game's complexity).

Not all the details are there, but you get the idea.
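The scripts aren't posted here, so the following is only a rough sketch of the "hole" detection idea. It assumes the trace has already been reduced to (address, length) pairs for every executed instruction. The post doesn't say where exactly the breakpoints go, so putting one on the last traced instruction before each hole is purely my assumption. bpset is the regular MAME debugger command, and it takes hex addresses by default.

```python
def find_holes(executed, start, end):
    """Find address gaps that the trace never covered.

    executed: iterable of (address, length) pairs, duplicates allowed.
    Returns a list of (hole_start, hole_end, previous) tuples, hole_end
    exclusive, where previous is the address of the traced instruction that
    ends right before the hole (or None if the hole starts at `start`).
    """
    holes = []
    cursor, last = start, None
    for address, length in sorted(set(executed)):  # set() drops duplicates
        if address >= end:
            break
        if address > cursor:
            holes.append((cursor, address, last))
        if address + length > cursor:
            cursor, last = address + length, address
    if cursor < end:
        holes.append((cursor, end, last))
    return holes


def breakpoint_commands(holes):
    """One MAME debugger 'bpset' line per hole that has a preceding instruction."""
    return ["bpset %X" % last for _, _, last in holes if last is not None]
```

After a new run with those breakpoints, the newly traced instructions get added to the executed list and find_holes is run again, which matches the "rinse and repeat" loop described above.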
 
Yes, but your method won't decrypt code that never executes. That is fine for just making the games play but it's not good for people who want to do things like enable debugging functions or dig unused code from the game engine.
 
It does. Code that never executes is still called from somewhere, somehow, so it's picked up by the "hole" detection. Plus, at the end of the process I check each remaining part to see whether it's unused code or not. This is why my M92 decryptions have different CRCs from any other ones: because even unused subroutines are decrypted.
It's a bit stupid in the sense that they are never called anywhere and are probably leftovers from development, but I'm a bit of a perfectionist.
Also often you can see a pattern for data area (i.e. always starting and/or ending with the same values).

P.S.: I'm not saying my method is better, just it does work for M92/M107, System 16/18/24, etc. with no bug reported so far.
 
Well, your method would be an excellent addition to my tools for when I have games with decryption errors. It would make finding the errors really trivial instead of the frustration fest it currently is.

Would you mind sharing the MAME patches and scripts for tracking? I'm only interested in the MC68000 CPU related files, of course.
 
I'd like to see that trace modification as well. I used the usual trace command from MAME, which generates a huge file.

If you check the cps3 section you'll see that I ended up writing a robot that checks all possible branches and takes them all. You just give it a starting point and it will recognize what's a regular instruction and what's a branch, and will analyse them all.

I published the code of the robot iirc.
 
Hi @l_oliveira, thank you so much for sharing this valuable information. I know it's been a while since this thread was posted, but this still seems to be the best source I can find on decrypting CPS2 games and doing decomp. I'm keen to learn how to do this and try to decrypt and load ssf2xj (not ssf2xjr1, which is on https://cps2.avalaunch.net/) into Ghidra for further understanding. I'm still learning Ghidra, but your tip about the MC68000 vectors being part decrypted and part encrypted was valuable. I've used radare2 to decrypt the encrypted ROM address range as reported by MAME, but as you mention, the data is mangled because only the opcodes are encrypted. I'm still not fully sure how you have worked out what is data and what is not. I see this:

On another forum Razoola mentioned something called a "mask". The "mask" is a file filled with a fixed value, which is then fed to a modified CPS2 emulator running the target game. The emulator pokes holes in the file, leaving marks wherever instructions and data are read from within the address space.

But I don't understand how to create or obtain this mask. Are you able to help with fixing the mangled data? Any scripts or process would be gratefully received.

Also, if you have more tips for using Ghidra for this type of CPS2 decomp, that would be greatly appreciated. Maybe you are only using IDA Pro, in which case no matter, but Ghidra seems to be pretty popular these days. So far, I've loaded a concatenated and byte-swapped version of https://cps2.avalaunch.net/downloads/ssf2xjdi.zip into Ghidra, and manually labelled the first 32 vectors and the functions they reference. I've then done an auto-analyze with all of the default options. I'm not sure if all of the program code has been found or not yet.



All the best
Jeremy
 