Alright, just going to bluntly state what I've tried and found out (or rather, mostly didn't) so far. Maybe it's useful for somebody, maybe not. I may have been looking in spots far away from the actual cause of the error.
So what I first had to sort out was that the ROM_REGION 0x20000 sram / ROM_LOAD section in pgm.cpp is a fallback, and is only effectively loaded when ~/.mame/nvram/[romset_id]/sram doesn't exist. Otherwise, the latter file is loaded into sram instead.
That file's size is 128K = 0x20000 (corresponding to the above), but if you fill it with, for instance, FE*0x20000, start the emulation and look into the program space memory, you can see that this is somehow spread out between 0x800000 and 0x8FFFFF (meaning everything in-between is FEFE... and so on).
In pgm.cpp, L315, the "sram" region is mapped to 0x800000..0x81ffff, and I'm not sure yet where that discrepancy comes from.
That line, conveniently, is marked with the comment "Main Ram", which is, well, true.
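One possible explanation for the spread, purely as an assumption that I have not verified against pgm.cpp, is that the 128K block is mirrored across the whole 0x800000..0x8FFFFF window. A minimal Python sketch of what that hypothesis would predict (the function name and the mirroring itself are assumptions, not taken from MAME):

```python
SRAM_BASE = 0x800000   # start of the mapped "sram" region
SRAM_SIZE = 0x20000    # 128K, same as the nvram file
WINDOW_END = 0x8FFFFF  # last address where the FE pattern was observed

def sram_offset(addr):
    """Offset into the 128K sram file that addr would hit IF the block is mirrored.

    Returns None for addresses outside the observed window.
    """
    if not SRAM_BASE <= addr <= WINDOW_END:
        return None
    return (addr - SRAM_BASE) % SRAM_SIZE

# Under the hypothesis, the window would hold 8 copies of the file.
COPIES = (WINDOW_END + 1 - SRAM_BASE) // SRAM_SIZE
```

A constant 0xFE fill can't tell mirroring apart from anything else, so a quick check would be to fill the file with an offset-dependent pattern and see whether 0x820000 reads back the same byte as 0x800000.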
If you let the emulation run, you'll very quickly see that all sorts of stuff happens here. It's not just data that one would casually identify as non-volatile (scores etc.) that is held there, as I had previously assumed.
So you can't just e.g. wpset 0x800000,0x100000,rw (or wpset 0x800000,0x20000,rw, for that matter) and call it a day; the watchpoint will just be hit all the time. Instead, you actually have to get at least some sense of where things are stored (who would've thought).
Therefore, what I came up with next was this:
1. I deleted ~/.mame/nvram/kovsh/sram and re-created it, filled entirely with 0xFE bytes so I could recognize it later.
2. I booted into WL, waited until the scoreboard came up, and then dumped the 0x800000..0x8FFFFF memory region.
3. I hard-reset and booted into BL, waiting for the scoreboard to come up, which leads to the default scores being saved to sram.
4. I hard-reset again, booted into WL, and created a second dump at the same spot where I took the first.
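Step 1 can be sketched in a few lines of Python. This is just an illustration; the helper name `write_filled_sram` is made up here, while the size and fill byte are the ones mentioned above:

```python
from pathlib import Path

SRAM_SIZE = 0x20000  # 128K, matching the ROM_REGION size in pgm.cpp
FILL_BYTE = 0xFE     # easy to recognize in a memory view

def write_filled_sram(path, size=SRAM_SIZE, fill=FILL_BYTE):
    """(Re-)create the nvram file at path, filled entirely with one byte value."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(bytes([fill]) * size)  # overwrites an existing file
    return path

# Example call for the kovsh set (path as used above):
# write_filled_sram(Path.home() / ".mame" / "nvram" / "kovsh" / "sram")
```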
Then I compared the two dumps. There are many, many differences. It's not that BL just saves some score info and maybe some additional miscellaneous stuff. I don't know what it all entails, but it's definitely more than that.
I wrote a Python script that identifies all differing, contiguous regions between the two dumps and generates code that produces a log message if an incoming address lies within one of the regions (saying which region it was etc.). I shoved that code into all the high-level read/write functions in the 68k emulator in MAME, hoping to get a lead on an imagined "saving routine" that way. Will gladly share a MAME fork in case somebody's interested, but it's nothing special, really.
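For anyone who wants to repeat the comparison, here is a minimal sketch of the region-finding part. It is not the script I actually used, and it does the lookup in Python instead of generating code for MAME; the function names are made up for illustration, and the base address assumes the dumps start at 0x800000:

```python
def diff_regions(dump_a, dump_b, base=0x800000):
    """Return a list of (start, end) addresses, end inclusive, where the dumps differ.

    Adjacent differing bytes are merged into one contiguous region.
    """
    if len(dump_a) != len(dump_b):
        raise ValueError("dumps must have the same length")
    regions = []
    start = None
    for i, (x, y) in enumerate(zip(dump_a, dump_b)):
        if x != y:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            regions.append((base + start, base + i - 1))
            start = None
    if start is not None:  # run extends to the end of the dump
        regions.append((base + start, base + len(dump_a) - 1))
    return regions

def find_region(regions, addr):
    """Return the index of the region containing addr, or None."""
    for idx, (lo, hi) in enumerate(regions):
        if lo <= addr <= hi:
            return idx
    return None
```

With two dump files read via `open(name, "rb").read()`, `diff_regions` gives the list of regions, and `find_region` is the same check the generated logging code performs on each incoming address.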
What I can see this way is certainly interesting, but it doesn't help, because A) none of the regions are hit when playing through the critical spot in stage 5, and more importantly, B) none of the regions are hit when a new score is reached. I guess I must have overlooked something.
At this point I'm not sure if going down this route further will bring about any new insight that's useful for the problem at hand.
In fact, if pressed, I'd say no. Other things increasingly lead me to believe that an nvram/sram interaction is not the cause.
Isn't it curious that the exact debug information (program counter etc.) is different each time, but the hang still occurs in the exact same spot as far as the gameplay is concerned?
Btw, I've looked up in MAME the instruction at the PC that my real PGM displayed one of the times the address error happened.
The PC is 0x13D07E; assuming one can just map between real HW and MAME 1:1, the instruction is "move.b $80380b.l, D1". How can that be an invalid address, I wonder?
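As far as alignment goes, that access looks harmless: on a plain 68000, an address error is raised for word or long accesses (and instruction fetches) at odd addresses, while byte accesses are allowed anywhere. A tiny Python sketch of that rule (the function name is made up, and it only models the alignment check, nothing PGM-specific):

```python
def raises_address_error(addr, size):
    """68000 alignment rule: word (2) and long (4) accesses at odd addresses trap.

    Byte (1) accesses never cause an address error.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    return size != 1 and addr % 2 == 1

# move.b $80380b.l, D1 is a byte read from an odd address: no trap expected.
byte_read_traps = raises_address_error(0x80380B, 1)
```

So this instruction should not fault by itself. If the PC shown on screen comes from the exception stack frame (an assumption), it may also not point exactly at the faulting instruction, since the 68000 can save a PC that lies a few bytes past it.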
I've taken a few other stabs in the dark. If it's of any use at all, at least nobody else has to try:
- Could reproduce that the game still runs splendidly after completely removing the ARM. This also increased performance massively. The fast forward feature could accelerate to ~130% on my setup before, and to ~500% after. I don't think the ARM really ran that many more instructions; I'd attribute that to some inefficiencies in MAME's code. Of course, the freeze still doesn't happen.
- Under- & overclocked the 68k emulator to various frequencies (19.9, 20.1, 40, and 16 MHz, each w/ and w/o the ARM). Everything works fine with the expected effects on slowdown. No freeze. My thought was that maybe real HW could have some slightly divergent oscillators, leading to some obscure illegal interaction. The results of course don't say much about how this would behave on real HW.
- In MAME, the PGM emulation's video refresh rate is hardcoded to 60 Hz. I set it to 59.17 Hz; according to this thread, this is the refresh rate output by original HW. No freeze.
- One thing I still want to try is running the modified kovsh set on FinalBurn Neo/Alpha to see how it's digested there. Checksums don't match, so it doesn't start (on MAME the -debug switch circumvents that). So far I haven't had the motivation to dig further on this end.
That we don't get the bug in MAME certainly has to do with inaccurate emulation. Some comments in the 68k code don't exactly raise confidence in it, and the boss explosions don't glitch out in WL on MAME as they do on real hardware.
But then again, WL obviously doesn't hang on the original WL hardware. Maybe arcademodbios has changed something in the WL code? Speaking of which, do we have an update on the AMB dual cart, @twistedsymphony?