Chilly Willy said:
Anything I do is MIT unless otherwise stated, or based off an existing base with its own license (obviously, I can't change GPL code to something else).
I've mentioned my licensing in a few places before, but I really should get around to sticking these things in the files. MIT or the new BSD license are fine with me - I want anything I do to be as useful to as many people as possible.
Joe Fenton <jlfenton65@gmail.com> is fine for the author/contact.
Done. I will commit as soon as I actually test it out with some STL examples.
Chilly Willy said:
Which one? I mostly use the standard allocator in libc, or msys (Simple Malloc & Free Functions, by malkia@digitald.com). I made two versions of msys that are identical, but meant to be used by the separate SH2s: msys for the Master SH2, and ssys for the Slave SH2. That avoids the cache coherency problems and the locking needed to share one allocator between the two processors.
I'm using the one from: http://tlsf.baisoku.org/
I haven't tested it, but I'm going to make the necessary changes.
I would say that using locks would be best (that'll add in the framework for threads). Aside from that, why not make the code thread-safe by avoiding the use of global/static variables?
Chilly Willy said:
Make a third allocator from msys called vsys for the VDP1, then just alloc blocks as needed, and reinit the zone to clear everything at once. Maybe FOUR allocators from msys: one for the Master SH2, one for the Slave SH2, one for VDP1, and one for VDP2.
I've thought of altering msys to take a zone for an input argument. Then you could have any number of zones. For VDP1, allocate a block from the main vram zone, then create a new zone using that block for tables and whatnot. Then you can clear everything associated with one zone without affecting the others. Zones inside zones...
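To make the "zone as an input argument" idea concrete, here is a minimal sketch. It is not the msys API: `zone_t`, `zone_init`, `zone_alloc`, `zone_reset` and `zone_create_sub` are made-up names, and the allocator is a plain bump allocator (no per-block free), chosen only because it keeps the example short. It assumes the block handed to `zone_init` is already longword-aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zone descriptor: a bump allocator over a caller-supplied
   block. Not the msys API, just an illustration of passing the zone in. */
typedef struct zone {
    uint8_t *base;   /* start of the block owned by this zone */
    size_t   size;   /* total bytes in the block */
    size_t   used;   /* bytes handed out so far */
} zone_t;

/* Bind a zone to a block of memory. */
static void zone_init(zone_t *z, void *block, size_t size)
{
    z->base = (uint8_t *)block;
    z->size = size;
    z->used = 0;
}

/* Allocate from a zone. Sizes are rounded up to 4 bytes so the next offset
   stays longword-aligned. Returns NULL when the zone is exhausted. */
static void *zone_alloc(zone_t *z, size_t bytes)
{
    size_t rounded = (bytes + 3u) & ~(size_t)3u;
    if (rounded < bytes || rounded > z->size - z->used)
        return NULL;
    void *p = z->base + z->used;
    z->used += rounded;
    return p;
}

/* Clear everything associated with one zone at once. */
static void zone_reset(zone_t *z)
{
    z->used = 0;
}

/* Carve a child zone out of a parent: zones inside zones. */
static int zone_create_sub(zone_t *parent, zone_t *child, size_t size)
{
    void *block = zone_alloc(parent, size);
    if (block == NULL)
        return -1;
    zone_init(child, block, size);
    return 0;
}
```

Resetting the child zone leaves the parent's bookkeeping untouched, which is the "clear one zone without affecting the others" behaviour described above. A real version built on msys would keep its malloc/free block lists inside each zone instead of a single `used` offset.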
Yeah, I was thinking something similar except in a tree structure. You can have subtrees and such of command tables. The tree itself is kept in WORKRAM-H as well as all the command tables (there should be an upper bound on number of command tables in memory).
Priority and order are based on how the tree is traversed.
The entire tree of command tables isn't transferred on every update, only the ones that have changed in WORKRAM-H (essentially, this tree is a cache of VDP1 VRAM). They end up in the proper order through the LUT of transfers passed to any of the three SCU DMA levels.
Or they're sorted by using the linked list which tells VDP1 what command table to access next. Chances are, it's going to be a mixture of both.
Example:
So consider adding X command tables in WORKRAM-H at address W. Starting at offset Y in VDP1 VRAM, where the first command table is stored, I could traverse the tree and create a LUT of transfers for SCU DMA (indirect mode):
src: W[X - 1], dst: Y[0], size: 32B
src: W[X - 2], dst: Y[1], size: 32B
src: W[0], dst: Y[2], size: 32B
src: W[1], dst: Y[3], size: 32B
src: W[4], dst: Y[4], size: 32B
src: W[5], dst: Y[5], size: 32B
src: W[6], dst: Y[6], size: 32B
And so on.
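The list above could be generated along these lines. This is only a sketch: `xfer_t` mirrors the src/dst/size triples as written in this post, not the actual in-memory layout of the SCU indirect-mode table (field order and end marker need to come from the SCU manual). `build_xfer_lut`, `order`, `w_base` and `y_base` are made-up names, and `order` is assumed to be whatever sequence of WORKRAM-H indices the tree traversal produces.

```c
#include <stddef.h>
#include <stdint.h>

#define CMDT_SIZE 32u  /* one command table is 32 bytes, as in the list above */

/* Hypothetical entry mirroring the src/dst/size triples. The real SCU
   indirect-mode table has its own field order and end marker. */
typedef struct xfer {
    uint32_t src;
    uint32_t dst;
    uint32_t size;
} xfer_t;

/* order[i] is the WORKRAM-H index of the command table that should land in
   VRAM slot i. Returns the number of entries written to lut. */
static size_t build_xfer_lut(xfer_t *lut, size_t lut_max,
                             const uint16_t *order, size_t count,
                             uint32_t w_base, uint32_t y_base)
{
    size_t n = count < lut_max ? count : lut_max;
    for (size_t i = 0; i < n; i++) {
        lut[i].src  = w_base + (uint32_t)order[i] * CMDT_SIZE;
        lut[i].dst  = y_base + (uint32_t)i * CMDT_SIZE;
        lut[i].size = CMDT_SIZE;
    }
    return n;
}
```

With `order = { X - 1, X - 2, 0, 1, ... }` this reproduces the first rows of the example: the destination always advances by one slot, while the source jumps around according to the traversal.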
Now what if I want to update command table W[5] and delete W[6]? Update them in WORKRAM-H: for W[6], write the bit that tells VDP1 to skip the command table. As for W[5], update whatever.
Then do another SCU-DMA transfer of only two transfers:
src: W[5], dst: Y[5], size: 32B
src: W[6], dst: Y[6], size: 32B
I'm going to have to keep track of which ones have changed.
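One way to track the changes is a dirty bit per VRAM slot, then emitting transfers only for the slots that are set. Again a sketch under assumptions: `cmdt_cache_t`, `cmdt_mark_dirty`, `cmdt_flush` and the `MAX_CMDTS` bound are made up for illustration, and `xfer_t` is just the src/dst/size triple from the example, not the real SCU table layout.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CMDTS 64u   /* hypothetical upper bound on cached command tables */
#define CMDT_SIZE 32u   /* bytes per command table */

typedef struct xfer { uint32_t src, dst, size; } xfer_t;

typedef struct cmdt_cache {
    uint32_t dirty[(MAX_CMDTS + 31u) / 32u]; /* one bit per VRAM slot */
    uint16_t slot_src[MAX_CMDTS];            /* WORKRAM-H index backing each slot */
} cmdt_cache_t;

/* Call this whenever the WORKRAM-H copy of a slot is modified. */
static void cmdt_mark_dirty(cmdt_cache_t *c, unsigned slot)
{
    c->dirty[slot / 32u] |= (uint32_t)1 << (slot % 32u);
}

/* Emit one transfer per dirty slot, in slot order, clearing each bit as it
   is consumed. Returns the number of entries written to lut. */
static size_t cmdt_flush(cmdt_cache_t *c, xfer_t *lut, size_t lut_max,
                         uint32_t w_base, uint32_t y_base)
{
    size_t n = 0;
    for (unsigned slot = 0; slot < MAX_CMDTS && n < lut_max; slot++) {
        uint32_t bit = (uint32_t)1 << (slot % 32u);
        if ((c->dirty[slot / 32u] & bit) == 0)
            continue;
        lut[n].src  = w_base + (uint32_t)c->slot_src[slot] * CMDT_SIZE;
        lut[n].dst  = y_base + slot * CMDT_SIZE;
        lut[n].size = CMDT_SIZE;
        c->dirty[slot / 32u] &= ~bit;
        n++;
    }
    return n;
}
```

Marking slots 5 and 6 dirty and flushing yields exactly the two-entry list from the example. Slots that stay dirty because `lut_max` was reached are picked up by the next flush.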
As for allocating memory, yeah, that should be done by the standard malloc/free. Textures, on the other hand, could be done through garbage collection. For example, if I delete W[6] and it used a texture, then decrement the ref. counter and put the texture back on the free/used list. I could use that standard allocator just for this purpose! It could be sped up, since the smallest we'll go is an 8x1 4-bit texture (padded to be an 8x2 4-bit texture). And is this code in the public domain, or MIT/BSD licensed?
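The ref. counting part might look roughly like this. Everything here is hypothetical (`texture_t`, `texture_pool_t`, `tex_acquire`, `tex_retain`, `tex_release`, the pool size), and it assumes a texture slot only goes back on the free list once its count reaches zero. The VRAM block itself would still be returned through whatever allocator manages VDP1 VRAM; that step is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TEXTURES 16u  /* hypothetical pool size */

typedef struct texture {
    uint32_t vram_offset;       /* where the texels live in VDP1 VRAM */
    uint16_t refs;              /* command tables currently using this texture */
    struct texture *next_free;  /* link used while the slot is on the free list */
} texture_t;

typedef struct texture_pool {
    texture_t  slots[MAX_TEXTURES];
    texture_t *free_list;
} texture_pool_t;

/* Put every slot on the free list, lowest index first. */
static void texpool_init(texture_pool_t *p)
{
    p->free_list = NULL;
    for (size_t i = MAX_TEXTURES; i-- > 0; ) {
        p->slots[i].vram_offset = 0;
        p->slots[i].refs = 0;
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Take a slot off the free list with one reference held. */
static texture_t *tex_acquire(texture_pool_t *p, uint32_t vram_offset)
{
    texture_t *t = p->free_list;
    if (t == NULL)
        return NULL;
    p->free_list = t->next_free;
    t->next_free = NULL;
    t->vram_offset = vram_offset;
    t->refs = 1;
    return t;
}

/* Another command table starts using the texture. */
static void tex_retain(texture_t *t)
{
    t->refs++;
}

/* Drop one reference. Returns 1 if the slot went back on the free list. */
static int tex_release(texture_pool_t *p, texture_t *t)
{
    if (t->refs == 0 || --t->refs > 0)
        return 0;
    t->next_free = p->free_list;
    p->free_list = t;
    return 1;
}
```

Deleting a command table would then call `tex_release` on its texture, and only the last release actually frees anything.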
What's difficult about texture allocation is that texture sizes in the Y direction don't have to be powers of 2. So we're going to waste some VDP1 VRAM by padding everything.
What do you think? Do you think this is viable for 3D?