libdragon
RSP Command queue.
Data Structures | |
struct | rspq_overlay_header_t |
The header of the overlay in DMEM. | |
struct | rspq_ctx_t |
RSP queue building context. | |
Macros | |
#define | rspq_append1(ptr, cmd, arg1) |
Smaller version of rspq_write that writes to an arbitrary pointer. | |
#define | rspq_append2(ptr, cmd, arg1, arg2) |
Smaller version of rspq_write that writes to an arbitrary pointer. | |
#define | rspq_append3(ptr, cmd, arg1, arg2, arg3) |
Smaller version of rspq_write that writes to an arbitrary pointer. | |
Functions | |
DEFINE_RSP_UCODE (rsp_queue,.crash_handler=rspq_crash_handler,.assert_handler=rspq_assert_handler) | |
void | rspq_init (void) |
Initialize the RSPQ library. | |
void | rspq_close (void) |
Shut down the RSPQ library. | |
void * | rspq_overlay_get_state (rsp_ucode_t *overlay_ucode) |
Return a pointer to the overlay state (in RDRAM) | |
rsp_queue_t * | __rspq_get_state (void) |
Return a pointer to a copy of the current RSPQ state. | |
uint32_t | rspq_overlay_register (rsp_ucode_t *overlay_ucode) |
Register a rspq overlay into the RSP queue engine. | |
void | rspq_overlay_register_static (rsp_ucode_t *overlay_ucode, uint32_t overlay_id) |
Register an overlay into the RSP queue engine assigning a static ID to it. | |
void | rspq_overlay_unregister (uint32_t overlay_id) |
Unregister a ucode overlay from the RSP queue engine. | |
void | rspq_next_buffer (void) |
Switch to the next write buffer for the current RSP queue. | |
void | rspq_flush (void) |
Make sure that RSP starts executing up to the last written command. | |
void | rspq_highpri_begin (void) |
Start building a high-priority queue. | |
void | rspq_highpri_end (void) |
Finish building the high-priority queue and close it. | |
void | rspq_highpri_sync (void) |
Wait for the RSP to finish processing all high-priority queues. | |
void | rspq_block_begin (void) |
Begin creating a new block. | |
rspq_block_t * | rspq_block_end (void) |
Finish creating a block. | |
void | rspq_block_free (rspq_block_t *block) |
Free a block that is not needed any more. | |
void | rspq_block_run (rspq_block_t *block) |
Add to the RSP queue a command that runs a block. | |
void | rspq_block_run_rsp (int nesting_level) |
Notify that a RSP command is going to run a block. | |
void | rspq_noop () |
Enqueue a no-op command in the queue. | |
rspq_syncpoint_t | rspq_syncpoint_new (void) |
Create a syncpoint in the queue. | |
bool | rspq_syncpoint_check (rspq_syncpoint_t sync_id) |
Check whether a syncpoint was reached by RSP or not. | |
void | rspq_syncpoint_wait (rspq_syncpoint_t sync_id) |
Wait until a syncpoint is reached by RSP. | |
void | rspq_wait (void) |
Wait until all commands in the queue have been executed by RSP. | |
void | rspq_dma_to_rdram (void *rdram_addr, uint32_t dmem_addr, uint32_t len, bool is_async) |
Enqueue a command to do a DMA transfer from DMEM to RDRAM. | |
void | rspq_dma_to_dmem (uint32_t dmem_addr, void *rdram_addr, uint32_t len, bool is_async) |
Enqueue a command to do a DMA transfer from RDRAM to DMEM. | |
rspq_write_t | rspq_write_begin (uint32_t ovl_id, uint32_t cmd_id, int size) |
Begin writing a new command into the RSP queue. | |
void | rspq_write_arg (rspq_write_t *w, uint32_t value) |
Add one argument to the command being enqueued. | |
void | rspq_write_end (rspq_write_t *w) |
Finish enqueuing a command into the queue. | |
Variables | |
rsp_ucode_t * | rspq_overlay_ucodes [RSPQ_MAX_OVERLAY_COUNT] |
RSPQ overlays. | |
rspq_ctx_t * | rspq_ctx |
Current context. | |
volatile uint32_t * | rspq_cur_pointer |
Copy of the current write pointer (see rspq_ctx_t) | |
volatile uint32_t * | rspq_cur_sentinel |
Copy of the current write sentinel (see rspq_ctx_t) | |
void * | rspq_rdp_dynamic_buffers [2] |
Buffers that hold outgoing RDP commands (generated via RSP). | |
rspq_block_t * | rspq_block |
Pointer to the current block being built, or NULL. | |
volatile int | __rspq_syncpoints_done |
ID of the last syncpoint reached by RSP. | |
RSP Command queue.
This documentation block describes the internal workings of the RSP Queue, which is useful for understanding the implementation. For a description of the RSP queue API, see rspq.h.
In the abstract, the RSP queue can be thought of as a single contiguous memory buffer that contains RSP commands. The CPU is the writer, which appends commands to the buffer. The RSP is the reader, which reads commands and executes them. Both work on the same buffer at the same time, so careful engineering is required to make sure that they do not interfere with each other.
The complexity of this library comes from trying to achieve this design without any explicit synchronization primitive. The basic design constraint is that, in the standard code path, the CPU should be able to just append a new command to the buffer without talking to the RSP, and the RSP should be able to just read a new command from the buffer without talking to the CPU. Obviously there are corner cases where synchronization is required (e.g. if the RSP catches up with the CPU, or if the CPU finds that the buffer is full), but these cases should in general be rare.
To achieve a fully lockless approach, there are specific rules that the CPU has to follow while writing to make sure that the RSP does not get confused and execute invalid or partially-written commands. On the other hand, the RSP must be careful in discerning between a fully-written command and a partially-written command, and at the same time not waste memory bandwidth to continuously "poll" the buffer when it has caught up with the CPU.
The RSP uses the following algorithm to parse the buffer contents. Assume for now that the buffer is linear and unlimited in size.
Given the above algorithm, it is easy to understand how the CPU must behave when filling the buffer:
To manage the queue and implement all the various features, rspq reserves for itself the overlay ID 0x0 to implement internal commands. You can look at the list of commands and their description below. All command IDs are defined with RSPQ_CMD_* macros.
Internally, double buffering is used to implement the queue. The size of each of the buffers is RSPQ_DRAM_LOWPRI_BUFFER_SIZE. When a buffer is full, the queue engine writes a RSPQ_CMD_JUMP command with the address of the other buffer, to tell the RSP to jump there when it is done.
Moreover, just before the jump, the engine also enqueues a RSPQ_CMD_WRITE_STATUS command that sets the SP_STATUS_SIG_BUFDONE_LOW signal. This is used to keep track of when the RSP has finished processing a buffer, so that we know it has become free again for more commands.
This logic is implemented in rspq_next_buffer.
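The buffer switch described above can be sketched as a small host-side model. This is only an illustration under stated assumptions: the `toy_*` names, the buffer size, the trailer size and the command encodings are all invented here and do not reflect the real constants or the real `rspq_next_buffer` code. In particular, the model skips the wait for the RSP to release the other buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model: sizes and command encodings below are invented for illustration. */
#define TOY_BUFFER_WORDS     8            /* stands in for RSPQ_DRAM_LOWPRI_BUFFER_SIZE */
#define TOY_MAX_CMD_WORDS    2            /* largest command this toy queue accepts */
#define TOY_TRAILER_WORDS    2            /* WRITE_STATUS + JUMP, one word each in this model */
#define TOY_CMD_WRITE_STATUS 0x01000000u  /* hypothetical encoding */
#define TOY_CMD_JUMP         0x02000000u  /* hypothetical encoding */

typedef struct {
    uint32_t buffers[2][TOY_BUFFER_WORDS];
    int buf_idx;         /* buffer currently being written */
    uint32_t *cur;       /* next free word */
    uint32_t *sentinel;  /* last position at which a command may start */
    int switches;        /* number of buffer switches so far */
} toy_queue_t;

/* Point the write pointer and the sentinel at the given buffer. */
static void toy_select_buffer(toy_queue_t *q, int idx) {
    q->buf_idx = idx;
    q->cur = q->buffers[idx];
    q->sentinel = q->buffers[idx] + TOY_BUFFER_WORDS
                - TOY_MAX_CMD_WORDS - TOY_TRAILER_WORDS;
}

static void toy_queue_init(toy_queue_t *q) {
    memset(q, 0, sizeof *q);
    toy_select_buffer(q, 0);
}

/* Close the current buffer and continue writing in the other one. */
static void toy_next_buffer(toy_queue_t *q) {
    int next = q->buf_idx ^ 1;
    /* A real implementation must first make sure the RSP is done with `next`. */
    *q->cur++ = TOY_CMD_WRITE_STATUS;           /* "this buffer is done" signal */
    *q->cur++ = TOY_CMD_JUMP | (uint32_t)next;  /* continue in the other buffer */
    toy_select_buffer(q, next);
    q->switches++;
}

/* Append a command of n words, switching buffers once past the sentinel. */
static void toy_write(toy_queue_t *q, const uint32_t *words, int n) {
    assert(n >= 1 && n <= TOY_MAX_CMD_WORDS);
    for (int i = 0; i < n; i++) *q->cur++ = words[i];
    if (q->cur > q->sentinel) toy_next_buffer(q);
}
```

The sentinel is placed so that a maximum-size command starting exactly at it still leaves room for the two trailer words, which is why the switch can be deferred until after the write.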
Blocks are implemented by redirecting rspq_write to a different memory buffer, allocated for the block. The starting size for this buffer is RSPQ_BLOCK_MIN_SIZE. If the buffer becomes full, a new buffer is allocated with double the size (to achieve exponential growth), and it is linked to the previous buffer via a RSPQ_CMD_JUMP. So a block can end up being defined by multiple memory buffers linked via jumps.
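To make the growth policy concrete, the following sketch counts how many chained buffers a block of a given length would occupy. It is a simplified model: `TOY_BLOCK_MIN_SIZE` is a made-up stand-in for RSPQ_BLOCK_MIN_SIZE, the helper name is hypothetical, and the space taken by the linking RSPQ_CMD_JUMP commands is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RSPQ_BLOCK_MIN_SIZE, in 32-bit words. */
#define TOY_BLOCK_MIN_SIZE 64

/*
 * Return how many chained buffers are needed to hold `total_words` words when
 * the first buffer has TOY_BLOCK_MIN_SIZE words and every following buffer is
 * twice as large as the previous one. If `last_size` is not NULL, it receives
 * the size of the last buffer in the chain.
 */
static int toy_block_buffer_count(size_t total_words, size_t *last_size) {
    size_t size = TOY_BLOCK_MIN_SIZE;
    size_t capacity = size;
    int count = 1;
    while (capacity < total_words) {
        size *= 2;          /* exponential growth */
        capacity += size;   /* buffers are chained, so capacities add up */
        count++;
    }
    if (last_size) *last_size = size;
    return count;
}
```

With doubling, a block of N words needs only on the order of log2(N) allocations, which keeps the number of jumps the RSP has to follow small.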
Calling a block requires some work because of the nested calls we want to support. To make the RSP ucode as short as possible, the two internal commands dedicated to block calls (RSPQ_CMD_CALL and RSPQ_CMD_RET) do not manage a call stack by themselves, but only allow saving and restoring the current queue position to and from a "save slot", whose index must be provided by the CPU.
Thus, the CPU has to make sure that each CALL opcode saves the position into a save slot which will not be overwritten by nested block calls. To do this, it calculates the "nesting level" of a block at block creation time: the nesting level of a block is defined as the smallest number greater than the nesting levels of all blocks that are called within the block itself. So for instance if a block calls another block whose nesting level is 5, it will get assigned a level of 6. The nesting level is then used as the call slot both in all future calls to the block and by the RSPQ_CMD_RET command placed at the end of the block itself.
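The nesting-level rule can be written as a one-pass maximum. The sketch below assumes that levels are non-negative and that a block calling no other block gets level 0; the function name is hypothetical and the real bookkeeping happens incrementally while the block is being recorded.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compute the nesting level of a block from the levels of the blocks it calls:
 * the smallest number strictly greater than every callee level.
 * Assumption of this sketch: a block with no callees has level 0.
 */
static int toy_nesting_level(const int *callee_levels, size_t n) {
    int level = 0;
    for (size_t i = 0; i < n; i++) {
        assert(callee_levels[i] >= 0);
        if (callee_levels[i] >= level)
            level = callee_levels[i] + 1;
    }
    return level;
}
```

Because a caller always ends up with a higher level than any of its callees, the save slot it uses can never be overwritten while one of those callees is running.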
The high priority queue is implemented as an alternative couple of buffers, that replace the standard buffers when the high priority mode is activated.
When rspq_highpri_begin is called, the CPU notifies the RSP that it must switch to the highpri queues by setting signal SP_STATUS_SIG_HIGHPRI_REQUESTED. The RSP checks for that signal between each command, and when it sees it, it internally calls RSPQ_CMD_SWAP_BUFFERS. This command loads the highpri queue pointer from a special call slot, saves the current lowpri queue position in another special save slot, and finally clears SP_STATUS_SIG_HIGHPRI_REQUESTED and sets SP_STATUS_SIG_HIGHPRI_RUNNING instead.
When rspq_highpri_end is called, the opposite is done. The CPU writes in the queue a RSPQ_CMD_SWAP_BUFFERS that saves the current highpri pointer into its call slot, recovers the previous lowpri position, and turns off SP_STATUS_SIG_HIGHPRI_RUNNING.
Some careful tricks are necessary to allow multiple highpri queues to be pending, see rspq_highpri_begin for details.
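The two swaps described above can be modelled as a single operation that saves the current position in one slot, loads it from another, and updates the status signals. The sketch below is a host-side toy: the signal bit values, the slot indices and all `toy_*` names are assumptions made for illustration, and it does not cover the case of multiple pending highpri queues.

```c
#include <assert.h>
#include <stdint.h>

/* Invented bit positions; the real SP_STATUS_SIG_* values are not modelled. */
enum { TOY_SIG_HIGHPRI_REQUESTED = 1 << 0, TOY_SIG_HIGHPRI_RUNNING = 1 << 1 };
/* Invented slot indices for the two special save slots. */
enum { TOY_SLOT_LOWPRI = 0, TOY_SLOT_HIGHPRI = 1 };

typedef struct {
    uint32_t status;    /* signal bits */
    uint32_t position;  /* current queue position (an address on real hardware) */
    uint32_t slots[2];  /* saved positions of the lowpri and highpri queues */
} toy_rsp_t;

/* Model of the swap: save current position, load the other, update signals. */
static void toy_swap_buffers(toy_rsp_t *rsp, int load_slot, int save_slot,
                             uint32_t clear_mask, uint32_t set_mask) {
    rsp->slots[save_slot] = rsp->position;
    rsp->position = rsp->slots[load_slot];
    rsp->status = (rsp->status & ~clear_mask) | set_mask;
}

/* What the RSP does between two commands: honour a pending highpri request. */
static void toy_rsp_between_commands(toy_rsp_t *rsp) {
    if (rsp->status & TOY_SIG_HIGHPRI_REQUESTED)
        toy_swap_buffers(rsp, TOY_SLOT_HIGHPRI, TOY_SLOT_LOWPRI,
                         TOY_SIG_HIGHPRI_REQUESTED, TOY_SIG_HIGHPRI_RUNNING);
}

/* The command enqueued by the CPU when the highpri queue is closed. */
static void toy_highpri_end_command(toy_rsp_t *rsp) {
    toy_swap_buffers(rsp, TOY_SLOT_LOWPRI, TOY_SLOT_HIGHPRI,
                     TOY_SIG_HIGHPRI_RUNNING, 0);
}
```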
There are a few places where the rspq code is hooked into rdpq to provide coherent usage of the two peripherals. In particular:
struct rspq_overlay_header_t
The header of the overlay in DMEM.
This structure is placed at the start of the overlay in DMEM, via the RSPQ_OverlayHeader macros (defined in rsp_queue.inc).
struct rspq_ctx_t
RSP queue building context.
This structure contains the state of a RSP queue as it is built by the CPU. It is instantiated twice: once for the lowpri queue, and once for the highpri queue. It contains the two buffers used in the double buffering scheme, and some metadata about the queue.
The current write pointer is stored in the "cur" field. The "sentinel" field contains the pointer to the last byte at which a new command can start, before overflowing the buffer (given RSPQ_MAX_COMMAND_SIZE). This is used to efficiently check when it is time to switch to the other buffer: basically, it is sufficient to check whether "cur > sentinel".
The current queue is stored in 3 global pointers: rspq_ctx, rspq_cur_pointer and rspq_cur_sentinel. rspq_cur_pointer and rspq_cur_sentinel are external copies of the "cur" and "sentinel" pointer of the current context, but they are kept as separate global variables for maximum performance of the hottest code path: rspq_write. In fact, it is much faster to access a global 32-bit pointer (via gp-relative offset) than dereferencing a member of a global structure pointer.
rspq_switch_context is called to switch between lowpri and highpri, updating the three global pointers.
When building a block, rspq_ctx is set to NULL, while the other two pointers point inside the block memory.
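The relationship between the context structure and the cached global pointers can be sketched as follows. This is a reduced host-side model, not the real rspq_ctx_t: each toy context has a single buffer instead of two, sizes are in words and invented, and every `toy_*` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BUFFER_WORDS  16
#define TOY_MAX_CMD_WORDS 4   /* stand-in for RSPQ_MAX_COMMAND_SIZE */

/* Reduced context: one buffer only (the real context holds two). */
typedef struct {
    uint32_t buffer[TOY_BUFFER_WORDS];
    uint32_t *cur;       /* current write pointer */
    uint32_t *sentinel;  /* last position at which a command may start */
} toy_ctx_t;

static toy_ctx_t toy_lowpri, toy_highpri;

/* The three globals: the active context plus cached copies of its pointers. */
static toy_ctx_t *toy_ctx;
static uint32_t *toy_cur_pointer;
static uint32_t *toy_cur_sentinel;

static void toy_ctx_init(toy_ctx_t *c) {
    c->cur = c->buffer;
    c->sentinel = c->buffer + TOY_BUFFER_WORDS - TOY_MAX_CMD_WORDS;
}

/* Save the cached pointers into the old context, then load the new one. */
static void toy_switch_context(toy_ctx_t *next) {
    if (toy_ctx) {
        toy_ctx->cur = toy_cur_pointer;
        toy_ctx->sentinel = toy_cur_sentinel;
    }
    toy_ctx = next;
    toy_cur_pointer = next->cur;
    toy_cur_sentinel = next->sentinel;
}

/* The hot-path check only touches the two cached globals. */
static int toy_buffer_full(void) {
    return toy_cur_pointer > toy_cur_sentinel;
}
```

The write path only ever reads and updates the cached globals; the structure fields are brought up to date lazily, when the context is switched away.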
#define rspq_append1(ptr, cmd, arg1)
Smaller version of rspq_write that writes to an arbitrary pointer.
#define rspq_append2(ptr, cmd, arg1, arg2)
Smaller version of rspq_write that writes to an arbitrary pointer.
#define rspq_append3(ptr, cmd, arg1, arg2, arg3)
Smaller version of rspq_write that writes to an arbitrary pointer.
DEFINE_RSP_UCODE(rsp_queue, .crash_handler = rspq_crash_handler, .assert_handler = rspq_assert_handler)
The RSPQ ucode
void rspq_init(void)
Initialize the RSPQ library.
This should be called by the initialization functions of the higher-level libraries using the RSP command queue. It can be safely called multiple times without side effects.
It is not required by applications to call this explicitly in the main function.
void rspq_close(void)
Shut down the RSPQ library.
This is mainly used for testing.
void *rspq_overlay_get_state(rsp_ucode_t *overlay_ucode)
Return a pointer to the overlay state (in RDRAM)
Overlays can define a section of DMEM as persistent state. This area will be preserved across overlay switching, by reading back into RDRAM the DMEM contents when the overlay is switched away.
This function returns a pointer to the state area in RDRAM (not DMEM). It is meant to modify the state on the CPU side while the overlay is not loaded. The layout of the state and its size should be known to the caller.
To avoid race conditions between overlay state access by CPU and RSP, this function first calls rspq_wait to force a full sync and make sure the RSP is idle. As such, it should be treated as a debugging function.
overlay_ucode | The ucode overlay for which the state pointer will be returned. |
rsp_queue_t *__rspq_get_state(void)
Return a pointer to a copy of the current RSPQ state.
uint32_t rspq_overlay_register(rsp_ucode_t *overlay_ucode)
Register a rspq overlay into the RSP queue engine.
This function registers a rspq overlay into the queue engine. An overlay is a RSP ucode that has been written to be compatible with the queue engine (see rsp_queue.inc for instructions) and is thus able to execute commands that are enqueued in the queue. An overlay doesn't have a single entry point: it exposes multiple functions bound to different commands, that will be called by the queue engine when the commands are enqueued.
The function returns the overlay ID, which is the ID to use to enqueue commands for this overlay. The overlay ID must be passed to rspq_write when adding new commands. rspq allows up to 16 overlays to be registered simultaneously, as the overlay ID occupies the top 4 bits of each command. The lower 4 bits specify the command ID, so in theory each overlay could offer a maximum of 16 commands. To overcome this limitation, this function will reserve multiple consecutive IDs in case an overlay with more than 16 commands is registered. These additional IDs are silently occupied and never need to be specified explicitly when queueing commands.
For example, if an overlay with 32 commands were registered, this function could return ID 0x60, and ID 0x70 would implicitly be reserved as well. To queue the twenty-first command of this overlay, you would write rspq_write(ovl_id, 0x14, ...), where ovl_id is the value that was returned by this function.
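The arithmetic in the example above can be checked with a small host-side sketch. It assumes that the first byte of a command is formed by adding the command index, shifted left by 24, to the overlay ID preshifted by 28 (so the 0x60 in the text corresponds to 0x60000000). `toy_command_header` is a hypothetical helper written for this illustration, not a libdragon function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption of this sketch: the command header is the preshifted overlay ID
 * plus the command index placed in bits 24 and up. A command index of 16 or
 * more therefore carries over into the overlay nibble, which is why the
 * following overlay IDs have to be reserved for the same overlay.
 */
static uint32_t toy_command_header(uint32_t ovl_id, uint32_t cmd_id) {
    return ovl_id + (cmd_id << 24);
}

/* Top 4 bits of the header: the overlay slot the RSP will look up. */
static uint32_t toy_header_overlay_nibble(uint32_t header) {
    return header >> 28;
}

/* Next 4 bits of the header: the command index within that slot. */
static uint32_t toy_header_command_nibble(uint32_t header) {
    return (header >> 24) & 0xF;
}
```

Under this assumption, command 0x14 of the overlay registered at 0x60000000 lands in overlay slot 7 with command index 4, which matches the implicit reservation of ID 0x70 mentioned in the text.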
overlay_ucode | The overlay to register |
void rspq_overlay_register_static(rsp_ucode_t *overlay_ucode, uint32_t overlay_id)
Register an overlay into the RSP queue engine assigning a static ID to it.
This function works similarly to rspq_overlay_register, except it will attempt to assign the specified ID to the overlay instead of automatically choosing one. Note that if the ID (or one of the consecutive IDs required by the overlay) is already used by another overlay, this function will assert, so careful usage is advised.
Assigning a static ID can mostly be useful for debugging purposes.
overlay_ucode | The ucode to register |
overlay_id | The ID to register the overlay with. This ID must be preshifted by 28 (eg: 0x40000000). |
void rspq_overlay_unregister(uint32_t overlay_id)
Unregister a ucode overlay from the RSP queue engine.
This function removes an overlay that has previously been registered with rspq_overlay_register or rspq_overlay_register_static from the queue engine. After calling this function, the specified overlay ID (and consecutive IDs in case the overlay has more than 16 commands) is no longer valid and must not be used to write new commands into the queue.
Note that when new overlays are registered, the queue engine may recycle IDs from previously unregistered overlays.
overlay_id | The ID of the ucode (as returned by rspq_overlay_register) to unregister. |
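The reservation and recycling of consecutive IDs can be illustrated with a toy allocator. This is a sketch under several assumptions that are not taken from the library: a first-fit search, a return value of 0 when no run of free slots exists, and an unregister helper that is told the command count explicitly (the real function only takes the ID). All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_OVERLAYS 16

/* Slot 0 starts out used: overlay ID 0x0 is reserved for rspq's own commands. */
static bool toy_slot_used[TOY_MAX_OVERLAYS] = { true };

/* Each overlay slot covers 16 commands. */
static uint32_t toy_slots_needed(uint32_t num_commands) {
    assert(num_commands > 0);
    return (num_commands + 15) / 16;
}

/* First-fit search for a run of free slots; returns the ID preshifted by 28,
 * or 0 if no suitable run exists (a choice made for this sketch only). */
static uint32_t toy_overlay_register(uint32_t num_commands) {
    uint32_t need = toy_slots_needed(num_commands);
    for (uint32_t first = 1; first + need <= TOY_MAX_OVERLAYS; first++) {
        uint32_t run = 0;
        while (run < need && !toy_slot_used[first + run]) run++;
        if (run == need) {
            for (uint32_t i = 0; i < need; i++) toy_slot_used[first + i] = true;
            return first << 28;
        }
    }
    return 0;
}

/* Release the slot of an overlay together with its consecutive extra slots. */
static void toy_overlay_unregister(uint32_t overlay_id, uint32_t num_commands) {
    uint32_t first = overlay_id >> 28;
    uint32_t need = toy_slots_needed(num_commands);
    assert(first >= 1 && first + need <= TOY_MAX_OVERLAYS);
    for (uint32_t i = 0; i < need; i++) toy_slot_used[first + i] = false;
}
```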
void rspq_next_buffer(void)
Switch to the next write buffer for the current RSP queue.
This function is invoked by rspq_write when the current buffer is full, that is, when the write pointer (rspq_cur_pointer) reaches the sentinel (rspq_cur_sentinel). This means that we cannot safely write any more commands into the buffer (the remaining bytes are fewer than the maximum command size), and thus a new buffer must be configured.
If we're creating a block, we need to allocate a new buffer from the heap. Otherwise, if we're writing into either the lowpri or the highpri queue, we need to switch buffer (double buffering strategy), making sure the other buffer has been already fully executed by the RSP.
void rspq_flush(void)
Make sure that RSP starts executing up to the last written command.
RSP processes the command queue asynchronously as it is being written. If it catches up with the CPU, it halts itself and waits for the CPU to notify that more commands are available. On the contrary, if the RSP lags behind it might keep executing commands as they are written without ever sleeping. So in general, at any given moment the RSP could be crunching commands or sleeping waiting to be notified that more commands are available.
This means that writing a command via rspq_write is not enough to make sure it is executed; depending on timing and batching performed by RSP, it might either be executed automatically or not. rspq_flush makes sure that the RSP will see it and execute it.
This function does not block: it just makes sure that the RSP will run the full command queue written until now. If you need to actively wait until the last written command has been executed, use rspq_wait.
It is suggested to call rspq_flush every time a new "batch" of commands has been written. In general, it is not a problem to call it often because it is very fast (it takes only ~20 cycles). For instance, it can be called after every rspq_write without much concern, but if you know that you are going to write a number of subsequent commands in straight-line code, you can postpone the call to rspq_flush until after the whole sequence has been written.
void rspq_highpri_begin(void)
Start building a high-priority queue.
This function enters a special mode in which a high-priority queue is activated and can be filled with commands. After this function has been called, all commands will be put in the high-priority queue, until rspq_highpri_end is called.
The RSP will start processing the high-priority queue almost instantly (as soon as the current command is done), pausing the normal queue. This will also happen while the high-priority queue is being built, to achieve the lowest possible latency. When the RSP finishes processing the high priority queue (after rspq_highpri_end closes it), it resumes processing the normal queue from the exact point that was left.
The goal of the high-priority queue is to either schedule latency-sensitive commands like audio processing, or to schedule immediate RSP calculations that should be performed right away, just like they were preempting what the RSP is currently doing.
It is possible to create multiple high-priority queues by calling rspq_highpri_begin / rspq_highpri_end multiple times with short delays in-between. The RSP will process them in order. Notice that there is an overhead in doing so, so it might be advisable to keep the high-priority mode active for a longer period if possible. On the other hand, a shorter high-priority queue allows the RSP to switch back to processing the normal queue before the next one is created.
void rspq_highpri_end(void)
Finish building the high-priority queue and close it.
This function terminates and closes the high-priority queue. After this command is called, all following commands will be added to the normal queue.
Notice that the RSP does not wait for this function to be called: it will start running the high-priority queue as soon as possible, even while it is being built.
void rspq_highpri_sync(void)
Wait for the RSP to finish processing all high-priority queues.
This function will spin-lock waiting for the RSP to finish processing all high-priority queues. It is meant for debugging purposes or for situations in which the high-priority queue is known to be very short and fast to run. Also note that it is not possible to create syncpoints in the high-priority queue.
void rspq_block_begin(void)
Begin creating a new block.
This function begins writing a command block (see rspq_block_t). While a block is being written, all calls to rspq_write will record the commands into the block, without actually scheduling them for execution. Use rspq_block_end to close the block and get a reference to it.
Only one block at a time can be created. Calling rspq_block_begin twice (without any intervening rspq_block_end) will cause an assert.
During block creation, the RSP will keep running as usual and execute commands that have been already added to the queue.
rspq_block_t *rspq_block_end(void)
Finish creating a block.
This function completes a block and returns a reference to it (see rspq_block_t). After this function is called, all subsequent rspq_write will resume working as usual: they will add commands to the queue for immediate RSP execution.
To run the created block, use rspq_block_run.
void rspq_block_free(rspq_block_t *block)
Free a block that is not needed any more.
After calling this function, the block is invalid and must not be called anymore.
block | The block |
void rspq_block_run(rspq_block_t *block)
Add to the RSP queue a command that runs a block.
This function runs a block that was previously created via rspq_block_begin and rspq_block_end. It schedules a special command in the queue that will run the block, so that execution of the block will happen in order relative to other commands in the queue.
Blocks can call other blocks. For instance, if a block A has been fully created, it is possible to call rspq_block_run(A) at any point during the creation of a second block B; this means that B will contain the special command that will call A.
block | The block that must be run |
void rspq_noop(void)
Enqueue a no-op command in the queue.
This function enqueues a command that does nothing. This is mostly useful for debugging purposes.
rspq_syncpoint_t rspq_syncpoint_new(void)
Create a syncpoint in the queue.
This function creates a new "syncpoint" referencing the current position in the queue. It is possible to later check when the syncpoint is reached by the RSP via rspq_syncpoint_check and rspq_syncpoint_wait.
bool rspq_syncpoint_check(rspq_syncpoint_t sync_id)
Check whether a syncpoint was reached by RSP or not.
This function checks whether a syncpoint was reached. It never blocks. If you need to wait for a syncpoint to be reached, use rspq_syncpoint_wait instead of polling this function.
[in] | sync_id | ID of the syncpoint to check |
void rspq_syncpoint_wait(rspq_syncpoint_t sync_id)
Wait until a syncpoint is reached by RSP.
This function blocks waiting for the RSP to reach the specified syncpoint. If the syncpoint has already been reached at the time of the call, the function returns immediately.
[in] | sync_id | ID of the syncpoint to wait for |
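The check and wait semantics can be modelled with a counter of issued IDs and a counter of the last ID reached. The sketch assumes that syncpoint IDs increase monotonically and are reached in the order they were created; `toy_syncpoints_done` stands in for __rspq_syncpoints_done, and the other names are hypothetical. ID wrap-around is not handled.

```c
#include <assert.h>
#include <stdbool.h>

typedef int toy_syncpoint_t;

static int toy_syncpoints_genid;          /* last ID handed out by the CPU */
static volatile int toy_syncpoints_done;  /* last ID reached by the (simulated) RSP */

/* Create a new syncpoint: in this model, just hand out the next ID. */
static toy_syncpoint_t toy_syncpoint_new(void) {
    return ++toy_syncpoints_genid;
}

/* Non-blocking check: reached if the RSP has passed this ID or a later one. */
static bool toy_syncpoint_check(toy_syncpoint_t sync_id) {
    return toy_syncpoints_done >= sync_id;
}

/* Simulates the RSP reaching a syncpoint in the queue. */
static void toy_rsp_reaches(toy_syncpoint_t sync_id) {
    toy_syncpoints_done = sync_id;
}
```

A blocking wait in this model would simply loop until `toy_syncpoint_check` returns true, which is why it returns immediately for a syncpoint that was already reached.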
void rspq_wait(void)
Wait until all commands in the queue have been executed by RSP.
This function blocks until all commands present in the queue have been executed by the RSP and the RSP is idle. If the queue also contained RDP commands, it also waits for those commands to finish drawing.
This function exists mostly for debugging purposes. Calling this function is not necessary, as the CPU can continue adding commands to the queue while the RSP is running them. If you need to synchronize between RSP and CPU (e.g. to access data that was processed by RSP), prefer using rspq_syncpoint_new / rspq_syncpoint_wait, which allow for more granular synchronization.
void rspq_dma_to_rdram(void *rdram_addr, uint32_t dmem_addr, uint32_t len, bool is_async)
Enqueue a command to do a DMA transfer from DMEM to RDRAM.
| rdram_addr | The RDRAM address (destination, must be aligned to 8) |
[in] | dmem_addr | The DMEM address (source, must be aligned to 8) |
[in] | len | Number of bytes to transfer (must be multiple of 8) |
[in] | is_async | If true, the RSP does not wait for DMA completion and processes the next command as the DMA is in progress. If false, the RSP waits until the transfer is finished before processing the next command. |
void rspq_dma_to_dmem(uint32_t dmem_addr, void *rdram_addr, uint32_t len, bool is_async)
Enqueue a command to do a DMA transfer from RDRAM to DMEM.
[in] | dmem_addr | The DMEM address (destination, must be aligned to 8) |
| rdram_addr | The RDRAM address (source, must be aligned to 8) |
[in] | len | Number of bytes to transfer (must be multiple of 8) |
[in] | is_async | If true, the RSP does not wait for DMA completion and processes the next command as the DMA is in progress. If false, the RSP waits until the transfer is finished before processing the next command. |
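The alignment requirements listed for both DMA functions can be checked on the CPU side before enqueuing a transfer. The helper below is a hypothetical pre-flight check written for this illustration; it is not part of the library, and the buffer is only a stand-in for an RDRAM allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an RDRAM buffer, explicitly aligned to 8 bytes (C11). */
static _Alignas(8) uint8_t toy_rdram_buffer[32];

/*
 * Return true if the arguments satisfy the documented constraints:
 * both addresses aligned to 8 bytes and a length that is a multiple of 8.
 */
static bool toy_dma_args_valid(const void *rdram_addr, uint32_t dmem_addr, uint32_t len) {
    return ((uintptr_t)rdram_addr % 8 == 0)
        && (dmem_addr % 8 == 0)
        && (len % 8 == 0);
}
```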
extern inline rspq_write_t rspq_write_begin(uint32_t ovl_id, uint32_t cmd_id, int size)
Begin writing a new command into the RSP queue.
This function initiates a sequence to enqueue a new command into the RSP queue. Call it passing the overlay ID and command ID of the command to create. Then, call rspq_write_arg once for each argument word that composes the command. Finally, call rspq_write_end to finalize and enqueue the command.
A sequence made of rspq_write_begin, rspq_write_arg, rspq_write_end is functionally equivalent to a call to rspq_write, but it allows creating larger commands, and might better fit some situations where arguments are calculated on the fly. Performance-wise, the code generated by rspq_write_begin + rspq_write_arg + rspq_write_end should be very similar to a single call to rspq_write, though just a bit slower. It is advisable to use rspq_write whenever possible.
Make sure to read the documentation of rspq_write as well for further details.
ovl_id | The overlay ID of the command to enqueue. Notice that this must be a value preshifted by 28, as returned by rspq_overlay_register. |
cmd_id | Index of the command to call, within the overlay. |
size | The size of the command in 32-bit words |
extern inline void rspq_write_arg(rspq_write_t *w, uint32_t value)
Add one argument to the command being enqueued.
This function adds one more argument to the command currently being enqueued. This function must be called after rspq_write_begin; it should be called multiple times (one per argument word), and then rspq_write_end should be called to terminate enqueuing the command.
See also rspq_write for a more straightforward API for command enqueuing.
w | The write cursor (returned by rspq_write_begin) |
value | New 32-bit argument word to add to the command. |
extern inline void rspq_write_end(rspq_write_t *w)
Finish enqueuing a command into the queue.
This function should be called to terminate a sequence for command enqueuing, after rspq_write_begin and (multiple) calls to rspq_write_arg.
After calling this command, the write cursor cannot be used anymore.
w | The write cursor (returned by rspq_write_begin) |
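The begin / arg / end sequence can be sketched as a write cursor over a plain array. This is a toy model and makes its own layout choices, which need not match the real library: the header occupies a full word, `size` counts the header plus the arguments, and the header is stored last so that a reader scanning the buffer never sees a header followed by missing arguments. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t toy_queue[32];                /* zero-initialized toy queue */
static uint32_t *toy_cur_pointer = toy_queue; /* next free word */

typedef struct {
    uint32_t first_word;  /* header (overlay ID + command ID), stored last */
    uint32_t *first;      /* where the header will be stored */
    uint32_t *pointer;    /* next argument slot */
    int remaining;        /* argument words still expected */
} toy_write_t;

/* Reserve `size` words (header included) and return a cursor over them. */
static toy_write_t toy_write_begin(uint32_t ovl_id, uint32_t cmd_id, int size) {
    assert(size >= 1);
    assert(toy_cur_pointer + size <= toy_queue + 32);
    toy_write_t w = {
        .first_word = ovl_id + (cmd_id << 24),
        .first = toy_cur_pointer,
        .pointer = toy_cur_pointer + 1,
        .remaining = size - 1,
    };
    toy_cur_pointer += size;
    return w;
}

/* Store one argument word after the (still unwritten) header. */
static void toy_write_arg(toy_write_t *w, uint32_t value) {
    assert(w->remaining > 0);
    *w->pointer++ = value;
    w->remaining--;
}

/* Publish the command by writing its header; the cursor is then unusable. */
static void toy_write_end(toy_write_t *w) {
    assert(w->remaining == 0);
    *w->first = w->first_word;
    w->first = NULL;
    w->pointer = NULL;
}
```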