From 53c0f1d69f3a6d95d61e34380b655b7cfd29ea9e Mon Sep 17 00:00:00 2001 From: ISSOtm Date: Wed, 6 Oct 2021 18:22:34 +0200 Subject: [PATCH 1/3] Overhaul "Pixel FIFO" article into "Rendering Internals" Also avoid describing SameBoy internals, instead relying on it when otherwise corroborated, or on schematics and/or test ROMs when possible. Restructure the article to describe behavior more than components, especially in a way that is more friendly to someone not knowing what all the components are about. Add a diagram, too, and move the mode timing diagram to the STAT article, where it belongs just as well, but where it will be more visible and thus more useful. --- custom/style.css | 11 +- src/Power_Up_Sequence.md | 2 +- src/Rendering_Internals.md | 408 ++++++++++++++++++++++++++++++++++ src/STAT.md | 2 +- src/SUMMARY.md | 2 +- src/Scrolling.md | 2 +- src/Tile_Maps.md | 7 +- src/imgs/src/ppu_overview.svg | 181 +++++++++++++++ src/pixel_fifo.md | 260 ---------------------- 9 files changed, 605 insertions(+), 270 deletions(-) create mode 100644 src/Rendering_Internals.md create mode 100644 src/imgs/src/ppu_overview.svg delete mode 100644 src/pixel_fifo.md diff --git a/custom/style.css b/custom/style.css index fd8537cb..4ca09fd1 100644 --- a/custom/style.css +++ b/custom/style.css @@ -24,9 +24,9 @@ html { body { font-family: "Inter"; - /* Enable some font features for Inter (https://rsms.me/inter/#features/calt) */ + /* Enable some font features for Inter (https://rsms.me/inter/#features) */ font-feature-settings: "ss01", /* Alternate (Open) digits */ "ss02", - /* Disambiguation gliphs */ "case"; + /* Disambiguation glyphs */ "case", /* No contextual alternatives (e.g. 
3x9 → 3×9) */ "calt" 0; /* Case alternates */ letter-spacing: -0.005em; /* equals -0.5% */ @@ -112,6 +112,13 @@ code { margin: 25px 0px 25px 0px; } +/* Classes for custom table styling */ + +table.compact th { + padding: 3px 5px; +} + + /* Global CSS variables */ :root { diff --git a/src/Power_Up_Sequence.md b/src/Power_Up_Sequence.md index f7716922..088fdd3a 100644 --- a/src/Power_Up_Sequence.md +++ b/src/Power_Up_Sequence.md @@ -106,7 +106,7 @@ It is speculated that this may be debug remnants. The boot ROM is responsible for the automatic colorization of monochrome-only games when run on a GBC. -When in DMG compatibility mode, the [CGB palettes](<#LCD Color Palettes (CGB only)>) are still being used: the background uses BG palette 0 (likely because the entire [attribute map](<#BG Map Attributes (CGB Mode only)>) is set to all zeros), and objects use OBJ palette 0 or 1 depending on bit 4 of [their attribute](<#Byte 3 — Attributes/Flags>). +When in DMG compatibility mode, the [CGB palettes](<#LCD Color Palettes (CGB only)>) are still being used: the background uses BG palette 0 (likely because the entire [attribute map](<#BG Map attributes (CGB Mode only)>) is set to all zeros), and objects use OBJ palette 0 or 1 depending on bit 4 of [their attribute](<#Byte 3 — Attributes/Flags>). [`BGP`, `OBP0`, and `OBP1`](<#LCD Monochrome Palettes>) actually index into the CGB palettes instead of the DMG's shades of grey. The boot ROM picks a compatibility palette using an ID computed using the following algorithm: diff --git a/src/Rendering_Internals.md b/src/Rendering_Internals.md new file mode 100644 index 00000000..f7226b88 --- /dev/null +++ b/src/Rendering_Internals.md @@ -0,0 +1,408 @@ +# Rendering Internals + +The Game Boy's PPU is the component responsible for feeding the LCD (= the screen) with pixels. +This document describes how the PPU renders pixels. + +::: tip Terminology + +A "dot" is the unit of time within the PPU. +One "dot" is one 4 MiHz cycle, i.e. 
a unit of time equal to 1 ∕ 4194304 of a second. +The duration of one "dot" is independent of [CGB double speed](<#FF4D — KEY1 (CGB Mode only): Prepare speed switch>). + +When it is stated that a certain action *lengthens mode 3*, it implies that mode 0 (HBlank) is shortened to make up for the additional time spent in mode 3, as shown in [this diagram](<#STAT modes>). + +::: + +::: warning Timings caution + +Timings here are not tested by a single test ROM (made especially difficult by their resolution being finer than M-cycles). +The information here was largely obtained from an emulator that passes `intr_2_mode0*` from [this test suite](https://github.com/wilbertpol/mooneye-gb/tree/b78dd21f0b6d00513bdeab20f7950e897a0379b3/tests/acceptance/gpu), but not all of it has been verified from e.g. [hardware schematics](https://github.com/furrtek/DMG-CPU-Inside). + +::: + +## Overview + +{{#include imgs/src/ppu_overview.svg:2:}} + +The Game Boy's rendering process, at its core, works using two queues of pixels, also known as the **pixel [FIFO](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics))s**: one for "background" pixels, one for [OBJ](#Objects) pixels[^real_fifos]. +(The Window largely piggybacks on the BG rendering mechanism, more on that below.) + +Every "dot", one pixel is shifted off of both FIFOs, and one of them is selected for output. +Its corresponding palette is then applied, and the resulting signal sent to the LCD. + +When a FIFO needs to be refilled, it calls on the **Pixel Slice Fetcher** to fetch a slice of 8 pixels, that is, one row from a tile. +The BG FIFO is refilled every time it becomes empty; the OBJ FIFO instead requests a refill when an OBJ should start being drawn on the current scanline. + +Since the Pixel Slice Fetcher is shared by both FIFOs, when both of them need to be refilled at the same time, pixels temporarily stop being output until both have been served. 
+ +The Pixel Slice Fetcher is told which of the 384 tiles to fetch one slice from, as well as which slice of that tile's 8, by the "fetchers"—again, one for the background and window, another for OBJs. +These "fetchers" also directly transmit some metadata to the FIFOs[^real_fifo_refilling], such as the palette, priority, etc. + +[^real_fifos]: +Actually, there are more than 2 FIFOs. +For example, on DMG, there are [two FIFOs for BG pixel indices](https://raw.githubusercontent.com/furrtek/DMG-CPU-Inside/master/Schematics/32_BG_PIXEL_SHIFTER.png), [two for OBJ pixels](https://raw.githubusercontent.com/furrtek/DMG-CPU-Inside/master/Schematics/33_SPRITE_PIXEL_SHIFTER.png), [one for OBJ palette bits](https://raw.githubusercontent.com/furrtek/DMG-CPU-Inside/master/Schematics/34_SPRITE_PALETTE_SHIFTER.png), and [one for OBJ-to-BG priority bits](https://raw.githubusercontent.com/furrtek/DMG-CPU-Inside/master/Schematics/26_BACKGROUND.png). +However, since many are clocked and refilled together, such as the first two or the latter four, it's easier to treat them as a single FIFO that groups all that info under one "pixel". + +[^real_fifo_refilling]: +Again, since the two conceptual FIFOs are really a collection of a bunch of hardware FIFOs, what actually happens is that the "fetchers" directly refill some of these FIFOs, and instruct the Pixel Slice Fetcher to refill the corresponding pixel *data* FIFO. + +## The FIFOs + +Each FIFO can hold up to 8 pixels, the width of one tile. +The BG FIFO and Pixel Slice Fetcher work together to ensure that the former never runs out of pixels; the OBJ FIFO is only refilled when an OBJ is "hit". + +Each pixel in a FIFO is composed of four properties: +- Color index: a value between 0 and 3. +- Palette: + - **On DMG**, this contains the palette bit from [OAM attributes](<#Byte 3 — Attributes/Flags>). Of course, only the OBJ FIFO has this. 
+ - **On CGB**, a value between 0 and 7 (for BG, the palette bits from [BG attributes](<#BG Map attributes (CGB Mode only)>); for OBJ, the palette bits from [OAM attributes](<#Byte 3 — Attributes/Flags>)). +- Source OBJ: only applies to the OBJ FIFO on CGB. This contains the ID of the OBJ the pixel originated from. +- Priority: + - OBJ FIFO: holds the OBJ-to-BG Priority bit from [the OBJ's attributes](<#Byte 3 — Attributes/Flags>). + - BG FIFO, **on CGB only**: holds the OBJ-to-BG priority bit from [the tile's attributes](<#BG Map attributes (CGB Mode only)>). + +Every scanline, the following occurs in order: +1. *Mode 2*: + 1. OAM is scanned for Y positions in range (based on [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>) and [`LCDC`](<#FF40 — LCDC: LCD control>)); the X coordinate is not checked! + 2. The first 10 matches get their X and Y coordinates stored in an "OBJ slot". + The fact that there are 10 such slots is why only 10 objects can be displayed per scanline[^more_than_10]. + + This operation takes 2 dots per OBJ, which, multiplied by 40 OBJs in OAM, gives Mode 2's length of 80 dots. + +2. *Mode 3*: + + During Mode 3, on each dot, the FIFOs are clocked, and one pixel is output to the LCD. + The Pixel Slice Fetcher continuously runs in parallel to refill the BG FIFO. + If the OBJ FIFO needs to be refilled, both FIFOs temporarily stop being clocked while the OBJ FIFO "steals" the Pixel Slice Fetcher to get its pixels. + + Additionally, in the middle of the scanline, the window may be triggered; this is described in further detail [below](#). + +3. *Mode 0*: + + Once the last pixel has been output, the PPU releases the VRAM bus, and does nothing while it waits for the scanline to end. + +The PPU contains both a vertical counter (exposed as [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>)), *and* a horizontal counter, which will be referred to as "`LX`" henceforth. 
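The Mode 2 scan described above can be sketched in Python. This is only an illustration under stated assumptions, not the hardware's actual logic: `ObjSlot`, `scan_oam` and the flat 160-byte `oam` buffer are names chosen here, and the "+16" offset of the stored Y coordinate comes from the general OAM documentation rather than from this article.

```python
from typing import List, NamedTuple


class ObjSlot(NamedTuple):
    index: int  # position of the object in OAM (0..39); hypothetical field
    x: int      # raw X byte from OAM
    y: int      # raw Y byte from OAM


def scan_oam(oam: bytes, ly: int, tall_objs: bool) -> List[ObjSlot]:
    """Collect the first 10 objects whose Y range covers scanline `ly`."""
    height = 16 if tall_objs else 8  # assumed to follow the LCDC OBJ size bit
    slots: List[ObjSlot] = []
    for index in range(40):
        y, x = oam[index * 4], oam[index * 4 + 1]
        # Assumption: the OAM Y byte is the object's top edge plus 16.
        # The X coordinate is deliberately not checked here.
        if y <= ly + 16 < y + height:
            slots.append(ObjSlot(index, x, y))
            if len(slots) == 10:
                break  # only 10 slots exist; later matches are ignored
    return slots


# 2 dots per OBJ, 40 OBJs in OAM
MODE2_DOTS = 2 * 40
```

Note that the sketch stops collecting after 10 matches only for brevity; per the text, Mode 2 lasts 80 dots regardless of how many objects match.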
+ +[^more_than_10]: +Since these 10 object slots are only refilled during Mode 2, and there appears to be no way to manipulate the PPU mode in the middle of a scanline, it seems that this limitation cannot be worked around. + +## Pixel Slice Fetcher + +The fetcher grabs a row of 8 pixels at a time to be fed to either FIFO. +Data is fetched from VRAM one byte at a time, thus pixels are always fetched 8 at a time[^fetch_size] (hence the FIFOs' width). + +The following steps are executed, in this order: + +1. [Get tile ID](<#Get tile ID>) +1. [Get tile row (low)](<#Get tile row (low)>) +1. [Get tile row (high)](<#Get tile row (high)>) +1. [Push pixels](<#Push pixels>) + +[^fetch_size]: +Well, since pixels are 2bpp, two fetches are necessary—but one still ends up with 8 pixels, since the VRAM data bus is 8-bit. + +### Get tile ID + +This step determines which background/window tile to fetch pixels from. +This step is not executed by the Pixel Slice Fetcher itself, but rather by the active "fetcher": + +#### BG fetcher + +During this step, a tilemap is sampled to determine which tile to fetch. + +The address read depends on whether the BG fetcher is in ["BG mode" or "Window mode"](<#>): + + + + + + + + + + + + + + +
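| Mode   | 15 | 14 | 13 | 12 | 11 | 10         | 9–5            | 4–0            |
|--------|----|----|----|----|----|------------|----------------|----------------|
| BG     | 1  | 0  | 0  | 1  | 1  | LCDC bit 3 | (LY + SCY) ∕ 8 | (LX + SCX) ∕ 8 |
| Window | 1  | 0  | 0  | 1  | 1  | LCDC bit 6 | "Window Y" ∕ 8 | LX ∕ 8         |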
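The address layout in the table can be written out as a small Python function. This is a minimal sketch, assuming `LX` and "Window Y" are pixel counts; the function name and parameter names are hypothetical, and the 8-bit masking reflects the wrapping behavior the article describes for the "BG mode" additions.

```python
def tilemap_address(window: bool, lcdc: int, ly: int, lx: int,
                    scx: int, scy: int, window_y: int) -> int:
    """Compute the tilemap address read during "Get tile ID" (illustrative)."""
    if window:
        map_select = (lcdc >> 6) & 1      # LCDC bit 6 picks the window tilemap
        row = window_y // 8
        col = lx // 8
    else:
        map_select = (lcdc >> 3) & 1      # LCDC bit 3 picks the BG tilemap
        row = ((ly + scy) & 0xFF) // 8    # additions are carried out in 8 bits
        col = ((lx + scx) & 0xFF) // 8
    # Bits 15-11 are fixed to 10011, i.e. a base address of $9800.
    return 0x9800 | (map_select << 10) | ((row & 31) << 5) | (col & 31)
```

For example, with all scroll registers at 0 and `LCDC` bit 3 set, the first BG fetch of the frame reads from `$9C00`.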
+ +::: tip Wrapping + +The two additions in the "BG mode" row are carried out in 8 bits, i.e. modulo 256. +Due to the division by 8, the modulo is essentially 32, i.e. a tilemap's width. +This is what causes the background to wrap around, both vertically and horizontally. + +::: + +A byte is read from the computed address, and is forwarded to the Pixel Slice Fetcher as a tile ID. +Color models read from the computed address twice: from VRAM bank 0 to get the tile ID, and from VRAM bank 1 to get the attributes.[^banks_parallel] + +::: tip Raster effects + +Interestingly, unlike e.g. the NES' PPU, great care has been taken to ensure that the BG fetcher re-reads as many registers as possible (`SCY`, `LCDC`, etc.). +This may have been insight from the former console, on which [proper "raster splits" are quite tricky](https://www.nesdev.org/wiki/PPU_scrolling#Split_X_scroll) due to a lot of internal caching. + +::: + +This step takes 2 dots, with the VRAM access(es) being performed on the second. + +[^banks_parallel]: +Both appear to be accessed during the same dot, which implies that the PPU can access both VRAM banks at the same time. + +#### OBJ fetcher + +Since the Pixel Slice Fetcher is normally used continuously for the BG FIFO, the OBJ fetcher waits to take control until two conditions are met: +- The Pixel Slice Fetcher is attempting to [push pixels](<#Push pixels>) +- The BG FIFO is not empty + +Once both conditions are fulfilled, the OBJ FIFO takes over, discarding the pixel slices already fetched. +Note that if the BG FIFO is empty, the Pixel Slice Fetcher immediately switches to [Get tile ID](<#Get tile ID>) when refilling it, so the OBJ fetcher will wait for 6 additional dots. + +OAM is then read for the tile ID and attributes (TODO: timings are unknown); if the PPU cannot access OAM (for example due to [OAM DMA](<#OAM DMA Transfer>)), $FF is read. 
+ +If the [OBJ size](<#LCDC.2 — OBJ size>) is 8×16, the bottom bit of the tile ID is overridden depending on whether the upper or lower half of the OBJ was hit. +(See [gate `GEJY`]() in the schematics.) +Additionally, if the OBJ is [flipped vertically](<#Byte 3 — Attributes/Flags>), this override will be inverted (gates `WUKY` and `WAGO`). + +::: warning LCDC bit 1 + +[`LCDC` bit 1](<#LCDC.1 — OBJ enable>) toggles whether OBJs are displayed, but the implementation is very different on DMG and CGB. +On all models, `LCDC` bit 1 controls whether pixels from the OBJ FIFO are selected; however, **on monochrome models**, `LCDC` bit 1 being off also causes the OBJ fetcher to be disabled entirely. + +This differs in two important ways: +- On DMG, clearing `LCDC` bit 1 causes OBJs not to incur any Mode 3 length penalties; on CGB, Mode 3 length is not affected by `LCDC` bit 1. +- Setting the bit back to 1 in the middle of an OBJ being (putatively) displayed will cause it to appear on CGB, but not on DMG, since its pixels aren't in the OBJ FIFO. + +And importantly as well, **this behavior remains in the Color's compatibility mode**, making software behave potentially differently. + +::: + +### Get tile row (low) + +During this step, the first of the slice's two bitplanes is fetched. +The address being read from is computed like this: + + + + + + + + + + + + + + + + + +
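| Mode   | 15 | 14 | 13 | 12        | 11–4    | 3–1              | 0 |
|--------|----|----|----|-----------|---------|------------------|---|
| BG     | 1  | 0  | 0  | See below | Tile ID | (LY + SCY) mod 8 | 0 |
| Window | 1  | 0  | 0  | See below | Tile ID | "Window Y" mod 8 | 0 |
| OBJ    | 1  | 0  | 0  | 0         | Tile ID | LY - OBJ Y mod 8 | 0 |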
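As a rough illustration of the table, the tile data address can be assembled as follows in Python. This is a sketch, not a description of the actual circuitry: `tile_row_address` and its parameters are names made up here, `row` stands for the table's bits 3–1 column (already reduced to a row within the tile), and bit 12 follows the formula this section gives for BG/Window tiles. The `high` and `yflip` parameters reflect the article's statements that the second bitplane differs only in bit 0 and that vertical flipping inverts bits 1–3.

```python
def tile_row_address(tile_id: int, row: int, lcdc: int, is_obj: bool,
                     high: bool = False, yflip: bool = False) -> int:
    """Compute the VRAM address of one bitplane of a tile row (illustrative)."""
    if is_obj:
        bit12 = 0  # OBJ tiles always use "$8000 mode"
    else:
        # !((LCDC & $10) || (tileID & $80)), as stated for BG/Window tiles
        bit12 = 0 if ((lcdc & 0x10) or (tile_id & 0x80)) else 1
    if yflip:
        row ^= 7  # inverts address bits 1-3
    return (0x8000 | (bit12 << 12) | ((tile_id & 0xFF) << 4)
            | ((row & 7) << 1) | int(high))
```

With `LCDC` bit 4 clear, tile ID `$00` maps to `$9000` and tile ID `$80` maps to `$8800`, which matches the usual description of "$8800 mode".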
+ +For BG/Window tiles, bit 12 depends on [LCDC bit 4](<#LCDC.4 — BG and Window tile data area>). +If that bit is set ("$8000 mode"), then bit 12 is always 0; otherwise ("$8800 mode"), it is the negation of the tile ID's bit 7. +The full logical formula is thus: `!((LCDC & $10) || (tileID & $80))` (see [gate `VUZA`](https://github.com/furrtek/DMG-CPU-Inside/blob/f0eda633eac24b51a8616ff782225d06fccbd81f/Schematics/25_VRAM_INTERFACE.png) in the schematics). + +On CGB, the VRAM bank the byte is fetched from is determined by the attributes' bit 3 ([BG](<#BG Map attributes (CGB Mode only)>), [OBJ](<#Byte 3 — Attributes/Flags>)). + +If the tile is flipped vertically (attributes bit 6), bits 1–3 of the address are inverted. +If the tile is flipped horizontally (attributes bit 5), the byte read is flipped around. +(The timing of horizontal flipping has not been verified.) + +This step takes 2 dots. + +### Get tile row (high) + +Exactly the same as [Get tile row (low)](<#Get tile row (low)>), except the following byte is fetched (i.e. bit 0 of the address is 1 instead of 0). +This step takes 2 dots as well. + +#### Bitplane desync + +Interesting phenomena can be triggered by changing the address' "parameters" between the two bitplane reads, called "bitplane desyncing". +Since VRAM and OAM cannot be modified during Mode 3 (though OAM DMA can change what the PPU reads from OAM), the parameters that can be changed are [`SCY`](<#FF42–FF43 — SCY, SCX: Viewport Y position, X position>) and [`LCDC` bit 4](<#LCDC.4 — BG and Window tile data area>). + +Modifying `SCY` causes the second bitplane (and also the first one, depending on timing) to be read from a different Y offset within the tile than normal. +This does not occur starting with CGB revision D, including AGBs: `SCY` is internally latched during the tilemap read, so both bitplanes are always read correctly. 
+(Compare [CGB-C](https://github.com/mattcurrie/mealybug-tearoom-tests/blob/70e88fb90b59d19dfbb9c3ac36c64105202bb1f4/expected/CPU%20CGB%20C/m3_scy_change.png) and [CGB-D](https://github.com/mattcurrie/mealybug-tearoom-tests/blob/master/expected/CPU%20CGB%20D/m3_scy_change.png).) + +Modifying `LCDC` bit 4 exhibits much more complex behavior, [explained in this document by mattcurrie](https://github.com/mattcurrie/mealybug-tearoom-tests/blob/70e88fb90b59d19dfbb9c3ac36c64105202bb1f4/the-comprehensive-game-boy-ppu-documentation.md#tile_sel-bit-4). + +### Push pixels + +Once the fetcher reaches this state, it will attempt to push the two bytes it read, plus associated metadata, into the target FIFO on every dot. + +The BG FIFO will only accept pixels when it's empty. + +The OBJ FIFO needs to "merge" the pixels being pushed with the pixels it already contains. +This decision leads to what is conceptually known as ["OBJ-to-OBJ priority"](<#Drawing priority>). +Note that overwriting a pixel entails not only replacing its ID, but also any attached attributes, such as the palette, etc. + +- On DMG and on CGB in Non-CGB Mode, the algorithm is simply that only pixels with an ID of 0 (= transparent pixels) are overwritten. + Since OBJ pixels are inserted as the OBJs are encountered horizontally, pixels inserted earlier, thus from objects with lower X positions, will have priority. +- On CGB in CGB Mode, OBJ pixels also store the ID of the OBJ they originated from, and get overwritten by pixels from OBJs with lower IDs. + +## Selector + +Every time both FIFOs are clocked, the selector decides whether to retain the pixel from the BG or OBJ FIFO. + +The selection follows the following rules: +1. **In CGB Mode**, if [`LCDC` bit 0 (priority enable)](<#CGB Mode: BG and Window master priority>) is reset, pick the BG pixel. +1. 
**In Non-CGB Mode**, if [`LCDC` bit 0 (BG & Window enable)](<#Non-CGB Mode (DMG, SGB and CGB in compatibility mode): BG and Window display>) is reset, pick the OBJ pixel. +1. If [`LCDC` bit 1 (OBJ enable)](<#LCDC.1 — OBJ enable>) is reset, pick the BG pixel. ⚠️ See [above](<#OBJ fetcher>) for a note about this bit. +1. **In CGB Mode**, if the BG pixel has [its priority bit](<#BG Map attributes (CGB Mode only)>) set, and its ID is not 0, pick the BG pixel. +1. If the OBJ pixel has [its priority bit](<#Byte 3 — Attributes/Flags>) set, and the BG pixel's ID is not 0, pick the BG pixel. +1. If the OBJ pixel's ID is 0, pick the BG pixel; otherwise, pick the OBJ pixel. + +Once a pixel has been selected, the corresponding palette is applied: +- **Non-CGB Mode**: BG pixels use [`BGP`](<#FF47 — BGP (Non-CGB Mode only): BG palette data>); OBJ pixels use `OBP0` if [attributes bit 4](<#Byte 3 — Attributes/Flags>) is reset, and `OBP1` otherwise. +- **CGB Mode**: [Palette](<#LCD Color Palettes (CGB only)>) n is used, where n is bits 0–2 of the pixel's corresponding attributes ([BG](<#BG Map attributes (CGB Mode only)>), [OBJ](<#Byte 3 — Attributes/Flags>)). + BG palettes are used for BG pixels, OBJ palettes are used for OBJ pixels. + +The pixel's 2-bit ID is used to index the 4-color palette, and the resulting color is sent to the LCD. + +## LCD + +Besides the pixels, a few signals are sent to the LCD. + +### ICD2 + +... + +## Mode 3 Operation + +As stated before, the pixel FIFO only operates during mode 3 (pixel +transfer). At the beginning of mode 3, both the background and OAM FIFOs +are cleared. + +### The Window + +When rendering the window, the background FIFO is cleared and the fetcher +is reset to step 1. When WX is 0 and SCX & 7 > 0, mode 3 is shortened +by 1 dot. + +When the window has already started rendering there is a bug that occurs +when WX is changed mid-scanline. 
When the value of WX changes after the +window has started rendering and the new value of WX is reached again, +a pixel with color value of 0 and the lowest priority is pushed onto the +background FIFO. + +### Objects (sprites) + +The following is performed for each object on the current scanline if +LCDC.1 is enabled (this condition is ignored on CGB) and the X coordinate +of the current scanline has an object on it. If those conditions are not +met then object fetching is [aborted](<#Sprite Fetch Abortion>). + +At this point the [fetcher](<#Pixel Slice Fetcher>) is advanced one step +until it's at step 5 or until the background FIFO is not empty. Advancing +the fetcher one step here lengthens mode 3 by 1 dot. This process may +be [aborted](<#Sprite Fetch Abortion>) after the fetcher has advanced a +step. + +When SCX & 7 > 0 and there is an object at X coordinate 0 of the current +scanline then mode 3 is lengthened. The number of dots this lengthens +mode 3 by is whatever the lower 3 bits of SCX are. After this penalty is +applied object fetching may be aborted. Note that the timing of the +penalty is not confirmed. It may happen before or after waiting for the +fetcher. More research needs to be done. + +After checking for objects at X coordinate 0 the fetcher is advanced two +steps. The first advancement lengthens mode 3 by 1 dot and the second +advancement lengthens mode 3 by 3 dots. After each fetcher advancement +there is a chance for an object fetch abortion to occur. + +The lower address for the row of pixels of the target object tile is now +retrieved and lengthens mode 3 by 1 dot. Once the address is retrieved +this is the last chance for object fetch abortion to occur. Exiting +object fetch lengthens mode 3 by 1 dot. The upper address for the +target object tile is now retrieved and does not shorten mode 3. + +At this point [VRAM Access](<#VRAM Access>) is checked for the lower and +upper addresses for the target object. 
Before any mixing is done, if the +OAM FIFO doesn't have at least 8 pixels in it then transparent pixels +with the lowest priority are pushed onto the OAM FIFO. Once this is done +each pixel of the target object row is checked. On CGB, horizontal flip +is checked here. If the target object pixel is not white and the pixel in +the OAM FIFO *is* white, or if the pixel in the OAM FIFO has higher +priority than the target object's pixel, then the pixel in the OAM FIFO +is replaced with the target object's properties. + +Now it's time to [render a pixel](<#Pixel Rendering>)! The same process +described in Sprite Fetch Abortion is performed: a pixel is rendered and +the fetcher is advanced one step. This advancement lengthens mode 3 by 1 +dot if the X coordinate of the current scanline is not 160. If the X +coordinate is 160 the PPU stops processing objects (because they won't be +visible). + +Everything in this section is repeated for every object on the current +scanline unless it was decided that fetching should be aborted or the +X coordinate is 160. + +### Pixel Rendering + +This is where the background FIFO and OAM FIFO are mixed. There are +conditions where either a background pixel or an object pixel will have +display priority. + +If there are pixels in the background and OAM FIFOs then a pixel is +popped off each. If the OAM pixel is not transparent and LCDC.1 is +enabled then the OAM pixel's background priority property is used if it's +the same or higher priority as the background pixel's background priority. + +Pixels won't be pushed to the LCD if there is nothing in the background +FIFO or the current pixel is pixel 160 or greater. + +If LCDC.0 is disabled then the background is disabled on DMG and the +background pixel won't have priority on CGB. When the background pixel +is disabled the pixel color value will be 0, otherwise the color value +will be whatever color pixel was popped off the background FIFO. 
When the +pixel popped off the background FIFO has a color value other than 0 and +it has priority then the object pixel will be discarded. + +At this point, on DMG, the color of the pixel is retrieved from the BGP +register and pushed to the LCD. On CGB when [palette access](<#CGB Palette Access>) +is blocked a black pixel is pushed to the LCD. + +When an object pixel has priority the color value is retrieved from the +popped pixel from the OAM FIFO. On DMG the color for the pixel is +retrieved from either the OBP1 or OBP0 register depending on the pixel's +palette property. If the palette property is 1 then OBP1 is used, +otherwise OBP0 is used. The pixel is then pushed to the LCD. On CGB when +palette access is blocked a black pixel is pushed to the LCD. + +The pixel is then finally pushed to the LCD. + +### CGB Palette Access + +At various times during PPU operation read access to the CGB palette is +blocked and a black pixel is pushed to the LCD when rendering pixels: +- LCD turning off +- First HBlank of the frame +- When searching OAM and index 37 is reached +- After switching from mode 2 (OAM search) to mode 3 (pixel transfer) +- When entering HBlank (mode 0) and not in double speed mode, blocked 2 dots later no matter what + +At various times during PPU operation read access to the CGB palette is +restored and pixels are pushed to the LCD normally when rendering pixels: +- At the end of mode 2 (OAM search) +- For only 2 dots when entering HBlank (mode 0) and in double speed mode + +::: tip Note + +These conditions are checked only when entering STOP mode and the +PPU's access to CGB palettes is always restored upon leaving STOP mode. + +::: + +### Sprite Fetch Abortion + +Sprite fetching may be aborted if LCDC.1 is disabled while the PPU is +fetching an object from OAM. This abortion lengthens mode 3 by the number +of dots the previous instruction took plus the residual dots left for +the PPU to process. 
When OAM fetching is aborted a pixel is [rendered](<#Pixel Rendering>), +the [fetcher](<#Pixel Slice Fetcher>) is advanced one step. This advancement +lengthens mode 3 by 1 dot if the current pixel is not 160. If the +current pixel is 160 the PPU stops processing objects because they won't +be visible. diff --git a/src/STAT.md b/src/STAT.md index b3f8bce4..9770c74e 100644 --- a/src/STAT.md +++ b/src/STAT.md @@ -2,7 +2,7 @@ :::tip TERMINOLOGY -A *dot* is the shortest period over which the PPU can output one pixel: is it equivalent to 1 T-cycle on DMG or on CGB Single Speed mode or 2 T-cycles on CGB Double Speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Pixel FIFO>). +A *dot* is the shortest period over which the PPU can output one pixel: is it equivalent to 1 T-cycle on DMG or on CGB Single Speed mode or 2 T-cycles on CGB Double Speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Rendering Internals>). ::: diff --git a/src/SUMMARY.md b/src/SUMMARY.md index e2eb5caa..4f666257 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -23,7 +23,7 @@ - [Scrolling](./Scrolling.md) - [Palettes](./Palettes.md) - [Rendering](./Rendering.md) - - [Pixel FIFO](./pixel_fifo.md) + - [Rendering internals](./Rendering_Internals.md) - [Audio](./Audio.md) - [Audio Registers](./Audio_Registers.md) - [Audio Details](./Audio_details.md) diff --git a/src/Scrolling.md b/src/Scrolling.md index 08cb52c2..98b69207 100644 --- a/src/Scrolling.md +++ b/src/Scrolling.md @@ -46,7 +46,7 @@ scanline. ### Scrolling -The scroll registers are re-read on each [tile fetch](<#Get Tile>), except for the low 3 bits of SCX, which are only read at the beginning of the scanline (for the initial shifting of pixels). 
+The scroll registers are re-read on each [tile fetch](<#Get tile ID>), except for the low 3 bits of SCX, which are only read at the beginning of the scanline (for the initial shifting of pixels). All models before the CGB-D read the Y coordinate once for each bitplane (so a very precisely timed SCY write allows "desyncing" them), but CGB-D and later use the same Y coordinate for both no matter what. diff --git a/src/Tile_Maps.md b/src/Tile_Maps.md index 4083463e..0960b1d0 100644 --- a/src/Tile_Maps.md +++ b/src/Tile_Maps.md @@ -5,10 +5,9 @@ The Game Boy contains two 32×32 tile maps in VRAM at the memory areas `$9800-$9BFF` and `$9C00-$9FFF`. Any of these maps can be used to display the Background or the Window. -## Tile Indexes +## Tile indices -Each tile map contains the 1-byte indexes of the -tiles to be displayed. +Each tile map contains the 1-byte indices of the tiles to be displayed. Tiles are obtained from the Tile Data Table using either of the two addressing modes (described in [VRAM Tile Data](<#VRAM Tile Data>)), which @@ -17,7 +16,7 @@ can be selected via [the LCDC register](<#FF40 — LCDC: LCD control>). Since one tile has 8×8 pixels, each map holds a 256×256 pixels picture. Only 160×144 of those pixels are displayed on the LCD at any given time. -## BG Map Attributes (CGB Mode only) +## BG Map attributes (CGB Mode only) In CGB Mode, an additional map of 32×32 bytes is stored in VRAM Bank 1 (each byte defines attributes for the corresponding tile-number map diff --git a/src/imgs/src/ppu_overview.svg b/src/imgs/src/ppu_overview.svg new file mode 100644 index 00000000..54aabd8d --- /dev/null +++ b/src/imgs/src/ppu_overview.svg @@ -0,0 +1,181 @@ + + + + + + + + (X, Y) positions + + Attributes + + 8-bit tile ID + + (9-bit tile ID, Y offset) + + Pixel row (2 bytes) + (Some arrows have been + omitted to avoid cluttering + the diagram too much.) 
+ + OAM + + VRAM + + Palettes + + + Tile data + + Tile maps + Attr maps + + PPU + + + + + + + + + + + OBJ slots + + OBJ fetcher + + BG fetcher + + SCX + + SCY + + WX + + WY + + LY + + Pixel slice fetcher + + BG FIFO + + OBJ FIFO + + Selector + + LCD + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/pixel_fifo.md b/src/pixel_fifo.md deleted file mode 100644 index 900f8fa4..00000000 --- a/src/pixel_fifo.md +++ /dev/null @@ -1,260 +0,0 @@ -# Pixel FIFO - -## Introduction - -FIFO stands for *First In, First Out*. The first pixel to be pushed to the -FIFO is the first pixel to be popped off. In theory that sounds great, -in practice there are a lot of intricacies. - -There are two pixel FIFOs. One for background pixels and one for object -(sprite) pixels. These two FIFOs are not shared. They are independent -of each other. The two FIFOs are mixed only when popping items. Objects -take priority unless they're transparent (color 0) which will be -explained in detail later. Each FIFO can hold up to 16 pixels. The FIFO -and Pixel Fetcher work together to ensure that the FIFO always contains -at least 8 pixels at any given time, as 8 pixels are required for the -Pixel Rendering operation to take place. Each FIFO is manipulated only -during mode 3 (pixel transfer). - -Each pixel in the FIFO has four properties: -- Color: a value between 0 and 3 -- Palette: on CGB a value between 0 and 7 and on DMG this only applies to objects -- Sprite Priority: on CGB this is the OAM index for the object and on DMG this doesn't exist -- Background Priority: holds the value of the [OBJ-to-BG Priority](<#Object Attribute Memory (OAM)>) bit - -## FIFO Pixel Fetcher - -The fetcher fetches a row of 8 background or window pixels and queues -them up to be mixed with object pixels. The pixel fetcher has 5 steps. 
-The first four steps take 2 dots each and the fifth step is attempted -every dot until it succeeds. The order of the steps are as follows: - -- Get tile -- Get tile data low -- Get tile data high -- Sleep -- Push - -### Get Tile - -This step determines which background/window tile to fetch pixels from. -By default the tilemap used is the one at $9800 but certain conditions -can change that. - -When LCDC.3 is enabled and the X coordinate of the current scanline is -not inside the window then tilemap $9C00 is used. - -When LCDC.6 is enabled and the X coordinate of the current scanline is -inside the window then tilemap $9C00 is used. - -The fetcher keeps track of which X and Y coordinate of the tile it's on: - -If the current tile is a window tile, the X coordinate for the window -tile is used, otherwise the following formula is used to calculate -the X coordinate: ((SCX / 8) + fetcher's X coordinate) & $1F. Because of -this formula, fetcherX can be between 0 and 31. - -If the current tile is a window tile, the Y coordinate for the window -tile is used, otherwise the following formula is used to calculate -the Y coordinate: (currentScanline + SCY) & 255. Because of this formula, -fetcherY can be between 0 and 255. - -The fetcher's X and Y coordinate can then be used to get the tile from -VRAM. However, if the PPU's access to VRAM is [blocked](<#VRAM Access>) -then the value for the tile is read as $FF. - -CGB can access both tile index and the attributes in the same clock -dot. - -### Get Tile Data Low - -Check LCDC.4 for which tilemap to use. At this step CGB also needs to -check which VRAM bank to use and check if the tile is flipped vertically. -Once the tilemap, VRAM and vertical flip is calculated the tile data -is retrieved from VRAM. However, if the PPU's access to VRAM is -[blocked](<#VRAM Access>) then the tile data is read as $FF. - -The tile data retrieved in this step will be used in the push steps. 
- -### Get Tile Data High - -Same as Get Tile Data Low except the tile address is incremented by 1. - -The tile data retrieved in this step will be used in the push steps. - -This also pushes a row of background/window pixels to the FIFO. This -extra push is not part of the 8 steps, meaning there's 3 total chances to -push pixels to the background FIFO every time the complete fetcher steps -are performed. - -### Push - -Pushes a row of background/window pixels to the FIFO. Since tiles are 8 -pixels wide, a "row" of pixels is 8 pixels from the tile to be rendered -based on the X and Y coordinates calculated in the previous steps. - -Pixels are only pushed to the background FIFO if it's empty. - -This is where the tile data retrieved in the two Tile Data steps will -come in handy. Depending on if the tile is flipped horizontally the -pixels will be pushed to the background FIFO differently. If the tile -is flipped horizontally the pixels will be pushed LSB first. Otherwise -they will be pushed MSB first. - -### Sleep - -Do nothing. - -### VRAM Access - -At various times during PPU operation read access to VRAM is blocked and -the value read is $FF: -- LCD turning off -- At scanline 0 on CGB when not in double speed mode -- When switching from mode 3 to mode 0 -- On CGB when searching OAM and index 37 is reached - -At various times during PPU operation read access to VRAM is restored: -- At scanline 0 on DMG and CGB when in double speed mode -- On DMG when searching OAM and index 37 is reached -- After switching from mode 2 (oam search) to mode 3 (pixel transfer) - -NOTE: These conditions are checked only when entering STOP mode and the -PPU's access to VRAM is always restored upon leaving STOP mode. - -## Mode 3 Operation - -As stated before the pixel FIFO only operates during mode 3 (pixel -transfer). At the beginning of mode 3 both the background and OAM FIFOs -are cleared. 
- -### The Window - -When rendering the window the background FIFO is cleared and the fetcher -is reset to step 1. When WX is 0 and the SCX & 7 > 0 mode 3 is shortened -by 1 dot. - -When the window has already started rendering there is a bug that occurs -when WX is changed mid-scanline. When the value of WX changes after the -window has started rendering and the new value of WX is reached again, -a pixel with color value of 0 and the lowest priority is pushed onto the -background FIFO. - -### Sprites - -The following is performed for each object on the current scanline if -LCDC.1 is enabled (this condition is ignored on CGB) and the X coordinate -of the current scanline has an object on it. If those conditions are not -met then object fetching is [canceled](<#Object Fetch Canceling>). - -At this point the [fetcher](<#FIFO Pixel Fetcher>) is advanced one step -until it's at step 5 or until the background FIFO is not empty. Advancing -the fetcher one step here lengthens mode 3 by 1 dot. This process may -be [canceled](<#Object Fetch Canceling>) after the fetcher has advanced a -step. - -When SCX & 7 > 0 and there is an object at X coordinate 0 of the current -scanline then mode 3 is lengthened. The amount of dots this lengthens -mode 3 by is whatever the lower 3 bits of SCX are. After this penalty is -applied object fetching may be canceled. Note that the timing of the -penalty is not confirmed. It may happen before or after waiting for the -fetcher. More research needs to be done. - -After checking for objects at X coordinate 0 the fetcher is advanced two -steps. The first advancement lengthens mode 3 by 1 dot and the second -advancement lengthens mode 3 by 3 dots. After each fetcher advancement -there is a chance for an object fetch cancel to occur. - -The lower address for the row of pixels of the target object tile is now -retrieved and lengthens mode 3 by 1 dot. Once the address is retrieved -this is the last chance for object fetch cancel to occur. 
Exiting -object fetch lengthens mode 3 by 1 dot. The upper address for the -target object tile is now retrieved and does not shorten mode 3. - -At this point [VRAM Access](<#VRAM Access>) is checked for the lower and -upper addresses for the target object. Before any mixing is done, if the -OAM FIFO doesn't have at least 8 pixels in it then transparent pixels -with the lowest priority are pushed onto the OAM FIFO. Once this is done -each pixel of the target object row is checked. On CGB, horizontal flip -is checked here. If the target object pixel is not white and the pixel in -the OAM FIFO *is* white, or if the pixel in the OAM FIFO has higher -priority than the target object's pixel, then the pixel in the OAM FIFO -is replaced with the target object's properties. - -Now it's time to [render a pixel](<#Pixel Rendering>)! The same process -described in Object Fetch Canceling is performed: a pixel is rendered and -the fetcher is advanced one step. This advancement lengthens mode 3 by 1 -dot if the X coordinate of the current scanline is not 160. If the X -coordinate is 160 the PPU stops processing objects (because they won't be -visible). - -Everything in this section is repeated for every object on the current -scanline unless it was decided that fetching should be canceled or the -X coordinate is 160. - -### Pixel Rendering - -This is where the background FIFO and OAM FIFO are mixed. There are -conditions where either a background pixel or an object pixel will have -display priority. - -If there are pixels in the background and OAM FIFOs then a pixel is -popped off each. If the OAM pixel is not transparent and LCDC.1 is -enabled then the OAM pixel's background priority property is used if it's -the same or higher priority as the background pixel's background priority. - -Pixels won't be pushed to the LCD if there is nothing in the background -FIFO or the current pixel is pixel 160 or greater. 
- -If LCDC.0 is disabled then the background is disabled on DMG and the -background pixel won't have priority on CGB. When the background pixel -is disabled the pixel color value will be 0, otherwise the color value -will be whatever color pixel was popped off the background FIFO. When the -pixel popped off the background FIFO has a color value other than 0 and -it has priority then the object pixel will be discarded. - -At this point, on DMG, the color of the pixel is retrieved from the BGP -register and pushed to the LCD. On CGB when [palette access](<#CGB Palette Access>) -is blocked a black pixel is pushed to the LCD. - -When an object pixel has priority, the color value is retrieved from the -popped pixel from the OAM FIFO. On DMG the color for the pixel is -retrieved from either the OBP1 or OBP0 register depending on the pixel's -palette property. If the palette property is 1 then OBP1 is used, -otherwise OBP0 is used. The pixel is then pushed to the LCD. On CGB when -palette access is blocked, a black pixel is pushed to the LCD. - -The pixel is then finally pushed to the LCD. - -### CGB Palette Access - -At various times during PPU operation read access to the CGB palette is -blocked and a black pixel pushed to the LCD when rendering pixels: -- LCD turning off -- First HBlank of the frame -- When searching OAM and index 37 is reached -- After switching from mode 2 (oam search) to mode 3 (pixel transfer) -- When entering HBlank (mode 0) and not in double speed mode, blocked 2 dots later no matter what - -At various times during PPU operation read access to the CGB palette is -restored and pixels are pushed to the LCD normally when rendering pixels: -- At the end of mode 2 (oam search) -- For only 2 dots when entering HBlank (mode 0) and in double speed mode - -:::tip Note - -These conditions are checked only when entering STOP mode and the -PPU's access to CGB palettes is always restored upon leaving STOP mode. 
- -::: - -### Object Fetch Canceling - -Object fetching may be canceled if LCDC.1 is disabled while the PPU is -fetching an object from OAM. This canceling lengthens mode 3 by the amount -of dots the previous instruction took plus the residual dots left for -the PPU to process. When OAM fetching is canceled, a pixel is [rendered](<#Pixel Rendering>), and -the [fetcher](<#FIFO Pixel Fetcher>) is advanced one step. This advancement -lengthens mode 3 by 1 dot if the current pixel is not 160. If the -current pixel is 160 the PPU stops processing objects because they won't -be visible. From 21b12c1a509db69c454d46545bb7a48f36df86df Mon Sep 17 00:00:00 2001 From: Antonio Vivace Date: Tue, 7 Jan 2025 10:50:53 +0100 Subject: [PATCH 2/3] Update src/Rendering_Internals.md Co-authored-by: Estus --- src/Rendering_Internals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/Rendering_Internals.md b/src/Rendering_Internals.md index f7226b88..0f1bdf88 100644 --- a/src/Rendering_Internals.md +++ b/src/Rendering_Internals.md @@ -3,7 +3,7 @@ The Game Boy's PPU is the component responsible for feeding the LCD (= the screen) with pixels. This document describes how the PPU renders pixels. -::: tip Terminology +:::tip Terminology A "dot" is the unit of time within the PPU. One "dot" is one 4 MiHz cycle, i.e. a unit of time equal to 1 ∕ 4194304 of a second. From cf657df5b627bda678bd246b5e257686c27da275 Mon Sep 17 00:00:00 2001 From: Antonio Vivace Date: Tue, 7 Jan 2025 10:52:46 +0100 Subject: [PATCH 3/3] Update src/Rendering_Internals.md Co-authored-by: Quinn <3379314+quinnyo@users.noreply.github.com> --- src/Rendering_Internals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/Rendering_Internals.md b/src/Rendering_Internals.md index 0f1bdf88..63ff474d 100644 --- a/src/Rendering_Internals.md +++ b/src/Rendering_Internals.md @@ -5,7 +5,7 @@ This document describes how the PPU renders pixels. 
:::tip Terminology -A "dot" is the unit of time within the PPU. +A *dot* is the unit of time within the PPU. One "dot" is one 4 MiHz cycle, i.e. a unit of time equal to 1 ∕ 4194304 of a second. The duration of one "dot" is independent of [CGB double speed](<#FF4D — KEY1 (CGB Mode only): Prepare speed switch>).