(This is believed to be correct for MX30LF4G28AD storage. There are also records of devices shipping with MT29F4G08ABAFA chips. Other chips might be organized differently, although that's not too likely.)

Scrambling

Specific bit patterns can cause interference between NAND cells, which decreases read reliability. To minimize that effect, data is stored randomized (scrambled) on the chip by applying a pseudorandom XOR mask to each page.

These XOR bit patterns are sections of the PRBS-15 sequence (X^15 + X^14 + 1, or 0xC001/49153), generated from seed values commonly used in the industry: 0x576A, 0x05E8, 0x629D, ..., often written in bit-reversed form as 0x2B75, 0x0BD0, 0x5CA3, .... One caveat: someone at MediaTek must have made a typo, and as a result products use 0x484F instead of 0x48F4 for the 23rd value. Patterns repeat after 64 pages (1 erase block), so only the first 64 seeds are used.
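The generator and the bit-reversed seed notation can be sketched as follows. This is a minimal illustration, assuming a plain Fibonacci LFSR with taps on bits 14 and 13; the function names are hypothetical, and the bit order and output tap actually used by the SoC are not confirmed by these notes.

```python
MASK15 = 0x7FFF  # the LFSR state is 15 bits wide


def prbs15_step(state):
    """Advance the LFSR for X^15 + X^14 + 1 by one bit.

    Returns (new_state, output_bit). Tap positions are an assumption.
    """
    new_bit = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | new_bit) & MASK15, new_bit


def prbs15_bits(seed, count):
    """Return the first `count` output bits for a given seed."""
    state = seed & MASK15
    bits = []
    for _ in range(count):
        state, bit = prbs15_step(state)
        bits.append(bit)
    return bits


def prbs15_period(seed):
    """Count the steps until the state returns to the seed."""
    start = state = seed & MASK15
    steps = 0
    while True:
        state, _ = prbs15_step(state)
        steps += 1
        if state == start:
            return steps


def reverse15(value):
    """Reverse the order of the 15 bits of a seed value."""
    return int(format(value & MASK15, "015b")[::-1], 2)
```

With this helper, `reverse15(0x576A)` gives `0x2B75`, matching the two notations listed above, and any non-zero seed runs through all 32767 states before repeating.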

Also, the NAND Flash Interface in this SoC seems to have a register too small to fit an entire page. As a result, every page (of 4352 bytes) is processed as 4 partial pages (chunks/subpages of 1088 bytes), each using the same XOR mask.
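The chunking described above can be sketched like this. It assumes a 1088-byte mask for the page has already been derived elsewhere; `descramble_page` and `seed_index` are hypothetical helper names, not part of any existing tool.

```python
PAGE_SIZE = 4352                             # bytes per page, including spare area
CHUNK_SIZE = 1088                            # bytes per partial page
CHUNKS_PER_PAGE = PAGE_SIZE // CHUNK_SIZE    # 4
PAGES_PER_BLOCK = 64                         # patterns repeat after one erase block


def seed_index(page_number):
    """Index of the seed/mask to use for an absolute page number."""
    return page_number % PAGES_PER_BLOCK


def descramble_page(page, mask):
    """XOR every 1088-byte chunk of a page with the same mask."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a full 4352-byte page")
    if len(mask) != CHUNK_SIZE:
        raise ValueError("expected a 1088-byte mask")
    return bytes(b ^ mask[i % CHUNK_SIZE] for i, b in enumerate(page))
```

Because XOR is its own inverse, the same function scrambles and descrambles.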

(For the image at hand, the XOR masks were initially recovered using statistical analysis. The pattern was then misidentified as a 120 bit LFSR. That turned out to be 8 cycles of PRBS-15, which does not end on a byte boundary, so each cycle starts shifted by an additional bit.)

ECC

A few bit errors are expected in some of the pages due to the nature of raw NAND. These occasional bit flips are corrected by the hardware ECC controller of the SoC.

The MT8516 corrects 1024 bytes at a time for 4K page sizes. The BCH codec uses the following parameters: t=32, prim_poly=17475 (m=14, n=16383 implied).

The first 8 unused bytes of the OOB/spare area also get included in the calculation.
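The numbers above fit together as a quick arithmetic check. The sketch below uses m*t as the usual upper bound on the number of BCH parity bits. That each 1088-byte chunk consists of 1024 data bytes, 8 spare bytes and the parity bytes is an assumption that happens to match the sizes, not something confirmed here; `bch_layout` is a hypothetical helper.

```python
def bch_layout(m, t, data_bytes, spare_bytes):
    """Derive codeword sizes for a (shortened) binary BCH code."""
    n = (1 << m) - 1                          # natural code length in bits
    parity_bits = m * t                       # upper bound on parity bits
    parity_bytes = (parity_bits + 7) // 8
    message_bits = (data_bytes + spare_bytes) * 8
    codeword_bits = message_bits + parity_bits
    return {
        "n": n,
        "parity_bits": parity_bits,
        "parity_bytes": parity_bytes,
        "message_bits": message_bits,
        "codeword_bits": codeword_bits,
        "shortened_by": n - codeword_bits,    # unused leading bits of the code
        "chunk_bytes": data_bytes + spare_bytes + parity_bytes,
    }


layout = bch_layout(m=14, t=32, data_bytes=1024, spare_bytes=8)
```

Under these assumptions the parity takes 56 bytes, and 1024 + 8 + 56 gives the 1088-byte chunk size, with the codeword comfortably shorter than n=16383 bits.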

The integrated ECC controller seems to read bits in the 'wrong order'. That means the bits of each byte of the flash dump image have to be reversed temporarily for the error correction step, then reversed back.
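A minimal sketch of that per-byte bit reversal, using a 256-entry lookup table (the helper name is hypothetical):

```python
# Lookup table: entry b holds b with its 8 bits in reverse order.
REVERSE_TABLE = bytes(int(format(b, "08b")[::-1], 2) for b in range(256))


def reverse_bits_in_bytes(data):
    """Reverse the bit order inside every byte; byte order is unchanged."""
    return bytes(data).translate(REVERSE_TABLE)
```

Applying the function twice restores the original data, so the same call can be used before and after the error correction step.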

(The primitive polynomial thus turns out to be X^14 + X^10 + X^6 + X + 1, or 100010001000011 in binary, 0x4443, or 042103 in octal representation. As per the MATLAB Help Center that is the default for GF(2^14).)
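The different notations can be cross-checked with a few lines of Python. The sketch also confirms primitivity by checking that X has multiplicative order 2^m - 1 modulo the polynomial; both helper names are hypothetical.

```python
def poly_to_string(value):
    """Render an integer-encoded GF(2) polynomial, highest power first."""
    terms = []
    for power in range(value.bit_length() - 1, -1, -1):
        if (value >> power) & 1:
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("X")
            else:
                terms.append(f"X^{power}")
    return " + ".join(terms)


def order_of_x(poly):
    """Multiplicative order of X modulo `poly`, or None if X never reaches 1."""
    m = poly.bit_length() - 1
    element = 1
    for steps in range(1, 1 << m):
        element <<= 1            # multiply by X
        if element >> m:         # degree reached m: reduce modulo poly
            element ^= poly
        if element == 1:
            return steps
    return None


print(poly_to_string(17475))     # polynomial form of prim_poly
print(format(17475, "b"), format(17475, "x"), format(17475, "o"))
print(order_of_x(17475))         # equals 2^14 - 1 for a primitive polynomial
```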

Simple decoders can be found here.

(The earlier notes are kept below. Please note that those parameters apply to other SoCs using 512 byte chunks: t=12, prim_poly=8219.)

Initial research

The data on the NAND flash is encoded, likely using the hardware ECC flash interface of the CPU.

The assumption is that BCH ECC encoding is used (see also page 1034 of this pdf) and that parts of the MT CPUs are the same: e.g. the messages [from the preloader here](https://github.com/prshkr07/Thunder-Kernel/tree/master/mediatek/platform/mt6582/preloader/src) seem to be very similar to the UART logs of the dot, including spelling errors in the UART log such as [TOOL] <UART> receieved data: () and here.

The BCH codec module is implemented in GF(2^13) defined by primitive polynomial X^13 + X^4 + X^3 + X + 1.

GF = Galois Field/finite field; see Wikipedia on BCH code.

So given this, can we decode a set of bytes for which we know the expected output? The Android boot image should start with ANDROID!.
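One way to use such a known header is a known-plaintext comparison. The sketch below assumes the stored bytes are the plaintext XORed with some mask (scrambling only, ignoring ECC and layout), which is an assumption and not something established at this point in the notes; the helper names are hypothetical.

```python
ANDROID_MAGIC = b"ANDROID!"  # first 8 bytes of an Android boot image


def recover_mask(raw, known=ANDROID_MAGIC):
    """XOR raw bytes with the expected plaintext to expose the mask bytes."""
    return bytes(r ^ k for r, k in zip(raw, known))


def find_magic(data, magic=ANDROID_MAGIC):
    """Return every offset at which the magic appears in decoded data."""
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(magic, pos + 1)
    return offsets
```

If a candidate decoding is correct, `find_magic` should report the header at the start of the boot partition; `recover_mask` only yields the first 8 mask bytes, which is enough to compare against a suspected sequence.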

From the article here,

if g(x) = X^8 + X^4 + X^3 + X^2 + 1 --> 0x11D in hex, 100011101 in binary, then X^13 + X^4 + X^3 + X + 1 --> 10000000011011 in binary
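The conversion from exponents to the integer form can be written out as a small helper (the name is hypothetical). Each exponent sets the bit with that index:

```python
def poly_from_exponents(exponents):
    """Encode a GF(2) polynomial as an integer, one bit per exponent."""
    value = 0
    for exponent in exponents:
        value |= 1 << exponent
    return value


g8 = poly_from_exponents([8, 4, 3, 2, 0])        # X^8 + X^4 + X^3 + X^2 + 1
g13 = poly_from_exponents([13, 4, 3, 1, 0])      # X^13 + X^4 + X^3 + X + 1
print(hex(g8), format(g8, "b"))
print(g13, hex(g13), format(g13, "b"))
```

The second value comes out as 8219 (0x201B), the same prim_poly quoted for the 512 byte chunk parameters.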

Decoding NAND flash - jvandewiel/no-alexa GitHub Wiki
