Cache_Read_Enable - mhightower83/Arduino-ESP8266-misc GitHub Wiki

WIP

New info to integrate in

The NONOS SDK 3.0 supports more IRAM, but my searches failed to find anything on it except for an RTOS reference. It is described in the NONOS SDK API Reference.

Since ESP8266_NonOS_SDK_v3.0, we have added a new feature to enable using iRAM as memory, which can provide about 17 KB extra memory (but the cache size will decrease to be only 16 KB)

They also indicate adding the new iRAM to the heap, which I find curious.

After setting as above, iRAM is enabled as the first chosen memory by default. os_malloc, os_zalloc and os_calloc will allocate from iRAM first, and dRAM will be the next available memory when iRAM is used up.

I am now wondering if part of the reason they implemented the exception handler to support unaligned and non-32-bit access (load/store exception) for flash strings was so they could use the same logic to support strings, byte, and short variables in iRAM as well. I think this could also affect the performance of early heap allocations coming from the iRAM. IMO iRAM should not be at the beginning of the heap; maybe at the end, to handle OOM. I think this added RAM should be assigned to tasks that will mostly access it as 32-bit values and so avoid the overhead of handling exceptions.

ESP8266 ICACHE vs IRAM

The documentation for these is thin. This is my interpretation of what they mean and their relationship to Cache_Read_Enable.

What is it?

The ESP8266 has a total of 64K of instruction memory, IRAM. This 64K of IRAM is composed of one dedicated 32K block of IRAM and two 16K blocks of IRAM. The last two 16K blocks of IRAM are flexible and can be used as a transparent cache for external flash memory. They can either be used for IRAM or an instruction cache for executing code out of flash, ICACHE.

The ICACHE implementation specifics are unknown. The general concept is similar to a lot of other virtual memory systems. You have a virtual address space, an I/O storage device, and a small block of memory (a cache). The CPU executes the code out of the cache. When a function call is made to a function that is in the cache, it is called a hit. When the function is not in the cache, that is called a miss. When a miss occurs, the application execution stops, while an I/O operation brings in the missing code to the cache. Then, execution resumes. I'll leave a deeper description up to you and an Internet search engine.
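To make the hit/miss idea concrete, here is a toy model. It assumes a direct-mapped cache with 32-byte lines, which is purely an assumption for illustration since the real ICACHE organization is not documented. ToyCache and its members are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy direct-mapped cache model. The line size and mapping policy are
// assumptions for illustration only; the real ICACHE design is unknown.
class ToyCache {
public:
    explicit ToyCache(std::size_t cache_bytes, std::size_t line_bytes = 32)
        : line_bytes_(static_cast<std::uint32_t>(line_bytes)),
          tags_(cache_bytes / line_bytes, 0xFFFFFFFFu) {}  // 0xFFFFFFFF = empty slot

    // Returns true on a hit. On a miss the line is recorded as fetched.
    bool access(std::uint32_t addr) {
        const std::uint32_t line = addr / line_bytes_;
        const std::size_t slot = line % tags_.size();
        if (tags_[slot] == line) {
            ++hits_;
            return true;
        }
        tags_[slot] = line;  // stands in for the SPI transfer that fills the line
        ++misses_;
        return false;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::uint32_t line_bytes_;
    std::vector<std::uint32_t> tags_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

In this toy model, alternating between two addresses that are 16K apart evicts the other one on every access with ToyCache(16 * 1024), while ToyCache(32 * 1024) keeps both resident after the first two misses. The real hardware may behave differently.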

For the ESP8266 we have a virtual 1 megabyte of execution address space starting at 0x40200000. This address space is mapped over a section of your Flash memory that is connected to the SPI bus. The Flash and SPI bus are the I/O storage device. One or two of the 16K blocks of the flexible IRAM are repurposed to build the instruction cache (ICACHE).

Cache_Read_Enable is the name of the Boot ROM API call that enables this virtual memory execution of code in flash. The parameters the NONOS SDK passes to Cache_Read_Enable assign both 16K IRAM blocks to ICACHE, giving your application a total cache size of 32K.

With a little code wrapping trick, we can reduce the cache size to 16K, allowing us to keep one 16K block of the flexible IRAM (at 0x40108000), giving us a total of 48K of IRAM. The trade-off of reducing the cache to 16K is that less executable code fits in the cache. Thus, more cache misses and more SPI bus transfers, slowing execution.

Whether the 16K cache size is a problem will depend on the application and other trade-offs. It is up to you to decide what works best for your application.

Cache_Read_Enable

Cache_Read_Enable as in Instruction Read Cache enable, ICACHE. This function is underdocumented. It has been used by rboot, zboot, and the RTOS SDK has calls to it in the bootloader.

When you select a 16K vs a 32K ICACHE size you hold on to 16K more of your 64K IRAM, giving you 48K to work with. The NONOS SDK has no option to select 16K vs 32K.

The first two arguments appear to specify which 1MB block of the flash to access with the ICACHE.

  • The first argument, map, is only partly understood. It takes the values 0, 1, and 2 or greater. The value 0 selects the even 1MB block and 1 selects the odd 1MB block; in other words, it supplies flash address bit 20. No guesses on a value of 2 or greater.
  • The second argument, p, appears to start at bit 21. Speculating, it may be bits 23, 22, 21 of the flash address. A three-bit field is cleared in the register for argument two.
  • The third argument, v, holds our center of attention. A value of 0 selects a 16K ICACHE and a value of 1 selects a 32K ICACHE.
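Reading the first two bullets as a formula gives the sketch below. It assumes map supplies flash address bit 20 and p supplies bits 21 through 23, which is my speculation from above and not confirmed. The function names are hypothetical.

```cpp
#include <cstdint>

// Start of the 1MB execution window.
constexpr std::uint32_t kFlashWindowBase = 0x40200000u;

// Flash offset of the 1MB block mapped into the window, under the
// unconfirmed interpretation: map -> address bit 20, p -> bits 21..23.
// Only map values 0 and 1 are modeled; 2 or greater is not understood.
constexpr std::uint32_t mapped_flash_base(std::uint32_t map, std::uint32_t p) {
    return ((p & 0x7u) << 21) | ((map & 0x1u) << 20);
}

// Translate an address inside the execution window to a flash offset.
constexpr std::uint32_t window_to_flash(std::uint32_t addr,
                                        std::uint32_t map,
                                        std::uint32_t p) {
    return mapped_flash_base(map, p) + (addr - kFlashWindowBase);
}
```

Under these assumptions, map = 1 and p = 0 would place address 0x40201000 at flash offset 0x101000.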

This function enables executing code in flash by using a block of IRAM to create an ICACHE for flash execution. You can select a cache size of 16K or 32K. This will reduce the available IRAM by the same amount. When using the 16K option, IRAM will be at 0x40100000 length 0x8000 and 0x40108000 length 0x4000.
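The resulting memory split can be written down as a small table in code. This is only the arithmetic from the paragraph above; IramLayout and iram_layout are hypothetical names.

```cpp
#include <cstdint>

struct IramLayout {
    std::uint32_t base;        // start of IRAM
    std::uint32_t iram_size;   // bytes left usable as IRAM
    std::uint32_t cache_size;  // bytes handed over to ICACHE
};

// v is the third argument of Cache_Read_Enable: 0 = 16K cache, 1 = 32K cache.
constexpr IramLayout iram_layout(std::uint32_t v) {
    return (v == 0) ? IramLayout{0x40100000u, 0xC000u, 0x4000u}   // 32K + 16K IRAM
                    : IramLayout{0x40100000u, 0x8000u, 0x8000u};  // 32K IRAM only
}
```

Either way the two sizes add up to the 64K of instruction memory.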

Because the SPI bus is assigned to supporting the virtual memory feature, direct calls to Boot ROM APIs for reading, writing, or erasing flash will not work. The virtual memory operation has to be suspended when direct access to flash is required, as in SPIFFS file access. The flash functions supplied by the NONOS SDK will take care of this activity.

If you need to go it alone, you can surround your Boot ROM API calls with Cache_Read_Disable_2 and Cache_Read_Enable_2. The code that performs this access must be in IRAM since the cache is being disabled to gain access to the SPI interface to directly access the flash.
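One way to keep the disable/enable pair balanced is a scope guard. This is a minimal sketch, not something taken from the SDK: CacheSuspend and with_cache_suspended are hypothetical names, and the two hooks are passed in as parameters so the sketch also compiles off-target. On the ESP8266 the hooks would be Cache_Read_Disable_2 and Cache_Read_Enable_2.

```cpp
// Scope guard: suspends cached flash execution on construction and
// restores it on destruction.
class CacheSuspend {
public:
    using Hook = void (*)();

    CacheSuspend(Hook disable, Hook enable) : enable_(enable) { disable(); }
    ~CacheSuspend() { enable_(); }

    CacheSuspend(const CacheSuspend&) = delete;
    CacheSuspend& operator=(const CacheSuspend&) = delete;

private:
    Hook enable_;
};

// Hypothetical helper: runs a raw flash operation with the cache suspended
// and returns its result. On hardware this helper, the guard's code, and
// `op` itself must all be placed in IRAM, because nothing in flash can
// execute while the cache is off.
inline int with_cache_suspended(CacheSuspend::Hook disable,
                                CacheSuspend::Hook enable,
                                int (*op)()) {
    CacheSuspend guard(disable, enable);
    return op();
}
```

The guard re-enables the cache on every exit path from the scope, which is the main reason to prefer it over two bare calls.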

These requirements are handled by the NONOS SDK wrappers of these APIs. The wrappers might be callable before the SDK is initialized, assuming they have no init requirements.

I suspect that changing the cache size may require a power cycle or EXT_RST to work properly. Since IRAM is back at address 0x40108000 and 0x4010C000 at reboot, I assume there is no problem with changing cache size at boot; however, within a boot cycle there is an issue. When register 0x3FF00024 is updated, the Boot ROM code ORs in the new bits without clearing out the old. I think for a given boot cycle, all calls to this function need to use the same value for ARG3 until the next reboot.
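The OR-only update can be modeled in a few lines. This sketch covers only the 0x3FF00024 write, using the bit values (0x08 for 16K, 0x18 for 32K) that appear in the disassembled ROM listing further down this page; rom_style_update is a hypothetical name and the listing itself is known to contain some errors.

```cpp
#include <cstdint>

// Models the ROM's update of register 0x3FF00024: the new bits are ORed in
// and nothing left behind by an earlier call is ever cleared.
// v is ARG3 of Cache_Read_Enable: 0 = 16K cache, otherwise 32K cache.
constexpr std::uint32_t rom_style_update(std::uint32_t reg, std::uint32_t v) {
    return reg | (v == 0 ? 0x08u : 0x18u);
}
```

In this model, a call with v = 1 followed by a call with v = 0 leaves the register at 0x18, so the second call cannot hand the extra 16K block back to IRAM.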

extern "C" void Cache_Read_Enable(uint8_t map, uint8_t p, uint8_t v);

#ifdef CONFIG_SOC_FULL_ICACHE
#define SOC_CACHE_SIZE 1 // 32KB
#else
#define SOC_CACHE_SIZE 0 // 16KB
#endif

This can be used to reduce the cache size to 16K. It works with NONOS SDK 2.x.

#ifdef SELECT_CACHE_SIZE_16K
extern "C" void Cache_Read_Enable(uint32_t map, uint32_t p, uint32_t v);

#define ICACHE_SIZE_16K 0
#define ICACHE_SIZE_32K 1

#ifndef ROM_Cache_Read_Enable
#define ROM_Cache_Read_Enable         0x40004678
#endif

typedef void (*fp_Cache_Read_Enable_t)(uint32_t map, uint32_t p, uint32_t v);
constexpr fp_Cache_Read_Enable_t real_Cache_Read_Enable = (fp_Cache_Read_Enable_t)ROM_Cache_Read_Enable;

extern "C" void IRAM_ATTR Cache_Read_Enable(uint32_t map, uint32_t p, uint32_t v) {
  (void)v;
  real_Cache_Read_Enable(map, p, ICACHE_SIZE_16K);
}

constexpr uint32_t *new_ram = (uint32_t *)0x40108000;
constexpr size_t new_ram_sz = 16*1024;
#endif
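Since IRAM is best accessed as 32-bit values, byte data kept in new_ram needs a little help. The following is a minimal sketch of reading and writing single bytes using only aligned 32-bit accesses. It assumes little-endian byte order, as on the ESP8266; read_byte_32 and write_byte_32 are hypothetical names, not part of any SDK.

```cpp
#include <cstddef>
#include <cstdint>

// Read byte `index` from a buffer that should only be touched with
// aligned 32-bit loads. Assumes little-endian byte order.
inline std::uint8_t read_byte_32(const volatile std::uint32_t* words,
                                 std::size_t index) {
    const std::uint32_t word = words[index / 4];
    return static_cast<std::uint8_t>(word >> ((index % 4) * 8));
}

// Write byte `index` with a read-modify-write of the containing word.
inline void write_byte_32(volatile std::uint32_t* words,
                          std::size_t index,
                          std::uint8_t value) {
    const unsigned shift = static_cast<unsigned>(index % 4) * 8;
    const std::uint32_t mask = static_cast<std::uint32_t>(0xFFu) << shift;
    const std::uint32_t word = words[index / 4];
    words[index / 4] = (word & ~mask) | (static_cast<std::uint32_t>(value) << shift);
}
```

On target this would be used as read_byte_32(new_ram, i) and write_byte_32(new_ram, i, value), trading a few extra instructions for not taking a load/store exception.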

Cache_Read_Disable

extern "C" void Cache_Read_Disable(void);

Research pointers

TODO: Find URL to this: esp8266web/info/libs/bios/SpiFlash.c. Note, there are some errors in here.

PROVIDE	( dport_        = 0x3ff00000);
PROVIDE ( spi0_		= 0x60000200);

extern volatile uint32 dport_[64];		// 0x3ff00000
extern volatile uint32 spi0_[64];		// 0x60000200

#define IDX_SPI_CTRL		2
#define DPORT_BASE		dport_		// 0x3ff00000
#define SPI0_CTRL		spi0_[IDX_SPI_CTRL]

// ROM:400047F0
void Cache_Read_Disable(void)
{
	while(DPORT_BASE[3] & (1<<8)) { // 0x3FF0000C
		 DPORT_BASE[3] &= 0xEFF;
	}
	SPI0_CTRL &= ~SPI_ENABLE_AHB;
	DPORT_BASE[3] &= 0x7E;
	DPORT_BASE[3] |= 1;
	while((DPORT_BASE[3] & 1) == 0);
	DPORT_BASE[3] &= 0x7E;
}

// ROM:40004678
void Cache_Read_Enable(uint32 a2, uint32 a3, uint32 a4)
{
	while(DPORT_BASE[3] & (1<<8)) { // 0x3FF0000C CACHE_FLASH_CTRL_REG
		 DPORT_BASE[3] &= 0xEFF;
	}
	SPI0_CTRL &= ~SPI_ENABLE_AHB; // disable flash caching device
	DPORT_BASE[3] |= 1; // CACHE_FLUSH_START_BIT
	while((DPORT_BASE[3] & 1) == 0);
	DPORT_BASE[3] &= 0x7E;
	SPI0_CTRL |= SPI_ENABLE_AHB; // enable flash caching device
	uint32 a6 = DPORT_BASE[3];
	if(a2 == 0) {
		DPORT_BASE[3] &= 0xFCFFFFFF;
	}
	else if(a2 == 1) {
		DPORT_BASE[3] &= 0xFEFFFFFF;
		DPORT_BASE[3] |= 0x02000000;
	}
	else {
		DPORT_BASE[3] &= 0xFDFFFFFF;
		DPORT_BASE[3] |= 0x01000000;
	}
	DPORT_BASE[3] &= 0xFBF8FFFF;
	DPORT_BASE[3] |= (a4 << 26) | (a3 << 16);

	if(a4 == 0) {
		DPORT_BASE[9] |= 8; // 0x3FF00024 enable 16k IRAM block in SPI Flash cache
	} else {
		DPORT_BASE[9] |= 0x18; // 0x3FF00024 enable 32k IRAM block in SPI Flash cache
	}
	if((a6 & 0x100) == 0) do {
		DPORT_BASE[3] |= 0x100;    // CACHE_READ_EN_BIT
	} while((DPORT_BASE[3] & 0x100) == 0);
}

Another site with a description is https://richard.burtons.org/2015/06/12/esp8266-cache_read_enable/ and I also found an additional reference in bootloader_support.c.

At Espressif's GitHub, the ESP8266_RTOS_SDK file bootloader_support.c has a define for SOC_CACHE_SIZE, and it gets used in a call from here.

bootloader_flash.c also has a call to Cache_Read_Enable().

These functions get a lot of use, switching back and forth during Flash reads.

#define CACHE_FLASH_CTRL_REG           0x3ff0000C
#define CACHE_FLUSH_START_BIT          BIT0
#define CACHE_EMPTY_FLAG_BIT           BIT1
#define CACHE_READ_EN_BIT              BIT8

#define PERIPHS_SPI_FLASH_CTRL         SPI_CTRL(SPI)
#define SPI 0
#define SPI_CTRL(i)                    (REG_SPI_BASE(i)  + 0x8)
#define REG_SPI_BASE(i)                (0x60000200-i*0x100)

#define SPI_ENABLE_AHB                 BIT17
#define SPI_EXT2(i)                    (REG_SPI_BASE(i)  + 0xF8)
#define SET_PERI_REG_MASK(reg, mask)   WRITE_PERI_REG((reg), (READ_PERI_REG(reg) |   (mask)))
#define CLEAR_PERI_REG_MASK(reg, mask) WRITE_PERI_REG((reg), (READ_PERI_REG(reg) & (~(mask))))

void Cache_Read_Disable_2(void)
{
    CLEAR_PERI_REG_MASK(CACHE_FLASH_CTRL_REG,CACHE_READ_EN_BIT);
    while(REG_READ(SPI_EXT2(0)) != 0) { }
    CLEAR_PERI_REG_MASK(PERIPHS_SPI_FLASH_CTRL,SPI_ENABLE_AHB);
}

void Cache_Read_Enable_2()
{
    SET_PERI_REG_MASK(PERIPHS_SPI_FLASH_CTRL,SPI_ENABLE_AHB);
    SET_PERI_REG_MASK(CACHE_FLASH_CTRL_REG,CACHE_READ_EN_BIT);
}
void Cache_Read_Enable_New(void) __attribute__((alias("Cache_Read_Enable_2")));

These are defined in spi_flash_raw.c.