The vector table
September 5, 2022
This article will, it is hoped, help dispel some misunderstandings and fill some knowledge gaps about one of the most important parts of the flash of an AVR.
What is the vector table?
At the very start of any AVR program, there is what is known as a vector table. This has one entry in it per interrupt (if using one of my cores, I provide a list of these for supported parts), plus what is called the "reset vector" as the first entry.
Location of vector table
The vector table is at the very beginning of a program. It is unusual for interrupts to be used by a bootloader.
On classic AVRs, it always starts at 0x0000
No exceptions. Any hex file that starts at 0x0000 and doesn't start with a vector table is cause for concern.
On modern AVRs, it starts at the beginning of the application.
This is 0x0000 when no bootloader is used. Otherwise, this immediately follows the page dedicated to the bootloader, and will still be at the start of any exported hex file.
What's in the vector table?
All of the "vectors" on a part are the same size: 2 bytes if the part has 8k of flash or less, 4 bytes if it has more. Each entry in the vector table contains an rjmp or a jmp instruction. In theory you could put other instructions there, but you need to go to great lengths if you want to experiment with such things, because the table is set up by the avr-libc crt, which the toolchain supplies as a precompiled object. You have to rebuild that object to change it, which is not a trivial procedure.
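As a rough illustration of the sizing rule above, the following sketch computes the size of one vector and the byte address of a given entry. The function names and the `app_start` parameter (where the application begins, 0x0000 without a bootloader) are hypothetical and not part of any core or toolchain.

```python
def vector_size(flash_bytes):
    """Bytes per vector: 2 (room for an rjmp) up to 8k of flash, 4 (room for a jmp) above."""
    return 2 if flash_bytes <= 8 * 1024 else 4


def vector_address(index, flash_bytes, app_start=0x0000):
    """Byte address of vector `index`; index 0 is the reset vector."""
    return app_start + index * vector_size(flash_bytes)


# A 4k part without a bootloader: vector 3 sits 3 * 2 bytes into flash.
print(hex(vector_address(3, 4 * 1024)))            # 0x6
# A 32k part whose application starts at 0x0200: 3 * 4 bytes past the start.
print(hex(vector_address(3, 32 * 1024, 0x0200)))   # 0x20c
```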
These jumps are the interrupt vectors, and some Microchip documents have used phrasing like "execution is vectored to" to describe the process of getting to this point after an enabled interrupt's interrupt flag gets set while interrupts are enabled.
Because they all contain one of two instructions, you can clearly tell if you're looking at a real vector table from the hex file:
- All 2-byte vectors are an rjmp, which has opcode 0xCnnn.
- In hex files, because each 16-bit word is stored low byte first, this looks like nnCn (i.e., the third hex digit of every opcode in the vector table will be a C, per the AVR instruction set manual). The n's are a signed offset, positive or negative, from the current location.
- 4-byte vectors may be that, or they may be full jmps. On parts with up to 256k of flash these are 0x940C nnnn or 0x940D nnnn, hence show as 0C94nnnn or 0D94nnnn in the hex file (with the two bytes of nnnn swapped in the same way). The n's form an absolute location, word addressed, where the ISR begins. With 128k or less, the last digit of the first word is always C. On 256k parts, it may be C or D.
- All vectors are always filled with these jmp or rjmp instructions. Unused vectors (most of them) will all point to BAD_ISR.
- That makes the vector table by far the most recognizable feature of a hex file to the human eye. It may be the only recognizable feature, unless you intentionally add a thicket of nops as signposts. With opcode 0x0000, these are very recognizable, and the compiler never generates them except when rendering inline assembly, for example in delayMicroseconds().
- You can thus immediately tell whether a hex file was compiled for use with or without a bootloader, or if it is a flash dump from a device that did have a bootloader.
- If the starting address is 0, and it starts with a vector table, it was compiled for use without a bootloader.
- If the starting address is 0x0100 or 0x0200, it had better start with a vector table, otherwise the file will not work. Assuming it does, it's meant for use with a bootloader (those addresses correspond to 256-byte and 512-byte bootloaders; Optiboot for both tinyAVR and Dx-series is a 512-byte bootloader).
- If the starting address is 0 and it doesn't start with a vector table, you should see a vector table not far from there, likely at 0x0200. This indicates a flash dump from a chip that did have a bootloader.
- In this last case, that first page before the vector table is the bootloader (it will likely start with a jump instruction at 0x0000, but that won't be followed by a thick tangle of same).
- At some page boundary (page size varies with flash size; see the datasheet), you may see a gap in the data, or a number of consecutive empty (0xFF) bytes immediately before the end of one page, before code resumes at the start of the next page. The bootloader erases page by page, so if you upload a shorter sketch than the previous one, there will be a page containing the end of the new, shorter application, followed by some empty flash, and then the later pieces of the code that was previously there. That leftover code is theoretically unreachable in correctly functioning code. However, corrupting the stack pointer or the contents of the stack can overwrite a return address and send execution to a location which may be non-deterministic (e.g., if the return address was overwritten with data from a sensor through an array overrun when taking multiple samples). Without a bootloader, or on a freshly bootloaded chip, an overwritten address that works out to a location after the end of the application triggers a dirty reset. With a bootloader that had previously loaded a longer application, execution could instead land in the debris left behind by previously installed applications, which in turn could easily bounce it back into some random point in your application.
- That's not an ideal situation, and I'm looking at how I can improve this while keeping the bootloader under 512B.
- Note that similar patterns may be seen if some data is stored in PROGMEM_MAPPED on a Dx-series part. This has similar consequences in terms of making "stack smashing" bugs behave more bizarrely. While they render the application unusable either way, this spooky behavior can make it harder to determine that this is what's happening, slowing the debugging process.
- On AVR Dx-series parts with >32k of flash, if you have declared anything PROGMEM_MAPPED and didn't fill the flash all the way to there, then when you look at a dumped hex file extracted from hardware, you will obser
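The visual check described in this list can also be done mechanically. The sketch below assumes the flash contents have already been loaded into a `bytes` object (parsing the Intel HEX records themselves is left out), and relies on the encodings in the AVR instruction set manual: rjmp is `1100 kkkk kkkk kkkk`, and jmp is `1001 010k kkkk 110k` followed by 16 more address bits, with each word stored low byte first. The names `decode_vector` and `looks_like_vector_table`, and the choice of checking 8 slots, are hypothetical.

```python
import struct


def decode_vector(data, offset):
    """Decode the instruction at byte `offset`.

    Returns (mnemonic, target_byte_address, length_in_bytes), or None if
    the word there is neither an rjmp nor a jmp.
    """
    if offset + 2 > len(data):
        return None
    (w1,) = struct.unpack_from("<H", data, offset)
    if w1 & 0xF000 == 0xC000:              # rjmp: 1100 kkkk kkkk kkkk
        k = w1 & 0x0FFF
        if k & 0x0800:                     # sign-extend the 12-bit word offset
            k -= 0x1000
        return ("rjmp", offset + 2 + 2 * k, 2)
    if w1 & 0xFE0E == 0x940C:              # jmp: 1001 010k kkkk 110k
        if offset + 4 > len(data):
            return None
        (w2,) = struct.unpack_from("<H", data, offset + 2)
        high = (((w1 >> 4) & 0x1F) << 1) | (w1 & 1)
        return ("jmp", 2 * ((high << 16) | w2), 4)
    return None


def looks_like_vector_table(data, offset, slot_bytes, slots=8):
    """True if `slots` consecutive vector slots each begin with rjmp or jmp."""
    return all(
        decode_vector(data, offset + i * slot_bytes) is not None
        for i in range(slots)
    )


# Made-up 4-byte-vector table: reset jumps to word 0x005C, the rest to 0x006E.
table = bytes.fromhex("0C945C00") + bytes.fromhex("0C946E00") * 7
print(decode_vector(table, 0))                       # ('jmp', 184, 4)

# A dump with 512 bytes of (here: blank) bootloader area in front of the table.
dump = b"\xff" * 0x200 + table
print(looks_like_vector_table(dump, 0x0000, 4))      # False
print(looks_like_vector_table(dump, 0x0200, 4))      # True
```

Only the first instruction of each slot is examined, so a 4-byte slot holding an rjmp plus padding is accepted as well.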
BAD_ISR
BAD_ISR is an error handling routine, though it might be better described as a mishandling routine. All it does is jump to 0x0000, as if that did a software reset. It doesn't: it restarts the program, but all the special function registers still contain their old values. This is rarely desirable behavior, and the typical result is a hung or bootlooping system. megaTinyCore and DxCore therefore detect dirty resets, either in the bootloader or very early in initialization if not using one, and fire a software reset if one is seen, so that things have a prayer of working.
This detection method is the reason why those cores always clear the reset flag register at startup (the value it held is stored in GPIOR0 in case the application needs to know). A real reset always sets at least one reset flag. So if the flags were cleared at the previous startup and execution arrives back at 0x0000 with none of them set, no reset has occurred, and we know we need to software reset into a known state. Code across the core and bootloader assumes that the chip is starting from a fresh reset, and if that assumption is violated, all manner of things malfunction.
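The logic of that check can be modeled in a few lines. This is a toy simulation, not the cores' actual startup code (which runs on the AVR, not in Python): the `Chip` class, the `startup` function, and the flag bit values `PORF = 0x01` and `SWRF = 0x10` are assumptions made for illustration, and the restart that follows a real software reset is folded into a single call.

```python
PORF = 0x01  # assumed bit for the power-on reset flag
SWRF = 0x10  # assumed bit for the software reset flag


class Chip:
    """Toy model holding only the two registers the check cares about."""

    def __init__(self):
        self.rstfr = PORF    # powering up sets a reset flag
        self.gpior0 = 0x00

    def software_reset(self):
        # On real hardware this also returns every peripheral register to
        # its default value and restarts execution from the reset vector.
        self.rstfr = SWRF


def startup(chip):
    """Early-init check. Returns True if a dirty reset was caught."""
    dirty = chip.rstfr == 0
    if dirty:
        # No flag set: we arrived at 0x0000 by a jump, not by a reset.
        chip.software_reset()
    chip.gpior0 = chip.rstfr  # stash the cause for the application
    chip.rstfr = 0            # clear, so the next arrival can be judged
    return dirty


chip = Chip()
print(startup(chip), hex(chip.gpior0))   # False 0x1  (clean power-on)
# BAD_ISR jumps to 0x0000 without resetting anything:
print(startup(chip), hex(chip.gpior0))   # True 0x10 (dirty reset trapped)
```

In this model, an application that finds the software reset bit in GPIOR0 without having requested a software reset itself has just been through the dirty reset trap.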
Reminder: Dirty resets are super critical bugs.
You must treat any unexpected reset that is shown to come from the dirty reset trap as a critical bug. That is the case when, after the reset, GPIOR0 indicates that the software reset flag was set on startup and you didn't do a software reset yourself. A random reset from hardware or power causes, while still a serious defect, should scare you less, because those are exceedingly unlikely to result in arbitrary code execution, while software bugs that cause dirty resets are almost certain to be capable of executing arbitrary code. Anything getting dirty resets is unfit for release. Full stop, no exceptions. If you see a dirty reset, you must not rest until it has been understood and rectified.
We do the best we can to keep these events from leaving the chip hung, but that only makes it easier to work with the chip while debugging it (especially if the problem is discovered where it is difficult to power cycle it). The chance that we will be able to turn one of these events into a clean reset is never 100%, and depends on a near infinitude of details. It is the responsibility of the application programmer to correct the bug in their code or, if they are unable to, to hire a consultant who can, before releasing any product containing that code to the public. These bugs, if studied in sufficient detail, can often be leveraged by adversaries to gain full control over vulnerable systems that have any sort of network connection, and from there a malicious actor could use more sophisticated methods to infect other devices on the network.