Follow @jack_ganssle

What Goes In Must Come Out

FIFOs are hardware analogs of a sort of reverse stack. Here's how they work.

Published in Embedded Systems Programming, July 1993

The logo for The Embedded Muse For novel ideas about building embedded systems (both hardware and firmware), join the 27,000+ engineers who subscribe to The Embedded Muse, a free biweekly newsletter. The Muse has no hype, no vendor PR. It takes just a few seconds (just enter your email, which is shared with absolutely no one) to subscribe.

By Jack Ganssle

It seems no computer text is complete without a comprehensive (read "yawn") discussion of simple data structures. Favorite subjects include stacks, queues and dequeues (sometimes spelled "deque"). Though these are vitally important subjects, the chasm that separates hardware and software design often makes us forget that hardware versions of these concepts have been around for years.

Few cigar-chomping, whisky guzzling, grizzled old TTL veterans may recognize the name "queue", but FIFOs, functionally the same concept, have been a prime part of data communications devices for decades. FIFO is yet another example of the wonderful way we techies torture the English language, converting the adjective phrase "First In First Out" to the noun FIFO. Drifting around the Embedded Systems Conference, overhearing scraps of conversation, I could only think of poor Professor Higgins; our nounized, acronym-ridden speech could make him abandon his Pygmalion quest in frustration. "70% chance precip in plains of Spain".

LIFOs, Last In First Out buffers, seem to be rarely used in embedded hardware. Actually, I can't think of a single example, as the concept seems to violate the nature of real time systems where you want to respond right now to an event. I'm sure someone will read this and prove me wrong. Great!

Of course, we all use LIFOs as software stacks, though without any unique embedded twist. All modern processors give us some sort of hardware to deal with stacks easily. Did you know RCA's ancient 1802 did not? The 1802 was the first ultra-low power micro, which had no inherent subroutine mechanism. The user was forced to write a bit of nasty code to handle calls and returns. I rather like the system pioneered by the PDP-11, and later adopted by Motorola, of providing an auto increment addressing mode so you can use any register as the stack pointer.

In some cases this is a significant advantage. A single stack, as provided by the 80x88 and other families, is not always enough. Years ago I wrote a multitasking compiler targeted at embedded applications (it was a flop); the doubly recursive expression parser used 3 stacks to track operands and their modes. The 68000 version was a breeze to code; others were much more trying.

Hardware FIFOs

A surprising number of embedded systems use FIFOs built of hardware, as opposed to a software implementation. The most common application seems to be data synchronization. If the processor cannot handle the incoming data stream, a FIFO may be a logical addition to queue up the bytes till the CPU is ready for them.

An example is the Compact Disk player. Musical data must be played at a predefined, exact rate, yet the error correction algorithm, platter speed variations, etc., all conspire against this. The FIFOs keep enough data backlogged to insure that despite all of these problems, time domain fidelity is maintained.

Specialty chip vendors, like IDT and Cyprus, make a wide variety of hardware FIFOs in chip form. Interestingly enough, most are 9 bit wide, reflecting their wide usage in the data communications industry (a byte plus parity). Hardware FIFOs are defined by their width (typically 9 bits), depth (how many 9 bit "words" can be queued - typically 1k to 4k), and speed.

Conceptually, a FIFO is nothing more than dual port RAM with whose addresses are created by internal counters. One counter places bytes sequentially in the array, while another tracks removal. Unlike a conventional RAM, you can read and write from most FIFOs at the same time.

The devices are quite simple, consisting of data in and out busses (permitting simultaneous reads and writes), a write line that clocks data in, a read line to clock data out, and one or more flag bits indicating FIFO Full, Empty or sometimes even Half Full.

Each data byte is strobed into the device with the write line. If the FIFO is 1k deep, than, assuming no reads take place, after 1024 write cycles the FIFO Full flag is asserted and each additional write generally deletes the oldest data and clocks in the new. Thus, just like a software FIFO, it keeps only the most recent 1024 data items.

Any read pulls the oldest saved data from the chip. If the device was full, a single read will clear the FIFO Full flag. In most applications data is being written and read from the device asynchronously, the Full, Empty, and Half-Full flags may be toggling erratically to reflect the chaotic nature of the synchronization between input and output devices.

Different devices behave in startling different ways. I've used the IDT 7100 family extensively, and have found that once you exceed the chip's depth, it does in fact not start to save the newest data only. In fact, it stops accepting new data entirely. A close examination of the data sheet shows (always, closely examine the data sheets - it's surprising what you'll find) that once the FIFO fills you must read it before writing to it - the read clears the oldest data out, making room for the new byte. In effect, they've put part of the burden of making a real FIFO on the user. It's not too difficult to implement, but requires close attention to the timing specs.

How do you decide to use a hardware FIFO, as opposed to a purely software implementation handled in an interrupt service routine? I generally prefer software solutions, as the recurring costs (that is, manufacturing costs) are essentially zero, whereas every piece of hardware detracts from the company's bottom line. I know of only two situations where a hardware FIFO makes sense in an embedded system. The first is dealing with high-speed bursts of data. The second is precise synchronizing of data to time.

Burst Data

High speed data bursts can exceed any CPU's processing capability. A hardware FIFO is a natural choice if the data arrives (or leaves) so fast that an interrupt service routine cannot keep up, or would use too much of the total compute resources available.

But beware! FIFOs are no magic bullet. They are primarily for dealing with bursts of high speed data. If the processor cannot keep up with a data stream, than a FIFO will help only if the line goes idle from time to time. In effect, the FIFO spreads the data burst out over a longer time frame, giving the CPU a chance to catch up. Sustained, high speed data, will defeat the FIFO every time.

No doubt there is some clever way to compute the required depth of a FIFO as a function of data rate, available processor time to read the data from the FIFO, and size of the data burst. I'm reminded of a great example from elementary calculus: the perfect Martini is 1 part Vermouth to 5 parts Gin. If a frat house starts pouring x gallons of Gin per minute into a bathtub with a 1 inch hole in it, and 3 minutes later starts adding Vermouth at y gallons per minute, when will the mixture be perfect?

A thorough analysis is probably impossible (of the FIFO problem - the Martini is left as a student exercise), as it will also be a function of the other interrupts in the system. If the CPU is shut down periodically handling lots of high priority external devices, be sure that the FIFO is deep enough to catch all data arriving during the interrupts-off period.

Certainly, if the maximum number of bytes per second is greater than the number the processor can handle, no FIFO will be deep enough to guarantee that every data byte is read.

I've never resorted to a sophisticated mathematical treatment of this, as it's awfully hard to quantify the behavior of the software. Instead, I weigh the parameters and try to get a gut feel for likely depth of the queued data. I know - it's crude, but so far has been effective. It's important, though, to take some measurements after the system is complete to see just how close your estimates are. Look at the Half-Full flag on an oscilloscope - is it always asserted? This would be a sure sign of marginal design.

Time Synchronizing

Another important application is precision time synchronizing of data. Processors cannot measure the exact time of arrival of data items, as even the fastest interrupt structure has some latency. In the best of cases, with interrupts always enabled and the data interrupt given the highest priority, just the uncertainty due to varying instruction execution times will cause perhaps a microsecond or more of dither. If the data could arrive during the servicing of another interrupt, with interrupts off, then this uncertainty could be much, much longer.

By using multiple FIFOs the processor can use an external time base to get exact arrival times of the data. Put the data in one FIFO, and the time base in another. The devices will be organized logically as one very wide FIFO. The time will be queued exactly with the data.

The reverse is true as well. In the example of playing a Compact Disk, any time skew in the data provided to the digital to analog converters will result in signal distortion. You just cannot predict with any certainty the exact timing of a CPU, which can resemble the harried executive responding to many different requests, from interrupts to DMA to refresh generation to irregular instruction timing. So, the processor can write data to FIFOs, while a precision time base clocks data out of them. The FIFO then becomes a metronome-like conductor, taking erratically timed data and playing it back with a regular beat.

Design Considerations

It's easy to connect a FIFO to a microprocessor, but be careful to properly use the FIFO's status flags.

Most designs seem to connect FIFO FULL to an interrupt line. Conceptually simple, this results in the processor being interrupted only after the entire buffer is full. If a little extra latency causes a short delay before the CPU reads the FIFO, then an extra data byte arriving before the FIFO is read will be lost.

An alternative is using the inverse of FIFO EMPTY. A single byte arriving will cause the micro to read the FIFO. This has the advantage of keeping the FIFOs relatively empty, minimizing the chance of losing data. It also makes the biggest demand on CPU time, generating interrupts with practically every byte received.

I think it makes more sense to connect FIFO HALF-FULL, if the signal exists on the FIFOs you've selected, to the interrupt line. HALF-FULL is a nice compromise, deferring processor cycles until a reasonable hunk of data is received, yet leaving free buffer space for more data during the ISR cycles.

Some processors do amazing things to service an interrupt, stacking addresses and vectoring indirectly all over memory. The ISR itself no doubt pushes lots of registers, perhaps also preserving other machine information. If the HALF-FULL line generates the interrupt, than you have the a priori information that lots of other data is already queued and waiting for processor time. Save overhead by making the ISR read the FIFOs until the EMPTY flag is set. You'll have to connect the EMPTY flag to a parallel port, so the software can read it, but the increase in performance is well worth it.

In mission-critical systems it might also make sense to design a simple circuit that latches the combination of FIFO-FULL and an incoming new data item. This overflow condition could be disastrous, and should be signaled to the processor.


I hate to get back on my noise soapbox, but FIFOs seem to be particularly sensitive to electronic noise and glitches. Some older FIFOs had no hysteresis on the write line, so the smallest glitch could cause erratic extra writes. This was particularly apparent with wide systems, where several FIFOs were ganged in parallel to achieve an 18 bit or larger word. A little noise could cause the FIFOs to get out of sequence, with each one at different fill states.

The 7202 1k by 9 bit FIFOs were particularly subject to input noise, though newer designs have largely cured these problems. In one case, a company I know observed a 1 nsec glitch that caused strange clocking. 1 nsec - that's so fast that most scopes will not see it. Even though the problem has been largely eliminated with new silicon, most vendors recommend the use of high quality multilayer circuit boards and careful transmission line design practices around the write line.


It's tempting to resort to a hardware FIFO as a alternative to careful software design. Resist this impulse unless you have a compelling reason! It's surprising just how often designers add hardware to make up for either lazy or poor software design.

If you elect to use a software FIFOing scheme, though, model it like a hardware device. You never know what might get changed later. Access it via device drivers, perhaps even using a software interrupt or OS call to queue data in and out. One mantra of the 90's, a fundamental part of object oriented programming, is surely data abstraction.