
Refreshing Software

Refresh is yet one more thing that software can, in some situations, replace.

Published in Embedded Systems Programming, April 1992


By Jack Ganssle

In his wonderful book Microcosm (Touchstone Books, NY, NY), George Gilder predicts that, with a few exceptions, the semiconductor industry will one day concentrate more on the production of modest-volume specialty chips than on huge runs of generic ICs. This trend is already apparent, especially in the proliferation of I/O controllers. It doesn't matter whether you are using SCSI, Ethernet, SDLC, hard disks, stepper motors, or any of a hundred other peripherals: at least a half dozen vendors offer some highly integrated controller for your application.

Yet sometimes these parts are not really appropriate. A mass-produced, highly cost-sensitive product like an electronic toy generally can't tolerate the relatively high price of these chips. The classic ultra-low-cost embedded controller is the electronic greeting card. I'm sure the designers replaced every last fraction of a yen of hardware with smart code.

Several months ago (December 1991) I described how in some applications you can replace a UART with bit-banging firmware. Osterberg Consulting (San Marcos, CA) sent me a version of an interrupt-driven software UART for the 8051. Beautifully coded, it uses only about 15% of the processor's time at 4800 baud with an 11.0592 MHz crystal. It's an example of replacing expensive hardware with a clever idea and a lot of high-tech elbow grease.
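To make the bit-banging idea concrete, here is a minimal transmit-only sketch in C. It is not the Osterberg code, just an illustration under stated assumptions: 8N1 framing, a timer interrupt that fires once per bit time (1/4800 s, about 208 microseconds, at 4800 baud), and a caller that drives the TX pin with whatever `uart_tx_tick()` returns. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 8N1 software transmitter. A timer interrupt calls
   uart_tx_tick() once per bit time (about 208 us at 4800 baud) and
   drives the TX pin with the value it returns. */

static uint16_t tx_shift;  /* frame bits still to send; bit 0 goes next */
static uint8_t  tx_count;  /* number of frame bits left (0 = idle)      */

/* Queue one byte. Returns 1 if accepted, 0 if a frame is still going. */
int uart_tx_start(uint8_t byte)
{
    if (tx_count != 0)
        return 0;
    /* bit 0: start bit (0); bits 1-8: data, LSB first; bit 9: stop bit (1) */
    tx_shift = (uint16_t)((1u << 9) | ((unsigned)byte << 1));
    tx_count = 10;
    return 1;
}

/* Called from the bit-rate timer ISR; returns the level for the TX pin. */
int uart_tx_tick(void)
{
    if (tx_count == 0)
        return 1;                  /* idle line rests at mark (high) */
    int level = tx_shift & 1u;
    tx_shift >>= 1;
    tx_count--;
    return level;
}
```

Pre-building the whole frame in one shift register keeps each tick to a mask, a shift, and a decrement. A receiver needs more work, typically sampling the line faster than the bit rate to find the center of each bit; that part is not shown.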

Lots of other I/O can be handled inside of the processor. Be sure you understand the magnitude of the software before starting, though. I aged about a decade doing a software implementation of GP-IB some years back - it just wasn't worth the grief.


A lot of embedded systems use what is in effect a hardware state machine to continuously write data to displays or other hardware. For example, a VGA card constantly copies a stream of bits from video memory to the CRT. Sixty times a second the hardware repaints the screen, fooling your eye into thinking it sees a stable display.

Where bit rates are lower and costs are paramount it might make sense to replace the hardware state machines with firmware. Video, however, is so fast it is unrealistic to consider using code to refresh the screen.


Light Emitting Diodes (LEDs) are common output devices on inexpensive embedded systems. Both seven segment displays and ASCII arrays are used.

Seven segment displays are, as the name implies, seven "lines" formed of LEDs, arranged so that by judicious line selection all numbers and some characters can be displayed. They are about the cheapest way to show numeric results. ASCII displays, on the other hand, are composed of lots of little LED bulbs ("a thousand points of light"), which can show any alphanumeric character. They are considerably more expensive than the simpler seven segment displays.

Some of these come with internal latches and drivers, so they can essentially be just hung on the computer's data bus. Quite a few have no internal electronics. The designer must provide both a driver (an amplifier that converts the computer's logic levels to much higher LED currents) and an interface to the computer bus.

The interface is quite a problem. Consider the case where a system includes 8 digits of seven segment displays. Each one needs a high power driver chip and a latch to hold the value written to the digit by the program. This could amount to as many as 16 chips!

A better solution (one that is used by most cheap systems including digital watches) is to arrange the displays in a matrix. The seven segment display is, after all, little more than seven diodes with one end connected together. 8 wires come from the package: 1 common connection point (the "digit enable" line), and 7 individual segment wires.

If we're putting 8 of these displays in a system, tie each segment lead to the same segment lead on every other package. The result is a new level of abstraction: a package of 8 displays, with 7 segment enables and 8 digit enables coming out. If you put power on one of the digit enables and a seven segment code on the segment bus, then one display, corresponding to the powered digit line, will light.

Connect the 8 digit enables to 8 high power drivers (one IC), and to an octal latch on the computer bus (one more chip). Tie the 7 segment bus lines to another driver and latch (2 more chips). Now we're talking 4 chips instead of 16.

The firmware turns on any single digit by sending a seven segment code to the segment latch and a 1-of-8 select to the digit latch. The computer can obviously turn on any one display at a time, but there is no provision to turn them all on simultaneously.

The secret lies in the eye's persistence. The software should turn on one digit for a few milliseconds, then do the next one, and so on through the entire array. By repeating this cycle at a high speed the eye is fooled into thinking all of the digits are on, when really only one is at any point in time. It's a little like TV, where a complete picture is formed by a rapidly moving dot.

You can buy controller chips to handle this display multiplexing, but why bother? Use spare processor time (if any!) to sequence the refresh cycle.

Lashed up as described, the entire array of displays looks like two I/O ports to the code. The digit select port is always all zeroes with only a single one set, the position of which selects one of the 8 displays. The segment port is just the seven segment code required by the currently-selected display.

Use a timer to generate a sequence of interrupts. What? You don't have a timer? You can sometimes create a "fake" interrupt by calling the refresh routine from the code's main line, but it can be tough to ensure the calls come often enough in all operating modes to keep the displays flashing fast enough.

In general I'd take a timer over any other peripheral. With a timer you can do wondrous things: generate accurate bit patterns, run a preemptive real time operating system, and the like. A timer can help make up for a lot of deficiencies in the hardware, but it's awfully hard to make the software run well in the absence of a timer.

As an aside... no matter how small your embedded system is, seriously consider putting in at least a simple real time operating system. A tiny RTOS uses practically no resources (other than a timer interrupt and a bit of memory). An RTOS is ideal for responding to real time events. However, far too many embedded systems start off with no RTOS, only to have one shoehorned in in desperation late in the development cycle. It's a lot easier to start off with an RTOS and use only a little of its power than to rewrite the code to adapt to one later.

To resume: on each timer interrupt simply change the digit port to select the next display, put the appropriate segment code in the other port, then return. The interrupt service routine will be short and fast, demanding little of the processor, so the 12 chips we saved earlier cost little in CPU overhead.

Of course, use a sane approach to handling the ports. Rule 1 of interrupt handling is to keep the service routine short and simple! Too many applications force the ISR to convert an ASCII or numeric code to segment selection values on every interrupt. This is foolish.

Build a little table with one entry per digit (8 in the case we've been discussing). The table is global to both the interrupt service routine and to a driver called every time the firmware wishes to change the displayed value.

The driver most likely will accept an 8 digit string of character or integer data from the calling routine. It converts this to 8 segment values, one per digit, and places these in the table. It's short and sweet.
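A minimal sketch of such a driver in C follows. It assumes a hypothetical wiring in which segment a drives bit 0 of the segment port, b bit 1, and so on through g on bit 6; real boards vary, so the codes must match your hardware. `display_write()` and `seg_table` are illustrative names. Digits and a minus sign are translated; anything else, including the positions past the end of a short string, shows as blank.

```c
#include <stdint.h>

#define NUM_DIGITS 8

/* Segment codes for 0-9, assuming bit 0 = a ... bit 6 = g. */
static const uint8_t digit_codes[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* One entry per digit, shared with the refresh ISR. */
volatile uint8_t seg_table[NUM_DIGITS];

static uint8_t char_to_segments(char c)
{
    if (c >= '0' && c <= '9')
        return digit_codes[c - '0'];
    if (c == '-')
        return 0x40;                 /* segment g only */
    return 0x00;                     /* blank for anything else */
}

/* Convert up to NUM_DIGITS characters into segment codes; pad with blanks. */
void display_write(const char *text)
{
    for (int i = 0; i < NUM_DIGITS; i++) {
        char c = *text;
        if (c != '\0')
            text++;                  /* stop advancing at the terminator */
        seg_table[i] = char_to_segments(c);
    }
}
```

On an 8-bit processor each table entry is written with a single store, so the ISR never sees a half-written code; at worst one scan shows a mix of old and new digits, which is unlikely to be visible.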

The ISR looks like:

    1. Push registers
    2. Put a zero to the digit port
    3. Load pointer to table
    4. Load value from table[pointer]
    5. Put value to segment port
    6. Increment pointer (modulo table size) and save
    7. Load digit byte
    8. Rotate left (bit 7 wraps to bit 0) and save it
    9. Put to digit port
    10. Restore registers
    11. Return
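Here is the same ISR sketched in C, under some assumptions: `digit_port` and `segment_port` are hypothetical stand-ins for the two output latches (on real hardware they would be memory- or I/O-mapped registers), `seg_table` holds the precomputed segment codes, and the compiler-specific interrupt keyword, which takes care of steps 1 and 10, is omitted. Step 8 is done as a rotate rather than a plain shift, so the single 1 bit wraps from the last digit back to the first instead of falling off the end.

```c
#include <stdint.h>

#define NUM_DIGITS 8

/* Hypothetical stand-ins for the digit and segment latches. */
volatile uint8_t digit_port;
volatile uint8_t segment_port;

/* Segment codes, one per digit, filled in by the display driver. */
volatile uint8_t seg_table[NUM_DIGITS];

static uint8_t scan_index = 0;       /* pointer into seg_table             */
static uint8_t digit_select = 0x80;  /* rotates to 0x01 on the first tick  */

/* Timer interrupt handler, called about once a millisecond. */
void display_refresh_isr(void)
{
    digit_port = 0;                                /* step 2: blank, no ghosting  */
    segment_port = seg_table[scan_index];          /* steps 3-5                   */
    scan_index = (uint8_t)((scan_index + 1) % NUM_DIGITS);  /* step 6             */
    digit_select = (uint8_t)((digit_select << 1) | (digit_select >> 7)); /* 7-8   */
    digit_port = digit_select;                     /* step 9: light one digit     */
}
```

The digit byte starts at 0x80 so that the first rotate selects digit 0 at the same moment the pointer selects table entry 0; keeping those two in step is the one detail the list leaves implicit.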

On some processors the ISR needs barely more instructions than the 11 steps shown.

Don't forget step 2. Though not strictly needed (depending on the system's speed), if it is left out the new segment pattern appears on the previously selected digit for a few microseconds, perhaps creating a ghost image.

The refresh rate is a function of the number of displays (more displays need a faster update) and the persistence of the eye. For 10 or so digits I find an interrupt every millisecond more than adequate. A 1 msec ISR that takes, say, 15 microseconds to run requires only 1.5% of the CPU's time.
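The arithmetic is easy to do by hand, but two small helpers make the trade-off explicit. With one digit lit per tick, each digit is refreshed at 1 / (tick period x number of digits), so 8 digits on a 1 ms tick each light up 125 times a second, and each digit is lit for only 1/8 of the time. The function names are illustrative.

```c
/* Per-digit refresh rate, in Hz, when the ISR lights one digit per tick. */
double digit_refresh_hz(double tick_us, int digits)
{
    return 1e6 / (tick_us * digits);
}

/* Share of CPU time consumed by the refresh ISR, in percent. */
double isr_load_percent(double isr_us, double tick_us)
{
    return 100.0 * isr_us / tick_us;
}
```

For example, `digit_refresh_hz(1000.0, 10)` gives 100 Hz for ten digits, and `isr_load_percent(15.0, 1000.0)` reproduces the 1.5% figure.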

Though I've focused on LED displays, the same technique works on Liquid Crystal Displays (LCDs). However, a lot of big LCDs with multiple ASCII characters include on-board refresh, removing any need for software support.

DRAM Refresh

Dynamic RAMs (DRAMs, pronounced "Dee RAM") are the cheapest form of high-speed rewritable data storage. Each bit needs only a single transistor, paired with a tiny capacitor; the charge on that capacitor memorizes the last value written to the cell.

Obviously, there are no perfect insulators. The capacitance is so tiny that the charge bleeds off within a few milliseconds. In other words, without help, every cell in the DRAM forgets in the blink of an eye.

Like LED displays, DRAM cells are arranged in an X-Y matrix. A simple read from every row (X line) once every few milliseconds suffices to recharge the capacitors and keep the contents of the device intact. This refresh cycle is crucial to proper operation of any DRAM, although it adds a layer of complexity to the hardware.

If DRAMs seem a tenuous affair, remember that there is good reason for the approach. A DRAM cell needs only a single transistor, three fewer than the simplest static RAM cell. As a result, DRAMs always offer much higher memory density. The two technologies move more or less in lockstep, with static densities about 4 years behind those of dynamics.

Conventional refresh controller ICs include a counter that generates every row address and feeds it to the DRAM chips as required. Often several chips are needed, because modern 1 Mb DRAMs require a 9-bit refresh address (512 rows). Most 1 Mb DRAMs need all 512 row addresses every 8 milliseconds to guarantee data retention.

A lot of embedded systems eliminate the need for a distinct DRAM controller by using a DMA channel to manage the refresh. The original PC works this way. It's interesting to look through the BIOS listings. RAM is just not available until the BIOS programs the DMA controller to start refresh cycles going.

DMA is the perfect solution to the refresh problem. Generate null DMA reads from sequential addresses. Program the controller to run over and over, without computer intervention. This is especially attractive on modern high integration controllers like the 80186 with built-in DMA channels.

Still, some systems might not have a spare DMA channel. It is possible to generate refresh completely under software control, but pay careful attention to the firmware's timing. Though I've never built a system around software refresh, I've seen several successful implementations.

The trick is to write a really tight interrupt service routine that does little more than a read from incrementing addresses - fast.

A timer invokes the refresh interrupt service routine. The interrupt period is dictated by the specifications of the DRAM chips. Take the 1 Mb Hitachi HM511000, for example. It requires 512 refresh cycles, all of which must be completed within 8 milliseconds. Spread evenly, that works out to one complete interrupt service every 15.6 microseconds (8 ms / 512). While blazingly fast, it is not (quite) impossible. Be wary of other interrupting devices that could create untenable latency problems.
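The interval calculation can be done in integer nanoseconds, which avoids floating point on a small target. The function name is hypothetical; the 8 ms and 512-row figures are the HM511000 numbers quoted above.

```c
#include <stdint.h>

/* Longest allowable spacing between evenly distributed refresh cycles,
   in nanoseconds: the retention period divided among all the rows. */
uint32_t refresh_interval_ns(uint32_t retention_ms, uint32_t rows)
{
    return (retention_ms * 1000000u) / rows;
}
```

`refresh_interval_ns(8, 512)` yields 15625, or 15.625 microseconds. The timer period has to be no longer than that, minus whatever worst-case latency other interrupts add.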

The ISR must be highly optimized to present minimal CPU overhead. Typically, it should contain the following steps:

    1. save processor state
    2. load next refresh address
    3. do a read from that address
    4. increment and store refresh address
    5. restore processor state
    6. return
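Since a refresh scheme that runs slightly too slowly may only lose data intermittently, it can be reassuring to model it on a host before trusting it. The sketch below is a simulation, not target code: instead of reading DRAM, the ISR stamps each row with the current time, and the loop reports the oldest any row ever gets. The 15 microsecond tick, the names, and the 16-bit address counter are assumptions chosen to mirror the steps above.

```c
#include <stdint.h>

#define DRAM_ROWS     512u    /* 1 Mb part: 9-bit row address           */
#define RETENTION_US  8000u   /* every row must be read within 8 ms     */
#define TICK_US       15u     /* hypothetical timer period, < 15.625 us */

static uint32_t last_refresh_us[DRAM_ROWS];  /* when each row was last read */
static uint32_t now_us;                      /* simulated time              */
static uint16_t refresh_addr;                /* wraps through 16 bits       */

/* Model of the refresh ISR: "read" the next row, then advance the address.
   Only the low 9 bits select a row, and 65536 is a multiple of 512, so the
   16-bit wrap never skips a row. */
static void refresh_isr(void)
{
    last_refresh_us[refresh_addr % DRAM_ROWS] = now_us;  /* the dummy read */
    refresh_addr++;
}

/* Run `ticks` timer interrupts; return the oldest row age observed, in us. */
uint32_t simulate_refresh(uint32_t ticks)
{
    uint32_t worst = 0;
    for (uint32_t t = 0; t < ticks; t++) {
        now_us += TICK_US;
        refresh_isr();
        for (uint32_t r = 0; r < DRAM_ROWS; r++) {
            uint32_t age = now_us - last_refresh_us[r];
            if (age > worst)
                worst = age;
        }
    }
    return worst;
}
```

With a 15 microsecond tick the worst row age comes out at 511 x 15 = 7,665 microseconds, inside the 8 ms limit; stretch `TICK_US` to 16 and it climbs to 8,176 microseconds, which would fail.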

If your entire application is in assembly language you can greatly shrink the ISR by dedicating a register to the refresh address. This eliminates the memory load in step 2 and the store in step 4. In Z80 assembly language, the ISR could look like:

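        push  af          ; save processor state (A and flags)
        ld    a,(bc)      ; dummy read refreshes the addressed row
        inc   bc          ; next refresh address
        pop   af          ; restore processor state
        ei                ; re-enable interrupts (RETI alone leaves them off)
        reti              ; return from interrupt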

Register pair BC is the refresh address. Though the DRAMs really only need a 9-bit counter, it is much faster to just let BC wrap through 16 bits; since 65,536 is a multiple of 512, the low 9 bits still cycle through every row.

An assembler that counts T-states is really handy in this sort of application, since it makes it easy to figure out how long the ISR takes to run. The old SLR assembler had this feature, but I don't know of a modern product that supports it.
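Lacking such an assembler, the tally can be done by hand, or in a few lines of C as below. The counts are the standard Zilog Z80 timings. The list includes an `ei` before `reti`, which a Z80 maskable-interrupt handler needs because accepting the interrupt leaves interrupts disabled. It assumes interrupt mode 2, whose acknowledge cycle costs 19 T-states, and ignores the wait for the interrupted instruction to finish. The struct and function names are illustrative.

```c
#include <stdint.h>

/* Z80 T-state counts for the refresh handler, one entry per instruction. */
struct insn { const char *text; uint32_t tstates; };

static const struct insn isr_listing[] = {
    { "push af",   11 },
    { "ld a,(bc)",  7 },
    { "inc bc",     6 },
    { "pop af",    10 },
    { "ei",         4 },
    { "reti",      14 },
};

#define IM2_ACK_TSTATES 19u   /* interrupt acknowledge in mode 2 */

/* Total T-states for the handler, optionally including the acknowledge. */
uint32_t isr_tstates(int include_ack)
{
    uint32_t total = include_ack ? IM2_ACK_TSTATES : 0u;
    for (unsigned i = 0; i < sizeof isr_listing / sizeof isr_listing[0]; i++)
        total += isr_listing[i].tstates;
    return total;
}

/* Execution time in nanoseconds at a CPU clock given in kHz. */
uint32_t tstates_to_ns(uint32_t tstates, uint32_t clock_khz)
{
    return (uint32_t)((uint64_t)tstates * 1000000u / clock_khz);
}
```

By this rough count the handler costs 71 T-states including the acknowledge, about 17.75 microseconds on a 4 MHz Z80, which is longer than the 15.6 microsecond budget computed for the HM511000. At 8 MHz it drops to about 8.9 microseconds, still more than half the CPU. Numbers like these are worth running before committing to software refresh.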


Don't get me wrong. I am a firm believer in using complex I/O controllers in most applications. However, where appropriate, software can and in some cases should replace the external hardware.

Actually, my biggest objection to these big I/O chips is the seemingly hundreds of control registers some of these monsters sport. We programmers can spend weeks trying to convert a cryptic 50 page data sheet into working code. Someday, the vendors will recognize that their job is not to make chips, but to provide value to the customer. Then, they'll give us usable canned code packages along with the raw hardware.