Real Time Programming
Basics of Real Time. Originally in Embedded Systems Programming, July, 1998.
By Jack Ganssle
Quick - come up with a one-sentence definition of "embedded"! With embedded PCs becoming common, and even MIPS chips buried in inexpensive consumer products, "embedded" is a term whose meaning is ever more nebulous.
So too for the designation "Real Time", a term whose meaning is more in the mind of the beholder than cast in linguistic concrete. In fact, the community recognizes this confusion by defining two sorts of real-time - "hard" and "soft".
A hard real time task or system is one where an activity simply must be completed - always - by a specified deadline. The deadline may be a particular time or time interval, or may be the arrival of some event. Hard real time tasks fail, by definition, if they miss such a deadline.
Notice this definition makes no assumptions about the frequency or period of the tasks. A microsecond or a week - if missing the deadline induces failure, then the task has hard real time requirements.
"Soft" real time, though, has a definition as weak as its name. By convention it's that class of systems that are not hard real time, though generally there is some sort of timeliness requirement. If missing a deadline won't compromise the integrity of the system, if generally getting the output in a timely manner is acceptable, then the application's real time requirements are "soft". Sometimes soft real time systems are those where multi-valued timeliness is acceptable: bad, better and best responses are all within the scope of possible system operation.
Interrupts are inextricably linked with real time systems, as only the interrupt bypasses the time-consuming tedium of polling multiple asynchronous inputs. Yet a surprising number of very fast applications are crippled by the overhead associated with servicing interrupts. Though chip vendors spec interrupt latency in terms of the time the hardware needs to recognize the external event, to firmware folks a more useful measure is the time from input to the point where we're doing something useful, which may be many dozens of clock cycles. The multiple levels of vectoring needed by the average processor, plus important housekeeping like context pushing, are all ultimately overhead incurred before the code starts doing something useful.
Similarly, real time operating systems (RTOS) are one of the most important and common tools in the real time arsenal. Yet the RTOS too provides no guarantee of real time response.
The first rule of real time design is to know the worst case performance requirements of each activity - and only then select the right implementation (CPU, hardware design, and firmware organization). It's important to think in the time domain as well as in the conventional procedural one.
Enter the RTOS
A real time application may employ the lowliest of 4 bit CPUs running a simple polled loop. In fact, these tend to be the most deterministic of all systems as their simple requirements are easy to understand, and to satisfy in a timely manner.
Fortunately - for our job security - most products today manage multiple independent inputs, outputs, and activities. Sure, with enough work a suitably convoluted polling program can handle many complex real time operations. We can also write our code entirely in hex codes.
Whenever an application manages multiple processes and devices, whenever one handles a variety of activities, an RTOS is a logical tool that lets us simplify the code and help it run better.
Consider the difficulty of building, say, a printer. Without an RTOS one monolithic hunk of code would have to manage the door switches and paper feeding and communications and the print engine - all at the same time. Add an RTOS and individual tasks each manage one of these activities; except for some status information no task needs to know much about what any other one is doing. In this case the RTOS allows us to partition our code in the time domain (each of these activities is running concurrently) and procedurally (each task handles one thing).
An important truism of software engineering is that code complexity - and thus development time - grows much faster than program size. Any mechanism that segments the code into many small independent pieces reduces the complexity; after all, this is why we write with lots of functions and not one huge main() program. Clever partitioning yields better programs faster, and the RTOS is probably the most important way to partition code in the time dimension.
At its simplest level an RTOS is a context switcher. You break your application into multiple tasks and allow the RTOS to execute the tasks in a manner determined by its scheduling algorithm. A round robin scheduler typically allocates more or less fixed chunks of time to the tasks, executing each one for a few milliseconds or so before suspending it and going to the next ready task in the queue. In this way all tasks get their fair shot at some CPU time.
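To make the round robin idea concrete, here is a minimal sketch of just the "pick the next ready task" decision. The names (task_t, rr_next) are invented for illustration; a real kernel would also save and restore each task's context and run this from a timer tick, none of which is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record: a real RTOS keeps far more state per task. */
typedef struct {
    bool ready;   /* true when the task has work to do */
} task_t;

/* Return the index of the next ready task after `current`, scanning
   circularly so every task gets a turn, or -1 if nothing is ready.
   The scan ends on `current` itself, so a lone ready task keeps running. */
int rr_next(const task_t tasks[], size_t n, size_t current)
{
    assert(n > 0 && current < n);
    for (size_t step = 1; step <= n; step++) {
        size_t candidate = (current + step) % n;
        if (tasks[candidate].ready) {
            return (int)candidate;
        }
    }
    return -1;
}
```

A tick handler would call rr_next() with the index of the running task each time its time slice expires, then switch context to whatever index comes back.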
Another sort of scheduler assigns priorities using RMA - Rate Monotonic Analysis. If the CPU is not too heavily loaded, it's sometimes possible to guarantee hard real time response by giving each task a priority inversely proportional to the task's period, so the task that runs most often gets the highest priority.
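How loaded is too loaded? One way to check is sketched below, assuming independent periodic tasks whose deadlines equal their periods. It uses the hyperbolic bound (a task set is schedulable under rate monotonic priorities if the product of each task's utilization plus one stays at or below 2), which is a sufficient test only and not something taken from this article. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    double wcet;    /* worst-case execution time, any consistent unit */
    double period;  /* activation period, same unit as wcet */
} rm_task_t;

/* Total CPU utilization: the sum of wcet/period over all tasks. */
double rm_utilization(const rm_task_t tasks[], size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].period > 0.0);
        u += tasks[i].wcet / tasks[i].period;
    }
    return u;
}

/* Sufficient (not necessary) test: deadlines are guaranteed if the
   product of (U_i + 1) over all tasks does not exceed 2. A false
   result means "not proven", not "will fail". */
bool rm_guaranteed(const rm_task_t tasks[], size_t n)
{
    double product = 1.0;
    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].period > 0.0);
        product *= tasks[i].wcet / tasks[i].period + 1.0;
    }
    return product <= 2.0;
}

/* Rate monotonic rule: the task with the shorter period wins. */
bool rm_higher_priority(const rm_task_t *a, const rm_task_t *b)
{
    return a->period < b->period;
}
```

The test is only as good as the worst-case execution times fed into it, which is one more reason to measure them.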
Regardless of scheduling mechanism, all RTOSes include priority schemes so you can statically and dynamically cause the context switcher to allocate more or less time to tasks. Important or time-critical activities get first shot at running. Less important housekeeping tasks run only as time allows. Your code sets the priorities; the RTOS takes care of starting and running the tasks.
If context switching were the only benefit of an RTOS then none would be more than a few hundred bytes in size. Novice users all too often miss the importance of the sophisticated messaging mechanisms that are a standard part of all commercial operating systems. Queues and mailboxes let tasks communicate safely.
"Safely" is important, as global variables, the old standby of the desperate programmer, are generally a Bad Idea and are deadly in any interrupt-driven system. We all know how globals promote bugs by being available to every function in the code; with multitasking systems they lead to worse conflicts as several tasks may attempt to modify a global all at the same time.
Instead, the operating system's communications resources let you cleanly pass a message without fear of its corruption by other tasks. Properly implemented code lets you generate the real time analogy of OOP's first tenet: encapsulation. Keep all of the task's data local, bound to the code itself and hidden from the rest of the system.
For instance, one challenge faced by many embedded systems is managing system status info. Generally lots and lots of different inputs, from door switches to the results of operator commands, affect total status. Maintain the status in a global data structure and you'll surely find it hammered by multiple tasks. Instead, bind the data to a task, and let other tasks set and query it via requests sent through queues or mailboxes.
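A rough sketch of that status-owning task follows. Everything here is hypothetical: the ring buffer stands in for whatever queue or mailbox your RTOS supplies (and, unlike the real thing, is not safe against preemption), and the reply pointer stands in for a proper reply mailbox. The point is only that the status word is private to one task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 8u   /* illustrative depth */

typedef enum { MSG_SET_BITS, MSG_CLEAR_BITS, MSG_QUERY } msg_kind_t;

typedef struct {
    msg_kind_t kind;
    uint32_t   bits;    /* status bits to set or clear */
    uint32_t  *reply;   /* where to put the answer to MSG_QUERY */
} status_msg_t;

typedef struct {
    status_msg_t slots[QUEUE_DEPTH];
    unsigned head, tail, count;
} msg_queue_t;

/* Stand-in for the RTOS "post" call; returns false if the queue is full. */
bool queue_post(msg_queue_t *q, status_msg_t m)
{
    assert(q != NULL);
    if (q->count == QUEUE_DEPTH) return false;
    q->slots[q->tail] = m;
    q->tail = (q->tail + 1u) % QUEUE_DEPTH;
    q->count++;
    return true;
}

/* Stand-in for the RTOS "fetch" call; returns false if the queue is empty. */
bool queue_fetch(msg_queue_t *q, status_msg_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0u) return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count--;
    return true;
}

/* The status word is private to this file: only the status task touches it. */
static uint32_t system_status;

/* Body of the status task: drain the inbox, applying each request in order. */
void status_task_step(msg_queue_t *inbox)
{
    status_msg_t m;
    while (queue_fetch(inbox, &m)) {
        switch (m.kind) {
        case MSG_SET_BITS:   system_status |= m.bits;  break;
        case MSG_CLEAR_BITS: system_status &= ~m.bits; break;
        case MSG_QUERY:
            if (m.reply != NULL) *m.reply = system_status;
            break;
        }
    }
}
```

Because requests are applied one at a time by a single task, no two writers can ever interleave a read-modify-write on the status word.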
Is this slower than using a global? Sure. Uses more memory, too. Just as we make some compromises in selecting a compiler over an assembler, proper use of an RTOS trades off a bit of raw CPU horsepower for better code that's easier to understand and maintain.
Most operating systems give you tools to manage resources. Surely it's a bad idea for multiple tasks to communicate with a UART or similar device simultaneously. One way to control this is to lock the resource - often using a semaphore or other RTOS-supplied mechanism - so only one task at a time can access the device.
Resource locking and priority systems lead to one of the perils of real time systems: priority inversion. This is the deadly condition where a low priority task blocks a ready and willing high priority task.
Suppose the system is more or less idle. A background, perhaps unimportant, task asks for and gets exclusive access to a comm port. It's locked now, dedicated to the task till released. Suddenly an oh-my-god interrupt occurs that starts off the system's highest priority and most critical task. It, too, asks for exclusive comm port access, only to be denied that by the OS since the resource is already in use. The high priority task now waits on the port. The lower one must run to complete its activity and release the port, yet any task of intermediate priority that becomes ready will preempt it for as long as it likes. The least important activity of all has blocked the most important!
Most operating systems recognize the problem and provide a work-around. For example in VxWorks you can use their mutual exclusion semaphores to enable "priority inheritance". The task that locks the resource runs at the priority of the highest priority task that is blocked on the same resource. This permits the normally less-important task to complete, so it can unlock the resource and allow the high priority task to do its thing.
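The inheritance rule itself is small enough to sketch. This is a toy model with invented names, not VxWorks code: it tracks a single waiter, ignores nested locks, and assumes a bigger number means a higher priority. A real kernel does the same bookkeeping inside its semaphore calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int base_priority;       /* priority assigned by the application */
    int effective_priority;  /* priority the scheduler actually uses */
} pi_task_t;

typedef struct {
    pi_task_t *owner;        /* NULL when the resource is free */
} pi_mutex_t;

/* Try to take the lock. On failure the caller must block, and the
   current owner is boosted to the caller's priority if that is higher. */
bool pi_lock(pi_mutex_t *m, pi_task_t *caller)
{
    assert(m != NULL && caller != NULL);
    if (m->owner == NULL) {
        m->owner = caller;
        return true;
    }
    if (caller->effective_priority > m->owner->effective_priority) {
        m->owner->effective_priority = caller->effective_priority;
    }
    return false;
}

/* Release the lock and drop the owner back to its own priority. */
void pi_unlock(pi_mutex_t *m)
{
    assert(m != NULL && m->owner != NULL);
    m->owner->effective_priority = m->owner->base_priority;
    m->owner = NULL;
}
```

While boosted, the background task can no longer be preempted by medium priority work, so it finishes with the port and gets out of the way.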
Surveys indicate that even today vast numbers of developers write their own RTOSes, a fact that's hard to reconcile with our apparent devotion to software reuse. Something like 80 vendors offer a staggering array of operating systems, ranging from tiny versions for PIC-like CPUs to ones that provide complete windowing GUIs.
Pricing is all over the map, as some vendors sell the RTOS outright, while others require royalty payments. Some provide only the binary image of the operating system; others come with full source. Comparing RTOS prices is difficult at best because of the wide range of pricing models, different CPUs supported, and varying support options. Suffice to say that most RTOSes sell for several thousand dollars. Royalty payments, if any, run around a few bucks per unit or less. And be assured that a commercial RTOS is available for just about any CPU.
Memory requirements are just as diverse, with smaller versions requiring only a few K; others run into the megabytes.
A new wrinkle in the RTOS market appeared last year when Microsoft released Windows CE, which is targeted at applications served by some embedded RTOSes. It's already common in PDAs and similar barely-embedded products. Will we see it take over more of the truly embedded market? That's a question that only Bill Gates and Las Vegas can answer; suffice to say that today the product's real time response is pretty dismal. Microsoft has announced a program to improve CE's performance, targeting sub-50 microsecond thread latencies by mid-1999.
An appeal of CE is its built-in GUI, something more and more low-end systems are crying out for. Microsoft is not the only vendor of GUIs, though. QNX, for example, long a vendor of RTOSes for embedded x86 systems, sells their Photon microGUI, which delivers a POSIX API in a reasonably-sized footprint. Unlike CE, QNX's product offers fast multitasking today.
Using an RTOS also brings new perils. One of the more underreported ones is stack allocation. Most of us are familiar with the scientific way we decide how big the stack should be on the system (take a guess and hope). With an RTOS the problem is multiplied since every task has its own stack.
It's feasible, though tedious, to compute stack requirements when coding in assembly language by counting calls and pushes. C - and even worse C++ - obscure these details. Runtime calls further distance our understanding of stack use. Recursion, of course, can blow stack requirements sky high.
Given that it's difficult at best to pick a logical stack value, it makes sense to be prepared to observe stack use after you build the system to ensure that a push doesn't run off the stack into critical variables.
Since stack size is a guess, write your code from the very start to find stack problems. In the startup code or whenever defining a task fill the task's stack with a unique signature like 0x55AA. Then, probe the stacks occasionally using your debugger and see just how many of the assigned locations have been used (the 0x55AA will be gone). Knowledge is power.
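If you'd rather let the code do the probing, the signature trick might look like the sketch below. It assumes a descending stack built from 16-bit words, so the highest address is used first and the lowest is the last line of defense; the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0x55AAu   /* the signature suggested in the text */

/* Fill a task's stack with the signature before the task first runs. */
void stack_paint(uint16_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count the words never written, scanning up from the lowest address.
   Assumes the stack grows downward from stack[words - 1]. A result of
   zero means the stack was filled completely and may have overflowed. */
size_t stack_unused_words(const uint16_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t unused = 0;
    while (unused < words && stack[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

A low priority housekeeping task can call stack_unused_words() on every task's stack now and then and log the smallest margin it sees. A pushed value that happens to equal 0x55AA can make the margin look a word or so better than it is, so leave yourself some slack.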
Since the stack is a source of trouble it's reasonable to be paranoid and not allocate buffers and other sizable data structures as automatics. Watch out! Using malloc(), a quite logical alternative, brings its own set of problems. A program that dynamically allocates and frees lots of memory - especially variably-sized blocks - will fragment the heap. At some point it's quite possible to have lots of free heap space, but so fragmented that malloc() fails. If your code does not check the allocation routine's return code to detect this error, it will fail horribly. Of course, detecting the error will also no doubt result in a horrible failure, but gives you the opportunity to show an error code so you'll have a chance of understanding and fixing the problem.
Sometimes an RTOS will provide alternative forms of malloc(), which let you specify which of several heaps to use. If you can constrain your memory allocations to standard-sized blocks, and use one heap per size, fragmentation won't occur.
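Here is a minimal sketch of one such fixed-size heap, with all names and sizes chosen purely for illustration. Every block is the same size, so any freed block satisfies any later request and the pool cannot fragment. It is not safe to call from several tasks at once without a lock around it.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCK_SIZE  32u   /* bytes per block, illustrative */
#define POOL_BLOCK_COUNT 8u    /* blocks in this pool, illustrative */

/* A free block stores the free-list link in its own storage. */
typedef union pool_block {
    union pool_block *next;
    unsigned char payload[POOL_BLOCK_SIZE];
} pool_block_t;

typedef struct {
    pool_block_t blocks[POOL_BLOCK_COUNT];
    pool_block_t *free_list;
} pool_t;

/* Thread every block onto the free list. */
void pool_init(pool_t *p)
{
    assert(p != NULL);
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* Hand out one block, or NULL when the pool is exhausted.
   As with malloc(), the caller must check the result. */
void *pool_alloc(pool_t *p)
{
    assert(p != NULL);
    pool_block_t *b = p->free_list;
    if (b != NULL) {
        p->free_list = b->next;
    }
    return b;
}

/* Return a block previously obtained from pool_alloc() on this pool. */
void pool_free(pool_t *p, void *ptr)
{
    assert(p != NULL && ptr != NULL);
    pool_block_t *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

Allocation and release are both constant time, which is a nice side benefit when you're trying to reason about worst case timing.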
Garbage collection - which compacts the heap from time to time - is almost unknown in the embedded world. It's one of Java's strengths and weaknesses, as the time spent compacting the heap generally shuts down all tasking. See P.J. Plauger's recent articles on garbage collection for his view of real time solutions.
Any real time design obviously has time as an integral part of the system's success. It's naive to use conventional procedural debuggers, which are targeted at looking at static code and data, to deal with finding the unique problems associated with a time-based design. If you're not prepared to measure time you're ignoring an aspect of the system every bit as important as the difference between "==" and "=" in the C code.
Most simple development tools like ROM monitors and BDM debuggers are too deprived of hardware resources to provide much insight into system timing. An exception is HP's new 16600A tool, which blends a BDM-like CPU probe with a fast logic analyzer. The BDM part gives access to your code and target operation (a high level debugger correlates to the original source). The logic analyzer supports tracing and time tagging of events.
In-circuit emulators, of course, have long included deep trace buffers with time stamp information included. Event timers track time from event A to event B. Though an emulator is the most expensive of all debugging tools it's also the most versatile.
However, many modern processors with integrated cache, pipelines and the like make traditional emulation so difficult and expensive that it's sometimes not an option. Yet the real time issues are more severe than ever due to the increasing complexity of systems. Some tool vendors have taken to instrumenting your code to extract necessary timing information. Applied Microsystems' CodeTEST products, for example, include a preprocessor that seeds information-generating instructions into your source code, which the tool then detects in real time. The cost is about a 10% performance penalty, but it does make the invisible visible by giving you a time-based look at the firmware.
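Lacking such a tool, you can get a poor man's version of the same idea by seeding your own time-tagged markers. The sketch below is not how CodeTEST works internally; it is a hypothetical ring buffer of event tags, where the caller supplies the timestamp from whatever free-running 32-bit timer the hardware offers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 16u   /* illustrative buffer depth */

typedef struct {
    uint16_t id;      /* which marker fired */
    uint32_t ticks;   /* timer value when it fired */
} trace_entry_t;

typedef struct {
    trace_entry_t entries[TRACE_DEPTH];
    unsigned next;    /* slot the next marker will overwrite */
    unsigned count;   /* valid entries, at most TRACE_DEPTH */
} trace_t;

/* Record that marker `id` was reached at timer value `now`. */
void trace_mark(trace_t *t, uint16_t id, uint32_t now)
{
    assert(t != NULL);
    t->entries[t->next].id = id;
    t->entries[t->next].ticks = now;
    t->next = (t->next + 1u) % TRACE_DEPTH;
    if (t->count < TRACE_DEPTH) {
        t->count++;
    }
}

/* Find the most recent `to` marker and the latest `from` marker before
   it, and report the ticks between them. Returns false if no such pair
   is in the buffer. */
bool trace_elapsed(const trace_t *t, uint16_t from, uint16_t to,
                   uint32_t *ticks)
{
    assert(t != NULL && ticks != NULL);
    bool have_to = false;
    uint32_t to_ticks = 0;
    for (unsigned back = 1; back <= t->count; back++) {
        const trace_entry_t *e =
            &t->entries[(t->next + TRACE_DEPTH - back) % TRACE_DEPTH];
        if (!have_to) {
            if (e->id == to) {
                have_to = true;
                to_ticks = e->ticks;
            }
        } else if (e->id == from) {
            /* Unsigned subtraction stays correct across one timer wrap. */
            *ticks = to_ticks - e->ticks;
            return true;
        }
    }
    return false;
}
```

Drop a trace_mark() at the top of an ISR and another where the data is finally consumed, and the difference tells you the real input-to-useful-work latency, overhead of the marker itself included.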
Debugging changes further when using an RTOS. Suppose you're debugging task A. Single stepping through that chunk of code, should the other tasks still run at full speed? Suppose a communications task gets stopped every time you set a breakpoint anywhere? That might cause a catastrophic loss of data leading to loss of sync with other processors.
An almost incestuous series of relationships has sprung up between RTOS vendors, debugger companies, and compiler folks to ensure that no matter what RTOS and compiler mix you use, a tool exists that is aware of the RTOS's internal tables. The debuggers use this information to allow you to work at a very high level, setting breakpoints symbolically on different tasks as you glean details about messages and task states from appropriate displays.
Some RTOSes, like VRTX and pSOS, come with pseudo-"agents", small kernels loaded into your code, that communicate with your debugger to pass back all sorts of neat debug info in real time. Essentially running as a separate task, the agent lets you stop a single task as the rest of the code continues to run. Timers and their ISRs continue to run, comm routines are unaffected by debugging, and even DMA activities continue intact.
Many low-cost ROM monitors and ROM emulators support these agents. In fact, many RTOSes, such as RTXC, come with a complete debugger designed to let you work with your real time application in real time.
In "From the Earth to the Moon" Jules Verne described a never-ending battle between the cannon makers and armor vendors. The dynamics of competition served to keep both sorts of products in relative balance - and the engineers eternally frustrated.
The embedded industry is no different. As processors get faster and cheaper we seem to stress them just as hard as we did in the 8080 days, since applications are getting more complex and demanding. We now have, though, the tools needed (in the form of RTOSes, debuggers, profilers and the like) both to design a well-structured real time system and to understand the time-based behavior of these systems.
If you're not using an RTOS in your embedded designs today, you surely will be tomorrow. Get familiar with the concepts, as designing tasking code requires a somewhat different view - the time domain view - than conventional procedural programming. Check out Jean Labrosse's free uC/OS; the companion book is as good an introduction to using an RTOS as you're likely to find. See www.micrium.com.
Improvements to these tools come almost daily. Keep on top of the field to avoid the fate of the dinosaurs.