The Embedded Muse 331


The Embedded Muse
Issue Number 331, June 19, 2017
Copyright 2017 The Ganssle Group

Editor: Jack Ganssle, jack@ganssle.com


You may redistribute this newsletter for non-commercial purposes. For commercial use contact jack@ganssle.com.

Contents

Editor's Notes
Quotes and Thoughts
Tools and Tips
Freebies and Discounts
XML or Binary for Config Data?
Dealing With Complex I/O
Point/Counterpoint on Embedded Ransomware
Jobs!
Joke for the Week
Advertise with us
About The Embedded Muse

Editor's Notes

This issue marks 20 years of The Embedded Muse. TEM #1 came out June 16, 1997. A lot has changed in this industry in the intervening decades! I want to thank all of the readers for your support, your contributions, and your thoughtful emails over the years.

Normally, the Muse goes out twice a month, but that will drop to a single issue each in July and August, as it kicks back for some summer fun.

Quotes and Thoughts

"Heuristic is an algorithm in a clown suit. It's less predictable, it's more fun, and it comes without a 30-day, money-back guarantee." - Steve McConnell

Tools and Tips

Please submit clever ideas or thoughts about tools, techniques and resources you love or hate. Here are the tool reviews submitted in the past.

Freebies and Discounts

Enter the contest via this link.

XML or Binary for Config Data?

In the last issue I posed a question asked by a reader: does it make sense to store configuration information in binary or in XML? That elicited a flood of replies, with most readers suggesting the use of JSON. Here are some of the correspondents' thoughts:

Thor Thau wrote:

I have some thoughts about storing config data and binary vs XML format.

JSON: I personally prefer JSON over XML as a human-readable format. It still has a flexible structure with tagged fields, but much lower data overhead.

I have used JSON with great success in a project where configuration data was modified via an embedded HTTP server, because JSON integrates easily with JavaScript. JavaScript requested the JSON configuration from the microcontroller, the user modified it, and JavaScript pushed the updated JSON configuration back to the microcontroller.

Protocol Buffers: Protocol Buffers was initially developed as a binary format/protocol in which new fields can easily be added, and a parser does not need to know all the fields.

https://developers.google.com/protocol-buffers/docs/overview

The format has evolved and it is possible to automatically generate serialization and deserialization code. The code is generated from .proto files describing the structure. The Nanopb project is a Protocol Buffers implementation for embedded systems.

https://github.com/nanopb/nanopb

Protocol Buffers brings two of the advantages mentioned for XML to a binary format: managing changes and easy updating by hand. Maybe the format could also help with handling Unicode text strings.

I am considering using Protocol Buffers for my next project, or looking into one of the newer binary formats like Cap'n Proto or FlatBuffers.
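For readers who haven't looked inside the format, here is a minimal sketch of how Protocol Buffers encodes a single unsigned integer field on the wire: a base-128 "varint" key, (field_number << 3) | wire_type, followed by the value as another varint. The function names are hypothetical and only wire type 0 is shown; in a real project protoc or Nanopb would generate this code from a .proto file.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F           # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)   # continuation bit set: more bytes follow
        else:
            out.append(byte)          # last byte: continuation bit clear
            return bytes(out)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: key = (field_number << 3) | wire type 0."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


# Field number 1 holding the value 150
print(encode_uint_field(1, 150).hex())  # → 089601
```

Because every field carries its own key, a newer writer can add fields without breaking older readers, which is the "manage changes" property Thor mentions.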

Stuart Donnan is also a fan of protocol buffers:

I think XML is a good choice when you have lots of memory available and a solid library for parsing the contents. Rolling your own software here is a sure way to create security problems.

For lower-capability systems, binary encoding is almost a necessity because of resource constraints. In the past I have used Google Protocol Buffers for these applications. From https://developers.google.com/protocol-buffers/ :

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.

It has many advantages:

Compact binary serialization.
Simple schema definition via a single file that can be used by all applications using the format.
Future-proof definitions handle updates / different versions easily.
Existing libraries are compact (2-20KiB depending on features and choice) and some avoid dynamic allocation.
Many supported languages for host system tooling (Python, Java, etc.).

Check out the Google developer documentation to get a feel for the capabilities. Multiple third party libraries exist for use on embedded systems in C or C++.
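The "future proof" point rests on the reader being able to skip fields it doesn't recognize. Below is a minimal sketch of that idea, handling only wire types 0 (varint) and 2 (length-delimited); the function names and the choice of field numbers are hypothetical, and a production decoder (Nanopb, for example) handles more wire types and error cases.

```python
from __future__ import annotations


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for the base-128 varint at pos."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:           # continuation bit clear: done
            return result, pos
        shift += 7


def decode_known_fields(buf: bytes, known: set[int]) -> dict[int, object]:
    """Keep fields whose numbers are in `known`; skip everything else."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:            # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:          # length-delimited (strings, bytes, sub-messages)
            length, pos = decode_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated field")
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:
            fields[field_number] = value
        # An unknown field was consumed and dropped, so an older reader
        # tolerates messages from a newer writer.
    return fields


# Field 1 = 150, field 2 = "hi", plus a field 7 that an old reader doesn't know.
msg = b"\x08\x96\x01" + b"\x12\x02hi" + b"\x38\x05"
print(decode_known_fields(msg, {1, 2}))  # → {1: 150, 2: b'hi'}
```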

Frank Hunleth contributed:

Regarding config file formats, I'd also suggest taking a look at libconfuse (https://github.com/martinh/libconfuse). It supports reading and writing textual configuration files with key/value pairs, lists, and hierarchical data. I've found it a little easier to work with and integrate than XML and INI-style parsers. On top of that, the library's maintainer is very responsive to issues and feature requests.

Mat Bennion wrote:

On the question of XML in embedded systems, I've done this a number of times now. It's always important to use the right tool for the job. But several aspects of XML make it more useful than binary. I should say first that I've moved from XML to JSON, which has most of the advantages but is more compact. The following points apply to both:

1. It is ubiquitous across all platforms so you no longer have to write a custom parser for your format.

2. It's widely understood by users; you no longer have to teach suppliers how your bespoke format works.

3. The use of schemas lets you define and validate your XML/JSON unambiguously; tools like Visual Studio will read your schema and validate / auto-suggest as you type.

4. You can mix in opaque binary data using base64-encoded elements, which your COTS encode/decode library will handle.

5. Possibly the killer reason is that the cloud is based around JSON / XML; if you're using binary in the cloud, you'll probably just end up writing a parser to translate it into JSON anyway.

I can recommend the pugixml library (http://pugixml.org/) as being simple and effective for embedded use and free (MIT license).
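Mat's fourth point is easy to illustrate. This is a minimal sketch using Python's standard json and base64 modules on a host; the record layout and field names ("device", "sample_rate_hz", "calibration") are hypothetical.

```python
import base64
import json

# Hypothetical config record: readable settings plus an opaque calibration blob.
calibration = bytes([0x00, 0xFF, 0x10, 0x80])
config = {
    "device": "sensor-7",
    "sample_rate_hz": 100,
    "calibration": base64.b64encode(calibration).decode("ascii"),
}
text = json.dumps(config)

# Reading it back: the binary blob survives the round trip byte-for-byte.
loaded = json.loads(text)
blob = base64.b64decode(loaded["calibration"])
print(text)
print(blob == calibration)  # True
```

The cost is size: base64 expands binary data by about a third, so very large blobs may still belong in a separate binary file.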

Nathan Menhorn votes for XML:

At my organization, we use both XML and binary. All configurations are done on a computer and saved to and read from disk in XML. This allows for easier configuration management of the configuration files, version control, going back to test old configurations, making default configurations, upgrades, etc. Also, if a user doesn't have a configuration tool installed they still will be able to read the XML and know what's going on with the configuration. When the configuration is written to a device, it gets converted to binary and is stored in the device's memory in a binary format for efficiency purposes.

We /used/ to save and store configuration files in binary but all that did was create a lot more trouble than it was worth. The main issue is that you needed to know how to read the file in order to figure out the configuration.
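Nathan's split (readable XML on the PC, packed binary in the device) might look something like the sketch below on the host side. The XML element names, the parity codes, and the binary record layout are all hypothetical; the point is that the layout is defined in exactly one place, and the firmware's C struct would have to match it.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical host-side configuration file.
CONFIG_XML = """<config version="2">
  <uart baud="115200" parity="none"/>
  <adc channel="3" gain="8"/>
</config>"""

PARITY_CODES = {"none": 0, "odd": 1, "even": 2}

# Assumed device layout: little-endian, no padding:
#   uint8 version, uint32 baud, uint8 parity, uint8 adc_channel, uint8 adc_gain
RECORD = struct.Struct("<BIBBB")


def xml_to_binary(xml_text: str) -> bytes:
    """Convert the human-readable XML into the packed record the device stores."""
    root = ET.fromstring(xml_text)
    uart = root.find("uart")
    adc = root.find("adc")
    return RECORD.pack(
        int(root.get("version")),
        int(uart.get("baud")),
        PARITY_CODES[uart.get("parity")],
        int(adc.get("channel")),
        int(adc.get("gain")),
    )


blob = xml_to_binary(CONFIG_XML)
print(len(blob), blob.hex())  # → 8 0200c20100000308
```

Including a version field in the record lets the device reject a blob produced for a different layout.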

Stjepan Henc is also a fan of JSON:

I believe XML usually has no advantages over JSON for simple data exchanges and storage. JSON is much simpler and has a much smaller overhead.

We (my company) deployed the small-footprint JSMN (pronounced "jasmine") JSON library in a Cortex-M3 based IoT prototype. We also used it for some on-board SPI communication between the Cortex-M3 and a Nios II processor. The library is MIT licensed and can be downloaded from GitHub (https://github.com/zserge/jsmn).

The reason we went with JSON was to ease debugging, allow easy extension of the protocol and make it easier to integrate with the host applications (be it mobile or anything else).

Most higher level languages have widely used JSON libraries, and JSON is in my opinion replacing XML as the default data exchange format on the web.

Security was optional on that project. I would like to know what you, and perhaps your readers, think about the JSMN library security-wise.

Scott Nowell has a down-to-Earth example:

I didn't think of that, but have recently used it. I have some configuration documents in XML that I was trying to understand. I wanted to convert them to .h files.

I found an online converter, https://codebeautify.org/xmlviewer, and saw that JSON was one of the options.

The problem with the XML is that it is hard to find the data in the tags. JSON is much cleaner and lays out more like C. Here is a comparison:

XML
    <ar:Partitions>
      <ar:Partition>
        <ar:PartitionDefinition Name="systemManagement" Identifier="1"/>
        <ar:PartitionPeriodicity Duration="20000000" Period="100000000"/>
        <ar:MemoryRegions>
          <ar:MemoryRegion Type="RAM" Size="1048576" Name="mainMemory" AccessRights="READ_WRITE"/>
          <ar:MemoryRegion Type="Flash" Size="524288" Name="Flash" AccessRights="READ_ONLY"/>
        </ar:MemoryRegions>
        <ar:PartitionPorts>
          <ar:PartitionPort>
            <ar:QueuingPort MaxMessageSize="30" Name="Stat_2Dq" MaxNbMessage="30" Direction="DESTINATION"/>
          </ar:PartitionPort>
          <ar:PartitionPort>
            <ar:QueuingPort MaxMessageSize="30" Name="Stat_3Dq"
            ...

JSON
    "Partitions": {
      "Partition": [
        {
          "PartitionDefinition": {
            "_Name": "systemManagement",
            "_Identifier": "1",
            "__prefix": "ar"
          },
          "PartitionPeriodicity": {
            "_Duration": "20000000",
            "_Period": "100000000",
            "__prefix": "ar"
          },
          "MemoryRegions": {
            "MemoryRegion": [
              {
                "_Type": "RAM",
                "_Size": "1048576",
                "_Name": "mainMemory",
                "_AccessRights": "READ_WRITE",
                "__prefix": "ar"
              },
              {
                "_Type": "Flash",
                "_Size": "524288",
                "_Name": "Flash",
                "_AccessRights": "READ_ONLY",
                "__prefix": "ar"
              }
            ],
            "__prefix": "ar"
          },
          "PartitionPorts": {
            "PartitionPort": [
              {
                "QueuingPort": {
                  "_MaxMessageSize": "30",
                  "_Name": "Stat_2Dq",
                  "_MaxNbMessage": "30",
                  "_Direction": "DESTINATION",
                  "__prefix": "ar"
                },
                "__prefix": "ar"
              },
              {
                "QueuingPort": {
                  "_MaxMessageSize": "30",
                  "_Name": "Stat_3Dq",
                  ...
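Scott's real goal was .h files, and that step can also be scripted. This is a minimal sketch using Python's xml.etree.ElementTree on a trimmed copy of his XML. The excerpt doesn't show its namespace declaration, so the "urn:example:ar" URI is a placeholder, and the macro naming scheme is hypothetical; it also assumes the Name attributes are valid C identifier fragments.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; the real file declares its own.
NS = {"ar": "urn:example:ar"}

XML_TEXT = """<ar:Partitions xmlns:ar="urn:example:ar">
  <ar:Partition>
    <ar:PartitionDefinition Name="systemManagement" Identifier="1"/>
    <ar:MemoryRegions>
      <ar:MemoryRegion Type="RAM" Size="1048576" Name="mainMemory" AccessRights="READ_WRITE"/>
      <ar:MemoryRegion Type="Flash" Size="524288" Name="Flash" AccessRights="READ_ONLY"/>
    </ar:MemoryRegions>
  </ar:Partition>
</ar:Partitions>"""


def memory_regions_to_header(xml_text: str) -> str:
    """Emit one #define per MemoryRegion, named PARTITION_REGION_SIZE."""
    root = ET.fromstring(xml_text)
    lines = ["/* Generated from partition XML; do not edit by hand. */"]
    for partition in root.findall("ar:Partition", NS):
        pname = partition.find("ar:PartitionDefinition", NS).get("Name")
        for region in partition.findall("ar:MemoryRegions/ar:MemoryRegion", NS):
            macro = f"{pname}_{region.get('Name')}_SIZE".upper()
            lines.append(f"#define {macro} {int(region.get('Size'))}u")
    return "\n".join(lines) + "\n"


print(memory_regions_to_header(XML_TEXT))
```

Generating the header from the XML, rather than converting by hand, keeps the C constants and the configuration file from drifting apart.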

Jim Donelson wrote:

We have found JSON to be lightweight enough to use in embedded projects.

JSON is Like XML because:

Both JSON and XML are "self describing" (human readable)
Both JSON and XML are hierarchical (values within values)
Both JSON and XML can be parsed and used by lots of programming languages
Both JSON and XML can be fetched with an XMLHttpRequest

JSON is Unlike XML because:

JSON doesn't use end tags
JSON is shorter
JSON is quicker to read and write
JSON can use arrays

The biggest difference is:

XML has to be parsed with a big XML parser. JSON can be parsed by a smaller and less complicated parser.

Why JSON is Better Than XML

XML is much more difficult to parse than JSON.

And Charles Manning sent this:

XML is not really human readable. Most front-ends that read/write XML rewrite (and reformat) the whole file every time they change a field, making it very difficult to see changes. Try storing Eclipse project files in source control to see what I mean.

Consider JSON, which is supported in C by libjson. For a comparison of JSON and XML, see https://www.w3schools.com/js/js_json_xml.asp

JSON is similar to XML, but more lightweight, more human friendly, and with fewer angle brackets and stuff to poke your eyes out.
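Charles's complaint about noisy diffs is really about the writer, not the format: a JSON writer can be just as careless. Here is a minimal sketch, using Python's json and difflib modules, of writing config with a fixed key order and indentation so that changing one setting changes one line in source control. The settings shown are hypothetical.

```python
import difflib
import json


def config_to_text(config: dict) -> str:
    # Sorted keys and fixed indentation: the same settings always serialize
    # to the same text, no matter what order they were built in.
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


before = {"baud": 115200, "parity": "none", "name": "uart0"}
after = {"name": "uart0", "parity": "none", "baud": 57600}  # reordered, one value changed

diff = [
    line
    for line in difflib.unified_diff(
        config_to_text(before).splitlines(),
        config_to_text(after).splitlines(),
        lineterm="",
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```

Only the "baud" line shows up as removed and re-added, even though the keys were inserted in a different order.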

Dealing With Complex I/O

In Muse 329 we discussed complex I/O on modern MCUs. Mat Bennion had some thoughts on this:

I've been using TI's Starterware and Processor Development Kit for the AM335x (Sitara) processors for the past few months. It's a mixed bag - literally "mixed", with little coherence between the various parts. A few examples:

The sample code for the Ethernet peripheral (a fairly complex device with two ports and hardware switching capability) comes with a fully-featured TCP/IP stack that doesn't use memory copies - thus requiring a complicated memory management system that spans right from the top-level sockets down to the interrupt service routine. In the likely event of not wanting to use that stack, you need to dismantle 30,000 lines of code to get to the Ethernet driver. A simpler example would have been much more useful.
I really wanted to find a simple bootloader example, but failed. There's plenty of support for building Linux boot images, but little or nothing bare bones. The one example I did find, I couldn't program, because the FlashWriter tool works with some dev boards, but not others. There's also missing documentation, e.g. I've found a description of what options can be selected by the jumpers on the dev board, but not which jumper position corresponds to which option!
The PDK documentation has been produced via Doxygen, so is really just a collection of function header comments - there's no overview or structure.
The structure of the source files is not explained, and it's complex, with a typical sample application pulling files from multiple parts of the installation tree.
A colleague used one of the samples to get text displayed on the dev board's LCD panel in minutes - so that feels like a success. But given how much is going on behind the scenes (e.g. font rendering), it's unlikely to be very re-usable; I imagine you'd be better off running Linux if you were planning to do graphics, to take advantage of well-maintained, standardised drivers.

In essence, it's 10 or 20 sample applications that may or may not be relevant to you and are likely to include far more than you want, so most of your effort is spent in stripping them down, rather than understanding the key points about the peripheral. They feel like a half-way house: I imagine that users will either want to run Linux out-of-the-box, in which case these applications are irrelevant as "someone" will already have written a full suite of drivers, or they'll be running bare metal, in which case these applications are irrelevant as they're too complicated.

I'd have preferred much simpler examples, each one self-contained and well-documented in a coherent folder structure. I didn't see any signs of a GUI for configuring the peripherals and I can't really envisage how this would work given the diversity of usages. However, my hardware friends did use the pin mux tool to generate C code, so that seems worthwhile.

Point/Counterpoint on Embedded Ransomware

Martin Thompson had some comments about an article on security that ran last issue. His comments are in italics, prefixed by the initials MJT.

What are possible counter measures?

The most basic pre-requisite for an attack as described here is the knowledge about the specific microcontroller and bootloader mechanism used. This information can be obtained by either monitoring/tracing the CAN/CANopen communication during the firmware update process or by access to a computer that has this information stored. Protecting these in the first place has the highest priority.

The designer has to make sure that the firmware update process cannot easily be reverse-engineered just by monitoring the CAN/CANopen communication of a firmware update procedure. Things that we can often learn just by monitoring a firmware reprogramming cycle:

How is the bootloader activated? Often the activation happens through a specific read/write sequence.
Counter measure: Only allow authorized partners to activate the bootloader, best by using encryption such as CANcrypt or at least a challenge/response mechanism that is not repetitive.

[MJT] It is a common mistake to conflate encryption with authentication. What is important when protecting against this attack is to make sure that only authentic code is allowed to execute. It should not matter (for security) whether the attacker can see the code (which is what encryption protects against); what is vital is that the code is protected against tampering. Cryptographic signatures are the usual way to achieve this, not encryption.

A non-repetitive challenge response to "open up" the communications with the bootloader is an excellent idea, but the calculation of the correct response must involve some cryptographic primitive and some secret material (i.e., a key) that the attacker cannot get hold of. Using something like a CRC because it "looks hard" to "predict" is a classic mistake.
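To make MJT's point concrete, here is a minimal sketch of a non-repeating challenge/response built on a keyed cryptographic primitive (HMAC-SHA256 from Python's standard library) rather than a CRC. The shared key, the 16-byte challenge length, and the function names are assumptions for illustration; a real bootloader would keep the key in protected storage and use its own crypto library.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret; on a real device it lives in protected storage.
DEVICE_KEY = secrets.token_bytes(32)


def new_challenge() -> bytes:
    """Bootloader side: a fresh random nonce for every unlock attempt."""
    return secrets.token_bytes(16)


def compute_response(key: bytes, challenge: bytes) -> bytes:
    """Programming-tool side: HMAC-SHA256 of the challenge under the key."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify_response(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Bootloader side: recompute and compare in constant time."""
    expected = compute_response(key, challenge)
    return hmac.compare_digest(expected, response)


challenge = new_challenge()
response = compute_response(DEVICE_KEY, challenge)
print(verify_response(DEVICE_KEY, challenge, response))        # True
print(verify_response(DEVICE_KEY, new_challenge(), response))  # False: replay fails
```

Recording a successful session doesn't help an attacker, because the next unlock uses a different challenge. The catch is that the programming tool must also hold the key, so protecting that tool matters as much as protecting the device.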

What file format is used? ".hex" or binary versions of it can easily be recognized.
Counter measure: Use encryption or authentication methods to prevent "any" code from being loaded by your own bootloader.

[MJT] Again: encryption does not help. Authentication is what is required. And then it doesn't matter how easily recognisable the format that you use is.

What CRC is used? Often a standard-CRC stored at end of the file or loadable memory.
Counter measure: If file format doesn't use encryption, at least encrypt the CRC or better use a cryptographic hash function instead of a plain CRC.

[MJT] Even an encrypted CRC provides little protection against tampering. A hash on its own will not help either, as the attacker can simply recompute the hash to match their tampered firmware. Some kind of secret (the key) must be employed. Examples are using a cryptographic signature or a *keyed* hash (also known as a Message Authentication Code).
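A short host-side sketch shows why the secret is what matters. It tampers with a stand-in firmware image and then recomputes every check value that needs no key. A plain CRC-32 and a plain SHA-256 both "pass"; an HMAC-SHA256 (a keyed hash, i.e. a MAC) does not. The key and image are hypothetical placeholders.

```python
import hashlib
import hmac
import secrets
import zlib

# Hypothetical key, held only by the vendor's build system and the bootloader.
KEY = secrets.token_bytes(32)


def seal(image: bytes) -> dict:
    """Compute three candidate integrity tags for a firmware image."""
    return {
        "crc32": zlib.crc32(image),
        "sha256": hashlib.sha256(image).digest(),
        "mac": hmac.new(KEY, image, hashlib.sha256).digest(),
    }


original = bytes(64)  # stand-in for a real firmware image
tags = seal(original)

# The attacker modifies the image, then recomputes every tag that needs no secret.
tampered = bytearray(original)
tampered[10] ^= 0xFF
tampered = bytes(tampered)
forged = dict(tags, crc32=zlib.crc32(tampered), sha256=hashlib.sha256(tampered).digest())

# Bootloader-side checks of the tampered image against the forged tags.
crc_ok = zlib.crc32(tampered) == forged["crc32"]
hash_ok = hmac.compare_digest(hashlib.sha256(tampered).digest(), forged["sha256"])
mac_ok = hmac.compare_digest(hmac.new(KEY, tampered, hashlib.sha256).digest(), forged["mac"])
print(crc_ok, hash_ok, mac_ok)  # True True False
```

A MAC requires the verifying device to hold the secret key. A signature scheme, which MJT mentions first, lets the device hold only a public key, so extracting it from one unit doesn't let an attacker sign new firmware.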

Protecting the secrets is the "key", if you'll excuse the pun, to security. This is known as Kerckhoffs's principle: even if the attacker knows everything about how your system operates, without the keys they still cannot influence it.

And it's also the hard bit that is glossed over in many security treatments (e.g. the immobiliser chip that puts the same key in every keyfob, meaning that once one keyfob is broken you can start any car with that style of immobiliser!)

Does it make sense to grind markings off the CPU chip? But with so many devices using Cortex-M parts, a smart attacker could make some assumptions about the processor type.

[MJT] It does make life harder for the first attacker, but once someone has figured it out, the internet means everyone knows. Kerckhoffs's principle still applies: even if your attacker knows which chips you have used, they should still not be able to attack you successfully without the secret key material.

If there's a debug port, should that be closed off?

[MJT] IMHO: yes. It should at least be password protected, and (also IMHO) with a different password for every instance of your controller. Yes, this is a pain! (There is an inevitable trade-off between security and convenience, and we are moving to a world where security has to take higher precedence.) This is particularly important if there are secret keys stored in the processor that could simply be read out using the debugger (unless they are protected by additional measures, like debugger censoring or a Hardware Security Module (HSM)), but it provides another layer of protection against attackers in any case.
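One common way to get a different password or key per unit without keeping a giant database is key diversification: derive each unit's secret from a master key and the unit's serial number. This avoids the immobiliser mistake MJT describes, because breaking one unit reveals only that unit's key. Below is a minimal sketch using HMAC-SHA256 as the derivation function; the label string, serial format, and master key are hypothetical, and HKDF (RFC 5869) is the more formal choice for the same job.

```python
import hashlib
import hmac


def device_key(master_key: bytes, serial: str) -> bytes:
    """Derive a per-unit key (e.g. a debug-unlock password) from a master key."""
    # "debug-unlock:" is a hypothetical label that separates this use of the
    # master key from any other keys derived from it.
    return hmac.new(master_key, b"debug-unlock:" + serial.encode("ascii"), hashlib.sha256).digest()


# Hypothetical master key: in practice it stays in the factory's HSM and is
# never stored on any device. Each unit gets only its own derived key.
MASTER = bytes(range(32))
k1 = device_key(MASTER, "SN000123")
k2 = device_key(MASTER, "SN000124")
print(k1 != k2)  # True
```

The scheme moves the problem rather than removing it: the master key now has to be guarded very carefully at the factory and at the service center.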

Given that so many devices now sport nice GUIs and connectivity, Linux is a logical choice of operating systems. But it is big and vulnerable, so how does one manage Linux patches and upgrades? I can't help but wonder if it makes sense to use an RTOS coupled with GUI/networking packages from the RTOS vendor. These typically have a smaller attack surface than a big OS.

[MJT] That could be of value for many systems. As always, there is a trade-off, there is no perfect security, you "just" have to decide what is sufficient for your product. But don't underestimate the ingenuity of attackers, and don't make newbie mistakes by rolling your own security, take advantage of the well-documented ways of doing things.

I found this series of challenges instructive – demonstrating simple flaws that have been seen widely in real systems, and how easy they can be to attack…

https://cryptopals.com/

Thor Johnson weighed in on this as well:

Some comments on Embedded Ransomware -- we've already seen some in the wild:
Stuxnet (impressive amount of work)
LG Smart TV Android Ransomware

Stuxnet didn't demand payment, but it sabotaged centrifuges causing all manner of damage.

The LG was a stock android ransomware hack that was a royal PITA to fix/unbrick the TV (especially since LG wasn't being very helpful).

We've also seen hacks against routers (though most run a real operating system under the hood, so there's some debate about "embeddedness"), and we've seen some attacks on cars (the Jeep incident that was mentioned). And we've seen attacks published (though I don't think malware-in-the-wild) for insulin pumps.

And a lot of the time, security isn't fully baked in -- in the HVAC world, even though the Trane Summit PC software asks for a username/password, the raw protocols underneath (e.g. BACnet) have no idea about such logins -- if you have access and can send a BACnet command, the device will do what you ask of it (even the big master-control devices) -- the protocol has no understanding of security (and I'm guilty as well; our protocol products are Modbus because "at the end of the day, everyone can speak Modbus over RS485" -- so you could mess up the water meter by changing the scaling and everything).

I think the biggest thing about "ransomware" vs bricking/griefware is that up till now, you didn't have a user interface that you could use to demand payment (and tracking down payments was easier until now) -- if the elevator system is bricked, you gotta fix it (which may be quite hard if you have several controllers re-infecting each other). Paying a bitcoin to unbrick your lift probably never entered your mind (even if it was possible), but...

The connected world opens up all manner of shenanigans -- even if the insulin pump's firmware is secured, it's designed to take a signal from your pendant/phone/pc so that you can adjust the dose; if there's a security breach in the pendant/phone/pc or protocol talking to the pump (like using BACnet or Modbus), a "pay 10 bitcoins or you're gonna die" is a very real possibility, so securing the firmware is not enough.
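Thor's Modbus point is easy to see in the frame format itself. Here is a minimal sketch that builds a Modbus RTU "Write Single Register" request (function code 0x06). The frame is just slave address, function code, register, value, and a CRC-16 that detects line noise; there is no field for credentials. The register address and value are hypothetical.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def write_single_register(slave: int, register: int, value: int) -> bytes:
    """Build a Modbus RTU function 0x06 request: no password, no signature."""
    if not (0 <= register <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("register and value must fit in 16 bits")
    pdu = bytes([slave, 0x06, register >> 8, register & 0xFF, value >> 8, value & 0xFF])
    crc = crc16_modbus(pdu)
    return pdu + bytes([crc & 0xFF, crc >> 8])  # CRC goes on the wire low byte first


# Hypothetical: register 0x0010 holds a meter's scaling factor.
frame = write_single_register(slave=1, register=0x0010, value=1)
print(frame.hex(" "))
```

Anyone who can put eight bytes on the RS-485 pair can send this. The CRC only guards against corrupted frames, not forged ones, which is exactly the distinction MJT draws above.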

Jobs!

Let me know if you’re hiring embedded engineers. No recruiters please, and I reserve the right to edit ads to fit the format and intent of this newsletter. Please keep it to 100 words. There is no charge for a job ad.

Joke For The Week

Note: These jokes are archived at www.ganssle.com/jokes.htm.

Paul Carpenter sent a link to the ideal keyboard for a certain kind of programmer: http://devhumor.com/media/the-only-keyboard-most-quot-programmers-quot-need

Advertise With Us

Advertise in The Embedded Muse! Over 27,000 embedded developers get this twice-monthly publication.

About The Embedded Muse

The Embedded Muse is Jack Ganssle's newsletter. Send complaints, comments, and contributions to me at jack@ganssle.com.

The Embedded Muse is supported by The Ganssle Group, whose mission is to help embedded folks get better products to market faster.