Reliability - JawaharT/Best-Practices-On-Azure-Sphere GitHub Wiki
Introduction
Reliability means performing consistently and accurately over a period of time, in everything from the amount of memory used to power management.
For a system to be reliable, it has to function consistently over time. In embedded system development, this period is usually multiple years. Traditional devices such as home thermostats are difficult and expensive to upgrade, so they are not replaced as frequently as smartphones. There are also environmental concerns associated with frequent upgrades because of the problems of disposing of old electronics. These include:
- Heavy metals such as Mercury, Lead, Beryllium, and Cadmium found in PCBs can pollute groundwater if disposed of incorrectly
- Plastics that are non-biodegradable
- Heated e-waste can release toxins contributing to global warming
Therefore, it is important to keep your devices in service as long as possible, both to avoid the environmental issues described above and to reduce frequent hardware upgrades. One way to do this is to understand, through software reliability practices, how your IoT devices will perform up to and beyond their intended lifespan.
Best Practices for Reliable C Development
To take advantage of Azure Sphere reliably, write C that can last a long time, which is important for IoT and Edge devices. Here are some C-related best practices for your next project:
- Create an enum of exit codes for debugging and for any errors that may occur once the device is deployed. Defining the enum also serves as a plan for making sure every possible error is handled appropriately, which makes troubleshooting easier. Standard C already defines some exit codes, such as 0 for success.
- Use static and constant data wherever possible and avoid dynamically allocated variables.
- Embedded systems favor a philosophy of reuse. On Azure Sphere, design C programs around functions that can be reused, which simplifies debugging and maintenance. This in turn makes software updates more effective, because the program is less likely to need refactoring.
- Perform software and hardware tests, such as static code analysis and multimeter tests, to make sure your connected electronics work before development begins.
These practices can help the hardware run longer and help its components last.
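The first three practices above can be sketched together. This is a minimal illustration, not code from an Azure Sphere sample: the `ExitCode` values, `SAMPLE_CAPACITY`, `StoreSample`, `SampleCount`, and `ExitCodeToString` are all hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exit codes for illustration; 0 stays reserved for success. */
typedef enum {
    ExitCode_Success = 0,
    ExitCode_SensorReadFailed = 1,
    ExitCode_BufferFull = 2,
    ExitCode_InvalidArgument = 3
} ExitCode;

/* Capacity is fixed at compile time, so memory use is known before deployment. */
#define SAMPLE_CAPACITY 8

/* Statically allocated storage: no malloc/free, so no heap fragmentation. */
static int16_t samples[SAMPLE_CAPACITY];
static size_t sampleCount = 0;

/* Reusable helper: stores one sample and reports a named failure reason. */
ExitCode StoreSample(int16_t value)
{
    if (sampleCount >= SAMPLE_CAPACITY) {
        return ExitCode_BufferFull;
    }
    samples[sampleCount++] = value;
    return ExitCode_Success;
}

/* Returns how many samples are currently stored. */
size_t SampleCount(void)
{
    return sampleCount;
}

/* Maps each exit code to a readable name for log output. */
const char *ExitCodeToString(ExitCode code)
{
    switch (code) {
    case ExitCode_Success:          return "Success";
    case ExitCode_SensorReadFailed: return "SensorReadFailed";
    case ExitCode_BufferFull:       return "BufferFull";
    case ExitCode_InvalidArgument:  return "InvalidArgument";
    }
    return "Unknown";
}
```

Because every failure path returns a named code, a log line or the process exit status points directly at the cause, and the fixed-size buffer means the worst-case memory footprint is visible at build time.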
To keep an embedded system reliable for as long as possible, it is important as a developer to schedule appropriate time slots to:
- Update to the latest version of the Azure Sphere SDK, to keep the device as secure as possible
- Run the 'cloud-test' command ('azsphere device enable-cloud-test' in the Azure Sphere CLI), which also disables the C GDB debugger, to put the board into its deployment state:
  - The debugging server is removed
  - Microsoft and cloud-loaded application updates are enabled
  - The device is restarted and starts running the installed application
  - If other connections are detected, the security built into the operating system prevents further damage and all data remains uncorrupted
- Obtain run-time telemetry to remotely diagnose any devices
- Update devices in the field to fix existing functionality or to provide new functionality
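As a rough illustration of the run-time telemetry mentioned above, the sketch below formats a small diagnostic record into a fixed-size buffer. The `TelemetryRecord` fields and the `FormatTelemetry` function are assumptions made for this example; how the resulting string is actually sent to the cloud is outside its scope.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical diagnostic record; the fields are chosen for illustration. */
typedef struct {
    uint32_t uptimeSeconds; /* time since the application started */
    uint32_t resetCount;    /* number of unexpected restarts seen */
    int lastExitCode;       /* exit code of the previous run */
} TelemetryRecord;

/* Writes the record as a JSON-style string into a caller-supplied buffer.
   Returns the number of characters written, or -1 if the buffer is too small. */
int FormatTelemetry(const TelemetryRecord *record, char *buffer, size_t size)
{
    int written = snprintf(buffer, size,
                           "{\"uptimeSeconds\":%lu,\"resetCount\":%lu,\"lastExitCode\":%d}",
                           (unsigned long)record->uptimeSeconds,
                           (unsigned long)record->resetCount,
                           record->lastExitCode);
    if (written < 0 || (size_t)written >= size) {
        return -1; /* encoding error or output truncated */
    }
    return written;
}
```

Writing into a caller-supplied buffer keeps the helper free of dynamic allocation, and the truncation check means a too-small buffer is reported rather than silently sending a malformed message.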
What are Watchdog Timers?
These are simple countdown timers inside the SoC that fire once a specific time interval has elapsed without the application resetting them. An expired timer indicates that the board has gone beyond its intended behavior, for example because the program has executed incorrectly or because data or memory has been corrupted. Watchdog timers are therefore useful for detecting software anomalies and automatically resetting the board when any are detected, especially if the hardware is used in harsh environments.
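The countdown-and-reset behavior can be modeled in plain C. This is a host-side simulation for illustration only, not the Azure Sphere watchdog API: the `Watchdog` struct and the `Watchdog_Init`, `Watchdog_Kick`, and `Watchdog_Tick` functions are hypothetical names, and a real hardware watchdog would reset the board rather than return a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a watchdog timer; all names here are hypothetical. */
typedef struct {
    uint32_t timeoutTicks;   /* ticks allowed between kicks */
    uint32_t remainingTicks; /* counts down toward zero */
    uint32_t resetCount;     /* how many times the watchdog has fired */
} Watchdog;

/* Starts the countdown at the full timeout. */
void Watchdog_Init(Watchdog *wd, uint32_t timeoutTicks)
{
    wd->timeoutTicks = timeoutTicks;
    wd->remainingTicks = timeoutTicks;
    wd->resetCount = 0;
}

/* Called by healthy application code to restart the countdown. */
void Watchdog_Kick(Watchdog *wd)
{
    wd->remainingTicks = wd->timeoutTicks;
}

/* Called once per timer tick. Returns true when the countdown has expired,
   which on real hardware would trigger a reset of the board. */
bool Watchdog_Tick(Watchdog *wd)
{
    if (wd->remainingTicks > 0) {
        wd->remainingTicks--;
    }
    if (wd->remainingTicks == 0) {
        wd->resetCount++;
        wd->remainingTicks = wd->timeoutTicks; /* model the restart */
        return true;
    }
    return false;
}
```

As long as the main loop calls `Watchdog_Kick` more often than the timeout, `Watchdog_Tick` never reports an expiry. If the loop hangs, the kicks stop and the countdown reaches zero, which is the anomaly the watchdog exists to catch.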
Testing
Testing is an important aspect of software and hardware development because it provides evidence on whether or not to trust the device when deployed. From a software standpoint, it is important to consider the Test-Driven Development (TDD) approach. This approach suggests writing tests before the algorithm itself, because the tests provide a baseline for finding possible errors or missing requirements and help ensure that the code works as intended once it is written.
However, Azure Sphere development does not work in the same way as desktop application development. It is not always possible to write unit tests around functions without significantly changing the software itself, because this is difficult to do in embedded systems.
For Embedded Systems
Create long-term test environments for specific staged releases. If it is possible to write your code so that it can be covered by unit tests, doing so is recommended.
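One way to make embedded code unit-testable is to keep calculations in pure functions that do not touch the hardware, so they can be compiled and tested on a development machine. The sketch below assumes a 12-bit ADC; `AdcRawToMillivolts` and `TestAdcRawToMillivolts` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pure function: converts a raw ADC reading to millivolts.
   Assumes a 12-bit converter, so the maximum raw value is 4095. */
uint32_t AdcRawToMillivolts(uint16_t raw, uint32_t referenceMillivolts)
{
    const uint32_t maxRaw = 4095u;
    uint32_t clamped = raw;
    if (clamped > maxRaw) {
        clamped = maxRaw; /* clamp out-of-range readings */
    }
    return (clamped * referenceMillivolts) / maxRaw;
}

/* Host-side unit test: needs no board because the function has no hardware access. */
void TestAdcRawToMillivolts(void)
{
    assert(AdcRawToMillivolts(0, 2500) == 0);       /* lower bound */
    assert(AdcRawToMillivolts(4095, 2500) == 2500); /* full scale */
    assert(AdcRawToMillivolts(5000, 2500) == 2500); /* clamped input */
    assert(AdcRawToMillivolts(2048, 2500) == 1250); /* mid scale, rounded down */
}
```

In a TDD workflow the test function would be written first, and the conversion function filled in until every assertion passes. Only the thin layer that actually reads the ADC then needs testing on the device.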
However, hardware testing is often more appropriate: once the code is uploaded to an embedded system (PCB), external mechanisms can be developed to test the system as designed. These hardware tests can be as simple as multimeter checks that components such as inductors, capacitors, and resistors work before device development begins, which removes the possibility of faulty components.
Summary
In this unit, you have learned ways to keep your devices running as long as possible with minimal degradation to the hardware, including the importance of testing. Keep these tips in mind as you continue this module and work on your projects.