Error Handling and Resolving: What You Need to Know

This is Part 1 of 2 of our Error Handling mini-series. You can download a PDF of both (extended) posts here in our Error Handling Handbook.

Application programmers and system developers are accountable to operators, engineers, and managers for the entire functionality of systems. This includes systems’ ease of use, UI usefulness and presentation, Takt time, uptime, and troubleshooting. At the same time, programmers and developers must keep in mind that systems may be used in-house or by external clients and users both nationally and internationally.

End users are often unaware of how these systems actually work, and are unable to determine when things have gone wrong or are about to go wrong. For this reason, proper error handling is essential. Errors that aren’t reported or recovered from can mean wasted time, money, and resources in troubleshooting the UUT, rebooting computers, or running tests without data collection. Errors that are not resolved can cause outputs to rail, processes to go out of control, and systems to fail to shut down, with disastrous consequences.

3 Components of Error Handling 

Reporting: Error reporting is a minimum necessity that involves passing error messages or prompts to the user. Without proper reporting, errors can go undetected and can result in lost time and resources and even catastrophe. For example, consider a system that is intended to write a data log file to a directory while the operator performs a two-hour test.

However, an error occurs when the system attempts to use a directory that doesn’t exist. The result: the entire test must be rerun after it is discovered that the directory was non-existent. A simple error prompt to the operator indicating the issue could have saved a lot of time. Error messages should be well developed and clear, and reserved for issues important enough to appear as pop-ups on the screen. Sometimes error messages are cryptic and mean nothing to users, and sometimes users pay little attention to error messages because they are used to quickly clicking on the pop-ups just to get them off the screen.
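As a minimal sketch of the reporting idea, the Python example below checks the data log directory before the two-hour test begins and passes a clear, actionable message to the operator. The names `LogDirectoryError`, `check_log_directory`, `start_test`, and the `report` callback are hypothetical and chosen for illustration only; in a real system `report` would be whatever the UI uses to show a pop-up.

```python
from pathlib import Path


class LogDirectoryError(Exception):
    """Raised when the data log directory cannot be used (hypothetical name)."""


def check_log_directory(log_dir):
    """Verify the log directory before the test starts and return it as a Path.

    Raises LogDirectoryError with an operator-readable message otherwise.
    """
    path = Path(log_dir)
    if not path.is_dir():
        raise LogDirectoryError(
            f"Cannot start test: data log directory '{path}' does not exist. "
            "Create the directory or choose another one, then restart the test."
        )
    return path


def start_test(log_dir, report=print):
    """Run the pre-test check and report any problem to the operator.

    `report` stands in for the UI's pop-up mechanism (an assumption).
    Returns True if the test may proceed, False otherwise.
    """
    try:
        check_log_directory(log_dir)
    except LogDirectoryError as err:
        report(str(err))  # tell the operator now, not two hours later
        return False
    return True
```

The message states what failed, why, and what the operator can do about it, which is one way to avoid the cryptic prompts described above.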

Recovering: Another component of error handling is error recovering or correcting, which involves anticipating any problems that may occur in the code and correcting the undesirable behavior. By applying error correcting, programmers can identify the reasons systems may fail, create “error traps” for them, and write code to fix those errors automatically.

In the previous example of a program writing to a non-existent directory, developers can anticipate that a data directory may not exist, and develop code to create the directory when the write-to-file function reports an error. As another example, in serial communication, programmers can build into the software a check for a null response to a data request.

When this condition is detected, the code can flush the port and resend the data request. While error correcting may seem like a great way to code, there are errors that cannot be fixed without human intervention. In addition, programmers may spend a vast amount of time trying to account in code for every possible error that might occur.
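The two recovery examples above can be sketched in Python as follows. This is only an illustration under stated assumptions: `write_log` and `query_with_retry` are hypothetical names, and `port` is assumed to be any object providing `write`, `readline`, and `reset_input_buffer` methods (the names follow pyserial's `Serial` class, but no particular library is required here). The retry limit of three attempts is also an arbitrary choice.

```python
from pathlib import Path


def write_log(path, text):
    """Append text to a log file, creating the directory if it is missing."""
    path = Path(path)
    try:
        with path.open("a", encoding="utf-8") as f:
            f.write(text)
    except FileNotFoundError:
        # Error trap: the parent directory does not exist.
        # Create it, then retry the write once.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as f:
            f.write(text)


def query_with_retry(port, request, max_attempts=3):
    """Send a request and return the reply, resending on a null reply.

    `port` is assumed to provide write(bytes), readline() -> bytes,
    and reset_input_buffer().
    """
    for _ in range(max_attempts):
        port.write(request)
        reply = port.readline()
        if reply:
            return reply
        # Null response: flush stale input before resending the request.
        port.reset_input_buffer()
    # Recovery failed; hand the error on so it can be reported or resolved.
    raise TimeoutError(f"No response after {max_attempts} attempts")
```

Note that both functions give up after a bounded amount of recovery and let the error propagate, which reflects the point that some errors cannot be fixed without human intervention.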

Resolving: The third component is error resolving, which typically involves stopping the operator’s process or shutting down the system gracefully while the error is investigated. Error resolution consists of deciding what action to take when an error is reported. The next section looks at this component in further detail.

3 Levels of Error Resolving

The decisions and actions involved in error resolving depend on the type and cause of each error. Errors typically fall into one of three categories, listed here in order of increasing severity and level of resolution needed.

Runtime, user, or input errors: These are caused by undesired or invalid input data, states, or conditions. Typical sources are invalid characters in filenames, poor UI development (such as not specifying units of measure like mA or A), or poor UI presentation (giving the operator every control and indicator at every point in the operation). These errors have a low level of severity and do not require shutting down the process or system. These errors can and should be detected, reported, recovered, and prevented.

Process errors: Process errors can be caused by conditions that are beyond the scope of operation, safety trips, communication errors, or data buffer overflow. They have a medium level of severity, indicating that the safe or valid operation of a system has been compromised and the process should be gracefully stopped.

System errors: Unrecoverable system or resource errors can cause loss of system control and cannot be resolved without human intervention. These errors can be caused by DAQ resource issues such as inability to respond or initialize, unresponsive smart devices, or loss of feedback or control of external systems or devices. System errors have a high level of severity and require a system shutdown until they can be resolved.
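One way to make the three levels explicit in code is to tag each error with a severity and choose the resolution from that tag. The Python sketch below is a simplified illustration, not a prescribed design: `Severity`, `HandledError`, and `resolve` are hypothetical names, and the returned strings stand in for whatever real actions a system would take.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The three error levels, in order of increasing severity."""
    INPUT = 1    # runtime, user, or input errors
    PROCESS = 2  # process errors
    SYSTEM = 3   # system errors


class HandledError(Exception):
    """An error tagged with its severity (hypothetical class)."""

    def __init__(self, message, severity):
        super().__init__(message)
        self.severity = severity


def resolve(error):
    """Return the resolution action for an error, based on its severity."""
    if error.severity == Severity.INPUT:
        # Low severity: report, recover, and keep running.
        return "report and continue"
    if error.severity == Severity.PROCESS:
        # Medium severity: safe operation is compromised.
        return "stop process gracefully"
    # High severity: cannot be resolved without human intervention.
    return "shut down system"


# Usage: an invalid filename is an input error, so the process keeps running.
action = resolve(HandledError("Invalid character in filename", Severity.INPUT))
```

Keeping the decision in one function means the mapping from severity to action can be reviewed and changed in a single place.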

To continue learning about error handling, download our Error Handling Handbook.
