3P1 Case Study.
3P1 were engaged to fault-find a product used in the buildings industry. The product had been developed by the client’s internal team and was later maintained by a contractor. It had been in service for over 15 years with a dangerous fault that had previously been declared unfixable. The product lifts weights of around 100kg and is safety-critical: lives depend on its correct operation.
The challenging option of a complex multi-stage installation.
The product lifts a weight up to around 100kg and requires very careful installation on site. Our team discussed the product’s on-site installation with the facility’s landlord. They advised us that their advisor would require a consultancy fee. The electrical work needed to be signed off by the installer, then also by the landlord. The landlord would require removal and remediation were 3P1 to leave the premises. Installation takes around 3 hours.
There was a second option: installing the product with a dummy load. This would involve using a similar mechanical weight without fixing it to the walls. On investigation, the dummy weight would still need to be procured, shipped to site and maintained. This option was not considered attractive: there was nowhere to put the dummy weight, and if it toppled there was nowhere safe for it to fall.
Troubleshooting the fault
The product consists of two Printed Circuit Boards (PCBs) connected by a Controller Area Network bus (CANBus). Communication between the PCBs and the outside world is mostly polled general-purpose input/output (GPIO): by the time a signal reaches the microcontroller unit (MCU), it appears as GPIO lines going up and down. The firmware for each board is written as a single C file. Both applications use one timer and the CANBus interface, and no other interrupts.
Both applications use electrically erasable programmable read-only memory (EEPROM) to store application settings. Debugging the embedded application on the hardware for any length of time could cause damage, since pausing the MCU could leave a drive control signal in an unsafe state. The tools used only allow one instance to run at a time, so we would require two software licenses, running on two separate machines, to debug hardware that doesn’t really want to be debugged. Having investigated the problem, our team understood why various other contractors and consultants had looked at the issue and deemed it “unfixable”.
An innovative solution to an “unsolvable” hardware problem.
Having carefully analysed all options and worked through several scenarios, our team opted for an alternative approach. We ran both firmware applications on a desktop computer under FreeRTOS, negating the need for hardware altogether.
FreeRTOS can run functions as tasks, so we can have two functions (the “main” functions from the microcontrollers) running at the same time. We can use FreeRTOS message queues to allow the two tasks to communicate. We can use FreeRTOS timers just like microcontroller timers. The microcontroller toolchain treats GPIO accesses like variable reads and writes, so all we needed to do was create our own variables. We used the desktop file system to store settings, just as the firmware uses EEPROM.
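As an illustration of the last point, here is a minimal sketch of a file-backed EEPROM stand-in in plain C. It is not the code used on the project: the function names, the 256-byte size and the backing file name are all assumptions made for the example. A missing file is treated as erased EEPROM (all 0xFF), which is how many EEPROM parts read when blank.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EEPROM_SIZE 256u                  /* hypothetical device size in bytes */
#define EEPROM_FILE "eeprom_board_a.bin"  /* hypothetical backing file name */

/* Load the whole image; a missing or short file reads as erased (0xFF). */
static void eeprom_load(uint8_t image[EEPROM_SIZE])
{
    memset(image, 0xFF, EEPROM_SIZE);
    FILE *f = fopen(EEPROM_FILE, "rb");
    if (f != NULL) {
        size_t n = fread(image, 1, EEPROM_SIZE, f);
        (void)n;  /* bytes beyond a short file stay erased */
        fclose(f);
    }
}

/* Read one byte; out-of-range addresses read as erased. */
uint8_t eeprom_read_byte(uint16_t addr)
{
    uint8_t image[EEPROM_SIZE];
    if (addr >= EEPROM_SIZE) {
        return 0xFF;
    }
    eeprom_load(image);
    return image[addr];
}

/* Write one byte and persist the whole image. Returns 0 on success, -1 on error. */
int eeprom_write_byte(uint16_t addr, uint8_t value)
{
    uint8_t image[EEPROM_SIZE];
    if (addr >= EEPROM_SIZE) {
        return -1;
    }
    eeprom_load(image);
    image[addr] = value;

    FILE *f = fopen(EEPROM_FILE, "wb");
    if (f == NULL) {
        return -1;
    }
    size_t n = fwrite(image, 1, EEPROM_SIZE, f);
    int rc = fclose(f);
    return (n == EEPROM_SIZE && rc == 0) ? 0 : -1;
}
```

Rewriting the whole file on every byte write is slow but simple, and it means the settings survive between runs of the desktop program in the same way they would survive a power cycle on the real board.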
We built a working software model to develop our solution:
The two C files are renamed, and appropriate #ifdef directives are added to compile out peripheral register accesses.
Their “main” functions are run as tasks instead of as stand-alone firmware.
Timers are set up so that their callbacks run the original timer interrupt service routines (ISRs).
CANBus messaging is replaced with message queues in each direction.
GPIO registers are replaced with variables with the same names, with getter and setter functions externally accessible.
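The GPIO and #ifdef steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project’s code: the build flag DESKTOP_SIM, the register name PORTB, the pin masks, the device header name and the sim_ accessor names are all hypothetical. The point it shows is that firmware-style code keeps reading and writing the register by name, while the desktop build supplies an ordinary variable behind that name.

```c
#include <stdint.h>

/* Hypothetical build flag: 1 = desktop model, 0 = real target. */
#ifndef DESKTOP_SIM
#define DESKTOP_SIM 1
#endif

#if DESKTOP_SIM
/* On the desktop the "register" is an ordinary variable with the same
   name, so the firmware source lines that use it compile unchanged. */
volatile uint8_t PORTB = 0;
#else
/* On target, PORTB would come from the vendor's device header
   (header name is a placeholder). */
#include "device.h"
#endif

#define RESET_PIN_MASK 0x04u  /* hypothetical: reset input on bit 2 */
#define DRIVE_PIN_MASK 0x01u  /* hypothetical: drive output on bit 0 */

/* ---- Firmware-style code: plain register reads and writes ---- */

int firmware_reset_requested(void)
{
    return (PORTB & RESET_PIN_MASK) != 0;
}

void firmware_set_drive(int on)
{
    if (on) {
        PORTB |= DRIVE_PIN_MASK;
    } else {
        PORTB &= (uint8_t)~DRIVE_PIN_MASK;
    }
}

#if DESKTOP_SIM
/* ---- Simulation-side accessors, callable from test scripts ---- */

void sim_set_portb_bits(uint8_t mask)   { PORTB |= mask; }
void sim_clear_portb_bits(uint8_t mask) { PORTB &= (uint8_t)~mask; }
uint8_t sim_get_portb(void)             { return PORTB; }
#endif
```

A test script would call sim_set_portb_bits(RESET_PIN_MASK) to "press" the reset input and then check sim_get_portb() to see what the firmware did to the drive output.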
Adopting this approach removed the complexity of the hardware that supports the applications. Those hardware dependencies are translated into controls that can be managed easily from the desktop program.
Simulating the fault and testing our solution.
Having reached this point, testing the product’s setup became relatively simple:
Scripts were created to carry out the usual power-on sequence. Files were created on disk to simulate the sequence of button presses and hardware signals leading to the failure reported in the field. With this in place, it was easy to see the problem. One of the GPIOs was being polled from the main routine, not the interrupt routine. Because of this, a long blocking function in one of the main loops caused its polled value to be updated out of sequence. The reset signal was being missed, so the related state machine was not being updated. Armed with this knowledge, we moved the state machine update out of the main loop and into the interrupt. This cured the problem, enabling correct operation.
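The failure mode can be reproduced in a few lines. The sketch below is a deliberately simplified tick-based model, not the product’s firmware: the structure, names and timings are all assumptions. It compares sampling the reset input once per main-loop pass, where each pass then blocks for several ticks, against sampling it from a timer interrupt that runs every tick.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names and timings are hypothetical; the real firmware is not shown. */
typedef struct {
    uint32_t pulse_start;  /* tick at which the reset input goes high */
    uint32_t pulse_width;  /* number of ticks it stays high */
    uint32_t block_ticks;  /* ticks consumed by the long blocking call */
} scenario_t;

/* Level of the simulated reset input at a given tick. */
static bool reset_pin(const scenario_t *s, uint32_t tick)
{
    return tick >= s->pulse_start && tick < s->pulse_start + s->pulse_width;
}

/* Returns true if the reset was observed within total_ticks.
   poll_in_isr == false models the original behaviour: the pin is sampled
   once per main-loop pass, and each pass then blocks for block_ticks.
   poll_in_isr == true models the fix: the timer ISR samples every tick. */
bool reset_seen(const scenario_t *s, uint32_t total_ticks, bool poll_in_isr)
{
    bool seen = false;
    uint32_t next_main_pass = 0;  /* tick at which the main loop next runs */

    for (uint32_t tick = 0; tick < total_ticks; ++tick) {
        bool pin = reset_pin(s, tick);

        if (poll_in_isr) {
            /* Timer "ISR": runs every tick, even while main is blocked. */
            if (pin) {
                seen = true;
            }
        } else if (tick == next_main_pass) {
            /* Main loop: sample once, then enter the blocking function. */
            if (pin) {
                seen = true;
            }
            next_main_pass = tick + (s->block_ticks > 0 ? s->block_ticks : 1);
        }
    }
    return seen;
}
```

With a pulse that is high for ticks 3 and 4 and a blocking call of 10 ticks, the main loop samples at ticks 0, 10, 20 and so on, so it never sees the pulse, while the per-tick interrupt does. A pulse longer than the blocking time is caught either way, which is consistent with a fault that only shows up under particular timing.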
Key Areas of Expertise
Electronics, hardware, software, PCBs, simulation, product testing, MCUs.
3P1 leveraged its electronics simulation experience to deliver these benefits:
· Avoided an expensive and complex on-site hardware installation.
· Ran two separate MCU applications on the desktop.
· Debugged two applications running concurrently (which the previous toolchain did not permit).
· Gained full simulation control to test different setups.
· Eliminated the need for hardware to troubleshoot and maintain the product.
· Solved an “unfixable” fault, while vastly reducing the time requirement, cost and complexity of the project.
“The 3P1 team investigated all the options and understood the complexity of our problem and how dangerous it was for our workforce. We’d already tried to have our product fixed many times and our other providers had deemed it unfixable. Their solution enabled us to finally fix the product and, as a bonus, they also reduced the time requirement, cost and complexity of installation. We now have a safe workplace for our team, the ability to control the product from the desktop and troubleshoot any problems easily. We are very impressed by their expertise. Our decision to hire a team with the specific skills required has paid off.”