With the mind-boggling number of intelligent devices announced for deployment in the coming years, the limits of the PC architecture as a baseline become more and more acute. Personal computers were conceived, and then refined over almost two decades of technical improvement, around the idea of a human being sitting in front of the computer. Obviously, this does not apply to operational computers, which run with nobody in front of them. Hence the need for computer health management, and in particular for remote health management of computers.
Many technologies exist for IT servers and are now in use in the cloud industry and in telecommunication networks. With expensive and elaborate health data collection, sophisticated user-programmable alert profiles, and other smart monitoring tools and data aggregators, a single person can supervise entire warehouses whose aisles are lined with large black cabinets full of blades.
This works in an industry where the conditions for investing in and mastering those tools and technologies are met: operational costs are of paramount importance, the deployed equipment is largely homogeneous (big supercomputers are made of a single type of computing blade), and quality of service is what keeps you in business.
Go / no go / replace – is this still enough?
None of this applies to the computers currently deployed in many industries. Although based on the same PC architecture, they all differ in shape, functions, firmware, operating system and application content. Most of their current users are quite content with a basic approach to quality of service: go / no go / replace. Either the computer is ON and you expect everything to be okay, or it is not and you replace it with a spare machine.
Even further removed from the IT industry is our own use of technology: the small intelligent devices we are starting to use in our households, for entertainment today and for more serious stuff tomorrow. How can we trust the technology? How do we know it is working properly?
Computer health management: the user needs to be in focus
Computer health management for the masses still has to be created. A future household will be a system of systems, a kind of private cloud (or so we would hope it stays private, but that is a topic for another time). We need simple rules and interfaces so that anyone can immediately get the situation under control. We will need a tool like a car dashboard that connects multiple smart devices to a user (yes, there is still a user somewhere: us). This dashboard will work by aggregating information on the health of each device in the system.
The health management dashboard: required information
So what could be a concise, simple grammar for health management for the masses? Here is my take on this: a list of simple questions that health management should answer:
Is the device present?
Is the device on?
Is the device configuration the same as the reference config? (Has something changed?)
Is the device operating within its margins? (This is clearly our role as an ECT manufacturer, just as it is a car manufacturer's, to gather and synthesize this information for the user.)
Is there something requiring attention or preventive maintenance?
And one last question I’d like to see answered in the future:
Is the device application payload operating properly?
I know this is a bit extreme, since most approaches today do exactly the opposite: the application payload retrieves the underlying computer health information and collects it alongside its operational data.
Why should we now do the opposite? Because it is the simplest path to uniform, smart health management. All these devices use a very similar architecture (processor, memory, storage, network, sensors), although they fulfill a multitude of different missions. Isn’t it obvious that a health management infrastructure for billions of devices should be based on the underlying machine health, augmented with a simple view of the operational payload health, rather than on hundreds of different application statuses and interfaces? Once we know the computer is working correctly, in the same configuration as the day before, operating within its margins and running the same code, we can expect the operational application to run and perform. Can’t we?
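To make the questions and the argument above a bit more concrete, here is a minimal Python sketch of a per-device health record and a car-dashboard style summary. Everything in it is hypothetical: the field names, the green/amber/red mapping, and the idea of answering "has something changed?" by comparing SHA-256 fingerprints of a configuration description and a code manifest against a reference are assumptions for illustration, not an existing product or API.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable, Optional


def fingerprint(data: dict) -> str:
    """Stable SHA-256 digest of a JSON-serializable description (key order ignored)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class DeviceHealth:
    """One device's answers to the health questions (hypothetical fields)."""
    name: str
    present: bool
    powered_on: bool
    config_hash: str            # fingerprint of the current configuration
    reference_config_hash: str  # fingerprint of the reference configuration
    code_hash: str              # fingerprint of the code currently running
    reference_code_hash: str    # fingerprint of the reference code
    within_margins: bool        # e.g. temperatures and voltages inside limits
    needs_attention: bool       # pending preventive maintenance or similar
    payload_ok: Optional[bool] = None  # None: the application reports nothing

    def config_unchanged(self) -> bool:
        return self.config_hash == self.reference_config_hash

    def payload_expected_ok(self) -> bool:
        # Same configuration, same code, within margins:
        # we expect the application to run without asking it.
        return (self.present and self.powered_on and self.config_unchanged()
                and self.code_hash == self.reference_code_hash
                and self.within_margins)


SEVERITY = {"green": 0, "amber": 1, "red": 2}


def device_light(h: DeviceHealth) -> str:
    """Map one device to a dashboard light (the mapping is an assumption)."""
    if not h.present or not h.powered_on:
        return "red"
    if not h.payload_expected_ok() or h.needs_attention or h.payload_ok is False:
        return "amber"
    return "green"


def household_light(devices: Iterable[DeviceHealth]) -> str:
    """Like a car dashboard: show the worst light among all devices."""
    return max((device_light(d) for d in devices),
               key=SEVERITY.__getitem__, default="green")


# Usage with made-up devices:
ref_cfg = fingerprint({"cpu": "x86", "ram_gb": 4, "os": "linux-5.10"})
ref_code = fingerprint({"app": "1.2.0"})
thermostat = DeviceHealth(
    name="thermostat", present=True, powered_on=True,
    config_hash=fingerprint({"os": "linux-5.10", "ram_gb": 4, "cpu": "x86"}),
    reference_config_hash=ref_cfg,
    code_hash=ref_code, reference_code_hash=ref_code,
    within_margins=True, needs_attention=False)
camera = DeviceHealth(
    name="camera", present=True, powered_on=True,
    config_hash=ref_cfg, reference_config_hash=ref_cfg,
    code_hash=fingerprint({"app": "1.3.0"}), reference_code_hash=ref_code,
    within_margins=True, needs_attention=False)
print(device_light(thermostat))                # green
print(household_light([thermostat, camera]))   # amber: the camera runs new code
```

In this sketch a device that reports no application status (`payload_ok=None`) can still show green: following the reasoning above, an unchanged configuration, unchanged code and operation within margins are taken as enough to expect the payload to work. A stricter dashboard could treat a missing payload status as amber instead.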
What is your opinion on this? Would you add more or something different? Please chime in.