Years back, when I was still earning a living writing and maintaining code, I had a colleague, a system engineer who looked after the servers that our application ran on. As with any application, ours had bugs. Occasionally those bugs surfaced and brought about a service disruption. This affected quite a lot of users, and naturally our first response was to restore the service (incident management).
Our colleague, however, was more focused on problem resolution. A scheduled weekly restart at off-peak hours was not acceptable to him. Likewise, we could not keep increasing the heap size to address out-of-memory (OOM) issues. He would insist on fixing the underlying problem instead of settling for quick fixes. At that point in time, this frustrated me.
Back then, I was still green and lacked ITIL knowledge. Now I have come to appreciate that our quick fixes and workarounds were not wrong: our intention was to restore the service quickly so that business could continue as usual. My colleague was not wrong either in insisting on problem resolution.
I have learned the hard way that not all problems have a solution, and that sometimes a workaround can actually become the permanent fix. Problem resolution also takes time, and business must continue while the search for the answer goes on; a workaround, if available, buys that time.
Perhaps what was missing at the time was ITIL training for both the application teams and our system engineers. If we had known what incident management and problem management were, perhaps both teams could have worked together with better synergy.