No way to abort a job that failed on non-existing farm node/workflow

Here you can submit bug reports
veks
Posts: 79
Joined: Fri Oct 25, 2019 6:51 am

No way to abort a job that failed on non-existing farm node/workflow

Post by veks »

Hi,
there's no way to abort a job that was started in a workflow configured to run on a server that is DOWN.
I've rebooted the main FFAStrans server, restarted the services, and aborted the job several times, but it tries to start again every time.

How to solve this?

Thanks!
admin
Site Admin
Posts: 1658
Joined: Sat Feb 08, 2014 10:39 pm

Re: No way to abort a job that failed on non-existing farm node/workflow

Post by admin »

Hi veks,

Depending on how your system is configured and where the job was in the process when the server went down, the only host that will respond to an abort may be the host that is down. If that host doesn't come back, we are stuck with an incomplete, orphaned job that is not going anywhere.
You should also be aware that a workflow configured to run on a host that is down won't work until you either reconfigure it to run on other available hosts or bring the host back.

Currently there is no mechanism for cleaning up the mess left by this scenario other than manually deleting the db files associated with the job. This is not complicated, but it's not the preferred approach. Will your host never come back up and running?

-steinar