setting load limit for atd batch system?

Started by: Woozy Song <suzyw0ng@outlook.com>
First post: 2024-05-24 10:34 +0800
Last post: 2024-05-24 16:18 +0000
Articles: 4 (3 participants)

Contents

  setting load limit for atd batch system? Woozy Song <suzyw0ng@outlook.com> - 2024-05-24 10:34 +0800
    Re: setting load limit for atd batch system? Woozy Song <suzyw0ng@outlook.com> - 2024-05-24 12:24 +0800
      Re: setting load limit for atd batch system? Andreas Eder <a_eder_muc@web.de> - 2024-05-24 17:25 +0200
    Re: setting load limit for atd batch system? Rich <rich@example.invalid> - 2024-05-24 16:18 +0000

#56433 — setting load limit for atd batch system?

From: Woozy Song <suzyw0ng@outlook.com>
Date: 2024-05-24 10:34 +0800
Subject: setting load limit for atd batch system?
Message-ID: <v2oucb$252d9$1@dont-email.me>
So atd supposedly will not start another job until the load factor falls
below a limit. Different documentation gives the default as 0.8 or 1.5.
Now I launch a job that uses 4 cores on a 6-core CPU. If I run the top
command, I see four processes running close to 100%.
Now if I submit another job 10 seconds later, it starts anyway, thereby
overloading the CPU. Documentation suggests setting the load limit to more
than n-1 for n CPU cores, but I think that is intended for single-threaded
jobs. I have tried altering the load limit in the atd.service file to all
sorts of values, but the second job keeps starting while the first is
flogging the CPU. I check with 'ps -ef|grep atd' to see that it is using
the desired load limit. I am aware that the load factor is an average; you
can see it changes slowly in top/htop/glances. So I also increased the
delay between jobs to 30 seconds, but still nothing works. So it looks
like I have to specify a time like 'now+60 minutes' when I submit,
requiring some guess at how long the first job runs. I know I can install a
proper job scheduler such as Some Grid Engine, but that is more work.
This is on Debian 11, by the way.
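
To put rough numbers on how slowly the load average reacts, here is a minimal sketch that estimates the 1-minute load average some seconds after n CPU-bound threads start on an otherwise idle machine. It assumes the usual exponentially damped model, load(t) ~= n * (1 - exp(-t/60)). The function name load_after is made up for this example, and real values will differ because the kernel updates in discrete steps and other processes count too.

```shell
#!/bin/sh
# load_after N T: estimated 1-minute load average T seconds after N
# CPU-bound threads start on an idle machine.
# Assumption: continuous approximation of the kernel's exponentially
# damped average with a 60-second time constant.
load_after() {
    LC_ALL=C awk -v n="$1" -v t="$2" \
        'BEGIN { printf "%.2f\n", n * (1 - exp(-t / 60)) }'
}

load_after 4 10   # 4 busy threads, 10 seconds after start
load_after 4 30   # same job, 30 seconds after start
load_after 4 60   # same job, 60 seconds after start
```

Under that assumption the three calls print 0.61, 1.57 and 2.53. So 10 seconds after a 4-thread job starts, the estimate is still below both 0.8 and 1.5, and a limit of 3 or more would not be reached within the first minute.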


#56435

From: Woozy Song <suzyw0ng@outlook.com>
Date: 2024-05-24 12:24 +0800
Message-ID: <v2p4p5$25ver$1@dont-email.me>
In reply to: #56433
I found the trick: you have to add '-q B' to the command; then the
load-limit rule applies (it behaves like the batch command instead of at).
Otherwise it uses the default queue 'a', which only goes by time, without
a load limit.
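
For reference, a minimal sketch of submitting a job so that the load limit applies, assuming the standard at package. The helper name submit_batch, the AT variable and the script path ./run_job.sh are made up for illustration. As far as I read at(1), lowercase queue b is the batch queue, and an uppercase queue letter makes at treat the job as a batch job once its scheduled time is reached.

```shell
#!/bin/sh
# AT lets you point the sketch at a different binary (or a stub for testing).
AT=${AT:-at}

# submit_batch CMD: hand CMD to atd on the batch queue 'b', due immediately.
# atd then holds the job until the load average is below its limit (atd -l).
submit_batch() {
    printf '%s\n' "$1" | "$AT" -q b now
}

# Usage (./run_job.sh is a placeholder for the real job script):
#   submit_batch ./run_job.sh
#   atq    # lists pending jobs; the queue letter is shown for each job
```

Plain 'batch' with the command on stdin should behave the same way as 'at -q b now'.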



#56437

From: Andreas Eder <a_eder_muc@web.de>
Date: 2024-05-24 17:25 +0200
Message-ID: <87msofz0yf.fsf@eder.anydns.info>
In reply to: #56435
On Fri 24 May 2024 at 12:24, Woozy Song <suzyw0ng@outlook.com> wrote:

> I found the trick: you have to add '-q B' to command, then load-limit rule
> applies (it behaves like batch command instead of at). Otherwise it uses
> default queue 'a' that only uses time without load limit.

I think it is '-q b', lowercase b for the batch queue (a is for at).
The other letters are not used by default and just serve to indicate niceness.

'Andreas

-- 
ceterum censeo redmondinem esse delendam



#56439

From: Rich <rich@example.invalid>
Date: 2024-05-24 16:18 +0000
Message-ID: <v2qekd$2d6e3$2@dont-email.me>
In reply to: #56433
An alternative to using at and batch (batch is what observes the load
limit, by the way) is to install Task Spooler and use it for 'background
jobs'. You can tell it to run jobs sequentially, or at most X in parallel
(you get to pick X).

https://viric.name/soft/ts/

You can also submit jobs that "depend upon" other jobs, so that the 
dependent job won't run until the "parent" completes successfully.
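
A minimal sketch of that workflow, assuming the Debian task-spooler package, where the client is installed as tsp (upstream names it ts). The helper name queue_chain, the TSP variable and the job script names are made up for the example. As I understand the options, -S sets the maximum number of simultaneous jobs and -d makes a job depend on the previously queued one.

```shell
#!/bin/sh
# TSP lets you switch between 'tsp' (Debian) and 'ts' (upstream name).
TSP=${TSP:-tsp}

# queue_chain JOB...: run the given jobs one after another; each later job
# is queued with -d so it only runs if the job queued before it succeeded.
queue_chain() {
    "$TSP" -S 1              # allow at most one running job at a time
    "$TSP" "$1"              # first job has no dependency
    shift
    for job in "$@"; do
        "$TSP" -d "$job"     # depends on the previously queued job
    done
}

# Usage (script names are placeholders):
#   queue_chain ./job1.sh ./job2.sh
#   tsp          # with no arguments, lists the queue
```

For a 4-thread job on 6 cores, keeping the slot count at 1 avoids the overlap described in the original post without guessing at run times.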
