PostgreSQL 10.3 leaving "exiting" (defunct) processes on AIX platform

Started by: inverasln@gmail.com
First post: 2019-03-18 13:28 -0700
Last post: 2019-03-20 18:24 +0000
Articles: 4, 2 participants

#854 — PostgreSQL 10.3 leaving "exiting" (defunct) processes on AIX platform

From: inverasln@gmail.com
Date: 2019-03-18 13:28 -0700
Subject: PostgreSQL 10.3 leaving "exiting" (defunct) processes on AIX platform
Message-ID: <9ca2169a-3b2d-4631-a5ef-a6f20c36fcdd@googlegroups.com>

Curious to know if anyone else has experienced this issue before.

I have a PostgreSQL 10.3 database on the AIX 7.1 platform, and it usually does not give us any problems. But after a recent install on a server that's handling about 275 users, we suddenly started seeing "exiting" processes left behind by every call that opens the database.

When the problem first starts, the exiting processes disappear after a few seconds, which is normal. But the busier the system gets, the longer they take to disappear, until they can remain stuck for hours. The real problem is that these exiting processes still appear to occupy one or more of the max_connections slots, so we eventually run out of connections.


Here's an example of what I'm referring to:

# ps -ef |grep exiting |wc -l
    6250

# ps -ef |grep exiting |tail -5
       - 33818386        -   -                     - <exiting>
       - 33949478        -   -                     - <exiting>
       - 34015016        -   -                     - <exiting>
       - 34080578        -   -                     - <exiting>
       - 34211634        -   -                     - <exiting>

# proctree 33818386
4653070    /apps/pg_10.3/bin/postgres
   33818386    <defunct>


According to IBM AIX documentation, the exiting/defunct process will wait until the parent PID replies that it no longer needs the exit status of the subprocess. And so it looks like PostgreSQL may not be sending that reply, or is somehow delayed.
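
To illustrate the general mechanism described there: on POSIX systems a terminated child stays in the process table until its parent collects the exit status with a wait call. The following is only a minimal Python sketch of that behaviour, assuming a Unix-like system; the function name reap_child is made up for the example, and this is not how the PostgreSQL postmaster is implemented.

```python
import os


def reap_child(exit_code=7):
    """Fork a child that exits at once, then collect its exit status.

    Hypothetical helper for illustration only.
    Returns (same_pid, code): whether waitpid reported the child we
    forked, and the exit code that child terminated with.
    """
    pid = os.fork()
    if pid == 0:
        # Child: terminate immediately, skipping the parent's cleanup handlers.
        os._exit(exit_code)

    # Parent: until this call returns, the kernel keeps the dead child's
    # process-table entry so that its exit status can still be fetched.
    reaped_pid, status = os.waitpid(pid, 0)
    return reaped_pid == pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(reap_child())
```

If the parent never made the waitpid call, the child's entry would linger as a defunct process until the parent itself exited.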

Has anyone else experienced a similar issue? I'm wondering if I need to update my O/S which is already at a pretty current level, or perhaps arrange to update the Postgres database to 10.6 or higher. 

I guess my question is: could this be a bug in Postgres and how it releases processes?

Thx

Steve N.

#855

From: Laurenz Albe <laurenz@nospam.pn>
Date: 2019-03-18 23:05 +0000
Message-ID: <q6p86v$rlv$1@dont-email.me>
In reply to: #854

On Mon, 18 Mar 2019 13:28:00 -0700, inverasln wrote:

> Curious to know if anyone else has experienced this issue before.
> 
> I have a PostgreSQL 10.3 database on AIX 7.1 platform and this usually
> does not give us any problems. But after a recent install on a server
> that's handling about 275 users, we suddenly started seeing "exiting"
> processes from every call to open the database.
> 
> Now when the problem first starts, the exiting processes disappear after
> a few seconds. This is normal. But the busier the system gets, the
> longer it takes for them to disappear until it could be stuck there for
> hours. And the problem is that these exiting processes appear to still
> use up one or more of the max_connections, leading to a situation where
> we run out of connections.
> 
> 
> Here's an example of what I'm referring to:
> 
> # ps -ef |grep exiting |wc -l
>     6250
> 
> # ps -ef |grep exiting |tail -5
>        - 33818386        -   -                     - <exiting>
>        - 33949478        -   -                     - <exiting>
>        - 34015016        -   -                     - <exiting>
>        - 34080578        -   -                     - <exiting>
>        - 34211634        -   -                     - <exiting>
> 
> # proctree 33818386
> 4653070    /apps/pg_10.3/bin/postgres
>    33818386    <defunct>
> 
> 
> According to IBM AIX documentation, the exiting/defunct process will
> wait until the parent PID replies that it no longer needs the exit
> status of the subprocess. And so it looks like PostgreSQL may not be
> sending that reply, or is somehow delayed.

The processes are not zombies yet; they are still dying.

It seems to be this problem:
https://www.postgresql.org/message-id/flat/554a2676-9b2f-7ecc-d675-d52f75b5ef4f%40postgrespro.ru#3cd8f5307c2c1004614bc9fb7a526abd

Apparently rebuilding PostgreSQL without mmap support can solve the 
problem.
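
For context, "mmap support" here means that the server allocates its main shared memory as an anonymous shared mapping, which forked backend processes inherit. Below is a minimal Python sketch of that general technique, assuming a Unix-like system; the function name is hypothetical and none of this is PostgreSQL code.

```python
import mmap
import os


def child_writes_parent_reads(message=b"hello"):
    """Share one anonymous mapping between a parent and a forked child.

    Hypothetical helper for illustration only.
    """
    # fileno -1 requests an anonymous mapping; on Unix the default
    # flags are MAP_SHARED, so a forked child sees the same pages.
    shm = mmap.mmap(-1, mmap.PAGESIZE)

    pid = os.fork()
    if pid == 0:
        # Child: write into the inherited mapping, then exit immediately.
        shm[:len(message)] = message
        os._exit(0)

    # Parent: wait for the child, then read what it wrote.
    os.waitpid(pid, 0)
    data = bytes(shm[:len(message)])
    shm.close()
    return data


if __name__ == "__main__":
    print(child_writes_parent_reads(b"written by the child"))
```

The alternative, System V shared memory, uses a kernel-managed segment that processes attach to explicitly instead of a mapping inherited across fork.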

Yours,
Laurenz Albe

#859

From: inverasln@gmail.com
Date: 2019-03-20 10:53 -0700
Message-ID: <049241c0-e486-42a0-add2-68e3e0ca3f6b@googlegroups.com>
In reply to: #855

I can confirm that the version of postgres we have installed does indeed contain mmap. I can see it when using the "dump -T" command in AIX:

dump -T postgres |egrep "\[Index|mmap"
[Index]      Value      Scn     IMEX Sclass   Type           IMPid Name
[14]    0x00000000   0x0000     0x08   0x0a    0x0          0x0001 mmap

The question now is how to get that excluded when we compile postgres. I'll give the details to our development guys and see if this is something that they can try. Thanks for the suggestion and the direction, as it gives us a starting point.

SteveN

#860

From: Laurenz Albe <laurenz@nospam.pn>
Date: 2019-03-20 18:24 +0000
Message-ID: <q6u0he$ad5$1@dont-email.me>
In reply to: #859

On Wed, 20 Mar 2019 10:53:07 -0700, inverasln wrote:

> The question now is to how to get that to be not included if we compile
> postgres.

I have looked into that in some more detail, and you have two options:

- Edit "src/backend/port/sysv_shmem.c" and remove the three lines

  #ifndef EXEC_BACKEND
  #define USE_ANONYMOUS_SHMEM
  #endif

  Then PostgreSQL will be built using System V shared memory.

- Wait for PostgreSQL v12.

  Commit f1bebef60ec8f557324cd3bfc1671da1318de968 has introduced a
  configuration parameter "shared_memory_type" that you can set to
  "sysv" to use System V shared memory.

PostgreSQL v12 is due this fall.
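
For the second option, the setting would look roughly like this in postgresql.conf once the server runs PostgreSQL 12 or later. This is only a sketch: the parameter is read at server start, so a restart is needed, and the comments are editorial annotations.

```
# postgresql.conf, PostgreSQL 12 or later only
shared_memory_type = sysv    # use System V shared memory instead of mmap
```

After the restart, the active value can be checked from psql with "SHOW shared_memory_type;".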
