monitoring_thread2.exe CPU usage

mfenner
My monitoring_thread2.exe process is using almost 100% of the CPU. We have fewer than 50 checks, and I have updated to 10.1.3.



CPU usage only ramps up after the process has been running for about 10 minutes.



I also replaced my monitoring_thread2.exe with the one found at https://www.dropbox.com/s/1nmq45a0pru889f/monitoring_thread2.exe, but that didn't help either.

Comments

  • Administrator
    It looks like a specific check is causing this. Let's try to isolate it.



    Run monitoring_thread2.exe in debug mode and stop it as soon as CPU usage hits 100%. Then reply with the debug output (copy only the portion covering the last minute before you stopped it).



    1) Stop the ServersCheck Monitoring Service.

    2) From the command prompt in your main ServersCheck directory type:

    monitoring_thread2.exe > debuglog.txt



    Let it run and watch the CPU usage in Task Manager. As soon as it hits 100% (about 10 minutes after starting, as you described), stop it with CTRL+C.
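If debuglog.txt turns out to be large, a short script can trim it to the final minute before posting. The following is only a rough sketch in Python, not part of ServersCheck: the function names are hypothetical, and it assumes every debug line starts with `# S-<thread> <weekday> <month> <day> <hh:mm:ss> <year>`, the layout that monitoring_thread2.exe appears to print.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape (an assumption about the debug output format):
#   # S-9 Wed Apr 10 09:06:54 2013 1347649248PING - Starting check - 71761
STAMP_RE = re.compile(
    r"^\s*#\s+S-\w+\s+"
    r"(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})(?:\s|$)"
)


def line_time(line):
    """Return the timestamp of a debug line, or None if the line has none."""
    match = STAMP_RE.match(line)
    if match is None:
        return None
    # Collapse repeated spaces so a padded day number ("Apr  1") still parses.
    stamp = " ".join(match.group(1).split())
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


def last_window(lines, seconds=60):
    """Keep only the timestamped lines from the final `seconds` of the log."""
    stamped = [(line_time(line), line) for line in lines]
    stamped = [(when, line) for when, line in stamped if when is not None]
    if not stamped:
        return []
    cutoff = max(when for when, _ in stamped) - timedelta(seconds=seconds)
    return [line for when, line in stamped if when >= cutoff]


def tail_of_file(path="debuglog.txt", seconds=60):
    """Hypothetical helper: read the redirected output and trim it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return last_window(handle.read().splitlines(), seconds)
```

Calling `tail_of_file("debuglog.txt")` from the ServersCheck directory would return the lines worth pasting into a reply. Lines without a recognizable timestamp are dropped, which is a deliberate simplification.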
  • mfenner
    Here is the log for the approximate time when the CPU spiked:



    # S-X Wed Apr 10 09:06:54 2013 skipping Availability graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:06:54 2013 skipping SLA graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:06:54 2013 update c:serverscheck_databases1347649418PING-value.rrd 1365610014:0

    # S-9 Wed Apr 10 09:06:54 2013 1347649248PING - setting Threadcheckstarted for thread S-9 to 1365610014 - stored:1365610014

    # S-9 Wed Apr 10 09:06:54 2013 1347649248PING - Starting check - 71761

    # S-X Wed Apr 10 09:06:55 2013 1347649248PING Day Graphs Debug - Refresh rate: 300 - Drawgraphs:1 - 1:Last at 1365610015 (1365610015)

    # S-X Wed Apr 10 09:06:55 2013 skipping Availability graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:06:55 2013 skipping SLA graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:06:55 2013 update c:serverscheck_databases1347649248PING-value.rrd 1365610015:2

    # S-9 Wed Apr 10 09:06:55 2013 save 1347649248PING<!!>1365610015<!!>71762<!!>0<!!>1365610015<!!>

    # S-9 Wed Apr 10 09:06:55 2013 1347649248PING - s:OK - e:Error level: 15 (returned value) greater than 5 - v:2 - t:180

    # S-10 Wed Apr 10 09:06:55 2013 save 1347649418PING<!!>1365610015<!!>71764<!!>0<!!>1365610014<!!>

    # S-10 Wed Apr 10 09:06:55 2013 1347649418PING - s:OK - e:Error level: 27 (returned value) greater than 5 - v:0 - t:181

    # S-9 Wed Apr 10 09:06:56 2013 1347649248PING - finished in 2 sec

    # S-9 Wed Apr 10 09:06:56 2013 1347649248PING - didn't time out

    # S-10 Wed Apr 10 09:06:56 2013 1347649418PING - finished in 2 sec

    # S-10 Wed Apr 10 09:06:56 2013 1347649418PING - didn't time out

    # S-7 Wed Apr 10 09:06:59 2013 1347313261POWERUP - setting Threadcheckstarted for thread S-7 to 1365610019 - stored:1365610019

    # S-7 Wed Apr 10 09:06:59 2013 1347313261POWERUP - Starting check - 2671

    # S-7 Wed Apr 10 09:06:59 2013 Getting data from SensorGateway - hdhpub for 1347313261POWERUP - .1.3.6.1.4.1.17095.3.6.0

    # S-7 Wed Apr 10 09:07:00 2013 SNMP returned string: OK

    # S-X Wed Apr 10 09:07:00 2013 Check down status for 1347313261POWERUP 2672

    # S-X Wed Apr 10 09:07:00 2013 1347313261POWERUP Day Graphs Debug - Refresh rate: 300 - Drawgraphs:1 - 1:Last at 1365610020 (1365610020)

    # S-X Wed Apr 10 09:07:00 2013 skipping Availability graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:07:00 2013 skipping SLA graphs - only value graphs plotted (true - 1 - 1)

    # S-7 Wed Apr 10 09:07:00 2013 save 1347313261POWERUP<!!>1365610020<!!>2672<!!>0<!!>1365610020<!!>

    # S-7 Wed Apr 10 09:07:00 2013 1347313261POWERUP - s:OK - e: - v:OK - t:180

    # S-7 Wed Apr 10 09:07:01 2013 1347313261POWERUP - finished in 2 sec

    # S-7 Wed Apr 10 09:07:01 2013 1347313261POWERUP - didn't time out

    # S-4 Wed Apr 10 09:07:03 2013 1347295361SSLCERT - timed out! Reason:

    # S-7 Wed Apr 10 09:07:05 2013 1347922407HUMIDITY - setting Threadcheckstarted for thread S-7 to 1365610025 - stored:1365610025

    # S-7 Wed Apr 10 09:07:05 2013 1347922407HUMIDITY - Starting check - 68956

    # S-7 Wed Apr 10 09:07:05 2013 Getting data from SensorGateway - hdhpub for 1347922407HUMIDITY - .1.3.6.1.4.1.17095.3.10.0

    # S-7 Wed Apr 10 09:07:05 2013 SNMP returned string: 58.96

    # S-X Wed Apr 10 09:07:06 2013 1347922407HUMIDITY Day Graphs Debug - Refresh rate: 300 - Drawgraphs:1 - 1:Last at 1365610026 (1365610026)

    # S-X Wed Apr 10 09:07:06 2013 skipping Availability graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:07:06 2013 skipping SLA graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:07:06 2013 update c:serverscheck_databases1347922407HUMIDITY-value.rrd 1365610026:58.96

    # S-10 Wed Apr 10 09:07:06 2013 1357770718DNS - setting Threadcheckstarted for thread S-10 to 1365610026 - stored:1365610026

    # S-10 Wed Apr 10 09:07:06 2013 1357770718DNS - Starting check - 31826

    # S-7 Wed Apr 10 09:07:06 2013 save 1347922407HUMIDITY<!!>1365610026<!!>68957<!!>0<!!>1365610026<!!>

    # S-7 Wed Apr 10 09:07:06 2013 1347922407HUMIDITY - s:OK - e:Error level: 80.01 (returned value) greater than 80 - v:58.96 - t:181

    # S-X Wed Apr 10 09:07:06 2013 testing against DNS (type: A) - result:

    # S-X Wed Apr 10 09:07:07 2013 Check down status for 1357770718DNS 31827

    # S-X Wed Apr 10 09:07:07 2013 1357770718DNS Day Graphs Debug - Refresh rate: 300 - Drawgraphs:1 - 1:Last at 1365610027 (1365610027)

    # S-7 Wed Apr 10 09:07:07 2013 1347922407HUMIDITY - finished in 2 sec

    # S-7 Wed Apr 10 09:07:07 2013 1347922407HUMIDITY - didn't time out

    # S-X Wed Apr 10 09:07:07 2013 skipping Availability graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:07:09 2013 skipping SLA graphs - only value graphs plotted (true - 1 - 1)

    # S-10 Wed Apr 10 09:07:09 2013 save 1357770718DNS<!!>1365610029<!!>31827<!!>0<!!>1365610027<!!>

    # S-10 Wed Apr 10 09:07:09 2013 1357770718DNS - s:OK - e:Could not resolve host isatap.hds.ad.ucsd.edu - v: - t:183

    # S-5 Wed Apr 10 09:07:09 2013 1347295425SSLCERT - timed out! Reason:

    # S-7 Wed Apr 10 09:07:09 2013 1358019433PING - setting Threadcheckstarted for thread S-7 to 1365610029 - stored:1365610029

    # S-7 Wed Apr 10 09:07:09 2013 1358019433PING - Starting check - 30426

    # S-X Wed Apr 10 09:07:09 2013 1358019433PING Day Graphs Debug - Refresh rate: 300 - Drawgraphs:1 - 1:Last at 1365610029 (1365610029)

    # S-X Wed Apr 10 09:07:09 2013 skipping Availability graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:07:09 2013 skipping SLA graphs - only value graphs plotted (true - 1 - 1)

    # S-X Wed Apr 10 09:07:09 2013 update c:serverscheck_databases1358019433PING-value.rrd 1365610029:1

    # S-7 Wed Apr 10 09:07:09 2013 save 1358019433PING<!!>1365610029<!!>30427<!!>0<!!>1365610029<!!>

    # S-7 Wed Apr 10 09:07:09 2013 1358019433PING - s:OK - e:Error level: 11 (returned value) greater than 10 - v:1 - t:180

    # S-10 Wed Apr 10 09:07:10 2013 1357770718DNS - finished in 4 sec

    # S-10 Wed Apr 10 09:07:10 2013 1357770718DNS - didn't time out

    # S-7 Wed Apr 10 09:07:10 2013 1358019433PING - finished in 1 sec

    # S-7 Wed Apr 10 09:07:10 2013 1358019433PING - didn't time out

    # S-X Wed Apr 10 09:07:38 2013 importing

    # S-X Wed Apr 10 09:07:39 2013 Importing

    # S-1 Wed Apr 10 09:07:44 2013 1347649574SSLCERT - timed out! Reason:

    # S-X Wed Apr 10 09:07:46 2013 Threads import process completed

    # S-X Wed Apr 10 09:07:46 2013 importing done

    # S-X Wed Apr 10 09:07:46 2013 thread 0 (S-0) IDLE threadtime: 1365610065 - timediff: 1

    # S-X Wed Apr 10 09:07:46 2013 thread 1 (S-1) MONITORING - 1347649574SSLCERT for 110 secs threadtime: 1365609637 - timediff: 429

    # S-X Wed Apr 10 09:07:46 2013 thread 2 (S-2) IDLE threadtime: 1365610065 - timediff: 1

    # S-X Wed Apr 10 09:07:46 2013 thread 3 (S-3) IDLE threadtime: 1365610066 - timediff: 0

    # S-X Wed Apr 10 09:07:46 2013 thread 4 (S-4) MONITORING - 1347295361SSLCERT for 429 secs threadtime: 1365609637 - timediff: 429

    # S-X Wed Apr 10 09:07:46 2013 thread 5 (S-5) MONITORING - 1347295425SSLCERT for 122 secs threadtime: 1365609637 - timediff: 429

    # S-X Wed Apr 10 09:07:46 2013 thread 6 (S-6) IDLE threadtime: 1365610064 - timediff: 2

    # S-X Wed Apr 10 09:07:46 2013 thread 7 (S-7) IDLE threadtime: 1365610065 - timediff: 1

    # S-X Wed Apr 10 09:07:46 2013 thread 8 (S-8) MONITORING - 1347998948SSLCERT for 52 secs threadtime: 1365609638 - timediff: 428

    # S-X Wed Apr 10 09:07:46 2013 thread 9 (S-9) IDLE threadtime: 1365610065 - timediff: 1

    # S-8 Wed Apr 10 09:07:52 2013 1347998948SSLCERT - timed out! Reason:



    Does anything in there look suspicious? Thanks.
  • Administrator
    Something doesn't seem right: the SSLCert checks all time out.



    I'm not sure whether that is what is causing the CPU usage.



    Can you pause them and see whether the behavior stays the same?
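To confirm which checks to pause, the debug output can be tallied by check type. This is a rough sketch in Python under stated assumptions, not a ServersCheck tool: the function names are hypothetical, and it assumes a check ID is a run of digits followed by an uppercase type name (as in `1347295361SSLCERT`) and that a timeout is logged as `<check ID> - timed out!`.

```python
import re
from collections import Counter

# Matches lines such as:
#   # S-4 Wed Apr 10 09:07:03 2013 1347295361SSLCERT - timed out! Reason:
# but not "... - didn't time out", which lacks the "timed out!" text.
TIMEOUT_RE = re.compile(r"\s(\d+)([A-Z]+) - timed out!")


def timeouts_by_type(lines):
    """Count timed-out checks per check type (the letters after the numeric ID)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def timed_out_checks(lines):
    """Return the sorted, de-duplicated IDs of every check that timed out."""
    found = set()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            found.add(match.group(1) + match.group(2))
    return sorted(found)
```

Feeding the lines of debuglog.txt to `timeouts_by_type` shows whether the timeouts are limited to one check type, and `timed_out_checks` lists the individual checks to pause first.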
This discussion has been closed.