close to 100% CPU load

Resolved

I'm not sure whether this is the same problem as the other report, but my ClearOS 6.5 server has been running at close to 100% load for a number of days now. I am a newbie at managing a server, so I have tried to replicate what I found on this forum to find out where the load is coming from.

I think it is related to system-mysqld. top tells me that it is not driving the CPU to 100% by itself, but it is almost continuously at the top of the list.



top - 23:30:31 up  3:04,  3 users,  load average: 2.41, 1.99, 2.17
Tasks: 391 total,   2 running, 389 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.9%us,  2.7%sy,  0.0%ni,  0.0%id, 85.4%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1030072k total,  1016564k used,    13508k free,     1696k buffers
Swap:  2064376k total,  1418040k used,   646336k free,   101332k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2676 system-m  20   0  686m 537m 3116 S 12.6 53.5   5:45.86 system-mysqld
19078 root      20   0  2988 1240  828 R  0.7  0.1   0:00.13 top
   16 root      20   0     0    0    0 R  0.3  0.0   0:24.71 kblockd/0
   29 root      20   0     0    0    0 S  0.3  0.0   0:39.52 kswapd0
 3860 root      20   0 22908  13m  13m S  0.3  1.3   0:15.73 pmacctd
 3867 root      20   0 22100 9884 5848 S  0.3  1.0   0:07.91 pmacctd
 4457 snort     20   0  297m  13m 3880 S  0.3  1.4   0:34.59 snort
 4729 root      20   0 30956 1864 1460 S  0.3  0.2   0:03.47 X
 4851 root      20   0  3428  200  164 S  0.3  0.0   0:31.87 snortsam
 4872 clearcon  20   0  349m 9228 4108 S  0.3  0.9   0:29.38 gconsole
 4937 plex      20   0  203m 6064 1536 S  0.3  0.6   0:11.08 Plex DLNA Serve
    1 root      20   0  2948  892  784 S  0.0  0.1   0:01.15 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd

In /var/log/mysqld.log I find this information:



140502 20:25:33 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
140502 20:27:48 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140502 20:27:48  InnoDB: Initializing buffer pool, size = 8.0M
140502 20:27:48  InnoDB: Completed initialization of buffer pool
140502 20:27:49  InnoDB: Started; log sequence number 0 152806009
140502 20:27:49 [Note] Event Scheduler: Loaded 0 events
140502 20:27:49 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution

Not sure if it has anything to do with it, but the dashboard in the UI takes ages to load, in particular the charts on memory and CPU usage.

Any help to get me to the next step is appreciated.

Regards, Ronald

In Database

Friday, May 02 2014, 09:37 PM

Responses (77)

Ben Chambers
Saturday, May 03 2014, 12:53 PM

Hi Ronald,

Yup, your load is high, but the sluggishness (e.g. of dashboard loading) is caused by the extreme wait times on I/O activity (85.4%wa). You're out of memory and hitting swap, which only makes the situation worse.
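The I/O wait figure mentioned above can be pulled out of a top summary line with a small shell filter. This is only a sketch: the function name iowait_pct is made up for illustration, and it assumes the older procps layout shown in the first post (85.4%wa). Newer versions of top print the value as "85.4 wa" and would need a different pattern.

```shell
#!/bin/sh
# iowait_pct (hypothetical helper): read a top "Cpu(s):" summary line on stdin
# and print the I/O wait percentage. Assumes the "NN.N%wa" format seen above.
iowait_pct() {
    sed -n 's/.*[ ,]\([0-9.]*\)%wa.*/\1/p'
}

# Demo with the summary line from the first post; prints 85.4
printf '%s\n' 'Cpu(s): 11.9%us,  2.7%sy,  0.0%ni,  0.0%id, 85.4%wa,  0.0%hi,  0.0%si,  0.0%st' | iowait_pct
```

On a live system with the same top format, something like "top -bn1 | grep 'Cpu(s)' | iowait_pct" should give the current value. A figure that stays high while user and system time stay low points at disk or swap rather than at a CPU-bound process.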

I'd like to see the output of

ps afxw

I'll bet you have more than one script running that is trying to update your reports (parsing out the log file to the system MySQL database).

If you know how to kill processes, you can do that...a Windows(TM) reboot would also clear them out.

Is your software up to date? There was a release about 3-4 weeks ago that prevents multiple instances of some of the report scripts from running.

Finally...more memory never hurts...at 1G...you're right on the edge.

B.

Ronald
Saturday, May 03 2014, 02:02 PM

Thanks Ben, I will look into adding some more memory; you are right, it is small. That is because I started with a small spare box to see whether I would like running a server on ClearOS, and as I really like the ClearOS setup the usage is growing.

Here is my output:

[root@stevenaar ~]# ps afxw

  PID TTY      STAT   TIME COMMAND

    2 ?        S      0:00 [kthreadd]

    3 ?        S      0:00  \_ [migration/0]

    4 ?        S      0:02  \_ [ksoftirqd/0]

    5 ?        S      0:00  \_ [migration/0]

    6 ?        S      0:00  \_ [watchdog/0]

    7 ?        S      0:08  \_ [events/0]

    8 ?        S      0:00  \_ [cgroup]

    9 ?        S      0:00  \_ [khelper]

   10 ?        S      0:00  \_ [netns]

   11 ?        S      0:00  \_ [async/mgr]

   12 ?        S      0:00  \_ [pm]

   13 ?        S      0:00  \_ [sync_supers]

   14 ?        S      0:00  \_ [bdi-default]

   15 ?        S      0:00  \_ [kintegrityd/0]

   16 ?        S      1:30  \_ [kblockd/0]

   17 ?        S      0:00  \_ [kacpid]

   18 ?        S      0:00  \_ [kacpi_notify]

   19 ?        S      0:00  \_ [kacpi_hotplug]

   20 ?        S      0:00  \_ [ata_aux]

   21 ?        S      0:00  \_ [ata_sff/0]

   22 ?        S      0:00  \_ [ksuspend_usbd]

   23 ?        S      0:00  \_ [khubd]

   24 ?        S      0:00  \_ [kseriod]

   25 ?        S      0:00  \_ [md/0]

   26 ?        S      0:00  \_ [md_misc/0]

   27 ?        S      0:00  \_ [linkwatch]

   28 ?        S      0:00  \_ [khungtaskd]

   29 ?        D      1:37  \_ [kswapd0]

   30 ?        SN     0:00  \_ [ksmd]

   31 ?        S      0:00  \_ [aio/0]

   32 ?        S      0:00  \_ [crypto/0]

   37 ?        S      0:00  \_ [kthrotld/0]

   39 ?        S      0:00  \_ [kpsmoused]

   40 ?        S      0:00  \_ [usbhid_resumer]

   71 ?        S      0:00  \_ [kstriped]

   99 ?        S      0:00  \_ [ttm_swap]

  100 ?        S<     0:07  \_ [kslowd000]

  101 ?        S<     0:07  \_ [kslowd001]

  121 ?        S      0:00  \_ [scsi_eh_0]

  122 ?        S      0:00  \_ [usb-storage]

  153 ?        S      0:00  \_ [scsi_eh_1]

  154 ?        S      0:00  \_ [scsi_eh_2]

  157 ?        S      0:00  \_ [scsi_eh_3]

  158 ?        S      0:01  \_ [usb-storage]

  180 ?        S      0:00  \_ [scsi_eh_4]

  181 ?        S      0:00  \_ [scsi_eh_5]

  307 ?        S      0:09  \_ [kdmflush]

  309 ?        S      0:00  \_ [kdmflush]

  326 ?        D      0:27  \_ [jbd2/dm-0-8]

  327 ?        S      0:00  \_ [ext4-dio-unwrit]

  706 ?        S      0:00  \_ [kdmflush]

  750 ?        S      0:00  \_ [jbd2/sda1-8]

  751 ?        S      0:00  \_ [ext4-dio-unwrit]

  752 ?        S      0:00  \_ [jbd2/dm-2-8]

  753 ?        S      0:00  \_ [ext4-dio-unwrit]

  754 ?        S      0:00  \_ [jbd2/sdc1-8]

  755 ?        S      0:00  \_ [ext4-dio-unwrit]

  756 ?        S      0:00  \_ [jbd2/sdb-8]

  757 ?        S      0:00  \_ [ext4-dio-unwrit]

  810 ?        S      0:01  \_ [kauditd]

  925 ?        S      0:07  \_ [flush-253:0]

 1194 ?        S      0:00  \_ [rpciod/0]

 1797 ?        S      0:00  \_ [lockd]

 1798 ?        S      0:00  \_ [nfsd4]

 1799 ?        S      0:00  \_ [nfsd4_callbacks]

 1800 ?        S      0:00  \_ [nfsd]

 1801 ?        S      0:00  \_ [nfsd]

 1802 ?        S      0:00  \_ [nfsd]

 1803 ?        S      0:00  \_ [nfsd]

 1804 ?        S      0:00  \_ [nfsd]

 1805 ?        S      0:00  \_ [nfsd]

 1806 ?        S      0:00  \_ [nfsd]

 1807 ?        S      0:00  \_ [nfsd]

 4442 ?        S      0:00  \_ [bluetooth]

25986 ?        S      0:00  \_ [flush-8:32]

    1 ?        Ss     0:01 /sbin/init

  405 ?        S<s    0:00 /sbin/udevd -d

  703 ?        S<     0:00  \_ /sbin/udevd -d

 4544 ?        S<     0:00  \_ /sbin/udevd -d

  972 ?        S<sl   0:05 auditd

  990 ?        Ss     0:00 /sbin/portreserve

  997 ?        Ssl    0:04 /usr/sbin/nslcd

 1010 ?        Sl     0:06 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5

 1175 ?        Ss     0:00 rpcbind

 1200 ?        Ss     0:00 rpc.statd -p 662 -o 2020

 1226 ?        Ss     0:00 dbus-daemon --system

 1237 ?        S      0:01 avahi-daemon: running [stevenaar.local]

 1238 ?        Ss     0:00  \_ avahi-daemon: chroot helper

 1253 ?        Ss     0:01 cupsd -C /etc/cups/cupsd.conf

 1278 ?        Ss     0:00 /usr/sbin/acpid

 1287 ?        Ssl    0:00 hald

 1288 ?        S      0:00  \_ hald-runner

 1333 ?        S      0:00      \_ hald-addon-input: Listening on /dev/input/event3 /dev/input/event0 /dev/input/event1

 1337 ?        S      0:00      \_ hald-addon-acpi: listening on acpid socket /var/run/acpid.socket

 1742 ?        Ssl    0:12 /usr/sbin/slapd -h ldap://127.0.0.1/ ldaps://127.0.0.1 ldaps://192.168.2.111/  -u ldap

 1757 ?        Ss     0:00 winbindd

 1787 ?        S      0:00  \_ winbindd

 3879 ?        S      0:00  \_ winbindd

 1792 ?        Ss     0:00 rpc.mountd -p 892

 1828 ?        Ss     0:00 rpc.idmapd

 1839 ?        Ssl    0:12 /usr/sbin/nscd

 1945 ?        S      0:00 /bin/sh /usr/libexec/ipsec/_plutorun --debug  --uniqueids yes --force_busy no --nocrsend no --strictcrlpolicy no --nat_traversal yes --keep_alive  --p

 1950 ?        S      0:00  \_ /bin/sh /usr/libexec/ipsec/_plutorun --debug  --uniqueids yes --force_busy no --nocrsend no --strictcrlpolicy no --nat_traversal yes --keep_alive

 1952 ?        Sl     0:00  |   \_ /usr/libexec/ipsec/pluto --nofork --secretsfile /etc/ipsec.secrets --ipsecdir /etc/ipsec.d --use-netkey --uniqueids --nat_traversal --virtual_

 2177 ?        S      0:00  |       \_ _pluto_adns

 1951 ?        S      0:00  \_ /bin/sh /usr/libexec/ipsec/_plutoload --wait no --post

 1946 ?        S      0:00 logger -s -p daemon.error -t ipsec__plutorun

 1964 ?        S      0:02 /usr/sbin/dnsmasq -s nl

 2005 ?        Ssl    3:09 java -Xmx80m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=10 -Djava.library.path=/opt/pcmonitor/bin/../native -jar /opt/pcmonitor/bin/../lib/pcmonitor

 2187 ?        S      0:16 /usr/sbin/snmpd -LS0-6d -Lf /dev/null -p /var/run/snmpd.pid

 2237 ?        S      0:02 arpwatch -u arpwatch -e - -i eth1 -f /var/lib/arpwatch/arp_eth1.dat

 2246 ?        Ssl    0:53 clearsyncd

 2319 ?        Ss     0:00 /usr/sbin/sshd

31433 ?        Ss     0:00  \_ sshd: root@notty

31437 ?        Ss     0:00  |   \_ /usr/libexec/openssh/sftp-server

32358 ?        Ss     0:00  \_ sshd: root@notty

32362 ?        Ss     0:00  |   \_ /usr/libexec/openssh/sftp-server

 4318 ?        Ss     0:00  \_ sshd: root@notty

 4320 ?        Ss     0:00  |   \_ -bash

26762 ?        Ss     0:00  \_ sshd: root@pts/0

26826 pts/0    Ss     0:00      \_ -bash

27573 pts/0    R+     0:00          \_ ps afxw

 2327 ?        Ss     0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g

 2362 ?        Ssl    0:50 clamd

 2399 ?        S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=m

 2501 ?        Sl     0:30  \_ /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --s

 2553 ?        S      0:00 /bin/sh /usr/clearos/sandbox/usr/bin/mysqld_safe --defaults-file=/usr/clearos/sandbox/etc/my.cnf --datadir=/var/lib/system-mysql --socket=/var/lib/sys

 2676 ?        Sl    25:33  \_ /usr/clearos/sandbox/usr/libexec/system-mysqld --defaults-file=/usr/clearos/sandbox/etc/my.cnf --basedir=/usr/clearos/sandbox/usr --datadir=/var/l

 3571 ?        Ss     0:01 /usr/lib/cyrus-imapd/cyrus-master -d

 3637 ?        S      0:00  \_ imapd -s

 3638 ?        S      0:00  \_ imapd -s

 3642 ?        S      0:00  \_ imapd -s

 3643 ?        S      0:00  \_ imapd -s

 3644 ?        S      0:00  \_ imapd -s

 3645 ?        S      0:00  \_ imapd -s

27493 ?        D      0:00  \_ ctl_cyrusdb -c

 3581 ?        Ss     0:00 /usr/sbin/saslauthd -m /var/run/saslauthd -a pam

 3582 ?        S      0:00  \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam

 3583 ?        S      0:00  \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam

 3584 ?        S      0:00  \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam

 3585 ?        S      0:00  \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam

 3633 ?        S      0:02 idled

 3722 ?        Ss     0:00 /usr/libexec/postfix/master

 3731 ?        S      0:00  \_ qmgr -l -t fifo -u

20813 ?        S      0:00  \_ pickup -l -t fifo -u

 3734 ?        Ss     0:03 proftpd: (accepting connections)

 3767 ?        Ss     0:00 crond

 2499 ?        S      0:00  \_ CROND

 2500 ?        Zs     0:00      \_ [sh] <defunct>

 3797 ?        Ss     0:00 squid -f /etc/squid/squid.conf

 3800 ?        S      0:12  \_ (squid) -f /etc/squid/squid.conf

 3801 ?        S      0:00      \_ (pam_auth)

 3802 ?        S      0:00      \_ (pam_auth)

 3803 ?        S      0:00      \_ (pam_auth)

 3804 ?        S      0:00      \_ (pam_auth)

 3805 ?        S      0:00      \_ (pam_auth)

 3806 ?        S      0:00      \_ (pam_auth)

 3807 ?        S      0:00      \_ (pam_auth)

 3808 ?        S      0:00      \_ (pam_auth)

 3809 ?        S      0:00      \_ (pam_auth)

 3810 ?        S      0:00      \_ (pam_auth)

 3811 ?        S      0:00      \_ (pam_auth)

 3812 ?        S      0:00      \_ (pam_auth)

 3813 ?        S      0:00      \_ (pam_auth)

 3815 ?        S      0:00      \_ (pam_auth)

 3816 ?        S      0:00      \_ (pam_auth)

 3817 ?        S      0:00      \_ (unlinkd)

 3829 ?        S      0:01 /usr/bin/perl /usr/share/BackupPC/bin/BackupPC -d

 3835 ?        SN     0:00  \_ /usr/bin/perl /usr/share/BackupPC/bin/BackupPC_trashClean

 3842 ?        Ss     0:05 nmbd -D

 3845 ?        S      0:00  \_ nmbd -D

27571 ?        D      0:00  \_ nmbd -D

 3860 ?        Ss     1:26 pmacctd: Core Process [default]

 3865 ?        S      0:41  \_ pmacctd: MySQL Plugin [inbound]

26731 ?        S      0:00  |   \_ pmacctd: MySQL Plugin -- DB Writer [inbound]

26990 ?        S      0:00  |   \_ pmacctd: MySQL Plugin -- DB Writer [inbound]

 3867 ?        S      0:37  \_ pmacctd: MySQL Plugin [outbound]

26991 ?        S      0:00      \_ pmacctd: MySQL Plugin -- DB Writer [outbound]

 3870 ?        Ss     0:00 smbd -D

 3894 ?        S      0:00  \_ smbd -D

12116 ?        S      0:06  \_ smbd -D

 3896 ?        Ss     0:00 dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3898 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3899 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3900 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3901 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3902 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3903 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3904 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3905 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3907 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3908 ?        S      0:00  \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf

 3914 ?        Sl     0:27 /usr/libexec/dropbox/dropbox study

 3917 ?        Sl     0:25 /usr/libexec/dropbox/dropbox linda

 4096 ?        Ds     2:29 /usr/bin/monitorix -c /etc/monitorix.conf -p /var/run/monitorix.pid

 4310 ?        S      0:00 su -s /bin/sh plex -c . /etc/sysconfig/PlexMediaServer; cd /usr/lib/plexmediaserver; ./'Plex Media Server' > /dev/null 2>&1

 4316 ?        Ss     0:00  \_ sh -c . /etc/sysconfig/PlexMediaServer; cd /usr/lib/plexmediaserver; ./'Plex Media Server' > /dev/null 2>&1

 4323 ?        Sl     0:15      \_ ./Plex Media Server

 4406 ?        SNl    2:06          \_ Plex Plug-in [com.plexapp.system] /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Content

 4937 ?        Sl     0:53          \_ /usr/lib/plexmediaserver/Plex DLNA Server

 4350 ?        Ss     0:02 /usr/sbin/webconfig

 4355 ?        S      0:24  \_ /usr/sbin/webconfig

 4356 ?        S      0:28  \_ /usr/sbin/webconfig

 4357 ?        S      0:22  \_ /usr/sbin/webconfig

 5026 ?        S      0:29  \_ /usr/sbin/webconfig

 5028 ?        D      0:22  \_ /usr/sbin/webconfig

 5083 ?        D      0:29  \_ /usr/sbin/webconfig

10681 ?        S      0:28  \_ /usr/sbin/webconfig

10684 ?        S      0:26  \_ /usr/sbin/webconfig

10685 ?        S      0:28  \_ /usr/sbin/webconfig

10985 ?        S      0:22  \_ /usr/sbin/webconfig

13488 ?        S      0:25  \_ /usr/sbin/webconfig

13491 ?        S      0:24  \_ /usr/sbin/webconfig

13492 ?        S      0:24  \_ /usr/sbin/webconfig

17255 ?        S      0:07  \_ /usr/sbin/webconfig

 4416 ?        Ss     0:00 pptpd

 4457 ?        Ssl    2:48 snort -i eth1 -u snort -g snort -D -c /etc/snort.conf

 4511 ?        Ssl    1:26 /usr/bin/transmission-daemon -b -t -a *.*.*.* -e /var/log/transmission/transmission.log

 4520 ?        Ss     0:00 /usr/bin/openvt -fwc 1 -- /bin/login -f clearconsole

 4528 ?        Ss     0:00  \_ login -- clearconsole

 4619 tty1     Ss+    0:00      \_ -bash

 4652 tty1     Sl+    0:59          \_ /usr/sbin/tconsole

 4661 ?        Ss     0:00              \_ /bin/sh /usr/bin/startx

 4717 ?        S      0:00                  \_ xinit /var/lib/clearconsole//.xinitrc -- /usr/bin/X :0 -auth /var/lib/clearconsole//.serverauth.4661

 4729 tty7     Ss+    0:22                      \_ /usr/bin/X :0 -auth /var/lib/clearconsole//.serverauth.4661

 4866 ?        Ss     0:00                      \_ sh /var/lib/clearconsole//.xinitrc

 4871 ?        S      0:00                          \_ /usr/bin/ratpoison

 4872 ?        Sl     2:40                          \_ /usr/lib/gconsole/gconsole

 4537 tty2     Ss+    0:00 /sbin/mingetty /dev/tty2

 4539 tty3     Ss+    0:00 /sbin/mingetty /dev/tty3

 4542 tty4     Ss+    0:00 /sbin/mingetty /dev/tty4

 4545 tty5     Ss+    0:00 /sbin/mingetty /dev/tty5

 4549 tty6     Ss+    0:00 /sbin/mingetty /dev/tty6

 4552 ?        Sl     0:00 /usr/sbin/console-kit-daemon --no-daemon

 5011 ?        S      0:00 dbus-launch --autolaunch 6e5ee68e56495e8f74db175c0000001d --binary-syntax --close-stderr

 5018 ?        Ss     0:00 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session

 5021 ?        S      0:00 /usr/libexec/gconfd-2

 2567 ?        Ss     0:00 syswatch

 2858 ?        Ss     0:31 snortsam /etc/snortsam.conf

 4363 ?        Ss     0:00 /usr/sbin/httpd

 4368 ?        S      0:00  \_ /usr/sbin/httpd

 4369 ?        S      0:00  \_ /usr/sbin/httpd

 4370 ?        S      0:00  \_ /usr/sbin/httpd

 4371 ?        S      0:00  \_ /usr/sbin/httpd

 4372 ?        S      0:00  \_ /usr/sbin/httpd

 4373 ?        S      0:00  \_ /usr/sbin/httpd

 4374 ?        S      0:00  \_ /usr/sbin/httpd

 4375 ?        S      0:00  \_ /usr/sbin/httpd

Nick Howitt
Saturday, May 03 2014, 04:00 PM

IDS (snort) and the Proxy (squid) are resource hogs. You are using 1.4GB of swap as well as 1GB of RAM. I'd expect your system to be crawling! I'd suggest at least 4GB RAM and if you can, increasing your swap (not so easy).

PS Can you put your output between [ code ] and [ /code ] removing the spaces between [ and ]. Then you get nicely formatted posts of screen dumps and files.

Ronald
Saturday, May 03 2014, 04:25 PM

OK, as far as I am aware the only updates to the server have been the ones pushed by automatic updates, so my server should be up to date. yum update and yum upgrade also return nothing.

I am wondering what has changed in the latest updates that makes the CPU go mad. It is a server that sits in my home environment, and I have not opened it up to any load other than my own.

I will try to get some additional RAM, but what can I do in the meantime to bring the usage down?

Regards, Ronald

Nick Howitt
Saturday, May 03 2014, 05:39 PM

Try stopping various services and see the effect. You can either look at the dashboard memory usage or do something like:
egrep 'Mem|Cache|Swap' /proc/meminfo
The first things I'd look at are the proxy (squid) and the IDS (snort).

I've just tried playing around with the "top" command. Run "top", then press "M" (capital M). This sorts the list by memory usage. Then hit "f", where you can deselect some of the columns and add in things like Swapped Size.
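The before-and-after comparison suggested above can be reduced to a single number. The following is a rough sketch, not part of the original advice: swap_used_kb is a hypothetical helper that computes swap in use from the SwapTotal and SwapFree lines of /proc/meminfo.

```shell
#!/bin/sh
# swap_used_kb (hypothetical helper): print swap in use, in kB, computed as
# SwapTotal - SwapFree. Reads /proc/meminfo unless another file is given.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# Possible workflow (the service name is only an illustration):
#   before=$(swap_used_kb)
#   service squid stop
#   after=$(swap_used_kb)
#   echo "swap change: $((after - before)) kB"
```

With the figures from the first post (2064376 kB total, 646336 kB free) this yields 1418040 kB, which matches the "used" value top reported.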

Tony Ellis
Saturday, May 03 2014, 06:35 PM

In addition to Nick's suggestions, here are two more things to try that might provide useful info...

What's eating CPU?

watch -n 5 'ps axf | awk "{ if ( \$3 !~ /S/ ) { print; } }"'

In the following, watch the "si" and "so" columns under swap for excessive swap activity. To adjust the command: the first number is the interval in seconds between updates (5 in this example) and the second number is the number of iterations (32 in this example).

vmstat -w 5 32

"man ps" and "man vmstat" for more on these commands...
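Instead of watching the columns by eye, the vmstat output can be filtered so that only samples with swap traffic are shown. This is a sketch under assumptions: flag_swapping is a made-up name, the column positions (si as field 7, so as field 8) assume the standard vmstat layout, and the sample numbers in the demo are invented for illustration rather than taken from the thread.

```shell
#!/bin/sh
# flag_swapping (hypothetical helper): read vmstat output on stdin and report
# every sample whose si or so value exceeds the limit (default 0).
# Header lines are skipped because their first field is not a number.
flag_swapping() {
    awk -v limit="${1:-0}" '
        $1 ~ /^[0-9]+$/ && ($7 + 0 > limit + 0 || $8 + 0 > limit + 0) {
            print "swap activity: si=" $7 " so=" $8
        }'
}

# Demo with made-up sample data; only the first data row exceeds the limit.
flag_swapping 100 <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3 1418040  13508   1696 101332  812  640  2100   900  500  700 12  3  0 85  0
 0  0 1418040  50000   1696 101332    0    0    10     5  100  200  2  1 97  0  0
EOF
```

On a real system the function would sit at the end of a pipe, for example "vmstat 5 32 | flag_swapping 100".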

Ronald
Sunday, May 04 2014, 10:43 AM

I have been working with the tools both of you proposed and concluded that the two major swap and memory consumers are webconfig and system-mysqld. I killed the latter first, and immediately memory and swap usage dropped back to normal, and so did the wait time. Then I killed webconfig too and, not surprisingly, everything dropped back even further.

Then I started both again, and after a few minutes the server was crawling again. I used htop to have a further look at the tree of mysql usage and looked in mysqld.log too. I found that the InnoDB engine was creating most of the log entries.
With Google as my friend I found a couple of posts about the excessive load that mysql can create on a server, and the InnoDB engine in particular. As I have an app running on the server that uses the InnoDB engine, I did not want to take the risk of switching it off entirely and working with MyISAM only.

I copied a couple of lines from a post that seemed to address a problem similar to mine and put these in /etc/my.cnf:



## If open-files-limit is set very low, MySQL may increase it on its own. Either
## way, increase this if MySQL gives 'too many open files' errors. Setting
## this above 65535 could be unwise (MySQL may crash).
open-files-limit                = 20000

### Cache
thread-cache-size               = 16
table-open-cache                = 4096
table-definition-cache          = 512

## Generally, it is unwise to set the query cache to be larger than 64-128M
## as the costs associated with maintaining the cache outweigh the performance
## gains. A far superior solution would be to implement memcached, though this
## requires modifying the application, among other things.
query-cache-type                = 1
query-cache-size                = 32M
query-cache-limit               = 1M

### Per-thread Buffers
sort-buffer-size                = 1M
read-buffer-size                = 1M
read-rnd-buffer-size            = 2M
join-buffer-size                = 1M

### Temp Tables
tmp-table-size                  = 64M
max-heap-table-size             = 64M

### Networking
back-log                        = 100
max-connections                 = 50
max-connect-errors              = 10000
max-allowed-packet              = 16M
interactive-timeout             = 600
wait-timeout                    = 180
net_read_timeout        = 30
net_write_timeout       = 30
# This value is the size of the listen queue for incoming TCP/IP connections.
back_log            = 128

#### Storage Engines
## Set this to force MySQL to use a particular engine / table-type
## for new tables. This setting can still be overridden by specifying
## the engine explicitly in the CREATE TABLE statement.
default-storage-engine         = MyISAM

## Makes sure MySQL does not start if InnoDB fails to start. This helps
## prevent ugly silent failures.
innodb                          = FORCE

### MyISAM
## Not sure what to set this to?
## Try running a 'du -sch /var/lib/mysql/*/*.MYI'
## This will give you a good estimate on the size of all the MyISAM indexes.
## (The buffer may not need to be set that high, however)
key-buffer-size                 = 2M
## This setting controls the size of the buffer that is allocated when
## sorting MyISAM indexes during a REPAIR TABLE or when creating indexes
## with CREATE INDEX or ALTER TABLE.
myisam-sort-buffer-size         = 2M

### InnoDB
## Note: While most settings in MySQL can be set at run-time, many InnoDB
## variables cannot be set at runtime as they require restarting MySQL
###
## These settings control how much RAM InnoDB will use. Generally, when using
## mostly InnoDB tables, the innodb-buffer-pool-size should be as large as
## is possible without swapping or starving other processes of RAM. The other
## two settings usually do not need to be changed, but can help for very large
## datasets.
innodb-buffer-pool-size         = 285M
innodb-log-buffer-size          = 8M

To be sure, I rebooted the server (I was not sure whether just restarting system-mysqld would do the trick), and since then my machine has been OK again. I will try to add some RAM, but as it is PC133 SDRAM I am not sure how much I can physically add given the limitations of the board.
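The settings above can be sanity-checked against the RAM in the box with a commonly used rule of thumb: global buffers plus max-connections times the per-thread buffers. This is a rough upper bound rather than an exact figure (per-thread buffers are only allocated when a query needs them, and temp tables and thread stacks are ignored), and mysql_mem_estimate_mb is a hypothetical helper name.

```shell
#!/bin/sh
# mysql_mem_estimate_mb (hypothetical helper): rough worst-case memory use in MB.
#   $1 = sum of global buffers, $2 = sum of per-thread buffers, $3 = max connections
mysql_mem_estimate_mb() {
    global_mb=$1
    per_thread_mb=$2
    max_conn=$3
    echo $(( global_mb + per_thread_mb * max_conn ))
}

# Values taken from the config posted above:
#   global:     key-buffer 2 + query-cache 32 + innodb-buffer-pool 285 + innodb-log-buffer 8 = 327
#   per thread: sort 1 + read 1 + read-rnd 2 + join 1 = 5
#   max-connections = 50
mysql_mem_estimate_mb 327 5 50   # prints 577
```

On a machine with about 1 GB of RAM, a worst case of roughly 577 MB for one MySQL instance leaves little headroom for the other services, so this kind of estimate is worth repeating whenever the buffer sizes are changed.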

I will edit this message if the problem reoccurs after a couple of days.

Regards, Ronald

Nick Howitt
Sunday, May 04 2014, 11:54 AM

You're beyond what I can really help with now except I can say that /etc/my.cnf is for mysql and /usr/clearos/sandbox/etc/my.cnf is for system-mysql so you may want to make changes to the latter. Also see the note here if you do make any changes.

Ronald
Sunday, May 04 2014, 03:44 PM

Thanks Nick, I applied these to that cnf file too. So far so good. I have not touched the log files or sizes referred to in the link.
I will keep this forum posted if this does not work.

Regards, Ronald

rob youngquist
Wednesday, May 07 2014, 01:35 PM

Hi Nick. I was having the same issue. My RAM is low, but my company does not want me shutting down the server.
I did raise the max child processes because our user connections were higher than the default, and that helped big time.

I was wondering whether you know what I am looking at in the GUI under System > Resource > Processes and Page output.
Are these "dansguardian child process" counts, or overall processes on the actual Linux box?

And is the Page output the actual "page swapping" stats, so that I can tell whether it goes up due to the increased child processes / RAM usage?

Nick Howitt
Wednesday, May 07 2014, 04:38 PM

I'm afraid I've no idea what you are looking at. Are you running the Pro version? I suspect that even if I did know what you were looking at, I wouldn't know the answer.

rob youngquist
Wednesday, May 07 2014, 05:47 PM

It is ClearOS Enterprise 5.2. The web GUI has a System > Resource Reports page where you can select Processes and Page outputs. I was just wondering whether this covers "ALL" running processes on our server or just the "dansguardian" child processes. It would be nice if it is the child processes, because then I could see the history of child process counts instead of only the real-time figure from

ps aux | grep dansguardian-av | wc -l

which only shows me the current processes at that time.
If the GUI, which shows daily, weekly, monthly etc. stats, covers this, I can see the max, avg and current child processes, which would be very useful.
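If the GUI turns out not to track this, a history of the child process count can be collected by hand. The sketch below is one possible approach, with hypothetical function names and a hypothetical log path; it matches on the command name only, so unlike the ps aux | grep pipeline it does not risk counting the grep process itself.

```shell
#!/bin/sh
# count_procs (hypothetical helper): count the lines on stdin that match $1.
# Intended to be fed one command name per line.
count_procs() {
    grep -c -- "$1"
}

# log_sample (hypothetical helper): print "timestamp count" for process name $1,
# suitable for appending to a log file from cron.
log_sample() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(ps -eo comm= | count_procs "$1")"
}

# Example cron usage (path is an assumption):
#   log_sample dansguardian-av >> /var/log/dansguardian-count.log
```

Run every few minutes, this builds a file from which the maximum and average can later be computed, for example with awk over the last column.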

Nick Howitt
Wednesday, May 07 2014, 06:23 PM

Have you looked at the "top" command from earlier in the thread to monitor memory usage by process? It is a surprisingly powerful tool.

Ben Chambers
Wednesday, May 07 2014, 06:29 PM

I saw this on a buddy's server just today...Pete told me it was caused by a tool that watches network traffic and imports it into the system-mysql table.

If you run:

ps afxw | grep pmacct

and see 1 or more entries, this is likely what is causing the load.
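To make the check less dependent on reading the tree by eye, the PIDs of the pmacctd core processes can be extracted from the ps output. This is a sketch: pmacct_core_pids is a made-up name, the sample lines are taken from output posted later in this thread, and treating more than one "Core Process" line as a sign of duplicated instances is an assumption rather than something the thread confirms.

```shell
#!/bin/sh
# pmacct_core_pids (hypothetical helper): read "ps afxw" output on stdin and
# print the PID of every pmacctd core process, one per line.
pmacct_core_pids() {
    awk '/pmacctd: Core Process/ { print $1 }'
}

# Demo with lines from output posted later in the thread; prints 17054 and 17063.
pmacct_core_pids <<'EOF'
 5559 pts/0    S+     0:00          \_ grep pmacct
17054 ?        Ss    41:11 pmacctd: Core Process [default]
17059 ?        S     22:56  \_ pmacctd: MySQL Plugin [inbound]
17063 ?        Ss    20:19 pmacctd: Core Process [default]
EOF
```

On a live system this would be used as "ps afxw | pmacct_core_pids".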

For now, I just removed this (and the network report unfortunately) until we get a fix out:

service pmacctd stop
service system-mysql stop
killall -9 pmacctd
yum remove pmacct
service system-mysql start

Load dropped right away to 0 after this.

B.

rob youngquist
Wednesday, May 07 2014, 08:17 PM

Is "top" historical, i.e. can I see averages or the max for an entire day, week or month?
Or is it just real time?
With the web GUI, under System > Resource Reports > Processes, I can see all that in a graph as well as the current usage.
I just can't find out whether "Processes" refers to ALL processes on the server itself or to the "dansguardian" child processes.

Also, the response about the "network traffic" tool adding to CPU usage is very helpful.

So thank you both for your input.

I am still hoping to get an answer to my initial question about the web interface's stats.

Nick Howitt
Wednesday, May 07 2014, 09:00 PM

top is real time only - I believe. I only found the other options looking at the man pages last weekend. You could check them as well.

zombu2
Monday, June 02 2014, 12:07 AM

well here is what i did to make system-mysqld behave

reset your stats and graphs
system-database reset

get your password for mysql
cat /var/clearos/system_database/reports

then log into mysql
/usr/clearos/sandbox/usr/bin/mysql -u reports -p reports

in mysql enter
alter table network_detail engine=innodb;

type exit and then restart mysql
service system-mysqld restart

been 4 days now and no runaway sql process , the system is nice and calm

and before anyone asks no i did not uninstall network detail etc.
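One way to confirm which storage engine the table uses before and after the ALTER TABLE step is to query information_schema. The sketch below only builds the SQL statement; engine_query is a hypothetical helper, and the schema name "reports" is assumed from the mysql command line in the steps above.

```shell
#!/bin/sh
# engine_query (hypothetical helper): print a SQL statement that reports the
# storage engine of table $2 in schema $1.
engine_query() {
    printf "SELECT ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA='%s' AND TABLE_NAME='%s';\n" "$1" "$2"
}

# Show the statement for the table discussed above.
engine_query reports network_detail

# Possible use with the sandbox client from the steps above:
#   /usr/clearos/sandbox/usr/bin/mysql -u reports -p reports -e "$(engine_query reports network_detail)"
```

In that command line, -p on its own makes the client prompt for the password, and the following "reports" is read as the database name.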

Evince
Monday, June 02 2014, 12:20 PM

Thanks, good information.

Nick Howitt
Tuesday, August 05 2014, 09:05 PM

I've just hit this one

[root@server ~]# ps afxw | grep pmacct
 5559 pts/0    S+     0:00          \_ grep pmacct
17054 ?        Ss    41:11 pmacctd: Core Process [default]
17059 ?        S     22:56  \_ pmacctd: MySQL Plugin [inbound]
 3351 ?        S      0:00  |   \_ pmacctd: MySQL Plugin -- DB Writer [inbound]
 4109 ?        S      0:00  |   \_ pmacctd: MySQL Plugin -- DB Writer [inbound]
 5119 ?        S      0:00  |   \_ pmacctd: MySQL Plugin -- DB Writer [inbound]
17062 ?        S     19:30  \_ pmacctd: MySQL Plugin [outbound]
 5118 ?        S      0:00      \_ pmacctd: MySQL Plugin -- DB Writer [outbound]
17063 ?        Ss    20:19 pmacctd: Core Process [default]
17065 ?        S      1:43  \_ pmacctd: MySQL Plugin [inbound]
17066 ?        S      1:04  \_ pmacctd: MySQL Plugin [outbound]

Is there any ClearOS fix in the pipeline?

Nick Howitt
Wednesday, August 06 2014, 06:14 PM

I could not find a bug report so I have filed bug 1885. In the meanwhile I've removed the app-network-detail-report packages and pmacct. If following Ben's instructions above note that it is system-mysqld and not system-mysql.

Legacy Disabled
Thursday, August 07 2014, 04:31 PM

Hi Nick,

This issue should have been resolved in a recent update (here's the tracker link). Does the /var/clearos/network_detail_report/purge_external file exist on your system? If not, then the update was not applied. If so, then something in the update did not work (?).

Short version: The network detail report table for external connections was not getting purged. This caused the report engine to hammer the MySQL system.

Nick Howitt
Thursday, August 07 2014, 05:09 PM

Hi Peter,

I'll have to reinstall to check. Do I need to clean anything out first?

Nick Howitt
Thursday, August 07 2014, 05:14 PM

I've just checked and, strangely, the file does exist even though I've uninstalled the reports!

Legacy Disabled
Thursday, August 07 2014, 06:00 PM

The file should still exist after an uninstall, so that's normal. It looks like the upgrade was performed but the MySQL issue still persisted. We'll keep an eye out for this issue.

Nick Howitt
Thursday, August 07 2014, 06:12 PM

I'll reinstall and see what happens. I think it only maxed out one cpu core so I did not really notice it having a huge impact.

Do you have a comment on zombu2's solution further up the thread?

Nick Howitt
Thursday, August 07 2014, 06:36 PM

Bad news: I again have a 100% load, generally on one core but sometimes split between 2 or 3. Can I help to diagnose?

Intelliant
Friday, August 08 2014, 08:51 AM

I am facing the same issue too. I can provide any details needed to help diagnose it.

Legacy Disabled
Friday, August 08 2014, 02:36 PM

Nick Howitt wrote:

Do you have a comment on zombu2's solution further up the thread?

It's a different way to get to the same issue -- the network_detail_external table is getting too large.

Legacy Disabled
Friday, August 08 2014, 02:56 PM

It's normal to see MySQL working hard on an update, but it shouldn't last for more than 30-ish seconds. Every 15 minutes, the pmacct daemon does a data dump while the other reports update every 5 minutes. Are you seeing a sustained 100% usage? What's the 15-minute load? Are you running version 1.5.27 of app-network-detail?

Nick Howitt
Friday, August 08 2014, 04:39 PM

Hi Peter,
Yes I see 100% sustained usage. From top:
load average: 1.13, 1.06, 1.01
I have 2 real cores and 2 virtual cores. If I watch this in htop, the system-mysqld load varies between one core at 100% and two cores totalling 100%.
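
For anyone comparing figures across boxes, here is a minimal sketch of how a load average can be related to the number of logical CPUs. It is not from ClearOS or from this thread; the helper name load_per_cpu is made up for illustration, and dividing by logical CPUs (2 real + 2 virtual = 4 here) is only a rough rule of thumb.

```python
import os


def load_per_cpu(load_average, logical_cpus):
    """Return the load average as a fraction of total logical CPU capacity."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return load_average / logical_cpus


if __name__ == "__main__":
    # Figures quoted in the post: 15-minute load of 1.01 on 4 logical CPUs.
    print(round(load_per_cpu(1.01, 4), 2))

    # On a live Linux box the same inputs are available from the standard library.
    _one, _five, fifteen = os.getloadavg()
    print(load_per_cpu(fifteen, os.cpu_count() or 1))
```

A value near 0.25 on a four-thread machine is consistent with roughly one thread being kept busy all the time, which matches what htop shows for system-mysqld.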

Yes, I am running 1.5.27, reinstalled from yum yesterday afternoon:
[root@server ~]# rpm -qa | grep app-network-d
app-network-detail-report-1.5.27-1.v6.noarch
app-network-detail-report-core-1.5.27-1.v6.noarch

Nick Howitt
Saturday, August 09 2014, 01:04 PM

FWIW, following this post my network_detail_external table has 770196 records. [strike]Following Dave Loper's comments in bug 1885 could the purge routine just be purging from network_detail and not network_detail_external?[/strike]

Also attached is my load average graph. The start of the problem is obvious as is the bit where I temporarily uninstalled the packages.

Attachments:

laod_average.png

Nick Howitt
Saturday, August 09 2014, 08:31 PM

Hmm,
I've just worked out how to browse the reports in phpMyAdmin (where is root's password kept, I had to use reports'?) and, looking at the oldest entry in network_detail_external, it is dated 26 Jul 2014 so it looks like the data deletion routine is working. Is there any way I can do an sql query to count the records by day? I wonder if the 1.5.27 update suddenly increased the number of records being logged.

Peter Finch
Sunday, August 10 2014, 02:36 AM

First: There are now several threads on the system-mysqld excessive cpu usage for reports, and I have posted replies to several of those over the past 6 months. We probably need one thread but which one? I picked this one since it was most recently updated.

Second: I apologize for the length of this post but I wanted to make all this diagnostic information available.

Third: System is Community 6.5.0 (Final) with all released updates through today. And I do have the purge file referenced earlier in this thread with a size of 0 and date of Aug 2. (/var/clearos/network_detail_report/purge_external)

Peter

Let’s see what top shows hogging the CPU:

top - 18:00:12 up  5:28,  3 users,  load average: 1.17, 1.17, 1.21
Tasks: 277 total,   2 running, 275 sleeping,   0 stopped,   0 zombie
Cpu(s): 21.0%us, 29.6%sy,  0.0%ni, 47.1%id,  2.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8050980k total,  7898036k used,   152944k free,  3585296k buffers
Swap:  8191992k total,        4k used,  8191988k free,  2775176k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2091 system-m  20   0 2071m 105m 6472 S 199.8  1.3 373:54.90 system-mysqld
   78 root      39  19     0    0    0 S  0.3  0.0   0:17.07 kipmi0
 1746 ldap      20   0 1155m  84m 5284 S  0.3  1.1   0:24.97 slapd
 2192 ftp       20   0  147m 2112  812 S  0.3  0.0   0:07.43 proftpd
 2223 root      20   0  170m  20m  824 S  0.3  0.3   0:28.34 l7-filter
 2869 clearcon  20   0  657m 126m  33m S  0.3  1.6   0:22.56 gconsole
    1 root      20   0 21448 1556 1252 S  0.0  0.0   0:00.59 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
. . .

Two of my 4 3GHz cores are fully consumed by system-mysqld. (Yes I let top run for a while and it stayed at 200%.)

So my clearos system, on a very fast box, is absolutely loafing (hardly any cpu usage) beyond the reporting system. Why would/should reports do this? It shouldn't.

Login to MySQL as reports. (If you don't know where to find the password you probably shouldn't be doing this.)

/usr/clearos/sandbox/usr/bin/mysql -ureports -p???

Once logged in, select the reports database for your queries:

use reports;

Display the tables in the reports schema:

mysql> show tables;
+-------------------------+
| Tables_in_reports       |
+-------------------------+
| network                 |
| network_detail          |
| network_detail_external |
| proxy                   |
| proxy_domains           |
| resource                |
+-------------------------+
6 rows in set (0.00 sec)

How many rows are in each of these tables?

mysql> SELECT table_name, table_rows
    -> FROM INFORMATION_SCHEMA.TABLES
    -> WHERE TABLE_SCHEMA = 'reports';
+-------------------------+------------+
| table_name              | table_rows |
+-------------------------+------------+
| network                 |     653394 |
| network_detail          |    1617329 |
| network_detail_external |     885436 |
| proxy                   |    2013755 |
| proxy_domains           |          0 |
| resource                |     185377 |
+-------------------------+------------+
6 rows in set (0.20 sec)

Two of these tables have over a million rows. Let’s see how old the oldest record is for each:

mysql> SELECT timestamp FROM proxy ORDER BY timestamp ASC LIMIT 1;
+---------------------+
| timestamp           |
+---------------------+
| 2013-12-19 10:53:07 |
+---------------------+
1 row in set (0.90 sec)

mysql> SELECT stamp_inserted FROM network_detail ORDER BY stamp_inserted ASC LIMIT 1;
+---------------------+
| stamp_inserted      |
+---------------------+
| 2013-12-21 10:00:00 |
+---------------------+
1 row in set (0.49 sec)

Looks like rows in these tables are not being regularly pruned by the reports purge process as oldest records are from Dec 2013.

Which tables have indexes? Maybe we are querying large tables without indexes.

mysql> SELECT DISTINCT
    ->     TABLE_NAME,
    ->     INDEX_NAME
    -> FROM INFORMATION_SCHEMA.STATISTICS
    -> WHERE TABLE_SCHEMA = 'reports';
+---------------+------------+
| TABLE_NAME    | INDEX_NAME |
+---------------+------------+
| network       | PRIMARY    |
| network       | iface      |
| network       | timestamp  |
| proxy_domains | PRIMARY    |
| proxy_domains | ip         |
| proxy_domains | timestamp  |
| proxy_domains | hostname   |
| resource      | PRIMARY    |
+---------------+------------+
8 rows in set (0.00 sec)

Interestingly, there are no indexes on the two largest tables.

Let’s see what queries the system is currently running. Time is number of seconds in current state.

mysql> show FULL processlist;
+-----+---------+-----------------+---------+---------+------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id  | User    | Host            | db      | Command | Time | State    | Info                                                                                                                                                                                             |
+-----+---------+-----------------+---------+---------+------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  21 | reports | localhost:42686 | reports | Query   |    0 | Locked   | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407603000) = stamp_inserted AND ip_dst='ff02::1:ff4c:29a4'                      |
|  68 | reports | localhost:47829 | reports | Query   |    2 | Locked   | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1407607201), FROM_UNIXTIME(1407606600), 'ff02::1:ff08:ade2', 1, 72)          |
| 115 | reports | localhost:53665 | reports | Query   |    1 | Locked   | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407610200) = stamp_inserted AND ip_dst='ff02::1:ff04:df52'                      |
| 133 | reports | localhost:54854 | reports | Query   |    3 | Locked   | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1407611701), FROM_UNIXTIME(1407610800), 'ff02::1:ff26:64f6', 2, 144)         |
| 162 | reports | localhost:59025 | reports | Query   |    1 | Locked   | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407613800) = stamp_inserted AND ip_dst='ff02::1:ff15:46d2'                      |
| 189 | reports | localhost:33090 | reports | Query   |    0 | Locked   | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1407616201), FROM_UNIXTIME(1407615000), 'ff02::1:ffef:f2bc', 1, 72)          |
| 210 | reports | localhost:35334 | reports | Query   |    1 | Locked   | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1407618001), FROM_UNIXTIME(1407616800), 'ff02::1:ffbc:5708', 1, 72)          |
| 236 | reports | localhost:37628 | reports | Query   |    1 | Locked   | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407618600) = stamp_inserted AND ip_dst='ff02::1:ff43:ef6c'                      |
| 243 | reports | localhost       | reports | Query   |    0 | NULL     | show FULL processlist                                                                                                                                                                            |
| 247 | reports | localhost:38735 | reports | Query   |    4 | Updating | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407619800) = stamp_inserted AND ip_dst='ff02::1:ff53:20e'                       |
| 256 | reports | localhost:39442 | reports | Query   |    2 | Updating | UPDATE network_detail SET ip='??\n?', hostname='wemoswitchc70.local.lan', username='', device_vendor='', device_type='' WHERE (ip_src='192.168.10.170' OR ip_dst='192.168.10.170') AND ip IS NULL |
+-----+---------+-----------------+---------+---------+------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.00 sec)

So a bunch of queries running. The last two are active updates and sucking up the CPU. The others are inserts and updates blocked while they wait for tables to be unlocked by completion of the two update queries. Adding indexes may help on the updates but may also slow down inserts for large tables...
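
One thing that can be read from the query text itself is how far apart the time slots being written are. The following is a minimal Python sketch, not part of any ClearOS tool: the stamp_inserted values are copied from the process list above, the helper name as_utc is made up for illustration, and the times are shown in UTC because the server's time zone is not stated.

```python
from datetime import datetime, timezone

# stamp_inserted values copied from the Locked/Updating queries in the process list.
slots = [1407603000, 1407606600, 1407610200, 1407610800,
         1407613800, 1407615000, 1407616800, 1407618600, 1407619800]


def as_utc(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


oldest, newest = min(slots), max(slots)
span_seconds = newest - oldest

print(as_utc(oldest).isoformat())  # oldest time slot still being written
print(as_utc(newest).isoformat())  # newest time slot being written
print(span_seconds / 3600)         # hours between the two
```

The slots span several hours, which might suggest the writers are working through a backlog of older time slots rather than only the current one. That is only a reading of the query text, not something confirmed here.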

I welcome feedback from devs...


Nick Howitt
Sunday, August 10 2014, 07:31 AM

I've worked out how to summarise by dates in sql now and for network_detail_external I get:

mysql> SELECT count(*), DATE(stamp_inserted) DateOnly FROM `network_detail_external` GROUP BY DateOnly;
+----------+------------+
| count(*) | DateOnly   |
+----------+------------+
|    54780 | 2014-07-26 |
|    54253 | 2014-07-27 |
|    53489 | 2014-07-28 |
|    57178 | 2014-07-29 |
|    54701 | 2014-07-30 |
|    52011 | 2014-07-31 |
|    51586 | 2014-08-01 |
|    54122 | 2014-08-02 |
|   110018 | 2014-08-03 |
|    45995 | 2014-08-04 |
|    52052 | 2014-08-05 |
|    32959 | 2014-08-06 |
|    11383 | 2014-08-07 |
|    56079 | 2014-08-08 |
|    51203 | 2014-08-09 |
|    16981 | 2014-08-10 |
+----------+------------+
16 rows in set (1.92 sec)

Unfortunately my history does not go back as far as the 1.5.27 update so I can't prove anything about the latest update. The dip on 07 Aug was when I had removed the network detail report.

For network_detail I get:

mysql> SELECT count(*), DATE(stamp_inserted) DateOnly FROM `network_detail` GROUP BY DateOnly;
+----------+------------+
| count(*) | DateOnly   |
+----------+------------+
|       46 | 2014-05-18 |
|      789 | 2014-05-19 |
|      862 | 2014-05-20 |
|      698 | 2014-05-21 |
|      822 | 2014-05-22 |
|      746 | 2014-05-23 |
|      853 | 2014-05-24 |
|      925 | 2014-05-25 |
|     1112 | 2014-05-26 |
|     1016 | 2014-05-27 |
|      834 | 2014-05-28 |
|      873 | 2014-05-29 |
|      915 | 2014-05-30 |
|      861 | 2014-05-31 |
|      973 | 2014-06-01 |
|      856 | 2014-06-02 |
|      740 | 2014-06-03 |
|      824 | 2014-06-04 |
|      782 | 2014-06-05 |
|      939 | 2014-06-06 |
|     1027 | 2014-06-07 |
|     1097 | 2014-06-08 |
|      868 | 2014-06-09 |
|      797 | 2014-06-10 |
|      828 | 2014-06-11 |
|      824 | 2014-06-12 |
|     1051 | 2014-06-13 |
|     1147 | 2014-06-14 |
|      746 | 2014-06-15 |
|      826 | 2014-06-16 |
|      856 | 2014-06-17 |
|      863 | 2014-06-18 |
|      831 | 2014-06-19 |
|      749 | 2014-06-20 |
|     1016 | 2014-06-21 |
|      902 | 2014-06-22 |
|      957 | 2014-06-23 |
|      668 | 2014-06-24 |
|      587 | 2014-06-25 |
|      537 | 2014-06-26 |
|      662 | 2014-06-27 |
|      613 | 2014-06-28 |
|      524 | 2014-06-29 |
|      612 | 2014-06-30 |
|      501 | 2014-07-01 |
|      576 | 2014-07-02 |
|      446 | 2014-07-03 |
|      547 | 2014-07-04 |
|      673 | 2014-07-05 |
|      796 | 2014-07-06 |
|      616 | 2014-07-07 |
|      545 | 2014-07-08 |
|      666 | 2014-07-09 |
|      643 | 2014-07-10 |
|      834 | 2014-07-11 |
|      948 | 2014-07-12 |
|      882 | 2014-07-13 |
|      688 | 2014-07-14 |
|      652 | 2014-07-15 |
|      749 | 2014-07-16 |
|      678 | 2014-07-17 |
|      647 | 2014-07-18 |
|      757 | 2014-07-19 |
|      621 | 2014-07-20 |
|      739 | 2014-07-21 |
|      813 | 2014-07-22 |
|      562 | 2014-07-23 |
|      531 | 2014-07-24 |
|      625 | 2014-07-25 |
|      577 | 2014-07-26 |
|      410 | 2014-07-27 |
|      441 | 2014-07-28 |
|      519 | 2014-07-29 |
|      515 | 2014-07-30 |
|      530 | 2014-07-31 |
|      481 | 2014-08-01 |
|      668 | 2014-08-02 |
|      509 | 2014-08-03 |
|      551 | 2014-08-04 |
|      639 | 2014-08-05 |
|      383 | 2014-08-06 |
|      211 | 2014-08-07 |
|      768 | 2014-08-08 |
|      408 | 2014-08-09 |
|      100 | 2014-08-10 |
+----------+------------+
85 rows in set (0.05 sec)

This looks reasonable but I am getting different results from Peter Finch. My tables seem to get purged and my network_detail_external is much larger than network_detail.


Tim Burgess
Monday, August 11 2014, 04:43 PM

I'll try and have a poke around with this tonight

Do you also have the app-network-map installed and configured? From the SQL queries above it may be tripping over a large volume of IPv6 traffic... Nick, do you see similar queries if you run 'show FULL processlist;'?

On a quiet system post install I have none.... to alleviate the 'runaway' nature of the script try increasing the time interval in cron (/etc/cron.d/app-network-detail-report) currently scheduled at 5minute intervals
*/5 * * * * root /usr/sbin/networkdetail2db >/dev/null 2>&1
12 3 * * * root /usr/sbin/networkdetailpurge >/dev/null 2>&1

Nick Howitt
Monday, August 11 2014, 05:12 PM

Hi Tim,

I do have app-network-map installed. I've had a little browse through system-mysql and I did not notice any IPv6 traffic. I've tried exporting the table to a CSV file to have a further look but my version of Excel (2003) won't load it as it is way too big.

Here is my process list:

mysql> show FULL processlist;
+-----+---------+-----------------+---------+---------+------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id  | User    | Host            | db      | Command | Time | State    | Info                                                                                                                                                                      |
+-----+---------+-----------------+---------+---------+------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 169 | reports | localhost:56274 | reports | Query   |    1 | Updating | UPDATE `network_detail_external` SET packets=packets+2, bytes=bytes+244, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407775200) = stamp_inserted AND ip_dst='49.205.148.242' |
| 170 | reports | localhost:56276 | reports | Query   |    1 | Locked   | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+131, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407775200) = stamp_inserted AND ip_src='88.222.186.3'   |
| 179 | reports | localhost       | NULL    | Query   |    0 | NULL     | show FULL processlist                                                                                                                                                     |
+-----+---------+-----------------+---------+---------+------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

Note it is shorter than when I looked yesterday, perhaps because I rebooted earlier today to move the kernel on, but my 15min load average is now up to 1.17 (out of 4).

I'll try your approach to cron first but as I am on holiday soon I may have to remove it again.

Does your post indicate you are also having the same issue?

[edit]
I've just loaded the table in Access (which I don't really know) and I am really surprised at the amount of traffic logged and the amount of single byte traffic (pings?) - 526775 out of 878400 packets.
[/edit]

[edit2]
Also 235684 records with NULL hostname
[/edit2]

[edit3]
Changing the cron job did nothing (or very little), and, yes, I did restart crond after the edit.
I notice /var/log/system is getting a lot of:

Aug 11 19:20:01 server networkdetail2db: Unable to start script - currently running.

Aug 11 19:40:01 server networkdetail2db: Unable to start script - currently running.

Aug 11 19:50:01 server networkdetail2db: Unable to start script - currently running

Looking at the system log it seems like one error message is missed after about 50 mins. This means that the script is taking that long to execute!
[/edit3]
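
A minimal Python sketch of the reasoning in edit3: if the "currently running" message appears at every cron slot except one, the missing slot is when the previous run had finished. The 10-minute interval is an assumption (the new cron setting is not stated above), the helper name quiet_slots is made up, and the year is supplied only because syslog lines do not carry one.

```python
from datetime import datetime, timedelta

# Log lines copied from the excerpt above.
LOG_LINES = [
    "Aug 11 19:20:01 server networkdetail2db: Unable to start script - currently running.",
    "Aug 11 19:40:01 server networkdetail2db: Unable to start script - currently running.",
    "Aug 11 19:50:01 server networkdetail2db: Unable to start script - currently running",
]


def quiet_slots(lines, interval_minutes, year=2014):
    """Return cron slots between the first and last line that logged no message."""
    stamps = sorted(
        datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        for line in lines
    )
    step = timedelta(minutes=interval_minutes)
    missing = []
    for earlier, later in zip(stamps, stamps[1:]):
        expected = earlier + step
        # Allow half an interval of jitter before calling a slot missing.
        while expected < later - step / 2:
            missing.append(expected)
            expected += step
    return missing


for slot in quiet_slots(LOG_LINES, interval_minutes=10):
    print(slot.strftime("%b %d %H:%M"))
```

Run over the whole of /var/log/system rather than three lines, the distance between successive quiet slots gives a rough figure for how long each networkdetail2db run takes.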


Tim Burgess
Monday, August 11 2014, 09:53 PM

Hi Nick, no I don't have the same symptoms here, but my table is only small so far.

FYI you can browse the system-database contents using the phpMyAdmin web interface (https://clearosip:81/mysql). This is a new feature in ClearOS 6.6 beta 1, select the 'system database' from the drop down and login with your system-mysql root password.

I also have lots of IP entries in network_detail_external but not as many as you... do you by any chance also host torrents? This might explain the large growth of very small packet connections.

Tim Burgess
Monday, August 11 2014, 10:36 PM

Some more poking around... I also see lots of NULL IP records

I've been experimenting with adding a table index to the mysql query columns to improve the query performance. My hunch is that with large amounts of TCP connections the database size grows rapidly and so does the time taken to query and update the host information in it - hence my question about Torrents. The queries appear to usually use WHERE stamp_inserted, ip_dst OR ip_src... but as I don't have the symptoms I can't tell if these improve things

If you are happy to test could you try running the following on the reports database?

ALTER TABLE `network_detail_external` ADD INDEX(`ip_src`);
ALTER TABLE `network_detail_external` ADD INDEX(`ip_dst`);
ALTER TABLE `network_detail_external` ADD INDEX(`stamp_inserted`);
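
To illustrate why an index on these columns could matter, here is a small Python sketch that uses SQLite as a stand-in. This is an assumption for illustration only: the thread's database is MySQL with MyISAM tables, whose optimizer and locking behave differently, and the column types, index names and sample values are made up. On the real system the MySQL EXPLAIN statement would be the thing to look at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE network_detail_external ("
    "stamp_inserted TEXT, ip_src TEXT, ip_dst TEXT, packets INTEGER, bytes INTEGER)"
)

# Same WHERE shape as the pmacctd UPDATE statements seen in the process list.
QUERY = (
    "SELECT packets FROM network_detail_external "
    "WHERE stamp_inserted = ? AND ip_dst = ?"
)
PARAMS = ("2014-08-09 16:50:00", "49.205.148.242")


def plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    return " / ".join(row[-1] for row in rows)


plan_before = plan(conn)  # no index yet, so every row has to be examined
conn.execute("CREATE INDEX idx_stamp ON network_detail_external (stamp_inserted)")
conn.execute("CREATE INDEX idx_ip_dst ON network_detail_external (ip_dst)")
plan_after = plan(conn)   # the planner can now look rows up through an index

print(plan_before)
print(plan_after)
```

Without an index the plan is a full scan of the table for every matching attempt; with one it becomes an index search. The trade-off already mentioned in the thread still applies: each extra index has to be maintained on every INSERT.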

Tim Burgess
Monday, August 11 2014, 11:02 PM

Something else to try...

modifying the pmacctd daemon (I incorrectly thought it was the cron job earlier but that just calls a script to update the mappings in network_detail which is by comparison much smaller than network_detail_external)

With reference to:-
http://wiki.pmacct.net/OfficialExamples

You should have two configlets generated by ClearOS in /etc/pmacctd/pmacctd_ethX.conf - where ethX is your WAN and will log to network_detail_external

In here are two parameters, sql_refresh_time: 900, sql_history: 10m... try reducing the first to say 600 so that only INSERTS and not UPDATES are carried out once per SQL timeslot...
Alternatively drop it right down to 60, which will reduce the amount of traffic kept in memory (at the expense of more frequent but smaller database inserts) to try and reduce the size of data being dumped into system-mysql at any one time

You may need to run 'service pmacctd restart' to implement
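
The idea behind matching sql_refresh_time to sql_history can be sketched with a deliberately simplified model in Python. This is not pmacct code and makes loud assumptions: each flush is taken to cover exactly one refresh interval, each second belongs to the history slot floor(t / history), and a slot touched by more than one flush stands in for "needs an UPDATE instead of a plain INSERT". The function names are hypothetical.

```python
def parse_seconds(value):
    """Parse values such as '900' or '10m' into seconds (only s/m/h suffixes handled)."""
    units = {"s": 1, "m": 60, "h": 3600}
    text = str(value).strip()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def slots_written_more_than_once(refresh, history, horizon):
    """Count history slots touched by more than one flush within `horizon` seconds.

    Simplified model: a flush covers traffic seen in [start, start + refresh),
    and each second t belongs to slot t // history.
    """
    touches = {}
    for start in range(0, horizon, refresh):
        end = min(start + refresh, horizon)
        for slot in range(start // history, (end - 1) // history + 1):
            touches[slot] = touches.get(slot, 0) + 1
    return sum(1 for count in touches.values() if count > 1)


history = parse_seconds("10m")  # sql_history: 10m
for refresh in (900, 600):      # current and suggested sql_refresh_time
    print(refresh, slots_written_more_than_once(refresh, history, 3600))
```

In this model a 900-second refresh against 10-minute slots makes some slots get written twice per hour, while a 600-second refresh lines up with the slots so each is written once. Whether the real daemon behaves this cleanly would need checking against the pmacct documentation.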

Tim Burgess
Monday, August 11 2014, 11:43 PM

P.S pmacctd daemon gobbles up CPU... on a 50Mbps download from my LAN it accounts for nearly 50% of my CPU time, cumulatively more than snort which is doing packet inspection as well!

18403 snort 30 10 469m 174m 4152 R 43.1 3.6 6:19.41 snort
17679 squid 30 10 158m 86m 3664 R 29.2 1.8 2:09.92 squid
13389 root 20 0 61156 10m 9968 S 19.2 0.2 0:46.42 pmacctd
13398 root 20 0 64692 14m 4140 S 10.6 0.3 0:25.17 pmacctd
13394 root 20 0 64692 14m 4140 S 10.3 0.3 0:24.75 pmacctd
30272 dansguar 30 10 128m 16m 1340 R 10.0 0.3 0:14.85 dansguardian-av
13397 root 20 0 61156 10m 9.8m S 9.0 0.2 0:10.88 pmacctd

EDIT: adding "plugin_buffer_size: 1024" to both pmacctd_ethX.conf files seems to have significantly reduced the usage! as per the FAQ here. Some room for tuning?
http://wiki.pmacct.net/OfficialFAQs

Nick Howitt
Tuesday, August 12 2014, 07:54 AM

Hi Tim,

My hands are a bit tied on this. I'm off today until Saturday and again later in the week for 10 days. system-mysqld was running away (up to 1.30) and there is no way my wife could fix things in my absence so I've had to pull the report for the moment. It also means that when I'm finally back and reinstall it, the database will be relatively empty as the purge routine appears to be working. I'll see if I can give it a go between the two trips.

I have been using phpMyAdmin as I remembered a post from long ago on how to modify /etc/phpMyAdmin/config.inc.php to gain access. I can't remember how to get root access but I can get reports access from details posted in a linked thread. I'm not a coder but about 25 years ago I played around with SQL queries so I have a vague idea about what I'm doing hence my simple reports. Google helps.

I do host 2 ClearOS torrents and torrent on and off for other things, but not, for example at the back end of last week. It could, I suppose, mean that my IP is well known and I am getting a lot of pings which is what I assume these single packet requests are.

Did you bump into any way not to log icmp? I had a bit of a google last night but failed.

Nick

Tim Burgess
Tuesday, August 12 2014, 08:49 AM

No problem, have a good break. Perhaps Peter can help?

From my limited observations things can be improved by tweaking the pmacctd daemon so that database INSERTS are only generated once during the timeslot period, otherwise it has to UPDATE the existing table rather than just insert new data, which is more IO intensive for large tables. The default config appears to deviate from this principle?

Second benefit is to enable and configure the plugin_buffer_size (which appears to be disabled by default). This reduces the CPU load on high traffic but does nothing for system-mysql...on my system anyway.

The third benefit is to add database indexes to the stamp_inserted, ip_src and ip_dst columns, which appear to be used by the UPDATE query..

There are other optimisations with pmacct shown in the FAQ such as PF_RING / LIBPCAP but I've not investigated these. Interestingly the pmacctd FAQ's discourage the logging of all traffic to prevent DOS type problems

Tim Burgess
Friday, August 15 2014, 11:04 PM

Hmm so after a couple days the load is steadily increasing and the tips above don't seem to have alleviated the problem. The INSERT queries are generated by pmacctd, but also the UPDATE query is generated by the cron script to update the IP/username for particular hosts. The table appears to be locked between single queries...

mysql> show full processlist;
| 5156 | reports | localhost:34162 | reports | Query   |    1 | Locked   | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_src, packets, bytes) VALUES (FROM_UNIXTIME(1408143601), FROM_UNIXTIME(1408143000), '176.31.240.170', 1, 86)                       |
| 5157 | reports | localhost:34163 | reports | Query   |    2 | Locked   | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1408143601), FROM_UNIXTIME(1408143000), '176.31.240.170', 1, 86)                       |
| 5160 | reports | localhost:34172 | reports | Query   |    2 | Updating | UPDATE network_detail_external SET ip='>??H', hostname='m328-mp1-cvx1b.lan.ntl.com', username='', device_vendor='', device_type='' WHERE (ip_src='62.252.169.72' OR ip_dst='62.252.169.72') AND ip IS NULL |


Tim Burgess
Friday, August 15 2014, 11:17 PM

OK so took a look at the table status, and noticed that the network_detail, network_detail_external and proxy tables all use MyISAM engine not InnoDB... so followed Zombu2 advice on the first page, and changed the engine

mysql> use reports;
mysql> show table status;
mysql> alter table network_detail_external engine=innodb;
mysql> alter table network_detail engine=innodb;
mysql> alter table proxy engine=innodb;
mysql> show table status;

Now the processlist does not show table locking between each query, and hence there seems to be an overall speed increase. I'll monitor the drop in load and see how it compares...

| 5208 | reports | localhost:34592 | reports | Query | 1 | Updating | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+86, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408143600) = stamp_inserted AND ip_src='185.30.232.167' |
| 5209 | reports | localhost:34593 | reports | Query | 1 | Updating | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+76, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408143600) = stamp_inserted AND ip_dst='87.117.251.2' |
| 5212 | reports | localhost:34637 | reports | Query | 0 | Updating | UPDATE network_detail_external SET ip='|{?', hostname='ras.beamtele.net', username='', device_vendor='', device_type='' WHERE (ip_src='124.123.26.138' OR ip_dst='124.123.26.138') AND ip IS NULL |

Peter Finch
Saturday, August 16 2014, 12:43 AM

Doh! I should have noticed that... I knew it was locking at the table level from the process list but still had it in my head that they were InnoDB tables...

Good catch Tim! Let us know in a day or two how your load graphs look and I will alter mine as well if improvement noted. I was actually getting ready to just remove these modules this weekend in frustration.

Peter

Peter Finch
Saturday, August 16 2014, 02:23 PM

Here it is Saturday morning with nothing planned, so what to do? I know, I'll play with my ClearOS firewalls some more.

Changing reports tables to INNODB showed no improvement in CPU or load. Instead of one running query and a dozen locked ones I now have about 150 running queries, some running for nearly a minute. IO Wait is very low so not a disk issue, bound up in CPU. I will revert the INNODB change.

EDIT: Actually, with MyISAM I also have about 150 queries in PROCESSLIST with many locked for over a minute.

Think it is time to just uninstall the offending modules. Or is there an easy way to just turn all the network reporting off for now so I can enable again later to test future updates?

I include below a new MySQL process list after the change to INNODB. Maybe there is a clue in here for Devs. I edited out 100 or so similar queries in the list to keep the post small.

I am surprised at all the IPv6 references, since at one point I disabled IPv6, but that was apparently reverted by a subsequent ClearOS update.

Peter



mysql> select * from information_schema.processlist order by time desc;

+------+---------+-----------------+---------+---------+------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| ID   | USER    | HOST            | DB      | COMMAND | TIME | STATE     | INFO                                                                                                                                                                               |

+------+---------+-----------------+---------+---------+------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| 6720 | reports | localhost:43670 | reports | Query   |   51 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+76, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408136400) = stamp_inserted AND ip_dst='64.34.185.196'            |

| 7404 | reports | localhost:56232 | reports | Query   |   50 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408186800) = stamp_inserted AND ip_dst='ff02::1:ff9f:bcb0'        |

| 5646 | reports | localhost:45270 | reports | Query   |   49 | Updating  | UPDATE `network_detail_external` SET packets=packets+3, bytes=bytes+216, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408053000) = stamp_inserted AND ip_dst='ff02::1:ff01:1205'       |

| 5118 | reports | localhost:53887 | reports | Query   |   48 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408009200) = stamp_inserted AND ip_dst='ff02::1:ff4b:d3c2'        |

| 4626 | reports | localhost:37050 | reports | Query   |   47 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407969000) = stamp_inserted AND ip_dst='ff02::1:ff0b:b362'        |

| 6204 | reports | localhost:46992 | reports | Query   |   46 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408096200) = stamp_inserted AND ip_dst='ff02::1:ff86:1f62'        |

| 3439 | reports | localhost:40280 | reports | Query   |   45 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407873000) = stamp_inserted AND ip_dst='ff02::1:ff81:5d42'        |

| 7329 | reports | localhost:50926 | reports | Query   |   44 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408182000) = stamp_inserted AND ip_dst='ff02::1:ff40:551e'        |

| 6925 | reports | localhost:36553 | reports | Query   |   43 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408152600) = stamp_inserted AND ip_dst='ff02::1:fffe:aca0'        |

| 3725 | reports | localhost:40670 | reports | Query   |   43 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407895800) = stamp_inserted AND ip_dst='ff02::1:ffb5:4c92'        |

| 7550 | reports | localhost:41649 | reports | Query   |   42 | Updating  | UPDATE `network_detail_external` SET packets=packets+36, bytes=bytes+2176, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408196400) = stamp_inserted AND ip_dst='8.8.8.8'               |

| 7549 | reports | localhost:41648 | reports | Query   |   42 | Updating  | UPDATE `network_detail_external` SET packets=packets+5342, bytes=bytes+484555, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408196400) = stamp_inserted AND ip_src='24.99.223.144'     |

| 7063 | reports | localhost:54451 | reports | Query   |   42 | Updating  | UPDATE `network_detail_external` SET packets=packets+8377, bytes=bytes+12817799, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408162800) = stamp_inserted AND ip_src='208.111.161.254' |

| 7162 | reports | localhost:35507 | reports | Query   |   41 | Updating  | UPDATE `network_detail_external` SET packets=packets+2, bytes=bytes+120, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408170000) = stamp_inserted AND ip_src='173.194.37.73'           |

| 2501 | reports | localhost:38315 | reports | Query   |   41 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407796200) = stamp_inserted AND ip_dst='ff02::1:ff95:b586'        |

| 7529 | reports | localhost:39270 | reports | Query   |   40 | Updating  | UPDATE `network_detail_external` SET packets=packets+12, bytes=bytes+942, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408194600) = stamp_inserted AND ip_src='69.171.248.65'          |

| 6936 | reports | localhost:37579 | reports | Query   |   38 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408153200) = stamp_inserted AND ip_dst='ff02::1:ff1d:77c0'        |

| 6129 | reports | localhost:40596 | reports | Query   |   38 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408090800) = stamp_inserted AND ip_dst='ff02::1:ff4b:f859'        |

| 7224 | reports | localhost:40203 | reports | Query   |   37 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408173000) = stamp_inserted AND ip_dst='ff02::1:ff5d:9b5a'        |

| 6249 | reports | localhost:51436 | reports | Query   |   36 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408099800) = stamp_inserted AND ip_dst='ff02::1:ff9e:4032'        |

| 2932 | reports | localhost:52159 | reports | Query   |   35 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407831600) = stamp_inserted AND ip_dst='ff02::1:ffbf:fd2'         |

| 5583 | reports | localhost:39385 | reports | Query   |   34 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408047600) = stamp_inserted AND ip_dst='ff02::1:ff78:642c'        |

| 4098 | reports | localhost:45165 | reports | Query   |   33 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407927000) = stamp_inserted AND ip_dst='ff02::1:ff61:b9aa'        |

| 5823 | reports | localhost:36929 | reports | Query   |   32 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408067400) = stamp_inserted AND ip_dst='ff02::1:fffe:6b8'         |

| 6384 | reports | localhost:37330 | reports | Query   |   31 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408110000) = stamp_inserted AND ip_dst='ff02::1:ff5a:d7e1'        |

| 7431 | reports | localhost:59477 | reports | Query   |   30 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408189200) = stamp_inserted AND ip_dst='ff02::1:fff4:7113'        |

| 7364 | reports | localhost:54129 | reports | Query   |   29 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408185000) = stamp_inserted AND ip_dst='ff02::1:ff97:7012'        |

. . .

. . .

| 5981 | reports | localhost:55764 | reports | Query   |    5 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408080000) = stamp_inserted AND ip_dst='ff02::1:ff50:4cf1'        |

| 6606 | reports | localhost:60888 | reports | Query   |    4 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408128000) = stamp_inserted AND ip_dst='ff02::1:ff3d:ded2'        |

| 7552 | reports | localhost:41651 | reports | Query   |    3 | Updating  | UPDATE `network_detail` SET packets=packets+4, bytes=bytes+304, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408196400) = stamp_inserted AND ip_dst='192.168.10.176'                   |

| 6981 | reports | localhost:44178 | reports | Query   |    3 | Updating  | UPDATE `network_detail_external` SET packets=packets+8, bytes=bytes+352, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408156800) = stamp_inserted AND ip_dst='17.173.254.222'          |

| 7530 | reports | localhost:39271 | reports | Query   |    2 | Updating  | UPDATE `network_detail_external` SET packets=packets+37, bytes=bytes+16825, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408194600) = stamp_inserted AND ip_dst='31.13.69.182'         |

| 7551 | reports | localhost:41650 | reports | Query   |    1 | Updating  | UPDATE `network_detail` SET packets=packets+156, bytes=bytes+28914, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408196400) = stamp_inserted AND ip_src='192.168.10.174'               |

| 6685 | reports | localhost:40176 | reports | Query   |    1 | Updating  | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408133400) = stamp_inserted AND ip_dst='ff02::1:ffec:3470'        |

| 7536 | reports | localhost       | NULL    | Query   |    0 | executing | select * from information_schema.processlist order by time desc                                                                                                                    |

+------+---------+-----------------+---------+---------+------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

147 rows in set, 1 warning (0.00 sec)
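
For anyone wanting to summarise a long process list like this rather than eyeball it, here is a rough Python sketch. It is only an illustration: `SAMPLE_QUERIES` holds five INFO strings copied from the listing above (not all 147 rows), and the `summarize` helper and its regular expression are assumptions about the query shape, not anything shipped with ClearOS or pmacct. It counts queries per table, splits addresses into IPv4 and IPv6, counts single-packet updates, and reports how many hours separate the oldest and newest `stamp_inserted` bucket being updated.

```python
import re
from collections import Counter

# INFO strings copied from the process list above (a small sample, not all 147 rows).
SAMPLE_QUERIES = [
    "UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+76, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408136400) = stamp_inserted AND ip_dst='64.34.185.196'",
    "UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408186800) = stamp_inserted AND ip_dst='ff02::1:ff9f:bcb0'",
    "UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407796200) = stamp_inserted AND ip_dst='ff02::1:ff95:b586'",
    "UPDATE `network_detail_external` SET packets=packets+36, bytes=bytes+2176, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408196400) = stamp_inserted AND ip_dst='8.8.8.8'",
    "UPDATE `network_detail` SET packets=packets+4, bytes=bytes+304, stamp_updated=NOW() WHERE FROM_UNIXTIME(1408196400) = stamp_inserted AND ip_dst='192.168.10.176'",
]

# Hypothetical pattern for the UPDATE statements seen in this thread.
QUERY_RE = re.compile(
    r"UPDATE `(?P<table>\w+)` SET packets=packets\+(?P<packets>\d+),.*"
    r"FROM_UNIXTIME\((?P<bucket>\d+)\).*ip_(?:src|dst)='(?P<ip>[^']+)'"
)


def summarize(queries):
    """Tally tables, address families and time buckets for matching queries."""
    tables, families, buckets = Counter(), Counter(), []
    single_packet = 0
    for query in queries:
        match = QUERY_RE.search(query)
        if match is None:
            continue  # ignore anything that is not a counter UPDATE
        tables[match.group("table")] += 1
        # An address containing ':' is treated as IPv6, anything else as IPv4.
        families["ipv6" if ":" in match.group("ip") else "ipv4"] += 1
        buckets.append(int(match.group("bucket")))
        if int(match.group("packets")) == 1:
            single_packet += 1
    span_hours = (max(buckets) - min(buckets)) / 3600 if buckets else 0.0
    return {
        "tables": tables,
        "families": families,
        "single_packet": single_packet,
        "span_hours": span_hours,
    }


summary = summarize(SAMPLE_QUERIES)
print(summary)
```

On this small sample the buckets being updated are spread over roughly 111 hours, so updates are still landing on time slots several days old. Whether that is expected pmacct behaviour or part of the problem I can't say from the list alone.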

Nick Howitt
Saturday, August 16 2014, 04:16 PM

Back home for a few days now. I don't use the proxy but I've changed the engine to InnoDB for network_detail and network_detail_external and restarted system-mysqld. Initially it did nothing to help (processing a backlog?), then system-mysqld seemed to peak at about 100% for 2.5 - 3 minutes every 5 minutes and then drop out of "top". This is a reasonable improvement, but this server only supports a small family. It would struggle if it were supporting an SME or college. Unfortunately, after about 40 mins I was back to networkdetail2db not always completing between cron jobs (perhaps an hourly thing, but I can't keep staring at the screen!).

Nick Howitt
Saturday, August 16 2014, 07:49 PM

Trial and error time now. I've added a filter to the WAN logging in /etc/pmacct/pmacctd_eth0.conf:
pcap_filter: 'not icmp'
and deleted all single packet records from the database (561,168 leaving 370,112 records):
delete FROM `network_detail_external` WHERE packets = 1;
If my hunch is correct (and if I have the filter correct) I should be logging less than half the records into a database less than half the size.

I have no idea what the NULL entries are in the hostname column. It is not just because there is no ptr record as other entries without a ptr record show up with the IP in the hostname field. These records with a NULL hostname also have a NULL username, device_type and device_vendor as opposed to a blank one. These account for 103,048 of my remaining 370,112 records!
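
To put the counts above into proportion, here is a small Python sketch. The three numbers are taken from this post; the variable names are just for illustration.

```python
# Counts quoted in the post above.
deleted_single_packet = 561_168  # rows removed with packets = 1
remaining = 370_112              # rows left after the delete
null_hostname = 103_048          # remaining rows with a NULL hostname

total_before = deleted_single_packet + remaining
single_packet_share = 100 * deleted_single_packet / total_before
null_share = 100 * null_hostname / remaining

print(f"rows before delete: {total_before}")
print(f"single-packet rows: {single_packet_share:.1f}% of the table")
print(f"NULL-hostname rows: {null_share:.1f}% of what remains")
```

So single-packet rows made up roughly 60% of the table before the delete, and the NULL-hostname rows are a bit over a quarter of what is left.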

Nick Howitt
Sunday, August 17 2014, 09:05 AM

So much for that hunch. I have logged 18,441 single packets in the last 14 hours. Either that or my pmacctd filter is not working.

Tim Burgess
Tuesday, August 19 2014, 11:10 PM

Hmm, the changes have helped somewhat, but I can report that system-mysqld still eats up a lot of CPU every 5 minutes during updates.

My load trends have definitely reduced though, and it no longer seems to snowball...

The spikes are periodic backups. I installed the app last Monday and the load kept increasing until Saturday, when I made the changes (and also restarted because of some rewiring). It currently hovers around the 0.5 mark.

Processing all those IPv6 and single packet entries is not going to help!

There appear to be two things at play here: the pmacctd database updates every 10 minutes (which don't include the IP / username / hostname) and the cron PHP script which is called every 5 minutes (which updates the existing IP / hostname / username if available).

Both of these coincide with each other and on my fairly quiet system max out one CPU for 2-3 minutes. Note that having a CPU governor which scales the frequency to match the load will give misleading readings, as the load is relative to the processor speed.

Attachments:

system1z.png

Nick Howitt
Wednesday, August 20 2014, 11:44 AM

I unloaded the app yesterday evening. On Saturday I deleted all single packet entries in the database but they were still building up again. I'd also tried to filter all ICMP packets using both aggregate[inbound] and aggregate[outbound], and pcap_filter filtering for "not icmp" in the pmacct.conf file, but this did not appear to really slow down the single packet entries. I also switched to using InnoDB. Even before I deleted the extra entries, this switch generally (but not always) allowed networkdetail2db to complete before it tried to start again. By yesterday networkdetail2db was failing to complete more often. My 15 min load averages were down from 130% before the weekend to 40-50%, indicating that system-mysql (which takes nearly all the load) was running at 100% for about 2 minutes in every 5, which confirms what I was observing with "top".
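
The arithmetic behind that last point can be sketched in a few lines of Python. This is a simplification under stated assumptions: `average_load` is a made-up helper, it assumes one process pinning one CPU and nothing else running, and the real Linux load average also counts processes waiting on I/O.

```python
def average_load(busy_minutes, period_minutes, cores_busy=1.0):
    """Long-run load from a job that keeps `cores_busy` CPUs fully
    occupied for `busy_minutes` out of every `period_minutes`."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return cores_busy * busy_minutes / period_minutes


# About 2 minutes at 100% in every 5-minute cron cycle.
print(average_load(2, 5))
# The 2-3 minute range reported earlier in the thread.
print(average_load(2, 5), average_load(3, 5))
```

Two minutes busy in every five works out to a load of 0.4, and three in five to 0.6, which is in line with the 40-50% and "around the 0.5 mark" figures reported in the thread.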

I don't think I saw any IPv6 traffic.

BTW where are you getting your system load graph from? It is not the ClearOS one which I have.

I tried logging everything inbound in the firewall over a 10 minute period to correlate it with the system-mysql table, but I don't think this was long enough. After eliminating all the volume traffic and identifiable traffic from the firewall dump, I could not find any of the remaining traffic in the system-mysql table. I also did not get any ICMP traffic in the firewall, which I find odd. I will need to do it again over a larger sample period (but at a quiet time to reduce the firewall dump!) but I will get no time before early September.

Because I could not correlate the firewall to the system-mysql table I could not identify the single packets or the NULL entries.

Tim Burgess
Wednesday, August 20 2014, 10:00 PM

I've installed Monitorix which is available in the clearos-epel repo

Perhaps a tcpdump log would help identify what the single packets are? This can be a symptom of running torrent software, as other members of the 'swarm' send single packets to identify who is available long after the client may have stopped transferring. Perhaps try disabling your torrents and monitor the transmission port.

I was referring to Peter's earlier post, which contained many IPv6 entries too... given the almost infinite number of IPv6 addresses, this could cause a very large database of traffic.

I'll have a play with the pcap_filters

Nick Howitt
Thursday, August 21 2014, 09:11 AM

The odd thing about the single packets I saw in the firewall is that if they were torrents, they were on the wrong port! I'll see if I can find some time in September to give it another go but it needs some concentration.

Nick Howitt
Tuesday, September 02 2014, 07:24 PM

Reinstalled yesterday and still using innodb. The database is only (!) 556,487 records for the moment as the report was not running while I was away. From the pmacct wiki #18, in both interface conf files I've set:
sql_refresh_time: 600
sql_dont_try_update: true
sql_history is still 10m so sql_refresh_time matches, and these are my loads:

You can see where I made the change at 18:00. It is not perfect but it is better. 15m load averages are about 0.5. I need to run with this for longer to see what happens.

I have not tried packet sniffing yet. That will probably be in a few days.
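
As I understand the pmacct documentation, sql_dont_try_update switches to insert-only writes and relies on sql_refresh_time lining up with sql_history. Here is a rough Python sketch for checking that the two settings agree. The sample fragment, the helper names, the "!" comment handling and the limited set of unit suffixes are all assumptions made for illustration; it is not a pmacct tool.

```python
# Illustrative fragment only, not the full file shipped by ClearOS.
SAMPLE_CONF = """\
! comment lines are assumed to start with '!'
sql_history: 10m
sql_refresh_time: 600
sql_dont_try_update: true
"""

# Only the suffixes needed here; anything else raises KeyError.
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_conf(text):
    """Return a dict of 'key: value' settings, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def history_seconds(value):
    """Convert a value such as '10m' into seconds."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]


def refresh_matches_history(settings):
    """True when sql_refresh_time (seconds) equals sql_history."""
    refresh = int(settings["sql_refresh_time"])
    return refresh == history_seconds(settings["sql_history"])


settings = parse_conf(SAMPLE_CONF)
print(refresh_matches_history(settings))
```

To try it against a real file, read the contents of one of the interface conf files into a string and pass that to `parse_conf` instead of `SAMPLE_CONF`.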

Attachments:

pmacct.png

Legacy Disabled
Wednesday, September 03 2014, 07:51 PM

Hi Tim and Nick,

Thanks for diving into this! It's a crazy whirlwind at the moment (ClearOS 6.6, ClearOS 7, new web site). Flip side: the kids are back in school so there's much more time for hacking.

Nick Howitt
Wednesday, September 03 2014, 09:08 PM

Hi Peter,

Thanks for the reply. For interest here is my 7 day graph (I was away for the first 5 days, power cut on 31/08 (again!), re-installed on 1/9 and modded the config on 2/9). The system load and kernel usage tell the story. For me the report puts a relatively high load on the server (which normally uses very little power and is a Core i3-4130 device) and I wonder if we are playing at this. On a busier system (small to mid-sized office), even with my config, would the report have the ability to pull the system down?

Attachments:

monitorix.png

Nick Howitt
Thursday, September 04 2014, 08:30 PM

.... and round about 18h00 today networkdetail2db began failing to complete again.

Legacy Disabled
Wednesday, September 10 2014, 03:38 PM

Hi all,

I think it's time to pull the report from the Marketplace (still available via yum though). The networkdetail2db should be replaced with a patch to pmacct -- that will certainly improve performance.

gys
Saturday, September 27 2014, 09:46 PM

Hello,

since yesterday evening around 19:30 GMT+2 I've encountered the same problem.

Any solutions found meanwhile?

To keep the server up and running I stopped the service pmacctd, but this isn't a solution I guess

Regards
Guus

SkoglundTech
Saturday, September 27 2014, 10:25 PM

I'd like to bump this one too. I've been watching all the system-mysqld threads on here for several weeks. I've tried all the suggestions but keep ending up with a thoroughly slammed CPU.

I have zero torrents, so I know it isn't being caused by that on my machine at least. I did a reboot of my machine last night, and since then system-mysqld has logged over 457 minutes of CPU time... about 40x more than snort.

Nick Howitt
Sunday, September 28 2014, 07:16 AM

If you look a couple of posts up, the app has been withdrawn from the marketplace until they can sort it out. With 6.6 and 7 on the horizon I'd guess it is a low priority. In the meanwhile I'd suggest a complete removal of the app with:
service pmacctd stop
yum remove pmacct
If you're more adventurous, you could even drop the tables from system-mysql.

Bent Are Fikse
Sunday, September 28 2014, 07:42 AM

I also have serious trouble with this: it makes the CPU so busy that Zarafa stops working after approx. 1 1/2 days. So I'll have to restart mysql every 24 hours; this keeps the system alive.

Nick Howitt
Sunday, September 28 2014, 07:59 AM

Remove the report!

Bent Are Fikse
Sunday, September 28 2014, 08:09 AM

Nick Howitt wrote:

Remove the report!

Yea-yea-yeah :P I posted in the same second as you wrote your post

(ehh, well, not the same second, but I didn't refresh the forum before I posted. My bad)

Thank you

Pavel Vorko
Sunday, September 28 2014, 02:11 PM

Having browsed this topic, I still do not understand.
What can be done to reduce the load on the gateway?
Is there an exact recipe?
This situation is starting to get annoying...

Attachments:

COS_6-20140928.jpg

COS_6-20140928-2.jpg

Nick Howitt
Sunday, September 28 2014, 03:09 PM

There are a couple of tweaks to the .conf files here. You can also try switching the database engine to InnoDB. Ultimately neither solved it. In your /var/log/system you will see a lot of messages saying networkdetail2db was not able to start as it is already running. Making those changes and deleting stacks of records from the database stopped the errors for a while but they came back eventually.

Peter Baldwin has said networkdetail2db needs to be replaced with something to improve performance.

I would honestly recommend stopping and removing pmacct and the network-detail reports.

Pavel Vorko
Sunday, September 28 2014, 04:08 PM

It's depressing. After all, we are talking about ClearOS Professional Edition and running the gateway in the enterprise.

gys
Sunday, September 28 2014, 05:44 PM

gys wrote:

Hello,

since yesterday evening around 19:30 GMT+2 I've encountered the same problem.

Any solutions found meanwhile?

To keep the server up and running I stopped the service pmacctd, but this isn't a solution I guess

Regards
Guus

This morning at 9:30 the same problem started again. After much trial and error it seems that the WAN NIC is malfunctioning. Now that I've replaced it, the server starts normally and functions run smoothly. By the way, I still have pmacctd removed from the system and will later try the effects of reinstalling.

First I want to see the system stable over a longer period.

Guus

David Smith
Tuesday, October 07 2014, 02:59 PM

Hi,

thanks for the info - my load was also peaking over 1.5 during office hours, but it was dropping down at night. I have removed the network detail reporting as prescribed and my daytime load has dropped to excellent levels again.

Thanks, David

Attachments:

Clearos_load_20141007.jpg

Intelliant
Friday, October 31 2014, 06:06 AM

Same issue faced over the last 3+ months on 5 ClearOS servers that I maintain. Network detail report and pmacct removed.

Should be fixed as per Nick's advice above. Will observe over the next few hours.

Bent Are Fikse
Friday, October 31 2014, 04:05 PM

Sorry if this has been clarified earlier in this thread, but is this related only to Community or also to Professional? I got curious since I've had this problem on 3 Community installs but no problems on Professional yet (3 Professional installs)...

Intelliant
Friday, October 31 2014, 07:28 PM

Professional 6.5+ all of them.

Andy Godber
Sunday, April 19 2015, 07:20 AM

Can folk confirm this never got resolved? I've just seen that I'm getting the same problem on Professional 6.6.

Nick Howitt
Sunday, April 19 2015, 07:32 AM

No, it never got resolved and the app was withdrawn from the marketplace. The best advice is to remove the app.

Dave Loper
Sunday, April 19 2015, 08:02 PM

pmacct still has problems. We might revisit this app in the future but for now it is a distraction to getting ClearOS 7 out the door. For now, Nick's suggestion is what we are recommending. If pmacct is running on your system, please remove it. It should not be installable on ClearOS 6.6.

Josh Harding
Sunday, May 03 2015, 03:56 AM

service pmacctd stop && yum remove pmacct

If I were Sheldon from Big Bang Theory, that would be a...

Josh Harding
Sunday, May 03 2015, 03:58 AM

sorry... duplicate post