Ronald
Resolved
I'm not sure if this is the same problem as in the other report, but my ClearOS 6.5 server has been running at close to 100% load for a number of days now. I am a newbie at managing a server, so I have tried to replicate what I found on this forum to work out where the load is coming from.

I think it is related to system-mysqld. According to top it is not driving the CPU to 100% on its own, but it is almost continuously at the top of the list.

top - 23:30:31 up 3:04, 3 users, load average: 2.41, 1.99, 2.17
Tasks: 391 total, 2 running, 389 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.9%us, 2.7%sy, 0.0%ni, 0.0%id, 85.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1030072k total, 1016564k used, 13508k free, 1696k buffers
Swap: 2064376k total, 1418040k used, 646336k free, 101332k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2676 system-m 20 0 686m 537m 3116 S 12.6 53.5 5:45.86 system-mysqld
19078 root 20 0 2988 1240 828 R 0.7 0.1 0:00.13 top
16 root 20 0 0 0 0 R 0.3 0.0 0:24.71 kblockd/0
29 root 20 0 0 0 0 S 0.3 0.0 0:39.52 kswapd0
3860 root 20 0 22908 13m 13m S 0.3 1.3 0:15.73 pmacctd
3867 root 20 0 22100 9884 5848 S 0.3 1.0 0:07.91 pmacctd
4457 snort 20 0 297m 13m 3880 S 0.3 1.4 0:34.59 snort
4729 root 20 0 30956 1864 1460 S 0.3 0.2 0:03.47 X
4851 root 20 0 3428 200 164 S 0.3 0.0 0:31.87 snortsam
4872 clearcon 20 0 349m 9228 4108 S 0.3 0.9 0:29.38 gconsole
4937 plex 20 0 203m 6064 1536 S 0.3 0.6 0:11.08 Plex DLNA Serve
1 root 20 0 2948 892 784 S 0.0 0.1 0:01.15 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd


In /var/log/mysqld.log I find this information:

140502 20:25:33 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
140502 20:27:48 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140502 20:27:48 InnoDB: Initializing buffer pool, size = 8.0M
140502 20:27:48 InnoDB: Completed initialization of buffer pool
140502 20:27:49 InnoDB: Started; log sequence number 0 152806009
140502 20:27:49 [Note] Event Scheduler: Loaded 0 events
140502 20:27:49 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution

Not sure if it has anything to do with it, but the dashboard in the UI takes ages to load, in particular the charts on memory and CPU usage.

Any help to get me to the next step is appreciated.

Regards, Ronald
Friday, May 02 2014, 09:37 PM
Responses (77)
  • Accepted Answer

    Saturday, May 03 2014, 12:53 PM - #Permalink
    Hi Ronald,

    Yup..your load is high, but the sluggishness (eg. of dashboard loading) is caused by the extreme wait times on I/O activity (85.4%wa). You're out of memory and hitting swap which only makes the situation worse.
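To pull the I/O-wait figure out of a top summary line like the one in the first post, something along these lines should work. It is only a sketch: the function name `iowait_from_top_line` is made up for illustration, and it assumes the older `Cpu(s):` layout with a `%wa` field as shown above (newer versions of top format this line differently).

```shell
# Extract the I/O-wait percentage from a classic top "Cpu(s):" summary line.
# Hypothetical helper; assumes comma-separated fields such as "85.4%wa".
iowait_from_top_line() {
    printf '%s\n' "$1" | tr ',' '\n' |
        awk '/%wa/ { sub(/%wa.*/, ""); gsub(/[^0-9.]/, ""); print }'
}

# The summary line quoted in the first post of this thread.
sample='Cpu(s): 11.9%us, 2.7%sy, 0.0%ni, 0.0%id, 85.4%wa, 0.0%hi, 0.0%si, 0.0%st'
iowait_from_top_line "$sample"
```

A value this high alongside 0.0% idle means the CPU is mostly waiting on disk, which fits a box that is swapping heavily.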

    I'd like to see the output of
    ps afxw


    I'll bet you have more than one script running that is trying to update your reports (parsing out the log file to the system MySQL database).

    If you know how to kill processes, you can do that... a Windows(TM)-style reboot would also clear them out.

    Is your software up to date? There was a release about 3-4 weeks ago that prevented multiple instances of some of the report scripts from running.

    Finally...more memory never hurts...at 1G...you're right on the edge.

    B.
  • Accepted Answer

    Ronald
    Saturday, May 03 2014, 02:02 PM - #Permalink
    Thanks Ben, I will look into adding some more memory. You are right, it is small. This is because I started with a small redundant box to see if I would like running a server on ClearOS, and as I really like the ClearOS setup the usage is growing.

    Here is my output:
    [root@stevenaar ~]# ps afxw
    PID TTY STAT TIME COMMAND
    2 ? S 0:00 [kthreadd]
    3 ? S 0:00 \_ [migration/0]
    4 ? S 0:02 \_ [ksoftirqd/0]
    5 ? S 0:00 \_ [migration/0]
    6 ? S 0:00 \_ [watchdog/0]
    7 ? S 0:08 \_ [events/0]
    8 ? S 0:00 \_ [cgroup]
    9 ? S 0:00 \_ [khelper]
    10 ? S 0:00 \_ [netns]
    11 ? S 0:00 \_ [async/mgr]
    12 ? S 0:00 \_ [pm]
    13 ? S 0:00 \_ [sync_supers]
    14 ? S 0:00 \_ [bdi-default]
    15 ? S 0:00 \_ [kintegrityd/0]
    16 ? S 1:30 \_ [kblockd/0]
    17 ? S 0:00 \_ [kacpid]
    18 ? S 0:00 \_ [kacpi_notify]
    19 ? S 0:00 \_ [kacpi_hotplug]
    20 ? S 0:00 \_ [ata_aux]
    21 ? S 0:00 \_ [ata_sff/0]
    22 ? S 0:00 \_ [ksuspend_usbd]
    23 ? S 0:00 \_ [khubd]
    24 ? S 0:00 \_ [kseriod]
    25 ? S 0:00 \_ [md/0]
    26 ? S 0:00 \_ [md_misc/0]
    27 ? S 0:00 \_ [linkwatch]
    28 ? S 0:00 \_ [khungtaskd]
    29 ? D 1:37 \_ [kswapd0]
    30 ? SN 0:00 \_ [ksmd]
    31 ? S 0:00 \_ [aio/0]
    32 ? S 0:00 \_ [crypto/0]
    37 ? S 0:00 \_ [kthrotld/0]
    39 ? S 0:00 \_ [kpsmoused]
    40 ? S 0:00 \_ [usbhid_resumer]
    71 ? S 0:00 \_ [kstriped]
    99 ? S 0:00 \_ [ttm_swap]
    100 ? S< 0:07 \_ [kslowd000]
    101 ? S< 0:07 \_ [kslowd001]
    121 ? S 0:00 \_ [scsi_eh_0]
    122 ? S 0:00 \_ [usb-storage]
    153 ? S 0:00 \_ [scsi_eh_1]
    154 ? S 0:00 \_ [scsi_eh_2]
    157 ? S 0:00 \_ [scsi_eh_3]
    158 ? S 0:01 \_ [usb-storage]
    180 ? S 0:00 \_ [scsi_eh_4]
    181 ? S 0:00 \_ [scsi_eh_5]
    307 ? S 0:09 \_ [kdmflush]
    309 ? S 0:00 \_ [kdmflush]
    326 ? D 0:27 \_ [jbd2/dm-0-8]
    327 ? S 0:00 \_ [ext4-dio-unwrit]
    706 ? S 0:00 \_ [kdmflush]
    750 ? S 0:00 \_ [jbd2/sda1-8]
    751 ? S 0:00 \_ [ext4-dio-unwrit]
    752 ? S 0:00 \_ [jbd2/dm-2-8]
    753 ? S 0:00 \_ [ext4-dio-unwrit]
    754 ? S 0:00 \_ [jbd2/sdc1-8]
    755 ? S 0:00 \_ [ext4-dio-unwrit]
    756 ? S 0:00 \_ [jbd2/sdb-8]
    757 ? S 0:00 \_ [ext4-dio-unwrit]
    810 ? S 0:01 \_ [kauditd]
    925 ? S 0:07 \_ [flush-253:0]
    1194 ? S 0:00 \_ [rpciod/0]
    1797 ? S 0:00 \_ [lockd]
    1798 ? S 0:00 \_ [nfsd4]
    1799 ? S 0:00 \_ [nfsd4_callbacks]
    1800 ? S 0:00 \_ [nfsd]
    1801 ? S 0:00 \_ [nfsd]
    1802 ? S 0:00 \_ [nfsd]
    1803 ? S 0:00 \_ [nfsd]
    1804 ? S 0:00 \_ [nfsd]
    1805 ? S 0:00 \_ [nfsd]
    1806 ? S 0:00 \_ [nfsd]
    1807 ? S 0:00 \_ [nfsd]
    4442 ? S 0:00 \_ [bluetooth]
    25986 ? S 0:00 \_ [flush-8:32]
    1 ? Ss 0:01 /sbin/init
    405 ? S<s 0:00 /sbin/udevd -d
    703 ? S< 0:00 \_ /sbin/udevd -d
    4544 ? S< 0:00 \_ /sbin/udevd -d
    972 ? S<sl 0:05 auditd
    990 ? Ss 0:00 /sbin/portreserve
    997 ? Ssl 0:04 /usr/sbin/nslcd
    1010 ? Sl 0:06 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
    1175 ? Ss 0:00 rpcbind
    1200 ? Ss 0:00 rpc.statd -p 662 -o 2020
    1226 ? Ss 0:00 dbus-daemon --system
    1237 ? S 0:01 avahi-daemon: running [stevenaar.local]
    1238 ? Ss 0:00 \_ avahi-daemon: chroot helper
    1253 ? Ss 0:01 cupsd -C /etc/cups/cupsd.conf
    1278 ? Ss 0:00 /usr/sbin/acpid
    1287 ? Ssl 0:00 hald
    1288 ? S 0:00 \_ hald-runner
    1333 ? S 0:00 \_ hald-addon-input: Listening on /dev/input/event3 /dev/input/event0 /dev/input/event1
    1337 ? S 0:00 \_ hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
    1742 ? Ssl 0:12 /usr/sbin/slapd -h ldap://127.0.0.1/ ldaps://127.0.0.1 ldaps://192.168.2.111/ -u ldap
    1757 ? Ss 0:00 winbindd
    1787 ? S 0:00 \_ winbindd
    3879 ? S 0:00 \_ winbindd
    1792 ? Ss 0:00 rpc.mountd -p 892
    1828 ? Ss 0:00 rpc.idmapd
    1839 ? Ssl 0:12 /usr/sbin/nscd
    1945 ? S 0:00 /bin/sh /usr/libexec/ipsec/_plutorun --debug --uniqueids yes --force_busy no --nocrsend no --strictcrlpolicy no --nat_traversal yes --keep_alive --p
    1950 ? S 0:00 \_ /bin/sh /usr/libexec/ipsec/_plutorun --debug --uniqueids yes --force_busy no --nocrsend no --strictcrlpolicy no --nat_traversal yes --keep_alive
    1952 ? Sl 0:00 | \_ /usr/libexec/ipsec/pluto --nofork --secretsfile /etc/ipsec.secrets --ipsecdir /etc/ipsec.d --use-netkey --uniqueids --nat_traversal --virtual_
    2177 ? S 0:00 | \_ _pluto_adns
    1951 ? S 0:00 \_ /bin/sh /usr/libexec/ipsec/_plutoload --wait no --post
    1946 ? S 0:00 logger -s -p daemon.error -t ipsec__plutorun
    1964 ? S 0:02 /usr/sbin/dnsmasq -s nl
    2005 ? Ssl 3:09 java -Xmx80m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=10 -Djava.library.path=/opt/pcmonitor/bin/../native -jar /opt/pcmonitor/bin/../lib/pcmonitor
    2187 ? S 0:16 /usr/sbin/snmpd -LS0-6d -Lf /dev/null -p /var/run/snmpd.pid
    2237 ? S 0:02 arpwatch -u arpwatch -e - -i eth1 -f /var/lib/arpwatch/arp_eth1.dat
    2246 ? Ssl 0:53 clearsyncd
    2319 ? Ss 0:00 /usr/sbin/sshd
    31433 ? Ss 0:00 \_ sshd: root@notty
    31437 ? Ss 0:00 | \_ /usr/libexec/openssh/sftp-server
    32358 ? Ss 0:00 \_ sshd: root@notty
    32362 ? Ss 0:00 | \_ /usr/libexec/openssh/sftp-server
    4318 ? Ss 0:00 \_ sshd: root@notty
    4320 ? Ss 0:00 | \_ -bash
    26762 ? Ss 0:00 \_ sshd: root@pts/0
    26826 pts/0 Ss 0:00 \_ -bash
    27573 pts/0 R+ 0:00 \_ ps afxw
    2327 ? Ss 0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
    2362 ? Ssl 0:50 clamd
    2399 ? S 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=m
    2501 ? Sl 0:30 \_ /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --s
    2553 ? S 0:00 /bin/sh /usr/clearos/sandbox/usr/bin/mysqld_safe --defaults-file=/usr/clearos/sandbox/etc/my.cnf --datadir=/var/lib/system-mysql --socket=/var/lib/sys
    2676 ? Sl 25:33 \_ /usr/clearos/sandbox/usr/libexec/system-mysqld --defaults-file=/usr/clearos/sandbox/etc/my.cnf --basedir=/usr/clearos/sandbox/usr --datadir=/var/l
    3571 ? Ss 0:01 /usr/lib/cyrus-imapd/cyrus-master -d
    3637 ? S 0:00 \_ imapd -s
    3638 ? S 0:00 \_ imapd -s
    3642 ? S 0:00 \_ imapd -s
    3643 ? S 0:00 \_ imapd -s
    3644 ? S 0:00 \_ imapd -s
    3645 ? S 0:00 \_ imapd -s
    27493 ? D 0:00 \_ ctl_cyrusdb -c
    3581 ? Ss 0:00 /usr/sbin/saslauthd -m /var/run/saslauthd -a pam
    3582 ? S 0:00 \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam
    3583 ? S 0:00 \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam
    3584 ? S 0:00 \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam
    3585 ? S 0:00 \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam
    3633 ? S 0:02 idled
    3722 ? Ss 0:00 /usr/libexec/postfix/master
    3731 ? S 0:00 \_ qmgr -l -t fifo -u
    20813 ? S 0:00 \_ pickup -l -t fifo -u
    3734 ? Ss 0:03 proftpd: (accepting connections)
    3767 ? Ss 0:00 crond
    2499 ? S 0:00 \_ CROND
    2500 ? Zs 0:00 \_ [sh] <defunct>
    3797 ? Ss 0:00 squid -f /etc/squid/squid.conf
    3800 ? S 0:12 \_ (squid) -f /etc/squid/squid.conf
    3801 ? S 0:00 \_ (pam_auth)
    3802 ? S 0:00 \_ (pam_auth)
    3803 ? S 0:00 \_ (pam_auth)
    3804 ? S 0:00 \_ (pam_auth)
    3805 ? S 0:00 \_ (pam_auth)
    3806 ? S 0:00 \_ (pam_auth)
    3807 ? S 0:00 \_ (pam_auth)
    3808 ? S 0:00 \_ (pam_auth)
    3809 ? S 0:00 \_ (pam_auth)
    3810 ? S 0:00 \_ (pam_auth)
    3811 ? S 0:00 \_ (pam_auth)
    3812 ? S 0:00 \_ (pam_auth)
    3813 ? S 0:00 \_ (pam_auth)
    3815 ? S 0:00 \_ (pam_auth)
    3816 ? S 0:00 \_ (pam_auth)
    3817 ? S 0:00 \_ (unlinkd)
    3829 ? S 0:01 /usr/bin/perl /usr/share/BackupPC/bin/BackupPC -d
    3835 ? SN 0:00 \_ /usr/bin/perl /usr/share/BackupPC/bin/BackupPC_trashClean
    3842 ? Ss 0:05 nmbd -D
    3845 ? S 0:00 \_ nmbd -D
    27571 ? D 0:00 \_ nmbd -D
    3860 ? Ss 1:26 pmacctd: Core Process [default]
    3865 ? S 0:41 \_ pmacctd: MySQL Plugin [inbound]
    26731 ? S 0:00 | \_ pmacctd: MySQL Plugin -- DB Writer [inbound]
    26990 ? S 0:00 | \_ pmacctd: MySQL Plugin -- DB Writer [inbound]
    3867 ? S 0:37 \_ pmacctd: MySQL Plugin [outbound]
    26991 ? S 0:00 \_ pmacctd: MySQL Plugin -- DB Writer [outbound]
    3870 ? Ss 0:00 smbd -D
    3894 ? S 0:00 \_ smbd -D
    12116 ? S 0:06 \_ smbd -D
    3896 ? Ss 0:00 dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3898 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3899 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3900 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3901 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3902 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3903 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3904 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3905 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3907 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3908 ? S 0:00 \_ dansguardian-av -c /etc/dansguardian-av/dansguardian.conf
    3914 ? Sl 0:27 /usr/libexec/dropbox/dropbox study
    3917 ? Sl 0:25 /usr/libexec/dropbox/dropbox linda
    4096 ? Ds 2:29 /usr/bin/monitorix -c /etc/monitorix.conf -p /var/run/monitorix.pid
    4310 ? S 0:00 su -s /bin/sh plex -c . /etc/sysconfig/PlexMediaServer; cd /usr/lib/plexmediaserver; ./'Plex Media Server' > /dev/null 2>&1
    4316 ? Ss 0:00 \_ sh -c . /etc/sysconfig/PlexMediaServer; cd /usr/lib/plexmediaserver; ./'Plex Media Server' > /dev/null 2>&1
    4323 ? Sl 0:15 \_ ./Plex Media Server
    4406 ? SNl 2:06 \_ Plex Plug-in [com.plexapp.system] /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Content
    4937 ? Sl 0:53 \_ /usr/lib/plexmediaserver/Plex DLNA Server
    4350 ? Ss 0:02 /usr/sbin/webconfig
    4355 ? S 0:24 \_ /usr/sbin/webconfig
    4356 ? S 0:28 \_ /usr/sbin/webconfig
    4357 ? S 0:22 \_ /usr/sbin/webconfig
    5026 ? S 0:29 \_ /usr/sbin/webconfig
    5028 ? D 0:22 \_ /usr/sbin/webconfig
    5083 ? D 0:29 \_ /usr/sbin/webconfig
    10681 ? S 0:28 \_ /usr/sbin/webconfig
    10684 ? S 0:26 \_ /usr/sbin/webconfig
    10685 ? S 0:28 \_ /usr/sbin/webconfig
    10985 ? S 0:22 \_ /usr/sbin/webconfig
    13488 ? S 0:25 \_ /usr/sbin/webconfig
    13491 ? S 0:24 \_ /usr/sbin/webconfig
    13492 ? S 0:24 \_ /usr/sbin/webconfig
    17255 ? S 0:07 \_ /usr/sbin/webconfig
    4416 ? Ss 0:00 pptpd
    4457 ? Ssl 2:48 snort -i eth1 -u snort -g snort -D -c /etc/snort.conf
    4511 ? Ssl 1:26 /usr/bin/transmission-daemon -b -t -a *.*.*.* -e /var/log/transmission/transmission.log
    4520 ? Ss 0:00 /usr/bin/openvt -fwc 1 -- /bin/login -f clearconsole
    4528 ? Ss 0:00 \_ login -- clearconsole
    4619 tty1 Ss+ 0:00 \_ -bash
    4652 tty1 Sl+ 0:59 \_ /usr/sbin/tconsole
    4661 ? Ss 0:00 \_ /bin/sh /usr/bin/startx
    4717 ? S 0:00 \_ xinit /var/lib/clearconsole//.xinitrc -- /usr/bin/X :0 -auth /var/lib/clearconsole//.serverauth.4661
    4729 tty7 Ss+ 0:22 \_ /usr/bin/X :0 -auth /var/lib/clearconsole//.serverauth.4661
    4866 ? Ss 0:00 \_ sh /var/lib/clearconsole//.xinitrc
    4871 ? S 0:00 \_ /usr/bin/ratpoison
    4872 ? Sl 2:40 \_ /usr/lib/gconsole/gconsole
    4537 tty2 Ss+ 0:00 /sbin/mingetty /dev/tty2
    4539 tty3 Ss+ 0:00 /sbin/mingetty /dev/tty3
    4542 tty4 Ss+ 0:00 /sbin/mingetty /dev/tty4
    4545 tty5 Ss+ 0:00 /sbin/mingetty /dev/tty5
    4549 tty6 Ss+ 0:00 /sbin/mingetty /dev/tty6
    4552 ? Sl 0:00 /usr/sbin/console-kit-daemon --no-daemon
    5011 ? S 0:00 dbus-launch --autolaunch 6e5ee68e56495e8f74db175c0000001d --binary-syntax --close-stderr
    5018 ? Ss 0:00 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
    5021 ? S 0:00 /usr/libexec/gconfd-2
    2567 ? Ss 0:00 syswatch
    2858 ? Ss 0:31 snortsam /etc/snortsam.conf
    4363 ? Ss 0:00 /usr/sbin/httpd
    4368 ? S 0:00 \_ /usr/sbin/httpd
    4369 ? S 0:00 \_ /usr/sbin/httpd
    4370 ? S 0:00 \_ /usr/sbin/httpd
    4371 ? S 0:00 \_ /usr/sbin/httpd
    4372 ? S 0:00 \_ /usr/sbin/httpd
    4373 ? S 0:00 \_ /usr/sbin/httpd
    4374 ? S 0:00 \_ /usr/sbin/httpd
    4375 ? S 0:00 \_ /usr/sbin/httpd
  • Accepted Answer

    Saturday, May 03 2014, 04:00 PM - #Permalink
    IDS (snort) and the Proxy (squid) are resource hogs. You are using 1.4GB of swap as well as 1GB of RAM. I'd expect your system to be crawling! I'd suggest at least 4GB RAM and if you can, increasing your swap (not so easy).

    PS Can you put your output between [ code ] and [ /code ] removing the spaces between [ and ]. Then you get nicely formatted posts of screen dumps and files.
  • Accepted Answer

    Ronald
    Saturday, May 03 2014, 04:25 PM - #Permalink
    OK, as far as I am aware the only updates to the server have been the ones pushed by automatic updates, so my server should be up to date. Running yum update and yum upgrade also returns nothing.

    I am wondering what has changed in the latest updates to make the CPU go mad. It is a server that sits in my home environment and I have not opened it up to any load other than my own.

    I will try to get some additional RAM, but what can I do in the meantime to bring the usage down?

    Regards, Ronald
  • Accepted Answer

    Saturday, May 03 2014, 05:39 PM - #Permalink
    Try stopping various services and see the effect. You can either look at the dashboard memory usage or do something like:
    egrep 'Mem|Cache|Swap' /proc/meminfo
    The first things I'd look at are the proxy (squid) and the IDS (snort).

    I've just tried playing around with the "top" command. Run "top" then "M". This gives you top by memory usage. Then hit "f" then you can deselect some of the columns and add in things like Swapped Size etc.
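If you want a single number to compare before and after stopping a service, swap in use can be derived from the `SwapTotal` and `SwapFree` fields of /proc/meminfo. A minimal sketch, where the function name `swap_used_kb` is just an illustrative choice:

```shell
# Print swap in use (kB) from /proc/meminfo-style text on stdin.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }'
}

# Figures taken from the top output in the first post.
printf 'SwapTotal: 2064376 kB\nSwapFree: 646336 kB\n' | swap_used_kb

# On a live system: swap_used_kb < /proc/meminfo
```

Run it once, stop a service, wait a minute and run it again to see how much swap that service was holding.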
  • Accepted Answer

    Saturday, May 03 2014, 06:35 PM - #Permalink
    In addition to Nick's suggestions, here are two more things to try that might provide useful info...

    What's eating CPU?

    watch -n 5 'ps axf | awk "{ if ( \$3 !~ /S/ ) { print; } }"'
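The command above prints every process whose STAT column does not contain "S". If you only care about processes stuck waiting on I/O (state "D"), a narrower filter might look like the sketch below. The function name `blocked_processes` is hypothetical, and the sample lines are adapted from the listing earlier in the thread with the tree prefixes removed.

```shell
# List PID and state of processes in uninterruptible sleep (STAT starts with "D").
# Expects ps-style lines on stdin: PID TTY STAT TIME COMMAND...
blocked_processes() {
    awk '$3 ~ /^D/ { print $1, $3 }'
}

sample='1 ? Ss 0:01 /sbin/init
29 ? D 1:37 [kswapd0]
2362 ? Ssl 0:50 clamd
4096 ? Ds 2:29 /usr/bin/monitorix -c /etc/monitorix.conf'

printf '%s\n' "$sample" | blocked_processes

# On a live system: ps axw | blocked_processes
```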

    In the following, watch the "si" and "so" columns under swap for excessive swap activity. To change the command: the first number is the interval in seconds between updates (5 in this example) and the second number is the number of iterations (32 in this example).

    vmstat -w 5 32

    "man ps" and "man vmstat" for more on these commands...
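To boil a vmstat run down to one pair of numbers, the si and so columns can be averaged. This is a rough sketch only: `swap_activity` is a made-up name, it assumes the default column order (si and so as the 7th and 8th fields after two header lines), and the sample figures below are invented for illustration, not measured on any system in this thread.

```shell
# Average the swap-in (si) and swap-out (so) columns of vmstat output on stdin.
# Assumes two header lines, then data rows with si in field 7 and so in field 8.
swap_activity() {
    awk 'NR > 2 { si += $7; so += $8; n++ }
         END    { if (n) printf "%d %d\n", si / n, so / n }'
}

# Invented sample in vmstat layout.
sample_vmstat() {
    cat <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3 1418040  13508   1696 101332  400  600  2000  1500  500  900 12  3  0 85  0
 0  4 1420000  12000   1600 100000  600  800  2500  1700  520  950 11  3  0 86  0
EOF
}

sample_vmstat | swap_activity

# On a live system: vmstat 5 32 | swap_activity
```

Values that stay well above zero for both columns mean the box is continuously paging in and out.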
  • Accepted Answer

    Ronald
    Sunday, May 04 2014, 10:43 AM - #Permalink
    I have been working with the tools both of you proposed and concluded that the two major swap and memory consumers are webconfig and system-mysqld. I killed the latter first and immediately memory and swap usage dropped back to normal, and so did the wait time. Then I killed webconfig too and, not surprisingly, everything dropped even further.

    Then I started both again and after a few minutes the server was crawling again. I used htop to have a further look at the tree of mysql usage and looked in the mysqld.log too. I found that the InnoDB engine was creating most log entries.
    With Google as my friend I found a couple of posts about the excessive load that MySQL can create on a server, and the InnoDB engine in particular. As I have an app running on the server that uses the InnoDB engine, I did not want to take the risk of switching it off entirely and working with MyISAM only.

    I copied a couple of lines from a post that seems to address a problem similar to mine and put these in /etc/my.cnf:

    ## If open-files-limit is set very low, MySQL may increase on its own. Either
    ## way, increase this if MySQL gives 'too many open files' errors. Setting
    ## this above 65535 could be unwise (MySQL may crash).
    open-files-limit = 20000

    ### Cache
    thread-cache-size = 16
    table-open-cache = 4096
    table-definition-cache = 512

    ## Generally, it is unwise to set the query cache to be larger than 64-128M
    ## as the costs associated with maintaining the cache outweigh the performance
    ## gains. A far superior solution would be to implement memcached, though this
    ## required modifying the application, among other things.
    query-cache-type = 1
    query-cache-size = 32M
    query-cache-limit = 1M

    ### Per-thread Buffers
    sort-buffer-size = 1M
    read-buffer-size = 1M
    read-rnd-buffer-size = 2M
    join-buffer-size = 1M

    ### Temp Tables
    tmp-table-size = 64M
    max-heap-table-size = 64M

    ### Networking
    back-log = 100
    max-connections = 50
    max-connect-errors = 10000
    max-allowed-packet = 16M
    interactive-timeout = 600
    wait-timeout = 180
    net_read_timeout = 30
    net_write_timeout = 30
    # This value is the size of the listen queue for incoming TCP/IP connections.
    back_log = 128


    #### Storage Engines
    ## Set this to force MySQL to use a particular engine / table-type
    ## for new tables. This setting can still be overridden by specifying
    ## the engine explicitly in the CREATE TABLE statement.
    default-storage-engine = MyISAM

    ## Makes sure MySQL does not start if InnoDB fails to start. This helps
    ## prevent ugly silent failures.
    innodb = FORCE

    ### MyISAM
    ## Not sure what to set this to?
    ## Try running a 'du -sch /var/lib/mysql/*/*.MYI'
    ## This will give you a good estimate on the size of all the MyISAM indexes.
    ## (The buffer may not need to set that high, however)
    key-buffer-size = 2M
    ## This setting controls the size of the buffer that is allocated when
    ## sorting MyISAM indexes during a REPAIR TABLE or when creating indexes
    ## with CREATE INDEX or ALTER TABLE.
    myisam-sort-buffer-size = 2M

    ### InnoDB
    ## Note: While most settings in MySQL can be set at run-time, many InnoDB
    ## variables cannot be set at runtime as require restarting MySQL
    ###
    ## These settings control how much RAM InnoDB will use. Generally, when using
    ## mostly InnoDB tables, the innodb-buffer-pool-size should be as large as
    ## is possible without swapping or starving other processes of RAM. The other
    ## two settings usually do not need to be changed, but can help for very large
    ## datasets.
    innodb-buffer-pool-size = 285M
    innodb-log-buffer-size = 8M


    To be sure, I rebooted the server (I was not sure if just restarting system-mysqld would do the trick) and since then my machine has been OK again. I will try to add some RAM to it, but as it is PC133 SDRAM I am not sure how much I can physically add given the limitations of the board.

    I will edit this message if the problem reoccurs after a couple of days.

    Regards, Ronald
  • Accepted Answer

    Sunday, May 04 2014, 11:54 AM - #Permalink
    You're beyond what I can really help with now except I can say that /etc/my.cnf is for mysql and /usr/clearos/sandbox/etc/my.cnf is for system-mysql so you may want to make changes to the latter. Also see the note here if you do make any changes.
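To check which config file a given MySQL instance was started with, you can look for `--defaults-file=` on its command line in the ps output. A small sketch, with `defaults_file_of` as a hypothetical helper name. The sample command line is copied from the ps listing earlier in the thread, shortened to the part that was not cut off.

```shell
# Print the value of --defaults-file= from a command line, or nothing if absent.
defaults_file_of() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^--defaults-file=//p'
}

system_cmd='/usr/clearos/sandbox/usr/libexec/system-mysqld --defaults-file=/usr/clearos/sandbox/etc/my.cnf --basedir=/usr/clearos/sandbox/usr'
defaults_file_of "$system_cmd"
```

An instance with no `--defaults-file=` option, like the plain mysqld in the same listing, prints nothing here and falls back to the standard locations such as /etc/my.cnf.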
  • Accepted Answer

    Ronald
    Sunday, May 04 2014, 03:44 PM - #Permalink
    Thanks Nick, I applied these to that cnf file too. So far so good. I have not touched the log files or sizes as referred to in the link.
    I will keep this forum posted if this does not work.

    Regards, Ronald
  • Accepted Answer

    Wednesday, May 07 2014, 01:35 PM - #Permalink
    Hi Nick. I was having the same issue. My RAM is low but my company does not want me shutting down the server.
    I did up the max child processes because our user connections were higher than default and that helped big time.

    When I am looking at System > Resource > Processes and Page output in the GUI, do you know whether the "Processes" figure is the dansguardian child process count or the overall number of processes on the actual Linux box?

    And is the "Page output" figure the actual page swapping stats, so that I can tell if it goes up due to the increased child processes / RAM usage?
  • Accepted Answer

    Wednesday, May 07 2014, 04:38 PM - #Permalink
    I'm afraid I've no idea what you are looking at. Are you running the Pro version? I suspect that even if I did know what you were looking at, I wouldn't know the answer.
  • Accepted Answer

    Wednesday, May 07 2014, 05:47 PM - #Permalink
    It is ClearOS Enterprise 5.2. The web GUI has a System > Resource Reports page where you can select Processes and Page outputs. I was just wondering if this was all running processes on our server or just the dansguardian child processes. It would be nice if it were the child processes, because then I could see the history of child process counts instead of only the real-time figure from

    " ps aux | grep dansguardian-av | wc -l "

    which only shows me the current processes at that time.
    If the GUI figure is the child process count, then since the GUI shows daily, weekly, monthly etc. stats, I could see the max, average and current child process counts, which would be very useful.
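In the meantime, a history of the child process count could be built by logging it periodically. Note that the grep pipeline above also counts the grep process itself. A sketch of an exact-name count follows; `count_named` is a made-up helper and the log path in the last comment is only an example.

```shell
# Count lines on stdin whose first field exactly equals the given name.
# Intended input: one command name per line, e.g. from "ps -eo comm=".
count_named() {
    awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

printf 'dansguardian-av\nsquid\ndansguardian-av\n' | count_named dansguardian-av

# On a live system, e.g. from cron (log path is hypothetical):
#   echo "$(date '+%F %T') $(ps -eo comm= | count_named dansguardian-av)" >> /var/log/dg-count.log
```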
  • Accepted Answer

    Wednesday, May 07 2014, 06:23 PM - #Permalink
    Have you looked at the "top" command from earlier in the thread to monitor memory usage by process? It is a surprisingly powerful tool.
  • Accepted Answer

    Wednesday, May 07 2014, 06:29 PM - #Permalink
    I saw this on a buddy's server just today... Pete told me it was caused by a tool that watches network traffic and imports it into a system-mysql table.

    If you run:
    ps afxw | grep pmacct


    and see 1 or more entries, this is likely what is causing the load.

    For now, I just removed this (and the network report unfortunately) until we get a fix out:

    service pmacctd stop
    service system-mysql stop
    killall -9 pmacctd
    yum remove pmacct
    service system-mysql start


    Load dropped right away to 0 after this.

    B.
  • Accepted Answer

    Wednesday, May 07 2014, 08:17 PM - #Permalink
    Is "top" historical, i.e. can I see averages or maximums for an entire day, week or month?
    Or is it just real time?
    With the web GUI, System > Resource Reports > Processes, I can see all that in a graph as well as current usage.
    I just can't find out whether "Processes" refers to all processes on the server itself, or to the dansguardian child processes.

    Also, the response about the network traffic tool adding to CPU usage is very helpful.

    So thank you both for your input.

    I am still hoping to get an answer on my initial question on the Web interfaces stats.
  • Accepted Answer

    Wednesday, May 07 2014, 09:00 PM - #Permalink
    top is real time only - I believe. I only found the other options looking at the man pages last weekend. You could check them as well.
  • Accepted Answer

    zombu2
    Monday, June 02 2014, 12:07 AM - #Permalink
    well here is what i did to make system-mysqld behave

    reset your stats and graphs
    system-database reset

    get your password for mysql
    cat /var/clearos/system_database/reports

    then log into mysql
    /usr/clearos/sandbox/usr/bin/mysql -u reports -p reports

    in mysql enter
    alter table network_detail engine=innodb;

    type exit and then restart mysql
    service system-mysqld restart

    been 4 days now and no runaway sql process , the system is nice and calm

    and before anyone asks no i did not uninstall network detail etc.
  • Accepted Answer

    Evince
    Monday, June 02 2014, 12:20 PM - #Permalink
    Thanks , Good Information:)
  • Accepted Answer

    Tuesday, August 05 2014, 09:05 PM - #Permalink
    I've just hit this one :(
    [root@server ~]# ps afxw | grep pmacct
    5559 pts/0 S+ 0:00 \_ grep pmacct
    17054 ? Ss 41:11 pmacctd: Core Process [default]
    17059 ? S 22:56 \_ pmacctd: MySQL Plugin [inbound]
    3351 ? S 0:00 | \_ pmacctd: MySQL Plugin -- DB Writer [inbound]
    4109 ? S 0:00 | \_ pmacctd: MySQL Plugin -- DB Writer [inbound]
    5119 ? S 0:00 | \_ pmacctd: MySQL Plugin -- DB Writer [inbound]
    17062 ? S 19:30 \_ pmacctd: MySQL Plugin [outbound]
    5118 ? S 0:00 \_ pmacctd: MySQL Plugin -- DB Writer [outbound]
    17063 ? Ss 20:19 pmacctd: Core Process [default]
    17065 ? S 1:43 \_ pmacctd: MySQL Plugin [inbound]
    17066 ? S 1:04 \_ pmacctd: MySQL Plugin [outbound]

    Is there any ClearOS fix in the pipeline?
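The listing above shows two separate "Core Process" entries. Assuming a healthy system normally runs a single pmacctd core process (my assumption, not something confirmed in this thread), a quick count could be scripted as below. `count_pmacct_cores` and `sample_listing` are names made up for this sketch.

```shell
# Count pmacctd core processes in ps output read from stdin.
count_pmacct_cores() {
    grep -c 'pmacctd: Core Process'
}

# The listing from this post, reproduced as sample input.
sample_listing() {
    cat <<'EOF'
5559 pts/0 S+ 0:00 \_ grep pmacct
17054 ? Ss 41:11 pmacctd: Core Process [default]
17059 ? S 22:56 \_ pmacctd: MySQL Plugin [inbound]
17062 ? S 19:30 \_ pmacctd: MySQL Plugin [outbound]
17063 ? Ss 20:19 pmacctd: Core Process [default]
17065 ? S 1:43 \_ pmacctd: MySQL Plugin [inbound]
17066 ? S 1:04 \_ pmacctd: MySQL Plugin [outbound]
EOF
}

sample_listing | count_pmacct_cores

# On a live system: ps afxw | count_pmacct_cores
```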
  • Accepted Answer

    Wednesday, August 06 2014, 06:14 PM - #Permalink
    I could not find a bug report so I have filed bug 1885. In the meanwhile I've removed the app-network-detail-report packages and pmacct. If following Ben's instructions above note that it is system-mysqld and not system-mysql.
  • Accepted Answer

    Thursday, August 07 2014, 04:31 PM - #Permalink
    Hi Nick,

    This issue should have been resolved in a recent update (here's the tracker link). Does the /var/clearos/network_detail_report/purge_external file exist on your system? If not, then the update was not applied. If so, then something in the update did not work (?).

    Short version: The network detail report table for external connections was not getting purged. This caused the report engine to hammer the MySQL system.
  • Accepted Answer

    Thursday, August 07 2014, 05:09 PM - #Permalink
    Hi Peter,

    I'll have to reinstall to check. Do I need to clean anything out first?
  • Accepted Answer

    Thursday, August 07 2014, 05:14 PM - #Permalink
    I've just checked and, strangely, the file does exist even though I've uninstalled the reports!
  • Accepted Answer

    Thursday, August 07 2014, 06:00 PM - #Permalink
    The file should still exist after an uninstall, so that's normal. It looks like the upgrade was performed but the MySQL issue still persisted. We'll keep an eye out for this issue.
  • Accepted Answer

    Thursday, August 07 2014, 06:12 PM - #Permalink
    I'll reinstall and see what happens. I think it only maxed out one cpu core so I did not really notice it having a huge impact.

    Do you have a comment on zombu2's solution further up the thread?
  • Accepted Answer

    Thursday, August 07 2014, 06:36 PM - #Permalink
    Bad news :( I again have a 100% load, generally on one core but sometimes split between 2 or 3. Can I help to diagnose?
  • Accepted Answer

    Intelliant
    Friday, August 08 2014, 08:51 AM - #Permalink
    I am facing the same issue too. I can provide any details required to help diagnose it.
  • Accepted Answer

    Friday, August 08 2014, 02:36 PM - #Permalink
    Nick Howitt wrote:
    Do you have a comment on zombu2's solution further up the thread?

    It's a different way to get to the same issue -- the network_detail_external table is getting too large.
  • Accepted Answer

    Friday, August 08 2014, 02:56 PM - #Permalink
    It's normal to see MySQL working hard on an update, but it shouldn't last for more than 30-ish seconds. Every 15 minutes, the pmacct daemon does a data dump, while the other reports update every 5 minutes. Are you seeing a sustained 100% usage? What's the 15-minute load? Are you running version 1.5.27 of app-network-detail?
  • Accepted Answer

    Friday, August 08 2014, 04:39 PM - #Permalink
    Hi Peter,
    Yes I see 100% sustained usage. From top:
    load average: 1.13, 1.06, 1.01
    I have 2 real cores and 2 virtual cores. If I watch this in htop, the system-mysqld load varies between one core at 100% and two cores totalling 100%.

    Yes, I am running 1.5.27, reinstalled from yum yesterday afternoon:
    [root@server ~]# rpm -qa | grep app-network-d
    app-network-detail-report-1.5.27-1.v6.noarch
    app-network-detail-report-core-1.5.27-1.v6.noarch
  • Accepted Answer

    Saturday, August 09 2014, 01:04 PM - #Permalink
    FWIW, following this post, my network_detail_external table has 770196 records. [strike]Following Dave Loper's comments in bug 1885 could the purge routine just be purging from network_detail and not network_detail_external?[/strike]

    Also attached is my load average graph. The start of the problem is obvious as is the bit where I temporarily uninstalled the packages.
    http://www.clearfoundation.com/media/kunena/attachments/legacy/images/laod_average.png
  • Accepted Answer

    Saturday, August 09 2014, 08:31 PM - #Permalink
    Hmm,
    I've just worked out how to browse the reports in phpMyAdmin (where is root's password kept, I had to use reports'?) and, looking at the oldest entry in network_detail_external, it is dated 26 Jul 2014 so it looks like the data deletion routine is working. Is there any way I can do an sql query to count the records by day? I wonder if the 1.5.27 update suddenly increased the number of records being logged.
  • Accepted Answer

    Sunday, August 10 2014, 02:36 AM - #Permalink
    First: There are now several threads on the system-mysqld excessive cpu usage for reports, and I have posted replies to several of those over the past 6 months. We probably need one thread but which one? I picked this one since it was most recently updated.

    Second: I apologize for the length of this post but I wanted to make all this diagnostic information available.

    Third: System is Community 6.5.0 (Final) with all released updates through today. And I do have the purge file referenced earlier in this thread with a size of 0 and date of Aug 2. (/var/clearos/network_detail_report/purge_external)

    Peter

    Let’s see what top shows hogging the CPU:

    top - 18:00:12 up  5:28,  3 users,  load average: 1.17, 1.17, 1.21
    Tasks: 277 total, 2 running, 275 sleeping, 0 stopped, 0 zombie
    Cpu(s): 21.0%us, 29.6%sy, 0.0%ni, 47.1%id, 2.2%wa, 0.0%hi, 0.1%si, 0.0%st
    Mem: 8050980k total, 7898036k used, 152944k free, 3585296k buffers
    Swap: 8191992k total, 4k used, 8191988k free, 2775176k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    2091 system-m 20 0 2071m 105m 6472 S 199.8 1.3 373:54.90 system-mysqld
    78 root 39 19 0 0 0 S 0.3 0.0 0:17.07 kipmi0
    1746 ldap 20 0 1155m 84m 5284 S 0.3 1.1 0:24.97 slapd
    2192 ftp 20 0 147m 2112 812 S 0.3 0.0 0:07.43 proftpd
    2223 root 20 0 170m 20m 824 S 0.3 0.3 0:28.34 l7-filter
    2869 clearcon 20 0 657m 126m 33m S 0.3 1.6 0:22.56 gconsole
    1 root 20 0 21448 1556 1252 S 0.0 0.0 0:00.59 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
    . . .


    Two of my 4 3GHz cores are fully consumed by system-mysqld. (Yes I let top run for a while and it stayed at 200%.)

    So my clearos system, on a very fast box, is absolutely loafing (hardly any cpu usage) beyond the reporting system. Why would/should reports do this? It shouldn't.

    Login to MySQL as reports. (If you don't know where to find the password you probably shouldn't be doing this.)

    /usr/clearos/sandbox/usr/bin/mysql -ureports -p???


    Once logged in, select the reports database for your queries:

    use reports;


    Display the tables in the reports schema:

    mysql> show tables;
    +-------------------------+
    | Tables_in_reports       |
    +-------------------------+
    | network                 |
    | network_detail          |
    | network_detail_external |
    | proxy                   |
    | proxy_domains           |
    | resource                |
    +-------------------------+
    6 rows in set (0.00 sec)


    How many rows are in each of these tables?

    mysql> SELECT table_name, table_rows
    -> FROM INFORMATION_SCHEMA.TABLES
    -> WHERE TABLE_SCHEMA = 'reports';
    +-------------------------+------------+
    | table_name              | table_rows |
    +-------------------------+------------+
    | network                 |     653394 |
    | network_detail          |    1617329 |
    | network_detail_external |     885436 |
    | proxy                   |    2013755 |
    | proxy_domains           |          0 |
    | resource                |     185377 |
    +-------------------------+------------+
    6 rows in set (0.20 sec)
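
    The table_rows figure in INFORMATION_SCHEMA.TABLES is exact for MyISAM tables but only an estimate for InnoDB ones, so it may be worth checking the storage engine and on-disk size alongside it. A minimal sketch using only standard INFORMATION_SCHEMA columns; the size_mb alias is just a label chosen for illustration:

    ```sql
    -- Engine, reported row count and size (data + indexes) per table
    SELECT table_name,
           engine,
           table_rows,
           ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
    FROM INFORMATION_SCHEMA.TABLES
    WHERE table_schema = 'reports'
    ORDER BY (data_length + index_length) DESC;

    -- Exact count for a single table (reads the whole table or an index on InnoDB)
    SELECT COUNT(*) FROM network_detail_external;
    ```

    If the engine turns out to be MyISAM, writes take a table-level lock, so concurrent INSERTs and UPDATEs on the same table queue up behind one another.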


    Two of these tables have over a million rows. Let’s see how old the oldest record is for each:

    mysql> SELECT timestamp FROM proxy ORDER BY timestamp ASC LIMIT 1;
    +---------------------+
    | timestamp           |
    +---------------------+
    | 2013-12-19 10:53:07 |
    +---------------------+
    1 row in set (0.90 sec)

    mysql> SELECT stamp_inserted FROM network_detail ORDER BY stamp_inserted ASC LIMIT 1;
    +---------------------+
    | stamp_inserted      |
    +---------------------+
    | 2013-12-21 10:00:00 |
    +---------------------+
    1 row in set (0.49 sec)


    Looks like rows in these tables are not being regularly pruned by the reports purge process as oldest records are from Dec 2013.
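
    To quantify how much stale data there is before touching anything, the rows older than a cutoff can be counted first. This is only a sketch: the 30-day retention is an assumption (the thread does not say what the purge routine is meant to keep), and the DELETE is left commented out on purpose.

    ```sql
    -- How many rows are older than the assumed 30-day retention?
    SELECT COUNT(*) AS stale_rows
    FROM proxy
    WHERE timestamp < NOW() - INTERVAL 30 DAY;

    SELECT COUNT(*) AS stale_rows
    FROM network_detail
    WHERE stamp_inserted < NOW() - INTERVAL 30 DAY;

    -- Manual purge in small batches (repeat until 0 rows are affected).
    -- Uncomment only after taking a backup.
    -- DELETE FROM proxy
    -- WHERE timestamp < NOW() - INTERVAL 30 DAY
    -- LIMIT 10000;
    ```

    Deleting in batches with LIMIT keeps each statement short, which matters if other writers are waiting on the same table.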

    Which tables have indexes? Maybe we are querying large tables without indexes.

    mysql> SELECT DISTINCT
    -> TABLE_NAME,
    -> INDEX_NAME
    -> FROM INFORMATION_SCHEMA.STATISTICS
    -> WHERE TABLE_SCHEMA = 'reports';
    +---------------+------------+
    | TABLE_NAME    | INDEX_NAME |
    +---------------+------------+
    | network       | PRIMARY    |
    | network       | iface      |
    | network       | timestamp  |
    | proxy_domains | PRIMARY    |
    | proxy_domains | ip         |
    | proxy_domains | timestamp  |
    | proxy_domains | hostname   |
    | resource      | PRIMARY    |
    +---------------+------------+
    8 rows in set (0.00 sec)


    Interestingly, there are no indexes on the two largest tables (proxy and network_detail), nor on network_detail_external.

    Let’s see what queries the system is currently running. Time is number of seconds in current state.

    mysql> show FULL processlist;
    +-----+---------+-----------------+---------+---------+------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Id | User | Host | db | Command | Time | State | Info |
    +-----+---------+-----------------+---------+---------+------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | 21 | reports | localhost:42686 | reports | Query | 0 | Locked | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407603000) = stamp_inserted AND ip_dst='ff02::1:ff4c:29a4' |
    | 68 | reports | localhost:47829 | reports | Query | 2 | Locked | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1407607201), FROM_UNIXTIME(1407606600), 'ff02::1:ff08:ade2', 1, 72) |
    | 115 | reports | localhost:53665 | reports | Query | 1 | Locked | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407610200) = stamp_inserted AND ip_dst='ff02::1:ff04:df52' |
    | 133 | reports | localhost:54854 | reports | Query | 3 | Locked | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1407611701), FROM_UNIXTIME(1407610800), 'ff02::1:ff26:64f6', 2, 144) |
    | 162 | reports | localhost:59025 | reports | Query | 1 | Locked | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407613800) = stamp_inserted AND ip_dst='ff02::1:ff15:46d2' |
    | 189 | reports | localhost:33090 | reports | Query | 0 | Locked | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1407616201), FROM_UNIXTIME(1407615000), 'ff02::1:ffef:f2bc', 1, 72) |
    | 210 | reports | localhost:35334 | reports | Query | 1 | Locked | INSERT INTO `network_detail_external` (stamp_updated, stamp_inserted, ip_dst, packets, bytes) VALUES (FROM_UNIXTIME(1407618001), FROM_UNIXTIME(1407616800), 'ff02::1:ffbc:5708', 1, 72) |
    | 236 | reports | localhost:37628 | reports | Query | 1 | Locked | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407618600) = stamp_inserted AND ip_dst='ff02::1:ff43:ef6c' |
    | 243 | reports | localhost | reports | Query | 0 | NULL | show FULL processlist |
    | 247 | reports | localhost:38735 | reports | Query | 4 | Updating | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+72, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407619800) = stamp_inserted AND ip_dst='ff02::1:ff53:20e' |
    | 256 | reports | localhost:39442 | reports | Query | 2 | Updating | UPDATE network_detail SET ip='??
    ?', hostname='wemoswitchc70.local.lan', username='', device_vendor='', device_type='' WHERE (ip_src='192.168.10.170' OR ip_dst='192.168.10.170') AND ip IS NULL |
    +-----+---------+-----------------+---------+---------+------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    11 rows in set (0.00 sec)


    So a bunch of queries running. The last two are active updates and sucking up the CPU. The others are inserts and updates blocked while they wait for tables to be unlocked by completion of the two update queries. Adding indexes may help on the updates but may also slow down inserts for large tables...
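
    One way to check whether these UPDATEs scan the whole table is to EXPLAIN a SELECT with the same WHERE clause. (EXPLAIN only accepts UPDATE statements from MySQL 5.6 onwards, and ClearOS 6 most likely ships an older server.) The sketch below reuses the values from the network_detail_external row in the "Updating" state above:

    ```sql
    -- Same filter as the UPDATE, expressed as a SELECT so it can be EXPLAINed
    EXPLAIN
    SELECT packets, bytes
    FROM network_detail_external
    WHERE stamp_inserted = FROM_UNIXTIME(1407619800)
      AND ip_dst = 'ff02::1:ff53:20e';
    ```

    If the output shows type ALL with key NULL, every row in the table is examined for each UPDATE, which would fit the sustained CPU usage.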

    I welcome feedback from devs...
  • Accepted Answer

    Sunday, August 10 2014, 07:31 AM - #Permalink
    I've worked out how to summarise by dates in sql now and for network_detail_external I get:
    mysql> SELECT count(*), DATE(stamp_inserted) DateOnly FROM `network_detail_external` GROUP BY DateOnly;
    +----------+------------+
    | count(*) | DateOnly |
    +----------+------------+
    | 54780 | 2014-07-26 |
    | 54253 | 2014-07-27 |
    | 53489 | 2014-07-28 |
    | 57178 | 2014-07-29 |
    | 54701 | 2014-07-30 |
    | 52011 | 2014-07-31 |
    | 51586 | 2014-08-01 |
    | 54122 | 2014-08-02 |
    | 110018 | 2014-08-03 |
    | 45995 | 2014-08-04 |
    | 52052 | 2014-08-05 |
    | 32959 | 2014-08-06 |
    | 11383 | 2014-08-07 |
    | 56079 | 2014-08-08 |
    | 51203 | 2014-08-09 |
    | 16981 | 2014-08-10 |
    +----------+------------+
    16 rows in set (1.92 sec)
    Unfortunately my history does not go back as far as the 1.5.27 update so I can't prove anything about the latest update. The dip on 07 Aug was when I had removed the network detail report.

    For network_detail I get:
    mysql> SELECT count(*), DATE(stamp_inserted) DateOnly FROM `network_detail` GROUP BY DateOnly;
    +----------+------------+
    | count(*) | DateOnly |
    +----------+------------+
    | 46 | 2014-05-18 |
    | 789 | 2014-05-19 |
    | 862 | 2014-05-20 |
    | 698 | 2014-05-21 |
    | 822 | 2014-05-22 |
    | 746 | 2014-05-23 |
    | 853 | 2014-05-24 |
    | 925 | 2014-05-25 |
    | 1112 | 2014-05-26 |
    | 1016 | 2014-05-27 |
    | 834 | 2014-05-28 |
    | 873 | 2014-05-29 |
    | 915 | 2014-05-30 |
    | 861 | 2014-05-31 |
    | 973 | 2014-06-01 |
    | 856 | 2014-06-02 |
    | 740 | 2014-06-03 |
    | 824 | 2014-06-04 |
    | 782 | 2014-06-05 |
    | 939 | 2014-06-06 |
    | 1027 | 2014-06-07 |
    | 1097 | 2014-06-08 |
    | 868 | 2014-06-09 |
    | 797 | 2014-06-10 |
    | 828 | 2014-06-11 |
    | 824 | 2014-06-12 |
    | 1051 | 2014-06-13 |
    | 1147 | 2014-06-14 |
    | 746 | 2014-06-15 |
    | 826 | 2014-06-16 |
    | 856 | 2014-06-17 |
    | 863 | 2014-06-18 |
    | 831 | 2014-06-19 |
    | 749 | 2014-06-20 |
    | 1016 | 2014-06-21 |
    | 902 | 2014-06-22 |
    | 957 | 2014-06-23 |
    | 668 | 2014-06-24 |
    | 587 | 2014-06-25 |
    | 537 | 2014-06-26 |
    | 662 | 2014-06-27 |
    | 613 | 2014-06-28 |
    | 524 | 2014-06-29 |
    | 612 | 2014-06-30 |
    | 501 | 2014-07-01 |
    | 576 | 2014-07-02 |
    | 446 | 2014-07-03 |
    | 547 | 2014-07-04 |
    | 673 | 2014-07-05 |
    | 796 | 2014-07-06 |
    | 616 | 2014-07-07 |
    | 545 | 2014-07-08 |
    | 666 | 2014-07-09 |
    | 643 | 2014-07-10 |
    | 834 | 2014-07-11 |
    | 948 | 2014-07-12 |
    | 882 | 2014-07-13 |
    | 688 | 2014-07-14 |
    | 652 | 2014-07-15 |
    | 749 | 2014-07-16 |
    | 678 | 2014-07-17 |
    | 647 | 2014-07-18 |
    | 757 | 2014-07-19 |
    | 621 | 2014-07-20 |
    | 739 | 2014-07-21 |
    | 813 | 2014-07-22 |
    | 562 | 2014-07-23 |
    | 531 | 2014-07-24 |
    | 625 | 2014-07-25 |
    | 577 | 2014-07-26 |
    | 410 | 2014-07-27 |
    | 441 | 2014-07-28 |
    | 519 | 2014-07-29 |
    | 515 | 2014-07-30 |
    | 530 | 2014-07-31 |
    | 481 | 2014-08-01 |
    | 668 | 2014-08-02 |
    | 509 | 2014-08-03 |
    | 551 | 2014-08-04 |
    | 639 | 2014-08-05 |
    | 383 | 2014-08-06 |
    | 211 | 2014-08-07 |
    | 768 | 2014-08-08 |
    | 408 | 2014-08-09 |
    | 100 | 2014-08-10 |
    +----------+------------+
    85 rows in set (0.05 sec)
    This looks reasonable but I am getting different results from Peter Finch. My tables seem to get purged and my network_detail_external is much larger than network_detail.
  • Accepted Answer

    Monday, August 11 2014, 04:43 PM - #Permalink
    I'll try and have a poke around with this tonight :)

    Do you also have app-network-map installed and configured? From the SQL queries above, it may be tripping over a large amount of IPv6 traffic... Nick, do you see similar queries if you run 'show FULL processlist;'?

    On a quiet system post-install I have none. To alleviate the 'runaway' nature of the script, try increasing the time interval in cron (/etc/cron.d/app-network-detail-report), currently scheduled at 5-minute intervals:
    */5 * * * * root /usr/sbin/networkdetail2db >/dev/null 2>&1
    12 3 * * * root /usr/sbin/networkdetailpurge >/dev/null 2>&1
  • Accepted Answer

    Monday, August 11 2014, 05:12 PM - #Permalink
    Hi Tim,

    I do have app-network-map installed. I've had a little browse through system-mysql and I did not notice any IPv6 traffic. I've tried exporting the table to a csv file to have a further look, but my version of Excel (2003) won't load it as it is way too big.

    Here is my process list:
    mysql> show FULL processlist;
    +-----+---------+-----------------+---------+---------+------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Id | User | Host | db | Command | Time | State | Info |
    +-----+---------+-----------------+---------+---------+------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | 169 | reports | localhost:56274 | reports | Query | 1 | Updating | UPDATE `network_detail_external` SET packets=packets+2, bytes=bytes+244, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407775200) = stamp_inserted AND ip_dst='49.205.148.242' |
    | 170 | reports | localhost:56276 | reports | Query | 1 | Locked | UPDATE `network_detail_external` SET packets=packets+1, bytes=bytes+131, stamp_updated=NOW() WHERE FROM_UNIXTIME(1407775200) = stamp_inserted AND ip_src='88.222.186.3' |
    | 179 | reports | localhost | NULL | Query | 0 | NULL | show FULL processlist |
    +-----+---------+-----------------+---------+---------+------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    3 rows in set (0.00 sec)
    Note it is shorter than when I looked yesterday, perhaps because I rebooted earlier today to move the kernel on, but my 15min load average is now up to 1.17 (out of 4).

    I'll try your approach to cron first but as I am on holiday soon I may have to remove it again.

    Does your post indicate you are also having the same issue?

    [edit]
    I've just loaded the table in Access (which I don't really know) and I am really surprised at the amount of traffic logged and the amount of single byte traffic (pings?) - 526775 out of 878400 packets.
    [/edit]

    [edit2]
    Also 235684 records with NULL hostname
    [/edit2]
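
    A similar breakdown can probably be done in MySQL itself rather than exporting to Excel or Access. A rough sketch, assuming addresses are stored as text (as the quoted string literals in the process list suggest); single_packet_rows is only one possible reading of "single byte traffic" and may not match the figure above:

    ```sql
    -- Overall breakdown: IPv6 addresses contain ':' when stored as text
    SELECT COUNT(*)                                    AS total_rows,
           SUM(ip_src LIKE '%:%' OR ip_dst LIKE '%:%') AS ipv6_rows,
           SUM(packets = 1)                            AS single_packet_rows
    FROM network_detail_external;

    -- Which destinations account for the most rows?
    SELECT ip_dst,
           COUNT(*)   AS row_count,
           SUM(bytes) AS total_bytes
    FROM network_detail_external
    GROUP BY ip_dst
    ORDER BY row_count DESC
    LIMIT 20;
    ```

    Both queries read the whole table, so expect them to take a few seconds on a table of this size.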

    [edit3]
    Changing the cron job did nothing (or very little), and, yes, I did restart crond after the edit.
    I notice /var/log/system is getting a lot of:
    Aug 11 19:20:01 server networkdetail2db: Unable to start script - currently running.
    Aug 11 19:40:01 server networkdetail2db: Unable to start script - currently running.
    Aug 11 19:50:01 server networkdetail2db: Unable to start script - currently running


    Looking at the system log it seems like one error message is missed after about 50 mins. This means that the script is taking that long to execute!
    [/edit3]
  • Accepted Answer

    Monday, August 11 2014, 09:53 PM - #Permalink
    Hi Nick, no I don't have the same symptoms here, but my table is only small so far.

    FYI you can browse the system-database contents using the phpMyAdmin web interface (https://clearosip:81/mysql). This is a new feature in ClearOS 6.6 beta 1, select the 'system database' from the drop down and login with your system-mysql root password.

    I also have lots of IP entries in network_detail_external, but not as many as you... Do you by any chance also host torrents? This might explain the large growth of very small packet connections.
  • Accepted Answer

    Monday, August 11 2014, 10:36 PM - #Permalink
    Some more poking around... I also see lots of NULL IP records

    I've been experimenting with adding a table index to the mysql query columns to improve the query performance. My hunch is that with large amounts of TCP connections the database size grows rapidly and so does the time taken to query and update the host information in it - hence my question about Torrents. The queries appear to usually use WHERE stamp_inserted, ip_dst OR ip_src... but as I don't have the symptoms I can't tell if these improve things

    If you are happy to test could you try running the following on the reports database?
    ALTER TABLE `network_detail_external` ADD INDEX(`ip_src`);
    ALTER TABLE `network_detail_external` ADD INDEX(`ip_dst`);
    ALTER TABLE `network_detail_external` ADD INDEX(`stamp_inserted`);
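
    Since the UPDATE statements seen in the process list filter on stamp_inserted together with one address column, a composite index per combination may serve them better than three single-column indexes. A hedged sketch as an alternative to the statements above; the index names are made up for illustration and this has not been tested on an affected system:

    ```sql
    -- Composite indexes matching "WHERE stamp_inserted = ... AND ip_dst/ip_src = ..."
    ALTER TABLE `network_detail_external`
      ADD INDEX idx_inserted_dst (`stamp_inserted`, `ip_dst`),
      ADD INDEX idx_inserted_src (`stamp_inserted`, `ip_src`);

    -- Confirm which indexes exist afterwards
    SHOW INDEX FROM `network_detail_external`;

    -- To back the change out again:
    -- ALTER TABLE `network_detail_external`
    --   DROP INDEX idx_inserted_dst,
    --   DROP INDEX idx_inserted_src;
    ```

    The ALTER is likely to block writes to the table while it runs, so stopping pmacctd first may be sensible. Each extra index also adds some cost to every INSERT.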
  • Accepted Answer

    Monday, August 11 2014, 11:02 PM - #Permalink
    Something else to try...

    Try modifying the pmacctd daemon configuration. (I incorrectly thought it was the cron job earlier, but that just calls a script to update the mappings in network_detail, which is by comparison much smaller than network_detail_external.)

    With reference to:-
    http://wiki.pmacct.net/OfficialExamples

    You should have two configlets generated by ClearOS in /etc/pmacctd/pmacctd_ethX.conf, where ethX is your WAN interface and will log to network_detail_external.

    In here are two parameters, sql_refresh_time: 900 and sql_history: 10m. Try reducing the first to, say, 600 (matching the 10-minute sql_history) so that only INSERTs and not UPDATEs are carried out, once per SQL timeslot...
    Alternatively, drop it right down to 60, which will reduce the amount of traffic kept in memory (at the expense of more frequent but smaller database inserts), to try and reduce the size of data being dumped into system-mysql at any one time.

    You may need to run 'service pmacctd restart' to implement
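
    To see whether a change to sql_refresh_time alters how much is written per time slot, the rows can be counted per stamp_inserted value, which appears to mark the start of each 10-minute bin (the values in the process list are all multiples of 600 seconds). A sketch, assuming a two-hour window is enough to cover a few refresh cycles:

    ```sql
    -- Rows written per time bin over the last two hours
    SELECT stamp_inserted,
           COUNT(*)           AS rows_in_bin,
           MAX(stamp_updated) AS last_write
    FROM network_detail_external
    WHERE stamp_inserted >= NOW() - INTERVAL 2 HOUR
    GROUP BY stamp_inserted
    ORDER BY stamp_inserted;
    ```

    A last_write well past the end of its bin would suggest rows are still being UPDATEd after the bin has closed.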
  • Accepted Answer

    Monday, August 11 2014, 11:43 PM - #Permalink
    P.S pmacctd daemon gobbles up CPU... on a 50Mbps download from my LAN it accounts for nearly 50% of my CPU time, cumulatively more than snort which is doing packet inspection as well!
    18403 snort     30  10  469m 174m 4152 R 43.1  3.6   6:19.41 snort
    17679 squid 30 10 158m 86m 3664 R 29.2 1.8 2:09.92 squid
    13389 root 20 0 61156 10m 9968 S 19.2 0.2 0:46.42 pmacctd
    13398 root 20 0 64692 14m 4140 S 10.6 0.3 0:25.17 pmacctd
    13394 root 20 0 64692 14m 4140 S 10.3 0.3 0:24.75 pmacctd
    30272 dansguar 30 10 128m 16m 1340 R 10.0 0.3 0:14.85 dansguardian-av
    13397 root 20 0 61156 10m 9.8m S 9.0 0.2 0:10.88 pmacctd


    EDIT: adding "plugin_buffer_size: 1024" to both pmacctd_ethX.conf files, as per the FAQ linked below, seems to have significantly reduced the usage! Some room for tuning?
    http://wiki.pmacct.net/OfficialFAQs