TCP tuning
- For servers that are serving up huge numbers of concurrent sessions, there are some TCP options that should probably be enabled. With a large number of clients doing their best to kill the server, it's probably not uncommon for the server to have 20,000 or more open sockets.
In order to optimize TCP performance for this situation, I would suggest tuning the following parameters.
    echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range
- Allows more local ports to be available. Generally not an issue, but in a benchmarking scenario you often need more ports available. A common example is clients running ab, http_load, or similar software. In the case of firewalls, or other servers doing NAT or masquerading, you may not be able to use the full port range this way, because of the need for high ports for use in NAT.
Increasing the amount of memory associated with socket buffers can often improve performance. Things like NFS in particular, or Apache setups with large buffers configured, can benefit from this.
    echo 262143 > /proc/sys/net/core/rmem_max
    echo 262143 > /proc/sys/net/core/rmem_default
- This will increase the amount of memory available for socket input queues. The “wmem_*” values do the same for output queues.
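A matching change on the output side would touch the "wmem_*" entries; the values below simply mirror the rmem example above and are not a specific recommendation:
    echo 262143 > /proc/sys/net/core/wmem_max
    echo 262143 > /proc/sys/net/core/wmem_default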
Note: With 2.4.x kernels, these values are supposed to “autotune” fairly well, and some people suggest instead just changing the values in:
    /proc/sys/net/ipv4/tcp_rmem
    /proc/sys/net/ipv4/tcp_wmem
- There are three values here, “min default max”.
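For example, setting all three at once might look like the following; the specific numbers are illustrative only, not a recommendation:
    echo "4096 87380 262143" > /proc/sys/net/ipv4/tcp_rmem
    echo "4096 65536 262143" > /proc/sys/net/ipv4/tcp_wmem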
The following settings reduce the amount of work the TCP stack has to do, which is often helpful in this situation.
    echo 0 > /proc/sys/net/ipv4/tcp_sack
    echo 0 > /proc/sys/net/ipv4/tcp_timestamps
File Limits and the like
Open TCP sockets, and things like Apache, are prone to consuming a large number of file descriptors. The default number of available file descriptors is 4096, but this may need to be raised for this scenario.
The theoretical limit is roughly a million file descriptors, though I’ve never been able to get close to that many open.
I’d suggest doubling the default, and trying the test. If you still run out of file descriptors, double it again.
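To check the current per-process limit before you start doubling it:
    # show the current soft limit on open files for this shell
    ulimit -n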
For example:
    echo 128000 > /proc/sys/fs/inode-max
    echo 64000 > /proc/sys/fs/file-max
- and as root:
    ulimit -n 64000
Note: On 2.4 kernels, the “inode-max” entry is no longer needed.
You probably want to add these to /etc/rc.d/rc.local so they get set on each boot.
There are more than a few ways to make these changes “sticky”. In Red Hat Linux, you can use /etc/sysctl.conf and /etc/security/limits.conf to set and save these values.
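A rough sketch of what that might look like (the values just echo the examples above; adjust to taste):
    # /etc/sysctl.conf -- loaded at boot, or applied with "sysctl -p"
    fs.file-max = 64000
    net.ipv4.ip_local_port_range = 1024 65000

    # /etc/security/limits.conf -- per-user limits applied via PAM
    *    soft    nofile    64000
    *    hard    nofile    64000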
If you get errors of the variety “Unable to open file descriptor” you definitely need to up these values.
You can examine the contents of /proc/sys/fs/file-nr to determine the number of allocated file handles, the number of file handles currently being used, and the max number of file handles.
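For example:
    # three fields: allocated file handles, handles currently in use, and the maximum
    cat /proc/sys/fs/file-nr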
Process Limits
- For heavily used web servers, or machines that spawn off lots and lots of processes, you probably want to up the limit of processes for the kernel.
Also, the 2.2 kernel itself has a max process limit. The default value for this is 2560, but a kernel recompile can take this as high as 4000. This is a limitation of the 2.2 kernel, and it has been removed from 2.3/2.4.
If you are running into the limit on how many tasks the kernel can handle by default, you may have to rebuild the kernel after editing:
    /usr/src/linux/include/linux/tasks.h
- and change:
    #define NR_TASKS 2560 /* On x86 Max 4092, or 4090 w/APM configured.*/
- to
    #define NR_TASKS 4000 /* On x86 Max 4092, or 4090 w/APM configured.*/
- and:
    #define MAX_TASKS_PER_USER (NR_TASKS/2)
- to
    #define MAX_TASKS_PER_USER (NR_TASKS)
Then recompile the kernel.
Also run:
    ulimit -u 4000
Note: This process limit is gone in the 2.4 kernel series.
Threads
Limitations on threads are tightly tied to both file descriptor limits, and process limits.
Under Linux, threads are counted as processes, so any limits on the number of processes also apply to threads. In a heavily threaded app like a threaded TCP engine or a Java server, you can quickly run out of threads.
For starters, you want to get an idea of how many threads you can open. The thread-limit util mentioned in the Tuning Utilities section is probably as good as any.
The first step to increasing the possible number of threads is to make sure you have boosted any process limits as mentioned before.
There are a few things that can limit the number of threads, including process limits, memory limits, mutex/semaphore/shm/ipc limits, and compiled-in thread limits. For most cases, the process limit is the first one you run into, then the compiled-in thread limits, then the memory limits.
To increase the limits, you have to recompile glibc. Oh fun! And the patch is essentially two lines! Woohoo!
    --- ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h.akl	Mon Sep  4 19:37:42 2000
    +++ ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h	Mon Sep  4 19:37:56 2000
    @@ -64,7 +64,7 @@
     /* The number of threads per process.  */
     #define _POSIX_THREAD_THREADS_MAX	64
     /* This is the value this implementation supports.  */
    -#define PTHREAD_THREADS_MAX	1024
    +#define PTHREAD_THREADS_MAX	8192

     /* Maximum amount by which a process can descrease its asynchronous I/O
        priority level.  */
    --- ./linuxthreads/internals.h.akl	Mon Sep  4 19:36:58 2000
    +++ ./linuxthreads/internals.h	Mon Sep  4 19:37:23 2000
    @@ -330,7 +330,7 @@
        THREAD_SELF implementation is used, this must be a power of two and
        a multiple of PAGE_SIZE.  */
     #ifndef STACK_SIZE
    -#define STACK_SIZE  (2 * 1024 * 1024)
    +#define STACK_SIZE  (64 * PAGE_SIZE)
     #endif

     /* The initial size of the thread stack.  Must be a multiple of PAGE_SIZE.  */
Now just patch glibc, rebuild, and install it. ;-> If you have a package-based system, I seriously suggest making a new package and using it.
Some info on how to do this is at Jlinux.org. They describe how to increase the number of threads so Java apps can use them.
NFS
A good resource on NFS tuning on Linux is the Linux NFS HOW-TO. Most of this info is gleaned from there.
But the basic tuning steps include:
Try using NFSv3 if you are currently using NFSv2. There can be very significant performance increases with this change.
Increasing the read/write block size. This is done with the rsize and wsize mount options. They need to be part of the mount options used by the NFS clients. Values of 4096 and 8192 reportedly increase performance a lot. But see the notes in the HOWTO about experimenting and measuring the performance implications. The limits on these are 8192 for NFSv2 and 32768 for NFSv3.
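For example, a client-side mount along these lines (the server name, export path, and mount point are placeholders):
    # mount an export with larger read/write block sizes over NFSv3
    mount -t nfs -o rsize=8192,wsize=8192,nfsvers=3 server:/export /mnt/export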
Another approach is to increase the number of nfsd threads running. This is normally controlled by the nfsd init script. On Red Hat Linux machines, the value “RPCNFSDCOUNT” in the nfs init script controls this value. The best way to determine if you need this is to experiment. The HOWTO mentions a way to determine thread usage, but that doesn’t seem supported in all kernels.
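On a Red Hat style box the change might be as simple as the following; the exact init script path and the default count vary by release, and 16 here is just an illustrative value:
    # /etc/rc.d/init.d/nfs -- raise the number of nfsd threads started at boot
    RPCNFSDCOUNT=16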
Another good tool for getting some handle on NFS server performance is nfsstat. This util reads the info in /proc/net/rpc/nfs[d] and displays it in a somewhat readable format. There is some info intended for tuning Solaris, but it is useful for its description of the nfsstat output format.
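For example, to dump the server-side and client-side counters:
    # server-side NFS statistics (from /proc/net/rpc/nfsd)
    nfsstat -s
    # client-side statistics
    nfsstat -c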
See also the TCP tuning info above.