nginx as a front-end proxy: dealing with too many TIME_WAIT connections


The other day, with nothing much going on, I was browsing the server's monitoring data and noticed that the ESTABLISHED count was low while the NON_ESTABLISHED count was quite high. Our service only handles a few requests per second, so a low ESTABLISHED count is expected; the high NON_ESTABLISHED count, however, looked abnormal.

So I logged in to the server and took a look with netstat:

 #netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
CLOSE_WAIT 5
TIME_WAIT 1081
ESTABLISHED 7
# netstat -n | awk '/^tcp/'
tcp6       0      0 127.0.0.1:8887          127.0.0.1:48673         TIME_WAIT  
tcp6       0      0 127.0.0.1:8887          127.0.0.1:48737         TIME_WAIT  
tcp6       0      0 127.0.0.1:8887          127.0.0.1:48704         TIME_WAIT  
tcp6       0      0 127.0.0.1:8887          127.0.0.1:48731         TIME_WAIT  
tcp6       0      0 127.0.0.1:8887          127.0.0.1:48743         TIME_WAIT
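For instance, grouping the TIME_WAIT sockets by local address shows exactly where they pile up (standard netstat/awk/sort, nothing exotic):

# count TIME_WAIT sockets per local address (column 4 of netstat -n output)
netstat -n | awk '/TIME_WAIT/ {print $4}' | sort | uniq -c | sort -rn | head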

Most of these involve port 8887, which is an HTTP service running inside Docker; nginx sits in front and forwards requests to it based on the domain name. That also explains where the TIME_WAIT sockets come from: with the default proxy settings nginx talks HTTP/1.0 to the upstream, so every proxied request opens its own short-lived TCP connection to 127.0.0.1:8887, and each torn-down connection then lingers in TIME_WAIT for roughly 60 seconds. The configuration was roughly as follows:

upstream svr {
    server 127.0.0.1:8887;
}
server {
    listen 80;
    server_name my.com;

    location / {
        proxy_pass http://svr;
    }
}

A quick search turned up this problem; someone had already written up a solution:
https://www.cnblogs.com/QLeelulu/p/3601499.html

The upstream module in nginx 1.1 and later supports keepalive, so we can enable keepalive on the nginx proxy side to cut down the number of TCP connections:

upstream http_backend {
    server 127.0.0.1:8080;
 
    keepalive 16;
}
 
server {
    ...
 
    location /http/ {
        proxy_pass http://http_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        ...
    }
}
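After editing the configuration, the usual validate-and-reload step applies (paths as on the stock Ubuntu package used here; shown only for completeness):

# check the configuration for syntax errors, then reload nginx gracefully
/usr/sbin/nginx -t
service nginx reload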

nginx keepalive reference documentation: http://nginx.org/cn/docs/http/ngx_http_upstream_module.html#keepalive

But after enabling keepalive, the TIME_WAIT count did not drop. The nginx version here is 1.4.6, which should already support the keepalive feature:

/usr/sbin/nginx -v
nginx version: nginx/1.4.6 (Ubuntu)
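To confirm that connections to the backend really were not being reused, the states on port 8887 can be broken down directly; with working keepalive you would expect a few long-lived ESTABLISHED connections rather than a pile of TIME_WAIT:

# group connections involving the backend port by TCP state
netstat -n | grep ':8887' | awk '{print $6}' | sort | uniq -c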

I then checked the server's access log and saw a constant stream of requests from addresses starting with 100.121.1**. These requests reach the service directly, without using one of the configured domain names; my nginx configuration defines several virtual hosts, and when none of them is explicitly marked as the default, nginx hands such requests to the first server block defined for that port:

100.121.139.53 - - [15/May/2019:11:16:41 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.109.203 - - [15/May/2019:11:16:41 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.139.108 - - [15/May/2019:11:16:42 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.139.112 - - [15/May/2019:11:16:42 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.109.221 - - [15/May/2019:11:16:42 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.109.224 - - [15/May/2019:11:16:42 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.139.86 - - [15/May/2019:11:16:42 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.119.4 - - [15/May/2019:11:16:42 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.109.254 - - [15/May/2019:11:16:43 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.110.1 - - [15/May/2019:11:16:43 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.109.179 - - [15/May/2019:11:16:43 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.139.70 - - [15/May/2019:11:16:43 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.139.39 - - [15/May/2019:11:16:43 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.109.167 - - [15/May/2019:11:16:43 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.110.29 - - [15/May/2019:11:16:43 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
100.121.119.23 - - [15/May/2019:11:16:43 +0800] "HEAD / HTTP/1.0" 200 0 "-" "-"
 
The server block that ended up handling these requests was the one for my2.com (the first one defined), and it was still using the bare proxy_pass setup without any keepalive settings:

upstream svr {
    server 127.0.0.1:8887;
}
server {
    listen 80;
    server_name my2.com;

    location / {
        proxy_pass http://svr;
    }
}

So I applied the same keepalive settings to the my2.com configuration as well, and that fixed it.
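For reference, the adjusted block looked roughly like this (a sketch, assuming a single shared upstream definition; proxy_http_version 1.1 together with the cleared Connection header is what actually enables connection reuse, and keepalive 16 is simply carried over from the example above):

upstream svr {
    server 127.0.0.1:8887;

    # keep up to 16 idle connections to the backend open in each worker
    keepalive 16;
}
server {
    listen 80;
    server_name my2.com;

    location / {
        proxy_pass http://svr;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

With that in place the server quieted down: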

#netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'  
CLOSE_WAIT 5
TIME_WAIT 2
ESTABLISHED 4

Other notes:
1. Why are there so many requests from 100.121.1** addresses? This server sits behind an Alibaba Cloud load balancer (SLB). To verify that the backend is healthy, every agent server in the load balancer's cluster probes the backend once every 2-3 seconds; with that many agents this works out to several probes per second, and since each closed connection then sits in TIME_WAIT for about 60 seconds, the health checks alone can sustain a TIME_WAIT count in the hundreds or more, consistent with the ~1081 sockets seen earlier.
2. Many articles recommend tweaking the tcp_tw_reuse and tcp_tw_recycle kernel parameters to deal with too many TIME_WAIT sockets, but I strongly advise against doing that; for the specific reasons, see the blog post linked earlier and the articles below (a quick way to inspect, rather than change, those settings is sketched after this list):
记一次TIME_WAIT网络故障
再叙TIME_WAIT
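For completeness, here is a quick way to inspect (not change) those kernel parameters; this assumes an older Linux kernel where tcp_tw_recycle still exists, since that knob was removed in Linux 4.12:

# read the current values; enabling tcp_tw_recycle is known to break clients behind NAT
sysctl net.ipv4.tcp_tw_reuse
sysctl net.ipv4.tcp_tw_recycle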
