Installing and Using the Varnish Reverse Proxy - wtdig/study GitHub Wiki

I. Installing Varnish


1. Download Varnish 6.0.1

cd /usr/software/varnish/

wget http://varnish-cache.org/_downloads/varnish-6.0.1.tgz

2. Extract the archive

chmod 777 varnish-6.0.1.tgz

tar -zxvf varnish-6.0.1.tgz

3. Build and install (installation prefix: /usr/local/varnish)

  Install the required libraries:

  yum -y install libedit-devel 

  yum -y install pcre-devel 

  yum -y install ncurses-devel

  cd varnish-6.0.1

  ./configure --prefix=/usr/local/varnish

If configure fails with "configure: error: rst2man is needed to build Varnish, please install python-docutils.", install the package:

  yum install python-docutils

  Then run ./configure --prefix=/usr/local/varnish again.

If configure then fails with:
configure: WARNING: dot not found - build will fail if svg files are out of date.
checking for a Python interpreter with version >= 2.7... none
configure: error: Python >= 2.7 is required.

The newest Python available through yum is only 2.6.6, so Python 2.7.9 has to be built from source and installed first.

1) Install the "Development tools" package group

yum groupinstall "Development tools"

2) Install the packages needed to build Python

yum install zlib-devel
yum install bzip2-devel
yum install openssl-devel
yum install ncurses-devel
yum install sqlite-devel

3) Download and extract the Python 2.7.9 source code

cd /usr/software/python/

wget --no-check-certificate https://www.python.org/ftp/python/2.7.9/Python-2.7.9.tar.xz

chmod 777 Python-2.7.9.tar.xz

tar xf Python-2.7.9.tar.xz

cd Python-2.7.9

4) Build and install Python 2.7.9 (altinstall installs the binary as python2.7 and does not create a python link)

./configure --prefix=/usr/local

make && make altinstall

5) Point the python command at Python 2.7.9

ln -s /usr/local/bin/python2.7 /usr/local/bin/python

6) Check the Python version

sh

sh-4.1# python -V

Python 2.7.9

Once Python is installed, rerun ./configure --prefix=/usr/local/varnish, then build and install Varnish:

make && make install


4. Prepare a test configuration

cp etc/example.vcl /usr/local/varnish/default.vcl

Then edit the backend definition in the copied file so that it points to the backend server:

backend default {
    .host = "45.78.9.159";
    .port = "8080";
}
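The copied example.vcl already begins with the `vcl 4.0;` marker, so for this test only the backend has to change. The sketch below shows roughly what a minimal stand-alone default.vcl could look like. It is not required by the steps above. The address and port are the ones used in this walkthrough and should be replaced with your own backend.

```vcl
# Minimal default.vcl sketch for Varnish 6.0 (VCL 4.0 syntax).
vcl 4.0;

# The only backend; every cache miss is fetched from here (the Tomcat test server).
backend default {
    .host = "45.78.9.159";
    .port = "8080";
}
```

You can compile-check the file before starting the daemon with `varnishd -C -f /usr/local/varnish/default.vcl`. This prints the generated C code, or the VCL compiler error if there is one.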

5. Start the Varnish service

varnishd -f /usr/local/varnish/default.vcl -s malloc,50M -T 45.78.9.159:2000 -a 0.0.0.0:1111

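-f: the VCL configuration file to load

-s: the storage backend and its size; malloc,50M keeps up to 50 MB of cached objects in memory

-T: the address and port of the management (CLI) interface

-a: the address and port on which Varnish accepts client HTTP requests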

6. Start Tomcat on port 8080 for testing

7. Test: open 45.78.9.159:1111. If the Tomcat page is returned, Varnish is working as a reverse proxy in front of Tomcat.


II. VCL syntax

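Introduction to VCL
VCL (Varnish Configuration Language) is Varnish's configuration language. Its syntax is simple but powerful and resembles C and Perl. It is mainly used to define how requests are handled and which caching policy applies to content.
When a VCL file is loaded, it is translated into C and compiled into binary code.
A VCL file is divided into subroutines that run at different points in time. For example, one subroutine runs when a request arrives, and another runs when the response from the backend server has been received.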
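- Basic syntax
1: Blocks are delimited by curly braces, and statements end with a semicolon. Comments use //, # or /* */.
2: Assignment (=), comparison (==) and the boolean operators ! (negation), && and || work as in C.
3: Regular expressions are supported. Both regex matching and ACL matching use the ~ operator.
4: Unlike C, the backslash (\) has no special meaning in VCL strings. It can therefore be used in regular expressions (for example when matching URLs) without being doubled.
5: VCL has no user-defined variables. You can only assign values to the variables of built-in objects such as backend, request and object. Values must use VCL-compatible units, for example 5s for a duration.
6: VCL has if statements but no loops.
7: Use set to add or change a request header, and unset or remove to delete a header.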
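The rules above can be seen together in the short VCL 4.0 fragment below. It is an illustrative sketch only. It assumes a backend on 127.0.0.1:8080, and the X-Example header name is hypothetical.

```vcl
vcl 4.0;

# Assumed local backend; replace with your own server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Rule 3: ~ performs a regex match. Rule 4: "\." reaches the regex
    # engine unchanged, because backslash is not special in VCL strings.
    if (req.url ~ "\.(png|jpg|gif)$") {
        // Rule 7: unset deletes a header (static images need no cookies).
        unset req.http.Cookie;
    }
    /* Rule 7: set adds or overwrites a header (hypothetical header name). */
    set req.http.X-Example = "1";
}

sub vcl_backend_response {
    # Rule 5: durations must carry a unit such as s, m, h or d.
    set beresp.ttl = 1h;
}
```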

1. Declaring a backend

A backend declaration creates and initializes a backend object:
backend sishuok {
    .host = "www.sishuok.com";
    .port = "8080";
}
A request can then select a backend:
if (req.http.host ~ "^(www\.)?sishuok\.com$") {
    set req.backend = sishuok;
}
A backend accepts many more parameters, for example:
backend sishuok {
    .host = "www.sishuok.com";
    .port = "8080";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
    .max_connections = 1000;
}
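To avoid overloading a backend server, .max_connections limits the number of concurrent connections to it.
Timeouts set in the backend declaration override the global defaults:
- .connect_timeout is how long to wait for a connection to the backend.
- .first_byte_timeout is how long to wait for the first byte from the backend.
- .between_bytes_timeout is the maximum time to wait between two received bytes.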
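The selection example above uses Varnish 3 syntax. In Varnish 4.0 and later, which includes the 6.0.1 build installed earlier, the variable is req.backend_hint instead of req.backend. The sketch below shows the same backend in that syntax. It assumes an extra default backend on 127.0.0.1:8080 and that www.sishuok.com resolves when the VCL is compiled.

```vcl
vcl 4.0;

# The first declared backend is used when no hint is set (assumed address).
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

backend sishuok {
    .host = "www.sishuok.com";
    .port = "8080";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
    .max_connections = 1000;
}

sub vcl_recv {
    # Varnish 4+: choose the backend through req.backend_hint.
    if (req.http.host ~ "^(www\.)?sishuok\.com$") {
        set req.backend_hint = sishuok;
    }
}
```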

2. Directors

A director is a logical group, or cluster, of backends. The main types are random, round-robin and DNS directors, and each uses a different algorithm to pick a backend. A random director looks like this:
director b2 random {
    .retries = 5;
    {
        .backend = b1;  // reference an existing backend
        .weight = 7;
    }
    {
        .backend = {    // or define a backend inline
            .host = "fs2";
        }
        .weight = 3;
    }
}
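.retries sets how many times the director tries to find a healthy backend. By default it equals the number of backends in the director.
.weight is the relative weight of a backend.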

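There are three kinds of random director: random, client and hash. They share the same weighted random selection algorithm and differ only in the seed, which is a random number, the client identity, or the cache hash (typically the URL), respectively.
- client director
You can set the VCL variable client.identity to distinguish clients. Its value can be taken from a session cookie or a similar value.
- hash director
Uses the hash of the URL by default, the same value that is available as req.hash.
- round-robin director
It takes no options and simply uses the backends in turn: the first request goes to the first backend, the second request to the second, and so on.
If a backend is unhealthy or a connection to it fails, the director moves on to the next one. It only fails after every backend has been tried.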
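The `director ... random { }` syntax above was removed in Varnish 4.0. Directors are now created in vcl_init with the directors VMOD, as the sample VCL file at the end of this page also does. The following is a sketch of the weighted random and hash directors in that style. The backend addresses (192.0.2.x documentation range) and the /static/ prefix are placeholders. The Varnish 3 client director can be emulated by calling the hash director with client.identity instead of req.url.

```vcl
vcl 4.0;

import directors;

# Placeholder backends; replace the addresses with real servers.
backend b1 { .host = "192.0.2.10"; .port = "8080"; }
backend b2 { .host = "192.0.2.11"; .port = "8080"; }

sub vcl_init {
    # Weighted random director: b1 gets about 70% of requests, b2 about 30%.
    new rnd = directors.random();
    rnd.add_backend(b1, 7.0);
    rnd.add_backend(b2, 3.0);

    # Hash director: the same key maps to the same backend
    # as long as the set of healthy backends does not change.
    new hsh = directors.hash();
    hsh.add_backend(b1, 1.0);
    hsh.add_backend(b2, 1.0);
}

sub vcl_recv {
    if (req.url ~ "^/static/") {
        # Key on the URL, like the Varnish 3 hash director.
        set req.backend_hint = hsh.backend(req.url);
    } else {
        set req.backend_hint = rnd.backend();
    }
}
```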


The DNS director can pick backends in two ways: like the random or round-robin directors, or from a .list (the .list form does not support IPv6):
director directorname dns {
    .list = {
        .host_header = "www.example.com";
        .port = "80";
        .connect_timeout = 0.4s;
        "192.168.15.0"/24;
        "192.168.16.128"/25;
    }
    .ttl = 5m;
    .suffix = "internal.example.net";
}
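This defines 384 backends (256 from the /24 plus 128 from the /25), all on port 80 with a 0.4s connection timeout. Inside .list, the options must come before the IP ranges. .ttl sets how long DNS lookups are cached.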

fallback director: picks the first healthy backend, for example:
director b3 fallback {
    { .backend = www1; }
    { .backend = www2; } // used only if www1 is unhealthy
    { .backend = www3; } // used only if www1 and www2 are both unhealthy
}
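- probe (backend health check): probes the backends to determine whether they are healthy. The result can be checked with req.backend.healthy:
backend sishuok {
    .host = "www.sishuok.com";
    .port = "8080";
    .probe = {
        .url = "/test.jpg";
        .timeout = 0.3s;
        .window = 8;     // number of most recent probes examined
        .threshold = 3;  // how many of them must succeed for the backend to be healthy
        .initial = 3;    // how many probes count as good when Varnish starts
    }
}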
The probe can also be defined separately and referenced from the backend, like this:
backend sishuok{……
.probe=p1;
}
probe p1{……}

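Available parameters:
.url: the path requested from the backend, "/" by default
.request: a complete request to send, given as a list of strings (takes precedence over .url)
.window: how many of the most recent probes are examined, 8 by default
.threshold: how many of the probes in .window must succeed for the backend to be considered healthy, 3 by default
.initial: how many probes are counted as good when Varnish starts, the same as .threshold by default
.expected_response: the expected response code, 200 by default
.interval: how often the backend is probed, 5 seconds by default
.timeout: how long a probe may take before it is considered failed, 2 seconds by default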
- You can also specify the raw HTTP request, for example:
backend sishuok {
    .host = "www.sishuok.com";
    .port = "8080";
    .probe = {
        .request = "GET / HTTP/1.1"
                   "Host: www.foo.bar"
                   "Connection: close";
    }
}
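In Varnish 4.0 and later, probes are written the same way. However, the fallback director comes from the directors VMOD, and req.backend.healthy is replaced by std.healthy(). The sketch below combines a named probe, a fallback director and a health check. The backend addresses are placeholders, and returning a 503 when nothing is healthy is an illustrative choice, not a requirement.

```vcl
vcl 4.0;

import directors;
import std;

# Shared health check, defined once and referenced by name.
probe p1 {
    .url = "/test.jpg";
    .timeout = 2s;
    .interval = 5s;
    .window = 8;      # examine the last 8 probes ...
    .threshold = 3;   # ... healthy if at least 3 of them succeeded
    .initial = 3;     # probes counted as good at startup
}

# Placeholder backends; replace the addresses with real servers.
backend www1 { .host = "192.0.2.21"; .port = "80"; .probe = p1; }
backend www2 { .host = "192.0.2.22"; .port = "80"; .probe = p1; }

sub vcl_init {
    # Use www1 while it is healthy, otherwise fall back to www2.
    new fb = directors.fallback();
    fb.add_backend(www1);
    fb.add_backend(www2);
}

sub vcl_recv {
    set req.backend_hint = fb.backend();
    # std.healthy() replaces the Varnish 3 variable req.backend.healthy;
    # here it is false only when neither backend is healthy.
    if (!std.healthy(req.backend_hint)) {
        return (synth(503, "No healthy backend"));
    }
}
```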

3. ACLs

An ACL (access control list) looks like this:
acl local {
"localhost";
"192.0.2.0"/24;
! "192.0.2.23";
}
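If an ACL entry names a host that Varnish cannot resolve, the entry matches every address it is compared with.
If such an entry is preceded by the negation mark (!), it therefore rejects every address.
A match is written like this:
if (client.ip ~ local) {
    return (pipe);
}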
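A common use of an ACL is to restrict part of a site to trusted addresses. The sketch below uses the same `local` ACL in VCL 4.0 syntax to refuse non-matching clients. The /admin path, the 403 response and the 127.0.0.1:8080 backend are hypothetical choices for illustration.

```vcl
vcl 4.0;

# Assumed backend; replace with your own server.
backend default { .host = "127.0.0.1"; .port = "8080"; }

acl local {
    "localhost";
    "192.0.2.0"/24;
    ! "192.0.2.23";
}

sub vcl_recv {
    # Only clients matched by the ACL may reach /admin (hypothetical path).
    if (req.url ~ "^/admin") {
        if (!client.ip ~ local) {
            return (synth(403, "Forbidden"));
        }
    }
}
```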

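4. Grace mode / saint mode

Grace mode
When several clients request the same page, Varnish sends only one request to the backend server and puts the other requests on hold until the result comes back. The result is then delivered to all the waiting clients.
If your site serves thousands of hits per second, this queue can become huge, and nobody likes waiting for the server. To solve this, Varnish can be told to keep cached objects beyond their TTL (expired objects are not deleted right away) and to serve that stale content to the waiting requests.
To serve stale content, we first need content to serve. The following VCL makes Varnish keep all objects 30 minutes beyond their TTL:
sub vcl_fetch { set beresp.grace = 30m; }
This alone does not make Varnish serve stale objects; serving them must also be enabled on the request. The following accepts objects that are up to 15 seconds stale:
sub vcl_recv { set req.grace = 15s; }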
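You may wonder why objects are kept for 30 minutes if they cannot be served. With health checks enabled, you can check whether the backend is sick. If it is, you can serve stale content for longer:
if (! req.backend.healthy) {
    set req.grace = 5m;
} else {
    set req.grace = 15s;
}
In short, grace mode solves two problems:
1: It avoids requests piling up by serving stale content.
2: It serves stale content when the backend is down.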
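The grace examples above use Varnish 3 names. In Varnish 4.0 and later, vcl_fetch becomes vcl_backend_response and req.backend.healthy becomes std.healthy(). The sketch below assumes Varnish 6.0, where req.grace can again cap grace per request; please check this against the documentation for your exact version. The backend address and probe settings are also assumptions.

```vcl
vcl 4.0;

import std;

# Assumed backend with a health check, so std.healthy() has real data.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 2s;
        .window = 8;
        .threshold = 3;
    }
}

sub vcl_backend_response {
    # Keep objects for 30 minutes past their TTL (replaces vcl_fetch).
    set beresp.grace = 30m;
}

sub vcl_recv {
    # Healthy backend: accept content that is at most 15 seconds stale.
    # Sick backend: req.grace is left unset, so the full 30-minute grace applies.
    if (std.healthy(req.backend_hint)) {
        set req.grace = 15s;
    }
}
```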

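Saint mode
Saint mode lets you discard a particular page from one backend server and either try to fetch it from another server or serve stale content from the cache. Here is how to enable it in VCL:
sub vcl_fetch {
    if (beresp.status == 500) {
        set beresp.saintmode = 10s;
        return (restart);
    }
    set beresp.grace = 5m;
}
Setting beresp.saintmode to 10 seconds stops Varnish from requesting this page from that server for 10 seconds, so it works more or less like a blacklist. When the restart runs, Varnish asks the other backends for the content if any are available. If none is, it serves stale content from the cache.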
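Varnish 4.0 removed beresp.saintmode along with vcl_fetch. Saint mode itself now lives in a separate VMOD (vmod_saintmode from the varnish-modules collection), which is not shown here. As a simpler substitute, the sketch below retries a failed fetch with return (retry). Through a round-robin director, the retried fetch will usually be sent to another backend. This is not saint mode: nothing is blacklisted. The backend addresses are placeholders.

```vcl
vcl 4.0;

import directors;

# Placeholder backends; replace the addresses with real servers.
backend app1 { .host = "192.0.2.31"; .port = "8080"; }
backend app2 { .host = "192.0.2.32"; .port = "8080"; }

sub vcl_init {
    new rr = directors.round_robin();
    rr.add_backend(app1);
    rr.add_backend(app2);
}

sub vcl_recv {
    set req.backend_hint = rr.backend();
}

sub vcl_backend_response {
    # Do not cache a 500: retry the fetch at most twice; each retry
    # resolves the round-robin director again.
    if (beresp.status == 500 && bereq.retries < 2) {
        return (retry);
    }
    set beresp.grace = 5m;
}
```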

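5. Common functions

The following built-in functions are available in VCL:
hash_data(str):
Adds a string to the hash input. By default, vcl_hash calls hash_data() with the request's URL and Host header.
regsub(str, regex, sub):
Replaces the first match of regex in str with sub.
regsuball(str, regex, sub):
Replaces every match of regex in str with sub.
ban_url(regex):
Bans all objects in the cache whose URL matches regex. ban_url(regex) was expected to be removed in 4.0; use ban(expression) instead.
ban(expression):
Bans all objects in the cache that match the expression. This is a way to invalidate specific content in the cache.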
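The sketch below uses each of these functions once in VCL 4.0 syntax. The BAN method handling follows the usual ban pattern. The `purgers` ACL, the backend address and the URL clean-up rules are hypothetical choices. Note that the regsuball() rule also rewrites slashes inside the query string. The vcl_hash body reproduces what the built-in vcl_hash does.

```vcl
vcl 4.0;

# Assumed backend; replace with your own server.
backend default { .host = "127.0.0.1"; .port = "8080"; }

# Hypothetical list of clients allowed to issue bans.
acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.method == "BAN") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Not allowed."));
        }
        # ban(): invalidate cached objects for this host whose URL matches.
        ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url);
        return (synth(200, "Ban added"));
    }
    # regsuball(): collapse every run of slashes, e.g. "//a///b" becomes "/a/b".
    set req.url = regsuball(req.url, "/+", "/");
    # regsub(): replaces only the first match; here it drops a trailing empty "?".
    set req.url = regsub(req.url, "\?$", "");
}

sub vcl_hash {
    # hash_data(): the same inputs as the built-in vcl_hash.
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```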

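6. Common HTTP headers

Cache-Control: tells caches how to handle the content. Varnish cares about the max-age parameter and uses it to calculate the object's TTL.
"Cache-Control: no-cache" is ignored.
- Age: Varnish adds an Age header that shows how long the object has been kept in Varnish. You can pick out the Age header with varnishlog like this: varnishlog -i TxHeader -I ^Age
- Pragma: an HTTP 1.0 server may send "Pragma: no-cache". Varnish ignores this header, but support for it is easy to add in VCL, in vcl_fetch:
if (beresp.http.Pragma ~ "no-cache") { return (hit_for_pass); }
- Authorization: when Varnish sees an Authorization header, it passes the request to the backend. You can also unset this header.
- Cookies: Varnish does not cache objects from the backend that carry a Set-Cookie header. Likewise, if the client sends a Cookie header, Varnish bypasses the cache and sends the request straight to the backend.
- Vary: the web server sends the Vary header to state which request headers cause the HTTP object to vary, for example Accept-Encoding. When the server sends "Vary: Accept-Encoding", it tells Varnish to cache a separate version for each different Accept-Encoding value sent by clients. So if a client accepts only gzip, Varnish will not serve it the deflate version of the page.
Accept-Encoding may list several encodings. For example, one browser sends
Accept-Encoding: gzip,deflate and another sends
Accept-Encoding: deflate,gzip. Because the headers differ, Varnish stores two different versions of the page. Normalizing the Accept-Encoding header keeps the number of cached variants as small as possible.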
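A common way to normalize Accept-Encoding is to collapse it to a single value before the cache lookup. The sketch below does this in VCL 4.0 syntax. It also shows the Varnish 4+ form of the Pragma rule, which marks the object uncacheable for a short period instead of using hit_for_pass. The file-extension list, the 120s period and the backend address are assumptions. With the default http_gzip_support parameter enabled, Varnish 4 and later already rewrite Accept-Encoding for gzip, which may make the first part largely redundant.

```vcl
vcl 4.0;

# Assumed backend; replace with your own server.
backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    # Normalize Accept-Encoding so that "Vary: Accept-Encoding" yields few variants.
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|mp3|ogg)$") {
            # Already compressed formats: no point negotiating an encoding.
            unset req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # Unknown encodings only: drop the header.
            unset req.http.Accept-Encoding;
        }
    }
}

sub vcl_backend_response {
    # Honor the HTTP/1.0 "Pragma: no-cache" header, which Varnish ignores by default.
    if (beresp.http.Pragma ~ "no-cache") {
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }
}
```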

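A subroutine groups code for readability and reuse. Subroutines in VCL take no arguments and return no values. For example:
sub pipe_if_local {
    if (client.ip ~ local) {
        return (pipe);
    }
}
- To call a subroutine, use the call keyword followed by its name: call pipe_if_local;
- Several built-in subroutines are tied to Varnish's request workflow. They inspect and manipulate HTTP headers and requests, and decide how each request is handled. If one of them is not defined, or ends without making a decision, control passes to the built-in default subroutine. They are:
1: vcl_init
Called when the VCL is loaded, before any client requests are handled. It is typically used to initialize VMODs.
Return value: ok
The normal return value; after ok the VCL is loaded.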
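The sketch below shows a user-defined subroutine being called from a built-in one in VCL 4.0 syntax, together with an explicit vcl_init. The ACL contents and the backend address are placeholders. A return (action) inside a called subroutine ends the built-in subroutine that called it, which is why pipe_if_local can decide the outcome of vcl_recv.

```vcl
vcl 4.0;

# Assumed backend; replace with your own server.
backend default { .host = "127.0.0.1"; .port = "8080"; }

# Placeholder ACL for the example.
acl local {
    "localhost";
    "192.0.2.0"/24;
}

# User-defined subroutine: no arguments and no return value of its own,
# but return (pipe) here ends the calling vcl_recv as well.
sub pipe_if_local {
    if (client.ip ~ local) {
        return (pipe);
    }
}

sub vcl_recv {
    call pipe_if_local;
}

sub vcl_init {
    # Runs once when the VCL is loaded; ok lets loading continue.
    return (ok);
}
```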

7. Sample VCL file

#
# This is an example VCL file for Varnish.
#
# It does not do anything by default, delegating control to the
# builtin VCL. The builtin VCL is called when there is no explicit
# return statement.
#
# See the VCL chapters in the Users Guide at https://www.varnish-cache.org/docs/
# and https://www.varnish-cache.org/trac/wiki/VCLExamples for more examples.

# Marker to tell the VCL compiler that this VCL has been adapted to the
# new 4.0 format.
vcl 4.0;

# Default backend definition. Set this to point to your content server.

import directors;

backend wtdig1 {
    .host = "45.78.9.159";
    .port = "8080";
    .connect_timeout = 10s;
    .first_byte_timeout = 10s;
    .between_bytes_timeout = 10s;
    .max_connections=1000;
}

backend wtdig2 {
    .host = "45.78.9.159";
    .port = "8094";
    .connect_timeout = 10s;
    .first_byte_timeout = 10s;
    .between_bytes_timeout = 10s;
    .max_connections=1000;
}

# Group the backends into a round-robin director
sub vcl_init {
    new cluster1 = directors.round_robin();
    cluster1.add_backend(wtdig1);
    cluster1.add_backend(wtdig2);
}

# Incoming client requests enter this subroutine
sub vcl_recv {
    # Happens before we check if we have this in cache already.
    #
    # Typically you clean up the request here, removing cookies you don't need,
    # rewriting the request, etc.
    set req.backend_hint = cluster1.backend();
}

sub vcl_backend_response {
    # Happens after we have read the response headers from the backend.
    #
    # Here you clean the response headers, removing silly Set-Cookie headers
    # and other mistakes your backend does.
}

sub vcl_deliver {
    # Happens when we have all the pieces we need, and are about to send the
    # response to the client.
    #
    # You can do accounting or modifying the final object here.
}