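OpenResty Lua Scripting in Practice: Anti-Scraping by Limiting Request Frequency per IP

Last edited: 2022/10/19

![](/api/file/getImage?fileId=634f60bfda740500130155d6)

## Preface

This post follows the previous one, [Installing OpenResty via yum (CentOS 7 and 8)](https://leanote.zzzmh.cn/blog/post/admin/634e1094da74050013015530), and covers the most important part of learning `OpenResty`: `Lua` scripting. As a hands-on exercise, we write a simple anti-scraping script that limits how often each IP can make requests.

To be clear, there are plenty of ready-made scripts like this online, and a quick `Ctrl + C` would get the feature working. My goal, though, is to learn `Lua` syntax so that I can implement whatever I need later on.

## Hands-on

To avoid crashing a production environment, everything in this post is written and tested in a local Docker container.

**Installing OpenResty with Docker**

```shell
docker run -d -p 8000:80 \
  -e "TZ=Asia/Shanghai" \
  -m 200M --oom-kill-disable --memory-swap=-1 \
  --name openresty \
  openresty/openresty
```

Open [localhost:8000](http://localhost:8000/). If you see the page below, the container started successfully.

![](/api/file/getImage?fileId=634f63dbda740500130155d9)

The next problem is that editing files inside the OpenResty container is inconvenient. My solution is to copy the files out of the container, then start a new container with that directory mounted back in (the host here is a Linux VM).

```shell
# Copy the directory out of the container, then stop and remove the container
docker cp openresty:/usr/local/openresty /home/docker/
sudo chmod -R 777 /home/docker/openresty
docker stop openresty
docker rm openresty

# Start a new container with the host directory mounted inside,
# linking the existing redis and mysql containers at the same time
docker run -d -p 8000:80 \
  -e "TZ=Asia/Shanghai" \
  -m 200M --oom-kill-disable --memory-swap=-1 \
  -v /home/docker/openresty:/usr/local/openresty \
  --link mysql \
  --link redis \
  --name openresty \
  openresty/openresty
```

One more small problem: editing `nginx.conf` directly had no effect. It turned out the container also ships `/etc/nginx/conf.d/default.conf`, which `nginx.conf` pulls in via `include /etc/nginx/conf.d/*.conf;` and which also serves port 80, so it has to be deleted first:

```shell
docker exec -it openresty bash
cd /etc/nginx/conf.d/
rm -f default.conf
```

From here on you can work in `/home/docker/openresty/` on the host and check the result in the browser at [localhost:8000](http://localhost:8000/). Some changes only take effect after a reload.

Here is the nginx configuration file, slightly trimmed: `/home/docker/openresty/nginx/conf/nginx.conf`

```nginx
user root;
worker_processes 8;

events {
    worker_connections 1024;
}

http {
    include       mime.types;
    default_type  text/html;
    keepalive_timeout 65;
    include /etc/nginx/conf.d/*.conf;

    server {
        listen       80;
        server_name  localhost;

        location / {
            root   html;
            index  index.html index.htm;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }
}
```

After editing, reload the configuration for the changes to take effect:

```shell
# Enter the container
docker exec -it openresty bash
# Check that the configuration is valid
openresty -t
# If there are no errors, reload the configuration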
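openresty -s reload
```

First, a Hello World to prove the setup works. Create a `lua` folder under the `nginx` directory and a `hello.lua` file inside it, then point `location /` at the script:

```nginx
server {
    location / {
        # Run the Lua script file
        content_by_lua_file lua/hello.lua;
        # root   html;
        # index  index.html index.htm;
    }
    ...
}
```

`nginx/lua/hello.lua`

```lua
ngx.say('Hello Lua!');
```

Finally, reload the configuration inside the container:

```shell
docker exec -it openresty bash
openresty -s reload
```

Visit [localhost:8000](http://localhost:8000/) and you should see `Hello Lua!`.

Now we can skip straight to the main topic of this post: the anti-scraping script. The idea is shown in the diagram below (the diagram uses a limit of 20 requests per 60 seconds; any window length and request count will work). In short, the client IP is hashed into a Redis key that holds a counter: the first request in a window creates the counter with a 60-second expiry, later requests increment it, and once it exceeds the limit, requests are rejected until the key expires.

![](/api/file/getImage?fileId=634f7ad3da740500130155e7)

`nginx.conf`

```nginx
server {
    location / {
        # Run the Lua script file
        content_by_lua_file lua/access_limit.lua;
        # root   html;
        # index  index.html index.htm;
    }
    ...
}
```

`access_limit.lua`

```lua
local redis_iresty = require "resty.redis_iresty"
local redis = redis_iresty:new()

-- One counter per client IP, keyed by the MD5 of the address
local key = ngx.md5(ngx.var.remote_addr)
local count = redis:get(key)

if count then
    -- Over the limit for the current window: respond with 500 (200 requests per 60 s for testing)
    if tonumber(count) > 200 then
        return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
    else
        -- Still within the limit: just add 1 to the counter
        redis:incr(key)
    end
else
    -- First request: set the counter to 1
    redis:set(key, 1)
    -- Expire after 60 seconds
    redis:expire(key, 60)
end

-- Print key, value and ttl on the test page
ngx.say(key .. '<br>' .. redis:get(key) .. '<br>' .. redis:ttl(key))
```

**Note!** The line `local redis_iresty = require "resty.redis_iresty"` does not work out of the box, because this library does not ship with OpenResty. It is a wrapper written and published by another author, and it has to be added to the `resty` folder first.

Reference: [A wrapper around the Redis API (gitbooks)](https://moonbingbing.gitbooks.io/openresty-best-practices/content/redis/out_package.html)

Concretely, add a file `redis_iresty.lua` under `openresty/lualib/resty/` with the content below. Note that the Redis IP address and port in it must be changed to match your own setup.

```lua
local redis_c = require "resty.redis"

local ok, new_tab = pcall(require, "table.new")
if not ok or type(new_tab) ~= "function" then
    new_tab = function (narr, nrec) return {} end
end

local _M = new_tab(0, 155)
_M._VERSION = '0.01'

local commands = {
    "append", "auth", "bgrewriteaof", "bgsave",
    "bitcount", "bitop", "blpop", "brpop",
    "brpoplpush", "client", "config", "dbsize",
    "debug", "decr", "decrby",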
    "del", "discard", "dump", "echo",
    "eval", "exec", "exists", "expire",
    "expireat", "flushall", "flushdb", "get",
    "getbit", "getrange", "getset", "hdel",
    "hexists", "hget", "hgetall", "hincrby",
    "hincrbyfloat", "hkeys", "hlen", "hmget",
    "hmset", "hscan", "hset", "hsetnx",
    "hvals", "incr", "incrby", "incrbyfloat",
    "info", "keys", "lastsave", "lindex",
    "linsert", "llen", "lpop", "lpush",
    "lpushx", "lrange", "lrem", "lset",
    "ltrim", "mget", "migrate", "monitor",
    "move", "mset", "msetnx", "multi",
    "object", "persist", "pexpire", "pexpireat",
    "ping", "psetex", "psubscribe", "pttl",
    "publish", --[[ "punsubscribe", ]] "pubsub", "quit",
    "randomkey", "rename", "renamenx", "restore",
    "rpop", "rpoplpush", "rpush", "rpushx",
    "sadd", "save", "scan", "scard",
    "script", "sdiff", "sdiffstore", "select",
    "set", "setbit", "setex", "setnx",
    "setrange", "shutdown", "sinter", "sinterstore",
    "sismember", "slaveof", "slowlog", "smembers",
    "smove", "sort", "spop", "srandmember",
    "srem", "sscan", "strlen", --[[ "subscribe", ]]
    "sunion", "sunionstore", "sync", "time",
    "ttl", "type", --[[ "unsubscribe", ]] "unwatch",
    "watch", "zadd", "zcard", "zcount",
    "zincrby", "zinterstore", "zrange", "zrangebyscore",
    "zrank", "zrem", "zremrangebyrank", "zremrangebyscore",
    "zrevrange", "zrevrangebyscore", "zrevrank", "zscan",
    "zscore", "zunionstore", "evalsha"
}

local mt = { __index = _M }

local function is_redis_null( res )
    if type(res) == "table" then
        for k,v in pairs(res) do
            if v ~= ngx.null then
                return false
            end
        end
        return true
    elseif res == ngx.null then
        return true
    elseif res == nil then
        return true
    end

    return false
end

-- Change this to the actual address and port of your Redis server
function _M.connect_mod( self, redis )
    redis:set_timeout(self.timeout)
    return redis:connect("172.0.0.1", 6379)
end

function _M.set_keepalive_mod( redis )
    -- put it into the connection pool of size 1000, with 60 seconds max idle time
    return redis:set_keepalive(60000, 1000)
end

function _M.init_pipeline( self )
    self._reqs = {}
end

-- Send every command queued since init_pipeline() in one round trip;
-- null replies in the results are turned into nil
function _M.commit_pipeline( self )
    local reqs = self._reqs
    if nil == reqs or 0 == #reqs then
        return {}, "no pipeline"
    else
        self._reqs = nil
    end

    local redis, err = redis_c:new()
    if not redis then
        return nil, err
    end

    local ok, err = self:connect_mod(redis)
    if not ok then
        return {}, err
    end

    redis:init_pipeline()
    for _, vals in ipairs(reqs) do
        local fun = redis[vals[1]]
        table.remove(vals , 1)

        fun(redis, unpack(vals))
    end

    local results, err = redis:commit_pipeline()
    if not results or err then
        return {}, err
    end

    if is_redis_null(results) then
        results = {}
        ngx.log(ngx.WARN, "is null")
    end
    -- table.remove (results , 1)

    self.set_keepalive_mod(redis)

    for i,value in ipairs(results) do
        if is_redis_null(value) then
            results[i] = nil
        end
    end

    return results, err
end

function _M.subscribe( self, channel )
    local redis, err = redis_c:new()
    if not redis then
        return nil, err
    end

    local ok, err = self:connect_mod(redis)
    if not ok or err then
        return nil, err
    end

    local res, err = redis:subscribe(channel)
    if not res then
        return nil, err
    end

    res, err = redis:read_reply()
    if not res then
        return nil, err
    end

    redis:unsubscribe(channel)
    self.set_keepalive_mod(redis)

    return res, err
end

local function do_command(self, cmd, ... )
    if self._reqs then
        table.insert(self._reqs, {cmd, ...})
        return
    end

    local redis, err = redis_c:new()
    if not redis then
        return nil, err
    end

    local ok, err = self:connect_mod(redis)
    if not ok or err then
        return nil, err
    end

    local fun = redis[cmd]
    local result, err = fun(redis, ...)
    if not result or err then
        -- ngx.log(ngx.ERR, "pipeline result:", result, " err:", err)
        return nil, err
    end

    if is_redis_null(result) then
        result = nil
    end

    self.set_keepalive_mod(redis)

    return result, err
end

for i = 1, #commands do
    local cmd = commands[i]
    _M[cmd] =
        function (self, ...)
            return do_command(self, cmd, ...)
        end
end

function _M.new(self, opts)
    opts = opts or {}
    local timeout = (opts.timeout and opts.timeout * 1000) or 1000
    -- Note: db_index is stored, but this wrapper never issues SELECT with it
    local db_index = opts.db_index or 0

    return setmetatable({
            timeout = timeout,
            db_index = db_index,
            _reqs = nil }, mt)
end

return _M
```

As before, remember to reload: every time you change `nginx.conf` or a Lua file it references, run `openresty -s reload`.

Now visit [localhost:8000](http://localhost:8000/) to see the final result. The first 200 requests return status 200 and show key + value + ttl. From the 201st request on, a status 500 error page is returned no matter how often you retry, until the 1-minute window ends. After that, requests return status 200 again and show key + value + ttl. Done!

Two small issues I found afterwards:

1. The counter is 1 on the first visit, but after that it goes up by 2 on every visit until the window ends. This one was well hidden and only shows up in the OpenResty logs:

```shell
# Command to view the logs
docker logs openresty

# Log output
10.0.2.2 - - [20/Oct/2022:14:18:27 +0800] "GET / HTTP/1.1" 200 55 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
10.0.2.2 - - [20/Oct/2022:14:18:27 +0800] "GET /favicon.ico HTTP/1.1" 200 55 "http://localhost:8000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
10.0.2.2 - - [20/Oct/2022:14:18:28 +0800] "GET / HTTP/1.1" 200 55 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
10.0.2.2 - - [20/Oct/2022:14:18:29 +0800] "GET /favicon.ico HTTP/1.1" 200 55 "http://localhost:8000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
```

Each visit produces two log entries: besides the root path, the browser also requests the `favicon.ico` site icon. Because `location /` in our nginx config matches every request, the script runs twice per visit.

**Final version, not changing it again no matter what**

`nginx.conf`

```nginx
server {
    location / {
        # Run the Lua script file
        content_by_lua_file lua/access_limit.lua;
        # root   html;
        # index  index.html index.htm;
    }

    location /favicon.ico {
        return 404;
    }
    ...
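}
```

`access_limit.lua`

```lua
local redis_iresty = require "resty.redis_iresty"
local redis = redis_iresty:new()

-- One counter per client IP, keyed by the MD5 of the address
local key = ngx.md5(ngx.var.remote_addr)
local count = redis:get(key)

if count then
    -- Over the limit for the current window: respond with 500 (200 requests per 60 s for testing)
    if tonumber(count) > 200 then
        return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
    else
        -- Still within the limit: just add 1 to the counter
        redis:incr(key)
    end
else
    -- First request: INCR on a missing key sets the counter to 1
    redis:incr(key)
    -- Expire after 60 seconds
    redis:expire(key, 60)
end

-- Print key, value and ttl on the test page
ngx.say(key .. '<br>' .. redis:get(key) .. '<br>' .. redis:ttl(key))
```

2. The Docker VM I installed under VirtualBox on Windows 11 turned out to boot from an ISO image, with the disk only mounted. Anything stored under the home directory disappears on reboot, which explains why it only needed 1.8 GB. Earlier in this post I mapped the container data to `/home/docker`, and that is exactly why it vanished after a reboot. I then switched the mapping to `/mnt/sda1/var/lib/docker/data` (`data` being a newly created folder), and the data now survives reboots. Because of a startup-ordering issue, however, the openresty container cannot start automatically from there at boot, so it still has to be started manually (`docker start openresty`).

Which makes me miss developing on Linux: none of this nonsense, two to three times faster on the same hardware, and everything does exactly what you tell it to.

## END

References:

[OpenResty full course (bilibili)](https://www.bilibili.com/video/BV1nU4y1x7Lt)

[Distributed systems: rate limiting and anti-scraping with OpenResty + Lua + Redis (csdn)](https://blog.csdn.net/qq_24000367/article/details/125536798)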
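One more observation on the counter logic in `access_limit.lua`: it reads the counter with `GET` and writes it with `INCR` in a separate round trip, so two concurrent requests from the same IP can both read the same value and both pass the check. A common alternative is to call `INCR` first (Redis treats a missing key as 0 before incrementing) and set the expiry only when the returned value is 1. The sketch below shows that shape in plain Lua so it runs outside OpenResty. `MemoryStore` is a hypothetical in-memory stand-in for Redis and `allow_request` is a hypothetical helper; neither is part of the author's script or of `redis_iresty`.

```lua
-- Sketch of a fixed-window rate limiter using the "INCR first" pattern.
-- Assumption: MemoryStore is a hypothetical in-memory stand-in for Redis
-- that mimics INCR (missing key counts as 0) and EXPIRE.

local MemoryStore = {}
MemoryStore.__index = MemoryStore

-- clock: a function returning the current time in seconds (injectable for tests)
function MemoryStore.new(clock)
  return setmetatable({ data = {}, expires_at = {}, clock = clock }, MemoryStore)
end

-- Drop the key once its TTL has elapsed, so it behaves like an expired Redis key.
function MemoryStore:purge(key)
  local deadline = self.expires_at[key]
  if deadline and self.clock() >= deadline then
    self.data[key] = nil
    self.expires_at[key] = nil
  end
end

-- Like Redis INCR: a missing key is treated as 0, then incremented.
function MemoryStore:incr(key)
  self:purge(key)
  local value = (self.data[key] or 0) + 1
  self.data[key] = value
  return value
end

-- Like Redis EXPIRE: the key disappears `seconds` from now.
function MemoryStore:expire(key, seconds)
  self.expires_at[key] = self.clock() + seconds
end

-- Hypothetical helper returning (allowed, count). The value returned by INCR
-- is this request's position in the current window, so no separate GET is needed.
local function allow_request(store, key, limit, window)
  local count = store:incr(key)
  if count == 1 then
    -- First request of a new window: start the countdown.
    store:expire(key, window)
  end
  return count <= limit, count
end

-- Demo: 20 requests per 60 seconds, driven by a fake clock.
local now = 0
local store = MemoryStore.new(function() return now end)
for _ = 1, 20 do
  assert(allow_request(store, "client-ip", 20, 60))
end
print(allow_request(store, "client-ip", 20, 60)) -- false, 21 (over the limit)
now = 61 -- the 60-second window has passed
print(allow_request(store, "client-ip", 20, 60)) -- true, 1 (new window)
```

In `access_limit.lua` this would correspond roughly to `local count = redis:incr(key)`, followed by `redis:expire(key, 60)` when `count == 1` and a rejection when `count > 200`. `INCR` and `EXPIRE` are still two separate commands, so if the worker dies between them the key could be left without a TTL; running both inside a single Redis `EVAL` script is one common way to close that gap.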