NGINX Lua Scraping Protection

If you have followed the guide to configuring NGINX with Lua, you can now use Application Integrity's scraping protection module.

Prerequisites

Getting Started

Your account manager can provide a software package which includes the following files:

  • README.md - contains this information and a link to this documentation
  • example.block.conf.nginx - an example nginx config file demonstrating blocking/redirecting with NGINX and Lua
  • example.scraping.conf.nginx - an example nginx config file demonstrating scraping protection with NGINX and Lua
  • lua-plugins/injector.lua - injects a script into the <head> of an HTML document
  • lua-plugins/mitigation.lua - requests an ACTION from the mitigation API
  • lua-plugins/mitigation/ - contains the plugin modules
  • lua-plugins/scraping_check.lua - contains the code to check whether the scraping protection has passed
  • lua-plugins/scraping_guard.lua - contains the code to protect an endpoint
  • lua-plugins/xss.lua - helper to protect redirects from attacks
  • lua-plugins/tests/ - contains the unit tests
  • lua-plugins/interstitial.html - an example interstitial page

Configuration

  1. Follow the configuration setup here

  2. Server block configuration

    | Configuration | Required | Type | Default | Example | Description |
    |---|---|---|---|---|---|
    | $custom_scraping_fields | false | string | NONE | '{"field":"value"}' | Specific settings that can be configured to control scraping. |
    | $session_secret | false | string | NONE | 'correcthorsebatterystaple' | The key used to encrypt the cookies. This defaults to a random string; if you have multiple NGINX instances, set this to the same value on all instances. |
    | $scraping_interstitial_url | true | string | NONE | '/interstitial' | The route that a protected page redirects to. You will need to create the location block that serves your custom interstitial page. |
    | $session_name | false | string | session | scraping-session | The session name stored in the browser for the cookie. This defaults to session, so it is recommended to set it to something unique. |
    | $scraping_cookie_ttl | false | number | 15 | 10 | Specifies how regularly the interstitial page should be shown to a user if they have navigated away from the site. |
    | $scraping_referer_parameter | false | string | next | referer | The query parameter that Human reads, once the scraping checks have passed, to find out where to redirect the user. |
    | $session_cookie_lifetime | false | number | 3600 | 63072000 | Length of time the cookie will be valid for. It is recommended that this is set to 10-30 seconds. |
    | $scraping_protection | false | number | nil | 1 | When set to 1, this field marks pages that should be protected against scraping. When set to 2, it marks the interstitial page. |
    | $scraping_refresher | true | string | nil | /nginx/scraping/refresh | The URL that scraping protection polls to renew the user's cookie. This is a required field. |
  3. Protecting an endpoint

    To protect an endpoint from scraping, add the following to any NGINX location block that you wish to protect:

    access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
    
  4. Set up a location endpoint that displays the interstitial page. This page is shown while the protection is running its checks. See the example interstitial.html file provided.

  • Please note that it's important to protect this endpoint from XSS attacks. Bot Guard offers protection against this that is straightforward to implement. Add the following to the location block for the interstitial page:
    access_by_lua_file /etc/lua-plugins/xss.lua;
  5. So that users do not keep seeing the interstitial page as they browse around the site, add a session refresh endpoint that the Bot Guard tag uses to check the status of the current user in the background.
    location /refresh {
        set $detection_tag_mo "2";
        content_by_lua_file /etc/lua-plugins/scraping_check.lua;
    }
  • Note: it's important that the mo for this endpoint is 2, so that Bot Guard is requested regularly and can update the status of the user. This example sets it explicitly for demonstration, although 2 is the default.
  6. Please note that it doesn't make sense to block/redirect an endpoint and also apply scraping protection to it. Choose one or the other depending on your needs.
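
Steps 3 and 5 above can be sketched together as a single server fragment. This is only a minimal sketch: it assumes the release was unzipped to /etc/lua-plugins, it omits the API key and tag variables as well as the interstitial location from step 4, and the /account path and the backend address 127.0.0.1:8080 are hypothetical placeholders.

```nginx
server {
    listen 3000;

    # shared scraping settings from step 2 (both are required)
    set $scraping_interstitial_url "/interstitial";
    set $scraping_refresher "/refresh";

    # step 3: a protected endpoint (/account is a placeholder path)
    location ^~ /account {
        set $scraping_protection 1;

        # the injector changes the body length, so drop Content-Length
        header_filter_by_lua_block {
            ngx.header.content_length = nil;
        }
        body_filter_by_lua_file /etc/lua-plugins/injector.lua;
        access_by_lua_file /etc/lua-plugins/scraping_guard.lua;

        proxy_pass http://127.0.0.1:8080;   # placeholder backend
    }

    # step 5: background refresh endpoint polled by the tag
    location ^~ /refresh {
        set $detection_tag_mo "2";
        content_by_lua_file /etc/lua-plugins/scraping_check.lua;
    }
}
```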

Examples

All examples and conf files will need the following variables:

set $mitigation_api_key "API_KEY"; # the api key provided to you by your account manager
set $mitigation_api_et "12"; # the event type (scraping protection)
set $detection_tag_ci "CUSTOMER_ID"; # your customer ID
set $detection_tag_dt "DETECTION_TAG_ID"; # your tag ID
set $detection_tag_si "SITE_ID"; # a site identifier, specified by the customer

#scraping management
set $session_secret "$SCRAPING_SESSION_SECRET";
set $scraping_interstitial_url "/interstitial";
set $scraping_refresher "/refresh";
set $session_name "x-reload-session"; #some discreet name for the scraping session
set $scraping_referer_parameter "next"; #customise the parameter that will be used as the query parameter (defaults to next)
set $session_cookie_lifetime 63072000;
set $scraping_cookie_ttl 15;
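
NGINX does not expand operating-system environment variables inside `set`, so "$SCRAPING_SESSION_SECRET" above is presumably a placeholder that gets substituted when the config is rendered at deploy time. Where no such templating step exists, one possible alternative is to read the secret from the environment with Lua. The sketch below assumes OpenResty (or lua-nginx-module built with ngx_devel_kit, which `set_by_lua_block` requires) and assumes that the plugins read $session_secret like any other NGINX variable.

```nginx
# main (top-level) context: NGINX strips most environment variables
# from its processes unless they are listed with "env"
env SCRAPING_SESSION_SECRET;

events {
    worker_connections 1024;
}

http {
    server {
        listen 3000;

        # fill $session_secret from the process environment;
        # the empty-string fallback is an assumption, adjust as needed
        set_by_lua_block $session_secret {
            return os.getenv("SCRAPING_SESSION_SECRET") or ""
        }
    }
}
```

With multiple NGINX instances, the same environment value would need to be supplied to every instance so that they all share one secret.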

Catch All

The following is the most basic example. It sends all non-GET requests to the mitigation API and injects the script tag into all responses that contain a </head> and/or <body> tag. This assumes that you have unzipped the release to /etc/lua-plugins.

worker_processes auto;
pcre_jit on;

events {
    worker_connections 1024;
}

http {

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    include mime.types;
    default_type application/octet-stream;
    gzip on;

    access_log /dev/stdout;

    lua_package_path "/etc/lua-plugins/?.lua;;";
    more_clear_headers Server;
    server_tokens off;

    server {
        listen 3000;
        server_name some.example.com localhost;
        resolver 8.8.8.8;
        client_header_buffer_size 8k;
        large_client_header_buffers 8 64k;
        error_log /dev/stdout debug;

        # required variables
        set $mitigation_api_key "API_KEY";
        set $detection_tag_ci "CUSTOMER_ID";
        set $detection_tag_dt "DETECTION_TAG_ID";
        set $mitigation_api_et "12";
        set $detection_tag_si "SITE_ID";
        set $detection_tag_host "sub.example.com";
        set $detection_tag_path "/ag/CUSTOMER_ID/clear.js";
        set $detection_tag_spa "0";
        set $detection_tag_mo "2";

        #scraping management
        set $session_secret "$SCRAPING_SESSION_SECRET";
        set $scraping_interstitial_url "/interstitial";
        set $scraping_refresher "/refresh";
        set $session_name "x-reload-session"; #some discreet name for the scraping session
        set $scraping_referer_parameter "next"; #customise the parameter that will be used as the query parameter (defaults to next)
        set $session_cookie_lifetime 63072000;
        set $scraping_cookie_ttl 15;

        location ~* \.(?:ico|css|js|gif|jpe?g|png|woff2|woff|ttf)$ {
            root /usr/share/nginx/html;
            index index.html index.htm;
        }

        location ^~ /refresh {
            set $detection_tag_mo "2";
            content_by_lua_file /etc/lua-plugins/scraping_check.lua;
        }

        location ^~ /interstitial {
            default_type text/html;
            header_filter_by_lua_block {
                ngx.header.content_length = nil;
            }

            set $detection_tag_spa "1";
            set $scraping_protection 2;
            body_filter_by_lua_file /etc/lua-plugins/injector.lua;
            access_by_lua_file /etc/lua-plugins/xss.lua;
            return 200 '<html><body><h1>Please wait while we check some things....</h1></body></html>';
        }

        location ^~ / {
            default_type text/html;

            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $remote_addr;

            header_filter_by_lua_block {
                ngx.header.content_length = nil;
            }
            body_filter_by_lua_file /etc/lua-plugins/injector.lua;
            lua_need_request_body on;

            set $scraping_protection 1;
            # protect all endpoints from scraping 
            access_by_lua_file /etc/lua-plugins/scraping_guard.lua;

            proxy_pass http://localhost:$BACKEND_PORT;
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}

Route Management

The following example is very similar to the one above, with a few major differences. These differences are listed here and commented within the example for easier reading.

  1. Variables that are shared between endpoints are part of the server, and not location block
  2. An NGINX location block to handle signup attempts that will be redirected to if a signup is blocked by the mitigation API
  3. The /signup route is a vanilla HTML/CSS website and redirects a blocked user to a /catch endpoint, however it informs the client that the redirect is a 200. This code is configured to deceive the client rather than inform them.
  4. Different routes to define different configuration for /login vs /signup
  5. The /login route is an SPA and defines a response code and body to respond with when a request is blocked in the case of an SPA
  6. The interstitial page is served at /interstitial
  7. All endpoints are protected against scraping
worker_processes auto;
pcre_jit on;

events {
    worker_connections 1024;
}

http {

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    include mime.types;
    default_type application/octet-stream;
    gzip on;

    access_log /dev/stdout;

    lua_package_path "/etc/lua-plugins/?.lua;;";
    more_clear_headers Server;
    server_tokens off;

    server {
        listen 3000;
        server_name some.example.com localhost;
        resolver 8.8.8.8;

        underscores_in_headers on; #required for signal headers
        # buffers for headers and body
        client_header_buffer_size 512k;
        large_client_header_buffers 8 512k;
        client_max_body_size 100M;
        proxy_busy_buffers_size   512k;
        proxy_buffers   4 512k;
        proxy_buffer_size   256k;

        location ~* \.(?:ico|css|js|gif|jpe?g|png|woff2|woff|ttf)$ {
            root /usr/share/nginx/html;
            index index.html index.htm;
        }

        # 1. Variables that are shared between endpoints are part of the server, and not location block
        set $mitigation_api_key "API_KEY";
        set $mitigation_api_et "12";
        set $detection_tag_ci "CUSTOMER_ID";
        set $detection_tag_dt "DETECTION_TAG_ID";
        set $detection_tag_host "sub.example.com";
        set $detection_tag_path "/ag/CUSTOMER_ID/clear.js";

        #scraping management
        set $session_secret "$SCRAPING_SESSION_SECRET";
        set $scraping_interstitial_url "/interstitial";
        set $scraping_refresher "/refresh";
        set $session_name "x-reload-session"; #some discreet name for the scraping session
        set $scraping_referer_parameter "next"; #customise the parameter that will be used as the query parameter (defaults to next)
        set $session_cookie_lifetime 63072000;
        set $scraping_cookie_ttl 15;

        location ^~ /refresh {
            set $detection_tag_mo "2";
            content_by_lua_file /etc/lua-plugins/scraping_check.lua;
        }

        location ^~ /interstitial {
            default_type text/html;
            header_filter_by_lua_block {
                ngx.header.content_length = nil;
            }

            set $detection_tag_spa "1";
            set $scraping_protection 2;
            body_filter_by_lua_file /etc/lua-plugins/injector.lua;
            access_by_lua_file /etc/lua-plugins/xss.lua;
            return 200 '<html><body><h1>Please wait while we check some things....</h1></body></html>';
        }

        # 2. An NGINX location block to handle signup attempts that will be redirected to if a signup is blocked by the mitigation API
        location ^~ /signup {
            default_type text/html;

            # required variables
            set $detection_tag_spa "0";
            set $detection_tag_mo "2";
            set $detection_tag_si "SITE_ID";

            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $remote_addr;

            header_filter_by_lua_block {
                ngx.header.content_length = nil;
            }
            body_filter_by_lua_file /etc/lua-plugins/injector.lua;
            lua_need_request_body on;
            set $scraping_protection 1;
            # protect endpoint from scraping 
            access_by_lua_file /etc/lua-plugins/scraping_guard.lua;

            proxy_pass http://localhost:$BACKEND_PORT;
        }
        # 4. Different routes to define different configuration for /login vs /signup
        location ^~ /login {
            default_type text/html;

            # required variables
            set $detection_tag_spa "1";
            set $detection_tag_mo "2";
            set $detection_tag_si "SITE_ID";
            # 5. The /login route is an SPA and defines a response code and body to respond with when a request is blocked in the case of an SPA
            set $block_spa_response_code "200";
            set $block_spa_response_body '{"success":"you are now logged in"}';

            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $remote_addr;

            header_filter_by_lua_block {
                ngx.header.content_length = nil;
            }
            body_filter_by_lua_file /etc/lua-plugins/injector.lua;
            lua_need_request_body on;
            set $scraping_protection 1;
            # protect endpoint from scraping
            access_by_lua_file /etc/lua-plugins/scraping_guard.lua;

            proxy_pass http://localhost:$BACKEND_PORT;
        }

        location ^~ / {
            header_filter_by_lua_block {
                ngx.header.content_length = nil;
            }
            body_filter_by_lua_file /etc/lua-plugins/injector.lua;
            set $scraping_protection 1;
            # protect all endpoints from scraping 
            access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
            proxy_pass http://localhost:$BACKEND_PORT;
        }

        error_page 500 502 503 504 /50x.html;

        location = /50x.html {
            root html;
        }
    }
}
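
Because `location ^~ /` applies scraping protection to every request that no other block matches, clients that cannot run JavaScript, such as uptime probes, would presumably be sent to the interstitial page. One way to exempt a route is to give it its own location block without scraping_guard.lua, in the same way the static asset block above is left unprotected. This is a sketch only; the /healthz path is a hypothetical name and is not part of the package.

```nginx
# Hypothetical health-check route, placed inside the server block.
# An exact-match location takes priority over "location ^~ /".
location = /healthz {
    # no $scraping_protection, no scraping_guard.lua, no script injection
    proxy_set_header X-Real-IP $remote_addr;
    proxy_pass http://localhost:$BACKEND_PORT;
}
```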

Final Remarks

The interstitial page can be any static HTML page, or any other web-servable content that can run JavaScript. An example is supplied as part of the Lua zip package, but the page can be as simple as the following:

<html>
<head>
    <link rel="stylesheet" href="/public/checking.css">
    <style>
        section {
            text-align: center;
            background-color: #CCBCBC;
            margin: 0 auto;
            width: 80%;
            padding: 1.5em;
        }
        #message {
            padding: 1em;
            text-align: center;
        }
        .loader {
            margin: 0 auto;
            border: 16px solid #1C1D21;
            border-radius: 50%;
            border-top: 16px solid #F1E3E4;
            width: 120px;
            height: 120px;
            -webkit-animation: spin 2s linear infinite; /* Safari */
            animation: spin 2s linear infinite;
        }

        /* Safari */
        @-webkit-keyframes spin {
            0% { -webkit-transform: rotate(0deg); }
            100% { -webkit-transform: rotate(360deg); }
        }

        @keyframes spin {
            0% { transform: rotate(0deg); }
            100% { transform: rotate(360deg); }
        }
    </style>
</head>
<body>
<section>
    <div class="loader"></div>
    <div id="message"></div>
</section>
</body>
</html>
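
To serve a page like this from disk rather than as an inline string, the interstitial location can point at the file. The following is a sketch, assuming the release was unzipped to /etc/lua-plugins so that the example page sits at /etc/lua-plugins/interstitial.html; the remaining directives mirror the interstitial block used in the examples.

```nginx
location = /interstitial {
    # the URI has no file extension, so the MIME type comes from default_type
    default_type text/html;

    set $detection_tag_spa "1";
    set $scraping_protection 2;   # 2 marks this location as the interstitial page

    # the injector changes the body length, so drop Content-Length
    header_filter_by_lua_block {
        ngx.header.content_length = nil;
    }
    body_filter_by_lua_file /etc/lua-plugins/injector.lua;
    access_by_lua_file /etc/lua-plugins/xss.lua;

    # serve the static example page shipped with the package
    alias /etc/lua-plugins/interstitial.html;
}
```

One side effect worth noting: `return` is handled in NGINX's rewrite phase, which precedes the access phase, whereas a static file is served in the content phase, after access_by_lua_file has run.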