MediaGuard Troubleshooting
If you encounter any issues with your MediaGuard integration, you should reach out to a HUMAN representative right away—we're happy to help you resolve the problem as soon as possible. To help us diagnose the root cause of the issue, we'll ask you to complete the following troubleshooting steps and questionnaires to provide us with key details and rule out potential causes.
Compute time
Unnecessarily long compute times can add latency to MediaGuard's responses. Long compute times usually happen when your MediaGuard cluster is underprovisioned, which means that your cluster doesn't have enough nodes to support peak traffic levels. However, if your cluster does have enough nodes, compute time shouldn't be a significant source of latency.
Compute time questionnaire
In your support request, include the answers to these questions:
- Has your bid request volume increased beyond the current traffic capacity of your MediaGuard cluster? (For example, has your QPS (queries per second) volume increased by more than 20% compared to standard traffic levels?)
- Have you received (or do you anticipate receiving) any expected increases in traffic?
- Have you received any unexpected increases in traffic?
Network
Network issues can also add latency to MediaGuard's responses or trigger timeouts.
Network troubleshooting steps
From one of the bidding servers that you use to communicate with MediaGuard, run the following diagnostic commands (and share their results with your HUMAN representative):
traceroute <your_mediaguard_cluster> (at least three times)
traceroute --tcp --port=443 <your_mediaguard_cluster> (at least three times)
mtr --report-wide --report-cycles=200 <your_mediaguard_cluster>
mtr --report-wide --tcp --report-cycles=200 --port=443 <your_mediaguard_cluster>
You may receive different results from one traceroute run to the next. This is normal and often happens when intermediate hops deprioritize ICMP requests and other diagnostic traffic.
Ideally, you should run the above diagnostic tests from multiple bidding servers in the affected region. You can obtain the current IP address of any of your MediaGuard clusters by running the dig command (dig <your_mediaguard_cluster>) on the cluster's DNS endpoint.
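If it helps with collection, here's a minimal sketch (a hypothetical helper, not part of MediaGuard) that runs each diagnostic from a bidding server and saves the output to files you can attach to your support request. It assumes traceroute, mtr, and dig are installed and that <your_mediaguard_cluster> is replaced with your cluster's DNS endpoint:

import subprocess

CLUSTER = "<your_mediaguard_cluster>"  # replace with your cluster's DNS endpoint

COMMANDS = {
    "traceroute_icmp": ["traceroute", CLUSTER],
    "traceroute_tcp": ["traceroute", "--tcp", "--port=443", CLUSTER],
    "mtr_icmp": ["mtr", "--report-wide", "--report-cycles=200", CLUSTER],
    "mtr_tcp": ["mtr", "--report-wide", "--tcp", "--report-cycles=200", "--port=443", CLUSTER],
    "dig": ["dig", CLUSTER],
}

for name, command in COMMANDS.items():
    repeats = 3 if name.startswith("traceroute") else 1  # run traceroutes at least three times
    for attempt in range(1, repeats + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        with open(f"{name}_{attempt}.txt", "w") as out:
            out.write(result.stdout + result.stderr)

Repeat this collection from each affected bidding server so we can compare paths across your region.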
HUMAN will also run independent traceroutes from our servers to your system, which allows us to compare results from both approaches. To facilitate this process, you must provide us with an externally accessible endpoint that allows us to send traceroute requests to your bidding server.
Connection management
Poor connection management is another common source of latency. The number of connections, QPS (queries per second) per connection, and connection lifetime can all add latency to MediaGuard's responses.
Under perfect network conditions, your MediaGuard cluster can support up to 100 QPS per connection. However, in practice, we recommend utilizing between 65% and 80% of the available connection capacity to send and receive MediaGuard data. You can calculate the ideal number of connections for your MediaGuard integration by using the following formula:
number of connections = (mean latency in ms x QPS) ÷ (1000 x utilization)
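For example, a hypothetical integration with a 15ms mean latency, 10,000 QPS, and 70% utilization would need (15 x 10,000) ÷ (1000 x 0.70) ≈ 214 connections.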
You can improve MediaGuard's performance by using fewer connections and keeping each connection open as long as possible to send as many requests as possible. It's more efficient to send multiple requests over a long-lived connection than it is to repeatedly open and close multiple short-lived connections, since long-lived connections reduce connection overhead and therefore put less strain on MediaGuard.
Since MediaGuard is optimized to handle high traffic volumes, low-QPS requests won't accurately reflect MediaGuard's performance. After you've established a successful connection, we recommend running tests at slightly elevated QPS levels (above 200 QPS) to obtain accurate metrics.
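To illustrate connection reuse (this is only a sketch, not MediaGuard's official client; the endpoint URL, payload shape, pool size, and timeout below are placeholder assumptions), a long-lived HTTPS session in Python might look like this:

import requests
from requests.adapters import HTTPAdapter

MEDIAGUARD_URL = "https://<your_mediaguard_cluster>/"  # placeholder endpoint

session = requests.Session()
# Size the pool with the formula above (for example, roughly 214 connections
# for 10,000 QPS at 15ms mean latency and 70% utilization).
session.mount("https://", HTTPAdapter(pool_connections=1, pool_maxsize=214))

def send_bid_request(payload: dict) -> requests.Response:
    # Each call reuses an idle keep-alive connection from the pool instead of
    # paying TCP and TLS handshake overhead on every request.
    return session.post(MEDIAGUARD_URL, json=payload, timeout=0.05)

Whatever client library you use, the principle is the same: open a bounded pool of connections, keep them alive, and spread your bid requests across them rather than opening a new connection per request.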
Client metrics
If your servers are set up to measure MediaGuard's response times, HUMAN can compare this data to our own measurements for more thorough troubleshooting.
Metrics questionnaire
In your support request, include the following information:
- Metrics by datacenter and/or region
- Timeout percentage
- Latency histogram (for example):
50% < 15ms
30% >= 15ms and < 20ms
15% >= 20ms and < 30ms
5% >= 30ms
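If you record per-request latencies in milliseconds on your side, a short sketch like the following (a hypothetical helper; the bucket boundaries simply mirror the example above) can produce the timeout percentage and histogram we ask for:

def summarize(latencies_ms, timeout_count, total_requests):
    # Bucket boundaries mirror the example histogram above.
    buckets = [("< 15ms", 0, 15), (">= 15ms and < 20ms", 15, 20),
               (">= 20ms and < 30ms", 20, 30), (">= 30ms", 30, float("inf"))]
    print(f"Timeout percentage: {100 * timeout_count / total_requests:.2f}%")
    for label, low, high in buckets:
        share = sum(1 for ms in latencies_ms if low <= ms < high) / len(latencies_ms)
        print(f"{100 * share:.0f}% {label}")

Breaking these figures out per datacenter or region makes it much easier for us to match your measurements against ours.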
Other troubleshooting
To give us a better sense of both your internal systems and your MediaGuard integration, we also ask you to fill out a general troubleshooting questionnaire. Some of the questions listed below help us rule out common issues, while others provide us with a starting point for further inquiry.
General questionnaire
In your support request, include the answers to as many of these questions as you can:
- Were there any recent changes to your ad server?
- Were there any recent changes to the volume of requests to your ad server?
- Were there any recent changes to your network/service provider?
- Were there any recent changes to your routing?
- Did the problem persist for at least half an hour?
- Is the distance between your ad server and your MediaGuard cluster greater than 30 miles (48 kilometers)?
- Are there more than fifteen hops between your ad server and your MediaGuard cluster?
- Have you performed a traceroute and MTR using ICMP protocol?
- Have you performed a traceroute and MTR using TCP protocol?
- Is the latency originating from a hop close to HUMAN's cluster?
- Have you confirmed the results with your NOC (Network Operations Center) team?
- What is the externally-accessible endpoint for HUMAN to send traceroute requests to your bidding server? (And is the ICMP protocol allowed through your firewalls?)
- If you're collecting metrics, can you provide any graphs that illustrate the timeline of your issues?