ProxyChecker Tutorial: How to Verify and Filter Proxies Like a Pro
What ProxyChecker does
ProxyChecker is a tool that tests proxy servers to determine whether they are alive, how fast they respond, what protocols they support (HTTP, HTTPS, SOCKS4/5), and whether they leak identifying information (IP, DNS, headers). Use it to maintain a reliable proxy pool, improve scraping stability, and protect privacy-aware workflows.
Why verification and filtering matter
- Reliability: Dead or slow proxies break requests and waste retries.
- Performance: Sorting by latency improves throughput.
- Security: Checks catch transparent or misconfigured proxies that expose your real IP.
- Efficiency: Filtering reduces bandwidth and compute wasted on bad proxies.
Key checks a professional ProxyChecker should perform
- Liveness (connectivity): Establish a TCP connection to the proxy host/port.
- Protocol support: Verify whether it correctly proxies HTTP, HTTPS (CONNECT), SOCKS4, SOCKS5.
- Authentication: Test for required username/password and handle auth failures gracefully.
- Timeout & latency: Measure RTT and full-request time; use both for different sorting goals.
- Anonymity level: Detect whether the proxy reveals the client’s IP via headers or forwarding.
- DNS leak test: Resolve a unique hostname you control through the proxy and check which resolver queried your authoritative name server.
- Geo/IP check: Determine the proxy’s public IP and geolocation for regional routing.
- Content integrity: For sensitive use, verify response bodies aren’t altered or injected with tracking.
- Rate limits & throttling: Detect when a proxy enforces per-IP or per-connection limits.
- TLS validation: For HTTPS proxies, validate server certificates and TLS handshake behavior.
Step-by-step: building a reliable ProxyChecker (practical guide)
Assumptions: you have a list of proxies in host:port form, with optional credentials. Defaults used here: connection timeout 5 s, request timeout 10 s, at most 200 concurrent checks.
1. Prepare the test harness
- Use a concurrency-friendly language or library (Go, Node.js with async/await, Python with asyncio + aiohttp or trio + asks).
- Implement a worker-pool with a max concurrent connections parameter to avoid local resource exhaustion.
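As a minimal sketch of the bounded worker pool described above, assuming Python's asyncio: `run_pool` and its `check` callback are hypothetical names chosen for illustration, and the default of 200 mirrors the concurrency limit stated in the assumptions.

```python
import asyncio


async def run_pool(items, check, max_concurrent=200):
    """Run check(item) for every item, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:
            try:
                return item, await check(item)
            except Exception as exc:
                # One failing proxy should not abort the whole batch,
                # so the exception is returned as that item's result.
                return item, exc

    # gather() keeps results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

The semaphore caps how many checks hold sockets at once. For very large lists you may prefer a queue with a fixed number of workers, since this version still creates one task per proxy up front.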
2. Basic TCP/connectivity test
- Attempt a TCP connect to host:port with a short timeout (e.g., 3–5s).
- Mark proxies that fail to connect as dead; skip further tests.
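The connectivity test could look roughly like this with asyncio streams. `tcp_check` is a hypothetical helper name, and the 5-second default is the connection timeout assumed earlier.

```python
import asyncio
import time


async def tcp_check(host, port, timeout=5.0):
    """Return (alive, connect_time_seconds); connect_time is None on failure."""
    start = time.perf_counter()
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, unreachable, DNS failure, or timed out: treat as dead.
        return False, None
    elapsed = time.perf_counter() - start
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may already have dropped the connection
    return True, elapsed
```

A proxy that returns `(False, None)` here can be dropped before any of the slower protocol or anonymity tests run.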
3. Protocol detection
- For HTTP/HTTPS: send a simple GET to a small known endpoint (example: a tiny text file under your control) via the proxy. For HTTPS, issue a CONNECT to open a tunnel, then perform the TLS handshake through it.
- For SOCKS4/5: perform the SOCKS handshake and attempt to connect to the same test endpoint.
- Record which protocol(s) succeeded.
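To make the two handshakes concrete, here is a sketch of the bytes involved: the SOCKS5 greeting and method-selection reply as laid out in RFC 1928, and an HTTP CONNECT request with its status line. The function names are hypothetical, and the socket I/O around them is left out so the pieces stay easy to test.

```python
SOCKS5_METHODS = {0x00: "no-auth", 0x02: "user-pass", 0xFF: "rejected"}


def socks5_greeting(methods=(0x00, 0x02)):
    """Client greeting: version 5, number of methods, then the method ids."""
    return bytes([0x05, len(methods), *methods])


def parse_socks5_method(reply):
    """Interpret the 2-byte server reply; None means this is not SOCKS5."""
    if len(reply) != 2 or reply[0] != 0x05:
        return None
    return SOCKS5_METHODS.get(reply[1], "other")


def http_connect_request(host, port):
    """Request asking an HTTP proxy to open a tunnel to host:port."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def connect_succeeded(status_line):
    """True if the proxy answered the CONNECT with a 2xx status."""
    parts = status_line.split(None, 2)
    return (
        len(parts) >= 2
        and parts[0].startswith(b"HTTP/")
        and len(parts[1]) == 3
        and parts[1].isdigit()
        and parts[1][:1] == b"2"
    )
```

Send `socks5_greeting()` to the proxy and pass the first two bytes of its reply to `parse_socks5_method`. A `None` result suggests the port speaks something other than SOCKS5, so try HTTP next.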
4. Measure latency and full-request time
- Record time for TCP connect (connect_time) and time for complete test request (total_time).
- Keep both metrics; sort by total_time for user-facing selection, by connect_time for quick-fail routing.
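One way to keep both metrics side by side is sketched below. `Stopwatch`, `Timing`, and `rank` are hypothetical names, and the sample numbers in the usage note are made up to show how the two orderings can differ.

```python
import time
from dataclasses import dataclass


class Stopwatch:
    """Context manager that records elapsed wall-clock seconds."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        self.elapsed = time.perf_counter() - self.start
        return False  # never swallow exceptions


@dataclass
class Timing:
    proxy: str
    connect_time: float  # seconds to complete the TCP connect
    total_time: float    # seconds for the whole test request


def rank(timings, key="total_time"):
    """Sort fastest-first by 'total_time' or 'connect_time'."""
    return sorted(timings, key=lambda t: getattr(t, key))
```

For example, a proxy with `connect_time=0.05` and `total_time=2.0` ranks first by connect time but behind one with `connect_time=0.30` and `total_time=0.8` when sorted by total time, which is why it helps to store both.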
5. Anonymity & header inspection
- Request a service that returns request metadata (headers and apparent client IP). Compare returned IP to your known IP.
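- Check for headers that commonly leak IPs or reveal proxy use (X-Forwarded-For, Via, Forwarded, Client-IP). Mark each proxy as transparent (your real IP reaches the target), anonymous (your IP is hidden but the request is recognizable as proxied), or high-anonymity/elite (neither your IP nor proxy headers are visible).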
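The classification can be expressed as a small pure function, sketched here. It assumes your echo service hands back the apparent client IP and the request headers it received. `classify_anonymity` is a hypothetical name, and the header list is the one given above.

```python
import re

# Headers whose mere presence suggests the request passed through a proxy.
PROXY_HEADERS = ("x-forwarded-for", "via", "forwarded", "client-ip")


def _contains_ip(text, ip):
    # Match the IP as a whole token so "1.2.3.4" does not match "11.2.3.45".
    pattern = r"(?<![\w.])" + re.escape(ip) + r"(?![\w.])"
    return re.search(pattern, text) is not None


def classify_anonymity(real_ip, seen_ip, headers):
    """Return 'transparent', 'anonymous', or 'elite'."""
    lowered = {name.lower(): str(value) for name, value in headers.items()}
    leaked = seen_ip == real_ip or any(
        _contains_ip(value, real_ip) for value in lowered.values()
    )
    if leaked:
        return "transparent"
    if any(name in lowered for name in PROXY_HEADERS):
        return "anonymous"
    return "elite"
```

Checking every header value for your real IP, not only the well-known ones, also catches proxies that leak it through nonstandard headers.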
6. DNS leak and content checks
- Have the proxy resolve a unique hostname you control, then check your authoritative name server’s query log to confirm the lookup came from the proxy’s side rather than from your own resolver.
- Verify the response body matches an expected hash to detect content modification.
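Both checks rest on values you generate or know in advance. A sketch, with hypothetical helper names and with `leaktest.example.com` standing in for a DNS zone you actually control:

```python
import hashlib
import hmac
import uuid


def unique_probe_hostname(zone="leaktest.example.com"):
    """Build a one-off hostname so the lookup cannot be served from a cache."""
    return f"{uuid.uuid4().hex}.{zone}"


def body_is_intact(body, expected_sha256_hex):
    """Compare the SHA-256 of the received body with the known-good digest."""
    actual = hashlib.sha256(body).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

Record each probe hostname next to the proxy it was sent through, then match it against the query log of your authoritative name server. The body check only works if the test file is static, so serve it without compression or dynamic content.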
7. Authentication handling
- If credentials are provided, perform checks with and without credentials to confirm auth requirements and behavior on bad credentials.
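For HTTP proxies, credentials usually travel in a `Proxy-Authorization` header, and a proxy that wants them answers 407. The sketch below assumes Basic authentication and uses hypothetical helper names. SOCKS5 username/password authentication is a separate sub-negotiation and is not covered here.

```python
import base64


def proxy_auth_header(username, password):
    """Build a Basic Proxy-Authorization header line."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Proxy-Authorization: Basic " + base64.b64encode(raw).decode("ascii")


def auth_verdict(status_without, status_with=None):
    """Classify auth behavior from two HTTP status codes.

    status_without: status when no credentials were sent.
    status_with: status when credentials were sent, or None if you have none.
    """
    if status_without != 407:
        return "open"            # proxy did not ask for credentials
    if status_with is None:
        return "auth-required"   # needs credentials, none available to test
    if status_with == 407:
        return "bad-credentials"
    return "auth-ok"
```

Running the request once without and once with credentials gives the two status codes that `auth_verdict` expects.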
8. Rate-limit detection
- Send several sequential requests and watch for HTTP 429, connection resets, increasing latency, or other signs of throttling. Tag proxies that show limits.
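A simple heuristic over the collected samples might look like this. `looks_throttled` is a hypothetical name, and the threshold of a 3x slowdown between the first and second half of the run is an arbitrary assumption you should tune.

```python
def looks_throttled(samples, slowdown_factor=3.0):
    """Guess whether a proxy is rate limiting.

    samples: list of (status, latency_seconds) in request order;
    status is None when the connection was reset or dropped.
    """
    statuses = [status for status, _ in samples]
    if 429 in statuses or None in statuses:
        return True
    latencies = [latency for _, latency in samples]
    if len(latencies) < 4:
        return False  # too few samples to judge a latency trend
    half = len(latencies) // 2
    early = sum(latencies[:half]) / half
    late = sum(latencies[half:]) / (len(latencies) - half)
    return early > 0 and late > slowdown_factor * early
```

A single reset can also be ordinary network noise, so you may want to require repeated failures before tagging a proxy.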
9. TLS and certificate checks
- For HTTPS tests, verify server certs and note proxies that tamper with TLS or replace certificates (possible MITM).
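One way to notice a swapped certificate is to compare the fingerprint seen through the proxy with one you recorded on a direct connection to your own test endpoint. The sketch below assumes you already have the peer certificate in DER form, for example from `ssl.SSLSocket.getpeercert(binary_form=True)`. The helper names are hypothetical.

```python
import hashlib
import ssl


def strict_tls_context():
    """Default context: verifies the certificate chain and the hostname."""
    return ssl.create_default_context()


def cert_fingerprint(der_cert):
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def tls_verdict(der_cert_via_proxy, pinned_fingerprint):
    """Compare the certificate seen through the proxy with the pinned one.

    der_cert_via_proxy is None when the handshake through the proxy failed.
    """
    if der_cert_via_proxy is None:
        return "handshake-failed"
    if cert_fingerprint(der_cert_via_proxy) != pinned_fingerprint.lower():
        return "certificate-replaced"
    return "ok"
```

Pinning only makes sense for an endpoint you operate, and the pinned value has to be refreshed whenever that endpoint's certificate is renewed.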
10. Scoring and filtering
- Build a numeric score combining pass/fail flags and weighted metrics. Example weighting (adjust to your needs): liveness (required), anonymity 30%, avg total_time 25%, TLS integrity 15%, no rate-limit 15%, geo desirability 15%.
- Define thresholds for categories: Good (score ≥ 80), Acceptable (50–79), Bad (<50).
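The example weighting and thresholds above translate into a short function. Two parts are assumptions not specified above: speed is scaled linearly from full marks at 0 s to zero at the 10 s request timeout, and an anonymous (non-elite) proxy earns half of the anonymity weight.

```python
WEIGHTS = {"anonymity": 30, "speed": 25, "tls": 15, "no_rate_limit": 15, "geo": 15}
ANONYMITY_POINTS = {"elite": 1.0, "anonymous": 0.5, "transparent": 0.0}  # assumed


def score_proxy(alive, anonymity, avg_total_time, tls_ok, rate_limited, geo_ok,
                max_time=10.0):
    """Return a 0-100 score; a dead proxy always scores 0."""
    if not alive:
        return 0.0
    parts = {
        "anonymity": ANONYMITY_POINTS[anonymity],
        # Linear scale (assumed): 0 s gives 1.0, max_time or slower gives 0.0.
        "speed": max(0.0, 1.0 - avg_total_time / max_time),
        "tls": 1.0 if tls_ok else 0.0,
        "no_rate_limit": 0.0 if rate_limited else 1.0,
        "geo": 1.0 if geo_ok else 0.0,
    }
    return round(sum(WEIGHTS[name] * value for name, value in parts.items()), 1)


def category(score):
    """Map a score to Good (>= 80), Acceptable (50 up to 80), or Bad (< 50)."""
    if score >= 80:
        return "Good"
    if score >= 50:
        return "Acceptable"
    return "Bad"
```

Because scores can be fractional, anything from 50 up to but not including 80 is treated as Acceptable here.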
11. Output and exports
- Save results with: host:port, supported protocols, auth required, connect_time, total_time, anonymity category, public IP, geo, score, last-checked timestamp.
- Provide filtering options (protocol, max latency, min score, country).
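A sketch of the export and filter step using only the standard library. The column names follow the field list above, while the row layout (protocols joined with `|`, geo as a country code) and the function names are assumptions made for this example.

```python
import csv
import io

FIELDS = ["proxy", "protocols", "auth_required", "connect_time", "total_time",
          "anonymity", "public_ip", "geo", "score", "last_checked"]


def filter_results(rows, protocol=None, max_latency=None, min_score=None,
                   country=None):
    """Keep rows that satisfy every filter that was actually supplied."""
    kept = []
    for row in rows:
        if protocol and protocol not in row["protocols"].split("|"):
            continue
        if max_latency is not None and row["total_time"] > max_latency:
            continue
        if min_score is not None and row["score"] < min_score:
            continue
        if country and row["geo"] != country:
            continue
        kept.append(row)
    return kept


def to_csv(rows):
    """Serialize result rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(filter_results(rows, protocol="socks5", min_score=80))` would then export only the high-scoring SOCKS5 proxies.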