Handling Failed Requests in Web Scrapers

As an experienced proxy and scraper developer, I can tell you that failed requests are inevitable. For most sites, the first request usually succeeds. But at some point, whether due to connectivity issues, rate limiting, changed endpoints, or server errors, your scrapers will face failures.

The solution is implementing a solid retry policy. Based on my experience across 100+ scraping projects, retrying failed requests dramatically boosts success rates.

Why Requests Fail in Scrapers

Scrapers fail for a variety of reasons:

  • Rate Limiting – Sites blocking scrapers after a certain number of requests
  • Connectivity Issues – Network drops leading to failed requests
  • Transient Errors – Temporary server or infrastructure problems
  • Changing API Endpoints – Scraped API routes getting updated

According to recent reports, up to 8% of web requests result in failure during normal scraping activities. Without handling these failed requests, your scraper risks:

  • Missing data and having incomplete datasets
  • Hitting rate limits and getting blocked faster
  • Wasting costs if you use a paid scraping service

Implementing Retry Logic in Go

The good news is that Go makes it straightforward to set up request retries. Here is a simple pattern:

// Cap the number of attempts
const maxRetries = 5

var resp *http.Response // the successful response, if any
var err error

for i := 0; i < maxRetries; i++ {

  // Make the request (MakeRequest is your own HTTP helper)
  resp, err = MakeRequest(url)

  // On failure, wait before the next attempt
  // (exponential backoff: 1s, 2s, 4s, ...)
  if err != nil {
    time.Sleep(time.Duration(1<<i) * time.Second)
    continue
  }

  // Success, break out of the loop
  break
}

The key aspects are:

  • Set a retry count cap – 5 attempts is typical
  • Wrap the request logic in a for loop so failed attempts can be retried
  • Exponential backoff – progressively increase the delay between retries
  • Break out of the loop on success

Backing off exponentially prevents overloading servers while still getting your data. Starting at 1 second and doubling on each attempt works well in practice.
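
If you want the backoff calculation in one place, a small helper keeps the retry loop readable. The following is a minimal sketch: the 1-second base, 30-second cap, and 250ms of jitter are arbitrary example values, not anything a particular library requires.

package main

import (
  "fmt"
  "math/rand"
  "time"
)

// backoff returns the delay before retry attempt i (0-based): 1s, 2s, 4s, ...
// capped at maxDelay, with a little random jitter so that many workers
// don't all retry at the same instant.
func backoff(attempt int) time.Duration {
  const base = 1 * time.Second
  const maxDelay = 30 * time.Second

  if attempt > 5 {
    attempt = 5 // larger shifts would be clipped to the cap anyway
  }

  delay := base << uint(attempt) // double on every attempt
  if delay > maxDelay {
    delay = maxDelay
  }

  // Add up to 250ms of jitter.
  return delay + time.Duration(rand.Intn(250))*time.Millisecond
}

func main() {
  for i := 0; i < 5; i++ {
    fmt.Println("attempt", i, "-> wait", backoff(i))
  }
}

Inside the retry loop you would then call time.Sleep(backoff(i)) instead of computing the delay inline; the jitter keeps many concurrent workers from retrying at the exact same moment.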

Handling Different Error Types

You should also handle different error types differently:

  • Connectivity Issues – Safe to retry
  • 4xx Client Errors – No need to retry
  • 429 Rate Limit – Use proxy or longer delay

Checking the status code handles this. Note that in Go the status code lives on the response, not on the error, so this check runs after a request that returned without a network error:

// A network error (err != nil) is always safe to retry.
// For HTTP errors, check the status code on the response instead.
if err == nil {

  if resp.StatusCode == 429 {
    // Rate limited: rotate to a proxy or wait longer, then try again
    continue
  }

  if resp.StatusCode >= 500 {
    // Retry transient server errors
    continue
  }

  if resp.StatusCode >= 400 {
    // Don't retry other 400-level client errors
    break
  }
}

This way the scraper doesn't repeatedly hit rate limits or retry requests that can never succeed.
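
To see how the pieces fit together, here is a self-contained sketch of the whole pattern in one function. It swaps the MakeRequest helper for a plain http.Get call, and the five-attempt cap and doubling delay are just the example values used above, so treat it as a starting point rather than a finished client.

package main

import (
  "fmt"
  "net/http"
  "time"
)

const maxRetries = 5

// fetchWithRetries requests url, retrying network failures, 429s and
// 5xx responses with an exponentially growing delay.
func fetchWithRetries(url string) (*http.Response, error) {
  var lastErr error

  for i := 0; i < maxRetries; i++ {
    resp, err := http.Get(url)

    // Network-level failure: always worth another attempt.
    if err != nil {
      lastErr = err
      time.Sleep(time.Duration(1<<i) * time.Second)
      continue
    }

    switch {
    case resp.StatusCode == 429 || resp.StatusCode >= 500:
      // Rate limited or transient server error: retry after a delay.
      resp.Body.Close()
      lastErr = fmt.Errorf("got status %d", resp.StatusCode)
      time.Sleep(time.Duration(1<<i) * time.Second)
      continue
    case resp.StatusCode >= 400:
      // Other client errors won't succeed on retry: give up immediately.
      resp.Body.Close()
      return nil, fmt.Errorf("permanent error: status %d", resp.StatusCode)
    }

    // Success.
    return resp, nil
  }

  return nil, fmt.Errorf("all %d attempts failed, last error: %w", maxRetries, lastErr)
}

func main() {
  resp, err := fetchWithRetries("https://example.com")
  if err != nil {
    fmt.Println("scrape failed:", err)
    return
  }
  defer resp.Body.Close()
  fmt.Println("status:", resp.StatusCode)
}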

Scraping at Scale with Smart Retries

For large-scale scraping, I recommend using a paid proxy API like ScrapingBee. The key advantage is automatic retrying of failed requests.

As per ScrapingBee's pricing model:

"You are not charged for failed requests. Our platform automatically retries failed requests up to 3 times before abandoning."

Combined with residential IPs from millions of devices, this means you get built-in retries and a random proxy on every request, making your scrapers extremely resilient without any extra code.
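
For reference, sending a request through a proxy API from Go is just another HTTP call. The sketch below assumes ScrapingBee's GET endpoint with api_key and url query parameters, and SCRAPINGBEE_API_KEY is a hypothetical environment variable name used only for this example; double-check the endpoint and parameter names against their current documentation.

package main

import (
  "fmt"
  "io"
  "net/http"
  "net/url"
  "os"
)

// scrapeViaAPI sends the target URL through a proxy API so that proxy
// rotation and retries happen on the provider's side. The endpoint and
// parameter names are assumptions based on ScrapingBee's public docs;
// verify them before relying on this.
func scrapeViaAPI(apiKey, target string) (string, error) {
  endpoint := "https://app.scrapingbee.com/api/v1/"

  params := url.Values{}
  params.Set("api_key", apiKey)
  params.Set("url", target)

  resp, err := http.Get(endpoint + "?" + params.Encode())
  if err != nil {
    return "", err
  }
  defer resp.Body.Close()

  body, err := io.ReadAll(resp.Body)
  if err != nil {
    return "", err
  }
  return string(body), nil
}

func main() {
  // SCRAPINGBEE_API_KEY is a hypothetical env var name for this example.
  html, err := scrapeViaAPI(os.Getenv("SCRAPINGBEE_API_KEY"), "https://example.com")
  if err != nil {
    fmt.Println("request failed:", err)
    return
  }
  fmt.Println("got", len(html), "bytes of HTML")
}

Because the provider rotates proxies and retries internally, the retry loop you wrap around this call can stay very simple.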

In Summary

Retrying failed requests is a must for robust production scrapers. By handling these failures gracefully, you ensure reliable data collection and can scale with confidence.

I hope this gives you a good blueprint for implementing retry logic in your Go scrapers. As always, please reach out if you have any other questions!
