YouTube Blocked My Server in 15 Seconds

In the previous post, I built a Ruby client that fetches YouTube transcripts via the InnerTube player API. Clean implementation, no external dependencies, works perfectly. The next step was obvious: run it across 6,000 videos on the server and populate the knowledge base.

That step lasted about 15 seconds.

The bulk run

Guia is a spiritist content platform with a RAG-powered chat. The chat searches through video transcripts, so every public video needs its captions pulled, chunked, and embedded. I had around 6,000 videos queued for transcript fetching, each one its own background job via Solid Queue.

I deployed, the jobs started processing, and the queue drained fast. Within 15 seconds the count dropped from 5,853 to 4,371. Most of those were returning "no captions available" immediately, which is expected for livestreams and older content. But the ones that should have returned transcripts were also failing. Every single one.

I SSH'd into the server and ran a quick curl against a video I knew had captions:

curl -sI "https://www.youtube.com/watch?v=some_video_id"

302 redirect to google.com/sorry/index. YouTube's CAPTCHA page. The server's IP was banned.

Trying the back door

The TranscriptClient fetches transcripts in three steps: scrape the watch page for an API key, call InnerTube for a caption URL, download the XML. The ban hit at step one because the watch page itself was returning the CAPTCHA redirect.
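Condensed, that flow looks roughly like this. The class and method names below are illustrative stand-ins, not the actual TranscriptClient internals:

```ruby
require "net/http"
require "json"
require "uri"

# Sketch of the three-step transcript fetch. Names are illustrative.
class TranscriptFlowSketch
  def fetch(video_id)
    html = get("https://www.youtube.com/watch?v=#{video_id}") # 1. scrape watch page
    api_key = extract_api_key(html)
    player = innertube_player(api_key, video_id)              # 2. ask InnerTube for caption tracks
    track = player.dig("captions", "playerCaptionsTracklistRenderer",
                       "captionTracks", 0)
    get(track.fetch("baseUrl"))                               # 3. download the caption XML
  end

  # The API key sits in the watch page's inline config JSON.
  def extract_api_key(html)
    html[/"INNERTUBE_API_KEY":"([^"]+)"/, 1]
  end

  private

  def get(url)
    Net::HTTP.get(URI(url))
  end

  def innertube_player(api_key, video_id)
    uri = URI("https://www.youtube.com/youtubei/v1/player?key=#{api_key}")
    req = Net::HTTP::Post.new(uri.request_uri, "Content-Type" => "application/json")
    req.body = {
      context: { client: { clientName: "WEB", clientVersion: "2.20250101" } },
      videoId: video_id
    }.to_json
    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |h| h.request(req) }
    JSON.parse(res.body)
  end
end
```

Step one is the weak link: a CAPTCHA redirect means no HTML, no API key, and nothing downstream can run.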

I figured the fix might be to skip the watch page entirely. The Ruby Events project uses a different approach: POST directly to YouTube's get_transcript endpoint with protobuf-encoded parameters. No watch page scrape, no API key extraction. I matched their exact request format, including the protobuf encoding and client context:

uri = URI("https://www.youtube.com/youtubei/v1/get_transcript")
req = Net::HTTP::Post.new(uri.request_uri)
req["Content-Type"] = "application/json"
req.body = {
  context: { client: { clientName: "WEB", clientVersion: "2.20250101" } },
  params: Base64.strict_encode64(protobuf_payload)
}.to_json

400 error. FAILED_PRECONDITION.

I tried adding cookies from a fresh youtube.com visit. Added more headers. Swapped client versions. Every combination returned the same thing. The issue wasn't the endpoint or the request format. YouTube blocks server IPs from all transcript methods. You can't work around it by changing which internal API you call.

This is well-documented in open source issue trackers once you know what to search for. The Python youtube-transcript-api has open issues about cloud IPs getting blocked. ReVanced has similar reports. The consensus is that YouTube fingerprints requests by source IP range and rejects anything that looks like a datacenter.

The data damage

While I was debugging the IP ban, I noticed something worse. The FetchTranscriptJob had a simple rescue clause:

def perform(video)
  # ...
rescue YouTube::TranscriptClient::TranscriptNotAvailable
  video.update!(status: :no_transcript)
end

When the watch page returned a CAPTCHA redirect instead of HTML, the client couldn't extract the API key and raised TranscriptNotAvailable. Technically correct, from the exception's perspective, but semantically wrong. The video wasn't missing captions. The server was blocked. And the job had marked 7,500+ videos as permanently having no transcript, a status that the pipeline treats as final and never retries.

The fix was adding a separate exception class:

class RateLimited < StandardError; end

def fetch_api_key(video_id)
  # ...
  response = http.request(req)

  if response.is_a?(Net::HTTPRedirection) &&
      response["location"]&.include?("google.com/sorry")
    raise RateLimited, "YouTube rate-limited this IP"
  end
  # ...
end

Then resetting all 7,625 wrongly-marked videos back to their previous state.
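A sketch of what that reset can look like from a Rails console, assuming the affected videos previously sat in the archive_approved state and scoping by a hypothetical deploy window (the actual cleanup may have scoped differently):

```ruby
# One-off console fix (sketch). The window is a hypothetical placeholder
# for the period during which the bad deploy was mis-marking videos.
window = Time.utc(2025, 1, 1, 12)..Time.utc(2025, 1, 1, 13)

Video.where(status: :no_transcript, updated_at: window)
     .update_all(status: Video.statuses[:archive_approved])
```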

Inverting the architecture

The server's IP is burned. Proxies are an option, but rotating residential proxies for 6,000 videos felt like building infrastructure to work around a problem that has a simpler solution: my laptop isn't blocked.

YouTube doesn't ban residential IPs from normal-volume transcript fetching. I'd been fetching transcripts locally during development without any issues. So instead of figuring out how to make the server fetch from YouTube, I made the server stop trying. The server would become an API that accepts transcripts, and my local machine would do the fetching.

The architecture is straightforward:

┌──────────────┐     GET /api/transcripts/next      ┌──────────────┐
│              │ ◄─────────────────────────────────── │              │
│    Server    │                                      │  Local Mac   │
│   (Kamal)    │     POST /api/transcripts            │  (launchd)   │
│              │ ◄─────────────────────────────────── │              │
└──────────────┘                                      └──────────────┘
                                                            │
                                                            ▼
                                                      ┌──────────────┐
                                                      │   YouTube    │
                                                      └──────────────┘

The server exposes two endpoints. GET /api/transcripts/next returns the next video that needs a transcript. POST /api/transcripts accepts the result and returns the next pending video in the same response. That second detail matters: combining "submit result" and "get next work item" into a single request cuts the round trips in half.

The server side

The API uses bearer token auth with a token stored in Rails credentials:

module Api
  class BaseController < ActionController::API
    before_action :authenticate

    private

    def authenticate
      token = request.headers["Authorization"]&.delete_prefix("Bearer ")
      expected = Rails.application.credentials.transcript_api_token

      unless expected.present? && token.present? &&
             ActiveSupport::SecurityUtils.secure_compare(token, expected)
        head :unauthorized
      end
    end
  end
end

The transcripts controller handles both directions of the pipeline:

module Api
  class TranscriptsController < BaseController
    def next_pending
      video = Video.where(status: :archive_approved)
                   .where(raw_transcript: nil)
                   .order(recorded_on: :desc)
                   .first

      if video
        render json: { id: video.id, video_id: video.video_id, title: video.title }
      else
        head :no_content
      end
    end

    def create
      video = Video.find(params[:id])

      if params[:no_transcript]
        video.update!(status: :no_transcript)
      else
        video.update!(
          raw_transcript: params[:segments],
          plain_transcript: params[:segments].map { |s| s[:text] }.join(" ")
        )
      end

      # Return next pending video in the same response
      next_video = Video.where(status: :archive_approved)
                        .where(raw_transcript: nil)
                        .where.not(id: video.id)
                        .order(recorded_on: :desc)
                        .first

      if next_video
        render json: { id: next_video.id, video_id: next_video.video_id, title: next_video.title }
      else
        head :no_content
      end
    end
  end
end
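For completeness, the routes wiring these two actions might look like this (a sketch; the exact route declarations are an assumption):

```ruby
# config/routes.rb (assumed wiring; action names match the controller above)
namespace :api do
  get  "transcripts/next", to: "transcripts#next_pending"
  post "transcripts",      to: "transcripts#create"
end
```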

The server-side background jobs that used to fetch transcripts are neutered with an environment variable guard:

class FetchTranscriptJob < ApplicationJob
  def perform(video)
    return unless ENV["FETCH_TRANSCRIPTS_ENABLED"] == "true"
    # ...
  end
end

That variable isn't set in config/deploy.yml, so the job is a no-op on the server. The job class still exists because other parts of the codebase reference it, but it never does anything in production.

The local side

A rake task runs the fetch loop:

# lib/tasks/transcripts.rake
namespace :transcripts do
  task fetch: :environment do
    lockfile = Rails.root.join("tmp/transcript_fetch.lock")
    lock_fh = File.open(lockfile, File::RDWR | File::CREAT)

    unless lock_fh.flock(File::LOCK_EX | File::LOCK_NB)
      puts "Another instance is running. Exiting."
      exit 0
    end

    lock_fh.truncate(0)
    lock_fh.write(Process.pid.to_s)
    lock_fh.flush

    api_url = ENV.fetch("TRANSCRIPT_API_URL", "https://guia.espirita.club")
    token = Rails.application.credentials.transcript_api_token
    client = YouTube::TranscriptClient.new
    fetched = 0
    no_transcript = 0

    # Get first video
    video = api_get("#{api_url}/api/transcripts/next", token)

    while video
      begin
        segments = client.fetch(video["video_id"])
        response = api_post("#{api_url}/api/transcripts", token, {
          id: video["id"], segments: segments
        })
        fetched += 1
      rescue YouTube::TranscriptClient::TranscriptNotAvailable
        response = api_post("#{api_url}/api/transcripts", token, {
          id: video["id"], no_transcript: true
        })
        no_transcript += 1
      rescue YouTube::TranscriptClient::RateLimited
        puts "Rate limited. Stopping."
        break
      end

      video = response # POST returns next video
    end

    puts "Done. Fetched: #{fetched}, No transcript: #{no_transcript}"
  end
end

The flock at the top is important. An earlier version used a PID file, which works fine until you kill -9 the process and the stale PID file blocks all future runs. flock is a kernel-level lock that the OS releases when the process dies, regardless of how it dies. The file stays on disk but the lock is gone, so the next run acquires it cleanly.

The rate limiting strategy is deliberately simple: if YouTube returns a CAPTCHA redirect, stop. No delays between requests, no exponential backoff, no retries. The next hourly cron run picks up where this one left off. YouTube's ban seems to reset within an hour for residential IPs, so this approach naturally stays under whatever threshold triggers the block.
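The api_get and api_post helpers the task calls aren't shown above. A minimal version might look like the following, assuming the server answers 204 No Content when the queue is empty, which both helpers translate to nil so the loop exits:

```ruby
require "net/http"
require "json"
require "uri"

# Minimal versions of the helpers the rake task assumes.
# The real implementations may differ.
def parse_api_response(response)
  return nil if response.is_a?(Net::HTTPNoContent) # queue drained: no next video
  raise "API error: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end

def api_get(url, token)
  uri = URI(url)
  req = Net::HTTP::Get.new(uri.request_uri)
  req["Authorization"] = "Bearer #{token}"
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(req)
  end
  parse_api_response(res)
end

def api_post(url, token, payload)
  uri = URI(url)
  req = Net::HTTP::Post.new(uri.request_uri)
  req["Authorization"] = "Bearer #{token}"
  req["Content-Type"] = "application/json"
  req.body = payload.to_json
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(req)
  end
  parse_api_response(res)
end
```

Note that Net::HTTPNoContent must be checked before Net::HTTPSuccess, since it is a subclass of it.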

Scheduling with launchd

On macOS, launchd is the right way to schedule recurring tasks. A plist in ~/Library/LaunchAgents/ handles the hourly runs:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.guia.transcript-fetch</string>
    <key>WorkingDirectory</key>
    <string>/Users/henrique/code/guia</string>
    <key>ProgramArguments</key>
    <array>
        <string>bin/rails</string>
        <string>transcripts:fetch</string>
    </array>
    <key>StartInterval</key>
    <integer>3600</integer>
    <key>StandardOutPath</key>
    <string>/Users/henrique/code/guia/log/transcripts/launchd.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/henrique/code/guia/log/transcripts/launchd.log</string>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/Users/henrique/.local/share/mise/shims:/opt/homebrew/bin:/usr/bin:/bin</string>
    </dict>
</dict>
</plist>

Install and start it:

cp config/launchd/com.guia.transcript-fetch.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.guia.transcript-fetch.plist

The PATH in the plist is critical. launchd runs with a minimal environment that doesn't include mise shims or Homebrew's bin directory. Without the explicit PATH, bin/rails can't find Ruby.

The same pattern handles embeddings. A second plist runs bin/rails embeddings:generate hourly, fetching transcript text from the server, running it through Ollama's bge-m3 locally, and posting the resulting vectors back. Local embedding generation runs about 10x faster than on the CPU-only production server, which was a nice side effect of the architectural inversion.
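The embedding call itself can be sketched like this, assuming Ollama's default port and its /api/embeddings endpoint; the helper names are hypothetical:

```ruby
require "net/http"
require "json"
require "uri"

OLLAMA_URL = "http://localhost:11434/api/embeddings" # Ollama's default port

# Builds the JSON payload for Ollama's embeddings endpoint.
def embed_payload(model, text)
  { model: model, prompt: text }.to_json
end

# Fetches a vector for one chunk of transcript text (hypothetical helper;
# the real task batches chunks and posts the vectors back to the server API).
def embed(text, model: "bge-m3")
  uri = URI(OLLAMA_URL)
  req = Net::HTTP::Post.new(uri.request_uri, "Content-Type" => "application/json")
  req.body = embed_payload(model, text)
  res = Net::HTTP.start(uri.host, uri.port) { |h| h.request(req) }
  JSON.parse(res.body).fetch("embedding")
end
```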

The pattern

The interesting thing about this solution isn't the specific implementation. It's the inversion. The conventional architecture for data pipelines is: server runs jobs, server fetches external data, server processes it. When the external service blocks server IPs, the instinct is to fix the server: add proxies, rotate IPs, add delays.

The alternative is to ask who actually has access. My laptop fetches YouTube transcripts without issues. It's not a server. It's not in a datacenter IP range. YouTube doesn't care about it. So instead of making the server pretend to not be a server, I made it stop trying to be the fetcher entirely. The server became the API, and the machine with access became the worker.

This applies to any service that rate-limits or blocks datacenter IPs. Social media scrapers, search engine data, any third-party service that distinguishes between "real users" and "servers" by IP reputation. Instead of building increasingly complex server-side workarounds, consider whether you already have a machine that can do the fetching. A local dev machine, an office server on a residential connection, a Raspberry Pi on a home network. Make your production server the receiver, not the fetcher.

The first batch run from my laptop processed 10 transcripts in 32 seconds. The hourly cron chips away at the backlog steadily, processing over a thousand videos per run. No proxies, no IP rotation, no clever request timing. Just the right machine doing the fetching.