This is the third post in a series about migrating a large Rails app from RSpec to Minitest. The second post covered fixture design. This one is about the hardest part of the whole migration: making Elasticsearch tests run in parallel.
BSPK has a lot of search. Shoppers, items, notes, tags, feed posts, sales associates. Most of these are backed by Elasticsearch via Searchkick, and they all had specs that indexed data and asserted on search results. When the test suite ran serially, this worked fine. Every test had the index to itself. When I turned on parallel testing with twelve workers, everything broke.
The problem
Minitest's parallel testing gives each worker its own database. Fixtures load into each worker's DB independently, transactions roll back between tests, and there's no cross-contamination. But Elasticsearch isn't a database. It's a shared external service. All twelve workers were hitting the same ES cluster, writing to the same indexes, and reading each other's data.
A test in worker 3 would index five shoppers and assert that a search returned exactly five results. Meanwhile, worker 7 had just indexed its own shoppers into the same index. The search returned twelve results. Test fails.
The flakiness was maddening because it was timing-dependent. Run the suite once, three failures. Run it again, different failures. Run it a third time, all green. Classic parallel race condition.
Per-worker index prefixes
The fix was straightforward once I understood the problem. Each parallel worker gets its own Elasticsearch index prefix, so their data never overlaps.
In test_helper.rb:
ENV["SEARCHKICK_INDEX_PREFIX"] = "test#{ENV.fetch('TEST_ENV_NUMBER', nil)}"

module ActiveSupport
  class TestCase
    parallelize(workers: :number_of_processors)

    parallelize_setup do |worker|
      ENV["SEARCHKICK_INDEX_PREFIX"] = "test#{worker}"
      Searchkick.index_prefix = "test#{worker}"

      # Clear cached index objects so models pick up the new prefix
      Searchkick.models.each do |model|
        model.instance_variable_set(:@searchkick_index, nil)
      end
    end
  end
end
Worker 0 writes to test0_shoppers, worker 1 writes to test1_shoppers, and so on. Same isolation model as the per-worker databases, just applied to Elasticsearch.
The cache clearing is important. Searchkick memoizes the index object on each model class. Without clearing it, the model would keep using the prefix from before the fork, and you'd be right back to shared indexes.
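To make the isolation concrete, here is a minimal sketch that swaps Elasticsearch for an in-memory hash. FakeCluster, index_name, and run_workers are hypothetical stand-ins invented for this illustration, not Searchkick APIs, and threads play the role of parallel workers.

```ruby
# Toy stand-in for a shared Elasticsearch cluster: index name => documents.
# FakeCluster and index_name are hypothetical, not Searchkick APIs.
class FakeCluster
  def initialize
    @indexes = Hash.new { |hash, name| hash[name] = [] }
    @lock = Mutex.new
  end

  def index(name, doc)
    @lock.synchronize { @indexes[name] << doc }
  end

  def count(name)
    @lock.synchronize { @indexes[name].size }
  end
end

# Builds an index name from an optional prefix, e.g. "test3_shoppers".
def index_name(prefix, base)
  [prefix, base].compact.join("_")
end

# Each simulated worker indexes five shoppers. Once all workers finish,
# report how many documents each worker sees in "its" index.
def run_workers(cluster, worker_count, prefixed:)
  names = worker_count.times.map do |worker|
    index_name(prefixed ? "test#{worker}" : nil, "shoppers")
  end

  threads = names.each_with_index.map do |name, worker|
    Thread.new do
      5.times { |i| cluster.index(name, "w#{worker}-shopper#{i}") }
    end
  end
  threads.each(&:join)

  names.map { |name| cluster.count(name) }
end

shared   = run_workers(FakeCluster.new, 3, prefixed: false)
isolated = run_workers(FakeCluster.new, 3, prefixed: true)
puts "shared index counts:   #{shared.inspect}"
puts "prefixed index counts: #{isolated.inspect}"
```

With a shared name every worker counts everyone's documents. With a per-worker prefix each worker only ever sees its own five, which is the property the exact-count assertions depend on.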
Disabling callbacks globally
Searchkick hooks into ActiveRecord callbacks to automatically index records on create, update, and destroy. That's great in production, but in tests it means every fixture load triggers an ES index operation. With thirty fixture files loading into twelve workers simultaneously, that's a lot of unnecessary indexing.
I disabled callbacks globally in test_helper.rb:
Searchkick.disable_callbacks
Tests that need search behavior opt in explicitly:
def with_searchkick(&block)
  Searchkick.callbacks(true, &block)
end
This way, a model test that checks validations never touches Elasticsearch. Only the tests that actually exercise search pay the indexing cost.
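The shape of "off globally, on inside a block" can be sketched without Searchkick at all. ToyIndexer below is a hypothetical stand-in that only mimics that shape. It is not how Searchkick implements callbacks, and record_saved is an invented hook name.

```ruby
# Stand-in for a library-wide callback switch. ToyIndexer is hypothetical and
# only mimics the shape of "disabled globally, enabled inside a block".
module ToyIndexer
  @enabled = true
  @indexed = []

  class << self
    attr_reader :indexed

    def disable_callbacks
      @enabled = false
    end

    def callbacks_enabled?
      @enabled
    end

    # Temporarily override the switch and always restore the previous value,
    # even if the block raises.
    def callbacks(value)
      previous = @enabled
      @enabled = value
      yield
    ensure
      @enabled = previous
    end

    # What a model's save hook would call.
    def record_saved(name)
      @indexed << name if @enabled
    end
  end
end

ToyIndexer.disable_callbacks                # test_helper.rb: off for everyone
ToyIndexer.record_saved("fixture shopper")  # ignored, no indexing cost

ToyIndexer.callbacks(true) do               # a search test opts in
  ToyIndexer.record_saved("Alice")
end

ToyIndexer.record_saved("Bob")              # off again after the block
puts ToyIndexer.indexed.inspect
```

The detail worth copying is the restore in ensure: a failing search test should not leave indexing switched on for whatever test runs next in the same worker.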
The safe_reindex pattern
Every search test needs to get data into ES before it can assert on results. The naive approach is to call Model.reindex and hope for the best. In parallel, "hope for the best" fails about 30% of the time.
The pattern I landed on:
module ElasticsearchTestHelper
  def safe_reindex(model_class)
    model_class.instance_variable_set(:@searchkick_index, nil)
    model_class.reindex(async: false, mode: :inline, refresh: false)
    model_class.instance_variable_set(:@searchkick_index, nil)
  end

  def with_searchkick(&block)
    Searchkick.callbacks(true, &block)
  end
end
Two things to note. First, the @searchkick_index cache is cleared both before and after the reindex. Before, so that Searchkick creates a fresh timestamped index with the current worker's prefix. After, so that subsequent calls see the new index name (Searchkick appends a timestamp to each reindex).
Second, there's no index.delete call. An earlier version had one:
index = model_class.searchkick_index
index.delete if index.exists?
model_class.reindex(...)
This caused intermittent 404 errors under parallel load. The problem was a race condition: Searchkick's reindex already creates a new timestamped index, imports data, swaps the alias, and cleans up old indexes. The explicit delete before reindex was redundant, and under high concurrency, the delete would sometimes hit right as another operation was reading the alias. Removing it fixed the last source of flaky ES failures.
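One way to picture the window the explicit delete opens is a toy model of aliases. ToyCluster, observe, and the _v1/_v2 index names are all hypothetical. The sketch only tracks which concrete index an alias points to and says nothing about Elasticsearch internals.

```ruby
# Toy model of index aliases. ToyCluster is hypothetical; it only tracks
# which concrete index an alias points to, to show where the gap opens.
class ToyCluster
  def initialize
    @indexes = {}   # concrete index name => documents
    @aliases = {}   # alias name => concrete index name
  end

  def create_index(name, docs)
    @indexes[name] = docs
  end

  def delete_index(name)
    @indexes.delete(name)
    @aliases.delete_if { |_alias, target| target == name }
  end

  def point_alias(alias_name, index_name)
    @aliases[alias_name] = index_name
  end

  # Returns documents, or :not_found when the alias resolves to nothing
  # (the toy equivalent of a 404).
  def search(alias_name)
    target = @aliases[alias_name]
    target && @indexes.key?(target) ? @indexes[target] : :not_found
  end
end

# Records what a concurrent reader would see after each step.
def observe(cluster, steps)
  steps.map do |step|
    step.call
    cluster.search("test0_shoppers")
  end
end

def seeded_cluster
  ToyCluster.new.tap do |c|
    c.create_index("test0_shoppers_v1", ["old"])
    c.point_alias("test0_shoppers", "test0_shoppers_v1")
  end
end

a = seeded_cluster
delete_first = observe(a, [
  -> { a.delete_index("test0_shoppers_v1") },            # explicit delete
  -> { a.create_index("test0_shoppers_v2", ["new"]) },
  -> { a.point_alias("test0_shoppers", "test0_shoppers_v2") }
])

b = seeded_cluster
swap_only = observe(b, [
  -> { b.create_index("test0_shoppers_v2", ["new"]) },
  -> { b.point_alias("test0_shoppers", "test0_shoppers_v2") },
  -> { b.delete_index("test0_shoppers_v1") }             # cleanup afterwards
])

puts "delete first: #{delete_first.inspect}"
puts "swap only:    #{swap_only.inspect}"
```

In the delete-first ordering a reader hits :not_found for two of the three steps. In the swap-only ordering the alias always resolves to something, which is why leaving the whole sequence to reindex is the safer choice.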
I verified this with five consecutive full suite runs: 7,500+ tests each, zero failures.
The clean-room company
The fixture design post mentioned a three-company structure: Vista (primary data), Art Gallery (cross-tenant), and ES Test (clean-room). The clean-room company exists specifically for search tests.
# Clean-room company for Elasticsearch tests. It has NO shoppers,
# store_visits, chats, or other records so safe_reindex produces
# a known-empty baseline.
es_test:
  name: ES Test Company
  dns_names: "{es-test.bspk.com}"
  external_id_str: es_test_company
  abbreviated_name: ES
There are matching fixtures for a store, two sales associates, and their accounts, all accessible through helpers:
def es_company = companies(:es_test)
def es_store = stores(:es_test_store)
def es_sa1 = sales_associates(:es_test_sa1)
def es_sa2 = sales_associates(:es_test_sa2)
When a search test starts, it calls safe_reindex on the relevant model class. Because the ES Test company has zero child records in fixtures, the initial index is empty. The test then creates exactly the records it needs using inline factory helpers, re-indexes, and asserts on known data.
No surprise records from other fixtures. No bleeding from other tests. The test controls the entire search state.
before_all for expensive setup
Some search test classes have heavy setup: creating dozens of records with specific attributes, then reindexing. The shopper finder tests, for example, create shoppers with different names, emails, phone formats, gender values, and contact preferences to exercise every search filter.
Running that setup before every test method was adding up. Six test classes were taking three times longer than they needed to because the same twenty records were being created and indexed sixty times.
TestProf's before_all runs setup once per test class and wraps it in a transaction that persists across all test methods:
class ShoppersFinderTest < ActiveSupport::TestCase
  include ElasticsearchTestHelper
  include InlineFactoryHelpers
  include BeforeAll

  before_all(setup_fixtures: true) do
    @company = es_company
    @sa = es_sa1
    @shopper1 = create_shopper(company: @company, store: es_store,
                               first_name: "Alice", last_name: "Smith", email: "[email protected]")
    @shopper2 = create_shopper(company: @company, store: es_store,
                               first_name: "Bob", last_name: "Jones", phone: "+15551234567")
    # ... 15 more shoppers with specific attributes
    safe_reindex(ElasticSearch::SearchClient)
  end

  setup do
    @company.reload # reset any mutations from previous test
  end

  def test_search_by_name
    results = SalesAssociate::ShoppersFinder.new(@sa, query: "Alice").results
    assert_includes results, @shopper1
    refute_includes results, @shopper2
  end
end
The setup_fixtures: true flag is required in Rails 8 to make fixture data available inside the before_all block. The setup block calls .reload on objects that tests might have mutated (changing a filter, updating an attribute) so each test sees fresh state.
The before_all_helper.rb also patches Minitest to deactivate the previous class's transaction when switching between test classes in a parallel worker. Without this, the transaction from one before_all class could leak into the next class running in the same worker:
Minitest.singleton_class.prepend(Module.new do
  def run_one_method(klass, method_name)
    prev = defined?(@previous_klass) ? @previous_klass : nil
    if prev && prev != klass && prev.respond_to?(:before_all_executor)
      prev.before_all_executor&.deactivate!
    end
    @previous_klass = klass
    super
  end
end)
This was a fun one to debug. Tests would pass in isolation, pass when running a single file, but fail when running the full suite because an unrelated test class's before_all transaction was still open.
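The prepend-and-super shape of that patch is easier to follow on a stand-in. In this sketch ToyRunner plays the part of Minitest and ToyExecutor the part of whatever object owns the class-level transaction. Both are hypothetical, as is the executor method name. Nothing here touches TestProf or Minitest.

```ruby
# ToyRunner stands in for Minitest, and ToyExecutor for whatever object owns a
# class-level transaction. Both are hypothetical; only the prepend + super
# shape mirrors the patch above.
class ToyExecutor
  attr_reader :active

  def initialize
    @active = true
  end

  def deactivate!
    @active = false
  end
end

module ToyRunner
  def self.run_one_method(klass, method_name)
    "#{klass.name}##{method_name}"
  end
end

ToyRunner.singleton_class.prepend(Module.new do
  def run_one_method(klass, method_name)
    prev = defined?(@previous_klass) ? @previous_klass : nil
    if prev && prev != klass && prev.respond_to?(:executor)
      prev.executor&.deactivate!   # close out the class we just left
    end
    @previous_klass = klass
    super                          # then run the test as usual
  end
end)

class FirstTest
  def self.executor
    @executor ||= ToyExecutor.new
  end
end

class SecondTest; end   # no executor; must be skipped safely

ToyRunner.run_one_method(FirstTest, :test_a)
ToyRunner.run_one_method(FirstTest, :test_b)
still_open = FirstTest.executor.active      # same class so far, nothing closed
ToyRunner.run_one_method(SecondTest, :test_c)
puts "open within class: #{still_open}, open after switch: #{FirstTest.executor.active}"
```

The wrapper only acts on the transition between classes, so methods within one class keep sharing their setup, and classes that never defined an executor pass straight through to super.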
VCR and the body matching problem
Some of our search-adjacent code calls LLMs (the natural language search feature translates English queries into Elasticsearch DSL). These HTTP calls are recorded with VCR cassettes. When we went parallel, the cassettes stopped matching.
The issue: our VCR setup matched requests by method, URI, and body (VCR's default is method and URI only). The request body includes the system prompt, which includes the full site content (for the AI chat agent). Every time a blog post changed or a new record was added, the body changed, and the cassette didn't match.
On top of that, Elasticsearch index names in the request body now included worker-specific prefixes (test0_shoppers vs test1_shoppers), so the same test recorded on worker 0 wouldn't match when replayed on worker 3.
The fix was a custom request matcher that normalizes both problems:
VCR.configure do |c|
  c.register_request_matcher :normalized_body do |request_1, request_2|
    normalize = ->(body) {
      return body if body.nil? || body.empty?

      normalized = body.dup
      # Strip worker-specific ES index prefixes
      normalized.gsub!(VCR_INDEX_PREFIX_PATTERN, VCR_NORMALIZED_INDEX_NAME)
      # Strip LLM system prompts that change with content updates
      normalized.gsub!(/"role"\s*:\s*"(system|developer)".*?(?="role")/, "")
      normalized
    }
    normalize.call(request_1.body) == normalize.call(request_2.body)
  end
end
Cassettes are now recorded with normalized bodies, and replayed with the same normalization. The index prefix test3_shoppers in a live request matches test_shoppers in the cassette. The system prompt with yesterday's blog posts matches the cassette from last week.
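The normalization can be tried outside VCR on two hand-built bodies. The values of INDEX_PREFIX_PATTERN and NORMALIZED_INDEX_PREFIX below are assumptions chosen to match the test3_shoppers to test_shoppers example. The real VCR_INDEX_PREFIX_PATTERN and VCR_NORMALIZED_INDEX_NAME constants are not shown in this post, and request_body is a hypothetical helper.

```ruby
require "json"

# Assumed values for illustration only; the post does not show the real
# VCR_INDEX_PREFIX_PATTERN or VCR_NORMALIZED_INDEX_NAME constants.
INDEX_PREFIX_PATTERN = /\btest\d+_/
NORMALIZED_INDEX_PREFIX = "test_"
SYSTEM_MESSAGE_PATTERN = /"role"\s*:\s*"(system|developer)".*?(?="role")/

def normalize_body(body)
  return body if body.nil? || body.empty?

  body
    .gsub(INDEX_PREFIX_PATTERN, NORMALIZED_INDEX_PREFIX)
    .gsub(SYSTEM_MESSAGE_PATTERN, "")
end

# Hypothetical request shape: one system message followed by one user message.
def request_body(prompt:, index:)
  JSON.generate(
    messages: [
      { role: "system", content: prompt },
      { role: "user", content: "search #{index} for Alice" }
    ]
  )
end

recorded = request_body(prompt: "site content, last week", index: "test0_shoppers")
live     = request_body(prompt: "site content, today",     index: "test3_shoppers")

puts recorded == live
puts normalize_body(recorded) == normalize_body(live)
puts normalize_body(live)
```

Two caveats follow from the regex itself. The lookahead only strips a system message when another "role" key comes after it, and without the multiline flag the match will not cross a newline, so the sketch assumes single-line JSON with the system message first.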
The parallelize(workers: 1) trap
The most expensive mistake I made was a subtle one. During the initial migration of finder specs (Phase 7), I added parallelize(workers: 1) to every ES-backed test class. My reasoning: these tests are fragile, let's run them serially to avoid issues.
What I didn't realize is that Rails' parallelization is all-or-nothing at the suite level. parallelize configures one executor for the whole test run, not a per-class setting. If any test class sets parallelize(workers: 1), Rails falls back to running the entire suite in a single process. Not just that class. Everything.
The suite was running in about 400 seconds. I assumed that was normal for the volume of tests. When I removed all the parallelize(workers: 1) overrides and let Rails use all twelve cores, it dropped to 82 seconds. I'd been running the full suite serially for three days without realizing it.
The lesson was simple: don't use parallelize(workers: 1) on individual classes. Either fix the parallel isolation issue, or if you really need serial execution, use a different mechanism (like TestProf's before_all to reduce per-test cost).
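A cheap guard against the override creeping back in is to scan the test tree for it. This is a hypothetical sketch. find_serial_overrides and the SERIAL_OVERRIDE pattern are names invented here, and the demo runs against a throwaway directory rather than a real suite.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical guard: list test files that pin parallelize to a single worker.
# Commented-out lines and values like workers: 12 are not matched.
SERIAL_OVERRIDE = /^\s*parallelize\s*\(?\s*workers:\s*1\b/

def find_serial_overrides(root)
  Dir.glob(File.join(root, "**", "*_test.rb")).sort.select do |path|
    File.foreach(path).any? { |line| line.match?(SERIAL_OVERRIDE) }
  end
end

# Demo against a throwaway directory so the sketch runs on its own.
offenders = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "finders"))
  File.write(File.join(root, "finders", "shoppers_finder_test.rb"),
             "class ShoppersFinderTest\n  parallelize(workers: 1)\nend\n")
  File.write(File.join(root, "shopper_test.rb"),
             "class ShopperTest\n  # parallelize(workers: 1) removed\nend\n")
  find_serial_overrides(root).map { |path| File.basename(path) }
end

puts offenders.inspect
```

Pointed at the real test directory and wired into CI, a non-empty result could fail the build, which would have surfaced the problem on day one instead of day three.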
The final numbers
After all the ES parallel work landed, the search test files went from being the slowest, flakiest part of the suite to being unremarkable. Fifty-six test files with ES integration, running across twelve parallel workers, consistently green.
The standardization commit that applied safe_reindex across all fifty-six files actually removed about 180 lines of code. The previous patterns (manual delete + reindex, inline callbacks blocks, redundant refresh calls) were all more code and less reliable.
For the six heaviest test classes, before_all cut execution time by roughly 3x. Those classes have a hundred-plus test methods each, and the setup (creating records + reindexing) only runs once.
If you're running Elasticsearch tests in a Rails app and they're either slow or flaky, the playbook is: per-worker index prefixes, globally disabled callbacks with opt-in, a clean-room fixture company with zero indexed records, safe_reindex without explicit deletes, and before_all for the expensive test classes. Every piece solves a specific problem. Skip one and you'll probably find out which one the hard way.