The Problem: "The Website Crashes Every Time We Launch a Campaign"
A client came to us with a problem that's painfully familiar to many engineering teams — their web application would crash every time marketing ran a campaign. On normal days, the system handled traffic comfortably. But when traffic surged 10–50x within five minutes, the servers buckled.
The pattern repeated like clockwork:
- Response times jumped from 200ms to 15 seconds
- Error rates blew past 40% within the first three minutes
- Marketing had already spent the ad budget — but customers couldn't even load the page
- The DevOps team would scramble to manually scale servers, a process that took 20–30 minutes
The cost: Each 30-minute outage resulted in an estimated ฿500,000–2,000,000 (~$14,000–$57,000) in lost revenue, depending on campaign size.
Defining Success
We aligned with the client on crystal-clear targets:
| Objective | Target |
| --- | --- |
| Concurrent Users | 10,000 simultaneous |
| Response Time (P95) | ≤ 500ms |
| Error Rate | < 0.1% |
| Scale-up Time | ≤ 2 minutes (trigger to ready) |
| Scale-down Time | ≤ 10 minutes (after traffic drops) |
| Zero Downtime | No downtime during scaling events |
The hardest part wasn't "making it scale." It was proving it could scale — before the real campaign went live.
Architecture Principles
Scale Horizontally, Fail Gracefully
We designed the architecture around three core principles:
- Stateless Application — Every instance must function independently with no internal state, enabling instant horizontal scaling
- Metric-driven Scaling — Decisions based on real signals, not just CPU utilization — including request queue depth and response time
- Graceful Degradation — If the system truly can't keep up, it degrades in a controlled manner rather than collapsing entirely
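To make the third principle concrete: graceful degradation can be as simple as switching renderers based on current load. The thresholds and renderer functions below are hypothetical, not the client's actual implementation:

```python
def handle_request(load: float, render_full, render_lite):
    """Graceful degradation sketch: past a load threshold, serve a
    lighter page (e.g. no recommendations, cached prices) instead of
    letting every request time out. Thresholds are illustrative."""
    if load < 0.8:
        return render_full()
    if load < 0.95:
        return render_lite()       # degraded but still functional
    return ("retry_later", 503)    # last resort: shed load politely
```

The point is that the failure mode is chosen in advance, rather than being whatever happens when the servers fall over.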
Multi-layer Scaling Strategy
We didn't just scale the application servers. Every layer was designed to scale independently:
- Load Balancer Layer — Traffic distribution and health checking
- Application Layer — Scales based on request rate and response time
- Cache Layer — Absorbs read-heavy traffic so not every request hits the database
- Database Layer — Read replicas for queries that don't require 100% real-time data
- Queue Layer — Offloads non-urgent work (emails, analytics updates) to asynchronous processing
Scaling Triggers: Beyond CPU
Many organizations set autoscaling to trigger at CPU > 70% — sounds reasonable, but in practice it's far too slow. By the time CPU hits 70%, users are already experiencing timeouts.
We used Composite Metrics that combine multiple signals:
- Request Rate — Requests per second (a leading indicator)
- Response Time P95 — If it starts climbing, the system is about to struggle
- Active Connections — Number of connections waiting for a response
- Queue Depth — Number of tasks waiting in queue
When these signals cross their thresholds simultaneously, the system scales up immediately — without waiting for CPU to overheat.
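A minimal sketch of such a composite trigger in Python (the threshold values and the two-of-four rule are illustrative assumptions, not the production configuration):

```python
from dataclasses import dataclass

# Illustrative thresholds -- not the client's actual values.
THRESHOLDS = {
    "request_rate": 1500,       # req/s across the pool
    "p95_ms": 400,              # 95th-percentile response time
    "active_connections": 800,  # connections awaiting a response
    "queue_depth": 200,         # tasks waiting in queue
}

@dataclass
class Metrics:
    request_rate: float
    p95_ms: float
    active_connections: int
    queue_depth: int

def should_scale_up(m: Metrics, required_signals: int = 2) -> bool:
    """Scale up when several leading indicators breach at once,
    instead of waiting for CPU utilization to climb."""
    breaches = sum([
        m.request_rate > THRESHOLDS["request_rate"],
        m.p95_ms > THRESHOLDS["p95_ms"],
        m.active_connections > THRESHOLDS["active_connections"],
        m.queue_depth > THRESHOLDS["queue_depth"],
    ])
    return breaches >= required_signals
```

Requiring two or more signals to agree keeps a single noisy metric from triggering spurious scale-ups.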
Predictive Scaling: Knowing Before the Spike Arrives
For campaigns with known schedules, we added Scheduled Scaling — the system pre-warms instances 15 minutes before a campaign goes live. No waiting for traffic to arrive before reacting.
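Scheduled scaling can be sketched as a time-based floor on the instance count. The 15-minute lead matches the pre-warm window described above; the instance counts and campaign duration are hypothetical:

```python
from datetime import datetime, timedelta

PREWARM_LEAD = timedelta(minutes=15)

def desired_min_instances(now: datetime, campaign_start: datetime,
                          baseline: int = 3, prewarmed: int = 10) -> int:
    """Raise the autoscaler's floor shortly before a known campaign,
    so capacity is already warm when traffic arrives."""
    window_end = campaign_start + timedelta(hours=1)
    if campaign_start - PREWARM_LEAD <= now < window_end:
        return prewarmed
    return baseline
```

The reactive triggers still run on top of this floor; the schedule only guarantees a head start.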
Performance Testing: Proving It Before Launch Day
This is the part we care about most — if you don't test properly, you're gambling.
Test Strategy: Four Layers
We designed testing in four progressive layers, each with a distinct purpose:
1. Baseline Test — Know Your Starting Point
Test at normal traffic levels (500 concurrent users) to establish baseline performance. If the baseline is poor, scaling won't help — you'll just have more slow instances.
2. Load Test — Test the Target
Gradually ramp from 500 to 10,000 concurrent users over 20 minutes to observe:
- How quickly does the autoscaler respond?
- How fast are new instances ready to serve traffic?
- Do errors occur during scaling transitions?
3. Spike Test — Simulate Reality
This is the most critical test. Simulate traffic jumping from 500 to 8,000 within 60 seconds — mimicking a real campaign launch. The key questions:
- Can the system survive the first 60–120 seconds before new instances are ready?
- Are any requests timing out while waiting for scale-up?
4. Soak Test — Test Endurance
Run 10,000 users continuously for 4 hours to detect:
- Memory leaks
- Connection pool exhaustion
- Database connections not being returned
- Cache invalidation behaving unexpectedly
Designing Realistic Test Scenarios
A common mistake in load testing is hammering a single endpoint repeatedly — the result is a 99% cache hit rate and beautiful numbers that mean nothing in production.
We designed test scenarios that mirror actual user behavior:
| Behavior | Proportion |
| --- | --- |
| Home → Browse products → Leave | 45% |
| Home → Search → View product → Add to cart | 30% |
| Home → Search → View product → Purchase → Payment | 15% |
| Home → Login → View order history | 10% |
Each scenario includes Think Time (the pause while a user reads a page before clicking) randomized between 2–8 seconds, closely mimicking real behavior.
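The scenario mix and think time above can be sketched like this (the page names are placeholders; a real test run would issue HTTP requests at each step):

```python
import random

# Behavior mix from the table above; weights are the proportions.
SCENARIOS = [
    (["home", "browse"], 45),
    (["home", "search", "product", "cart"], 30),
    (["home", "search", "product", "purchase", "payment"], 15),
    (["home", "login", "orders"], 10),
]

def pick_scenario(rng: random.Random) -> list[str]:
    """Choose a user journey according to the observed proportions."""
    pages, weights = zip(*SCENARIOS)
    return rng.choices(pages, weights=weights, k=1)[0]

def think_time(rng: random.Random) -> float:
    """Pause 2-8 seconds between page views, like a real user."""
    return rng.uniform(2.0, 8.0)
```

Weighted journeys plus randomized think time spread requests across endpoints and over time, which is exactly what defeats the artificial 99% cache hit rate.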
Results
Baseline Test (500 Users)
| Metric | Before Optimization | After Optimization |
| --- | --- | --- |
| Response Time (P50) | 180ms | 95ms |
| Response Time (P95) | 450ms | 210ms |
| Response Time (P99) | 1,200ms | 380ms |
| Throughput | 2,800 req/s | 4,200 req/s |
| Error Rate | 0.02% | 0.01% |
Just by optimizing the baseline (query tuning, cache strategy, payload reduction), we achieved a 2x performance improvement — without adding a single instance.
Load Test (500 → 10,000 Users, 20 Minutes)
| Metric | Target | Actual |
| --- | --- | --- |
| Concurrent Users | 10,000 | 10,000 ✅ |
| Response Time (P95) | ≤ 500ms | 320ms ✅ |
| Error Rate | < 0.1% | 0.03% ✅ |
| Scale-up Time | ≤ 2 min | 90 seconds ✅ |
| Instances (Min → Max) | 3 → ? | 3 → 12 |
| Throughput (Peak) | — | 18,400 req/s |
Spike Test (500 → 8,000 Users, 60 Seconds)
This was the test we worried about most — and the results taught us several valuable lessons.
Round 1: Failed
- First 30 seconds: response time spiked to 2,800ms
- Error rate during spike: 3.2%
- Root cause: New instances needed 45 seconds of warm-up before serving traffic
What We Changed:
- Added Connection Pre-warming — New instances prepare their connection pools before accepting traffic
- Increased Minimum Instances from 3 to 5 during time windows with likely spike patterns
- Implemented Request Buffering at the load balancer — hold requests briefly rather than rejecting them immediately when backends aren't ready
- Tuned the Circuit Breaker — when a service exceeds its response time threshold, serve a cached response instead
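Of those changes, the circuit breaker is the easiest to sketch. A simplified Python version, assuming illustrative thresholds rather than the tuned production values:

```python
import time

class CircuitBreaker:
    """Serve a cached response while the backend is too slow.
    Sketch only: thresholds and cooldown are illustrative."""

    def __init__(self, threshold_ms: float = 500, max_slow: int = 3,
                 cooldown_s: float = 30):
        self.threshold_ms = threshold_ms
        self.max_slow = max_slow      # slow calls before opening
        self.cooldown_s = cooldown_s  # how long to stay open
        self.slow_count = 0
        self.opened_at = None

    def call(self, backend, cached_response, now=time.monotonic):
        if self.opened_at is not None:
            if now() - self.opened_at < self.cooldown_s:
                return cached_response   # open: serve from cache
            self.opened_at = None        # cooldown over: try again
            self.slow_count = 0
        start = now()
        result = backend()
        elapsed_ms = (now() - start) * 1000
        if elapsed_ms > self.threshold_ms:
            self.slow_count += 1
            if self.slow_count >= self.max_slow:
                self.opened_at = now()   # trip the breaker
        else:
            self.slow_count = 0
        return result
```

During a spike, a slightly stale cached page beats a timeout, which is the trade the circuit breaker makes explicit.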
Round 2: Passed
| Metric | Round 1 | Round 2 |
| --- | --- | --- |
| Response Time (P95) during Spike | 2,800ms | 480ms ✅ |
| Error Rate during Spike | 3.2% | 0.08% ✅ |
| Time to Stabilize | 3 minutes | 90 seconds ✅ |
Soak Test (10,000 Users, 4 Hours)
| Metric | Hour 1 | Hour 4 |
| --- | --- | --- |
| Response Time (P95) | 310ms | 340ms |
| Memory Usage | 62% | 68% |
| Error Rate | 0.02% | 0.03% |
| Active Instances | 12 | 12 |
No memory leaks detected — memory increased slightly as cache size grew, but remained within acceptable bounds.
Cost Analysis: How Expensive Is Autoscaling?
The question every client asks: "Won't autoscaling blow up our cloud bill?"
| Scenario | Instances | Cost/Hour | Notes |
| --- | --- | --- | --- |
| Normal (500 Users) | 3 | ฿150 ($4.30) | Baseline |
| Small Campaign (3,000 Users) | 6 | ฿300 ($8.60) | 2x scale |
| Large Campaign (10,000 Users) | 12 | ฿600 ($17.10) | 4x scale |
| Post-Campaign (Scale-down) | 3 | ฿150 ($4.30) | Returns to baseline in 10 min |
Compared to the ฿500K–2M lost per 30-minute outage, the additional ฿450/hour (~$13) during peak traffic is a remarkably good trade.
Critically, the system scales down too — during low-traffic periods, you're not paying for idle instances.
Lessons Learned
1. Optimize Before You Scale
Many organizations jump straight to autoscaling without examining baseline performance. The result: 10 instances all running slowly — 10x the cost, 0x the improvement.
We spent the first two weeks on baseline optimization (query tuning, cache strategy, payload reduction) and achieved a 2x performance gain without adding a single instance.
2. Spike Tests Matter More Than Load Tests
Load tests that gradually increase traffic always produce flattering results because the autoscaler has plenty of time to react. Spike tests — where traffic surges instantaneously — expose vulnerabilities that load tests never reveal.
If you can only run one test, make it a spike test.
3. Warm-up Time Is the Silent Killer
A new instance needs 30–60 seconds after spinning up before it can actually serve traffic. During that window, if existing instances can't handle the load, errors spike before reinforcements arrive.
The fix: Pre-warm connection pools, reduce application boot time, and set minimum instances high enough to absorb the initial wave.
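The readiness gating described here can be sketched with a hypothetical `Instance` class: the load balancer should only see an instance as healthy once its connection pool is fully warmed:

```python
class Instance:
    """New instance that only passes its readiness check after its
    connection pool is pre-warmed. Sketch: a real implementation
    would open DB/cache connections instead of placeholder objects."""

    def __init__(self, pool_size: int = 20):
        self.pool_size = pool_size
        self.pool = []

    def prewarm(self, connect=lambda: object()):
        # Open connections up front, before the load balancer
        # routes any traffic to this instance.
        while len(self.pool) < self.pool_size:
            self.pool.append(connect())

    def ready(self) -> bool:
        """Readiness probe: healthy only once the pool is full."""
        return len(self.pool) == self.pool_size
```

Wiring `ready()` into the health check moves the 30-60 second warm-up window before the instance receives traffic, instead of charging that cost to real users.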
4. Test With Real Scenarios, Not Just Homepage Hits
Load tests that repeatedly hit GET / produce dangerously misleading results because they don't test:
- Database writes (checkout, registration)
- Complex search queries
- Session management under high load
- Third-party API calls (payment gateways, SMS OTP)
5. Set Budget Alerts Alongside Autoscaling
Autoscaling without an upper limit can lead to runaway costs from DDoS attacks or bot traffic. Always configure a maximum instance limit and cost alerts.
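A hard cap is one line of logic; the specific limits below are illustrative, not a recommendation for any particular workload:

```python
def clamp_desired_instances(desired: int, minimum: int = 3,
                            maximum: int = 15) -> int:
    """Cap the autoscaler's output so bot traffic or a DDoS
    can't scale costs without bound (illustrative limits)."""
    return max(minimum, min(desired, maximum))
```

Pair the cap with a cost alert at, say, 80% of the maximum, so a sustained ceiling hit is investigated rather than silently absorbed.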
What We'd Still Improve
We believe in transparency — here's what we'd do differently or continue working on:
- Multi-region Failover — Currently running in a single region. If that region goes down, everything goes down with it
- Canary Deployment + Autoscale — We haven't tested the scenario where a canary release coincides with a traffic spike
- AI-based Predictive Scaling — Currently using rule-based scaling. In the future, an ML model could predict traffic patterns more accurately
Facing the Same Challenge?
A website that crashes during peak traffic isn't just a technical problem — it's a business problem that translates directly into lost revenue.
What Enersys can help with: assessing how much traffic your current infrastructure can actually handle, identifying bottlenecks, and building a plan to fix them — before your next campaign goes live.
Talk to Us About Infrastructure & Performance →