
Autoscale + Load Test — When Your Website Needs to Handle 10,000 Concurrent Users, How Do You Know It Actually Can?

A case study on designing autoscaling infrastructure for a web application that must survive 50x traffic spikes from marketing campaigns — complete with performance testing that proves it works before launch day.

11 Mar 2026 · 8 min read

Autoscaling · Performance Testing · Load Testing · Cloud Infrastructure · DevOps

The Problem: "The Website Crashes Every Time We Launch a Campaign"

A client came to us with a problem that's painfully familiar to many engineering teams — their web application would crash every time marketing ran a campaign. On normal days, the system handled traffic comfortably. But when traffic surged 10–50x within five minutes, the servers buckled.

The pattern repeated like clockwork:

  • Response times jumped from 200ms to 15 seconds
  • Error rates blew past 40% within the first three minutes
  • Marketing had already spent the ad budget — but customers couldn't even load the page
  • The DevOps team would scramble to manually scale servers, a process that took 20–30 minutes

The cost: Each 30-minute outage resulted in an estimated ฿500,000–2,000,000 (~$14,000–$57,000) in lost revenue, depending on campaign size.


Defining Success

We aligned with the client on crystal-clear targets:

Objective | Target
--- | ---
Concurrent Users | 10,000 simultaneous
Response Time (P95) | ≤ 500ms
Error Rate | < 0.1%
Scale-up Time | ≤ 2 minutes (trigger to ready)
Scale-down Time | ≤ 10 minutes (after traffic drops)
Zero Downtime | No downtime during scaling events

The hardest part wasn't "making it scale." It was proving it could scale — before the real campaign went live.


Architecture Principles

Scale Horizontally, Fail Gracefully

We designed the architecture around three core principles:

  1. Stateless Application — Every instance must function independently with no internal state, enabling instant horizontal scaling
  2. Metric-driven Scaling — Decisions based on real signals, not just CPU utilization — including request queue depth and response time
  3. Graceful Degradation — If the system truly can't keep up, it degrades in a controlled manner rather than collapsing entirely

Multi-layer Scaling Strategy

We didn't just scale the application servers. Every layer was designed to scale independently:

  • Load Balancer Layer — Traffic distribution and health checking
  • Application Layer — Scales based on request rate and response time
  • Cache Layer — Absorbs read-heavy traffic so not every request hits the database
  • Database Layer — Read replicas for queries that don't require 100% real-time data
  • Queue Layer — Offloads non-urgent work (emails, analytics updates) to asynchronous processing

Scaling Triggers: Beyond CPU

Many organizations set autoscaling to trigger at CPU > 70% — sounds reasonable, but in practice it's far too slow. By the time CPU hits 70%, users are already experiencing timeouts.

We used Composite Metrics that combine multiple signals:

  • Request Rate — Requests per second (a leading indicator)
  • Response Time P95 — If it starts climbing, the system is about to struggle
  • Active Connections — Number of connections waiting for a response
  • Queue Depth — Number of tasks waiting in queue

When these signals cross their thresholds simultaneously, the system scales up immediately — without waiting for CPU to overheat.
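A minimal sketch of such a composite trigger, assuming a "scale when at least N of the four signals breach" policy — the threshold values and the N=2 quorum are illustrative, not the production configuration:

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    request_rate: float      # requests per second (leading indicator)
    p95_ms: float            # 95th-percentile response time
    active_connections: int  # connections awaiting a response
    queue_depth: int         # tasks waiting in queue

# Hypothetical thresholds for illustration only
THRESHOLDS = Metrics(request_rate=3000, p95_ms=400,
                     active_connections=800, queue_depth=500)

def should_scale_up(m: Metrics, required_signals: int = 2) -> bool:
    """Scale up when at least `required_signals` leading indicators
    cross their thresholds -- no waiting for CPU to climb."""
    breached = [
        m.request_rate >= THRESHOLDS.request_rate,
        m.p95_ms >= THRESHOLDS.p95_ms,
        m.active_connections >= THRESHOLDS.active_connections,
        m.queue_depth >= THRESHOLDS.queue_depth,
    ]
    return sum(breached) >= required_signals
```

The quorum avoids flapping on a single noisy metric while still firing well before CPU saturation.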

Predictive Scaling: Knowing Before the Spike Arrives

For campaigns with known schedules, we added Scheduled Scaling — the system pre-warms instances 15 minutes before a campaign goes live. No waiting for traffic to arrive before reacting.
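The scheduling logic can be as simple as raising the instance floor inside the pre-warm window; the function below is a sketch under that assumption (after the campaign starts, the reactive autoscaler takes over):

```python
from datetime import datetime, timedelta

PREWARM_LEAD = timedelta(minutes=15)  # pre-warm window from the article

def instances_needed(now: datetime, campaign_start: datetime,
                     baseline: int, campaign_min: int) -> int:
    """Raise the minimum instance count 15 minutes before a scheduled
    campaign so capacity is ready before the first request arrives."""
    if campaign_start - PREWARM_LEAD <= now < campaign_start:
        return campaign_min
    # Outside the window, fall back to the normal floor; once the
    # campaign is live, metric-driven scaling handles the rest.
    return baseline
```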


Performance Testing: Proving It Before Launch Day

This is the part we care about most — if you don't test properly, you're gambling.

Test Strategy: Four Layers

We designed testing in four progressive layers, each with a distinct purpose:

1. Baseline Test — Know Your Starting Point

Test at normal traffic levels (500 concurrent users) to establish baseline performance. If the baseline is poor, scaling won't help — you'll just have more slow instances.

2. Load Test — Test the Target

Gradually ramp from 500 to 10,000 concurrent users over 20 minutes to observe:

  • How quickly does the autoscaler respond?
  • How fast are new instances ready to serve traffic?
  • Do errors occur during scaling transitions?

3. Spike Test — Simulate Reality

This is the most critical test. Simulate traffic jumping from 500 to 8,000 within 60 seconds — mimicking a real campaign launch. The key questions:

  • Can the system survive the first 60–120 seconds before new instances are ready?
  • Are any requests timing out while waiting for scale-up?

4. Soak Test — Test Endurance

Run 10,000 users continuously for 4 hours to detect:

  • Memory leaks
  • Connection pool exhaustion
  • Database connections not being returned
  • Cache invalidation behaving unexpectedly
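The four layers above can be expressed as staged load profiles, in the style of k6/Locust ramping stages. The durations and user counts mirror the tests described in this article; the representation itself is illustrative:

```python
# Each test is a list of (duration_seconds, target_concurrent_users)
# stages, ramping linearly toward each target.
BASELINE = [(600, 500)]                     # steady 500 users
LOAD     = [(1200, 10_000), (300, 10_000)]  # ramp 500→10k over 20 min, hold
SPIKE    = [(60, 8_000), (600, 8_000)]      # jump to 8k within 60 s, hold
SOAK     = [(4 * 3600, 10_000)]             # 10k users for 4 hours

def total_duration(stages: list[tuple[int, int]]) -> int:
    """Total wall-clock seconds a test profile runs."""
    return sum(duration for duration, _ in stages)
```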

Designing Realistic Test Scenarios

A common mistake in load testing is hammering a single endpoint repeatedly — the result is a 99% cache hit rate and beautiful numbers that mean nothing in production.

We designed test scenarios that mirror actual user behavior:

Behavior | Proportion
--- | ---
Home → Browse products → Leave | 45%
Home → Search → View product → Add to cart | 30%
Home → Search → View product → Purchase → Payment | 15%
Home → Login → View order history | 10%

Each scenario includes Think Time (the pause while a user reads a page before clicking) randomized between 2–8 seconds, closely mimicking real behavior.
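A sketch of how a load generator might pick a scenario by the weights above and insert randomized think time (scenario names are placeholders for the journeys in the table):

```python
import random

# Scenario mix from the table above; weights are the traffic proportions.
SCENARIOS = [
    ("browse",        0.45),  # Home → Browse products → Leave
    ("add_to_cart",   0.30),  # Home → Search → View product → Add to cart
    ("purchase",      0.15),  # Home → Search → View product → Purchase → Payment
    ("order_history", 0.10),  # Home → Login → View order history
]

def pick_scenario(rng: random.Random) -> str:
    """Choose a user journey with probability proportional to its weight."""
    names, weights = zip(*SCENARIOS)
    return rng.choices(names, weights=weights, k=1)[0]

def think_time(rng: random.Random) -> float:
    """Pause between page views, randomized 2–8 s like a real reader."""
    return rng.uniform(2.0, 8.0)
```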


Results

Baseline Test (500 Users)

Metric | Before Optimization | After Optimization
--- | --- | ---
Response Time (P50) | 180ms | 95ms
Response Time (P95) | 450ms | 210ms
Response Time (P99) | 1,200ms | 380ms
Throughput | 2,800 req/s | 4,200 req/s
Error Rate | 0.02% | 0.01%

Just by optimizing the baseline (query tuning, cache strategy, payload reduction), we achieved a 2x performance improvement — without adding a single instance.

Load Test (500 → 10,000 Users, 20 Minutes)

Metric | Target | Actual
--- | --- | ---
Concurrent Users | 10,000 | 10,000 ✅
Response Time (P95) | ≤ 500ms | 320ms ✅
Error Rate | < 0.1% | 0.03% ✅
Scale-up Time | ≤ 2 min | 90 seconds ✅
Instances (Min → Max) | 3 → ? | 3 → 12
Throughput (Peak) | — | 18,400 req/s

Spike Test (500 → 8,000 Users, 60 Seconds)

This was the test we worried about most — and the results taught us several valuable lessons.

Round 1: Failed

  • First 30 seconds: response time spiked to 2,800ms
  • Error rate during spike: 3.2%
  • Root cause: New instances needed 45 seconds of warm-up before serving traffic

What We Changed:

  • Added Connection Pre-warming — New instances prepare their connection pools before accepting traffic
  • Increased Minimum Instances from 3 to 5 during time windows with likely spike patterns
  • Implemented Request Buffering at the load balancer — hold requests briefly rather than rejecting them immediately when backends aren't ready
  • Tuned the Circuit Breaker — when a service exceeds its response time threshold, serve a cached response instead
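The latency-based circuit breaker can be sketched as follows — trip counts, thresholds, and cooldown are illustrative assumptions, not the tuned production values:

```python
import time

class LatencyBreaker:
    """Trip when consecutive responses exceed a latency threshold; while
    open, serve a cached response instead of hitting the slow backend."""

    def __init__(self, threshold_ms: float = 500,
                 trip_after: int = 3, cooldown_s: float = 30):
        self.threshold_ms = threshold_ms
        self.trip_after = trip_after      # consecutive slow responses to trip
        self.cooldown_s = cooldown_s      # how long to stay open
        self.slow_count = 0
        self.opened_at = None

    def record(self, latency_ms: float) -> None:
        """Feed observed latencies; a fast response resets the streak."""
        if latency_ms > self.threshold_ms:
            self.slow_count += 1
            if self.slow_count >= self.trip_after:
                self.opened_at = time.monotonic()
        else:
            self.slow_count = 0

    def call(self, backend, cached_response):
        """Short-circuit to the cached response while the breaker is open."""
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                return cached_response
            self.opened_at = None  # half-open: try the backend again
            self.slow_count = 0
        return backend()
```

A stale cached page served in 5ms beats a live page that times out — which is exactly the controlled degradation the architecture principles call for.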

Round 2: Passed

Metric | Round 1 | Round 2
--- | --- | ---
Response Time (P95) during Spike | 2,800ms | 480ms ✅
Error Rate during Spike | 3.2% | 0.08% ✅
Time to Stabilize | 3 minutes | 90 seconds ✅

Soak Test (10,000 Users, 4 Hours)

Metric | Hour 1 | Hour 4
--- | --- | ---
Response Time (P95) | 310ms | 340ms
Memory Usage | 62% | 68%
Error Rate | 0.02% | 0.03%
Active Instances | 12 | 12

No memory leaks detected — memory increased slightly as cache size grew, but remained within acceptable bounds.


Cost Analysis: How Expensive Is Autoscaling?

The question every client asks: "Won't autoscaling blow up our cloud bill?"

Scenario | Instances | Cost/Hour | Notes
--- | --- | --- | ---
Normal (500 Users) | 3 | ฿150 ($4.30) | Baseline
Small Campaign (3,000 Users) | 6 | ฿300 ($8.60) | 2x scale
Large Campaign (10,000 Users) | 12 | ฿600 ($17.10) | 4x scale
Post-Campaign (Scale-down) | 3 | ฿150 ($4.30) | Returns to baseline in 10 min

Compared to the ฿500K–2M lost per 30-minute outage, the additional ฿450/hour (~$13) during peak traffic is a remarkably good trade.

Critically, the system scales down too — during low-traffic periods, you're not paying for idle instances.


Lessons Learned

1. Optimize Before You Scale

Many organizations jump straight to autoscaling without examining baseline performance. The result: 10 instances all running slowly — 10x the cost, 0x the improvement.

We spent the first two weeks on baseline optimization (query tuning, cache strategy, payload reduction) and achieved a 2x performance gain without adding a single instance.

2. Spike Tests Matter More Than Load Tests

Load tests that gradually increase traffic always produce flattering results because the autoscaler has plenty of time to react. Spike tests — where traffic surges instantaneously — expose vulnerabilities that load tests never reveal.

If you can only run one test, make it a spike test.

3. Warm-up Time Is the Silent Killer

A new instance needs 30–60 seconds after spinning up before it can actually serve traffic. During that window, if existing instances can't handle the load, errors spike before reinforcements arrive.

The fix: Pre-warm connection pools, reduce application boot time, and set minimum instances high enough to absorb the initial wave.
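Connection pre-warming is straightforward in principle: open and exercise the full pool at boot, before the instance passes its readiness check. The sketch below uses `sqlite3` as a stand-in for a real database driver:

```python
import sqlite3  # stand-in for a real database driver

class PrewarmedPool:
    """Open the full connection pool at boot, before the instance is
    added to the load balancer, so the first real request doesn't pay
    the connection-setup cost."""

    def __init__(self, dsn: str, size: int = 10):
        self.connections = [sqlite3.connect(dsn) for _ in range(size)]

    def warm(self) -> None:
        # A trivial query on each connection forces the full
        # handshake path while the instance is still out of rotation.
        for conn in self.connections:
            conn.execute("SELECT 1")

pool = PrewarmedPool(":memory:", size=4)
pool.warm()  # run during startup, before the readiness probe passes
```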

4. Test With Real Scenarios, Not Just Homepage Hits

Load tests that repeatedly hit GET / produce dangerously misleading results because they don't test:

  • Database writes (checkout, registration)
  • Complex search queries
  • Session management under high load
  • Third-party API calls (payment gateways, SMS OTP)

5. Set Budget Alerts Alongside Autoscaling

Autoscaling without an upper limit can lead to runaway costs from DDoS attacks or bot traffic. Always configure a maximum instance limit and cost alerts.
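The safety net amounts to clamping whatever the autoscaler asks for; the limits below are illustrative, not the client's actual configuration:

```python
def clamp_desired(desired: int, min_instances: int = 3,
                  max_instances: int = 20) -> int:
    """Hard ceiling on the autoscaler's output: a DDoS or bot flood can
    inflate `desired` without bound, but the hourly bill can never
    exceed max_instances' worth of capacity."""
    return max(min_instances, min(desired, max_instances))
```

Pair the ceiling with a billing alert so that hitting it pages a human instead of silently capping capacity during a legitimate surge.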


What We'd Still Improve

We believe in transparency — here's what we'd do differently or continue working on:

  • Multi-region Failover — Currently running in a single region. If that region goes down, everything goes down with it
  • Canary Deployment + Autoscale — We haven't tested the scenario where a canary release coincides with a traffic spike
  • AI-based Predictive Scaling — Currently using rule-based scaling. In the future, an ML model could predict traffic patterns more accurately

Facing the Same Challenge?

A website that crashes during peak traffic isn't just a technical problem — it's a business problem that translates directly into lost revenue.

What Enersys can help with: assessing how much traffic your current infrastructure can actually handle, identifying bottlenecks, and building a plan to fix them — before your next campaign goes live.

Talk to Us About Infrastructure & Performance →

Related Articles

Behind Enersys's CI/CD Pipeline — From git push to a Live Website in 5 Minutes

A behind-the-scenes look at how we ship the enersys.co.th website to production — from pull request, Docker multi-stage build, and DigitalOcean Registry to a Kubernetes rolling update on a self-hosted runner

PDPA Compliance Automation — When You Manage 500,000 Records of Personal Data, How Do You Know None of It Leaks?

A case study on building an automated PDPA compliance system for an organization with over 500,000 personal data records — from data mapping and consent management to breach detection running 24/7

AI Chatbot ROI — When Customers Ask 3,000 Questions a Day and the Bot Answers Only 60% Correctly, How Do You Fix It?

A case study on improving an AI chatbot from 60% accuracy to a system with 94% accuracy, a 68% lower cost per interaction, and the ability to handle 85% of all questions without human handoff
