Online Voting System Design for Election

Designing a System to Handle Hundreds of Millions of Online Votes for an Election

Sep 12, 2024

Welcome to Coderbased, a system design newsletter that is literally about designing the systems that become the foundation of products and features.
Before we start, kindly subscribe, follow me on LinkedIn and Twitter, and share this content with your friends. Enjoy.

With US Election Day approaching, I’ve been thinking about designing a system for online voting. While voting systems may seem simple, handling hundreds of millions of votes within a short time frame could present significant challenges.

Building an Online Voting System

Requirements & Scope

In this system design challenge, we are tasked with creating an online voting system for an election scheduled on day X in the year 2024, with 250 million voters expected to participate between 08:00 AM and 12:00 PM. The system must be designed to handle and withstand heavy traffic.

Detailed Requirements & Scope:

Authentication and security are out of scope. The focus is on building a system capable of handling high traffic.
An estimated 250 million voters are expected to participate.
Each user can vote for one candidate from two or more options.
Users cannot vote more than once. If a second attempt is made, only the first vote is counted.
Users should be able to view real-time results for each state. “Real-time” means a user perceives it as real-time, so a delay of one or two minutes is acceptable.

For more clarity, see the illustration below.

Figure: Election Online Voting System Requirements (Election Online Voting Illustration)

Breaking Down the Challenge

Before diving into the solution, it's important to understand why this task is challenging. Designing an online voting system for low traffic might be straightforward, but managing high traffic at this scale is much more complex.

Typically, the traffic pattern shows a sharp spike in the first 5-10 minutes, after which it starts to flatten. This initial peak is the critical point where the system must be designed to handle the load.

There are two main use cases to address in an online voting system: inserting votes and showing live results.

Challenges in Displaying Live Results

Let’s assume that at least 50% of voters will cast their votes during the peak period (the first 5 minutes after voting opens). This translates to:

125 million voters ÷ 5 minutes (300 seconds) ≈ 417,000, or around 400,000 requests per second (RPS).

Given the unpredictability of traffic spikes, we should provision for 2x-4x the estimated load (roughly 0.8 to 1.7 million RPS), which means the system should be designed to handle about 1 to 2 million RPS during peak times.
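The peak-load arithmetic above can be written as a small back-of-the-envelope helper. This is a minimal sketch: the function name is hypothetical, and the inputs (250 million voters, 50% of them in the first 5 minutes, 2x-4x headroom) are the assumptions stated in the text, not measurements.

```python
def average_peak_rps(total_voters: int, peak_share: float, peak_minutes: float) -> float:
    """Average requests per second if peak_share of all voters arrive within peak_minutes."""
    return total_voters * peak_share / (peak_minutes * 60)


base_rps = average_peak_rps(250_000_000, 0.5, 5)
print(f"estimated peak: {base_rps:,.0f} RPS")  # estimated peak: 416,667 RPS

# Headroom for unpredictable spikes, as suggested in the text.
for headroom in (2, 4):
    print(f"{headroom}x headroom: {base_rps * headroom:,.0f} RPS")
# 2x headroom: 833,333 RPS
# 4x headroom: 1,666,667 RPS
```

Rounding the 2x-4x range up gives the 1 to 2 million RPS design target.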

Challenges in Vote Submission

High traffic with numerous simultaneous insertions into the database can cause significant delays. A single insert query to databases like MySQL or Postgres typically takes 2-5ms. So, what happens when there are 10 million inserts?

10,000,000 × 2 ms (optimistic scenario) = 20,000 seconds ≈ 5.6 hours.

We expect well over 10 million requests within the first 5 minutes, yet inserting just 10 million of them one at a time, sequentially, directly into the database would take about 5.6 hours. For all 250 million votes, it would take approximately 139 hours (about 5.8 days).
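The same estimate as a tiny helper, assuming a single connection that issues inserts back to back with no parallelism, which is the assumption behind the figures above. The function name is hypothetical.

```python
def sequential_insert_hours(rows: int, ms_per_insert: float) -> float:
    """Hours needed if rows are inserted one at a time, each waiting for the previous one."""
    total_ms = rows * ms_per_insert
    return total_ms / 1000 / 3600  # ms -> seconds -> hours


print(f"{sequential_insert_hours(10_000_000, 2):.1f} hours")   # 5.6 hours
print(f"{sequential_insert_hours(250_000_000, 2):.1f} hours")  # 138.9 hours
```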

Basically, the main challenge in designing an online voting system for an election is efficiently handling high read and high write loads at the same time.

Question: How would you design a system that can handle hundreds of millions of online votes for an election? Feel free to contribute and share your answer by filling out the form at this link.

Now, let’s move on to our proposed solution.

Proposed Solution

Figure: High-Level Design of a Highly Scalable Online Voting System

At a high level, we propose the system design shown in the picture above. Let’s break down each component:

Database (Vote DB): We need ACID guarantees, which both MySQL and Postgres provide. For this solution, we will use Postgres.
Vote Input Service: This service handles vote submissions from voters and appends them to a vote log file. We will discuss the specifics later.
Vote DB Service: This service processes the vote log file and inserts the data into the Postgres database in bulk.
Vote Summarizer Service: Instead of calculating results directly from the database for every request, this service will run every minute, query the data, and cache the required information (in this case, using Redis) for displaying results.
Vote Result Service: This service is a layer that presents the vote results from the Redis cache.

Handling the Vote Submission Use Case

Here’s the idea: we cannot write every request directly into the database, so we need to insert votes in bulk.

Why?

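Inserting a large number of records one by one is inefficient and resource-intensive. Each insert pays for its own network round trip, statement execution, and (with autocommit) its own transaction commit, so the back-and-forth communication consumes a lot of I/O. Let’s compare single inserts to bulk inserts: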

A single insert requires 2-5 ms.
A bulk insert (5000 records) requires 20-25 ms.

Calculation:

(250 million records ÷ 5000 records per batch) × 25 ms = 50,000 batches × 25 ms = 1,250 seconds ≈ 21 minutes

This means that between 08:00 AM and 08:21 AM, there might be a queue of data waiting to be inserted. After that, the system will process in near real-time. I believe this still offers a good real-time experience.
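A minimal sketch of the bulk-insert estimate, assuming batches run one after another on a single writer, as the 21-minute figure does. The function names are hypothetical. The throughput helper also shows why a backlog forms: one sequential writer sustains about 200,000 rows per second, below the roughly 400,000 votes per second estimated for the peak, so the queue grows during the opening minutes and drains once arrivals fall off.

```python
def bulk_insert_minutes(rows: int, batch_size: int, ms_per_batch: float) -> float:
    """Minutes to insert all rows when batches are executed one after another."""
    batches = (rows + batch_size - 1) // batch_size  # integer ceiling: a partial last batch still costs a round trip
    return batches * ms_per_batch / 1000 / 60


def bulk_rows_per_second(batch_size: int, ms_per_batch: float) -> float:
    """Sustained insert throughput of a single sequential writer."""
    return batch_size * 1000 / ms_per_batch


print(f"{bulk_insert_minutes(250_000_000, 5000, 25):.1f} minutes")  # 20.8 minutes
print(f"{bulk_rows_per_second(5000, 25):,.0f} rows/s")              # 200,000 rows/s
```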

Next, how do we handle bulk insertion?

We need to queue vote data into a persistent system. The simplest solution is to use a log file, where every data entry posted through the Vote Input Service is appended to a persistent log file.
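A minimal sketch of the Vote Input Service's append path, assuming each vote is written as one JSON line and that a single process writes each file. The function names, the JSON fields, and the vote-logs directory are hypothetical; only the file naming follows the article.

```python
import json
import os
from datetime import datetime
from pathlib import Path

# Hypothetical default location; the article does not specify one.
LOG_DIR = Path("vote-logs")


def current_log_path(now: datetime, log_dir: Path = LOG_DIR) -> Path:
    # One file per minute, named as in the article, e.g. voting-08:02.log.current.
    # (Colons in file names work on Linux but are not allowed on Windows.)
    return log_dir / f"voting-{now:%H:%M}.log.current"


def append_vote(identity_card_number: str, candidate: str, now: datetime,
                log_dir: Path = LOG_DIR) -> Path:
    """Append one vote as a JSON line and fsync it before returning."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = current_log_path(now, log_dir)
    record = {
        "identity_card_number": identity_card_number,
        "candidate": candidate,
        "submitted_at": now.isoformat(),
    }
    # "a" opens the file in append mode, so every write lands at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to persist it to disk before acknowledging the vote
    return path
```

An fsync per vote is expensive at the peak rates estimated earlier; a real service would likely group several votes per fsync and, with several Vote Input Service instances, give each instance its own file.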

There are 4 stages that the log file will go through until the data is bulk-inserted into the database:

Current: Every minute, the Vote Input Service opens a single current file whose name contains the hour and minute. For example, if the current time is 08:02:33 AM, the file is named voting-08:02.log.current. Each submitted vote is appended to this file.
Waiting: Every minute, the current file is closed and renamed to a waiting file. The naming pattern will change to something like voting-08:02.log.waiting.
Ready: The waiting file may contain more than 5000 records (or however many you choose for each chunk). The Vote DB Service will split the file into smaller ready files. For example, if voting-08:00.log.waiting contains 12,000 records, it will be split into:
- voting-08:00.log.1.ready (5000 records)
- voting-08:00.log.2.ready (5000 records)
- voting-08:00.log.3.ready (2000 records)
Done: The Vote DB Service has another cron job that detects ready files, processes them, and bulk inserts the data into the database. After a successful insertion, the files are renamed to done, for example:
- voting-08:00.log.1.ready → voting-08:00.log.1.done
- voting-08:00.log.2.ready → voting-08:00.log.2.done
- voting-08:00.log.3.ready → voting-08:00.log.3.done
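The Waiting, Ready, and Done stages can be sketched as three small functions that cron-style jobs would call. This is a minimal sketch, assuming one record per line, a single worker per stage, and a hypothetical bulk_insert callback (in practice a multi-row INSERT or COPY into Postgres). It is not crash-safe: if the process dies after bulk_insert succeeds but before the rename, that file is inserted again on the next run.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def rotate_to_waiting(log_dir: Path, minute: str) -> Path | None:
    """Waiting stage: close out one minute by renaming .current -> .waiting."""
    current = log_dir / f"voting-{minute}.log.current"
    if not current.exists():
        return None
    waiting = log_dir / f"voting-{minute}.log.waiting"
    current.rename(waiting)  # a rename within one POSIX filesystem is atomic
    return waiting


def split_into_ready(waiting: Path, chunk_size: int = 5000) -> list[Path]:
    """Ready stage: split a .waiting file into numbered .ready chunks."""
    lines = waiting.read_text(encoding="utf-8").splitlines()
    base = waiting.name.removesuffix(".waiting")  # e.g. "voting-08:00.log"
    ready: list[Path] = []
    for start in range(0, len(lines), chunk_size):
        index = start // chunk_size + 1
        part = waiting.with_name(f"{base}.{index}.ready")
        part.write_text("\n".join(lines[start:start + chunk_size]) + "\n", encoding="utf-8")
        ready.append(part)
    waiting.unlink()  # all records now live in the .ready chunks
    return ready


def _chunk_order(path: Path) -> tuple[str, int]:
    # "voting-08:00.log.2.ready" -> ("voting-08:00", 2); numeric sort keeps chunk 10 after chunk 2
    prefix, _log, index, _suffix = path.name.split(".")
    return prefix, int(index)


def process_ready(log_dir: Path, bulk_insert: Callable[[list[str]], None]) -> list[Path]:
    """Done stage: bulk insert each .ready file in order, then rename it to .done."""
    done: list[Path] = []
    for ready in sorted(log_dir.glob("*.ready"), key=_chunk_order):
        rows = ready.read_text(encoding="utf-8").splitlines()
        bulk_insert(rows)  # hypothetical callback, e.g. one multi-row INSERT per file
        target = ready.with_name(ready.name.removesuffix(".ready") + ".done")
        ready.rename(target)  # only reached if bulk_insert did not raise
        done.append(target)
    return done
```

One way to make a re-run harmless, and to enforce the "first vote counts" rule, is a unique constraint on identity_card_number combined with Postgres's INSERT ... ON CONFLICT DO NOTHING, with chunks processed in order as above.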

Handling Showing Vote Result

Now that inserting vote data into the database is clear, we need to handle how to display the vote results, where traffic can be much higher than for vote submissions.

There’s nothing particularly complex about the database design; it can be kept simple, as shown in the diagram above.

If we want to display the result based on who’s winning in each state, as illustrated in the product example, a simple query like this could work:

-- Total votes per candidate in each state
SELECT s.state, v.candidate, COUNT(*) AS total_vote
FROM vote v
INNER JOIN voter vo ON vo.identity_card_number = v.identity_card_number
INNER JOIN state s ON s.id = vo.state
GROUP BY s.state, v.candidate
ORDER BY s.state, total_vote DESC;

However, running this query for every request is not scalable. Therefore, we need to implement a caching mechanism.

A typical caching mechanism using Redis follows this flow:

Hit the Vote Result Service.
Check Redis for cached data.
If cached, return the data.
If not cached, query the database and cache the result.

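We cannot rely on this approach. Every cache miss runs the expensive aggregate query against a database that is already absorbing heavy writes, and when many horizontally scaled virtual machines (VMs) miss at the same moment, they all hit the database at once. Because vote counts change rapidly, VMs that fill or read the cache at different moments can also serve inconsistent results.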

That’s why the Vote Summarizer Service is essential. This service will run a scheduled job (cron) that executes the query mentioned above and stores the result in Redis as a cache.
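A minimal sketch of one Vote Summarizer Service run, plus the matching read in the Vote Result Service. It assumes fetch_rows is a hypothetical callable that executes the aggregate query above (for example through a Postgres driver) and that store is a Redis client such as redis-py's redis.Redis, whose set and get methods take a key and a value. The key name and the JSON snapshot layout are assumptions; scheduling (cron, or a loop that sleeps 60 seconds) is left out.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional, Protocol

Row = tuple[str, str, int]  # (state, candidate, total_vote), as returned by the query above


class KeyValueStore(Protocol):
    """The two calls we need; a redis-py client provides both."""
    def set(self, name: str, value: str) -> object: ...
    def get(self, name: str) -> object: ...


RESULTS_KEY = "vote_results:latest"  # hypothetical key name


def summarize(rows: Iterable[Row]) -> dict:
    """Group rows into per-state totals and pick the current leader in each state."""
    states: dict[str, dict[str, int]] = {}
    for state, candidate, total in rows:
        states.setdefault(state, {})[candidate] = total
    return {
        state: {"totals": totals, "leader": max(totals, key=totals.get)}  # ties: first seen wins
        for state, totals in states.items()
    }


def run_summarizer_once(fetch_rows: Callable[[], Iterable[Row]], store: KeyValueStore) -> dict:
    """One scheduled run: query, summarize, and overwrite the cached snapshot."""
    snapshot = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "states": summarize(fetch_rows()),
    }
    # A single SET replaces the whole snapshot, so every Vote Result Service
    # instance reading this key sees the same version of the results.
    store.set(RESULTS_KEY, json.dumps(snapshot))
    return snapshot


def read_results(store: KeyValueStore) -> Optional[dict]:
    """Per request in the Vote Result Service: a cache read, no database query."""
    raw = store.get(RESULTS_KEY)
    return None if raw is None else json.loads(raw)  # json.loads accepts str or bytes
```

With redis-py, get returns bytes unless the client is created with decode_responses=True; json.loads handles either.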

Since Redis would be a single point of failure in this setup, we need Redis replication (a primary with one or more replicas) combined with automatic failover, for example via Redis Sentinel.

With this design, when a viewer requests the vote results, they hit the Vote Result Service, which reads the data from the Redis cache. This approach scales well: we only need to scale out the Vote Result Service and add Redis replicas.

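Note: To make the website appear real-time, the frontend can use a timer to periodically re-request the data from the Vote Result Service. Since the Vote Summarizer Service refreshes the cache once a minute, polling much more often than that adds load without returning fresher data.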

That’s all for today. If there is a system design topic you’re interested in, or you have an interesting solution for a specific topic, don’t hesitate to let me know. I’d be more than happy to share it here.