DB Design #1 - Requirement analysis
Before designing any database, it is essential to analyse the requirements to avoid complex migrations and to make our database highly scalable.
75% of the interviews that I attended had database-related questions, either designing a database or writing a DB query for a given design.
Knowledge of database design and optimization is critical for any back-end/Full-Stack developer.
The performance of any software application is directly related to how data is stored and retrieved.
This is Part 1 of the DBDesign series which covers the first step to designing any database — Requirement analysis.
What is requirement analysis?
Before designing the database, we must have a deep understanding of the following:
Expectations of the user
Intended use of the database
Once these factors are understood and laying out a basic idea of what our database is going to do, we start analyzing more intricate requirements.
The following list of factors is analyzed to determine the usage frequency of our database.
Origin of users (Geographically)
Frequency of data addition and access
Destination of reports from the database
With this data, we will have a basic understanding of the input and output format of the transactions (Data in and Data out) for our database.
This analysis is done before we begin designing the database to make sure we have all edge cases covered in advance.
Other factors like User views and UX functionalities will also be considered while designing our database. For example, if we are planning to give an extensive search functionality, then we’ll have to consider the scalability factor.
Scenario
Let us consider a scenario for which we will be designing a database step-by-step throughout this series.
We have to design a database for an e-commerce site where customers can browse for products, make payments, place orders, and view the list of previous orders with order status.
Note that we will not dive deep into how we handle payments, third-party vendors, and intricate details of each data type.
However, I’ll try to provide references for you to learn more about these concepts in upcoming articles.
Initial analysis
By looking at the above statement, we will have a basic idea of what our application and database will provide the users.
We will try to answer the following questions to make the requirements clear.
What are the attributes that we will store for our customers?
Are there multiple types of products to be listed?
Do we provide extensive search functionality for our products? (Like Flipkart or Amazon)
How many transactions per second can we expect at most? (Millions of transactions per second happen during GreatIndianSale on Amazon)
Answers to these questions make the requirements clearer for us to start working on our database design.
Keep in mind that we will focus only on the database architecture and not the backend logic for handling these transactions.
How will we efficiently store data and how will we retrieve it as soon as possible for our use cases?
Type of database
Multiple types of databases are suitable for various use cases. We have to choose which type we will be using for our scenario.
For our scenario, we will be using a relational database as there are multiple relationships between different entities.
Check out my article on SQL VS NOSQL databases for more information on this decision.
All-in-one table
With the given information, let us try to store everything in one table. It is easy to implement and all data will be in the same place, right?
The table shown in the screenshot is a small portion of the one table, in which we are storing all necessary data.
The parts of the table highlighted in green have duplicate data stored repeatedly. This data is redundant and uses excessive memory.
In a case where we have millions of orders placed, this table will have a huge number of redundant data which will cost us a lot of memory capacity and money.
Entity segregation
One of the most important steps in laying down the base of the database design is Entity segregation.
Entity segregation is the process of defining the entities from our requirements.
Let us segregate entities for our requirement from the above “all-in-one” table.
1) First green box
All data in these two columns are related to the customers and it is not affected by any other column. We also notice the repetition of this data.
We can segregate this as a separate entity — CUSTOMERS.
2) Second green box
All data in these two columns are related to the products and it is not affected by any other column.
We can segregate this as a separate entity — PRODUCTS.
3) Third green box
All data in these two columns are related to the orders and it is not affected by any other column.
We can segregate this as a separate entity — ORDERS.
Analyzing these groups of redundant data shows that, everything remains the same for a particular entity.
Why segregation?
To answer this question, we will first look at the obvious disadvantages of the “All-in-one” table.
Updating our customer name will become extremely tedious as we have duplicate data all over our table.
Searching of any kind for any entity is an expensive operation due to a large number of columns.
Adding a new data parameter for customers called “date of birth” becomes complicated, as this involves adding a new column to the already complicated single table.
These are just very few reasons for the all-in-one table being a bad idea.
Entity segregation helps us eliminate all these disadvantages and enables smooth CRUD operations for all our entities.
Entities for our scenario
The entities for our scenario is listed below.
Customers
Products
Product categories
Orders
OrderDetails
Payments
How do we link two different entities?
Now that we have segregated all our entities from our requirement statement, how do we establish a link between different entities?
Next week, we will look at table relationships to understand this better.
If you have any queries on entity segregation, feel free to comment below or reach out to me. I’d be happy to assist you.
Until next time!