Today, I was advising a team on building their data warehouse solution. I realized that even 40 years after the term “data warehouse” was first introduced, there are still questions about why we need a data warehouse and why we don’t get all of the data from application databases, especially by executives. I write this post to answer these questions by clarifying the terms OLTP and OLAP, which are frequently used in discussions about data warehouse database architecture. Then I will explain why OLTP databases are inefficient for OLAP queries and why you need a separate database known as a data warehouse.

OLTP

OLTP, or Online Transaction Processing, is a pattern by which we access and manipulate data in the database transaction-by-transaction. A transaction refers to a single unit of work, such as a money transfer, a book, a blog post, and so on. Typically, users often only interact with one or a few transactions at a time. Therefore, most of the time, applications look up a small number of records in databases by some keys. Application databases implement special indexing techniques such as B-tree or LSM-tree to handle OLTP efficiently. They can quickly access a particular transaction given its indexed key.

OLAP

As businesses grow and accumulate data, they need to analyze it to gain valuable insights about their market and customers. Then they can make informed decisions and gain competitive advantage. When it comes to analytics, access patterns will be very different. Typically, analytic queries consume a large number of records, look for only a few specific columns of each record, and often aggregate data to calculate statistics (min, max, sum, average,…). This pattern of accessing data in the database is called Online Analytic Processing (OLAP).

Difference between OLTP and OLAP

From the above definitions, we can somehow distinguish OLTP vs OLAP. The following table shows the typical differences:

OLTPOLAP
AccessSmall number of records, using indexed keysLarge amount of records, often aggregate
PurposeApplication transactional consistency and speedComplex queries and analysis
UsersApplication end usersAnalysts and business users.
Data volumeRelatively small, frequently accessedLarge datasets, accessed less frequently
Data typeReal-time, current dataHistorical, aggregated data

* Differences between OLTP and OLAP

Problems of OLTP Databases with OLAP queries

When your business is still young, it is easy to run analysis directly on application databases. However, as the volume of data and the need for analysis grows along with the business, problems arise. Databases that were optimized for OLTP using indexing techniques such as LSM tree or B-tree now struggle to execute OLAP queries efficiently. As a result, running OLAP queries becomes costly and negatively impacts application performance, which is critical to business success.

As the business continues to grow, different business units tend to operate independently with their own goals, priorities, concerns, and IT budgets. Each business unit will maintain its own applications running on separate databases. Performing analysis when data is scattered in different locations is difficult. And analysts often end up exporting data from different places, putting it into a single Excel file, and VLOOKUP.

Data Warehouse

In response to the challenges of running OLAP queries on traditional business databases, the concept of a data warehouse is emerging as a solution.

  • Data Warehouse functions as a dedicated space for analytical purpose. It allows business to store massive amounts of historical and current data without impacting operational databases.
  • Data Warehouses are designed with a focus on analytical processing. Their storage engines use specialized techniques to speed up OLAP queries. We may explore these techniques in other posts.
  • Data warehouses serve as a centralized repository for data from various sources. Analysis is now easier because all of the necessary data is in a single place.

OLTP Databases to OLAP Data Warehouse

Conclusion

In this post, we’ve gone over the definitions and differences between OLTP and OLAP. We also looked into the role of the data warehouse in conducting business analysis. Understanding them should give you the confidence when you say to your boss, “We need a data warehouse.”