Data Warehouse, Data Mart, Data Lake and Operational Data Storage(ODS)
Data Warehouse
The data warehouse, the Internet, and large-scale technological development have led to the explosive growth of data in today’s world. Corporate decision makers, on the other hand, want to examine the relationship between data, tap into the hidden features of data, and analyze and explore deeper levels of data.
However, data sharing between different databases of the enterprise is not possible, due to multiple databases in the same enterprise, integration between databases poses great challenges, especially in terms of consolidation and storage of big data.
Operational databases may be scattered around the Microsoft SQL Server database or Oracle database, the purpose of the data warehouse is to extract and process multiple databases to collect data from hundreds of gigabytes of data and, according to the required format, transform, clean, process, and finally store the data into the warehouse is to install.
According to IBM researchers (Barry Devlin and Paul Murphy) “A data warehouse is a subject-oriented, integrated, relatively stable collection of data that reflects historical changes, used to support management decision making.”
- By its nature, a data warehouse is used to complete decision making for management and analytics of business data operation, but it is different from the operational database of the enterprise.
- A data warehouse is the efficient integration and management of multiple heterogeneous data sources in a single repository, organized in terms of historical data, and there is no need for transactional modification of data in a data warehouse.
After the emergence of data warehouses, the information needs of businesses have moved from relational databases to a decision support system. This decision support system is actually what we call Business Intelligence (BI).
Data Mart
Compared to a data warehouse, the data mart can be understood as a “small data warehouse”, it does not depend on heterogeneous databases, but only on a single instance of an operational database, and the data coverage is not wide enough. Data mart specifically targets a specific business operation (sales, production) data mart users find the data they need quickly, in data mart you only need to design and create database tables, populate database tables with relevant data and decide who can access the dataset.
For more detailed information about data mart, you can reach my previous article here.
Data Lake
Much like flowing water in its natural state, data flows from the multi-source system into this lake, users can obtain, validate, manage and perform other BI tasks outside of the data lake. The data lake can evolve to implement the following features;
- Imports all data from source systems, no data loss from source systems.
- The data is stored in its original state without converting the original data.
- The data lake schema accurately satisfies the data analysis requirements.
- It has a data lock, control and management.
Operational Data Storage (ODS)
Operational Data Storage (ODS) is a database for transactional processing data, data in ODS is mainly raw data, data from ODS is always moved to data warehouse or data mart for further processing. In ODS you can query data and access only the latest developments in business operation.
I tried to give information about data mart as much as I could.
If you want me to give more detailed information about data warehouse, data lake or operational data storage, please write in the comments :)
Stay healthy :)
Best regards,
Erdem