Data Warehousing , Data Marts and Compelling Needs

Data Warehousing , Data Marts and Compelling Needs

What is Data Warehousing?

  • A data warehouse is a centralized and integrated repository of data collected from various sources within an organization.
  • Essentially, it's like a big storage unit where all the data from different parts of the business is gathered,
  • organized, and made accessible for analysis and decision-making.
  • Instead of scattered files and databases, everything is stored in one place, making it easier to access and analyze.

Database System vs Data Warehouse

Purpose

  • The primary purpose of a data warehouse is to store and analyze large volumes of data.
  • Databases are designed primarily for transactional processing.

Data Structure

  • Data in a data warehouse is typically structured for analysis and reporting.
  • Databases generally store normalized data to minimize redundancy and ensure data integrity.

Data Usage

  • Data warehouses are used for strategic decision-making, trend analysis, and forecasting.
  • Databases are used for operational tasks such as adding, modifying, or deleting records in real-time.

Data Latency

  • Data warehouses often incorporate data from various sources with different update frequencies.
  • As a result, they may have longer data latency.
  • Databases typically have low latency, ensuring that data changes are reflected almost immediately.

Database Usage

  • The retail company operates an online store where customers can make purchases.
  • The operational database manages real-time transactions
  • such as processing orders, updating inventory levels, and recording customer information.

Data Warehouse Usage

  • In addition to the operational database,
  • the retail company maintains a data warehouse to analyze sales trends, customer behavior, and inventory management.

Compelling Needs for Data warehousing

  • The compiling needs of a data warehouse refer to the process of gathering, integrating,
  • and organizing data from various sources to meet the specific requirements of the data warehouse environment.

Data Integration

  • Data warehouses often collect data from multiple sources within an organization, such as operational databases, CRM systems and external sources.
  • This integration ensures that all relevant data is available for analysis and reporting.

Data Cleansing and Transformation

  • Raw data collected from different sources may contain inconsistencies, errors, or missing values.
  • Before loading the data into the data warehouse, it's essential to cleanse and transform it to ensure accuracy and consistency.
  • This may involve tasks such as removing duplicates, standardizing formats, correcting errors, and filling in missing values.

Data Modeling

  • Data modeling is the process of designing the structure of the data warehouse to support efficient querying and analysis.
  • It involves defining data tables, relationships, hierarchies, and metadata within the data warehouse.
  • During the compiling process, data modeling is essential to organize the
  • integrated data into a logical schema that aligns with the business requirements and analytical needs of the organization.

Performance Optimization

  • Efficient data retrieval and query performance are critical for a data warehouse environment,
  • especially when dealing with large volumes of data.
  • The compiling process may include performance optimization techniques
  • such as indexing, partitioning, and aggregating data to enhance query speed and reduce processing time.

Features of Data Warehouse

for remembering the four main features of data warehouse is SINT

Subject-Oriented

  • Data warehouses are designed and organized around specific subjects or
  • themes relevant to the organization's business processes, such as sales, marketing or inventory.
  • This subject-oriented approach ensures that the data stored in the warehouse is tailored to support analysis and decision-making within these specific areas.

Integrated

  • Data warehouses consolidate data from various sources within the organization.
  • including operational systems, databases, and external sources.
  • This integrated approach ensures that data is consolidated and
  • standardized in a unified format, making it accessible and consistent for analysis and reporting.

Non-Volatile

  • Data warehouses are non-volatile, meaning that once data is stored in the warehouse,
  • it remains unchanged and is not overwritten or deleted.
  • This ensures data consistency and reliability for analysis and reporting purposes.

Time-Variant

  • Data warehouses store historical data and maintain a record of changes over time.
  • This time-variant feature allows users to analyze trends, patterns, and historical performance by examining data at different points in time.
  • It enables organizations to track changes, monitor progress, and make informed decisions based on historical insights.
Note:
  • These four features collectively define the essence of a data warehouse,
  • However other features are also available for but these are the main features.

What are Data Marts?

  • Data marts are subsets of a data warehouse that focus on specific business functions or departments.
  • Data marts are designed to provide quicker and more targeted access to data
  • for specific analytical purposes, such as sales analysis, marketing analytics, or finance reporting.
  • They are optimized for the needs of particular user groups or business functions, enabling more efficient analysis and decision-making.
  • Unlike data warehouses, which are centralized repositories, data marts are decentralized in nature.
  • They can be implemented independently or as part of a larger data warehouse architecture, depending on the organization's requirements and priorities.
  • Due to their focused scope and smaller size, data marts can be deployed more quickly than a comprehensive data warehouse.

Conclusion

So now we have basic understanding of these topics difference between Database System and Data Warehouse, The Compelling Need for data warehousing and Data marts in data warehousing.