Components, Three tier architecture and Metadata in Dwdm

Components, Three tier architecture and Metadata in Dwdm

Components of Data Warehouse

The components of a data warehouse include

Data Sources

  • Data sources refer to the various systems, databases, applications,
  • and external sources from which data is collected and extracted for storage in the data warehouse.

Types of Data Sources

  • Production Data
  • Internal Data
  • External Data
  • Archived Data

Production Data

  • Production data refers to the operational data generated by the day-to-day activities of an organization's business processes.
  • This includes transactional data, customer interactions, sales records, and other real-time data captured by operational systems.
  • Such as (ERP), and customer relationship management (CRM).

Internal Data

  • Internal data sources are data generated and collected within the organization's own systems and databases.
  • These may include data from internal applications, databases, spreadsheets, files, and other proprietary sources.

External Data

  • External data sources are data obtained from sources outside the organization
  • such as third-party vendors, partners, suppliers and public sources.
  • External data may include demographic data, market trends, etc.

Archived Data

  • Archived data refers to historical data that is no longer actively used in production systems
  • but is retained for reference, compliance, or analytical purposes.
  • This may include historical transaction data, customer records, financial statements,
  • and other archived data that is stored in data warehouses .

Data Staging

The data staging process involves three main components: data extraction, data transformation, and data loading. Let's explore each component:

Data Extraction

  • Data extraction is the first step in the data staging process.
  • It involves retrieving data from various source systems, databases, applications, or external sources.
  • The data extraction process can be performed using different methods
  • such as database queries, file transfers, APIs (Application Programming Interfaces), or Change Data Capture (CDC) mechanisms.
  • The goal of data extraction is to gather relevant data from the source systems and prepare it for further processing.

Data Transformation

  • Data transformation is the second step in the data staging process,
  • where the extracted data is processed and prepared for loading into the data warehouse.
  • This involves applying various operations to the extracted data to cleanse, standardize, enrich, and integrate it with existing data.
  • Common data transformation tasks include:
  • Cleaning: Removing duplicates, correcting errors, and standardizing formats to ensure data consistency and accuracy.
  • Formatting: Converting data types, standardizing date formats, and applying data validations to meet the requirements of the data warehouse.
  • Enriching: Enhancing the extracted data with additional attributes, calculations to enrich its value for analysis.
  • Aggregating: Summarizing or aggregating data at different levels of granularity or unfiltered to facilitate analysis and reporting.
  • The transformed data is typically stored in an intermediate staging area or temporary storage before being loaded into the data warehouse.

Data Loading

  • Data loading is the final step in the data staging process,
  • where the transformed data is loaded into the data warehouse for storage and analysis.
  • This involves inserting, updating, or appending the transformed data into the appropriate tables within the data warehouse.
  • Data loading can be performed using different loading techniques such as
  • bulk loading, incremental loading, or real-time loading, depending on the volume and frequency of data updates.
  • The loaded data is stored in the data warehouse in a structured format,
  • ready to be queried, analyzed by users or business intelligence tools.

Data Storage Component

  • Data storage encompasses or bound the infrastructure and
  • processes involved in securely storing the transformed and integrated data collected from various sources.
  • This component utilizes a relational database management system (RDBMS) to organize and
  • manage structured data based on a predefined schema, which can be either dimensional or normalized.
  • The data is structured into tables, rows, and columns to facilitate efficient querying, analysis, and reporting.
  • Data storage also manages data retention, ensuring that historical data is retained for analysis.

Information Delivery Component

  • The Information Delivery component encompasses or surround tools and
  • processes that enable users to access, analyze, and derive insights from the data stored in the warehouse.
  • This includes query and reporting tools for retrieving specific information and generating reports,
  • OLAP tools for multidimensional analysis, data visualization tools for creating visual representations of data.

Three Tier Architecture

  • The three-tier architecture refers to a design framework that organizes the components of a data warehouse into
  • three distinct layers: the presentation layer, the application layer, and the data layer.
image

Presentation Layer (Front-end Layer)

User Interfaces

  • The presentation layer includes user interfaces such as web-based dashboards, reporting tools,
  • and interactive visualization applications that allow users to access and interact with the data stored in the warehouse.

Data Visualization

  • It provides tools and capabilities for visualizing data through charts, graphs, heatmaps,
  • and other graphical representations, making it easier for users to understand and interpret complex data sets.

Customization Options

  • Users can customize the presentation layer according to their preferences, such as selecting specific metrics,
  • adjusting visualization settings, and creating personalized dashboards tailored to their analytical needs.

Interactivity

  • The presentation layer enables users to interact with the data dynamically, such as filtering data,
  • drilling down into details, and exploring different dimensions, to gain deeper insights and make informed decisions.

Accessibility

  • It ensures that data is presented in a user-friendly and accessible manner, with intuitive navigation,
  • responsive design, and support for various devices and screen sizes to accommodate diverse user needs.

Application Layer (Middle Tier)

Business Logic

  • The application layer encapsulates or summarize the business logic and processing rules that govern how data is accessed,
  • manipulated, and managed within the data warehouse environment.

Query Processing

  • It handles tasks related to query optimization, execution, and performance
  • tuning to ensure that queries are processed efficiently and accurately, delivering timely results to users.

Data Transformation

  • The application layer performs data transformation tasks, such as cleansing, standardization,
  • to prepare data for analysis and reporting in the presentation layer.

Security Enforcement

  • It enforces security policies and access controls to ensure that only authorized
  • users can access and modify data, protecting sensitive information and maintaining data privacy and confidentiality.

Integration Services

  • The application layer facilitates data integration by integrating data from multiple source systems,
  • arrange data flows, and ensuring data consistency and reliability across the data warehouse environment.

Data Layer (Back-end Layer or Storage Layer)

Data Storage

  • The data layer stores the cleansed, and transformed data from various source systems in a structured format within the data warehouse,
  • typically using a relational database management system (RDBMS) or data storage technology.

Data Management

  • It manages data storage, retrieval and organization to ensure data integrity,
  • and accessibility for querying, analysis, and reporting purposes.

Scalability

  • The data layer is designed to scale horizontally or vertically to accommodate growing data volumes and
  • evolving analytical needs, ensuring that the data warehouse can handle increasing data loads and user demands.

Data Governance

  • It implements data governance policies and practices to govern data quality,
  • metadata management, data lineage, and compliance with regulatory requirements and organizational standards.

Performance Optimization

  • The data layer optimizes database performance through techniques such as
  • indexing, partitioning, caching, and compression to enhance query speed, reduce latency, and improve overall system performance.

Meta Data

  • Metadata in a data warehouse refers to data about the data stored within the warehouse.
  • It provides essential information about the structure, content, and context of the data, facilitating
  • understanding, management, and utilization of the data within the warehouse environment.
  • Metadata includes details such as data definitions, data lineage, data sources, transformations, and business rules.
  • It helps users interpret and utilize the data effectively, ensuring data quality,
  • uniformity, and adherence to organizational standards and regulations.

Conclusion

We have explored Components such as Data sources, Data Staging , data loading , three tier architecture such as Data, Application layer and Metadata in the data warehouse.