Data Warehouse Testing

by Amol 17. March 2011 20:03

What is Data Warehouse?

A data warehouse is a repository where data is stored for analysis, reporting, archival and security purposes. A data warehouse is a system that retrieves data from source systems and puts it into a dimensional data store or a normalized data store. Data warehouse is system capable of having strong Business Intelligence. It is primarily used by the top level management to make business critical decisions. 

Is data warehouse testing important?

Following questions will answer the question.

  • Are decisions based on correct data?
  • Do organization and its employees have sufficient confidence in the new system?
  • How can organization be sure there are no anomalies in the huge amount of migrated data from source to data warehouse?
  • Does it comply with all organization needs?
  • Is sufficient security applied on sensitive data?

To get confident answers for above questions, yes, testing is one of the critical and important part of data warehouse application. 

Challenges in Data Warehouse Testing:

There are many challenges in Data Warehouse testing. Main challenge is due to the huge amount data from heterogeneous data sources.

Other challenges are

  • Complete organization-wide enterprise data knowledge and business knowledge may not be easily feasible.
  • Asynchronously updating of data from heterogeneous sources may result in data inconsistency or may contain partial loading of data into data warehouse.
  • Data quality is not assured at all the sources and no accurate picture of quality of underlying data.
  • 100% data verification will not be feasible unless all extraction, transformation and loading component tested thoroughly to ensure all the data loaded as expected.

Due to such challenges there is difference between normal database testing and data warehouse testing. Below table illustrate difference between database testing and data warehouse testing. 

Database Testing Data Warehouse Testing
Smaller in size Larger in size
Normalized Data De-Normalized data
Usually Homogeneous Data Usually Heterogeneous data is getting loaded.
Usually Data is being tested at source instead of using UI Extraction, Transformation and Loading component are the major one to test
Usually all kind of operations happen Usually read-only operations happen

Data warehouse testing phases:

Following are the main processes involved in data warehouse testing

Requirement Testing:

The main purpose for doing Requirements testing is to check defined requirements for completeness.

For example:

Are the requirements Complete?
Are the requirements Testable?
Are the requirements Ambiguous?
Are the requirements Developable?
Are the requirements Singular?

Unit testing:

Unit testing is usually whitebox testing. It should check the all ETL components,   procedures used in ETL, source and target mappings and transformation for the same, all the scheduled jobs and the reports. This kind of testing is usually done by developers.

Unit testing will involve following

  • Whether the entire source mapping done as per stated and ETL components are accessing right data from right source. 
  • All the data transformations are correct according to the business rules and data warehouse is correctly populated with the transformed data.
  • Testing rejected records that don’t fulfill transformation rules.

Integration testing:

Integration testing involves

  • Initial loading of data into data warehouse.
  • Incremental loading of data into data warehouse.
  • Sequence of extraction objects, transformation objects and loading objects.
  • Sequence of ETLs jobs.
  • Testing rejected records that don’t fulfill transformation rules.
  • Log and error log generation.
  • Record count validation by executing queries against source and target.
  • Data integrity between the various source tables and relationships.
  • Statistical analysis by validation for various calculations.
  • Data Quality Validation to check for missing data.
  • Field by field data verification to check the consistency of source and target data.
  • Validate at the possible lowest granular level. 

Security Testing:

Under the security testing, make sure only the authenticated users should be able to access the resources. Security testing involve 

Confidentiality:
        A security measure which protects against the disclosure of information to parties other than the intended recipient.

Authentication:
        Authentication can be done in many ways like passwords, biometrics, radio frequency identification etc. 

Authorization:
        Authorization determines a requester is allowed to receive or perform an operation.

Performance Testing:

Performance testing verifies that the data warehouse or application can handle the required load and volume. Performance Testing involves

Scalability:
        Will the application handles the expected users load.

Response Time:
        Applications Response time, does the application respond quickly enough under various loads.

Capacity:
        Will the application able to handle defined capacity of data or load.

Throughput:
        For increasing number of concurrent users, high transaction rate should the hardware and software achieves the required throughput without any performance issues.

Acceptance Testing:

Application is tested with full functionality and is expected to function on production.
At the end of UAT, the system should be acceptable to the client for use in terms of ETL process integrity and business functionality and reporting.

Tags: ,

Data Warehouse

Organize Data Warehouse Objects

by Amol 17. March 2011 08:35

Organize Data Warehouse Objects using schema.

Since my last data warehouse project I started using schemas to identify the data warehouse objects in SQL Server database. Earlier I was using suffix or prefix to identify data warehouse objects.  For example DimCustomer, DimProducts for dimension tables and SaleFact or FactSales for fact tables. This is also one of the practices we usually follow in any data warehouse project.

Microsoft introduced schemas in SQL Server, after that I realize this could be another best approach to organize data warehouse objects. So decided to use schemas for different objects in databases according to their role. The objects above would then rename as Dim.Customer, Dim.Products. So all dimesion tables built with schema named "Dim". Similarly schema "Fact" is created to for all fact tables. Then fact tables would rename as Fact.Sales.

Apart from dimension and fact, there are other objects as well so I have started to follow schema based objects and created below schemas to create different kind of objects.

Schema Pupose
Dim

Dim schema used for all the table which are dimensions in the nature.

Fact Fact schema used for all the table which are facts in the nature.
ETL ETL schema is used for all the objects which are used in ETL.
Stg Stg schema is used for all the objects which are used for staging purpose.
Report Report schema is used the objects which are used for reporting purpose.

There are some advantages of using schemas for objects.

  • Schema based security can be implemented easily.
  • Under SSMS, objects are getting sorted alphabetically by schema so its become more handy to find the objects.

Tags: ,

Data Warehouse

Types of Data Warehouse Dimension

by Amol 18. February 2011 08:51

In a data warehouse, dimension is one of the perspectives that can be used to analyze the data in an OLAP cube.
When you are browsing the data in a cube, you can view the data from the perspective of different combinations of dimensions.

With dimension, data can be slice and dice to analyse it in different prospective. Dimensions are used to provide structured lebeling information to measures. Another use of dimension other than lebeling is to provide filtering, grouping.

For example, for a Sales database, the dimensions could include Product, Time, Store, and Promotion.

Each dimension may have one or more hierarchies applied to it. For the "Date" dimension, there are several possible hierarchies: "Day > Month > Year", "Day > Week > Year", "Day > Month > Quarter > Year", etc.

Types of dimension :

  1. Conformed dimension
  2. Junk dimension
  3. Degenerate dimension
  4. Role-playing dimensions

 

Tags: , , ,

Data Warehouse

Category

Recent Posts

Tag cloud