Generate insert data script

by Amol 21. March 2011 08:12

Problem: 

In many situations we need to populate table with required data and for that we need INSERT INTO statements in SQL Server. Writing such script would be time consuming if you have thousands of records. Mostly we need this script to populate Meta data tables from development box to production server. This can be achieve in couple of ways, like import export functionality, BCP out and BCP in functionality. These options are also time consuming and involved little bit efforts to do it. Also for these options you need some extra rights to perform these operations. 

Solution:

Isn't it be nice to call just one procedure to generate all insert data script and just run that script wherever you need to run? So here is the procedure to generate insert data script. Click to download the procedure prcCreateInsertScript and create this procedure into your database.

Create_Insert_Data_Script.sql (3.34 kb)

How it works?

Let’s take an example, you have Meta data table Country having the list of all the countries. 

 

CREATE TABLE [dbo].[Country] 
(
      Id INT NOT NULL,
      Name VARCHAR(255) NOT NULL
)
GO

 

Add some countries in country table. 

 

INSERT INTO [dbo].[Country]
SELECT 1, 'India'
UNION ALL SELECT 2, 'United States of America'
UNION ALL SELECT 3, 'Japan'
UNION ALL SELECT 4, 'Austrelia'
UNION ALL SELECT 5, 'England'
UNION ALL SELECT 6, 'Keneya'
UNION ALL SELECT 7, 'China'
UNION ALL SELECT 8, 'South Africa'
UNION ALL SELECT 9, 'West Indies'

GO

 

Run the above script to create Country table development box. There are many countries; I have just listed some of them to show as an example.  

Now execute the procedure [dbo].[prcCreateInsertScript] to generate insert script for countries listed in Country table.

 

EXEC [dbo].[prcCreateInsertScript] 
@ObjectName = 'Country', 
@Schema = 'dbo'

GO

 

Execute the above procedure to generate insert data script. It will show the resultset shown as below. Just copy these statements and run it wherever you needed.

 

 

[dbo].[prcCreateInsertScript]  procedure accept two parameters, one is @ObjectName, this is the object name (Table Name) for which you need to generate insert data script. And second one is @Schema, this is the schema (owner) of the table. By default it takes "dbo" as schema. The procedure will run without @Schema parameter if table owner is dbo otherwise you can specify the owner of the table. 

Another case here is to consider IDENTITY column while generating insert data script. Let's revise the above example by considering Id as identity column. 

 

CREATE TABLE [dbo].[Country] 
(
     Id INT IDENTITY(1,1) NOT NULL,
     Name VARCHAR(255) NOT NULL
)

GO

 

Add some countries in country table. 

 

INSERT INTO [dbo].[Country]

SELECT 'India'
UNION ALL SELECT 'United States of America'
UNION ALL SELECT 'Japan'
UNION ALL SELECT 'Austrelia'
UNION ALL SELECT 'England'
UNION ALL SELECT 'Keneya'
UNION ALL SELECT 'China'
UNION ALL SELECT 'South Africa'
UNION ALL SELECT 'West Indies'

GO

 

Execute the procedure [dbo].[prcCreateInsertScript] to generate insert script for countries listed in Country table having Id as IDENTITY column.

 

EXEC [dbo].[prcCreateInsertScript]  
       @ObjectName = 'Country',   
       @Schema = 'dbo'

 

 

 

You can observe the generated script, it adds "SET IDENTITY_INSERT [dbo].[Country] ON" as very first line to allow insert value into IDENTITY column. Also it sets OFF at the last line as "SET IDENTITY_INSERT [dbo].[Country] OFF" 

Limitations:

All the table datatypes has not been considered while generating insert data script. It may fail for new datatypes. 

Note: Please consider this script as baseline script and use it on your own risk.

Create_Insert_Data_Script.sql (3.34 kb)

Tags: , ,

SQL Server Scripts

Data Warehouse Testing

by Amol 17. March 2011 20:03

What is Data Warehouse?

A data warehouse is a repository where data is stored for analysis, reporting, archival and security purposes. A data warehouse is a system that retrieves data from source systems and puts it into a dimensional data store or a normalized data store. Data warehouse is system capable of having strong Business Intelligence. It is primarily used by the top level management to make business critical decisions. 

Is data warehouse testing important?

Following questions will answer the question.

  • Are decisions based on correct data?
  • Do organization and its employees have sufficient confidence in the new system?
  • How can organization be sure there are no anomalies in the huge amount of migrated data from source to data warehouse?
  • Does it comply with all organization needs?
  • Is sufficient security applied on sensitive data?

To get confident answers for above questions, yes, testing is one of the critical and important part of data warehouse application. 

Challenges in Data Warehouse Testing:

There are many challenges in Data Warehouse testing. Main challenge is due to the huge amount data from heterogeneous data sources.

Other challenges are

  • Complete organization-wide enterprise data knowledge and business knowledge may not be easily feasible.
  • Asynchronously updating of data from heterogeneous sources may result in data inconsistency or may contain partial loading of data into data warehouse.
  • Data quality is not assured at all the sources and no accurate picture of quality of underlying data.
  • 100% data verification will not be feasible unless all extraction, transformation and loading component tested thoroughly to ensure all the data loaded as expected.

Due to such challenges there is difference between normal database testing and data warehouse testing. Below table illustrate difference between database testing and data warehouse testing. 

Database Testing Data Warehouse Testing
Smaller in size Larger in size
Normalized Data De-Normalized data
Usually Homogeneous Data Usually Heterogeneous data is getting loaded.
Usually Data is being tested at source instead of using UI Extraction, Transformation and Loading component are the major one to test
Usually all kind of operations happen Usually read-only operations happen

Data warehouse testing phases:

Following are the main processes involved in data warehouse testing

Requirement Testing:

The main purpose for doing Requirements testing is to check defined requirements for completeness.

For example:

Are the requirements Complete?
Are the requirements Testable?
Are the requirements Ambiguous?
Are the requirements Developable?
Are the requirements Singular?

Unit testing:

Unit testing is usually whitebox testing. It should check the all ETL components,   procedures used in ETL, source and target mappings and transformation for the same, all the scheduled jobs and the reports. This kind of testing is usually done by developers.

Unit testing will involve following

  • Whether the entire source mapping done as per stated and ETL components are accessing right data from right source. 
  • All the data transformations are correct according to the business rules and data warehouse is correctly populated with the transformed data.
  • Testing rejected records that don’t fulfill transformation rules.

Integration testing:

Integration testing involves

  • Initial loading of data into data warehouse.
  • Incremental loading of data into data warehouse.
  • Sequence of extraction objects, transformation objects and loading objects.
  • Sequence of ETLs jobs.
  • Testing rejected records that don’t fulfill transformation rules.
  • Log and error log generation.
  • Record count validation by executing queries against source and target.
  • Data integrity between the various source tables and relationships.
  • Statistical analysis by validation for various calculations.
  • Data Quality Validation to check for missing data.
  • Field by field data verification to check the consistency of source and target data.
  • Validate at the possible lowest granular level. 

Security Testing:

Under the security testing, make sure only the authenticated users should be able to access the resources. Security testing involve 

Confidentiality:
        A security measure which protects against the disclosure of information to parties other than the intended recipient.

Authentication:
        Authentication can be done in many ways like passwords, biometrics, radio frequency identification etc. 

Authorization:
        Authorization determines a requester is allowed to receive or perform an operation.

Performance Testing:

Performance testing verifies that the data warehouse or application can handle the required load and volume. Performance Testing involves

Scalability:
        Will the application handles the expected users load.

Response Time:
        Applications Response time, does the application respond quickly enough under various loads.

Capacity:
        Will the application able to handle defined capacity of data or load.

Throughput:
        For increasing number of concurrent users, high transaction rate should the hardware and software achieves the required throughput without any performance issues.

Acceptance Testing:

Application is tested with full functionality and is expected to function on production.
At the end of UAT, the system should be acceptable to the client for use in terms of ETL process integrity and business functionality and reporting.

Tags: ,

Data Warehouse

Organize Data Warehouse Objects

by Amol 17. March 2011 08:35

Organize Data Warehouse Objects using schema.

Since my last data warehouse project I started using schemas to identify the data warehouse objects in SQL Server database. Earlier I was using suffix or prefix to identify data warehouse objects.  For example DimCustomer, DimProducts for dimension tables and SaleFact or FactSales for fact tables. This is also one of the practices we usually follow in any data warehouse project.

Microsoft introduced schemas in SQL Server, after that I realize this could be another best approach to organize data warehouse objects. So decided to use schemas for different objects in databases according to their role. The objects above would then rename as Dim.Customer, Dim.Products. So all dimesion tables built with schema named "Dim". Similarly schema "Fact" is created to for all fact tables. Then fact tables would rename as Fact.Sales.

Apart from dimension and fact, there are other objects as well so I have started to follow schema based objects and created below schemas to create different kind of objects.

Schema Pupose
Dim

Dim schema used for all the table which are dimensions in the nature.

Fact Fact schema used for all the table which are facts in the nature.
ETL ETL schema is used for all the objects which are used in ETL.
Stg Stg schema is used for all the objects which are used for staging purpose.
Report Report schema is used the objects which are used for reporting purpose.

There are some advantages of using schemas for objects.

  • Schema based security can be implemented easily.
  • Under SSMS, objects are getting sorted alphabetically by schema so its become more handy to find the objects.

Tags: ,

Data Warehouse

Category

Recent Posts

Tag cloud