
Source Data Extract Controls Standard

This is the current version of this document.

Section 1 - Purpose

(1) This document sets out requirements for how data is to be extracted from an operational source system. It establishes that data must be taken from the system without adjustment, and describes the controls that must be in place to ensure downstream data is replicated correctly and in a timely manner, and remains synchronised with the master source.


Section 2 - Authority

(2) Authority for this document is established by the Information Governance Policy.


Section 3 - Scope

(3) This standard applies:

  1. to all staff requesting or providing data from source systems; and
  2. regardless of the integration style used, such as batch files or APIs.

Section 4 - Standard

Introduction

(4) RMIT has many systems that capture and generate significant amounts of data. This data is often needed by other systems and processes, both across the organisation and externally, outside the system in which it originates.

(5) Requests for this data have resulted in numerous extracts from the source systems, most of them bespoke to meet individual consumers’ needs. This creates significant processing load and entangles systems with one another, making it harder to update both producers and consumers.

(6) This set of standards and guidelines directs source system owners towards providing a reduced, base set of ‘raw’ data extracts, allowing each system to focus on its original, primary purpose.

Data Extraction Standards

Data Extract Set

(7) Data extract set design must:

  1. be based on the source system’s data structure, not a consumer’s data requirements (accurate);
  2. ensure all of the system’s business data is included (complete);
  3. ensure there are no overlaps between extracts (efficient); and
  4. not extract data that did not originate in the system (original).
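
Three of the extract set requirements can be checked mechanically if an extract set design is described as sets of source attributes. The sketch below is illustrative only: it assumes each attribute is named as 'table.column', and the names ExtractDesign and check_extract_set are hypothetical, not part of any RMIT tool. It flags attributes carried by more than one extract (efficient), attributes mastered in another system (original), and originating business attributes that no extract covers (complete). Whether the design follows the source system's structure (accurate) remains a design judgement that this check does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractDesign:
    """Hypothetical description of one planned extract."""
    name: str
    attributes: frozenset[str]  # source attributes carried, named 'table.column'


def check_extract_set(
    extracts: list[ExtractDesign],
    business_attributes: set[str],
    originating_attributes: set[str],
) -> list[str]:
    """Return the problems found in an extract set design (empty if none).

    business_attributes: all business data attributes held by the system.
    originating_attributes: attributes whose master source is this system.
    """
    problems: list[str] = []
    first_seen: dict[str, str] = {}  # attribute -> first extract carrying it
    for extract in extracts:
        for attr in sorted(extract.attributes):
            # Efficient: an attribute may appear in only one extract.
            if attr in first_seen:
                problems.append(f"overlap: {attr} in {first_seen[attr]} and {extract.name}")
            else:
                first_seen[attr] = extract.name
            # Original: do not re-publish data mastered in another system.
            if attr not in originating_attributes:
                problems.append(f"not original: {attr} in {extract.name}")
    # Complete: every business attribute originating here is in some extract.
    uncovered = (business_attributes & originating_attributes) - set(first_seen)
    for attr in sorted(uncovered):
        problems.append(f"missing: {attr}")
    return problems


if __name__ == "__main__":
    business = {"student.id", "student.name", "enrolment.id", "enrolment.course_code"}
    designs = [
        ExtractDesign("student", frozenset({"student.id", "student.name"})),
        ExtractDesign("enrolment", frozenset({"enrolment.id", "student.name"})),
    ]
    # All attributes in this example originate in the system.
    print(check_extract_set(designs, business, business))
    # → ['overlap: student.name in student and enrolment', 'missing: enrolment.course_code']
```

In this example the enrolment extract repeats a student attribute that already has its own extract, and no extract carries the course code, so the design fails both the efficient and the complete requirements.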

Data Extract

(8) Data extract design must:

  1. be based on the source system’s data structure, not a consumer’s data requirements (accurate);
  2. include all data that naturally fits in the extract at hand, and not omit any data that adds value (complete);
  3. include all records (unfiltered); and
  4. be natural, that is, not enriched using business rules foreign to the system (not enriched).
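
As an illustration of an accurate, unfiltered and non-enriched extract, the following sketch uses Python's built-in sqlite3 module as a stand-in for a source database; this is an assumption, and real source systems and their extraction tooling will differ. The function names raw_extract and to_csv, and the example course table, are hypothetical. The column list is read from the source table's own definition, and the query has no WHERE clause, joins or derived columns, so every record reaches the extract.

```python
import csv
import io
import sqlite3


def raw_extract(conn: sqlite3.Connection, table: str) -> tuple[list[str], list[tuple]]:
    """Extract one source table as stored: every column, every row.

    The table name is assumed to come from a trusted configuration list,
    since it is interpolated into SQL.
    """
    # Accurate: column names and order come from the source table definition.
    columns = [info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')]
    if not columns:
        raise ValueError(f"unknown table: {table}")
    column_list = ", ".join(f'"{name}"' for name in columns)
    # Unfiltered and not enriched: no WHERE clause, no joins, no derived columns.
    rows = conn.execute(f'SELECT {column_list} FROM "{table}"').fetchall()
    return columns, rows


def to_csv(columns: list[str], rows: list[tuple]) -> str:
    """Serialise the extract as CSV with a single header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO course VALUES (?, ?, ?)",
        [("C1", "Maths", "active"), ("C2", "History", "retired")],
    )
    columns, rows = raw_extract(conn, "course")
    # The retired course is included: filtering is left to each consumer.
    print(to_csv(columns, rows))
```

Leaving out the status filter keeps the extract reusable: a consumer that only wants active courses applies that rule on its own side, rather than the source system maintaining a separate filtered extract for it.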

Data Movement Controls

(9) All batch data extracts must have the following data movement controls:

  1. extraction process end date-time
  2. data batch record count
  3. file checksum
  4. data batch sequence number or business scheduled date
  5. source system unique identifier
  6. data set time zone identifier.
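
A minimal sketch of how the six batch controls might be produced by the source system and checked by a consumer follows. It assumes a UTF-8 CSV file with a single header row and a SHA-256 checksum; the standard does not prescribe a file format, checksum algorithm or field names, so BatchControls, build_controls, verify_batch and the example values (such as the source identifier "SIS") are hypothetical. The record count is passed in from the extraction process rather than recounted from the file, so that a consumer's independent recount can detect rows lost while the file was being written.

```python
import csv
import hashlib
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BatchControls:
    """Data movement controls for one batch file (field names are hypothetical)."""
    extraction_end: str    # 1. extraction process end date-time (ISO 8601, UTC)
    record_count: int      # 2. data batch record count (header row excluded)
    checksum_sha256: str   # 3. file checksum over the exact bytes delivered
    sequence_number: int   # 4. batch sequence number
    source_system_id: str  # 5. source system unique identifier
    data_time_zone: str    # 6. time zone of date-times in the data, e.g. an IANA name


def count_records(data: bytes) -> int:
    """Count CSV data records, assuming UTF-8 and a single header row."""
    rows = list(csv.reader(io.StringIO(data.decode("utf-8"), newline="")))
    return max(len(rows) - 1, 0)


def build_controls(data: bytes, record_count: int, sequence_number: int,
                   source_system_id: str, data_time_zone: str) -> BatchControls:
    """Producer side: call once the file is fully written.

    record_count is the number of rows the extraction query returned,
    not a recount of the file, so the consumer's recount is a real check.
    """
    return BatchControls(
        extraction_end=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        record_count=record_count,
        checksum_sha256=hashlib.sha256(data).hexdigest(),
        sequence_number=sequence_number,
        source_system_id=source_system_id,
        data_time_zone=data_time_zone,
    )


def verify_batch(data: bytes, controls: BatchControls,
                 expected_source: str, last_sequence: int) -> list[str]:
    """Consumer side: recompute what can be recomputed and compare."""
    problems = []
    if hashlib.sha256(data).hexdigest() != controls.checksum_sha256:
        problems.append("checksum mismatch: file altered or truncated")
    if count_records(data) != controls.record_count:
        problems.append("record count mismatch")
    if controls.source_system_id != expected_source:
        problems.append("unexpected source system")
    if controls.sequence_number != last_sequence + 1:
        problems.append(
            f"out of sequence: expected {last_sequence + 1}, got {controls.sequence_number}"
        )
    return problems


if __name__ == "__main__":
    data = b"code,title\nC1,Maths\nC2,History\n"
    controls = build_controls(data, record_count=2, sequence_number=42,
                              source_system_id="SIS", data_time_zone="Australia/Melbourne")
    print(verify_batch(data, controls, "SIS", last_sequence=41))  # → []
    # Simulate the last record being lost in transit.
    print(verify_batch(data[:-11], controls, "SIS", last_sequence=41))
    # → ['checksum mismatch: file altered or truncated', 'record count mismatch']
```

The checksum detects any change to the delivered bytes, while the sequence number lets a consumer notice a missed or replayed batch; a business scheduled date could be compared against the expected schedule in the same way.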

(10) Delta batch data extracts must also have the following data movement controls:

  1. delta batch data point start date-time
  2. delta batch data point end date-time.
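
For delta extracts, the start and end date-times let a consumer confirm that consecutive batches cover time contiguously. The sketch below is one possible check, assuming each window is half-open (a change stamped exactly at a boundary belongs to the later batch only) and that all date-times are timezone-aware; DeltaWindow and check_delta_chain are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeltaWindow:
    """Delta controls of one batch (class and field names are hypothetical)."""
    sequence_number: int
    start: datetime  # delta batch data point start date-time (timezone-aware)
    end: datetime    # delta batch data point end date-time (timezone-aware)


def check_delta_chain(windows: list[DeltaWindow]) -> list[str]:
    """Report gaps and overlaps between consecutive delta batches.

    Each window is treated as half-open, [start, end), so a change stamped
    exactly at a boundary belongs to the later batch only.
    """
    problems = []
    ordered = sorted(windows, key=lambda w: w.sequence_number)
    for w in ordered:
        if w.end <= w.start:
            problems.append(f"batch {w.sequence_number}: end is not after start")
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start > prev.end:
            # Changes between prev.end and cur.start were never extracted.
            problems.append(f"gap between batch {prev.sequence_number} and {cur.sequence_number}")
        elif cur.start < prev.end:
            # Changes in the overlap may be applied twice downstream.
            problems.append(f"overlap between batch {prev.sequence_number} and {cur.sequence_number}")
    return problems


if __name__ == "__main__":
    def at(hour: int) -> datetime:
        return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)

    batches = [
        DeltaWindow(1, at(0), at(6)),
        DeltaWindow(2, at(6), at(12)),
        DeltaWindow(3, at(13), at(18)),  # starts an hour late
    ]
    print(check_delta_chain(batches))  # → ['gap between batch 2 and 3']
```

Treating windows as half-open means the end of one batch can equal the start of the next without any change being extracted twice, which keeps the downstream copy synchronised with the master source.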