
Step 3. Scan Operation

The Scan Operation is executed on a Datastore to assert the data quality checks defined for the named collections of data (e.g. tables, views, files, topics) within it. The operation will:

  • produce a record anomaly for any record where anomalous values are detected
  • produce a shape anomaly for anomalous values that span multiple records
  • record the anomaly data along with related analysis in the associated Enrichment Datastore
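The difference between the two anomaly types can be illustrated with a minimal sketch. It is not the product's implementation: the `Anomaly` class, the `scan_collection` function, the `amount` field, and both checks (a non-negative value check and a maximum ratio of missing values) are hypothetical and exist only to show a per-record finding next to a finding that spans multiple records.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    kind: str          # "record" or "shape"
    description: str


def scan_collection(records, max_null_ratio=0.5):
    """Assert two hypothetical checks against a list of dict records."""
    anomalies = []

    # Record-level check (assumed): "amount" must not be negative.
    # Each failing record produces its own record anomaly.
    for index, record in enumerate(records):
        amount = record.get("amount")
        if amount is not None and amount < 0:
            anomalies.append(
                Anomaly("record", f"row {index}: negative amount {amount}")
            )

    # Shape-level check (assumed): the share of missing "amount" values
    # across the whole collection must stay at or below max_null_ratio.
    # A failure produces a single shape anomaly spanning many records.
    if records:
        missing = sum(1 for r in records if r.get("amount") is None)
        null_ratio = missing / len(records)
        if null_ratio > max_null_ratio:
            anomalies.append(
                Anomaly("shape", f"{null_ratio:.0%} of 'amount' values are missing")
            )

    return anomalies
```

In this sketch, a collection with one negative amount and a majority of missing amounts yields one record anomaly and one shape anomaly.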

Info

To assert data quality checks to find anomalies, a user needs to perform a Scan operation.

  • A Scan operation lets the user assert checks against either incremental or full loads, limit the number of records scanned, restrict the scan to a selected list of tables / files, and schedule future scans.

Operation Configuration


A Scan Operation can be configured with the following options:

  • Full - To process all records, ignoring the previous scan.
  • Incremental - To scan only records added or updated since the previous scan.
  • Record limit - To limit the total number of records scanned.

  • Target selection

    • To scan all tables.
    • To target only a subset of the available named collections.


  • Remediation strategy

    • To specify how enrichment tables should be migrated to reflect changes in source tables.


  • Schedule - To run the operation on a recurring basis, at one of the following frequencies:

    • Hourly
    • Daily
    • Weekly
    • Monthly
    • Advanced
      • Cron job expression
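The options above can be pictured as a small configuration object. The following is a sketch under stated assumptions, not the product's API: `ScanConfig`, `select_targets`, `select_records`, the `updated_at` field, and the `CRON_PRESETS` table are all hypothetical names. The cron strings use the standard five-field format (minute, hour, day of month, month, day of week); the exact format the product accepts may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Standard five-field cron expressions matching the preset frequencies.
# Assumption: the product's "Advanced" option may use a different format.
CRON_PRESETS = {
    "hourly": "0 * * * *",    # minute 0 of every hour
    "daily": "0 0 * * *",     # midnight every day
    "weekly": "0 0 * * 0",    # midnight every Sunday
    "monthly": "0 0 1 * *",   # midnight on the 1st of each month
}


@dataclass
class ScanConfig:
    incremental: bool = False            # False means a full scan
    record_limit: Optional[int] = None   # None means no limit
    targets: Optional[list] = None       # None means all named collections
    schedule: Optional[str] = None       # cron expression, None means run once


def select_targets(available, config):
    """Return the named collections the scan should cover."""
    if config.targets is None:
        return list(available)
    return [name for name in available if name in config.targets]


def select_records(records, config, last_scan_at):
    """Apply the full/incremental choice, then the record limit."""
    rows = records
    if config.incremental and last_scan_at is not None:
        # Incremental: keep only records changed since the previous scan.
        rows = [r for r in rows if r["updated_at"] > last_scan_at]
    if config.record_limit is not None:
        rows = rows[: config.record_limit]
    return rows
```

For example, `ScanConfig(incremental=True, record_limit=1000, targets=["orders"], schedule=CRON_PRESETS["daily"])` would describe a daily incremental scan of a single collection, capped at 1000 records.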



Last update: September 29, 2023