
Expected Schema

Definition

Asserts that all selected fields are present in the datastore.

Behavior

The expected schema check is the first check evaluated during a scan operation. If it fails, the scan operation ends with a Failure result and the following message:

<container-name>: Aborted because schema check anomalies were identified.
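The ordering described above can be sketched as follows. This is a minimal illustration, not the product's implementation: `run_scan`, `schema_check`, `other_checks`, and the "Completed." message are hypothetical names introduced here, and only the abort message text comes from this page.

```python
from typing import Callable, List, Sequence


def run_scan(
    container_name: str,
    schema_check: Callable[[], List[str]],
    other_checks: Sequence[Callable[[], None]],
) -> str:
    """Run the schema check first; skip all other checks if it finds anomalies."""
    # Hypothetical contract: schema_check returns the names of missing fields.
    missing_fields = schema_check()
    if missing_fields:
        # Message text taken from the documentation above.
        return f"{container_name}: Aborted because schema check anomalies were identified."
    for check in other_checks:
        check()
    # Placeholder success message (an assumption, not a documented string).
    return f"{container_name}: Completed."


if __name__ == "__main__":
    print(run_scan("LINEITEM", lambda: ["L_SUPPKEY"], []))
```

The point of the sketch is only the short circuit: when the schema check reports anything, none of the remaining checks run.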

General Properties

Name Supported
Filter
Allows the targeting of specific data based on conditions
Coverage Customization
Allows adjusting the percentage of records that must meet the rule's conditions

Specific Properties

Specify the fields that must be present in the schema, and determine if a schema change caused by additional fields should fail or pass the assertion.

Name Description
Fields
List of fields that must be present in the schema.
Allow other fields
If true, fields beyond those listed are allowed in the schema. Otherwise, the assertion is stricter: additional fields cause it to fail.
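The interaction between the two properties can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's code: `check_expected_schema` and its parameter names are hypothetical, field names are compared as exact, case-sensitive strings, and the strict mode is assumed to flag any field outside the expected list.

```python
from typing import Iterable, List, Tuple


def check_expected_schema(
    actual_fields: Iterable[str],
    expected_fields: Iterable[str],
    allow_other_fields: bool = True,
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) field names; both empty means the check passes."""
    actual = list(actual_fields)
    expected = list(expected_fields)
    # Expected fields that are absent always count against the assertion.
    missing = [name for name in expected if name not in actual]
    # Extra fields only count when other fields are not allowed (assumed strict mode).
    unexpected = [] if allow_other_fields else [n for n in actual if n not in expected]
    return missing, unexpected


if __name__ == "__main__":
    schema = ["L_ORDERKEY", "L_PARTKEY", "L_SUPPKEY", "L_LINENUMBER"]
    wanted = ["L_ORDERKEY", "L_PARTKEY", "L_SUPPKEY"]
    print(check_expected_schema(schema, wanted, allow_other_fields=False))
```

With "Allow other fields" enabled, the same schema passes because `L_LINENUMBER` is simply ignored.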

Anomaly Types

Type Supported
Record
Flag inconsistencies at the row level
Shape
Flag inconsistencies in the overall patterns and distributions of a field

Example

Objective: Ensure that expected fields such as L_ORDERKEY, L_PARTKEY, and L_SUPPKEY are always present in the LINEITEM table.

Sample Data

Valid
FIELD_NAME FIELD_TYPE
L_ORDERKEY NUMBER
L_PARTKEY NUMBER
L_SUPPKEY NUMBER
L_LINENUMBER NUMBER
L_QUANTITY NUMBER
L_EXTENDEDPRICE NUMBER
... ...
Invalid

L_SUPPKEY is missing from the schema

FIELD_NAME FIELD_TYPE
L_ORDERKEY NUMBER
L_PARTKEY NUMBER
L_LINENUMBER NUMBER
L_QUANTITY NUMBER
L_EXTENDEDPRICE NUMBER
... ...

Anomaly Explanation

Of the two sample schemas, the second is missing one of the expected fields (L_SUPPKEY). Only the first schema contains all of the expected fields.

graph TD
A[Start] --> B{Check for Field Presence}
B -.->|Field is missing| C[Mark as Shape Anomaly]
B -.->|All fields present| D[End]

-- An illustrative SQL query to check the existence of columns.
select 
    column_name 
from 
    information_schema.columns 
where 
    table_name = 'LINEITEM' and 
    column_name in ('L_ORDERKEY', 'L_PARTKEY', 'L_SUPPKEY');
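The query above returns the expected columns that exist, so the missing ones are whatever remains from the expected list. A minimal Python sketch of that final step follows; `missing_from_query` and `violation_message` are hypothetical helpers, the rows are hard-coded stand-ins for a real result set, and joining several field names with commas is an assumption (this page only shows the single-field message).

```python
from typing import Iterable, List, Sequence, Tuple

EXPECTED_FIELDS = ["L_ORDERKEY", "L_PARTKEY", "L_SUPPKEY"]


def missing_from_query(rows: Iterable[Tuple[str, ...]], expected: Sequence[str]) -> List[str]:
    """Compare column names returned by the query against the expected list."""
    found = {row[0] for row in rows}  # first column of each row is column_name
    return [name for name in expected if name not in found]


def violation_message(missing: Sequence[str]) -> str:
    """Format the shape anomaly message; comma-joining is an assumption."""
    return f"The required fields ({', '.join(missing)}) are not present."


if __name__ == "__main__":
    # Stand-in for the result set of the invalid sample schema.
    rows = [("L_ORDERKEY",), ("L_PARTKEY",)]
    missing = missing_from_query(rows, EXPECTED_FIELDS)
    if missing:
        print(violation_message(missing))
```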

Potential Violation Messages

Shape Anomaly

The required fields (L_SUPPKEY) are not present.


Last update: February 27, 2024