Time Distribution Size
Definition
Asserts that the count of records for each interval of a timestamp is between two numbers.
In-Depth Overview
The Time Distribution Size
rule helps in identifying irregularities in the distribution of records over time intervals such as hours, days, or months.
For instance, in a retail context, it could ensure that there’s a consistent number of orders each month to meet business targets. A sudden drop in orders might highlight operational issues or shifts in market demand that require immediate attention.
Field Scope
Single: The rule evaluates a single specified field.
Accepted Types
Type | |
---|---|
Timestamp |
|
Date |
Specific Properties
Name | Description |
---|---|
Interval |
Defines the time interval for segmentation. |
Min Count |
Specifies the minimum count of records in each segment. |
Max Count |
Specifies the maximum count of records in each segment. |
Anomaly Types
Type | Supported |
---|---|
Record Flag inconsistencies at the row level |
|
Shape Flag inconsistencies in the overall patterns and distributions of a field |
Example
Objective: Ensure that the number of orders for each month is consistently between 5 and 10.
Sample Data
O_ORDERKEY | O_ORDERDATE |
---|---|
1 | 2023-01-01 |
2 | 2023-01-15 |
3 | 2023-01-20 |
4 | 2023-01-25 |
5 | 2023-02-01 |
6 | 2023-02-05 |
7 | 2023-02-10 |
8 | 2023-02-15 |
9 | 2023-02-20 |
10 | 2023-02-25 |
11 | 2023-02-28 |
Anomaly Explanation
In the sample data above, the January segment fails the rule because there are only 4 orders, which is below the specified minimum count of 5.
graph TD
A[Start] --> B[Retrieve O_ORDERDATE]
B --> C{Segment data by month}
C --> D{Is count of records in each segment between 5 and 10?}
D -->|Yes| E[End]
D -->|No| F[Mark as Anomalous]
F --> E
Potential Violation Messages
Shape Anomaly
50.000% of the monthly segments of O_ORDERDATE
have record counts not between 5 and 10.