Overview of computed tables and files
Computed tables and computed files are powerful virtual tables within the Qualytics platform
Key Concepts
Computed Tables
A container created from SQL queries on JDBC datastores, allowing advanced data manipulation (joins, where clauses, etc.).
Computed Files
A container derived from Spark SQL transformations on DFS datastores.
When to Use Computed Tables
- Data Preparation and Transformation: Clean, shape, and restructure raw data from JDBC datastores.
- Complex Calculations and Aggregations: Perform calculations not easily supported by standard containers.
- Data Subsetting: Extract specific data subsets based on filters using SQL's WHERE clause.
- Joining Data Across Datastores: Combine data from multiple JDBC datastores using SQL joins.
When to Use Computed Files
- Data Preparation and Transformation: Clean and restructure data from raw files stored in a DFS.
- Column-Level Transformations: Apply Spark SQL functions to individual columns for data manipulation and cleaning.
- Filtering Data: Create subsets of data within a DFS container using Spark SQL's WHERE clause.
- Important Note: Computed files currently do not support joins or union operations. If these operations are required, consider using a computed table or alternative data transformation techniques.
Key Differences
Feature | Computed Table (JDBC) | Computed File (DFS) |
---|---|---|
Source Data | JDBC Datastores | DFS Datastores |
Query Language | SQL (database-specific functions) | Spark SQL |
Supported Operations | Joins, where clauses, database functions | Column transforms, where clauses (no joins), SparkSQL functions |
Important Notes
- Computed tables/files behave like normal tables. You can profile them, create checks, and detect anomalies.
- Updating a computed table's query triggers a profiling operation.
- Updating a computed file's select clause or where clause triggers a profiling operation.
- Upon creation, a basic profile (max 1000 records) is automatically generated.
Last update:
April 27, 2024