
Overview of a datastore

  • A Datastore can be any Apache Spark-compatible data source, such as:

    1. A traditional RDBMS.
    2. Raw files (CSV, XLSX, JSON, Avro, Parquet) on:
      1. AWS S3.
      2. Azure Blob Storage.
      3. GCP Cloud Storage.
  • A Datastore is a medium holding structured data. Qualytics supports Spark-compatible Datastores via the conceptual layers depicted below:

    Screenshot
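To illustrate what "Spark-compatible" means in practice, the sketch below maps a datastore definition to the Spark reader settings it implies. This is an illustrative assumption, not Qualytics code: the field names (`type`, `jdbc_url`, `uri`, and so on) are hypothetical.

```python
# Illustrative sketch (not Qualytics code): how a Spark-compatible platform
# might map a datastore definition to a Spark reader configuration.
# All field names here are assumptions for illustration.

def reader_config(datastore: dict) -> dict:
    """Return the Spark format and options implied by a datastore config."""
    kind = datastore["type"]
    if kind == "rdbms":
        # Traditional RDBMS sources are read over JDBC.
        return {
            "format": "jdbc",
            "options": {"url": datastore["jdbc_url"], "dbtable": datastore["table"]},
        }
    if kind == "object_storage":
        # Raw files on S3 / Azure Blob Storage / GCP Cloud Storage
        # are read by file format.
        fmt = datastore["file_format"].lower()
        if fmt not in {"csv", "xlsx", "json", "avro", "parquet"}:
            raise ValueError(f"unsupported file format: {fmt}")
        return {"format": fmt, "options": {"path": datastore["uri"]}}
    raise ValueError(f"unsupported datastore type: {kind}")

cfg = reader_config({"type": "object_storage", "file_format": "CSV",
                     "uri": "s3://example-bucket/data/"})
print(cfg["format"])  # → csv
```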


Configuration

  • The first step of configuring a Qualytics instance is to Add New Datastore:

    • In the main menu, select the Datastores tab.
    • Click the Add New Datastore button:

    Screenshot

    Screenshot


Credentials

  • Configuring a datastore requires connection credentials specific to each datastore type. Here is an example of a Snowflake datastore being added:

    Screenshot
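The exact credentials vary by source. For Snowflake, a connection typically needs values like the ones below; this is a hedged sketch using typical Snowflake connection parameters, not Qualytics' exact form fields, with a small check for missing values:

```python
# Hypothetical sketch of validating Snowflake-style connection credentials.
# The field names are typical Snowflake connection parameters, not the
# exact fields of the Qualytics form.

REQUIRED_FIELDS = {"account", "warehouse", "database", "schema", "user", "password"}

def validate_credentials(creds: dict) -> list:
    """Return the sorted list of required fields missing from the credentials."""
    return sorted(REQUIRED_FIELDS - creds.keys())

creds = {
    "account": "acme-xy12345",   # Snowflake account identifier (hypothetical)
    "warehouse": "ANALYTICS_WH",
    "database": "SALES",
    "schema": "PUBLIC",
    "user": "qualytics_svc",
}
print(validate_credentials(creds))  # → ['password']
```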

  • When a datastore is added, it appears on the home screen alongside other datastores:

    Screenshot

  • Clicking into a datastore guides the user through the capabilities and operations of the platform.

When a user configures a datastore for the first time, they’ll see an empty Activity tab.

Screenshot

  • Heatmap view

Screenshot

Running a Catalog of the Datastore

  • The first Catalog operation kicks off automatically. You can monitor its progress through the Activity tab.

    • This operation typically completes in a short amount of time.
    • Once it completes, run a Profile operation (under Run -> Profile) to generate metadata and infer data quality checks.

    Screenshot

    Screenshot
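The Catalog-then-Profile flow above can be sketched as a polling loop. The client class and its method names below are illustrative assumptions, not the real Qualytics API; a stub client stands in for the platform:

```python
import time

# Hypothetical sketch of the Catalog -> Profile flow described above.
# The client class and its methods are illustrative assumptions,
# not the real Qualytics API.

class FakeDatastoreClient:
    """Stand-in client whose Catalog finishes after a few status polls."""
    def __init__(self):
        self._polls = 0

    def catalog_status(self) -> str:
        self._polls += 1
        return "success" if self._polls >= 3 else "running"

    def run_profile(self) -> str:
        return "profile started"

def catalog_then_profile(client, poll_seconds: float = 0.01) -> str:
    # The Catalog operation kicks off automatically; poll until it completes,
    # then trigger Profile to generate metadata and infer quality checks.
    while client.catalog_status() != "success":
        time.sleep(poll_seconds)
    return client.run_profile()

print(catalog_then_profile(FakeDatastoreClient()))  # → profile started
```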


Last update: April 27, 2024