
About the Discovery API

Every time dbt Cloud runs a project, it generates and stores metadata about the project. This metadata includes details about your project’s models, sources, and other nodes, along with their execution results. With the dbt Cloud Discovery API, you can query this information to better understand your DAG (directed acyclic graph: a graph whose nodes are directionally related to each other and don’t form a directional closed loop) and the data it produces.

By leveraging the metadata in dbt Cloud, you can create systems for data monitoring and alerting, lineage exploration, and automated reporting. This can help you improve data discovery, data quality, and pipeline operations within your organization.

You can access the Discovery API through ad hoc queries, custom applications, a wide range of partner ecosystem integrations (like BI/analytics, catalog and governance, and quality and observability), and dbt Cloud features like model timing and [data health tiles](/docs/collaborate/data-tile).

A rich ecosystem for integration

You can query the dbt Cloud metadata:

  • At the environment level for both the latest state (use the environment endpoint) and historical run results (use modelByEnvironment) of a dbt Cloud project in production.
  • At the job level, for results on a specific dbt Cloud job run for a given resource type, like models or tests.
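Both levels are queried with GraphQL. The Python sketch below packages one query for each access pattern listed above and posts it over HTTPS. It is a minimal sketch, not a reference client: the endpoint URL (the multi-tenant US host), the bearer-token header, and the exact field and argument names are assumptions to check against the Discovery API schema for your account.

```python
import json
import urllib.request

# ASSUMPTION: multi-tenant US endpoint; other regions and single-tenant
# deployments use a different host.
DISCOVERY_URL = "https://metadata.cloud.getdbt.com/graphql"

# Environment level, latest applied state (environment endpoint).
# ASSUMPTION: the field names below; verify them against the schema.
ENVIRONMENT_QUERY = """
query ($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first) {
        edges {
          node {
            uniqueId
            name
            executionInfo { lastRunStatus executionTime }
          }
        }
      }
    }
  }
}
"""

# Environment level, historical run results for one model (modelByEnvironment).
HISTORY_QUERY = """
query ($environmentId: BigInt!, $uniqueId: String, $lastRunCount: Int) {
  modelByEnvironment(environmentId: $environmentId, uniqueId: $uniqueId, lastRunCount: $lastRunCount) {
    name
    executeStartedAt
    executionTime
  }
}
"""

# Job level, results for the models in a specific job run.
JOB_QUERY = """
query ($jobId: BigInt!, $runId: BigInt) {
  job(id: $jobId, runId: $runId) {
    models { uniqueId name status executionTime }
  }
}
"""


def build_payload(query: str, **variables) -> dict:
    """Bundle a GraphQL query and its variables into a JSON request body."""
    return {"query": query, "variables": variables}


def post_query(payload: dict, token: str, url: str = DISCOVERY_URL) -> dict:
    """POST a payload and return the decoded JSON response.

    ASSUMPTION: the API accepts a dbt Cloud service token as a bearer token.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, build_payload(HISTORY_QUERY, environmentId=123, uniqueId="model.my_project.orders", lastRunCount=10) prepares a request for the last ten runs of one model; the environment ID and model name are hypothetical placeholders. Pass the result to post_query together with your token to send it.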

Prerequisites

What you can use the Discovery API for

Learn more about the API's use cases, the analysis it supports, and the results you can achieve by integrating with it.

To use the API directly or integrate your tool with it, refer to Use cases and examples for detailed information.

Use the API to look at historical information, like model build time, to determine the health of your dbt projects. Finding inefficiencies in orchestration configurations can help decrease infrastructure costs and improve timeliness. To learn more, refer to Performance.

For example, you can use the model timing tab to identify and optimize bottlenecks in model builds:

Model timing visualization in dbt Cloud
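To turn historical build times into a list of bottlenecks, you can aggregate the per-run records returned by a query such as modelByEnvironment. The Python sketch below assumes each record has already been flattened into a dictionary with name and executionTime (seconds) keys, and that runs without timing report None; the sample data is made up for illustration. It ranks models by mean execution time, a simple heuristic rather than the calculation dbt Cloud uses for model timing.

```python
from collections import defaultdict
from statistics import mean


def slowest_models(history: list[dict], top_n: int = 3) -> list[tuple[str, float]]:
    """Rank models by mean execution time across the runs in `history`.

    ASSUMPTION: each record looks like {"name": str, "executionTime": float or None}.
    """
    times: dict[str, list[float]] = defaultdict(list)
    for record in history:
        # Ignore runs with no timing, such as skipped nodes.
        if record.get("executionTime") is not None:
            times[record["name"]].append(record["executionTime"])
    averages = {name: mean(values) for name, values in times.items()}
    # Slowest first; ties keep their original order because sorted() is stable.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical sample data, not real API output.
history = [
    {"name": "orders", "executionTime": 42.0},
    {"name": "orders", "executionTime": 38.0},
    {"name": "customers", "executionTime": 5.5},
    {"name": "customers", "executionTime": None},
    {"name": "payments", "executionTime": 12.0},
]
print(slowest_models(history, top_n=2))  # [('orders', 40.0), ('payments', 12.0)]
```

Averaging over several runs smooths out one-off slow runs caused by warehouse contention; comparing the latest run against this average is a natural next step for spotting regressions.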

Types of project state

At the environment level, you can query the results of two types of project state:

  • Definition: The logical state of a dbt project’s resources, which updates whenever the project changes.
  • Applied: The output of a successful dbt DAG execution, which creates or describes the state of the database (for example, from dbt run, dbt test, or dbt source freshness).

These states let you examine the difference between a model’s definition and its applied state, so you can answer questions like "Did the model run?" or "Did the run fail?" Applied models exist as a table or view in the data platform, reflecting their most recent successful run.
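Comparing the two states can be sketched in Python as a set comparison. This is a hypothetical helper, not part of the Discovery API: it assumes you have already collected the uniqueId of every model in the definition state and a mapping from uniqueId to the last run status in the applied state. The status strings "success" and "error" and the model IDs are illustrative assumptions.

```python
def compare_states(definition_ids: set[str], applied: dict[str, str]) -> dict[str, list[str]]:
    """Classify models by comparing definition state with applied state.

    ASSUMPTION: `applied` maps uniqueId -> last run status, where "success"
    means the most recent run succeeded.
    """
    report: dict[str, list[str]] = {"not_run": [], "failed": [], "ok": []}
    for unique_id in sorted(definition_ids):
        status = applied.get(unique_id)
        if status is None:
            # Defined in the project but absent from the applied state.
            report["not_run"].append(unique_id)
        elif status != "success":
            # Present in the applied state, but the latest run did not succeed.
            report["failed"].append(unique_id)
        else:
            report["ok"].append(unique_id)
    # Applied but no longer defined, for example a model removed from the project.
    report["removed"] = sorted(set(applied) - definition_ids)
    return report


# Hypothetical sample data, not real API output.
definition = {"model.shop.orders", "model.shop.customers", "model.shop.refunds"}
applied = {
    "model.shop.orders": "success",
    "model.shop.customers": "error",
    "model.shop.legacy": "success",
}
print(compare_states(definition, applied))
# {'not_run': ['model.shop.refunds'], 'failed': ['model.shop.customers'], 'ok': ['model.shop.orders'], 'removed': ['model.shop.legacy']}
```

A model in the "failed" bucket may still have a table or view in the data platform, because the applied state reflects its most recent successful run.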
