Skip to main content
Version: Next

Vertex AI

Testing

Important Capabilities

CapabilityStatusNotes
DescriptionsExtract descriptions for Vertex AI Registered Models and Model Versions

Ingesting metadata from VertexAI requires using the Vertex AI module.

Prerequisites

Please refer to the Vertex AI documentation for basic information on Vertex AI.

Credentials to access to GCP

Please read the section to understand how to set up application default Credentials to GCP GCP docs.

Create a service account and assign roles

  1. Setup a ServiceAccount as per GCP docs and assign the previously created role to this service account.

  2. Download a service account JSON keyfile.

    • Example credential file:
    {
    "type": "service_account",
    "project_id": "project-id-1234567",
    "private_key_id": "d0121d0000882411234e11166c6aaa23ed5d74e0",
    "private_key": "-----BEGIN PRIVATE KEY-----\nMIIyourkey\n-----END PRIVATE KEY-----",
    "client_email": "test@suppproject-id-1234567.iam.gserviceaccount.com",
    "client_id": "113545814931671546333",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%suppproject-id-1234567.iam.gserviceaccount.com"
    }
  3. To provide credentials to the source, you can either:

  • Set an environment variable:

    $ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"

    or

  • Set credential config in your source based on the credential json file. For example:

    credential:
    private_key_id: "d0121d0000882411234e11166c6aaa23ed5d74e0"
    private_key: "-----BEGIN PRIVATE KEY-----\nMIIyourkey\n-----END PRIVATE KEY-----\n"
    client_email: "test@suppproject-id-1234567.iam.gserviceaccount.com"
    client_id: "123456678890"

Integration Details

Ingestion Job extract Models, Datasets, Training Jobs, Endpoints, Experiment and Experiment Runs in a given project and region on Vertex AI.

Concept Mapping

This ingestion source maps the following Vertex AI Concepts to DataHub Concepts:

Source ConceptDataHub ConceptNotes
ModelMlModelGroupThe name of a Model Group is the same as Model's name. Model serve as containers for multiple versions of the same model in Vertex AI.
Model VersionMlModelThe name of a Model is {model_name}_{model_version} (e.g. my_vertexai_model_1 for model registered to Model Registry or Deployed to Endpoint. Each Model Version represents a specific iteration of a model with its own metadata.
Dataset

DatasetA Managed Dataset resource in Vertex AI is mapped to Dataset in DataHub.
Supported types of datasets include (Text, Tabular, Image Dataset, Video, TimeSeries)
Training JobDataProcessInstanceA Training Job is mapped as DataProcessInstance in DataHub.
Supported types of training jobs include (AutoMLTextTrainingJob, AutoMLTabularTrainingJob, AutoMLImageTrainingJob, AutoMLVideoTrainingJob, AutoMLForecastingTrainingJob, Custom Job, Custom TrainingJob, Custom Container TrainingJob, Custom Python Packaging Job )
ExperimentContainerExperiments organize related runs and serve as logical groupings for model development iterations. Each Experiment is mapped to a Container in DataHub.
Experiment RunDataProcessInstanceAn Experiment Run represents a single execution of a ML workflow. An Experiment Run tracks ML parameters, metricis, artifacts and metadata
ExecutionDataProcessInstanceMetadata Execution resource for Vertex AI. Metadata Execution is started in a experiment run and captures input and output artifacts.

Vertex AI Concept Diagram:

Lineage

Lineage is emitted using Vertex AI API to capture the following relationships:

  • A training job and a model (which training job produce a model)
  • A dataset and a training job (which dataset was consumed by a training job to train a model)
  • Experiment runs and an experiment
  • Metadata execution and an experiment run

CLI based Ingestion

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
type: vertexai
config:
project_id: "acryl-poc"
region: "us-west2"
# You must either set GOOGLE_APPLICATION_CREDENTIALS or provide credential as shown below
# credential:
# private_key: '-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n'
# private_key_id: "project_key_id"
# client_email: "client_email"
# client_id: "client_id"

sink:
type: "datahub-rest"
config:
server: "http://localhost:8080"

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

FieldDescription
project_id 
string
Project ID in Google Cloud Platform
region 
string
Region of your project in Google Cloud Platform
bucket_uri
string
Bucket URI used in your project
vertexai_url
string
VertexUI URI
env
string
The environment that all assets produced by this connector belong to
Default: PROD
credential
GCPCredential
GCP credential information
credential.client_email 
string
Client email
credential.client_id 
string
Client Id
credential.private_key 
string
Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n'
credential.private_key_id 
string
Private key id
credential.auth_provider_x509_cert_url
string
Auth provider x509 certificate url
credential.auth_uri
string
Authentication uri
credential.client_x509_cert_url
string
If not set it will be default to https://www.googleapis.com/robot/v1/metadata/x509/client_email
credential.project_id
string
Project id to set the credentials
credential.token_uri
string
Token uri
credential.type
string
Authentication type
Default: service_account

Code Coordinates

  • Class Name: datahub.ingestion.source.vertexai.vertexai.VertexAISource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Vertex AI, feel free to ping us on our Slack.