Examples - DocDB REST API

This page provides examples of interacting with the Document Database (DocDB) REST API using the provided Python client.

Querying Metadata

Note

Several fields in the DocDB metadata collection are indexed to support efficient query execution by limiting the number of documents scanned. Please use these fields in your filter queries whenever possible.

Currently, the following fields are indexed in the v1 (metadata_index/data_assets) collection: _id, name, location, subject.subject_id, data_description.project_name, and data_description.modality.abbreviation.

The same fields are indexed in the v2 (metadata_index_v2/data_assets) collection, except that the modality field is named data_description.modalities.abbreviation (plural "modalities").
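
As a small illustration of the note above, the sketch below builds a filter that uses only indexed fields and picks the modality field name for the chosen collection version. The helper name build_indexed_filter and the example values ("ecephys", "Example Project") are hypothetical and not part of the client; only the field names come from the lists above.

```python
from typing import Optional

# Modality field name per collection version, as listed in the note above.
MODALITY_FIELD = {
    "v1": "data_description.modality.abbreviation",
    "v2": "data_description.modalities.abbreviation",
}


def build_indexed_filter(
    version: str,
    modality: str,
    project_name: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build a filter that only uses indexed fields."""
    if version not in MODALITY_FIELD:
        raise ValueError(f"Unknown collection version: {version}")
    filter_query = {MODALITY_FIELD[version]: modality}
    if project_name is not None:
        filter_query["data_description.project_name"] = project_name
    return filter_query


# Example values are made up for illustration.
v1_filter = build_indexed_filter("v1", "ecephys")
v2_filter = build_indexed_filter("v2", "ecephys", "Example Project")
print(v1_filter)
print(v2_filter)
```

The resulting dictionary can be passed as filter_query to retrieve_docdb_records on a client that points at the matching collection.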

Count Example 1: Get the number of records with a certain subject_id

import json

from aind_data_access_api.document_db import MetadataDbClient

API_GATEWAY_HOST = "api.allenneuraldynamics.org"
# Default database and collection names are set in the client
# To override the defaults, provide the database and collection
# parameters in the constructor

docdb_api_client = MetadataDbClient(
    host=API_GATEWAY_HOST,
)

filter = {"subject.subject_id": "731015"}
count = docdb_api_client._count_records(
    filter_query=filter,
)
print(count)

Filter Example 1: Get records with a certain subject_id

filter = {"subject.subject_id": "731015"}
records = docdb_api_client.retrieve_docdb_records(
    filter_query=filter,
)
print(json.dumps(records, indent=3))

With projection (recommended):

filter = {"subject.subject_id": "731015"}
projection = {
    "name": 1,
    "created": 1,
    "location": 1,
    "subject.subject_id": 1,
    "subject.date_of_birth": 1,
}
records = docdb_api_client.retrieve_docdb_records(
    filter_query=filter,
    projection=projection,
)
print(json.dumps(records, indent=3))

Filter Example 2: Get records with a certain breeding group

filter = {
    "subject.breeding_info.breeding_group": "Slc17a6-IRES-Cre;Ai230-hyg(ND)"
}
records = docdb_api_client.retrieve_docdb_records(
    filter_query=filter
)
print(json.dumps(records, indent=3))

With projection (recommended):

filter = {
    "subject.breeding_info.breeding_group": "Slc17a6-IRES-Cre;Ai230-hyg(ND)"
}
projection = {
    "name": 1,
    "created": 1,
    "location": 1,
    "subject.subject_id": 1,
    "subject.breeding_info.breeding_group": 1,
}
records = docdb_api_client.retrieve_docdb_records(
    filter_query=filter,
    projection=projection,
)
print(json.dumps(records, indent=3))

Aggregation Example 1: Get all subjects per breeding group

agg_pipeline = [
    {
        "$group": {
            "_id": "$subject.breeding_info.breeding_group",
            "subject_ids": {"$addToSet": "$subject.subject_id"},
            "count": {"$sum": 1},
        }
    }
]
result = docdb_api_client.aggregate_docdb_records(
    pipeline=agg_pipeline
)
print(f"Total breeding groups: {len(result)}")
print("First 3 breeding groups and corresponding subjects:")
print(json.dumps(result[:3], indent=3))
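
Each element of the aggregation result is a dictionary with the keys defined in the $group stage (_id, subject_ids, count). As a minimal sketch, the result can be reshaped into a lookup table keyed by breeding group. The helper name is hypothetical, and sample_result is made-up data that only mimics that shape.

```python
def subjects_by_breeding_group(result: list) -> dict:
    """Map each breeding group (the $group _id) to its sorted subject IDs."""
    return {
        group["_id"]: sorted(group["subject_ids"])
        for group in result
        if group["_id"] is not None  # skip the group with no breeding group
    }


# Made-up data in the shape produced by the $group stage above.
sample_result = [
    {"_id": "GroupA", "subject_ids": ["222222", "111111"], "count": 3},
    {"_id": None, "subject_ids": ["333333"], "count": 1},
]
lookup = subjects_by_breeding_group(sample_result)
print(lookup)
```

Note that count is the number of records in each group, so it can be larger than the number of distinct subject IDs.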

Aggregation Example 2: Fetch records by filter list

The client provides a utility method for fetching records whose value for a given field matches any value in a list (here, a list of subject IDs).

records = docdb_api_client.fetch_records_by_filter_list(
    filter_key="subject.subject_id",
    filter_values=["731015", "741137", "789012"],
    projection={
        "name": 1,
        "location": 1,
        "subject.subject_id": 1,
        "data_description.project_name": 1,
    },
)
print(f"Found {len(records)} records. First 3 records:")
print(json.dumps(records[:3], indent=3))
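
Conceptually, matching any value in a list corresponds to a MongoDB $in filter. The sketch below shows one way such filters could be built in batches. It is an illustration only and not necessarily how fetch_records_by_filter_list is implemented; batch_in_filters and its batch_size parameter are hypothetical.

```python
from typing import Iterator, List


def batch_in_filters(
    filter_key: str, filter_values: List[str], batch_size: int = 2
) -> Iterator[dict]:
    """Yield one {"$in": [...]} filter per batch of values."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(filter_values), batch_size):
        batch = filter_values[start:start + batch_size]
        yield {filter_key: {"$in": batch}}


filters = list(
    batch_in_filters("subject.subject_id", ["731015", "741137", "789012"])
)
print(filters)
```

Each generated filter could then be passed as filter_query to retrieve_docdb_records.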

For more info about aggregations, please see MongoDB documentation: https://www.mongodb.com/docs/manual/aggregation/

Advanced Example: Custom Session Object

It’s possible to attach a custom Session so that requests failing with certain errors are retried:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

from aind_data_access_api.document_db import MetadataDbClient

API_GATEWAY_HOST = "api.allenneuraldynamics.org"

retry = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET", "POST", "DELETE"],
)
adapter = HTTPAdapter(max_retries=retry)
session = requests.Session()
session.mount("https://", adapter)

with MetadataDbClient(
    host=API_GATEWAY_HOST,
    session=session,
) as docdb_api_client:
    records = docdb_api_client.retrieve_docdb_records(limit=10)
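
To get a feel for the retry settings above, the sketch below models a plain exponential backoff with the same backoff_factor and number of retries. It is a simplified model rather than urllib3's implementation: the exact schedule (for example, whether the first retry waits at all, and the maximum delay) depends on the urllib3 version, and the function name and the max_delay default are assumptions.

```python
def backoff_schedule(
    backoff_factor: float, retries: int, max_delay: float = 120.0
) -> list:
    """Simplified exponential backoff: factor * 2**n, capped at max_delay."""
    return [
        min(max_delay, backoff_factor * 2 ** attempt)
        for attempt in range(retries)
    ]


delays = backoff_schedule(backoff_factor=1, retries=5)
print(delays)
print(f"Approximate worst-case total wait: {sum(delays)} seconds")
```

Under this model, larger backoff_factor or total values quickly lengthen the time a failing call can block, which is worth keeping in mind for interactive use.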

Updating Metadata

Note

Manual updates to metadata in DocDB should be made through the aind-data-migration-scripts repository to track changes. Please see the README in that repository for instructions.

For special use cases, a minimal example of updating metadata directly using the API is provided below.

  1. Permissions: Request AWS credentials with permission to write to DocDB through the API Gateway. Note that the asset registration and deregistration endpoints are intended for administrative use and require elevated AWS credentials/permissions.

  2. Query DocDB: Filter for the records you want to update.

  3. Update DocDB: Use upsert_one_docdb_record or upsert_list_of_docdb_records to update the records.

Note

Records must be read and written as dictionaries from DocDB (not Pydantic models).

For example, to update the “instrument” and “session” metadata of a record in DocDB:

# filter for records you want to update
# NOTE: if you pass a projection, it must include every field read below
# (e.g. "instrument")
records = docdb_api_client.retrieve_docdb_records(
    filter_query=filter,
    projection=projection,  # recommended
)
print(f"Found {len(records)} records in DocDB matching filter.")

for record in records:
    # NOTE: provide core metadata as dictionaries
    # (INSTRUMENT_FILE_PATH and session_model are placeholders you define)
    # e.g. update some field from the queried result
    instrument = record["instrument"]  # dictionary
    instrument["instrument_type"] = "New Instrument Type"
    # or replace it entirely from a file (use one approach, not both)
    with open(INSTRUMENT_FILE_PATH, "r") as f:
        instrument = json.load(f)
    # e.g. convert Pydantic model to dictionary
    session = session_model.model_dump()

    # update record in docdb
    record_update = {
        "_id": record["_id"],
        "instrument": instrument,
        "session": session
    }
    response = docdb_api_client.upsert_one_docdb_record(
        record=record_update
    )
    print(response.json())

You can also make updates to individual nested fields:

record_update = {
    "_id": record["_id"],
    "data_description.project_name": project_name, # nested field
}

response = docdb_api_client.upsert_one_docdb_record(
    record=record_update
)
print(response.json())
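
When several nested fields need updating, dotted keys like the one above can be generated from a nested dictionary. The sketch below is a hypothetical helper (flatten_update is not part of the client), and the example values, including the _id, are made up.

```python
def flatten_update(nested: dict, parent_key: str = "") -> dict:
    """Flatten a nested dict into dot-notation keys, e.g. {"a.b": 1}."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts; empty dicts are kept as values.
            flat.update(flatten_update(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Made-up values for illustration.
changes = {
    "data_description": {"project_name": "Example Project"},
    "subject": {"subject_id": "000000"},
}
record_update = {"_id": "example-id", **flatten_update(changes)}
print(record_update)
```

As the examples above suggest, dotted keys target individual nested fields, whereas passing a whole dictionary under a top-level key (such as "instrument") replaces that field entirely, so flatten only when the rest of the nested document should be left as it is.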

Note

While DocumentDB supports field names containing special characters (“$” and “.”), they are not recommended, because querying or updating such fields may cause issues.

Avoid these special characters in dictionary keys. For example, {"abc.py": "data"} can be written as {"filename": "abc.py", "some_file_property": "data"} instead.
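
A quick check before writing can catch such keys. The sketch below is a hypothetical helper (find_special_keys is not part of the client) that walks a record and reports any stored dictionary key containing "$" or ".".

```python
def find_special_keys(value, path: str = "") -> list:
    """Return the paths of dictionary keys that contain "$" or "."."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            key_path = f"{path}/{key}" if path else str(key)
            if "$" in str(key) or "." in str(key):
                found.append(key_path)
            found.extend(find_special_keys(child, key_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_special_keys(child, f"{path}[{index}]"))
    return found


bad = find_special_keys({"files": [{"abc.py": "data"}], "name": "ok"})
good = find_special_keys(
    {"files": [{"filename": "abc.py", "some_file_property": "data"}]}
)
print(bad)
print(good)
```

This check is meant for the stored content of a record. Top-level dotted keys used deliberately to update a nested field (such as "data_description.project_name") would also be reported, so apply it to the nested values rather than to the update dictionary itself.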