confusius.validation

validation

Data validation utilities for confusius.

Modules:

  • coordinates

    Coordinate validation utilities.

  • iq

    IQ data validation utilities.

  • mask

    Mask validation utilities.

  • time_series

    Time series validation utilities.

Functions:

validate_iq

validate_iq(
    iq: DataArray, require_attrs: bool = False
) -> None

Validate that a DataArray contains valid IQ data.

This function validates an IQ DataArray to ensure it meets the requirements for processing with confusius functions. The validation checks are:

  1. Dimensions: The IQ DataArray must have exactly 4 dimensions in the order: (time, z, y, x).
  2. Coordinates: All dimensions must have corresponding coordinates.
  3. Data type: The data must be complex-valued (complex64 or complex128).
  4. Attributes (optional): If require_attrs is True, the DataArray must have the following attributes, needed for axial velocity computation:

      • transmit_frequency: Ultrasound probe central frequency in Hz.

      • beamforming_sound_velocity: Speed of sound assumed during beamforming in meters per second.

Parameters:

  • iq

    (DataArray) –

    Input DataArray to validate. Must have dimensions (time, z, y, x) and the required structure and attributes.

  • require_attrs

    (bool, default: False ) –

    Whether to validate that all required attributes (transmit_frequency, beamforming_sound_velocity) are present in the DataArray attributes.

Raises:

  • ValueError

    If the DataArray does not have dimensions ("time", "z", "y", "x") or corresponding coordinates, or if required attributes are missing (when require_attrs=True).

  • TypeError

    If the IQ data is not complex-valued.

Examples:

Validate a properly formatted IQ DataArray:

>>> import xarray as xr
>>> import numpy as np
>>> iq = xr.DataArray(
...     np.ones((10, 4, 6, 8), dtype=np.complex64),
...     dims=("time", "z", "y", "x"),
...     coords={
...         "time": np.arange(10),
...         "z": np.arange(4),
...         "y": np.arange(6),
...         "x": np.arange(8),
...     },
...     attrs={
...         "transmit_frequency": 15e6,
...         "beamforming_sound_velocity": 1540.0,
...     },
... )
>>> validate_iq(iq, require_attrs=True)

Skip attribute validation for intermediate processing:

>>> # DataArray missing attributes
>>> iq_no_attrs = xr.DataArray(
...     np.ones((10, 4, 6, 8), dtype=np.complex64),
...     dims=("time", "z", "y", "x"),
...     coords={"time": np.arange(10), "z": np.arange(4),
...             "y": np.arange(6), "x": np.arange(8)},
... )
>>> validate_iq(iq_no_attrs, require_attrs=False)

validate_labels

validate_labels(
    labels: DataArray,
    data: DataArray,
    labels_name: str = "labels",
    rtol: float = 1e-05,
    atol: float = 1e-08,
) -> None

Validate that a label map matches data spatial dimensions and coordinates.

Parameters:

  • labels

    (DataArray) –

    Label map to validate. Must have integer dtype and coordinates must match data. Accepts two formats:

    • Flat label map: Spatial dims only, e.g. (z, y, x). Background voxels are labeled 0; each unique non-zero integer identifies a distinct, non-overlapping region. The region coordinate of the output holds the integer label values.
    • Stacked mask format: A leading mask dimension followed by spatial dims, e.g. (mask, z, y, x). Each layer has values in {0, region_id} and regions may overlap. The region coordinate of the output holds the mask coordinate values (e.g., region label).
  • data

    (DataArray) –

    Data array to validate labels against.

  • labels_name

    (str, default: "labels" ) –

    Name of the labels parameter (used in error messages).

  • rtol

    (float, default: 1e-5 ) –

    Relative tolerance for coordinate comparison.

  • atol

    (float, default: 1e-8 ) –

    Absolute tolerance for coordinate comparison.

Raises:

  • TypeError

    If labels is not an integer dtype DataArray.

  • ValueError

    If labels dimensions don't match data or if coordinates don't match.
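The checks described above can be illustrated with plain numpy/xarray. This is a hedged sketch of what validate_labels verifies for a flat label map, not the library's actual implementation:

```python
import numpy as np
import xarray as xr

# Data with spatial dims (z, y, x) and a flat label map on the same grid.
coords = {"z": np.arange(4), "y": np.arange(6), "x": np.arange(8)}
data = xr.DataArray(np.zeros((4, 6, 8)), dims=("z", "y", "x"), coords=coords)
labels = xr.DataArray(
    np.random.default_rng(0).integers(0, 3, size=(4, 6, 8)),
    dims=("z", "y", "x"),
    coords=coords,
)

# 1. Integer dtype (a non-integer dtype would raise TypeError).
assert np.issubdtype(labels.dtype, np.integer)

# 2. Label dims must be present in the data (mismatch raises ValueError).
assert set(labels.dims) <= set(data.dims)

# 3. Coordinates must match within the documented tolerances.
for dim in labels.dims:
    assert np.allclose(labels[dim], data[dim], rtol=1e-5, atol=1e-8)
```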

validate_mask

validate_mask(
    mask: DataArray,
    data: DataArray,
    mask_name: str = "mask",
    rtol: float = 1e-05,
    atol: float = 1e-08,
) -> None

Validate that a mask matches data spatial dimensions and coordinates.

Parameters:

  • mask

    (DataArray) –

    Mask to validate. Must have boolean dtype, or integer dtype with exactly one non-zero value (0 = background, one region id = foreground). The latter format is produced by Atlas.get_masks. Coordinates must match data.

  • data

    (DataArray) –

    Data array to validate mask against.

  • mask_name

    (str, default: "mask" ) –

    Name of the mask parameter (used in error messages).

  • rtol

    (float, default: 1e-5 ) –

    Relative tolerance for coordinate comparison.

  • atol

    (float, default: 1e-8 ) –

    Absolute tolerance for coordinate comparison.

Raises:

  • TypeError

    If mask is not a boolean or single-label integer DataArray.

  • ValueError

    If mask dimensions don't match data or if coordinates don't match.
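Both accepted mask formats can be constructed as follows. This is an illustrative sketch of the dtype conditions the documentation describes, not the library's implementation:

```python
import numpy as np
import xarray as xr

coords = {"z": np.arange(2), "y": np.arange(3), "x": np.arange(4)}
data = xr.DataArray(np.zeros((2, 3, 4)), dims=("z", "y", "x"), coords=coords)

# Format 1: boolean mask on the same spatial grid as the data.
bool_mask = xr.DataArray(
    np.zeros((2, 3, 4), dtype=bool), dims=("z", "y", "x"), coords=coords
)
bool_mask[0, 0, 0] = True

# Format 2: single-label integer mask (0 = background, one region id = foreground),
# as produced by Atlas.get_masks.
int_mask = xr.DataArray(
    np.zeros((2, 3, 4), dtype=np.int64), dims=("z", "y", "x"), coords=coords
)
int_mask[0, 0, :] = 7

# Accepted dtypes: boolean, or integer with exactly one non-zero value.
assert bool_mask.dtype == bool
nonzero = np.unique(int_mask.values[int_mask.values != 0])
assert len(nonzero) == 1
```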

validate_matching_coordinates

validate_matching_coordinates(
    left: DataArray,
    right: DataArray,
    coord_names: Hashable
    | Iterable[Hashable]
    | None = None,
    *,
    left_name: str = "left array",
    right_name: str = "right array",
    rtol: float = 1e-05,
    atol: float = 1e-08,
) -> None

Validate that selected coordinates match between two DataArrays.

Comparison is performed on coordinate values rather than the full coordinate DataArray, so unrelated attached coordinates do not cause false mismatches. Numeric coordinates are compared with tolerance to accommodate harmless floating-point drift (for example after serialization and reload). Non-numeric coordinates are compared exactly.

Parameters:

  • left

    (DataArray) –

    First array to compare.

  • right

    (DataArray) –

    Second array to compare.

  • coord_names

    (Hashable | Iterable[Hashable] | None, default: None ) –

    Coordinate names to compare. If not specified, all shared dimension coordinates are checked.

  • left_name

    (str, default: "left array" ) –

    Label used for left in error messages. Override with a context-specific name (e.g. "run 0", "map 0") for more actionable errors.

  • right_name

    (str, default: "right array" ) –

    Label used for right in error messages.

  • rtol

    (float, default: 1e-5 ) –

    Relative tolerance used for numeric coordinate comparison.

  • atol

    (float, default: 1e-8 ) –

    Absolute tolerance used for numeric coordinate comparison.

Raises:

  • ValueError

    If a requested coordinate is missing or if coordinates do not match.
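The tolerance-based comparison described above can be demonstrated without the library itself. The sketch below shows why value comparison with rtol/atol matters: two coordinates that differ only by floating-point drift are not strictly equal but should still be treated as matching.

```python
import numpy as np
import xarray as xr

# Two arrays whose "x" coordinates differ only by float round-off,
# e.g. after serialization and reload.
x = np.linspace(0.0, 1.0, 5)
left = xr.DataArray(np.zeros(5), dims="x", coords={"x": x})
right = xr.DataArray(np.ones(5), dims="x", coords={"x": x + 1e-12})

# Exact comparison of the coordinate DataArrays flags a mismatch...
assert not left["x"].equals(right["x"])

# ...but value comparison with the documented tolerances accepts the drift.
assert np.allclose(left["x"].values, right["x"].values, rtol=1e-5, atol=1e-8)
```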

validate_time_series

validate_time_series(
    time_series: DataArray,
    operation_name: str,
    check_time_chunks: bool = True,
) -> int

Validate time series for time series processing operations.

Performs common validation checks:

  1. Time series have a time dimension.
  2. Time dimension has more than 1 timepoint.
  3. Time dimension is not chunked for Dask arrays (optional).

Parameters:

  • time_series

    (DataArray) –

    Input time series to validate. Must have a time dimension.

  • operation_name

    (str) –

    Name of the operation (used in error/warning messages).

  • check_time_chunks

    (bool, default: True ) –

    Whether to raise an error when time dimension is chunked in a Dask array. Set to False for operations that can handle chunked time (e.g., confusius.signal.standardize).

Returns:

  • int

    Axis number for the time dimension.

Raises:

  • ValueError

    If time_series has no time dimension, if the time dimension has only 1 timepoint, or if the time dimension is chunked in a Dask array (when check_time_chunks=True).
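The three checks and the return value can be sketched with a plain NumPy-backed DataArray. This is an illustration of the documented behavior, not the library's implementation:

```python
import numpy as np
import xarray as xr

ts = xr.DataArray(
    np.zeros((10, 3)),
    dims=("time", "x"),
    coords={"time": np.arange(10), "x": np.arange(3)},
)

# 1. A time dimension must exist.
assert "time" in ts.dims

# 2. The time dimension must have more than one timepoint.
assert ts.sizes["time"] > 1

# 3. For Dask-backed arrays, the time axis must not be chunked
#    (trivially satisfied here: the data is a plain NumPy array,
#    so ts.chunks is None).
assert ts.chunks is None

# The function returns the axis number of the time dimension.
time_axis = ts.dims.index("time")
assert time_axis == 0
```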