confusius.validation
Data validation utilities for confusius.
Modules:
- coordinates – Coordinate validation utilities.
- fusi – Validation helpers for ConfUSIus-style fUSI DataArrays.
- iq – IQ data validation utilities.
- mask – Mask validation utilities.
- time_series – Time series validation utilities.
Functions:
- validate_fusi_dataarray – Validate that a DataArray follows ConfUSIus fUSI conventions.
- validate_iq_dataarray – Validate that a DataArray contains valid IQ data.
- validate_labels – Validate that a label map matches data spatial dimensions and coordinates.
- validate_mask – Validate that a mask matches data spatial dimensions and coordinates.
- validate_matching_coordinates – Validate that selected coordinates match between two DataArrays.
- validate_time_series – Validate time series for time series processing operations.
validate_fusi_dataarray
validate_fusi_dataarray(
data: DataArray,
*,
require_time: bool = False,
allow_pose: bool = True,
allow_extra_dims: bool = True,
minimum_spatial_dims: int = 2,
require_regular_spacing: bool = False,
regular_spacing_tolerance: float = 0.01,
regular_spacing_dims: RegularSpacingDims = "space",
require_canonical_dim_order: bool = False,
require_spatial_voxdim: bool = False,
require_spatial_units: bool = False,
require_time_units: bool = False,
) -> None
Validate that a DataArray follows ConfUSIus fUSI conventions.
A valid fUSI DataArray must:
- Have dimension names from the set (time, pose, z, y, x), with optional extra dimensions (e.g., region, component, mask) if allow_extra_dims is True.
- Have matching 1D coordinates for all core dimensions (time, pose, z, y, x). Extra-dimension coordinates are optional.
- Have numeric, finite, and strictly increasing core dimension coordinates (time, pose, z, y, x).
Additional requirements can be enforced using the function parameters.
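The three core requirements above can be approximated with plain NumPy. The sketch below is a hypothetical helper, check_core_coords, that works on a tuple of dimension names, a matching shape tuple, and a dict of coordinate arrays instead of an xarray.DataArray. It is an illustration under those assumptions, not the ConfUSIus implementation, and it skips all optional checks.

```python
import numpy as np

CORE_DIMS = ("time", "pose", "z", "y", "x")


def check_core_coords(dims, shape, coords, allow_extra_dims=True):
    """Hypothetical sketch of the core fUSI checks (not the ConfUSIus code).

    dims: tuple of dimension names; shape: matching tuple of sizes;
    coords: dict mapping dimension names to coordinate arrays.
    """
    for dim, size in zip(dims, shape):
        if dim not in CORE_DIMS:
            if not allow_extra_dims:
                raise ValueError(f"Unexpected dimension {dim!r}")
            continue  # Extra-dimension coordinates are optional.
        if dim not in coords:
            raise ValueError(f"Missing coordinate for core dimension {dim!r}")
        values = np.asarray(coords[dim])
        # Coordinates must be 1D and match the dimension size.
        if values.shape != (size,):
            raise ValueError(f"Coordinate {dim!r} must be 1D with length {size}")
        if not np.issubdtype(values.dtype, np.number):
            raise ValueError(f"Coordinate {dim!r} must be numeric")
        if not np.all(np.isfinite(values)):
            raise ValueError(f"Coordinate {dim!r} must be finite")
        if np.any(np.diff(values) <= 0):
            raise ValueError(f"Coordinate {dim!r} must be strictly increasing")
```

For example, check_core_coords(("time", "y", "x"), (3, 2, 2), {...}) passes for increasing float coordinates and raises ValueError as soon as one core coordinate decreases.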
Parameters:
- data (DataArray) – DataArray to validate.
- require_time (bool, default: False) – Whether to require a time dimension.
- allow_pose (bool, default: True) – Whether to allow a pose dimension.
- allow_extra_dims (bool, default: True) – Whether dimensions outside the ConfUSIus core set (time, pose, z, y, x) are allowed.
- minimum_spatial_dims (int, default: 2) – Minimum number of spatial dimensions from ("z", "y", "x") required in the DataArray.
- require_regular_spacing (bool, default: False) – Whether numeric dimension coordinates must have regular spacing.
- regular_spacing_tolerance (float, default: 1e-2) – Relative tolerance used to assess coordinate regularity.
- regular_spacing_dims (RegularSpacingDims, default: "space") – Dimensions that must satisfy regular-spacing checks when require_regular_spacing=True. Use "space" for the present z, y, x dimensions, "core" for the present core dimensions (time, pose, z, y, x), "all" for all present dimensions, a string for one explicit dimension name, or a sequence for multiple explicit dimension names. Non-numeric coordinates are ignored.
- require_canonical_dim_order (bool, default: False) – Whether the ConfUSIus core dimensions present in the DataArray must appear in the canonical relative order (time, pose, z, y, x).
- require_spatial_voxdim (bool, default: False) – Whether present spatial coordinates must define a voxdim attribute.
- require_spatial_units (bool, default: False) – Whether present spatial coordinates must define a units attribute.
- require_time_units (bool, default: False) – Whether the time coordinate, when present, must define a units attribute.
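Two of the stricter options, regular spacing and canonical dimension order, can be sketched as below. The exact regularity rule used by ConfUSIus is not stated here, so the sketch assumes one plausible definition: every step between consecutive coordinates must lie within regular_spacing_tolerance times the absolute median step. The helper names is_regularly_spaced and has_canonical_order are hypothetical.

```python
import numpy as np

CORE_ORDER = ("time", "pose", "z", "y", "x")


def is_regularly_spaced(values, tolerance=1e-2):
    """Assumed rule: each step is within tolerance * |median step| of the median step."""
    steps = np.diff(np.asarray(values, dtype=float))
    if steps.size < 2:
        return True  # Zero or one step is trivially regular.
    reference = np.median(steps)
    return bool(np.all(np.abs(steps - reference) <= tolerance * abs(reference)))


def has_canonical_order(dims):
    """True if the core dims present in `dims` keep the relative order (time, pose, z, y, x).

    Extra dimensions (e.g. "region") are ignored, so they may appear anywhere.
    """
    core_positions = [CORE_ORDER.index(d) for d in dims if d in CORE_ORDER]
    return core_positions == sorted(core_positions)
```

Under this assumed rule, a 0.1 mm grid with floating-point noise passes, while a grid with one 0.15 mm step does not; ("time", "region", "z", "x") is canonical, whereas ("x", "time") is not.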
Raises:
- TypeError – If data is not an xarray.DataArray.
- ValueError – If dimension names are invalid, required dimensions or coordinates are missing, there are too few spatial dimensions, core numeric coordinate constraints fail, optional stricter checks fail, or required metadata is missing.
validate_iq_dataarray
validate_iq_dataarray(
iq: DataArray, require_attrs: bool = False
) -> None
Validate that a DataArray contains valid IQ data.
This function checks that an IQ DataArray meets all requirements for processing with ConfUSIus functions. Validation checks include:
- Dimensions: The IQ DataArray must have exactly 4 dimensions, in the order (time, z, y, x).
- Coordinates: All dimensions must have corresponding coordinates.
- Data type: The data must be complex-valued (complex64 or complex128).
- Attributes (optional): If require_attrs is True, the DataArray must have the following attributes, which are needed for axial velocity computation:
  - transmit_frequency: Ultrasound probe central frequency in Hz.
  - beamforming_sound_velocity: Speed of sound assumed during beamforming, in meters per second.
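The dimension, data-type, and attribute checks listed above can be mimicked on plain NumPy inputs. The sketch below uses a hypothetical helper, check_iq, that takes the dimension names, the raw array, and an attribute dict; it omits the coordinate check and is not the ConfUSIus implementation.

```python
import numpy as np

IQ_DIMS = ("time", "z", "y", "x")
IQ_ATTRS = ("transmit_frequency", "beamforming_sound_velocity")
COMPLEX_DTYPES = (np.dtype(np.complex64), np.dtype(np.complex128))


def check_iq(dims, values, attrs, require_attrs=False):
    """Hypothetical sketch of the IQ checks listed above (coordinates not checked)."""
    if tuple(dims) != IQ_DIMS:
        raise ValueError(f"Expected dimensions {IQ_DIMS}, got {tuple(dims)}")
    # Only the two documented complex dtypes are accepted.
    if values.dtype not in COMPLEX_DTYPES:
        raise TypeError(f"IQ data must be complex64 or complex128, got {values.dtype}")
    if require_attrs:
        missing = [name for name in IQ_ATTRS if name not in attrs]
        if missing:
            raise ValueError(f"Missing required attributes: {missing}")
```

Note that the order matters: ("z", "y", "x", "time") fails the dimension check even though it contains the same names.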
Parameters:
- iq (DataArray) – Input DataArray to validate. Must have dimensions (time, z, y, x) and the required structure and attributes.
- require_attrs (bool, default: False) – Whether to validate that all required attributes (transmit_frequency, beamforming_sound_velocity) are present in the DataArray attributes.
Raises:
- ValueError – If the DataArray does not have dimensions (time, z, y, x), if required coordinates are missing, or if required attributes are missing when require_attrs=True.
- TypeError – If the IQ data is not complex-valued.
Examples:
Validate a properly formatted IQ DataArray:
>>> import numpy as np
>>> import xarray as xr
>>> iq = xr.DataArray(
... np.ones((10, 4, 6, 8), dtype=np.complex64),
... dims=("time", "z", "y", "x"),
... coords={
... "time": np.arange(10),
... "z": np.arange(4),
... "y": np.arange(6),
... "x": np.arange(8),
... },
... attrs={
... "transmit_frequency": 15e6,
... "beamforming_sound_velocity": 1540.0,
... },
... )
>>> validate_iq_dataarray(iq, require_attrs=True)
Skip attribute validation for intermediate processing:
>>> validate_iq_dataarray(iq, require_attrs=False)
validate_labels
validate_labels(
labels: DataArray,
data: DataArray,
labels_name: str = "labels",
rtol: float = 1e-05,
atol: float = 1e-08,
) -> None
Validate that a label map matches data spatial dimensions and coordinates.
Parameters:
- labels (DataArray) – Label map to validate. Must have integer dtype, and its coordinates must match data. Two formats are accepted:
  - Flat label map: Spatial dims only, e.g. (z, y, x). Background voxels are labeled 0; each unique non-zero integer identifies a distinct, non-overlapping region. The regions coordinate of the output holds the integer label values.
  - Stacked mask format: A leading mask dimension followed by spatial dims, e.g. (mask, z, y, x). Each layer has values in {0, region_id}, and regions may overlap. The region coordinate of the output holds the mask coordinate values (e.g., region label).
- data (DataArray) – Data array to validate labels against.
- labels_name (str, default: "labels") – Name of the labels parameter (used in error messages).
- rtol (float, default: 1e-5) – Relative tolerance for coordinate comparison.
- atol (float, default: 1e-8) – Absolute tolerance for coordinate comparison.
Raises:
- TypeError – If labels is not an integer dtype DataArray.
- ValueError – If labels dimensions don't match data or if coordinates don't match.
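The two accepted label formats are related: a flat label map can always be expanded into the stacked mask format, with one layer per non-zero label. The sketch below shows that conversion with a hypothetical helper, flat_to_stacked, on plain NumPy arrays. It assumes layers are ordered by ascending label value (np.unique sorts its output) and is not a ConfUSIus function; the reverse direction is not always possible, because stacked layers may overlap.

```python
import numpy as np


def flat_to_stacked(labels):
    """Expand a flat label map (e.g. z, y, x) into stacked masks (mask, z, y, x).

    Layer k holds region_ids[k] where the flat map equals it and 0 elsewhere,
    so every layer has values in {0, region_id}.
    """
    labels = np.asarray(labels)
    if not np.issubdtype(labels.dtype, np.integer):
        raise TypeError("labels must have an integer dtype")
    region_ids = np.unique(labels[labels != 0])  # Sorted non-zero labels.
    # Shape (n_regions, 1, 1, ...) so it broadcasts against the spatial dims.
    ids = region_ids.reshape((-1,) + (1,) * labels.ndim)
    stacked = np.where(labels == ids, ids, 0)
    return region_ids, stacked
```

For a 2 x 2 map [[0, 1], [2, 2]], this yields region ids [1, 2] and two layers, [[0, 1], [0, 0]] and [[0, 0], [2, 2]].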
validate_mask
validate_mask(
mask: DataArray,
data: DataArray,
mask_name: str = "mask",
rtol: float = 1e-05,
atol: float = 1e-08,
require_exact_dims: bool = False,
) -> None
Validate that a mask matches data spatial dimensions and coordinates.
Parameters:
- mask (DataArray) – Mask to validate. Must have boolean dtype, or integer dtype with exactly one non-zero value (0 = background, one region id = foreground); the latter format is produced by Atlas.get_masks. Coordinates must match data.
- data (DataArray) – Data array to validate mask against.
- mask_name (str, default: "mask") – Name of the mask parameter (used in error messages).
- rtol (float, default: 1e-5) – Relative tolerance for coordinate comparison.
- atol (float, default: 1e-8) – Absolute tolerance for coordinate comparison.
- require_exact_dims (bool, default: False) – Whether mask.dims must match all non-time dimensions of data, in the same order.
Raises:
- TypeError – If mask is not a boolean or single-label integer DataArray.
- ValueError – If mask dimensions don't match data or if coordinates don't match.
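The dtype rule for masks (boolean, or integer with exactly one non-zero value) can be sketched on a plain NumPy array as below. The helper check_mask_dtype is hypothetical and not part of ConfUSIus. Under this reading of "exactly one non-zero value", an all-zero integer mask is rejected, and the helper returns a boolean foreground array for convenience.

```python
import numpy as np


def check_mask_dtype(mask):
    """Hypothetical sketch of the mask dtype rule; returns a boolean foreground array."""
    mask = np.asarray(mask)
    if mask.dtype == bool:
        return mask
    if np.issubdtype(mask.dtype, np.integer):
        # Single-label integer mask: 0 is background, one region id is foreground.
        nonzero_values = np.unique(mask[mask != 0])
        if nonzero_values.size == 1:
            return mask != 0
    raise TypeError("mask must be boolean or integer with exactly one non-zero value")
```

An integer layer such as [0, 7, 7] is accepted as a single region, while [0, 1, 2] (two regions) and float arrays raise TypeError.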
validate_matching_coordinates
validate_matching_coordinates(
left: DataArray,
right: DataArray,
coord_names: Hashable
| Iterable[Hashable]
| None = None,
*,
left_name: str = "left array",
right_name: str = "right array",
rtol: float = 1e-05,
atol: float = 1e-08,
) -> None
Validate that selected coordinates match between two DataArrays.
Comparison is performed on coordinate values rather than the full coordinate
DataArray, so unrelated attached coordinates do not cause false mismatches.
Numeric coordinates are compared with tolerance to accommodate harmless
floating-point drift (for example after serialization and reload). Non-numeric
coordinates are compared exactly.
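The comparison rule above (tolerant for numeric coordinates, exact otherwise) can be sketched with NumPy. The sketch assumes np.allclose semantics, |left - right| <= atol + rtol * |right|, with the documented default tolerances; ConfUSIus may use a different tolerance function internally. The helper coords_match is hypothetical.

```python
import numpy as np


def coords_match(left_values, right_values, rtol=1e-5, atol=1e-8):
    """Compare coordinate values: tolerant if both are numeric, exact otherwise."""
    left = np.asarray(left_values)
    right = np.asarray(right_values)
    if left.shape != right.shape:
        return False
    both_numeric = np.issubdtype(left.dtype, np.number) and np.issubdtype(
        right.dtype, np.number
    )
    if both_numeric:
        # Absorbs harmless floating-point drift, e.g. after save and reload.
        return bool(np.allclose(left, right, rtol=rtol, atol=atol))
    return bool(np.array_equal(left, right))
```

A drift of 1e-12 on a 0.1 coordinate is tolerated, whereas string labels such as region names must match exactly.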
Parameters:
- left (DataArray) – First array to compare.
- right (DataArray) – Second array to compare.
- coord_names (Hashable or Iterable[Hashable], default: None) – Coordinate names to compare. If not provided, all shared dimension coordinates are checked.
- left_name (str, default: "left array") – Label used for left in error messages. Override with a context-specific name (e.g. "run 0", "map 0") for more actionable errors.
- right_name (str, default: "right array") – Label used for right in error messages.
- rtol (float, default: 1e-5) – Relative tolerance used for numeric coordinate comparison.
- atol (float, default: 1e-8) – Absolute tolerance used for numeric coordinate comparison.
Raises:
- ValueError – If a requested coordinate is missing or if coordinates do not match.
validate_time_series
validate_time_series(
time_series: DataArray,
operation_name: str,
check_time_chunks: bool = True,
) -> int
Validate time series for time series processing operations.
Performs common validation checks:
- The time series has a time dimension.
- The time dimension has more than one timepoint.
- The time dimension is not chunked in Dask arrays (optional).
Parameters:
- time_series (DataArray) – Input time series to validate. Must have a time dimension.
- operation_name (str) – Name of the operation (used in error and warning messages).
- check_time_chunks (bool, default: True) – Whether to raise an error when the time dimension is chunked in a Dask array. Set to False for operations that can handle chunked time (e.g., confusius.signal.standardize).
Returns:
- int – Axis number of the time dimension.
Raises:
- ValueError – If time_series has no time dimension, if the time dimension has only 1 timepoint, or if the time dimension is chunked in a Dask array (when check_time_chunks=True).
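The checks and the returned axis number can be sketched without xarray or Dask. The hypothetical helper find_time_axis below takes dimension names, a shape tuple, and a chunks value following the xarray DataArray.chunks convention: None for NumPy-backed data, otherwise one tuple of block lengths per dimension. It is an illustration, not the ConfUSIus implementation.

```python
def find_time_axis(dims, shape, chunks=None, operation_name="operation",
                   check_time_chunks=True):
    """Hypothetical sketch of the time series checks; returns the time axis index."""
    if "time" not in dims:
        raise ValueError(f"{operation_name} requires a 'time' dimension")
    axis = list(dims).index("time")
    if shape[axis] < 2:
        raise ValueError(f"{operation_name} requires more than 1 timepoint")
    # More than one block along time means the time dimension is chunked.
    if check_time_chunks and chunks is not None and len(chunks[axis]) > 1:
        raise ValueError(
            f"{operation_name} requires the time dimension in a single chunk"
        )
    return axis
```

For dims ("z", "time", "x") the helper returns 1. With chunks ((2, 2), (2,)) the time dimension spans two blocks, so it raises unless check_time_chunks=False.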