arolariu.Backend.Domain.Invoices.Brokers.AnalysisBrokers.IdentifierBroker
arolariu.Backend.Domain.Invoices
arolariu.Backend.Domain.Invoices.Brokers.AnalysisBrokers.IdentifierBroker Namespace
Classes
AzureFormRecognizerBroker Class
Azure Document Intelligence concrete broker that performs best-effort OCR + structural extraction over invoice scans and projects recognized signals (merchant, products, payment) into a domain aggregate.
public sealed class AzureFormRecognizerBroker : arolariu.Backend.Domain.Invoices.Brokers.AnalysisBrokers.IdentifierBroker.IFormRecognizerBroker
Inheritance System.Object → AzureFormRecognizerBroker
Implements IFormRecognizerBroker
Remarks
Role (Broker Standard): Implements IFormRecognizerBroker by delegating to Azure.AI.DocumentIntelligence.DocumentIntelligenceClient (prebuilt receipt model: prebuilt-receipt). It performs ONLY external service invocation and minimal mapping. It does NOT implement domain validation, retry policy, logging, metrics, authorization, enrichment chaining, or persistence.
Lifecycle: Stateless wrapper around a single Azure.AI.DocumentIntelligence.DocumentIntelligenceClient instance (thread-safe). Scoped lifetime registration is acceptable; underlying client could be promoted to singleton if connection reuse optimization is required.
Resilience: Lets Azure SDK exceptions bubble (network / 429 / service faults) for higher-layer classification (retry / circuit breaker). Partial extraction failures (missing fields, unexpected field types) are tolerated silently: unrecognized values remain at sentinel defaults.
Security: Uses Azure.AzureKeyCredential with the Cognitive Services API key from application configuration. Production deployments should rotate keys regularly and use Azure Key Vault for secret management.
Output Model Fidelity: The mapping is intentionally narrow: only the fields required for the initial enrichment pipeline are projected. Backlog: expose field provenance (confidence, bounding boxes) for advanced UI / validation workflows.
Performance: Dominated by service round-trip latency and image size. Callers SHOULD parallelize at the orchestration layer for bulk imports and consider idempotent hashing to skip duplicate scans.
Backlog: Cancellation token support, adaptive model routing (custom vs prebuilt), multi-page invoices, locale normalization, currency code normalization, confidence threshold filtering, telemetry decorators.
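The Performance remark above suggests parallelizing at the orchestration layer and hashing scans to skip duplicates. The following is a minimal sketch of that idea, not part of the broker. BulkOcrSketch, ComputeScanHash, and AnalyzeDistinctAsync are hypothetical names, and the analyze delegate merely stands in for a call into the broker.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class BulkOcrSketch
{
    // Hypothetical helper: SHA-256 content hash used as an idempotency key for a scan.
    public static string ComputeScanHash(byte[] scanBytes) =>
        Convert.ToHexString(SHA256.HashData(scanBytes));

    // Skips scans whose bytes were already seen, then awaits the remaining analyses together.
    // `analyze` is a placeholder for the broker call; it is not the broker's real signature.
    public static async Task<IReadOnlyList<TResult>> AnalyzeDistinctAsync<TResult>(
        IEnumerable<byte[]> scans,
        Func<byte[], Task<TResult>> analyze)
    {
        var seen = new HashSet<string>();
        var tasks = new List<Task<TResult>>();

        foreach (var scan in scans)
        {
            // HashSet.Add returns false for a duplicate hash, so the scan is skipped.
            if (seen.Add(ComputeScanHash(scan)))
            {
                tasks.Add(analyze(scan));
            }
        }

        return await Task.WhenAll(tasks);
    }
}
```

This sketch starts every distinct analysis at once; a real orchestrator would likely bound concurrency to stay within service rate limits.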
Constructors
AzureFormRecognizerBroker(IOptionsManager) Constructor
Initializes the broker with configured Azure Cognitive Services (Document Intelligence) endpoint credentials.
public AzureFormRecognizerBroker(arolariu.Backend.Common.Options.IOptionsManager optionsManager);
Parameters
optionsManager arolariu.Backend.Common.Options.IOptionsManager
Abstraction providing strongly typed application options (endpoint and API key credentials).
Exceptions
System.ArgumentNullException
Thrown when optionsManager is null.
Remarks
Builds a single Azure.AI.DocumentIntelligence.DocumentIntelligenceClient using Azure.AzureKeyCredential. The API key is sourced from application configuration via arolariu.Backend.Common.Options.ApplicationOptions.CognitiveServicesKey. Throws immediately on a null dependency so that misconfiguration fails early in the composition root.
No network calls are made during construction; the client performs lazy connection initialization on first request.
Methods
AzureFormRecognizerBroker.ExtractCurrencyInformation(DocumentFieldDictionary) Method
Extracts currency information from document fields.
private static (System.Nullable<arolariu.Backend.Common.DDD.ValueObjects.Currency> Currency,decimal Amount) ExtractCurrencyInformation(Azure.AI.DocumentIntelligence.DocumentFieldDictionary photoFields);
Parameters
photoFields Azure.AI.DocumentIntelligence.DocumentFieldDictionary
The document fields from OCR analysis.
Returns
System.ValueTuple<System.Nullable<arolariu.Backend.Common.DDD.ValueObjects.Currency>,System.Decimal>
A tuple of the extracted Currency (or null) and the amount.
AzureFormRecognizerBroker.ExtractMonetaryValue(DocumentFieldDictionary, string) Method
Extracts a monetary value from item fields, handling both Double and Currency field types.
private static decimal ExtractMonetaryValue(Azure.AI.DocumentIntelligence.DocumentFieldDictionary itemFields, string fieldName);
Parameters
itemFields Azure.AI.DocumentIntelligence.DocumentFieldDictionary
Dictionary of item fields.
fieldName System.String
Name of the field to extract.
Returns
System.Decimal
The extracted decimal value, or 0 if not found.
AzureFormRecognizerBroker.ExtractProductFromItemFields(DocumentFieldDictionary) Method
Extracts a single product from the item dictionary fields.
private static arolariu.Backend.Domain.Invoices.DDD.ValueObjects.Products.Product ExtractProductFromItemFields(Azure.AI.DocumentIntelligence.DocumentFieldDictionary itemFields);
Parameters
itemFields Azure.AI.DocumentIntelligence.DocumentFieldDictionary
Dictionary of item fields from the prebuilt-receipt model.
Returns
Product
A populated Product instance with extracted field values.
AzureFormRecognizerBroker.ExtractTimeSpan(DocumentField) Method
Extracts a TimeSpan from a transaction time field, handling various field types.
private static System.TimeSpan ExtractTimeSpan(Azure.AI.DocumentIntelligence.DocumentField transactionTimeField);
Parameters
transactionTimeField Azure.AI.DocumentIntelligence.DocumentField
The transaction time field from OCR.
Returns
System.TimeSpan
The extracted TimeSpan, or TimeSpan.Zero if parsing fails.
AzureFormRecognizerBroker.ExtractTotalAmount(DocumentFieldDictionary) Method
Extracts the total amount from document fields.
private static decimal ExtractTotalAmount(Azure.AI.DocumentIntelligence.DocumentFieldDictionary photoFields);
Parameters
photoFields Azure.AI.DocumentIntelligence.DocumentFieldDictionary
The document fields from OCR analysis.
Returns
System.Decimal
The extracted total amount, or 0 if not found.
AzureFormRecognizerBroker.ExtractTotalTax(DocumentFieldDictionary) Method
Extracts the total tax amount from document fields.
private static decimal ExtractTotalTax(Azure.AI.DocumentIntelligence.DocumentFieldDictionary photoFields);
Parameters
photoFields Azure.AI.DocumentIntelligence.DocumentFieldDictionary
The document fields from OCR analysis.
Returns
System.Decimal
The extracted tax amount, or 0 if not found.
AzureFormRecognizerBroker.ExtractTransactionDateTime(DocumentFieldDictionary) Method
Extracts the transaction date and time from document fields.
private static System.DateTimeOffset ExtractTransactionDateTime(Azure.AI.DocumentIntelligence.DocumentFieldDictionary photoFields);
Parameters
photoFields Azure.AI.DocumentIntelligence.DocumentFieldDictionary
The document fields from OCR analysis.
Returns
System.DateTimeOffset
The extracted transaction datetime, or current time if not found.
AzureFormRecognizerBroker.ParseTimeFromDigits(char[], bool) Method
Parses a TimeSpan from an array of digit characters.
private static System.TimeSpan ParseTimeFromDigits(char[] digits, bool hasSeconds);
Parameters
digits System.Char[]
Array of digit characters (HHMM or HHMMSS format).
hasSeconds System.Boolean
Whether the digits include seconds.
Returns
System.TimeSpan
The parsed TimeSpan.
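The private implementation is not shown on this page. As an illustration only, a method with the same signature could interpret HHMM or HHMMSS digits as sketched below. The TimeDigitsSketch class name is hypothetical, and returning TimeSpan.Zero for malformed input is an assumption borrowed from the documented ExtractTimeSpan fallback.

```csharp
using System;

public static class TimeDigitsSketch
{
    // Interprets digits as HHMM, or as HHMMSS when hasSeconds is true.
    // Assumption: malformed or out-of-range input yields TimeSpan.Zero.
    public static TimeSpan ParseTimeFromDigits(char[] digits, bool hasSeconds)
    {
        int expectedLength = hasSeconds ? 6 : 4;
        if (digits is null || digits.Length < expectedLength)
        {
            return TimeSpan.Zero;
        }

        // Reject anything that is not an ASCII digit in the part we read.
        for (int i = 0; i < expectedLength; i++)
        {
            if (digits[i] < '0' || digits[i] > '9')
            {
                return TimeSpan.Zero;
            }
        }

        int hours = (digits[0] - '0') * 10 + (digits[1] - '0');
        int minutes = (digits[2] - '0') * 10 + (digits[3] - '0');
        int seconds = hasSeconds ? (digits[4] - '0') * 10 + (digits[5] - '0') : 0;

        if (hours > 23 || minutes > 59 || seconds > 59)
        {
            return TimeSpan.Zero;
        }

        return new TimeSpan(hours, minutes, seconds);
    }
}
```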
AzureFormRecognizerBroker.PerformOcrAnalysisOnSingleInvoice(Invoice, AnalysisOptions) Method
Executes OCR + structured field extraction against the invoice's scan URI and merges recognized data into the provided aggregate.
public System.Threading.Tasks.ValueTask<arolariu.Backend.Domain.Invoices.DDD.AggregatorRoots.Invoices.Invoice> PerformOcrAnalysisOnSingleInvoice(arolariu.Backend.Domain.Invoices.DDD.AggregatorRoots.Invoices.Invoice invoice, arolariu.Backend.Domain.Invoices.DTOs.AnalysisOptions options);
Parameters
invoice Invoice
Target invoice aggregate (MUST NOT be null; MUST contain a Scans.Location URI).
options AnalysisOptions
Analysis directives (currently advisory placeholder).
Implements PerformOcrAnalysisOnSingleInvoice(Invoice, AnalysisOptions)
Returns
System.Threading.Tasks.ValueTask<Invoice>
Same invoice instance enriched with recognized data.
Exceptions
System.ArgumentNullException
Thrown when invoice is null.
Remarks
Model: Invokes AnalyzeDocumentAsync("prebuilt-receipt"). Assumes Scans contains a resolvable, accessible URI.
Mutation: Populates (or overwrites) MerchantReference, Items, and PaymentInformation via internal transformation helpers. Recognized items are appended to any existing collection contents (the current implementation populates additively; upstream deduplication MAY be required).
Failure Handling: Throws on null invoice argument and propagates Azure SDK exceptions (network/service) without translation. Partial field absence results in sentinel defaults without exception.
Options: Current implementation does not conditionally short-circuit based on options (backlog: selectively disable OCR stage).
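Because the broker propagates SDK exceptions untranslated, any retry policy belongs in a higher layer. The sketch below shows one way a caller might wrap the broker call. OcrRetrySketch and WithRetryAsync are hypothetical names, and the isTransient delegate is a placeholder for the caller's own classification (for example, treating an Azure.RequestFailedException whose Status is 429 as retryable).

```csharp
using System;
using System.Threading.Tasks;

public static class OcrRetrySketch
{
    // Retries `operation` while `isTransient` classifies the failure as retryable.
    // Both delegates are placeholders; the broker itself defines no retry policy.
    public static async ValueTask<T> WithRetryAsync<T>(
        Func<ValueTask<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null)
    {
        TimeSpan delay = baseDelay ?? TimeSpan.FromMilliseconds(200);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Exponential backoff: the delay doubles after each failed attempt.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

Non-transient failures, and the final failed attempt, escape unchanged, so callers still see the original exception. Since recognized items are appended on each call, retrying after a partially applied mutation may require the upstream deduplication mentioned above.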
AzureFormRecognizerBroker.PerformOcrAnalysisOnSingleMerchant(InvoiceScan, Merchant, AnalysisOptions) Method
Performs optical character recognition + structural field extraction on a single merchant document image.
public System.Threading.Tasks.ValueTask<arolariu.Backend.Domain.Invoices.DDD.Entities.Merchants.Merchant> PerformOcrAnalysisOnSingleMerchant(arolariu.Backend.Domain.Invoices.DDD.AggregatorRoots.Invoices.InvoiceScan scan, arolariu.Backend.Domain.Invoices.DDD.Entities.Merchants.Merchant merchant, arolariu.Backend.Domain.Invoices.DTOs.AnalysisOptions options);
Parameters
scan InvoiceScan
merchant Merchant
options AnalysisOptions
Implements PerformOcrAnalysisOnSingleMerchant(InvoiceScan, Merchant, AnalysisOptions)
Returns
System.Threading.Tasks.ValueTask<Merchant>
Interfaces
IFormRecognizerBroker Interface
Thin OCR (document intelligence) broker abstraction for extracting structured invoice signals (merchant identity, line items, payment data) from a raw scanned image / PDF using Azure Form Recognizer (Document Intelligence) prebuilt models.
public interface IFormRecognizerBroker
Derived
↳ AzureFormRecognizerBroker
Remarks
Role (Broker Standard): Wraps a single external SDK client (Azure.AI.DocumentIntelligence.DocumentIntelligenceClient in the current implementation) and exposes a minimal, task-oriented operation. No business validation, persistence, enrichment orchestration, retry policy, telemetry, or authorization logic is performed here.
Scope: Currently targets the Azure prebuilt receipt model (prebuilt-receipt). Future enhancements may introduce custom-trained models, adaptive model routing, multi-page aggregation, locale normalization, or confidence-based filtering.
Output Semantics: The supplied Invoice instance is returned (same reference or mutated clone in implementations) with merchant reference, line item collection, and payment information populated when recognizable. Unrecognized fields remain at sentinel defaults. Implementations MUST avoid throwing for partial extraction failure; only catastrophic / argument errors should escape.
Thread Safety: Implementations are expected to be registered as scoped services; underlying Azure SDK clients are thread-safe.
Performance Considerations: OCR latency dominates; callers SHOULD parallelize across invoices externally when bulk importing. Consider upstream caching / deduplication for identical source images.
Backlog: Cancellation token support, confidence threshold filtering, partial page segmentation, raw field provenance exposure, and metrics hooks (latency, pages, confidence distribution).
Methods
IFormRecognizerBroker.PerformOcrAnalysisOnSingleInvoice(Invoice, AnalysisOptions) Method
Performs optical character recognition + structural field extraction on a single invoice scan and projects results into the provided aggregate.
System.Threading.Tasks.ValueTask<arolariu.Backend.Domain.Invoices.DDD.AggregatorRoots.Invoices.Invoice> PerformOcrAnalysisOnSingleInvoice(arolariu.Backend.Domain.Invoices.DDD.AggregatorRoots.Invoices.Invoice invoice, arolariu.Backend.Domain.Invoices.DTOs.AnalysisOptions options);
Parameters
invoice Invoice
Target invoice aggregate to enrich (MUST NOT be null; MUST have initialized collections).
options AnalysisOptions
Analysis directives controlling which enrichment phases are active (broker may short-circuit when disabled).
Returns
System.Threading.Tasks.ValueTask<Invoice>
The enriched invoice aggregate (same instance reference).
Exceptions
System.ArgumentNullException
Thrown when invoice is null.
Remarks
Mutation: The passed invoice instance is enriched in-place (merchant, items, payment, metadata hooks) and then returned. Callers requiring immutability SHOULD clone prior to invocation.
Model: Uses the Azure prebuilt receipt model (identifier: prebuilt-receipt). This may evolve; callers SHOULD NOT hard-code assumptions about recognition fidelity or field naming beyond the domain mapping provided here.
Failure Handling: Argument null results in System.ArgumentNullException. Provider / transport exceptions bubble for higher-layer classification (retry / circuit breaker). Partial extraction never throws.
Options: The options parameter allows higher orchestration to toggle OCR participation within a larger enrichment pipeline.
IFormRecognizerBroker.PerformOcrAnalysisOnSingleMerchant(InvoiceScan, Merchant, AnalysisOptions) Method
Performs optical character recognition + structural field extraction on a single merchant document image.
System.Threading.Tasks.ValueTask<arolariu.Backend.Domain.Invoices.DDD.Entities.Merchants.Merchant> PerformOcrAnalysisOnSingleMerchant(arolariu.Backend.Domain.Invoices.DDD.AggregatorRoots.Invoices.InvoiceScan scan, arolariu.Backend.Domain.Invoices.DDD.Entities.Merchants.Merchant merchant, arolariu.Backend.Domain.Invoices.DTOs.AnalysisOptions options);
Parameters
scan InvoiceScan
merchant Merchant
options AnalysisOptions