Batch Database

Introduction
Records
Workflow tracking
Long-term trending
- Service logic
- Realtime versus historical checks

Introduction

The batch database has two main functionalities:

track the workflow (incl. handling of comments) of a checked event
long-term trending to check for inter-batch trends

A batch can be moved to the batch database by the following sources:

commenting on a batch from process analytics (real-time/historical)
a background check for the batch triggered by a notification
a manual triggered backfill (as a part of the admin actions)

Records

erDiagram
Batch ||--|{ BatchCalculation: has
BatchCalculation ||--o{ BatchCalculationTags: "described by"
BatchCalculation ||--|{ BatchComment: "handled"
BatchComment ||--o{ BatchCommentDataPoint: "added"
BatchComment ||--o{ BatchCommentTags: "described by"
BatchCalculation ||--o{ BatchViolation: "has"
BatchCalculation ||--o{ BatchViolationSnapshot: "has"
BatchCalculation ||--|{ BatchCalculationSummaryStatistic: "aggregated to"

Batch {
 int ID "Unique Identifier"
 nvarchar BatchId "Human readable identifier for the batch - unique constrain not guaranteed"
 nvarchar EventFrameId "Unique identifier of batch on the datasource layer"
 datetime2 DateCreated "The creation date"
 nvarchar DeviceWebId ""
 datetime2 BatchStartTime "The start time of the batch/event in UTC as defined on the datasource"
 datetime2 BatchEndTime "The start time of the batch/event in UTC as defined on the datasource"
 int SiteId "the belonging site Id"
 datetime2 SysEndTime ""
 datetime2 SysStartTime ""
 int LastModifiedBy "The userId, who performed the last change - used for audit trail"
 int OwnerUserId "The userId, who is owning the record - may have special privileges"
}

BatchCalculation {
 int ID "Unique identifier"
 int ModelID "The linked model from which the calculation was based on"
 int ModelVersion "The linked model version from which the calculation was based on"
 int BatchID ""
 tinyint CalculationType "The calculation type - describing the type of the model output"
 tinyint Status "The main classification status - after SME review"
 bit IsValidated "The validation status of the components when the record was created"
 int LastModifiedBy "The userId, who performed the last change - used for audit trail"
 int OwnerUserId "The userId, who is owning the record - may have special privileges"
 datetime2 SysEndTime ""
 datetime2 SysStartTime ""
 tinyint ReCalculationStatus "The state of the model in the re-calculation process"
 tinyint SubStatus "The sub classification status - after SME review"
 tinyint ViolationCheckStatus "An indication whether the snapshot values were available at the time of the check"
}

BatchCommentDataPoint {
 int ID ""
 int BatchCommentID ""
 datetime2 ObsID "Unique identifier of the observation/data point on the datasource layer"
}

BatchCommentTags {
 int ID ""
 int BatchCommentID ""
 int LastModifiedBy ""
 int OwnerUserId ""
 nvarchar TagName "The name/identifier of the tag - describing the model output"
 nvarchar TagValue "The value of the tag - describing the model output"
 tinyint TagType "The type of the tag (calculation or limit related) - describing the model output"
 datetime2 SysEndTime ""
 datetime2 SysStartTime ""
}

BatchCalculationSummaryStatistic {
 int ID ""
 int BatchCalculationID "FK to the BatchCalculation Record"
 nvarchar SummaryParameter "The name of the summary parameter which is calculated"
 float Value "The value of the summary parameter for the batch calculation"
}

BatchCalculationTags {
 int ID ""
 int BatchCalculationID ""
 nvarchar TagName "The name/identifier of the tag - describing the model output"
 nvarchar TagValue "The value of the tag - describing the model output"
 tinyint TagType "The type of the tag (calculation or limit related) - describing the model output"
}

BatchComment {
 int ID ""
 int BatchCalculationID ""
 nvarchar Comment "The user entered comment"
 tinyint Type "The type whether specific data points or a whole batch has been commented on"
 int UserIDCreated ""
 datetime2 DateCreated ""
 nvarchar EventFrameId "Unique identifier of batch on the datasource layer"
}

BatchViolation {
 int ID ""
 int BatchCalculationID ""
 datetime2 ObsID "Unique identifier of the observation/data point on the datasource layer"
 float Maturity "The (model-dependent) maturity value of the data point"
 float Value "The value of the calculation point at the maturity timepoint"
 float DistanceMetric "The calculated distance metric for the datapoint - used for limit assement"
 datetime2 DateCreated ""
}

BatchViolationSnapshot {
 int ID ""
 int BatchCalculationID ""
 datetime2 ObsID "Unique identifier of the observation/data point on the datasource layer"
 float Maturity "The (model-dependent) maturity value of the data point"
 float Value "The value of the calculation point at the maturity timepoint"
 float DistanceMetric "The calculated distance metric for the datapoint - used for limit assement"
 datetime2 DateCreated ""
}

Seeing the multivariate process monitoring as a classification problem, to identify abnormal process behavior, the batch database is used to build the classification matrix. In one dimension the model assessment is used, therefore the defaultLimitSpecification is used as a threshold, the other dimension is the process SME assessment which can be done via the batch database. The life cycle including the state transitions can be seen in the event lifecyle. The statues are set on a model and event perspective. If a event has been evaluated using multiple calculation types, the assessment results are aggregated.

Status aggregation The status aggregation is performed dynamically, every BatchCalculation record has a saved classification result. The aggregation is implemented using status hierarchy. The status hierarchy within the first dimension model classification results: pending > positive > negative.

Another responsibility of the batch database is the long-term trending to identify inter-batch variability.

Service logic

The batch database is filled by the BatchDatabaseWorker. The batch database worker hooks into the sign up release procedure. During the batch release the sign up is moved to the batch table(s) in a pending state. The batch database worker service fetches all batch calculation records in a pending state.

stateDiagram-v2

s1: Recalculation pending
s2: Check data stability
s3: Abnormal behavior
s4: Stable data - initiate re-processing

[] --> s1
s1 --> s2
s2 --> if_state
    if_state --> s4: if last archived Event > BatchEndTime
    if_state --> s2 : if n last archived Event < BatchEndTime
    if_state --> s3 : timediff > 1.2 AvgRuntime

s3 --> [*]
s4 --> [*]

The implementation is based on the method CalculateBatchSummaryStatistics() which needs to be implemented by every model. Every model type should implement a set of parameters which are aggregating the entire batch into a single value per batch/event. These values are then trended over time.

Realtime versus historical checks

DCP is working with different EquipmentDataSources. Some data systems (e.g. Aveeva PI) implement compression to reduce the required amount for data storage. This compression is in most cases not compression free.

DCP performs real-time monitoring. As data sources might return the uncompressed data in real-time. Even though the difference (based on the compression settings) is relative little can result in different assessment results when checking the same time point in real-time and historical mode. In order to address this issue - DCP MVDA performs a recalculation of the entire batch when a stable state is detected. Stable state in this context, refers to a state that all inputs from the data source going into a calculation have at least on archived value which is after the end time of the batch/event which should be checked.

Not utilizing incoming snapshots in real-time mode (e.g. when they are not provided, which can be the case if the PI System Connector is used in the architecture to connect to) can lead to bigger interpolation errors for real-time monitoring. Recalculation in the best tradeoff in terms of accuracy and long term data stability - when working with compressed data sources.

This page was last edited on 03 May 2024, 07:57 (UTC).

Batch Database

Introduction

Records

Workflow tracking

Long-term trending

Service logic

Realtime versus historical checks