The MVDA module implements various algorithms in order to provide different monitoring capabilities. From an implementation point of view, these can be separated along two dimensions:
The first dimension, the model class, is based on how the time series going into the analysis are represented. DCP distinguishes between:
The second dimension DCP uses to classify models is referred to as the model type and is mainly an encoding of the applied algorithm. The model structure (available outputs, required encoding, etc.) is algorithm dependent and therefore needs to be managed.
These dimensions can be combined into the following hierarchy. The diagram contains only the most important methods.
For a complete list of methods or for method details, check the inline documentation of the interfaces. The model interfaces are named `IModelService`, `IBatchEvolutionModelService`, and `IBatchLevelService`.
classDiagram class Model{ +GetModelConfigurationFieldsAsync() +GetConfigurationDefaults() +CreateModelAsync() +UpdateModelAsync() +ModelUploadAsync() +TestModelAsync() +ParameterLimitsAsync() +RawParameterLimitsAsync() +GetModelParameters() +GetModelSensors() } class BEMModel{ +CreateSignUpsForActiveBatchesAsync() +BatchDataHistoricalWithCacheAsync() +BatchDataAtTimeWithCacheAsync() +GetBulkValuesUpdate() +ContributionPlotAsync() +BatchRawDataHistoricalAsync() +BatchRawDataAtTimeAsync() +BatchDataAtTimeAsync() +CheckDataPointViolationsAsync() +CalculateBatchSummaryStatistics() } class BLMModel{ +BatchDataAsync() +SourceOfVariationPlotAsync() +BatchRawDataAsync() +BatchDataPointCommentAsync() +CheckBatchViolations() +CalculateBatchSummaryStatistics() } Model <|-- BEMModel Model <|-- BLMModel class PLSBEMModel BEMModel ..|> PLSBEMModel class PCABLMModel BLMModel ..|> PCABLMModel
\newpage
At this point in time, the following model types are implemented:
The model class mainly defines which functionality is available for a model. As BEM models additionally support real-time caching, more methods need to be implemented for them. The following table compares the supported functionality by model class:
Functionality | Batch Evolution Models | Batch Level Models |
---|---|---|
Summary Statistics | Yes, implemented on concrete model type | Yes, implemented on concrete model type |
Real-time | Caching supported | No support |
Diagnostic plots | Contribution plot - event specific | Source of variation plot - event agnostic |
Commenting | Single observations or complete batches | Complete batches only |
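To make the comparison concrete, the following is a minimal sketch of how the class-specific interfaces could look. The interface and method names are taken from the class diagram and the text above; the signatures and return types are assumptions, not the actual DCP API.

```csharp
using System;
using System.Threading.Tasks;

// Simplified, hypothetical interfaces illustrating the split by model class;
// method names follow the class diagram, signatures and return types are assumptions.
public interface IModel
{
    Task<string> GetModelConfigurationFieldsAsync();
    Task<bool> TestModelAsync(string testSettingsJson);
}

public interface IBEMModel : IModel
{
    // Batch evolution models: real-time caching and event-specific diagnostics.
    Task<string> BatchDataAtTimeWithCacheAsync(DateTime timestamp);
    Task<string> ContributionPlotAsync(DateTime eventTime);
}

public interface IBLMModel : IModel
{
    // Batch level models: completed batches only, event-agnostic diagnostics.
    Task<string> BatchDataAsync(int batchId);
    Task<string> SourceOfVariationPlotAsync(int batchId);
}
```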
The classes `BEMModelService` and `BLMModelService` are abstract classes that implement only model-class-specific but model-type-agnostic functionality, for example working with raw data in `BatchRawDataAtTimeAsync()`. Due to their abstract nature, they need a concrete implementation that provides the algorithm/model-type-specific logic.
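As a hedged illustration of this split, the sketch below puts the type-agnostic raw-data handling in the abstract class and leaves the algorithm-specific parts to a concrete PLS implementation. The concrete method names and signatures are assumptions; only the raw-data example and the distance-metric responsibility are taken from the surrounding text.

```csharp
using System;
using System.Threading.Tasks;

// Sketch only: the abstract class implements model-class-specific but
// model-type-agnostic behaviour; concrete classes add the algorithm-specific parts.
public abstract class BEMModelService
{
    // Type-agnostic: raw data handling is shared by all BEM model types.
    public Task<string> BatchRawDataAtTimeAsync(DateTime timestamp)
    {
        // ... read raw signals from the connected data source (omitted) ...
        return Task.FromResult("{}");
    }

    // Type-specific: supplied by the concrete model type (method names are assumptions).
    protected abstract Task<double[]> CalculateDistanceMetricsAsync(string sessionId);
    protected abstract void ValidateParameters(string parametersJson);
}

public sealed class PLSBEMModelService : BEMModelService
{
    protected override Task<double[]> CalculateDistanceMetricsAsync(string sessionId)
    {
        // The actual mathematics runs on the calculation node (acdcMvda R package);
        // here only the call would be issued.
        return Task.FromResult(Array.Empty<double>());
    }

    protected override void ValidateParameters(string parametersJson)
    {
        // PLS-specific validation rules would go here.
    }
}
```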
The main differences in the implementation per model type are in method validations and in calculating distance metrics. The concrete mathematical calculations are performed on the calculation node; see the `acdcMvda` R package documentation for further details.
DCP MVDA uses JSON as the storage format. The JSON documents are saved in the database utilizing the JSON support provided by MS SQL Server. The model format has some common sections; within the sections, the format is adapted to fit the needs of the model type. The common sections are presented below.
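As a hedged example of how a stored section can be read back with SQL Server's JSON support: the table and column names below follow the entity diagram further down in this document, while the JSON property path is an assumption for illustration only.

```csharp
using Microsoft.Data.SqlClient;

public static class ModelStorageExample
{
    // Hedged sketch: reads the format version out of the JSON "Info" column using
    // SQL Server's JSON_VALUE function; the JSON property path is an assumption.
    public static string? ReadModelFormatVersion(string connectionString, int modelId)
    {
        const string sql =
            "SELECT JSON_VALUE(Info, '$.formatVersion') FROM Model WHERE ID = @id";

        using var connection = new SqlConnection(connectionString);
        using var command = new SqlCommand(sql, connection);
        command.Parameters.AddWithValue("@id", modelId);

        connection.Open();
        return command.ExecuteScalar() as string;
    }
}
```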
This section contains general information on the model, e.g. the class, the format version, etc. The basic information is strongly typed and mandatory for every model implementation. If a model does not require a specific field, null values are allowed:
This section contains a set/list of unique Ids at the data source layer, which are used to read the input variables. This section is strongly typed. Every sensor has the following properties:
This section stores the limits of the model parameters. The structure is adapted to the concrete model class/structure; therefore, the section has loose types.
This section contains information on the model fit quality; a typical example is R2X (goodness of fit). As fit measures might vary across different model types, this section has loose types.
This section contains the internal coefficients/weights used to calculate model outputs or to diagnose models. Available coefficients are normally stored as matrices; however, the available components heavily depend on the model, and therefore only loose types are used.
This section contains the pre-processing information (transformation, scaling, etc.). In more detail, this includes:
* `exp(X + 2)`
* `@Sensor1@ - first(@Sensor1@)`
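A hedged sketch of how these common sections could be represented in code: strongly typed sections map to concrete properties, while the loosely typed sections are kept as raw JSON elements. The property names below are assumptions, not the actual schema.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Illustrative document shape: strongly typed sections as concrete properties,
// loosely typed sections kept as raw JSON; property names are assumptions.
public sealed class ModelDocument
{
    // Info section: strongly typed and mandatory; nullable where a model may not need a field.
    public string? ModelClass { get; set; }
    public string? ModelType { get; set; }
    public int FormatVersion { get; set; }

    // Sensors section: strongly typed list of data source identifiers.
    public List<SensorReference> Sensors { get; set; } = new();

    // Loosely typed sections: structure depends on the model class/type.
    public JsonElement Limits { get; set; }
    public JsonElement Fit { get; set; }
    public JsonElement Coefficients { get; set; }
    public JsonElement Preprocessing { get; set; }
}

public sealed class SensorReference
{
    // Unique Id at the data source layer used to read the input variable.
    public string Id { get; set; } = "";
}
```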
The model format has a separate version identifier to allow storage formats to evolve over time. The model-loading function implements adapters that translate historical formats into the current format.
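A minimal sketch of such an adapter chain, assuming a numeric version identifier; the version numbers, interface, and type names are illustrative only, not the actual implementation.

```csharp
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical adapter chain: each adapter upgrades a stored document by one
// format version until the current format version is reached.
public interface IModelFormatAdapter
{
    int FromVersion { get; }
    JsonNode Upgrade(JsonNode document);
}

public static class ModelFormatUpgrader
{
    public const int CurrentVersion = 3; // illustrative value

    public static JsonNode UpgradeToCurrent(JsonNode document, int storedVersion,
                                            IReadOnlyList<IModelFormatAdapter> adapters)
    {
        var byVersion = new Dictionary<int, IModelFormatAdapter>();
        foreach (var adapter in adapters)
            byVersion[adapter.FromVersion] = adapter;

        // Apply adapters one version at a time, e.g. v1 -> v2 -> v3.
        for (var version = storedVersion; version < CurrentVersion; version++)
            document = byVersion[version].Upgrade(document);

        return document;
    }
}
```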
In principle, two groups of model vectors have to be distinguished:
* input signals: these can be raw signals as ingested into the connected data source. As the data pre-processing is model type independent, there is one implementation for the raw signals within a model class. From a signal-processing-pipeline perspective, the system differentiates between a 1:1 forwarding and cases where transformations or mathematical formulas need to be applied in order to get the desired output.
* output signals: these are the result of a mathematical calculation. The model output is model class and type specific.
A model output specification consists of a mandatory CalculationType. This is an enumeration with two additional attributes: the Description, which is used to convert the value into a human-readable format, and the OcpuName, which is used as a mapping to the identifier inside the calculation node. Every calculation type may be further described by a set of tags.
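A hedged sketch of such an enumeration: only the Description/OcpuName pattern is taken from the text above; the member names, attribute implementation, and OCPU identifiers are assumptions.

```csharp
using System;

// Hypothetical attribute carrying the mapping to the calculation node identifier.
[AttributeUsage(AttributeTargets.Field)]
public sealed class OcpuNameAttribute : Attribute
{
    public OcpuNameAttribute(string name) => Name = name;
    public string Name { get; }
}

public enum CalculationType
{
    // Description converts the value to a human-readable label, OcpuName maps to the
    // identifier used on the calculation node. Member names are illustrative only.
    [System.ComponentModel.Description("Scores")]
    [OcpuName("scores")]
    Scores,

    [System.ComponentModel.Description("Distance to model")]
    [OcpuName("dmodx")]
    DistanceToModel,
}
```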
Tags have the following properties:
In order to introduce new model vectors into the system, the following steps need to be done:
* `BatchDataHistorical()`
* `CalculateDistanceMeth` and `CheckDataPointViolationsAsync()` used for violation assessment
* `ConfigurationToDisplayString()`
* `GetModelConfigurationFieldsAsync()` and `GetConfigurationDefaults()`
In order to resolve the correct instances of the classes in the different modules, the factory pattern is used. Based on the usage in the application, different factories are used. There is a general factory which provides the `IModel` interface, containing the methods that need to be implemented by all models. In multiple places of the application, dedicated implementations based on the model class are used; in these cases, model-class-specific factories are used to return `IBEMModel` or `IBLMModel`. This is due to the functionality supported by class (see the table above). As an example, the cache worker service utilizes `IBEMModel`, as real-time caching is supported by batch evolution models only.
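A minimal sketch of this factory pattern: the factory names, the type switch, and the placeholder interfaces are assumptions; only the split into `IModel`, `IBEMModel`, and `IBLMModel` is taken from the text above.

```csharp
using System;

public interface IModel { }
public interface IBEMModel : IModel { }
public interface IBLMModel : IModel { }

// Illustrative model-class-specific factory: components that only work with batch
// evolution models (e.g. the cache worker service) depend on this factory instead
// of the general one, so they can only ever receive an IBEMModel.
public interface IBEMModelFactory
{
    IBEMModel Create(string modelType);
}

public sealed class BEMModelFactory : IBEMModelFactory
{
    public IBEMModel Create(string modelType) => modelType switch
    {
        "PLS" => new PLSBEMModel(),                      // concrete BEM model type
        _ => throw new NotSupportedException(modelType),
    };
}

public sealed class PLSBEMModel : IBEMModel { /* PLS-specific logic omitted */ }
```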
Models can be in different states, which are best illustrated by the following diagram:
stateDiagram [*] --> InProgress InProgress --> Completed: complete model Completed --> InValidation: Open Validation Wizard Completed --> InProgress: Changed coefficients InProgress --> InProgress: Iteration without completion InValidation --> Completed: Model Report approved Completed --> Obsolete: User delete Completed --> Archived: User delete Completed --> [*] Obsolete --> [*] Archived --> [*]
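A hedged code representation of the states and the transitions shown in the diagram; the type names are assumptions, the transition map mirrors the diagram above.

```csharp
using System.Collections.Generic;

public enum ModelState
{
    InProgress,
    Completed,
    InValidation,
    Obsolete,
    Archived,
}

// Allowed transitions, mirroring the state diagram above.
public static class ModelStateMachine
{
    private static readonly Dictionary<ModelState, ModelState[]> Allowed = new()
    {
        [ModelState.InProgress]   = new[] { ModelState.InProgress, ModelState.Completed },
        [ModelState.Completed]    = new[] { ModelState.InProgress, ModelState.InValidation, ModelState.Obsolete, ModelState.Archived },
        [ModelState.InValidation] = new[] { ModelState.Completed },
        [ModelState.Obsolete]     = System.Array.Empty<ModelState>(),
        [ModelState.Archived]     = System.Array.Empty<ModelState>(),
    };

    public static bool CanTransition(ModelState from, ModelState to) =>
        System.Array.IndexOf(Allowed[from], to) >= 0;
}
```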
The states have the following context:
Inside the MVDA frontend application, the Model Development Wizard is separated into a few modules and classes that implement the different steps and their logic.
📂 MVDA app
└──📂 src
├──📂 dataset
├──📂 model-diagnostics
├──📂 model-wizard
└──📂 workset
The src/ folder contains the main code:
The most important classes and their corresponding methods are illustrated below. This is purposely not a complete list; it is only intended to give a high-level overview of the most important concepts.
classDiagram class Model { -enum class -enum type +modelClass IModelClass +workset: IWorksetClass +details: IModelDetails +isPublic: bool } class ModelClassFactory { +enum class createPLSModel() createPCAModel() } class Workset { +number Id +Model model +DatasetCollection datasetCollection } class DatasetCollection { -number selectedIndex +Dataset[] datasets createNewDataset() deleteDataset() removeParameter() addSensor() } class Dataset { +string Name +string DeviceWebId +bool IsDefault removeParameter() addSensor() } class BEMFactory { enum class createPLSModel() } class BLMFactory { enum class createPCAModel() } class PCAModel { enum type } class PLSModel { enum type } DatasetCollection o-- Dataset Workset <|-- DatasetCollection BEMFactory <|-- ModelClassFactory BLMFactory <|-- ModelClassFactory PLSModel <|-- BEMFactory PCAModel <|-- BLMFactory Model <|-- PCAModel Model <|-- PLSModel Model <|-- Workset
The usage of the model class depends on the workflow mode. DCP differentiates between Create and Read/Edit mode.
During Create, the process is separated into a few steps:
During the Edit/Read mode (inside the Model Wizard or Model Validation), the same structure is used, but with different steps.
The model entities can be seen below:
erDiagram Model { int ID "" nvarchar Name "The model name - defined by the user" int Version "The model version - incremented when the coefficients are changing" nvarchar Conditions "" nvarchar ReportUrl "The URL to the model report stored as a PDF on the disk" int SiteID "" bit IsPublic "Can the model be accessed/applied by all MVDA users" nvarchar Sensors "" bit InProgress "Is the model currently being edited/reworked" nvarchar Description "The user entered description of the model" nvarchar TestSettings "The test settings used during internal testing" nvarchar DeviceWebIds "" nvarchar DefaultLimitSpecification "The default limit specification - defined by the model developer" int LastModifiedBy "The userId, who performed the last change - used for audit trail" int OwnerUserId "The userId, who is owning the record - may have special privileges" datetime2 SysEndTime "" datetime2 SysStartTime "" nvarchar Batches "" nvarchar Coefficients "The fitted coefficients used to calculate the model output(s)" nvarchar Fit "The fit measures used to assess the model quality" nvarchar Info "" nvarchar Limits "The calculated model limit parameters - used for the limit assessment" nvarchar Parameter "" int WorksetID "" bit IsArchived "Flag indicating whether the model is archived" } Workset { int ID "" nvarchar ModelBatches "An array of unique Ids from the datasource - describing the batches used in the training set" nvarchar TestBatches "An array of unique Ids from the datasource - describing the batches used in the internal testing set" tinyint UnfoldingType "the applied unfolding type - either BEM or BLM" int LastModifiedBy "The userId, who performed the last change - used for audit trail" int OwnerUserId "The userId, who is owning the record - may have special privileges" int SiteId "" datetime2 SysEndTime "" datetime2 SysStartTime "" } Dataset { int ID "" nvarchar Name "The name identifying the datasource" nvarchar Filter "The global filter - representing the hierarchy to the element" bit IsDefault "Flag indicating the leading dataset" int Interval "The interpolation interval between datapoints in seconds" nvarchar DeviceWebId "The equipment identifier on the datasource" int LastModifiedBy "The userId, who performed the last change - used for audit trail" int OwnerUserId "The userId, who is owning the record - may have special privileges" int SiteId "" int WorksetID "" datetime2 SysEndTime "" datetime2 SysStartTime "" nvarchar Batches "" nvarchar Parameter "" nvarchar Sensors "" nvarchar TimeRange "" } ModelSession { int ID "" int ModelID "The model as defined in the database" nvarchar SessionID "The session - referring to a model present on the calculation node" datetime2 SessionDateUpdated "The session date - used to check the validity" } WorksetSession { int ID "" int WorksetID "" nvarchar SessionID "The session - referring to a workset present on the calculation node" datetime2 SessionDateUpdated "The session date - used to check the validity" } DatasetSession { int ID "" int DatasetID "The dataset as defined in the database" nvarchar SessionID "The session - referring to a dataset present on the calculation node" datetime2 SessionDateUpdated "The session date - used to check the validity" } Model ||--|| ModelSession : "has session cache" Model ||--|| Workset : owns Workset ||--|| WorksetSession : "has session cache" Workset ||--|{ Dataset : "consists of" Dataset ||--|| DatasetSession : "has session cache"
Relations to notifications and dashboards are hidden on purpose to simplify the illustration.
Records classification and audit trail
For the session tables:
Specification | Value |
---|---|
Content/Overview | Sessions on the calculation node |
Data classification | Cache only |
Change Tracking | No |
Audit Trail | No |
Retention period | N/A |
All other tables:
Specification | Value |
---|---|
Content/Overview | State and definitions of an MVDA model and the dataset definition |
Data classification | Official records |
Change Tracking | SystemVersioned table features inside SQL |
Audit Trail | Module specific audit trails |
Retention period | 10 years |
erDiagram ModelSharedUsers { int ID "" int ModelId "The linked model from which the calculation was based on" int UserId "The contributor userId - allowed to perform edits" int LastModifiedBy "The userId, who performed the last change - used for audit trail" int OwnerUserId "The userId, who is owning the record - may have special privileges" datetime2 SysEndTime "" datetime2 SysStartTime "" } ModelReview { int ID "" int ModelID "The linked model from which the review was based on" int Version "The linked model version from which the review was based on" nvarchar Comment "The user added comment - describing the findings of the review" bit IsAbnormalitiesDetected "Were there any abnormal behaviors identified during the review" int LastModifiedBy "The userId, who performed the last change - used for audit trail" int OwnerUserId "The userId, who is owning the record - may have special privileges" datetime2 SysEndTime "" datetime2 SysStartTime "" } ModelLock { int ID "" int ModelID "" bit IsLocked "Flag indicating whether the model validation is locked" int LockedBy "UserId of the user who locked the model" datetime2 LockedOn "Timestamp in UTC when the model has been locked" } Model ||--o{ ModelSharedUsers : "shared with" Model ||--o{ ModelReview : "documented as" Model ||--|| ModelLock : locks
Records classification and audit trail
For the lock table:
Specification | Value |
---|---|
Content/Overview | User lock state of the model validation |
Data classification | Cache only |
Change Tracking | No |
Audit Trail | No |
Retention period | N/A |
All other tables:
Specification | Value |
---|---|
Content/Overview | Model state related records e.g. sharing |
Data classification | Convenience records |
Change Tracking | SystemVersioned table features inside SQL |
Audit Trail | Module specific audit trails |
Retention period | 3 years |
The main reason for separating out the cache service is the usage of temporal features and the implementation of the audit trail. As sessions have a lifetime (24 hours by default), records are updated frequently, which would result in incorrect audit trail updates.
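A hedged sketch of the resulting validity check: the class and method are illustrative; only the SessionDateUpdated column and the 24-hour default lifetime are taken from the text and the entity diagram above.

```csharp
using System;

public static class SessionValidity
{
    // Default session lifetime on the calculation node, per the text above.
    private static readonly TimeSpan DefaultLifetime = TimeSpan.FromHours(24);

    // A session is considered valid if it was updated within its lifetime.
    public static bool IsValid(DateTime sessionDateUpdatedUtc, DateTime nowUtc) =>
        nowUtc - sessionDateUpdatedUtc < DefaultLifetime;
}
```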
The model validation inside the DCP context consists only of helper steps for generating the actual model validation record, which is stored in EDMS.
erDiagram ModelSimulationBatches { int ID "" int ModelValidationInformationId "" nvarchar Name "The user defined name of the test scenario" nvarchar Description "The user defined description of the test scenario" tinyint ModelTestResult "The SME defined expected test result" nvarchar SimulatedSensor "" int LastModifiedBy "The userId, who performed the last change - used for audit trail" int OwnerUserId "The userId, who is owning the record - may have special privileges" datetime2 SysEndTime "" datetime2 SysStartTime "" tinyint LikelyHoodOfOccurence "The likelihood of the described event happening" tinyint Severity "The severity of the described event happening" } ValidationDocuments { int ID "" int ModelValidationInformationID "" tinyint DocumentType "The role of the document in the DCP context plan/report" nvarchar LocalFile "The UNC path to the hard disk location where the DCP generated document version is stored" nvarchar EDMSDocumentId "The unique identifier of the document in EDMS" int LastModifiedBy "The userId, who performed the last change - used for audit trail" int OwnerUserId "The userId, who is owning the record - may have special privileges" datetime2 SysEndTime "" datetime2 SysStartTime "" tinyint DocumentStatus "The document status in EDMS e.g. approved, effective, draft, etc." } ModelValidationLock { int ID "" int ModelValidationInformationID "" bit IsLocked "Flag indicating whether the model validation is locked" int LockedBy "UserId of the user who locked the model validation" datetime2 LockedOn "Timestamp in UTC when the model validation has been locked" } ModelValidationInformation { int ID "" int ModelId "The linked model to be validated" int Version "The linked model version to be validated" tinyint Scope "The user defined scope of the validation" nvarchar OtherScope "The user defined scope details of the validation" nvarchar IntendedUse "The user defined model intended use of the validation" nvarchar ProcessDescription "The user defined process description of the validation" nvarchar AcceptanceCriteria "The user defined acceptance criteria for the validation" nvarchar TestBatches "The user selected test batches" int LastModifiedBy "The userId, who performed the last change - used for audit trail" int OwnerUserId "The userId, who is owning the record - may have special privileges" datetime2 SysEndTime "" datetime2 SysStartTime "" } ModelValidationInformation ||--o{ ValidationDocuments: "based on" ModelValidationInformation ||--|| ModelValidationLock: "locks" ModelValidationInformation ||--o{ ModelSimulationBatches: "described by"
Records classification and audit trail
For the lock table:
Specification | Value |
---|---|
Content/Overview | User lock state of the model validation |
Data classification | Cache only |
Change Tracking | No |
Audit Trail | No |
Retention period | N/A |
All other tables:
Specification | Value |
---|---|
Content/Overview | Information for the model validation, official validation plan/report in EDMS |
Data classification | Official records |
Change Tracking | SystemVersioned table features inside SQL |
Audit Trail | Module specific audit trails |
Retention period | 10 years |
📂 MVDA app
└──📂 src
├──📂 generic-batch
└──📂 model-validation
The src main folder contains:
* generic-batch folder - Contains the module, service, store and components related to the third step of the Model Validation Wizard (Simulation)
* model-validation folder - Contains the module, service, store and components used to create the Model Validation Wizard
Most components used to create the Model Validation Wizard reside in the `ModelValidation` module. This includes the wizard component itself and most of the components used for the steps inside the wizard:
The `ValidationWizard` component holds the main form and its validators, for example the validation of the Expectation (Total To Fail and To Pass) of the selected batches and simulation batches.
Logically, simulation batches represent a separate entity, and all CRUD operations for them are situated inside the `GenericBatch` module. The module contains the store, service and models used for managing simulation batches. The main components inside the module are:
The `GenericBatchEditor` contains all the logic for editing or creating a new simulation batch. The `GenericBatchGrid` represents all simulation batches created for the current validation and uses a form control for the test expectation per batch.