Caching

Introduction

Analytical calculations can be resource intensive, especially during the training or fitting of a model. As DCP made the decision that data remain in their source systems with an online connection, handling these workloads is of particular significance. In order to reduce the load on the source systems, DCP implements multiple layers of caching:

  • Datasource caching - if the data source provides native caching functionality (e.g. the Aveeva webAPI cache), these features should be enabled and utilized
  • AC-DC caching - the AC-DC component adds a caching layer to openCPU which can be utilized for caching datasets
  • Real-time caching - while the layers above are generic for DCP, the MVDA module adds another caching layer. Access to real-time data is considered more likely (and more concurrent), therefore a dedicated cache service is implemented.

Real-time caching

MVDA caches the real-time evaluation of batch evolution models, as batch-level models do not support real-time updates (the model vectors are only defined after the end of a batch).

Sources

In the MVDA module the following sources can lead to a sign-up which should be cached:

  • a created dashboard element
  • a configured notification
  • a user is opening an active batch in process analytics
  • a batch with running status is present in the batch database (commenting/backfilling)

These sources are collected and registered for EventSubscription; see the implementation in BatchEventController for the exact conditions.

The MVDA cache worker service combines multiple WorkerServices using the ServiceCollectionHostedService method (see the registration sketch after the list below). The following services are registered:

  • CacheHandlerWorker is responsible for handling batch subscription messages from the message bus in order to create, update, or release SignUps
  • BatchDurationWorker is responsible for checking the duration of active sign-ups to detect abnormal behavior and remove long-running sign-ups
  • CacheRealDataWorker is responsible for updating the cache data for the active sign-ups
  • FixBadValuesWorker attempts to update cached values which are marked as bad by either the data source or the calculation node
  • ModelValidationStatusCacheWorker caches the validation status of a model, as the related EDMS infrastructure only supports single-instance checks
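
As a rough sketch, the combination of these workers could look like the following, assuming a standard .NET generic host; the worker class names are taken from the list above, while the surrounding host setup and ordering are assumptions and not the actual implementation:

// Illustrative sketch only: registering the MVDA cache workers as hosted
// services on a standard .NET generic host. The worker class names are taken
// from the list above and are assumed to derive from BackgroundService.
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddHostedService<CacheHandlerWorker>();
builder.Services.AddHostedService<BatchDurationWorker>();
builder.Services.AddHostedService<CacheRealDataWorker>();
builder.Services.AddHostedService<FixBadValuesWorker>();
builder.Services.AddHostedService<ModelValidationStatusCacheWorker>();

await builder.Build().RunAsync();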

Cache Handler

The cache handler listens for BatchStart and BatchEnd events from the message broker. The high-level concept is shown in the illustration below:

stateDiagram-v2

    state if_state1 <<choice>>
    state if_state2 <<choice>>

    [*] --> SignupSource: BatchStart
    SignupSource --> if_state1
    if_state1 --> ConditionCheck: Sources found
    if_state1 --> [*]: No sources found
    ConditionCheck --> if_state2
    if_state2 --> InactiveSignUp: Conditions do not match
    if_state2 --> ActiveSignUp: Conditions match
    state ActiveSignUp {
      Update --> Update: every 20s
    }
    ActiveSignUp --> ReleaseSignUp: BatchEnd 
    ReleaseSignUp --> [*]

    InactiveSignUp --> [*]

When a BatchStart event arrives, the service performs a condition check. The conditions to satisfy are set by the model developer and loaded from the model entity; the current context is taken from the message. If a context attribute is missing, this is treated as no match. When the conditions are not fulfilled, the state is saved in the database without any further actions. If the condition check passes, the service is responsible for populating the initial cache (covering the period from the batch start time to the current time), and the state is saved in the database.
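
A minimal sketch of such a condition check is shown below; ModelCondition and the string-based context dictionary are illustrative assumptions, not the actual entities:

// Illustrative sketch of the condition check: every condition defined on the
// model must be present in the message context and match its expected value.
// ModelCondition and the context dictionary are assumptions for illustration.
using System;
using System.Collections.Generic;

public sealed record ModelCondition(string Attribute, string ExpectedValue);

public static class ConditionCheck
{
    public static bool Matches(
        IEnumerable<ModelCondition> conditions,
        IReadOnlyDictionary<string, string> context)
    {
        foreach (var condition in conditions)
        {
            // A missing context attribute is treated as "no match".
            if (!context.TryGetValue(condition.Attribute, out var actual))
                return false;

            if (!string.Equals(actual, condition.ExpectedValue, StringComparison.OrdinalIgnoreCase))
                return false;
        }

        return true;
    }
}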

Batch Duration

There are scenarios where a batch end event has been missed (e.g. a service restart) or the event is erroneously not closed on the data source layer (e.g. wrong analysis configuration, connection timeout). If active sign-ups existed for these events, they would never be released and would be cached forever. To avoid this and the related performance degradation, the batch duration worker is implemented. Its responsibility is to detect abnormally long-running events and release them in a defined way.

The high level concept is illustrated in the figure below:

stateDiagram-v2
    state if_state <<choice>>
    [*] --> ActiveSignUp
    ActiveSignUp --> if_state
    if_state --> Release: if DateTime.Now - Event.Starttime > Model.AvgRuntime * 3
    if_state --> KeepCaching: if DateTime.Now - Event.Starttime <= Model.AvgRuntime * 3
    KeepCaching --> ActiveSignUp
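
Expressed as code, the check from the diagram could look like the sketch below; Model.AvgRuntime and the event start time follow the diagram, while the method itself is an assumption:

// Illustrative sketch of the duration check: a sign-up is considered
// abnormally long-running once it has been active for more than three times
// the average runtime of its model.
using System;

public static class BatchDurationCheck
{
    public static bool IsDurationExceeded(TimeSpan modelAvgRuntime, DateTime eventStartTime, DateTime now)
        => now - eventStartTime > modelAvgRuntime * 3;
}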

Model Validation Status Cache

The validation status cache is a simple background task which runs every hour and updates the document status and the validation status of the related models. The document status needs to be cached because the EDMS status API only accepts one documentId per request; without the cache, this would result in a large number of parallel requests whenever the models are presented in a list view. The cache workflow is illustrated in the figure below:

sequenceDiagram
  ValidationStatusCache->>WorkerService: Fetch oldest 5% of the records
  WorkerService->>EDMS: Request document status
  EDMS->>WorkerService: Return document status
  WorkerService->>ValidationStatusCache: Update document status in cache

As the oldest 5% of the records are updated in each iteration and the iterations are performed on an hourly basis, the full record set is cycled through within 20 iterations, so the maximum cache age in the system stays below one day. As document status changes are infrequent, this is acceptable.
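
Selecting the records for one iteration could be done along the lines of the following sketch (an in-memory LINQ query is used for illustration; CachedStatus is a hypothetical type, and the real worker presumably queries the database directly):

// Illustrative sketch of one refresh iteration: pick the oldest 5% of the
// cached records and refresh only those. CachedStatus is a hypothetical type.
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record CachedStatus(string DocumentId, DateTime LastUpdated);

public static class ValidationStatusRefresh
{
    public static IReadOnlyList<CachedStatus> SelectOldestFivePercent(IReadOnlyList<CachedStatus> records)
    {
        // At least one record per iteration; a full cycle therefore takes at
        // most 20 iterations, i.e. 20 hours at an hourly schedule.
        var count = Math.Max(1, (int)Math.Ceiling(records.Count * 0.05));
        return records.OrderBy(r => r.LastUpdated).Take(count).ToList();
    }
}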

Cache Real Data

This worker performs the actual updates to the CacheData table. The cache uses the following database tables:

erDiagram

    SignUp {
        int ID ""
        int ModelId "The linked model from which the calculation was based on"
        tinyint CalculationType "The calculation type - describing the type of the model output"
        nvarchar BatchStartTime ""
        nvarchar BatchId "Human readable identifier for the batch - unique constraint not guaranteed"
        nvarchar EventFrameId "Unique identifier of batch on the datasource layer"
        tinyint EventType "Indicating the source of the sign up (dashboard, notification, etc.)"
        nvarchar DeviceWebId "The equipment identifier on the datasource"
        int ModelVersion "The model version which the calculations are based on"
        nvarchar YAxisLabel "The human readable axis label describing the output"
        tinyint BatchMachingConditions "Is the batch passing the conditions as defined by the model"
        bit IsDurationExeeded "Is the signup present longer than the expected duration"
    }

    CacheData {
      bigint ID ""
      float Maturity "The (model-dependent) maturity value of the data point"
      float Value "The value of the calculation point at the maturity timepoint"
      datetime2 ObsID "Unique identifier of the observation/data point on the datasource layer"
      float DistanceMetric "The calculated distance metric for the datapoint - used for limit assessment"
      bit BadValue "Is the data point marked as bad (data source or numerical problems)"
      int SignUpID ""
      tinyint SpecialValueType "Encoding special values not supported by the database e.g. Inf, NaN, etc."
    }

    SignUpTags {
      int ID ""
      int SignUpID ""
      nvarchar TagName "The name/identifier of the tag - describing the model output"
      nvarchar TagValue "The value of the tag - describing the model output"
      tinyint TagType "The type of the tag (calculation or limit related) - describing the model output"
    }

    Model ||--|| SignUp : "has"
    SignUp  ||--|{ CacheData : "has"
    SignUp  ||--|{ SignUpTags : "identified by"

Specification        Value
Content/Overview     Currently active batches and their values
Data classification  Cache only
Change Tracking      No
Audit Trail          No
Retention period     N/A
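
For illustration, a CacheData row could map onto an entity along these lines (property names and types follow the ER diagram above; the class itself is an assumption, not the actual code):

// Hypothetical entity sketch for the CacheData table shown above.
using System;

public class CacheData
{
    public long ID { get; set; }
    public double Maturity { get; set; }        // model-dependent maturity of the data point
    public double Value { get; set; }           // value at the maturity timepoint
    public DateTime ObsID { get; set; }         // observation identifier on the datasource layer
    public double DistanceMetric { get; set; }  // distance metric used for limit assessment
    public bool BadValue { get; set; }          // marked bad by data source or calculation node
    public int SignUpID { get; set; }
    public byte SpecialValueType { get; set; }  // encodes special values such as Inf or NaN
}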

In order to minimize the number of requests, all sign-ups are grouped by model and a bulkRequest update is then performed per group (as the calculation node imposes some restrictions, no further grouping is allowed).
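
A rough sketch of that grouping is shown below; the SignUp slice and the bulk-request delegate are hypothetical placeholders for illustration:

// Illustrative sketch of the grouping described above: active sign-ups are
// grouped by model and one bulk request is issued per model group.
// The SignUp record and the delegate are hypothetical placeholders.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public sealed record SignUp(int ModelId, string EventFrameId);

public static class CacheRealDataGrouping
{
    public static async Task UpdateAsync(
        IEnumerable<SignUp> activeSignUps,
        Func<int, IReadOnlyList<SignUp>, Task> sendBulkRequestAsync)
    {
        foreach (var group in activeSignUps.GroupBy(s => s.ModelId))
        {
            // One bulk request per model; the calculation node does not allow
            // further grouping.
            await sendBulkRequestAsync(group.Key, group.ToList());
        }
    }
}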
