The Data Computation Platform (DCP) is intended to be an expandable and robust platform that supports data analytics in pharmaceutical manufacturing, with a strong focus on integration into the surrounding business processes. Therefore, it needs to be developed in a way that allows each module to be rolled out and updated individually without affecting the functionality of any other module. DCP needs to appear as one holistic application to the user, but is in fact a collection of independent microservices that interact with each other.
Another characteristic of DCP is the intent to integrate, in the same tool, modules that can be used in a GxP context (validated modules) and modules that are intended for information only (non-validated modules). Thus, DCP acts as an umbrella or framework to integrate various modules.
DCP is based on independent frontend and backend microservices. The microservice principle applies both to the separation of frontend and backend and to the separation of the individual modules. Each module consists of at least one microservice for the frontend and one microservice for the backend. This allows a completely decoupled deployment of all components and ensures that the deployment of one component does not affect another component in an unexpected manner. However, elements from the DCP Framework can be reused in other components (e.g. the login approach and presentation principles).
The Portal shall be realized as a single-page application (SPA) running in a web browser for easy distribution and access. SPAs dynamically update only parts of the web page as the user interacts with the system and avoid constant page reloads, creating a fluid and responsive user experience. It shall be based on current web technologies (Angular 11), namely HTML5 and TypeScript/JavaScript, to ensure portability and maintainability. The Portal shall display process data in dynamic ways, so that end users can interact with charts and visualizations. The browser-based web portal is hosted using Microsoft IIS 10. The different backend modules (shared and dedicated) are hosted as independent sites, and each backend listens on a different port. The same approach is followed for the frontend of each module: the different modules (shared and dedicated) are independent micro-applications which can be developed, compiled and deployed without impacting each other. Data is stored in a shared environment using Microsoft SQL Server instances; for each module an individual and independent database instance is used. The message queue uses RabbitMQ 3.9.x.
DCP's Data Layer utilizes off-the-shelf OSIsoft PI software products. These are the PI Data Archives and PI AF servers, which are operated at the sites and synchronized with a global reporting solution (GReTA). DCP will not connect to any PI Data Archive directly, because DCP requires OSIsoft's PI Web API as an interface, which is provided by the GReTA AF servers. The PI Web API is a REST layer used to call PI AF functions. The AF servers support the following two key elements that are required and used by DCP:
The Analytics Layer of DCP is based on the Analytical Core of Digital Clone (AC-DC), which was developed during the Pilot Phase. AC-DC will become a validated system component when installed following the defined installation verification document. At its core, AC-DC utilizes R scripts to perform calculations and exposes R functions as callable REST endpoints. All data transfer (data input and results) uses JSON objects. One of the major advantages of AC-DC is that the native R language can be used (no endpoint-specific coding is required) and thus only a very small overhead is added compared to running the R script in a standalone environment. Returned calculation results are reusable and can be used as input for additional calls. AC-DC provides the option to perform advanced data caching. This approach separates the browser-based web portal from the analytical R code, fostering collaboration between programmers on the IT side and data scientists on the business side and mitigating likely expert-knowledge transfer issues.
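The calling pattern can be sketched as follows: a JSON payload is posted to an R function exposed as a REST endpoint, and the returned JSON result is fed into a second call. The base URL, endpoint names and payload fields below are illustrative assumptions, not the actual AC-DC API.
// ac-dc-client.example.ts (illustrative sketch; base URL, endpoints and payloads are hypothetical)
async function callRFunction(baseUrl: string, endpoint: string, payload: unknown): Promise<unknown> {
  const response = await fetch(`${baseUrl}/${endpoint}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),        // data input as a JSON object
  });
  if (!response.ok) {
    throw new Error(`AC-DC call failed with status ${response.status}`);
  }
  return response.json();                 // results are returned as a JSON object
}
async function runExample(): Promise<void> {
  const acDc = 'https://acdc-node.example.internal/api';  // hypothetical calculation engine node
  // First call: a (hypothetical) preprocessing R function exposed as a REST endpoint.
  const preprocessed = await callRFunction(acDc, 'preprocess', { batchId: 'B-123', values: [1.2, 1.3, 1.1] });
  // Reuse the returned result as input for an additional call.
  const model = await callRFunction(acDc, 'fitModel', { data: preprocessed });
  console.log(model);
}
runExample().catch(console.error);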
The Analytics Layer will consist of multiple independent calculation engine nodes hosted in different data centers across the Roche/Genentech network. It is not necessary to create an individual calculation engine node for each site; instead, sites can share calculation engine nodes.
Microservices have to solve the problem of duplicating parts of code. While there is a big debate on how to solve this problem, the DCP core team has decided to provide shared models/interfaces/functionality as a NuGet package. Packaging shared code this way has the advantage that code can be shared without introducing a deployment dependency, and different projects can build on top of different package versions. Nevertheless, changes to the core package are controlled and kept to a minimum. The DCP Framework provides:
Configuration is provided via the appsettings.json file. However, secrets (e.g. connection strings) are removed and provided from a different source; the default SecretConfigurationProvider is HashiCorp Vault.
Frontend code is developed as a single-page application (SPA) using the Angular framework [^1]. The code is separated into micro frontends. We define a micro frontend as a section of the DCP UI, often consisting of dozens of components, that uses a framework like React, Vue or Angular to render its components. Each micro frontend is managed independently (separate repository, team, build and deploy process). Each micro frontend has its own Git repository, its own package.json file, and its own build tool configuration.
[^1]: DCP is set up so that different modules can use different Angular versions. The current deployment uses Angular 11; the shared components are prepared to support version 17, and the upgrade of the modules has started.
The single-spa library is used to combine multiple JavaScript micro frontends into one frontend application. single-spa allows:
The DCP framework consists of the following micro frontends:
Dedicated modules consist of one or more frontend apps, which are imported via the single-spa library.
The base application contains an index.html file. This file loads all other applications in DCP. This is done using:
SystemJS Import Maps
Import maps allow JavaScript files, such as an Angular build's main.js or a Vue.js app.js, to be imported dynamically.
<!--index.html-->
<meta name="importmap-type" content="systemjs-importmap">
<script type="systemjs-importmap">
  {
    "imports": {
      "@dcp/login": "./app-login/main.js?version=4",
      "@dcp/administration": "./app-administration/main.js?version=4",
      "@dcp/framework": "./app-framework/main.js?version=5",
      "@dcp/mvda": "./app-mvda/main.js?version=8",
      "@dcp/chrom-ta": "./app-chrom-ta/main.js?version=4",
      "@dcp/api-service": "./app-api-service/main.js?version=4",
      "@dcp/reporting": "./app-reporting/main.js?version=3",
      "single-spa": "js/single-spa.min.js",
      "single-spa-layout": "js/single-spa-layout.js"
    }
  }
</script>
<script src="js/system.min.js"></script>
<script>
  Promise.all([
    System.import('single-spa'),
    System.import('single-spa-layout')
  ])
    .then(([singleSpa, singleSpaLayout]) => {
      // Import all applications ....
    });
</script>
Single Spa Layout
The purpose of single-spa-layout is to provide a simple routing API that tells single-spa when and where to load the given applications.
<!--index.html-->
<head>
  <template id="single-spa-layout">
    <single-spa-router>
      <route path="/app/:siteId">
        <route path="/administration">
          <!-- We load the Administration application -->
          <application name="@dcp/administration" loader="administrationLoader"></application>
        </route>
        <route path="/menu">
          <!-- We load some HTML, the Side Menu application and the Top Navigation application -->
          <div class="wrapper">
            <nav class="side-menu">
              <application name="@dcp/side-menu"></application>
            </nav>
            <header class="top-navigation">
              <application name="@dcp/top-navigation"></application>
            </header>
            <main class="main-content dcp-app-wrapper">
              <route path="/saw">
                <!-- We load the SAW application -->
                <application name="@dcp/saw" loader="sawLoader"></application>
              </route>
            </main>
          </div>
        </route>
        <route default>
          <application name="@dcp/framework" loader="frameworkLoader"></application>
        </route>
      </route>
      <route path="/login">
        <!-- We load the Login application -->
        <application name="@dcp/login" loader="loginLoader"></application>
      </route>
      <route default>
        <!-- Default option -->
        <application name="@dcp/framework" loader="frameworkLoader"></application>
      </route>
    </single-spa-router>
  </template>
</head>
Based on this template, the routes are constructed; in addition, different loaders can be created for the applications.
<!--index.html-->
<script>
  Promise.all([
    System.import('single-spa'),
    System.import('single-spa-layout')
  ])
    .then(([singleSpa, singleSpaLayout]) => {
      const simpleLoading = `Loading`;
      function createLoaderHtml(module) {
        return `Loading ${module}`;
      }
      const { constructApplications, constructLayoutEngine, constructRoutes } = singleSpaLayout;
      const { registerApplication, start } = singleSpa;
      const routes = constructRoutes(
        document.querySelector("#single-spa-layout"),
        {
          loaders: {
            frameworkLoader: createLoaderHtml('Framework'),
            administrationLoader: createLoaderHtml('Administration'),
            simpleLoader: simpleLoading
          }
        }
      );
      const applications = constructApplications({
        routes,
        loadApp({ name }) {
          // @ts-ignore
          return System.import(name);
        },
      });
      const layoutEngine = constructLayoutEngine({ routes, applications });
      applications.forEach(registerApplication);
      layoutEngine.activate();
      start({
        urlRerouteOnly: true,
      });
    });
</script>
The different apps communicate with each other using the browser's localStorage and NgRx. The ngrx-store-localstorage package is used to populate UI and user data while loading some of the apps.
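As a minimal sketch of how such state sharing can be wired up with ngrx-store-localstorage, the meta-reducer below persists selected store slices to localStorage and rehydrates them when a micro frontend starts. The slice names ('user', 'ui') and the module name are assumptions for illustration, not the actual DCP store keys.
// store.config.ts (illustrative sketch, not the actual DCP store set-up)
import { NgModule } from '@angular/core';
import { ActionReducer, MetaReducer, StoreModule } from '@ngrx/store';
import { localStorageSync } from 'ngrx-store-localstorage';
// Persist the hypothetical 'user' and 'ui' slices to localStorage and
// rehydrate them when another micro frontend initializes its store.
export function localStorageSyncReducer(reducer: ActionReducer<any>): ActionReducer<any> {
  return localStorageSync({ keys: ['user', 'ui'], rehydrate: true })(reducer);
}
export const metaReducers: MetaReducer<any>[] = [localStorageSyncReducer];
@NgModule({
  imports: [
    // The empty reducer map stands in for the module's own reducers.
    StoreModule.forRoot({}, { metaReducers }),
  ],
})
export class AppStoreModule {}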
Transitioning between applications is implemented using the mount and unmount lifecycle methods. The logic is implemented inside single-spa; no custom code is required.
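For context, the sketch below shows one common way an Angular micro frontend exposes these lifecycle methods, using the single-spa-angular helper. The module name and root selector are placeholders; whether DCP uses this exact helper is not stated here.
// main.single-spa.ts (illustrative sketch using single-spa-angular)
import { NgZone } from '@angular/core';
import { Router } from '@angular/router';
import { platformBrowserDynamic } from '@angular/platform-browser-dynamic';
import { singleSpaAngular } from 'single-spa-angular';
import { AppModule } from './app/app.module'; // placeholder module name
const lifecycles = singleSpaAngular({
  // Bootstrap the Angular module when single-spa mounts this micro frontend.
  bootstrapFunction: () => platformBrowserDynamic().bootstrapModule(AppModule),
  template: '<dcp-app-root />',               // placeholder root selector
  Router,
  NgZone,
});
// single-spa invokes these exports when the active route changes.
export const bootstrap = lifecycles.bootstrap;
export const mount = lifecycles.mount;
export const unmount = lifecycles.unmount;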
single-spa needs static file names to reference in the import map. Static file names can cause trouble when a new version is deployed, because the client browser may read the file from its cache instead of loading the new version. To overcome this issue, a GET parameter containing the version number is added to the systemjs-importmap, e.g. ./app-chrom-ta/main.js?version=1 . When a new version is released, this counter is incremented and the changed URL forces the browser to load the new file version.
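Such a version bump could be automated as a deployment step; the small Node/TypeScript sketch below rewrites the version parameter of one module in index.html. The script, its file path and the module name are hypothetical and not part of DCP.
// bump-import-version.ts (hypothetical deployment helper, not part of DCP)
import { readFileSync, writeFileSync } from 'fs';
// Increment the ?version= query parameter for a given module in index.html,
// e.g. "./app-chrom-ta/main.js?version=1" -> "./app-chrom-ta/main.js?version=2".
function bumpImportVersion(indexHtmlPath: string, moduleName: string): void {
  const html = readFileSync(indexHtmlPath, 'utf8');
  const pattern = new RegExp(`("@dcp/${moduleName}":\\s*"[^"]*\\?version=)(\\d+)(")`);
  const updated = html.replace(pattern, (_match, prefix, version, suffix) =>
    `${prefix}${Number(version) + 1}${suffix}`);
  writeFileSync(indexHtmlPath, updated);
}
bumpImportVersion('./index.html', 'chrom-ta');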
In order to realize the benefits of a platform approach, some functionality needs to be shared. Sharing functionality and combining multiple modules under one hood requires some inter-module communication. In order to maintain the separation, the following rules are applied:
Inside DCP, two service messaging approaches are used. Both are based on the AMQP protocol as implemented by RabbitMQ and realize different types of event messaging. The type of messaging used is described in the specification of the queues.
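To illustrate two commonly used AMQP patterns over RabbitMQ (a point-to-point work queue and a fanout publish/subscribe exchange), the TypeScript sketch below uses the amqplib client. The queue and exchange names are placeholders, the pattern choice is illustrative, and the actual DCP backend services are not written against this client.
// messaging.example.ts (illustrative only; names and patterns are placeholders)
import * as amqp from 'amqplib';
async function publishExamples(): Promise<void> {
  const connection = await amqp.connect('amqp://rabbitmq.example.internal');
  const channel = await connection.createChannel();
  // Pattern 1: point-to-point work queue - one consumer processes each message.
  await channel.assertQueue('dcp.calculation.requests', { durable: true });
  channel.sendToQueue(
    'dcp.calculation.requests',
    Buffer.from(JSON.stringify({ module: 'mvda', batchId: 'B-123' })),
    { persistent: true },
  );
  // Pattern 2: publish/subscribe fanout exchange - every bound queue receives a copy.
  await channel.assertExchange('dcp.events', 'fanout', { durable: true });
  channel.publish('dcp.events', '', Buffer.from(JSON.stringify({ event: 'calculationFinished' })));
  await channel.close();
  await connection.close();
}
publishExamples().catch(console.error);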
Communication between the SPAs and the backend web APIs crosses different origins; these problems are known as CORS (cross-origin resource sharing) issues. To overcome this, all origins are allowed via the Access-Control-Allow-Origin header settings in the web.config of every backend API.
To maintain the validated state of the GxP components while preserving the freedom of the non-GxP components, DCP implements a distributed hosting concept. Every DCP environment consists of at least two app server nodes. One node is operated as a controlled environment, following IT System Operations for validated systems, such as change control, incident management and installation verification. On this node, all GxP modules are hosted. The other, non-GxP, node contains the non-validated modules. Shared resources for inter-service communication between GxP and non-GxP components (e.g. RabbitMQ) are hosted on the GxP node or on dedicated nodes.
The frontend uses the single-spa library to distribute code across different modules and infrastructure nodes. For each infrastructure node an IIS site is created. The different micro frontend apps are hosted as Applications of this site using the same application pool. The IIS site folder on the GxP node contains the web.config, the single-spa library and the map files that load the micro apps from the corresponding micro frontends. In order to load from different server nodes (physical locations), absolute URL paths are used. The IIS site for the non-GxP node consists of a folder containing the web.config file and (if applicable) the web components, in order to configure CORS.
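As a sketch of how the import map looks when absolute URLs point to the two nodes, consider the snippet below. The host names, version numbers and the assignment of modules to the GxP and non-GxP nodes are placeholders for illustration only.
<!--index.html (illustrative; host names, versions and module-to-node assignment are placeholders)-->
<script type="systemjs-importmap">
  {
    "imports": {
      "@dcp/framework": "https://dcp-gxp-node.example.internal/app-framework/main.js?version=5",
      "@dcp/mvda": "https://dcp-gxp-node.example.internal/app-mvda/main.js?version=8",
      "@dcp/reporting": "https://dcp-nongxp-node.example.internal/app-reporting/main.js?version=3"
    }
  }
</script>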
For the calculation nodes, the message sizes are expected to be bigger than for the raw data transfer; therefore, decentralized calculation nodes near the data (in the site data centers) are deployed. Decentralizing on the site level acts as a "load balancer" that distributes computation requests to site-specific nodes. As indicated above, each deployment consists of a GxP and a non-GxP node, which are used respectively.
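To make the distribution idea concrete, the sketch below resolves a calculation engine node from the site identifier before a computation request is dispatched. The site-to-node mapping, node URLs and GxP/non-GxP path suffixes are hypothetical and not the actual DCP configuration.
// calculation-node-routing.ts (hypothetical sketch of site-based dispatching)
// Sites can share calculation engine nodes, so several site IDs may map to one node.
const calculationNodesBySite: Record<string, string> = {
  'site-a': 'https://calc-node-eu.example.internal',
  'site-b': 'https://calc-node-eu.example.internal',  // shares the EU node with site-a
  'site-c': 'https://calc-node-us.example.internal',
};
function resolveCalculationNode(siteId: string, gxp: boolean): string {
  const baseUrl = calculationNodesBySite[siteId];
  if (!baseUrl) {
    throw new Error(`No calculation node configured for site ${siteId}`);
  }
  // Each deployment has a GxP and a non-GxP node; pick the matching one (path is illustrative).
  return gxp ? `${baseUrl}/gxp` : `${baseUrl}/non-gxp`;
}
// Example: dispatch a request from site-b to its (shared) calculation node.
console.log(resolveCalculationNode('site-b', true));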