Extensions

Introduction

Fast Session Deletion

The extension for fast session deletion consists of:

  • A function in acdcClient: acdcClient::cache$fastSessionDeletion()
  • A table temp_keys in the database (see section Data Design)
  • A shell script, cleantempkeys.sh, scheduled by crontab

The interfaces between this extension and the other components of AC-DC are shown in the figure below:

sequenceDiagram
    Alice->>John: Hello John, how are you?
    John-->>Alice: Great!
    Alice-)John: See you later!

If acdcClient::cache$fastSessionDeletion() is called while a session is executing on AC-DC, the future session location under ocpu-store is derived by replacing ocpu-tmp with ocpu-store in the path returned by the R command getwd(). The function writes this session path as a new entry to the temp_keys table. The session is now registered for fast deletion and will be removed in the next cleaning cycle (every 10 minutes).
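The path substitution described above is performed in R inside acdcClient. As a minimal illustration of the same mapping, the shell sketch below swaps ocpu-tmp for ocpu-store in a working directory. The function name store_path_for and the example directory are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash
# Map a session working directory to the location the session will have
# once it is stored, by swapping the first "ocpu-tmp" for "ocpu-store".
store_path_for() {
  printf '%s\n' "$1" | sed 's/ocpu-tmp/ocpu-store/'
}

# Hypothetical working directory, as getwd() might report it during a session.
workdir="/tmp/ocpu-tmp/x0123456789abcdef"
session_path="$(store_path_for "$workdir")"
echo "$session_path"   # prints /tmp/ocpu-store/x0123456789abcdef
```

The resulting path is the value that would be written to the session_path column of temp_keys.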

Crontab runs the bash script cleantempkeys.sh every 10 minutes. The script queries the temp_keys table on the local PostgreSQL database using the psql utility with the local user postgres. Because a local connection is used, no password needs to be supplied.

For every entry older than 1 minute (according to created_at), the directory named by session_path is deleted from the session cache at /tmp/ocpu-store/ using the rm -rf command. If the deletion completes successfully, the entry is deleted from the database using psql. Errors during rm -rf are caught and written to the database field error_msg.
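The cleaning cycle described above could be sketched as follows. This is not the original cleantempkeys.sh, only an outline under stated assumptions: the database name acdc, the function names, the --run switch and the safety check on the path are all hypothetical, while the table and column names (temp_keys, session_path, created_at, error_msg) are taken from the text.

```shell
#!/usr/bin/env bash
# Sketch of a cleantempkeys.sh-style cleaning cycle; not the original script.
set -u

STORE_ROOT="${STORE_ROOT:-/tmp/ocpu-store}"  # session cache named in the text
DB_NAME="${DB_NAME:-acdc}"                   # assumption: hypothetical database name

# Quote a value as an SQL string literal by doubling embedded single quotes.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Run one statement as the local postgres user; -At prints bare, unaligned rows.
run_sql() {
  psql -U postgres -d "$DB_NAME" -At -c "$1"
}

# Remove one session directory. On failure, print the reason and return non-zero.
remove_session() {
  case "$1" in
    *..*) echo "refusing path containing '..': $1"; return 1 ;;
    "$STORE_ROOT"/?*) rm -rf -- "$1" 2>&1 ;;
    *) echo "refusing path outside $STORE_ROOT: $1"; return 1 ;;
  esac
}

# One cycle: delete expired sessions, then clear or annotate their rows.
clean_cycle() {
  run_sql "SELECT session_path FROM temp_keys WHERE created_at < now() - interval '1 minute'" |
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    if err="$(remove_session "$path")"; then
      run_sql "DELETE FROM temp_keys WHERE session_path = $(sql_quote "$path")"
    else
      run_sql "UPDATE temp_keys SET error_msg = $(sql_quote "$err") WHERE session_path = $(sql_quote "$path")"
    fi
  done
}

# Only touch the database when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
  clean_cycle
fi
```

With this layout, a crontab entry such as `*/10 * * * * /path/to/cleantempkeys.sh --run` (path hypothetical) would trigger one cycle every 10 minutes.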

sequenceDiagram
    Alice->>John: Hello John, how are you?
    John-->>Alice: Great!
    Alice-)John: See you later!

Session Management

The extension for session lookup consists of:

  • Functions in acdcClient for storing an RPC result or a data object, acdcClient::cache$cacheFunCall(fun, args) and acdcClient::cache$cacheData(data, identifier), and for reading a stored object back, acdcClient::cache$readCache(identifier)
  • A table datasets in the database (see section Data Design)
  • A shell script, cleancache.sh, scheduled by crontab

The interfaces between this extension and the other components of AC-DC are shown in the figure below:

sequenceDiagram
    Alice->>John: Hello John, how are you?
    John-->>Alice: Great!
    Alice-)John: See you later!

If an arbitrary function call is wrapped in acdcClient::cache$cacheFunCall(fun, args), the following workflow is executed. The list of arguments, together with the function name, is wrapped in a vector and passed to the jsonlite::toJSON serializer.

Every function argument must be serializable by jsonlite::toJSON; custom objects may require their own method for this.

The serialized string is hashed using the md5 algorithm, and the datasets table is queried with this hash. Depending on whether an entry is found, one of the following paths is taken:

  • If an entry is found, the path to the RDS object is read from the DB, and the object is restored into the current R session and returned (instead of executing the function call). Before the restore, the bad_value rate is checked. If bad values were detected, the function call is re-executed and the entry in the DB is updated with the new sessionID.

  • If no entry is found, the function call is executed. The result is saved to the working session using saveRDS(), and a new entry containing the hash and the path to the RDS object is created in the DB.
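The actual lookup is implemented in R inside acdcClient. The shell sketch below only illustrates the control flow of a hash-keyed cache: derive an md5 key from the serialized call, return the stored result on a hit, and execute and store on a miss. It rests on several assumptions: a plain directory stands in for the datasets table and the RDS files, cache_key and cached_call are hypothetical names, md5sum from GNU coreutils is available, and the bad_value check is omitted.

```shell
#!/usr/bin/env bash
# Illustration of hash-keyed caching; a directory stands in for the datasets table.
set -u

CACHE_DIR="${CACHE_DIR:-$(mktemp -d)}"   # assumption: hypothetical stand-in store

# Derive the cache key: md5 of the serialized call (function name plus arguments).
cache_key() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Return the stored result on a hit; otherwise run the command and store its output.
cached_call() {
  serialized="$1"; shift
  entry="$CACHE_DIR/$(cache_key "$serialized")"
  if [ -f "$entry" ]; then
    cat "$entry"    # hit: restore the stored result instead of re-executing
  else
    # miss: execute, store the result, then return it (failed runs are not cached)
    "$@" > "$entry.tmp" && mv "$entry.tmp" "$entry" && cat "$entry"
  fi
}

call='["sum",1,2,3]'                   # serialized call, here a JSON string
cached_call "$call" expr 1 + 2 + 3     # miss: computes, stores and prints 6
cached_call "$call" expr 1 + 2 + 3     # hit: prints the stored 6
```

Because the key depends only on the serialized string, two calls with the same function name and arguments map to the same entry, which is why every argument has to serialize deterministically.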

Instead of wrapping a function call using the acdcR package, already created data objects can be stored using the acdcClient::cache$cacheData(data, identifier) function. The workflow is the same as above, except that the user must choose an identifier manually (for a data frame, c(colnames(df), rownames(df)) is suggested) and must read the object back manually.

Errors during the restore of the RDS object are caught; in that case the function call is re-executed and its result is returned.

The restored or original object is returned together with a timestamp of the function execution that generated the data. This allows third-party applications to implement custom behavior depending on whether the data came from the cache (e.g. forcing a reload using the cache = FALSE argument).

The script cleancache.sh is scheduled by crontab every 10 minutes and deletes all entries in the datasets table that are older than 24 hours, in order to keep the entries in the database synchronized with the existing directories in /tmp/ocpu-store on the hard drive.
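A minimal sketch of such an expiry pass is shown below. It is not the original cleancache.sh. The database name acdc, the --run switch and the function name expiry_sql are hypothetical, and the sketch assumes that datasets has a created_at timestamp column like temp_keys, which the text does not state.

```shell
#!/usr/bin/env bash
# Sketch of a cleancache.sh-style expiry pass; not the original script.
set -u

DB_NAME="${DB_NAME:-acdc}"      # assumption: hypothetical database name
MAX_AGE="${MAX_AGE:-24 hours}"  # retention window stated in the text

# Build the statement that drops cache entries older than the retention window.
# Assumption: datasets has a created_at timestamp column, like temp_keys.
expiry_sql() {
  printf "DELETE FROM datasets WHERE created_at < now() - interval '%s';" "$1"
}

if [ "${1:-}" = "--run" ]; then
  # Local connection as user postgres, so no password is supplied.
  psql -U postgres -d "$DB_NAME" -c "$(expiry_sql "$MAX_AGE")"
else
  expiry_sql "$MAX_AGE"; echo   # dry run: print the statement only
fi
```

A crontab entry such as `*/10 * * * * /path/to/cleancache.sh --run` (path hypothetical) would match the 10-minute schedule described above.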

erDiagram
    CUSTOMER ||--o{ ORDER : places
    ORDER ||--|{ LINE-ITEM : contains
    CUSTOMER }|..|{ DELIVERY-ADDRESS : uses

Dependencies

The extensions to openCPU introduce the following dependencies for AC-DC:

R Package dependencies

The openCPU extensions import the following R packages:

  • RPostgreSQL used as a database driver for accessing PostgreSQL from R
  • DBI a dependency used for accessing the database
  • R6 Package used for implementing OOP in R, a lightweight alternative to R's built-in reference classes.
  • lubridate Package for easy date handling, used for cached dates, database timestamps, and parsing of ISO-8601 dates.
  • safer A consistent interface to encrypt/decrypt strings, objects, files and connections in R. Both symmetric and asymmetric encryption methods are supported. Thanks to excellent packages sodium and base64enc.

Linux Package dependencies

  • postgres (>= 13) PostgreSQL RDBMS server
  • psql (>= 13.3) Command-line client for PostgreSQL, used in the bash scripts to keep the database synchronized
This page was last edited on 03 May 2024, 07:57 (UTC).