OpenCPU

Introduction

OpenCPU is built on top of the Apache web server. Since version 2.0 of the Apache web server, almost all functionality is provided as modules. For OpenCPU the most important module is mod_R, which forwards HTTP(S) requests to the embedded R interpreter.

The configuration of the R process is applied when the Apache web server starts. On startup the R script /usr/lib/opencpu/rapache/onstartup.R is executed. It loads the configuration from /etc/opencpu/server.conf and instantiates the parent and child R processes as defined by the MPM module in Apache.

Using the opencpu package, the R handler of the rApache web server is set to opencpu:::rapachehandler. In the request handler, the built-in Apache module apreq is prevented from parsing POST requests. The complete request data is collected from rApache and forwarded to the opencpu::serve* function. From this point on the request is handled inside R, and the logical flow is the same on a single-user server (development/testing) on Windows and on the cloud server.

Depending on the request method (HTTP(S) verb), different actions are triggered:

  • If the HTTP(S) verb is HEAD, OPTIONS or GET, the request data is passed to the opencpu::main function* for further processing of the request.
  • For every other HTTP(S) verb, a new directory is created using dir.create() and the request is passed to opencpu:::run_worker for further request handling.
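
The branching on the request verb can be sketched as follows. This is an illustrative Python model of the behavior described above, not OpenCPU's R code; the function name `route_by_verb` and the returned labels are hypothetical.

```python
import os
import tempfile

# Verbs that are processed without creating a working directory.
READ_ONLY_VERBS = {"HEAD", "OPTIONS", "GET"}

def route_by_verb(method, session_root):
    """Return (handler label, working directory or None) for a request method."""
    if method.upper() in READ_ONLY_VERBS:
        # Read-only requests are processed directly.
        return ("main", None)
    # Any other verb gets a fresh directory before being handed to a worker.
    workdir = tempfile.mkdtemp(dir=session_root)
    return ("run_worker", workdir)
```

In this sketch a POST request yields a new directory below `session_root`, while a GET request is routed without touching the file system.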

Handling of HTTP Requests

The entry point for the handling of all requests is the R function main.R. If the request data is in raw format, the RAW payload is parsed (see section Parsing the request payload).

The request object is initialized, the response object is reset, and processing of the request is started using the function opencpu::httpget().

The request URL is parsed and, depending on the API that was called, different request handlers are invoked. See the section API main entry point for details on the request handling.

Parsing the request payload

The request payload parsing is implemented in the opencpu::parse_post() function. Depending on the Content-Type header of the request (payload encoding), different handlers are called. The handler is selected by fuzzy matching of the Content-Type header against the implemented encodings. The table below lists the implemented payload encodings.

| Encoding | Request handler | Description |
|---|---|---|
| multipart/form-data | opencpu::multipart | Parses the request payload using webutils::parse_http(). Depending on the field encoding, the following decoders are used: I(), jsonlite::fromJSON() or protolite::unserialize_pb() |
| application/x-www-form-urlencoded | webutils::parse_query | Parses the request payload using webutils::parse_query() |
| application/json | jsonlite::fromJSON | Parses the request payload using jsonlite::fromJSON() |
| application/x-protobuf | protolite::unserialize_pb | Parses the request payload using protolite::unserialize_pb() |
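
Selecting a decoder from the Content-Type header can be modeled roughly as below. This is a Python sketch under simplifying assumptions: it only covers the JSON and URL-encoded cases, the "fuzzy" matching is reduced to ignoring letter case and header parameters, and `decode_payload` is a hypothetical name rather than an OpenCPU function.

```python
import json
from urllib.parse import parse_qs

def decode_payload(content_type, body):
    """Pick a decoder by loosely matching the Content-Type header."""
    # Ignore parameters such as "; charset=utf-8" and letter case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    text = body.decode("utf-8")
    if media_type == "application/json":
        return json.loads(text)
    if media_type == "application/x-www-form-urlencoded":
        # parse_qs returns a list per key; keep the first value of each.
        return {key: values[0] for key, values in parse_qs(text).items()}
    raise ValueError(f"unsupported content type: {content_type}")
```

Note that URL-encoded values arrive as strings; turning them into typed values is a separate step, which in OpenCPU is the job of parse_arg().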

All packages used for input parsing are written and maintained by the openCPU core team, assuring compatibility with the openCPU version lifecycle.

Primitives and R code arguments are mapped to R values in the parse_arg() function, called by the execute_function() implementation.

Session arguments are mapped to the sessionID::.val object in the parse_arg() function. The namespace is evaluated using session_eval.

API main entry point

The main API root is located at {server-root}/ocpu. The API root can be reconfigured by changing the location mapping in /etc/httpd/conf.d/opencpu.conf. Therefore all third party applications should implement the API root as a configuration parameter.
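
A client that treats the API root as configuration might build its request URLs as in the following sketch. The helper `build_url` and the example host are hypothetical; only the default root /ocpu comes from the text above.

```python
def build_url(server_root, api_root, *segments):
    """Join server root, API root and path segments with single slashes."""
    parts = [server_root.rstrip("/"), api_root.strip("/")]
    parts.extend(str(segment).strip("/") for segment in segments)
    # Drop empty parts so that an empty API root does not produce "//".
    return "/".join(part for part in parts if part)

# The API root is read from configuration instead of being hard-coded.
config = {"server_root": "https://example.org", "api_root": "/ocpu"}
url = build_url(config["server_root"], config["api_root"], "lib", "stats", "R", "rnorm")
```

If the server administrator remaps the location, only the `api_root` configuration value has to change.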

The request handling is performed by splitting the URL by slash. In every function the head element is evaluated, the required actions are performed in the request handler and the remaining URL is passed to the next functions/request handlers. The entry point for the complete request handling is the R function httpget(). Based on the first parameter the following sub APIs/request handlers are called:


| Parameter | Request handler | Description |
|---|---|---|
| lib or library | httpget_library | Checks if the library API is enabled (server.conf) and finds the library path in the file system. Requests are then handled by httpget_package, which implements the R package API. |
| tmp | httpget_tmp | Checks if the tmp API is enabled (server.conf). The session path is extracted and the session API httpget_session() is called for further request handling. |
| doc | httpget_doc | Finds the doc directory and path using the R_DOC_DIR variable. The documentation is returned using httpget_file(). |
| user | httpget_user | Checks if the user in the URL exists as a Linux user on the OS. |
| apps or github | httpget_apps | Checks if the app is installed, searches for the package path containing the app and forwards this path to the package API for further request handling. |
| webhook | httpget_webhook | API endpoint that allows the AC-DC admin to register a GitHub repository for CI of an R package. The R package is installed using remotes::install_github. See the webhook API. |
| test | httpget_testapp | Gets the path to the OpenCPU test page (included in the www directory of the opencpu R package). The web page is sent to the client using res$sendfile. |
| info | httpget_info | The object containing the session information is created using utils::sessionInfo, .libPaths() and environment()$confpaths. The object is forwarded to the httpget_object() handler for output formatting. |
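
The head/tail routing described in this section can be sketched as follows. The Python code is only a model of the dispatch idea: the handler names are taken from the table, but the `HANDLERS` mapping and the `dispatch` function are hypothetical, and paths are assumed to be relative to the API root.

```python
def split_path(path):
    """Split a URL path on slashes, dropping empty segments."""
    return [part for part in path.split("/") if part]

# First path segment -> handler name, following the table above.
HANDLERS = {
    "lib": "httpget_library",
    "library": "httpget_library",
    "tmp": "httpget_tmp",
    "doc": "httpget_doc",
    "user": "httpget_user",
    "apps": "httpget_apps",
    "github": "httpget_apps",
    "webhook": "httpget_webhook",
    "test": "httpget_testapp",
    "info": "httpget_info",
}

def dispatch(path):
    """Return the handler for the head segment and the remaining segments."""
    parts = split_path(path)
    if not parts:
        raise LookupError("empty path")
    head, tail = parts[0], parts[1:]
    if head not in HANDLERS:
        raise LookupError(f"unknown API: {head}")
    # The tail is what the selected handler would continue to process.
    return HANDLERS[head], tail
```

Each handler would repeat the same step on the tail it receives, which is how a URL such as /lib/stats/R/rnorm is consumed one segment at a time.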

R apps API

OpenCPU apps are static web pages (HTML, CSS, JS) that are included in an R package. They interface with the R functions in this package through the OpenCPU APIs. By convention, these apps are placed in the /inst/www/ directory of the R source package.

The apps API is located at {API-root}/apps/{pkgname}/{parameter}. The API can be enabled or disabled using the enable.api.apps config parameter.

The apps API checks if the app is installed, searches for the package path containing the app and forwards this path to the package API for further request handling.

R webhook API

The OpenCPU cloud server includes support for continuous integration (CI). Thereby a GitHub repository can be configured to automatically install a package on an OpenCPU server, every time a commit is pushed to the master branch. To take advantage of this feature, it is required that:

  • The R source package is in the root directory of your repository.
  • The GitHub user account has a public email address.

The sub API receives the post-commit hook from GitHub. At the moment only repositories hosted on GitHub are supported (no Stash, etc.). After receiving the hook from GitHub, the package is installed using remotes::install_github(). If enabled, the repository owner is notified about the CI task by e-mail, sent through the SMTP server defined in the configuration file.

R package API

The package API is used to interact with the R packages installed in the global R package library.

The package library is located at {API-root}/lib/{pkgname}/{parameter}. The API can be enabled or disabled using the enable.api.library config parameter. The package API loads the requested package and any of its dependencies that are not already part of the preload configuration. Packages are loaded by name from the global R package library. If multiple versions of the same package are installed on the system, the latest installed version is loaded.

Based on the request parameter in the request URL, the following request handlers are called:


| Parameter | Request handler | Description |
|---|---|---|
| R | httpget_package_r | Loads the package from the requested library and lists the R objects exported by the package. See the R object API for further information. |
| data | httpget_package_data | Data included with this package. Datasets are objects, see the R object API. |
| html | httpget_package_html | Manuals (help pages) included in this package. Help files are rendered from Rd to HTML using tools::Rd2HTML. |
| man | httpget_package_man | Retrieves the help page about a topic in the requested output format. Manuals can be formatted as text, HTML or PDF. |
| info | httpget_package_info | Shows information about this package. |
| * | httpget_package_file() | Sanitizes the path against traversal attacks. In case of a POST request the requested package is loaded. The file path is forwarded to the httpget_file() function for further request handling. |
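
The text does not say how the path is sanitized. One common approach, shown here as a Python sketch and not as OpenCPU's actual implementation, is to resolve the requested path and verify that it still lies inside the package directory. The function name `resolve_inside` is hypothetical.

```python
import os

def resolve_inside(base_dir, requested):
    """Resolve `requested` below `base_dir`, rejecting paths that escape it."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested.lstrip("/")))
    # commonpath equals base only if target is base itself or lies beneath it.
    if os.path.commonpath([base, target]) != base:
        raise PermissionError(f"path escapes package directory: {requested}")
    return target
```

A request for www/index.html resolves to a file below the package directory, while a request containing ../ segments that climb above it is rejected.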

R session API

A session is a container that holds resources created from a remote function/script call (RPC). In openCPU every HTTP(S) POST request is mapped to a session, containing all data produced and available during the request execution.

The session API is mapped to the following URL: {API-root}/tmp/{sessionID}/{resource}. The API can be enabled/disabled using enable.api.tmp config parameter. Based on the present resource in the request URL, the following handlers are called:


| Parameter | Request handler | Description | Content-Type |
|---|---|---|---|
| R | httpget_session_r | Reloads the session from .RData. Checks if the specified object is present in the session environment. GET requests are forwarded to the object API for formatting; in case of a POST request the object is evaluated using execute_function(). | set in object API |
| graphics | httpget_session_graphics | Reloads graphics packages if required. Extracts the graphics from .REval and creates a list of plots, which is passed to the object API for output formatting. | set in object API |
| files | httpget_session_files | Sanitizes the file path against traversal attacks and passes file_path to the file API for request handling. | set in object API |
| source | httpget_session_source | Extracts the source output from .REval and creates a list object, which is passed to the object API for output formatting. | set in object API |
| console | httpget_session_console | Extracts the console output from .REval and creates a list object, which is passed to the object API for output formatting. | set in object API |
| warnings | httpget_session_warnings | Extracts the warning messages from .REval and creates a list object, which is passed to the object API for output formatting. | set in object API |
| messages | httpget_session_messages | Extracts the messages from .REval and creates a list object, which is passed to the object API for output formatting. | set in object API |
| stdout | httpget_session_stdout | Extracts stdout from .REval and creates a list object, which is passed to the object API for output formatting. | set in object API |
| info | httpget_session_info | Reads the session information from the .RInfo file and passes the sessionInfo object to the object API for output formatting. | set in object API |
| zip | httpget_session_zip | Creates a zip file (.zip) of the session directory using zip::zip and starts the download of the compressed directory. | application/zip |
| tar | httpget_session_tar | Creates a tar file (.tar.gz) of the session directory using utils::tar and starts the download of the compressed directory. | application/x-gzip |

R object API

The object API is used to read R objects or call R functions.

The object API is mapped to the following URL: {API-root}/.../{R|data}/{object}/{format}. The object API is always enabled and cannot be disabled.

The object API has two main request handling paths depending on the request method. For GET requests the httpget_object() function is used to route the object to the correct object serializer for output formatting. The implemented output formats and the serializers used are listed in the section on package output formatting.

If the request uses the HTTP(S) verb POST, the request handler execute_function() parses the function arguments and constructs the function call. The function call is executed using the handler session_eval(). The evaluation is described in the section Evaluation of objects.

A POST request can specify an additional format parameter at the end of the URL, e.g. /json. If the format parameter is present, the object API is called to format the returned object, which is then returned directly to the client. The following format shortcuts are supported: png, svg, pdf, svglite, print, md, bin, csv, feather, json, rda, rds, pb, tab, ndjson, console. If no format parameter is specified, a preview with relative paths to the session API is returned in the response body. The object API checks whether the output object can be converted to the specified output format based on the data type of the captured .val object.
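
Detecting an optional trailing format shortcut can be sketched as below. The set of shortcuts is copied from the paragraph above; the function `split_format` is a hypothetical Python helper and does not mirror OpenCPU's internal code.

```python
FORMAT_SHORTCUTS = {
    "png", "svg", "pdf", "svglite", "print", "md", "bin", "csv", "feather",
    "json", "rda", "rds", "pb", "tab", "ndjson", "console",
}

def split_format(segments):
    """Separate an optional trailing format shortcut from the URL segments."""
    if segments and segments[-1] in FORMAT_SHORTCUTS:
        return segments[:-1], segments[-1]
    # No shortcut: the caller would return the session preview instead.
    return segments, None
```

With this helper, a POST to .../R/rnorm/json is split into the function path and the format json, while a POST to .../R/rnorm has no format and would produce the session preview.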

The package output formatting, serializers and headers

The output formatting is implemented in the httpget_object function (object API). Depending on the output format specification, the following serializers are called and the Content-Type header is set accordingly. Whether an object can be converted depends on whether an implementation is available for the specified encoder. Custom data structures can be added to some encoders using R's internal method dispatch.

| Format | Request handler | Encoder | Content-Type |
|---|---|---|---|
| bin | httpget_object_bin | base::writeBin | application/octet-stream |
| csv | httpget_object_csv | utils::write.csv | text/csv; charset=utf-8 |
| feather | httpget_object_feather | feather::write_feather | application/feather |
| spss | httpget_object_spss | haven::write_sav | application/spss-sav |
| sas | httpget_object_sas | haven::write_sas | application/sas7bdat |
| stata | httpget_object_stata | haven::write_dta | application/stata-dta |
| tab | httpget_object_tab | utils::write.table | text/plain; charset=utf-8 |
| json | httpget_object_json | jsonlite::toJSON | application/json |
| ndjson | httpget_object_ndjson | jsonlite::stream_out | application/x-ndjson; charset=utf-8 |
| md | httpget_object_md | pander::pander | text/plain |
| print | httpget_object_print | print | text/plain |
| text | httpget_object_text | cat | mimetype |
| ascii | httpget_object_ascii | deparse | text/plain |
| rda | httpget_object_rda | save | application/octet-stream |
| rds | httpget_object_rds | saveRDS | application/r-rds |
| pb | httpget_object_pb | protolite::serialize_pb | application/x-protobuf |
| png | httpget_object_png | grDevices::png | image/png |
| pdf | httpget_object_pdf | grDevices::pdf | application/pdf |
| svg | httpget_object_svg | svg | image/svg+xml |
| svglite | httpget_object_svglite | svglite::svglite | image/svg+xml |

For the implementation of the serializers, refer to the official package manuals on CRAN.

Evaluation of objects

For every POST request a working directory is created, named after the session ID. The session ID is created using the rand_bytes function from the openssl package; the key length in server.conf defines the number of possible sessions.
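
The relation between key length and the number of possible sessions can be illustrated with the following Python sketch. It uses Python's `secrets` module in place of openssl's rand_bytes; the function name, the `x0` prefix and the default key length are assumptions made for the example.

```python
import os
import secrets

def new_session(session_root, key_length=5):
    """Create a working directory named after a random session ID."""
    # key_length random bytes give 256 ** key_length possible IDs.
    session_id = "x0" + secrets.token_hex(key_length)
    workdir = os.path.join(session_root, session_id)
    os.mkdir(workdir)  # raises FileExistsError if the ID is already in use
    return session_id, workdir
```

With a key length of 5 bytes there are 256 ** 5 (about 1.1 × 10^12) possible IDs, so a longer key makes collisions and guessing of session IDs less likely.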

In the request session the function call is constructed. This includes:

  • Parsing/evaluation of input arguments
  • Loading the library

The actual evaluation of the function call is performed using the evaluate function from the evaluate package. Compared to eval(), evaluate captures all of the information necessary to recreate the output as if you had copied and pasted the code into an R terminal. It captures messages, warnings, errors and output, all correctly interleaved in the order in which they occurred. It stores the final result, whether or not it should be visible, and the contents of the current graphics device.

evaluate is executed with the stop_on_error = 1 argument. In this case evaluation proceeds until an error occurs; execution stops at the point of the error and all results up to that point are returned. With this behavior, error messages can be captured and forwarded to the user.
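
The stop-at-first-error behavior can be illustrated with a small Python analogue. This is not the evaluate package and captures far less (no messages, warnings or graphics); `evaluate_until_error` is a hypothetical function that only shows how results collected before the error remain available.

```python
def evaluate_until_error(expressions, env=None):
    """Evaluate expressions in order; stop at the first error but keep results."""
    env = {} if env is None else env
    results = []
    for source in expressions:
        try:
            results.append(("value", eval(source, env)))
        except Exception as err:
            # Record the error and stop; earlier results stay available.
            results.append(("error", f"{type(err).__name__}: {err}"))
            break
    return results

# The third expression is never evaluated because the second one fails.
captured = evaluate_until_error(["1 + 1", "1 / 0", "2 + 2"])
```

The caller receives the value of the first expression followed by the error record, which is the information needed to report the failure back to the client.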

OpenCPU implements a custom output_handler(), defined in evaluate_input(), that is used during the evaluation. The custom handler saves the return value to .val in the session environment, and the error object is saved to the global namespace.

For further information regarding the evaluation of a function call, see the official manual on CRAN.

R user API

The user API is mapped to the following URL: {API-root}/user/{userid}/lib/*. The API can be enabled or disabled using the enable.api.user config parameter.

The user API is available for all Unix users that are created on the instance running OpenCPU and listed in /etc/passwd. OpenCPU reads this file for user validation.
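
Validating a user ID against passwd-formatted data can be sketched as follows. The Python helpers `passwd_users` and `is_valid_user` are hypothetical and operate on text passed in by the caller, so the example does not depend on the local /etc/passwd.

```python
def passwd_users(passwd_text):
    """Return the set of login names in passwd-formatted text."""
    users = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The login name is the first of the colon-separated fields.
        users.add(line.split(":", 1)[0])
    return users

def is_valid_user(userid, passwd_text):
    """Check whether `userid` appears as a login name in the passwd text."""
    return userid in passwd_users(passwd_text)

sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1000:1000::/home/alice:/bin/bash\n"
)
```

On a real system the text would be read from /etc/passwd; only users found there would have their library path resolved.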

After the user is validated, by checking the unix password file, the requested library is loaded from the user library path, instead of the global library. Package dependencies might still be loaded from the global library.

The remaining URL is passed to the R package API for further request handling.

R file API

The R file API behaves differently depending on the request method (HTTP verb). For GET requests, files are sent to the client using the res$sendfile() method. All POST requests to the R file API are handled in execute_file(). Based on the format specification in the request URL, the matching handler for building the function call is executed. The handlers construct the function call and pass it to the session_eval() function, which executes the call and creates the document. The following document formats are supported:

| Format | Request handler | Interpreter | Type |
|---|---|---|---|
| file.r | httppost_rscript | evaluate::evaluate | R script |
| file.rnw | httppost_knittex | knitr::knit, tools::texi2pdf | knitr/sweave |
| file.rmd | httppost_knitpandoc | knitr::knit, knitr::pandoc | knitr/markdown |
| file.rmd | httppost_knit | knitr::knit | knitr |
| file.brew | httppost_brew | brew::brew | brew |
| file.md | httppost_pandoc | knitr::pandoc | markdown |
| file.tex | httppost_latex | tools::texi2pdf | latex |

Dependencies

openCPU dependencies

The only qualified and tested OpenCPU installation uses RHEL as the operating system. To provide its full capabilities, OpenCPU depends on a number of R packages and some Linux utilities. The following sections list these dependencies.

R Package dependencies

The opencpu package imports the following R packages:

  • evaluate (>= 0.12) used for evaluating the R function calls/scripts mapped from the request URL and for capturing their output
  • httpuv (>= 1.3) used for implementing the single-user server for debugging purposes on Windows
  • knitr (>= 1.6) used for knitting markdown documents
  • jsonlite (>= 1.4) used for input/output mapping and parsing of JSON arguments
  • remotes (>= 2.0.2) used to install OpenCPU apps and other dependencies (for third-party applications hosted on OpenCPU) from GitHub, if required
  • sys (>= 2.1) used as cross-platform interface to the platform shell
  • webutils (>= 0.6) used for parsing request payloads encoded as multipart/form-data or application/x-www-form-urlencoded
  • curl (>= 4.0) used for handling webhooks and sending e-mails
  • rappdirs used for getting the user API directory
  • zip used for creating a zip archive of the session in the session API
  • mime used for fuzzy matching of the Content-Type request header
  • protolite used for reading and writing Google protocol buffers
  • brew used as encoder for creating reproducible documents, returning txt, markdown or HTML
  • openssl implementation of md5, used for session key generation

Utilities

  • utils used for easy array element access (head/tail)
  • grDevices used for generating PDF documents from the graphics device
  • tools used for document rendering: tools::Rd2txt(), tools::Rd2HTML(), tools::Rd2latex()
  • parallel used for request handling on Windows, implementing the single-user server and coordinating the workers
  • stats used for generating random numbers

Suggests (needed for some output types)

  • haven used as output encoder for the SPSS/SAS output formats, see object API
  • feather used as output encoder for the feather output format, see object API
  • pander used as interface for pandoc document rendering
  • R.rsp RSP provides a powerful markup for controlling the content and output of LaTeX, HTML, Markdown, AsciiDoc, Sweave and knitr documents
  • svglite used for converting vector graphics from the graphics device
  • unix (>= 1.4) wrapper for Unix utilities, used for R script execution in the user library

Linux Package dependencies

  • pandoc (2.0.6) Pandoc is a Haskell library for converting from one markup format to another. It can convert between numerous markup and word processing formats, including, but not limited to, various flavors of Markdown, HTML, LaTeX and Word docx.
  • R (4.0.5) R is an open source software environment for statistical computing and graphics.
  • httpd Apache web server, used for serving HTTP(S) requests.
  • rApache (1.2.9) R module for the Apache web server on RHEL-based Unix systems. It maps the CLI R process to the Apache web server.
  • libapreq Libapreq is a safe, standards-compliant, high-performance library used for parsing HTTP(S) cookies, query strings and POST data.
  • libcurl Client-side library for transferring data using various protocols. In OpenCPU libcurl is used for calling HTTP(S) resources.
  • protobuf Protocol Buffers (a.k.a. protobuf) are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. This library is used for supporting protobuf input and output streams in OpenCPU.
  • openssl OpenSSL is a general-purpose cryptography library. It implements the md5 algorithm and handles the TLS and SSL protocols.
  • libxml2 Libxml2 is the XML C parser and toolkit developed for the Gnome project. Libxml2 is used for rendering documents inside OpenCPU.
  • libicu Interface for supporting Unicode encoding. In OpenCPU it is used for parsing the request payload.
  • libssh2 Libssh2 is a client-side C library implementing the SSH2 protocol. Libssh2 is required by the remotes R package to interface with GitHub using SSH.
  • cairo Cairo is a 2D graphics library with support for multiple output devices. Cairo is used for graphics support.
This page was last edited on 03 May 2024, 07:57 (UTC).