OpenCPU is built on top of the Apache web server. Since version 2.0 of the Apache web server, almost all functionality comes as a module. For openCPU the most important module is mod_R, which forwards HTTP(S) requests to the embedded R interpreter.
The configuration of the R process is applied when the Apache web server starts. On startup the R script /usr/lib/opencpu/rapache/onstartup.R is executed. It loads the configuration from /etc/opencpu/server.conf and instantiates the parent and child R processes as defined by the MPM module in Apache.
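As a rough illustration of how configuration switches such as the enable.api.* parameters mentioned for the individual APIs in this document can be read, the sketch below parses a small, hypothetical configuration excerpt. It is written in Python for illustration only. The JSON layout of the sample and the helper names load_config and api_enabled are assumptions, not the OpenCPU implementation.

```python
import json

# Hypothetical excerpt of a configuration file; the JSON layout is an assumption.
SAMPLE_CONF = """
{
  "enable.api.library": true,
  "enable.api.apps": true,
  "enable.api.tmp": true,
  "enable.api.user": false
}
"""

def load_config(text):
    """Parse the configuration text into a dictionary."""
    return json.loads(text)

def api_enabled(config, api_name):
    """Return True only if the switch for api_name is present and true."""
    return bool(config.get("enable.api." + api_name, False))

config = load_config(SAMPLE_CONF)
```

Treating a missing switch as disabled is a design choice of this sketch; a real server may fall back to defaults instead.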
Using the openCPU package, the R handler of the rApache web server is set to opencpu:::rapachehandler. In the request handler, the built-in Apache module apreq is prevented from parsing POST requests. The complete request data is collected from rApache and forwarded to the opencpu::serve function. From this point on the request is handled inside R, and there is no difference in the logical flow between a single-user server (development/testing) on Windows and the cloud server.
Depending on the request method (HTTP(S) verb), different actions are triggered: the request is passed either to the opencpu::main function for further processing or to opencpu:::run_worker for further request handling.
The entry point for the handling of all requests is the R function main.R. If the request data is in raw format, the raw payload is parsed (see section Parsing the request payload). The request object is initialized, the response object is reset, and the processing of the request is started using the function opencpu::httpget().
The request URL is parsed and, depending on the API called, different request handlers are invoked. See the section API entry point for details on the request handling.
The request payload parsing is implemented in the opencpu::parse_post() function. Depending on the Content-Type header of the request (payload encoding), different handlers are called. The handler is selected by loosely (fuzzily) matching the Content-Type header against the implemented encodings. The table below lists the implemented payload encodings.
Encoding | Request handler | Description |
---|---|---|
multipart/form-data | opencpu::multipart | Parses the request payload using webutils::parse_http(). Depending on the field encoding, the following decoders are used: I(), jsonlite::fromJSON() or protolite::unserialize_pb() |
application/x-www-form-urlencoded | webutils::parse_query | Parses the request payload using webutils::parse_query() |
application/json | jsonlite::fromJSON | Parses the request payload using jsonlite::fromJSON() |
application/x-protobuf | protolite::unserialize_pb | Parses the request payload using protolite::unserialize_pb() |
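
The selection of a payload handler from the Content-Type header can be sketched as follows. This is a Python illustration, not the R code of parse_post(). The handler names are taken from the table above, while the matching strategy (case-insensitive substring search) and the function name select_handler are assumptions.

```python
# Encodings and handler names as listed in the table above.
HANDLERS = [
    ("multipart/form-data", "opencpu::multipart"),
    ("application/x-www-form-urlencoded", "webutils::parse_query"),
    ("application/json", "jsonlite::fromJSON"),
    ("application/x-protobuf", "protolite::unserialize_pb"),
]

def select_handler(content_type):
    """Return the handler name whose encoding occurs in the header value."""
    header = content_type.lower()
    for encoding, handler in HANDLERS:
        # Substring matching tolerates parameters such as "; charset=utf-8"
        if encoding in header:
            return handler
    raise ValueError("unsupported content type: " + content_type)
```

Loose matching of this kind lets headers with a charset or boundary parameter resolve to the same handler as the bare media type.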
All packages used for input parsing are written and maintained by the openCPU core team, assuring compatibility with the openCPU version lifecycle.
Primitives and R code arguments are mapped to R values in the parse_arg() function, called by the execute_function() implementation.
Session arguments are mapped to the sessionID::.val object in the parse_arg() function. The namespace is evaluated using session_eval.
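To make the three kinds of arguments more concrete, the following Python sketch classifies a raw argument string as a session reference, a primitive or code. It only illustrates the idea. The session key pattern, the use of JSON for primitives and the name classify_arg are assumptions and do not reproduce parse_arg().

```python
import json
import re

# Hypothetical session key pattern: "x" followed by hexadecimal digits.
SESSION_KEY = re.compile(r"^x[0-9a-f]{4,}$")

def classify_arg(value):
    """Classify a raw argument string as session reference, primitive or code."""
    if SESSION_KEY.match(value):
        # Refers to the value stored by an earlier session
        return ("session", value + "::.val")
    try:
        # Numbers, strings, booleans and similar literals
        return ("primitive", json.loads(value))
    except ValueError:
        # Anything else is left to the evaluator as code
        return ("code", value)
```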
The main API root is located at {server-root}/ocpu. The API root can be reconfigured by changing the location mapping in /etc/httpd/conf.d/opencpu.conf. Therefore, all third-party applications should treat the API root as a configuration parameter.
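A client that keeps the API root configurable could build its URLs along the lines of the sketch below. The function name build_url and the example host are hypothetical; only the default root /ocpu comes from the text above.

```python
def build_url(server_root, segments, api_root="/ocpu"):
    """Join server root, API root and path segments with single slashes."""
    parts = [server_root.rstrip("/"), api_root.strip("/")]
    parts.extend(segment.strip("/") for segment in segments)
    # Skip empty parts so that an empty API root does not create "//"
    return "/".join(part for part in parts if part)

# Default root and a reconfigured root (the second root is a made-up example)
default_url = build_url("https://example.org/", ["lib", "stats", "R"])
custom_url = build_url("https://example.org", ["tmp"], api_root="/r-api/")
```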
The request handling is performed by splitting the URL at each slash. In every function the head element is evaluated, the required actions are performed in the request handler, and the remaining URL is passed to the next function/request handler. The entry point for the complete request handling is the R function httpget(). Based on the first parameter, the following sub APIs/request handlers are called:
Parameter | Request handler | Description |
---|---|---|
lib or library | httpget_library | Checks if the library API is enabled (server.conf) and finds the library path in the filesystem. Afterwards, requests are handled using httpget_package, which implements the R package API |
tmp | httpget_tmp | Checks if the tmp API is enabled (server.conf). The session path is extracted and the session API httpget_session() is called for further request handling |
doc | httpget_doc | Finds the doc directory and path using the R_DOC_DIR variable. The documentation is returned using httpget_file() |
user | httpget_user | Checks if the user in the URL exists as a Linux user on the OS |
apps or github | httpget_apps | Checks if the app is installed, searches for the package path containing the app and forwards this path to the package API for further request handling |
webhook | httpget_webhook | API endpoint that allows the AC-DC admin to register a GitHub repo for CI of an R package. The R package is installed using remotes::install_github. See the webhook API |
test | httpget_testapp | Gets the path to the openCPU test page (included in the www directory of the openCPU R package). The web page is sent to the client using res$sendfile |
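
The head/tail routing described above can be sketched in a few lines. The Python code below is an illustration only: the handler names follow the table, but the functions split_path and dispatch are hypothetical and merely report which handler would receive the remaining path.

```python
# First URL segment mapped to the handler name from the table above.
ROUTES = {
    "lib": "httpget_library",
    "library": "httpget_library",
    "tmp": "httpget_tmp",
    "doc": "httpget_doc",
    "user": "httpget_user",
    "apps": "httpget_apps",
    "github": "httpget_apps",
    "webhook": "httpget_webhook",
    "test": "httpget_testapp",
}

def split_path(path):
    """Split a URL path at slashes and drop empty segments."""
    return [segment for segment in path.split("/") if segment]

def dispatch(path):
    """Return the handler for the head segment and the remaining segments."""
    segments = split_path(path)
    if not segments:
        raise ValueError("empty path")
    head, tail = segments[0], segments[1:]
    if head not in ROUTES:
        raise KeyError("unknown API: " + head)
    return ROUTES[head], tail
```

Each handler would repeat the same step on the tail it receives, which is how a path such as lib/stats/R/rnorm is consumed one segment at a time.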
openCPU apps are static web pages (HTML, CSS, JS) that are included in an R package. They interface with the R functions in this package through the OpenCPU APIs. By convention, these apps are placed in the /inst/www/ directory of the R source package.
The apps API is located at {API-root}/apps/{pkgname}/{parameter}. The API can be enabled/disabled using the enable.api.apps config parameter.
The apps API checks if the app is installed, searches for the package path containing the app, and forwards this path to the package API for further request handling.
The OpenCPU cloud server includes support for continuous integration (CI). Thereby a GitHub repository can be configured to automatically install a package on an OpenCPU server, every time a commit is pushed to the master branch. To take advantage of this feature, it is required that:
The sub API receives the post-commit hook from GitHub. At the moment only repositories hosted by GitHub Inc. are supported (no Stash, etc.). After receiving the hook from GitHub, the package is installed using remotes::install_github(). If enabled, the repo owner is notified about the CI task by e-mail, sent through the SMTP server set in the configuration file.
The package API is used to interact with the R packages installed in the global R package library.
The package library is located at {API-root}/lib/{pkgname}/{parameter}. The API can be enabled/disabled using the enable.api.library config parameter. In the package API, requested packages and their dependencies that are not present in the preload configuration are loaded. Packages are loaded by name from the global R package library. If multiple versions of the same package are installed on the system, the latest (installed) version is loaded.
Based on the request parameter in the request URL, the following request handlers are called:
Parameter | Request handler | Description |
---|---|---|
R | httpget_package_r | Loads the package from the request library and lists the R objects exported by the package. See object API for further information |
data | httpget_package_data | Data included with this package. Datasets are objects, see R object API |
html | httpget_package_html | Manuals (help pages) included in this package. Help files are rendered from Rd to HTML using tools::Rd2HTML |
man | httpget_package_man | Retrieves the help page about a topic in the requested output format. Manuals can be formatted as text, html or pdf |
info | httpget_package_info | Shows information about this package |
A session is a container that holds resources created from a remote function/script call (RPC). In openCPU every HTTP(S) POST request is mapped to a session, containing all data produced and available during the request execution.
The session API is mapped to the following URL: {API-root}/tmp/{sessionID}/{resource}. The API can be enabled/disabled using the enable.api.tmp config parameter. Based on the resource present in the request URL, the following handlers are called:
Parameter | Request handler | Description | Content-Type |
---|---|---|---|
R | httpget_session_r | Reloads the session from .RData. Checks if the specified object is present in the session environment. GET requests are forwarded to the object API for formatting; in case of a POST request the object is evaluated using execute_function() | set in object API |
graphics | httpget_session_graphics | Reloads graphics packages if required. Extracts the graphics from .REval and creates a list of plots, which is passed to the object API for output formatting | set in object API |
files | httpget_session_files | Sanitizes the file path against traversal attacks and passes file_path to the file API for request handling | set in object API |
source | httpget_session_source | Extracts the source output from .REval and creates a list object, which is passed to the object API for output formatting | set in object API |
console | httpget_session_console | Extracts the console output from .REval and creates a list object, which is passed to the object API for output formatting | set in object API |
warnings | httpget_session_warnings | Extracts the warning messages from .REval and creates a list object, which is passed to the object API for output formatting | set in object API |
messages | httpget_session_messages | Extracts the messages from .REval and creates a list object, which is passed to the object API for output formatting | set in object API |
stdout | httpget_session_stdout | Extracts stdout from .REval and creates a list object, which is passed to the object API for output formatting | set in object API |
info | httpget_session_info | Reads the session information from the .RInfo file and passes the sessionInfo object to the object API for output formatting | set in object API |
zip | httpget_session_zip | Creates a zip file (.zip) of the session directory using zip::zip and starts the download of the compressed directory | application/zip |
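
The files resource sanitizes the requested path against traversal attacks. One common way to do this is to normalize the path and verify that it still lies inside the session directory, as in the Python sketch below. The function name sanitize_session_path and the example directories are hypothetical, and the check shown is a generic technique, not necessarily the one used in httpget_session_files.

```python
import posixpath

def sanitize_session_path(session_dir, requested):
    """Resolve `requested` inside `session_dir` or raise ValueError."""
    base = posixpath.normpath(session_dir)
    # normpath collapses "." and ".." segments without touching the disk
    candidate = posixpath.normpath(posixpath.join(base, requested.lstrip("/")))
    if candidate != base and not candidate.startswith(base + "/"):
        raise ValueError("path escapes the session directory: " + requested)
    return candidate
```

This sketch works on path strings only; a production check would also need to consider symbolic links inside the session directory.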
The object API is used to read R objects or call R functions.
The object API is mapped to the following URL: {API-root}/.../{R|data}/{object}/{format}
The object API is always enabled and cannot be disabled.
The object API has two main request handling paths depending on the request method. For GET requests the httpget_object() function is used to route the object to the correct object serializer for the output formatting. The implemented output formats and the serializers used are listed in the section on package output formatting.
If the request uses the HTTP(S) verb POST, the request handler execute_function() is used for parsing the function arguments and constructing the function call. The function call is executed using the handler session_eval(). The evaluation is described in the section below.
A POST request can specify an additional format parameter at the end of the URL, e.g. /json. If the format parameter is present, the object API is called to format the returned object, which is then returned directly to the client. The following format shortcuts are supported: png, svg, pdf, svglite, print, md, bin, csv, feather, json, rda, rds, pb, tab, ndjson, console. If no format parameter is specified, a preview with relative paths to the session API is returned in the response body. The object API checks if the output object can be converted to the specified output format based on the data type of the captured .val object.
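The handling of the optional format shortcut can be sketched as follows. The Python function parse_object_request is hypothetical. It assumes that the URL segments after R or data have already been isolated and contain the object name, optionally followed by a format.

```python
# Format shortcuts as listed in the text above.
FORMATS = {
    "png", "svg", "pdf", "svglite", "print", "md", "bin", "csv",
    "feather", "json", "rda", "rds", "pb", "tab", "ndjson", "console",
}

def parse_object_request(tail):
    """Return (object_name, format); format is None if no shortcut is given."""
    if len(tail) == 1:
        return tail[0], None
    if len(tail) == 2:
        name, fmt = tail
        if fmt not in FORMATS:
            raise ValueError("unsupported format: " + fmt)
        return name, fmt
    raise ValueError("expected an object name and an optional format")
```

A result without a format corresponds to the case where only the preview with session paths is returned.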
The output formatting is implemented in the httpget_object function (object API). Depending on the output format specification, the following serializers are called and the Content-Type header is set accordingly. Whether an object can be converted depends on whether an implementation is available for the specified encoder. Custom data structures can be added to some encoders using R's internal method dispatch.
Format | Request handler | Encoder | Content-type |
---|---|---|---|
bin | httpget_object_bin | base::writeBin | application/octet-stream |
csv | httpget_object_csv | utils::write.csv | text/csv; charset=utf-8 |
feather | httpget_object_feather | feather::write_feather | application/feather |
spss | httpget_object_spss | haven::write_sav | application/spss-sav |
sas | httpget_object_sas | haven::write_sas | application/sas7bdat |
stata | httpget_object_stata | haven::write_dta | application/stata-dta |
tab | httpget_object_tab | utils::write.table | text/plain; charset=utf-8 |
json | httpget_object_json | jsonlite::toJSON | application/json |
ndjson | httpget_object_ndjson | jsonlite::stream_out | application/x-ndjson; charset=utf-8 |
md | httpget_object_md | pander::pander | text/plain |
print | httpget_object_print | print | text/plain |
text | httpget_object_text | cat | mimetype |
ascii | httpget_object_ascii | deparse | text/plain |
rda | httpget_object_rda | save | application/octet-stream |
rds | httpget_object_rds | saveRDS | application/r-rds |
pb | httpget_object_pb | protolite::serialize_pb | application/x-protobuf |
png | httpget_object_png | grDevices::png | image/png |
pdf | httpget_object_pdf | grDevices::pdf | application/pdf |
svg | httpget_object_svg | svg | image/svg+xml |
svglite | httpget_object_svglite | svglite::svglite | image/svg+xml |
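
The pairing of an encoder with a Content-Type header can be illustrated with a small registry. The Python sketch below covers only json and csv and uses standard library encoders in place of the R functions from the table. The names ENCODERS and format_object are hypothetical.

```python
import csv
import io
import json

def encode_json(obj):
    """Serialize any JSON-compatible object."""
    return json.dumps(obj).encode("utf-8")

def encode_csv(rows):
    """Serialize a non-empty list of dicts that share the same keys."""
    if not isinstance(rows, list) or not rows or not all(isinstance(r, dict) for r in rows):
        raise TypeError("csv output requires a non-empty list of dicts")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")

# Format shortcut mapped to (Content-Type, encoder), mirroring the table above.
ENCODERS = {
    "json": ("application/json", encode_json),
    "csv": ("text/csv; charset=utf-8", encode_csv),
}

def format_object(obj, fmt):
    """Return (content_type, body) or raise if the conversion is not possible."""
    if fmt not in ENCODERS:
        raise KeyError("no encoder for format: " + fmt)
    content_type, encoder = ENCODERS[fmt]
    return content_type, encoder(obj)
```

As in the description above, whether a conversion succeeds depends on the encoder: a nested dictionary can be written as json here but is rejected by the csv encoder.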
For the implementation of the serializers, refer to the official package manuals on CRAN.
For every POST request a working directory is created, with the session ID used as the directory name. The session ID is created using the rand_bytes function from the openssl package; the key length in server.conf defines the number of possible sessions.
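The relation between key length and the number of possible sessions can be shown with a short sketch. The Python code below uses the secrets module in place of openssl's rand_bytes. The prefix, the default key length and the interpretation of the key length as a number of hexadecimal characters are assumptions made for illustration.

```python
import secrets

SESSION_PREFIX = "x0"  # assumed prefix, for illustration only

def new_session_id(key_length=13):
    """Return the prefix followed by key_length random hexadecimal characters."""
    # token_hex(n) yields 2 * n hex characters, so truncate to key_length
    return SESSION_PREFIX + secrets.token_hex(key_length)[:key_length]

def session_space(key_length):
    """Number of distinct IDs: 16 possibilities per hexadecimal character."""
    return 16 ** key_length
```

Under these assumptions each additional character multiplies the number of possible session IDs by 16.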
In the request session the function call is constructed. This includes:
The actual evaluation of the function call is performed using the evaluate function from the evaluate package. Compared to eval(), evaluate captures all of the information necessary to recreate the output as if you had copied and pasted the code into an R terminal. It captures messages, warnings, errors and output, all correctly interleaved in the order in which they occurred. It stores the final result, whether or not it should be visible, and the contents of the current graphics device.
evaluate is executed with the stop_on_error = 1 argument. In this case the evaluation runs until an error occurs; execution stops at the point of the error and all results up to that point are returned. With this behavior, error messages can be captured and forwarded to the user.
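The stop-at-first-error behavior can be mimicked in a few lines. The Python sketch below is an analogy, not the evaluate package: it runs statements in order with exec, captures standard output per statement and stops at the first exception while keeping everything recorded so far. The function name evaluate_until_error is hypothetical.

```python
import contextlib
import io

def evaluate_until_error(statements, env=None):
    """Run statements in order, capture their output, stop at the first error."""
    env = {} if env is None else env
    results = []
    for source in statements:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(source, env)
        except Exception as error:
            # Keep the error and any output produced before it, then stop
            results.append({"source": source, "output": buffer.getvalue(), "error": repr(error)})
            break
        results.append({"source": source, "output": buffer.getvalue(), "error": None})
    return results

results = evaluate_until_error(["x = 2", "print(x * 3)", "1 / 0", "print('never')"])
```

In this run the fourth statement is never executed, and the third record carries the error that would be forwarded to the user.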
OpenCPU implements a custom output_handler(), defined in evaluate_input(), that is used during the evaluation. The custom handler saves the return value to .val in the session environment, and the error object is saved to the global namespace.
For further information regarding the evaluation of a function call, see the official manual on CRAN.
The user API is mapped to the following URL: {API-root}/user/{userid}/lib/*. The API can be enabled/disabled using the enable.api.user config parameter.
The user API is available for all Unix users created on the instance running openCPU and listed in /etc/passwd. OpenCPU reads this file for user validation.
After the user has been validated against the Unix password file, the requested library is loaded from the user library path instead of the global library. Package dependencies might still be loaded from the global library.
The remaining URL is passed to the R package API for further request handling.
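Validating a user name against the password file amounts to comparing it with the first colon-separated field of each entry. The Python sketch below works on a made-up sample in passwd format instead of reading /etc/passwd; the function names known_users and is_valid_user are hypothetical.

```python
# Made-up sample in the colon-separated passwd format.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
alice:x:1001:1001:Alice:/home/alice:/bin/bash
"""

def known_users(passwd_text):
    """Collect the login names (first field) of all entries."""
    users = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        users.add(line.split(":", 1)[0])
    return users

def is_valid_user(userid, passwd_text):
    """Return True if userid appears as a login name."""
    return userid in known_users(passwd_text)
```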
The R file API behaves differently depending on the request method (HTTP verb). For GET requests, the file is sent to the client using the res$sendfile() method. All POST requests to the R file API are handled in execute_file(). Based on the format specification in the request URL, the correct handler for building the function call is executed. The handlers construct the function call and pass it to the session_eval() function, which executes the call and creates the document. The following document formats are supported:
Format | Request handler | Interpreter | Type |
---|---|---|---|
file.r | httppost_rscript | evaluate::evaluate | R script |
file.rnw | httppost_knittex | knitr::knit, tools::texi2pdf | knitr/sweave |
file.rmd | httppost_knitpandoc | knitr::knit, knitr::pandoc | knitr/markdown |
file.rmd | httppost_knit | knitr::knit | knitr |
file.brew | httppost_brew | brew::brew | brew |
file.md | httppost_pandoc | knitr::pandoc | markdown |
file.tex | httppost_latex | tools::texi2pdf | latex |
The only qualified and tested instance of openCPU runs on RHEL as the OS. To use its full capabilities, openCPU depends on R packages and some Linux utilities. The following sections list the openCPU dependencies:
The openCPU package imports the following R packages:
Utilities
tools::Rd2txt(), tools::Rd2HTML(), tools::Rd2latex()
Suggest (needed for some output types)