Result-as-a-Service (RaaS): Persistent Helper Functions in a Serverless Offering

Serverless Computing or Functions-as-a-Service (FaaS) is an execution model for cloud computing environments where the cloud provider executes a piece of code (a function) by dynamically allocating resources. When a function has not been executed for a long time or is being executed for the first time, a new container has to be created and the execution environment has to be initialized, resulting in a cold start. A cold start can result in higher latency. We propose a new computing and execution model for cloud environments called Result-as-a-Service (RaaS), which aims to reduce the computational cost and overhead while achieving high availability. In between successive calls to a function, a persistent function can assist by precomputing the function's results for different possible arguments and then returning a stored result when a matching function call is found.


Introduction
Serverless Computing or Functions-as-a-Service (FaaS) is an execution model for cloud computing environments where the cloud provider executes a piece of code (a function) by dynamically allocating resources [1] [2]. In the serverless computing model, the code is structured into functions. The functions are triggered by events such as an HTTP request to an API gateway, a record written to a database, a new file uploaded to cloud storage, a new message inserted into a messaging queue, a monitoring alert, or a scheduled event. When a function is triggered by an event, the cloud provider launches a container and executes the function within it. Some important concepts related to serverless computing are described as follows:
• Push and Pull Models of Invocation: Functions in a serverless offering are invoked by event sources, which can be a cloud service or a custom application that publishes events. The event-based invocation has two modes: push and pull.
• Concurrent Execution: Concurrent execution refers to the number of executions of a function that are happening at the same time. Cloud providers set limits on concurrent executions.
• Execution Duration: Cloud providers set a timeout limit within which a function execution must complete. If the function takes longer to execute than the timeout limit, the execution is terminated.
• Container Reuse: Cloud providers typically use containers for executing the functions in their serverless offerings. A container helps in isolating the execution of a function from other functions. When a function is invoked for the first time (or after a long time), a container is created, the execution environment is initialized, and the function code is loaded. The container is reused for subsequent invocations of the same function that happen within a certain period.
• Cold and Warm Functions: When a function has not been executed for a long time or is being executed for the first time, a new container has to be created and the execution environment has to be initialized. This is called a cold start. A cold start can result in higher latency because a new container has to be initialized. The cloud provider may reuse the container for subsequent invocations of the same function within a short period. In this case, the function is said to be warm and takes much less time to execute than on a cold start.
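The cold/warm distinction above can be illustrated with a minimal sketch of a Lambda-style handler. The handler name and the simulated initialization step are hypothetical; the point is that module-level state survives across invocations within the same warm container and is re-created only on a cold start.

```python
import time

# Module-level state survives across invocations within the same
# (warm) container; it is re-created only on a cold start.
_model_cache = {}

def _load_model():
    """Simulate an expensive initialization step (hypothetical)."""
    time.sleep(0.05)  # stands in for loading libraries, models, etc.
    return {"ready": True}

def handler(event, context=None):
    start = time.perf_counter()
    if "model" not in _model_cache:       # cold path: initialize
        _model_cache["model"] = _load_model()
    latency = time.perf_counter() - start
    return {"cold": latency > 0.01, "latency": latency}
```

Invoked twice in the same process, the first call pays the initialization cost (cold) and the second returns almost immediately (warm), mirroring the latency gap described above.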
The key contributions of this work are: 1) a new computing and execution model for cloud environments called Result-as-a-Service (RaaS), proposed over FaaS, which aims to reduce the computational cost and overhead while achieving high availability; 2) an approach for optimizing FaaS offerings by introducing a library of "persistent helper functions"; 3) an analytical model and an algorithm for maximizing performance in a serverless offering; and 4) an implementation case study using persistent helper functions.
In [14], Azari and Koc have presented an approach for partitioning tasks between hardware and software to improve performance. We have adapted this approach for modeling speedup from using persistent helper functions in a RaaS offering.

Proposed Approach
We propose a method for optimizing FaaS offerings by introducing a library of persistent helper functions that are not billed like the functions in a FaaS. The persistent helper functions can "turbo-boost" the execution by prefetching data and precomputing logic. In between successive calls to a function, a persistent function can assist by precomputing the outcomes for different possible arguments and then returning a stored result when a matching function call is found. This makes function calls faster and also reduces load, since the cloud provider can share common precomputed values across millions of calls. Different third parties can compete to provide helper functions that different retail users can leverage, thus creating a Persistent Functions marketplace, much like an "app store" [15].
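The precompute-then-match behavior described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the `precompute` hook, and the stand-in function are all hypothetical.

```python
# Minimal sketch of a persistent helper: between calls it precomputes
# results for likely arguments; a matching call is then served from
# the precomputed table instead of recomputing.

class PersistentHelper:
    def __init__(self, fn):
        self.fn = fn
        self.results = {}          # precomputed / previously seen results
        self.calls = self.hits = 0

    def precompute(self, candidate_args):
        """Run during idle time between function invocations."""
        for args in candidate_args:
            if args not in self.results:
                self.results[args] = self.fn(*args)

    def call(self, *args):
        self.calls += 1
        if args in self.results:   # matching precomputed argument
            self.hits += 1
            return self.results[args]
        result = self.fn(*args)    # fall back to normal execution
        self.results[args] = result
        return result

def expensive_square(x):           # stand-in for a costly function
    return x * x

helper = PersistentHelper(expensive_square)
helper.precompute([(2,), (3,)])
```

A call with a precomputed argument is a hit and returns immediately; an unseen argument falls back to normal execution and is stored for future calls.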
There are two reasons why RaaS is favored over FaaS. Firstly, as a consequence of cost savings when scaling, the proposed pricing model is detached from the computation demanded by each on-demand request and is likely to be much lower, because users are charged for the shared service rather than for individual functions with the same purpose. Secondly, we demonstrate that round-trip latency is significantly reduced after precomputation of the expected values, thereby achieving high availability on request. The new model aims to meet the requirements of low-latency applications such as smart metering, smart cities, autonomous vehicles, and wearable devices, among others, and to reduce the cost of compute-intensive tasks.
An app store of persistent helper functions from third parties and cloud providers can help accelerate and optimize the use of serverless applications in the cloud context. Sophisticated identification, linkage, and lifecycle licensing modules allow applications and helper functions to be scaled, priced competitively, and also allow privacy through authentication and encryption.

RaaS: FaaS Offering with Persistent Helpers
In this section, we present a new computing and execution model for cloud environments called Result-as-a-Service (RaaS). RaaS is an enhancement over FaaS, as it reduces the computational cost and overhead while achieving high availability through the use of persistent helper functions. Figure 1 shows the architecture of a RaaS offering. The components in the RaaS architecture are as follows:
• Load Balancer: The load balancer routes events/requests to servers, which invoke the functions executed within containers running on those servers. If a server already has a hot container running for a function, the request is routed to that server.
• RaaS Server: Figure 2 shows the architecture of a RaaS server, which executes functions within containers and returns responses to clients. Functions are invoked by event sources. The event-based invocation has two modes: push and pull. The server also handles CRUD (create, read, update and delete) operations for setting up functions. When a server runs a function for the first time, it caches the function image and starts a hot container. If a container is already running, the server routes the function call to the running container. The response from a function execution is then sent back to the load balancer. The server maintains a pool of containers for persistent helper functions.
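The routing rule for hot containers can be sketched as below. This is a toy model under assumed names (`Server`, `LoadBalancer`): it only captures the preference for a server that already has a warm container, with a naive first-server fallback for cold placement.

```python
# Sketch of the load-balancer routing rule: if some server already
# has a warm container for the function, route there; otherwise pick
# a server and incur a cold start.

class Server:
    def __init__(self, name):
        self.name = name
        self.warm = set()          # functions with a running container

    def invoke(self, fn_name):
        cold = fn_name not in self.warm
        self.warm.add(fn_name)     # container stays warm afterwards
        return {"server": self.name, "cold_start": cold}

class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers

    def route(self, fn_name):
        for s in self.servers:     # prefer a server with a warm container
            if fn_name in s.warm:
                return s.invoke(fn_name)
        return self.servers[0].invoke(fn_name)  # naive placement policy

lb = LoadBalancer([Server("s1"), Server("s2")])
```

Routing the same function twice sends the second request to the server that already holds the warm container, avoiding a second cold start.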

Features of Persistent Helper Functions
• Stateful: A key differentiating factor of persistent helper functions from existing FaaS offerings is that persistent helper functions can be stateful, whereas FaaS functions are stateless and any state information has to be separately maintained in a state database.
• Configuration and Customization: The persistent helper functions can be configured or customized for use in different functions.
• Third Party Libraries: The persistent helper functions may use a third-party library or may be developed by the user.

A function in a serverless offering is represented as a Control Data Flow Graph (CDFG), as shown in Figure 3. There are two types of nodes in a CDFG: data flow nodes and decision nodes. A data flow node is a piece of code that has a single entry point, a single exit point, and no condition, whereas a decision node is a piece of code that has at least one condition. Nodes can be persisted, and the profit value determines the benefit from persistence in memory or a database. For each node in the CDFG, the actual execution time (Ti') and the execution time of a persisted version (Ti) are determined. The profit value for each node is the difference (Ti' − Ti).
We present an algorithm to partition the portions of a function (the nodes in its CDFG representation) into two sets: persisted and not persisted. The goal of the algorithm is to maximize performance by using persistent helper functions, given constraints such as the memory used, the database read/write capacity used, or the database size. The speedup from using persistent helpers can then be computed from the profit values of the persisted nodes.
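A minimal sketch of this partitioning step, adapted from the hardware/software partitioning idea in [14], is shown below. The greedy profit-per-memory heuristic and the illustrative node values are our assumptions, not the paper's exact algorithm; the speedup is computed as total actual time over total time with the chosen nodes persisted.

```python
# Greedily persist the CDFG nodes with the best profit-per-memory
# ratio until the memory budget is exhausted, then compute speedup.

def partition(nodes, mem_budget):
    """Split CDFG nodes into (persisted, not_persisted) sets."""
    persisted, not_persisted = [], []
    used = 0
    # Best profit per unit of memory first (greedy knapsack heuristic).
    ranked = sorted(nodes,
                    key=lambda n: (n["t_actual"] - n["t_persisted"]) / max(n["mem"], 1),
                    reverse=True)
    for n in ranked:
        profit = n["t_actual"] - n["t_persisted"]       # Ti' - Ti
        if profit > 0 and used + n["mem"] <= mem_budget:
            persisted.append(n)
            used += n["mem"]
        else:
            not_persisted.append(n)
    return persisted, not_persisted

def speedup(persisted, not_persisted):
    """Total actual time over total time with chosen nodes persisted."""
    t_before = sum(n["t_actual"] for n in persisted + not_persisted)
    t_after = (sum(n["t_persisted"] for n in persisted) +
               sum(n["t_actual"] for n in not_persisted))
    return t_before / t_after

nodes = [  # illustrative Ti' (actual), Ti (persisted), memory costs
    {"name": "parse",     "t_actual": 12.0, "t_persisted": 12.0, "mem": 0},
    {"name": "sentiment", "t_actual": 80.0, "t_persisted": 5.0,  "mem": 64},
    {"name": "store",     "t_actual": 20.0, "t_persisted": 8.0,  "mem": 32},
]
```

With a 64-unit memory budget, only the highest-ratio node ("sentiment") is persisted, and the speedup follows directly from the profit it contributes.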

Implementation Case Study
To evaluate the proposed approach, we developed a reference application for sentiment analysis of social media posts such as tweets from Twitter as shown in Figure 4. A custom listener component fetches tweets using the Twitter API and posts the tweets to an API gateway endpoint which triggers a function in a serverless offering to compute sentiment of each tweet. The computed sentiments are stored in a database. A web application presents the sentiment analysis results.
Different approaches can be used to compute the sentiment of tweets, such as a sentiment analysis function that uses a sentiment lexicon, a third-party library such as Python TextBlob, or a web-based NLP service such as AWS Comprehend. In the FaaS version of the function, where no persistence is used, one of the above three approaches is used to analyze each tweet. In the RaaS version, a persistent helper service is set up, which stores the computed sentiments in memory or a database, and the function that processes the tweets uses this service. Whenever the function requests the persistent helper to compute sentiment, the helper service checks if the tweet has been evaluated before. If the sentiment is not found in memory or the database, it is computed and stored. If the sentiment has previously been computed, the stored result is returned, thus saving time by avoiding a redundant repeated computation.
We developed and deployed a series of functions on AWS Lambda and tested two different conditions: 1) single tweet text and 2) random tweet text. The functions ran in both cold and warm states, each with different memory sizes. We used AWS Comprehend to analyze the text and derive the sentiment, and used AWS API Gateway as the RESTful API to handle incoming GET requests from the client.

Experimental Results
To evaluate the performance of the RaaS approach over FaaS, we measured the run times of the functions in the RaaS and FaaS versions of the reference application shown in Figure 4.
For the FaaS version, we used a Lambda function set up in the AWS Lambda service, which computes sentiments using the AWS Comprehend service. For the RaaS version, we used a Lambda function set up in the AWS Lambda service along with a persistent helper service that computes and stores sentiments in memory.
We evaluated the cold run and warm run performance of the functions in the FaaS and RaaS versions. The cold runs measure the behavior of functions when provisioned for the very first time. We took a number of measurements of function run times by varying the container memory size. Figure 5 shows the cold and warm run performance for the FaaS and RaaS versions. For reference, we also show the predicted performance with persistence, which is estimated using the model described in Section 3.3. As seen from the cold and warm run charts, the predicted performance closely matches the actual performance. Figure 6 shows the results for an alternative implementation of the persistent helper service that computes and stores sentiments in a NoSQL database instead of memory. For the single tweet text condition, we extracted a single tweet from a training dataset that contains 5000 tweets. Our first test consisted of sentiment analysis on the text without persisting the data and was performed in both cold and warm states. For the second test, the text was persisted in the database for both cold and warm states. In the random test condition, we randomly sampled tweets from the training dataset and performed sentiment analysis in both states.
In both the cold and warm run experiments (with persistence in memory and in database), we observed that the average run time improves by increasing the memory allocated. This happens because the CPU capacity allocated to containers executing the functions also increases as the memory allocated is increased.
AWS Lambda states that every time memory is doubled, the CPU capacity is also doubled. Further, we observed that the RaaS approach (with persistence) outperforms the FaaS approach (no persistence).

Conclusion and Future Work
We presented an approach for optimizing FaaS offerings by introducing persistent helper functions, which can boost execution by prefetching data and precomputing logic. Future work will focus on extending an open-source serverless offering such as OpenFaaS to support persistent helper functions, and on creating a dashboard that displays the status of persistent helper functions instantiated by the user, along with their cost, other runtime expenses, and workload utilization.