Can Open-Source Serverless Be Simpler than Lambda?

Yaron Haviv | April 10, 2018

While browsing the CNCF Serverless Slack channel recently, I noticed a message from someone who needed help writing a function that processes S3 update events. He didn’t want to use AWS Lambda and was instead looking for an open source serverless solution over Kubernetes. I took on the challenge and wrote, in response, a function for nuclio, the open source high-performance serverless event and data processing platform. It was simpler than you would imagine.

There is a constant debate around whether to use Serverless (managed cloud) or FaaS (open source functions-as-a-service). Serverless platforms are simpler, fully orchestrated and cost less (you pay only per invocation). Why would anybody ever want to use open-source Serverless/FaaS? Probably when reality hits hard…

I attended the recent Serverlessconf in Paris where serverless warriors presented real-world use-cases. One of them described a scenario in which the function needed more execution time than AWS Lambda permits and was too slow due to lack of concurrency. She had to break the code into smaller tasks, use S3 to store intermediate state, SQS for intermediate messages and somehow make it work.

As you may have guessed, it took far more time and money than she anticipated, and instead of a single function call she ended up with 65,000 calls! At a certain point the workflow accidentally took down the company’s entire service.

Open-source serverless is not always as integrated as cloud provider services, but it gives you far more choices: setting your own parameters, choosing which data or API gateways you want to use, debugging locally and avoiding cold starts if you want. It’s also faster.

If you’re into microservices, you probably already have a Kubernetes cluster up and running, either one you’ve installed or a managed Kubernetes service from Google, Azure or AWS. Deploying an open source Serverless platform like nuclio, OpenFaaS or Kubeless takes a few extra commands, but it’s only done once, uses all the underlying Kubernetes services and is portable. The process is not as hard as some cloud providers want you to think. Several managed Kubernetes services offer billing per function container execution time, so even the argument that “managed” Serverless, unlike FaaS, is “pay-per-use” might soon be gone.

Test Case: Writing and Deploying a Real Function

I will prove my point with a common function use-case: watching a native cloud provider service (AWS S3) and acting every time there is an update. The function is written in Go and uses the nuclio serverless platform over Kubernetes.

If you haven’t heard about nuclio you can read more in a previous post. In a nutshell, nuclio is an extremely fast serverless platform with a real-time processing engine (100x faster than AWS Lambda), which supports a large variety of native event sources like HTTP, Cron and mainstream messaging or streaming options (Kafka, AWS Kinesis, Azure Event-hub, RabbitMQ, NATS, MQTT, Google Pub/Sub, iguazio v3io). nuclio has many unique features like versioning, project hierarchy, integrated structured logging/debugging, stream processing, UI and more.

nuclio is progressing rapidly and now supports all major languages (Python, Go, Java, Node.js, C#/.NET, Shell, Binary). To set up nuclio locally on Docker (standalone), on Kubernetes or over a managed Kubernetes offering, check out the instructions on nuclio’s GitHub page.

How Does My Function Work?

Working with AWS, when you place an object in S3 you can ask to be notified via Amazon SQS (Simple Queue Service), SNS (Simple Notification Service) or directly trigger a Lambda Function. I chose SNS since it can simply trigger an HTTP request to a pre-registered HTTP end-point.

When browsing the internet, I quickly found an example SNS and S3 update message and a Go library with headers for all Amazon event structures. You can read the AWS docs or follow this simpler blog post, which explains how to set up the S3 bucket to generate notifications and how to set up the SNS service to forward them to an email/HTTP destination (just replace the email destination in the example with your HTTP endpoint). These steps must be done in any case, even when you use Amazon’s native Lambda service.

When watching the function debug log, I quickly discovered that the first message SNS sends is a subscription confirmation message, so I added a way to handle it in the function (responding to it with an HTTP GET request). See the example function below and the full source with documentation at this link.

The function starts by un-marshaling the SNS JSON message (the event body). If it’s a “SubscriptionConfirmation” message, it responds with a confirmation and returns. Otherwise, it un-marshals the S3 update message (found encoded inside the SNS message) and prints the S3 object details such as bucket name, object key and object size.
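To make the two-level decoding concrete, here is a minimal sketch that uses only Go’s standard library instead of the event library mentioned above. The struct definitions cover just the fields used here, and `parseS3FromSNS` and `sampleBody` are hypothetical names with made-up sample values, so treat it as an illustration of the nesting rather than the function’s actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snsMessage is the outer envelope that SNS posts to an HTTP endpoint.
// Only the fields needed for this sketch are declared.
type snsMessage struct {
	Type     string `json:"Type"`
	TopicArn string `json:"TopicArn"`
	Message  string `json:"Message"` // for S3 notifications: the S3 event, JSON-encoded as a string
}

// s3Event covers only the parts of an S3 notification used here.
type s3Event struct {
	Records []struct {
		EventName string `json:"eventName"`
		S3        struct {
			Bucket struct {
				Name string `json:"name"`
			} `json:"bucket"`
			Object struct {
				Key  string `json:"key"`
				Size int64  `json:"size"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// parseS3FromSNS (hypothetical helper) decodes the envelope first,
// then the S3 event embedded in its Message field.
func parseS3FromSNS(body []byte) (*s3Event, error) {
	var outer snsMessage
	if err := json.Unmarshal(body, &outer); err != nil {
		return nil, fmt.Errorf("decode SNS envelope: %w", err)
	}
	if outer.Type != "Notification" {
		return nil, fmt.Errorf("unexpected SNS message type %q", outer.Type)
	}
	var inner s3Event
	if err := json.Unmarshal([]byte(outer.Message), &inner); err != nil {
		return nil, fmt.Errorf("decode S3 event: %w", err)
	}
	if len(inner.Records) == 0 {
		return nil, fmt.Errorf("S3 event has no records")
	}
	return &inner, nil
}

// sampleBody is a trimmed, made-up notification used only for this sketch.
const sampleBody = `{
  "Type": "Notification",
  "TopicArn": "arn:aws:sns:us-east-1:123456789012:demo",
  "Message": "{\"Records\":[{\"eventName\":\"ObjectCreated:Put\",\"s3\":{\"bucket\":{\"name\":\"my-bucket\"},\"object\":{\"key\":\"docs/report.txt\",\"size\":1024}}}]}"
}`

func main() {
	evt, err := parseS3FromSNS([]byte(sampleBody))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	rec := evt.Records[0]
	fmt.Println(rec.EventName, rec.S3.Bucket.Name, rec.S3.Object.Key, rec.S3.Object.Size)
}
```

The point to notice is that `Message` arrives as a string, so the S3 event needs a second `json.Unmarshal` call rather than being decoded as a nested object.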

You’re probably asking yourself how I created a deployment package/zip with all the external packages and dependencies (like you do in Lambda). nuclio is smart enough to notice your dependencies and downloads them automatically for you during its function build process. If you want special packaging instructions (like adding custom files), specify them in the function spec using standard Dockerfile/Linux commands.

// Needs encoding/json and net/http from the standard library, the nuclio Go SDK,
// and the snsevt/s3evt event-structure packages (see the full source for import paths).
func Handler(context *nuclio.Context, event nuclio.Event) (interface{}, error) {

	// Get the body; assume it is the right HTTP POST event (error checking can be added)
	body := event.GetBody()

	// Non-intrusive structured debug log (emitted only if the level is set to debug)
	context.Logger.DebugWith("Process document", "body", string(body))

	// Unmarshal the SNS message (the event body)
	snsEvent := snsevt.Record{}
	err := json.Unmarshal(body, &snsEvent)
	if err != nil {
		return "", err
	}

	context.Logger.InfoWith("Got SNS Event", "type", snsEvent.Type)

	if snsEvent.Type == "SubscriptionConfirmation" {

		// The subscription must be confirmed the first time
		context.Logger.DebugWith("Handle Subscription Confirmation",
			"TopicArn", snsEvent.TopicARN,
			"Message", snsEvent.Message)

		resp, err := http.Get(snsEvent.SubscribeURL)
		if err != nil {
			context.Logger.ErrorWith("Failed to confirm SNS Subscription", "err", err)
			return "", err
		}
		// Close the response body so the connection can be reused
		resp.Body.Close()

		return "", nil
	}

	// Unmarshal the S3 event; validations can be added, e.g. check that snsEvent.TopicARN is the right topic
	myEvent := s3evt.Event{}
	err = json.Unmarshal([]byte(snsEvent.Message), &myEvent)
	if err != nil {
		return "", err
	}

	// Guard against an event without records before indexing into it
	if len(myEvent.Records) == 0 {
		context.Logger.WarnWith("S3 event has no records", "message", snsEvent.Message)
		return "", nil
	}

	// Log the details of the S3 update
	record := myEvent.Records[0].S3
	context.Logger.InfoWith("S3 Details", "bucket", record.Bucket.Name,
		"key", record.Object.Key, "size", record.Object.Size)

	// handle your S3 event here
	// ...

	return "", nil
}

Notice the use of nuclio’s built-in structured, multi-level logging (context.Logger…). It made it simple to debug my function or observe it in production in a non-intrusive way (the log verbosity level can be set at runtime).
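If you have not used leveled, structured logging before, the sketch below shows the idea with Go’s standard `log/slog` package (Go 1.21 or later is assumed). It is a stand-in for illustration, not nuclio’s logger: `newLogger` and `logDemo` are hypothetical names, and the messages simply mirror the ones in the function above.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// newLogger returns a JSON logger that writes to buf and whose minimum
// level can be changed while the program is running.
func newLogger(buf *bytes.Buffer, level *slog.LevelVar) *slog.Logger {
	return slog.New(slog.NewJSONHandler(buf, &slog.HandlerOptions{Level: level}))
}

// logDemo returns the number of log lines written before and after
// the level is lowered to debug.
func logDemo() (before, after int) {
	var buf bytes.Buffer
	level := new(slog.LevelVar) // the zero value means Info
	logger := newLogger(&buf, level)

	logger.Debug("Process document", "body", "{}") // dropped while the level is Info
	logger.Info("Got SNS Event", "type", "Notification")
	before = strings.Count(buf.String(), "\n")

	level.Set(slog.LevelDebug) // switch verbosity at runtime, no rebuild needed
	logger.Debug("Process document", "body", "{}")
	after = strings.Count(buf.String(), "\n")
	return before, after
}

func main() {
	before, after := logDemo()
	fmt.Println("lines before:", before, "lines after:", after)
}
```

The debug call costs almost nothing while the level is Info, which is what makes it reasonable to leave such calls in production code.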

Since I’m lazy and love auto-completion as I type and built-in debuggers, I wrote my function in JetBrains’ GoLand IDE and tested it locally (on my laptop) with the nuclio testing package against a sample S3/SNS message.

Here is the nuclio function unit testing code:

func TestS3Watch(t *testing.T) {
	// Initialize a test context (verbose = true)
	tc, err := nutest.NewTestContext(Handler, true, nil)
	if err != nil {
		t.Fatal(err)
	}

	// Create a test event (eventString is a simulated event JSON)
	testEvent := nutest.TestEvent{Path: "", Body: []byte(eventString)}

	// Invoke the tested function and fail the test if it returns an error
	resp, err := tc.Invoke(&testEvent)
	tc.Logger.InfoWith("Run complete", "resp", resp, "err", err)
	if err != nil {
		t.Fatal(err)
	}
}

For the function to listen on a public URL and intercept AWS messages, I had to configure an API gateway which forwards HTTP requests to my function based on a path prefix. With nuclio’s UI, the process is simple (I mean, have you tried setting up an AWS API Gateway?). Go to the triggers tab and add an HTTP trigger with the required path and preferences; it will automatically provision a Kubernetes Ingress (API Gateway) service for you.
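To get a feel for what path-based routing does, here is a small sketch that uses only Go’s standard library. It is not how nuclio or a Kubernetes Ingress is implemented; `newRouter` and `route` are hypothetical names, and the path /mys3hook is the one used later in this post. For simplicity the sketch matches the exact path rather than a prefix.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newRouter registers a single path, standing in for the gateway rule
// that forwards matching requests to the function.
func newRouter() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/mys3hook", func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Fprintf(w, "function received %d bytes", len(body))
	})
	return mux
}

// route sends an in-memory POST through the router and returns the
// status code and response body; no network listener is involved.
func route(path, body string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, path, strings.NewReader(body))
	rec := httptest.NewRecorder()
	newRouter().ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, resp := route("/mys3hook", "{}")
	fmt.Println(code, resp)

	code, _ = route("/other", "{}")
	fmt.Println(code)
}
```

Requests to the registered path reach the handler, while any other path is answered with 404 by the router itself, so the function never sees unrelated traffic.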

I wanted to send a fully working code example, so I used nuclio configure decorations. They let you set any function spec attribute (environment variables, build dependencies, trigger configurations, etc.) through inline code comments (see below). I used them to specify the API gateway configuration (path = /mys3hook), so you can now take this code as is and run it anywhere without any manual configuration. Just deploy it via the UI, CLI or Web API.

// @nuclio.configure
//
// function.yaml:
//   spec:
//     triggers:
//       myHttpTrigger:
//         maxWorkers: 4
//         kind: "http"
//         attributes:
//           ingresses:
//             http:
//               paths:
//               - "/mys3hook"

This function is now one of the built-in nuclio playground (UI) examples. It can be coupled with other cool object processing function examples found in the playground like text file pattern searches, image thumbnail generation, face recognition or sentiment analysis.

Serverless’ greatest advantage is automating your development workflow while ignoring the underlying (server) infrastructure, and it’s not limited to cloud provider offerings!

In many cases open-source serverless alternatives deliver more value:

  • Provide better control and customization options,
  • Can be deployed on-prem, on your laptop or in the cloud of your choice,
  • Are not limited to a specific cloud provider’s APIs and trigger options,
  • Deliver faster performance,
  • Are even simpler.

(This post by Yaron Haviv was initially published on The New Stack).