Serverless: Background, Challenges and Future
Yaron Haviv | January 16, 2017
Serverless computing is the latest buzz, driven by the digital economy’s demand for instant results without the hassle. The serverless concept is a prepackaged flavor of modern cloud-native architecture, which decomposes applications into multiple stateless and elastic micro-service tiers. Micro-services shrink or grow to satisfy demand, are restarted in case of a resource failure, and change versions without breaking or taking down the entire app.
The cloud-native approach is not entirely new and has roots in SOA and grid computing. It evolved to use containers for workload isolation. Cloud-native is used across hyper-scale applications and SaaS offerings (Google, Facebook, Netflix, eBay, etc.). We at iguazio have designed our platform as a high-performance cloud-native application, because it’s the only way to run continuous applications at scale.
A fundamental requirement for achieving this state of nirvana is keeping micro-services stateless and immutable. This is tough, or at least it was until our current age of “serverlessness.” Before using serverless for everything, however, beware of the limitations and inefficiencies of first-generation serverless solutions.
Why is it Tough…? I Can Bring Up My Docker Image in a Snap!
Right, I love Docker! Docker makes it very easy to package and run a micro-service, removing the hassle of installation and dependency management. But a commercial cloud-native app requires setting up quite a bit of infrastructure. If the micro-service is stateless, we need to store its state somewhere, and state comes in many forms (configuration, data, logs, etc.). Apps require setting up different clustered data services for the various application patterns, assuming you benchmarked and managed to agree on a data service to begin with. On top of that we need end-to-end security, application gateways, load balancing, monitoring tools, build and debug tools, etc. All this means we end up with a large cluster maintained by an army of DevOps engineers.
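To make the stateless/stateful split concrete, here is a minimal Python sketch. It assumes a hypothetical `KeyValueStore` class standing in for whatever external data service (cache, NoSQL, etc.) you settle on; `StatelessCounterService` is likewise an illustrative name. The service keeps nothing in memory between calls, so any replica can serve any request.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical stand-in for an external data service (cache, NoSQL, ...)."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


class StatelessCounterService:
    """A hypothetical micro-service instance that holds no state of its own."""

    def __init__(self, store: KeyValueStore) -> None:
        self.store = store  # all state lives in the shared external store

    def handle(self, user_id: str) -> int:
        # Read, update and write back the per-user counter; nothing is kept locally,
        # so this instance can be killed, restarted or replaced at any time.
        count = (self.store.get(user_id) or 0) + 1
        self.store.put(user_id, count)
        return count


store = KeyValueStore()
replica_a = StatelessCounterService(store)
replica_b = StatelessCounterService(store)
replica_a.handle("alice")
print(replica_b.handle("alice"))  # 2: a different replica sees the same state
```

A real data service would also need an atomic increment (or similar) so that two replicas racing on the same key do not lose updates; the in-process dict here ignores that concern.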
Working in the cloud (or using our platform) does make things simpler, since you consume services through APIs, as opposed to deploying infrastructure/VMs. Cloud also offers a common management and security policy which can shorten time to production.
Welcome to the Serverless World
So, Amazon Web Services offers all the components mentioned above, including object storage, caching, NoSQL, streaming, Pub/Sub, etc., along with end-to-end security (IAM) and API gateways, and it’s all API-driven. Amazon built the entire cloud-native infrastructure so that customers don’t have to build it themselves; all we need to do is write the code (and pay). Awesome!
AWS Lambda modeled three main use cases and event sources:
- Synchronous: service is invoked and provides an immediate response (e.g. HTTP request)
- Asynchronous: push a message that will drive an action later (e.g. an email or an S3 change notification)
- Streaming: a continuous data flow that needs to be processed (e.g. from Kinesis or DynamoDB)
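The three invocation styles can be sketched in plain Python as below. This is a simplified local model, not the Lambda API: `handler` is a hypothetical function, the asynchronous case uses a `queue.Queue` to stand in for a message service, and the streaming case feeds the same handler batches of records, roughly the way Kinesis or DynamoDB Streams deliver them.

```python
import queue
from typing import Any, Dict, List

Event = Dict[str, Any]


def handler(event: Event) -> Event:
    """Hypothetical function: uppercases a text field."""
    return {"result": event["text"].upper()}


# Synchronous: the caller waits for the response (e.g. an HTTP request).
def invoke_sync(event: Event) -> Event:
    return handler(event)


# Asynchronous: the caller only enqueues a message; it is processed later.
def invoke_async(q: "queue.Queue[Event]", event: Event) -> None:
    q.put(event)


def drain(q: "queue.Queue[Event]") -> List[Event]:
    # A worker later pulls the pending messages and runs the function on each.
    results = []
    while not q.empty():
        results.append(handler(q.get()))
    return results


# Streaming: the function is fed a continuous flow of records in batches.
def process_stream(batches: List[List[Event]]) -> List[Event]:
    return [handler(record) for batch in batches for record in batch]
```

The function body is identical in all three cases; only the way events reach it (and whether anyone waits for the result) changes.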
You write a function that intercepts the request or message, processes it and generates a response. Every invocation is metered, billed and logged. Functions are immutable: changing the code generates a new version, which can be deployed automatically, and you can roll back to an earlier version in case of a bug.
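Immutable versioning with rollback can be modeled with a tiny registry. The sketch below is a hypothetical illustration, not Lambda’s API: `publish` freezes the code as a new numbered version, an alias such as `"live"` points to one version, and rolling back simply moves the alias.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class FunctionRegistry:
    """Hypothetical registry of immutable function versions."""

    def __init__(self) -> None:
        self.versions: Dict[int, Handler] = {}
        self.aliases: Dict[str, int] = {}

    def publish(self, code: Handler) -> int:
        # Every code change becomes a new version that is never modified afterwards.
        version = len(self.versions) + 1
        self.versions[version] = code
        return version

    def point(self, alias: str, version: int) -> None:
        # Deploying (or rolling back) only changes which version an alias refers to.
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        self.aliases[alias] = version

    def invoke(self, alias: str, event: dict) -> dict:
        return self.versions[self.aliases[alias]](event)


registry = FunctionRegistry()
v1 = registry.publish(lambda e: {"total": e["a"] + e["b"]})
v2 = registry.publish(lambda e: {"total": e["a"] - e["b"]})  # buggy release
registry.point("live", v2)
registry.point("live", v1)  # roll back: only the alias moves; v2 is kept intact
```

Because old versions are never overwritten, a rollback is a pointer change rather than a redeployment.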
Amazon recently announced a new workflow tool (Step Functions) that can provision, orchestrate and monitor a complete application composed of multiple connected functions.
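A workflow service chains functions so that the output of one step becomes the input of the next, while recording each transition for monitoring. The sequential runner below is a hypothetical, much-simplified sketch of that idea; Step Functions itself is driven by a JSON state-machine definition, which this does not reproduce. The `validate` and `price` steps (and the unit price of 10) are made up for illustration.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]


def run_workflow(steps: List[Step], payload: dict) -> Tuple[dict, List[str]]:
    """Run steps in order, passing each step's output to the next one."""
    history: List[str] = []  # a record of state transitions, for monitoring
    for name, fn in steps:
        history.append(f"enter {name}")
        payload = fn(payload)
        history.append(f"exit {name}")
    return payload, history


steps = [
    ("validate", lambda e: {**e, "valid": e["qty"] > 0}),
    ("price", lambda e: {**e, "total": e["qty"] * 10}),
]
result, history = run_workflow(steps, {"qty": 3})
```

A real orchestrator adds branching, retries and persisted state, but the core contract is the same: each function stays small and stateless, and the workflow carries the data between them.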
The serverless trend is now being adopted by all the cloud providers, as it is the best way to push their various platform services while letting customers build apps faster. Unlike Amazon’s offering, some of the other solutions on the market are open source with open APIs.
Azure Functions seems to address gaps in AWS Lambda’s usability and functionality. For example, it binds functions to predefined inputs and outputs, which simplifies setup, allows database connections to be reused across invocations and makes it possible to manage resource security.
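Reusing a database connection across invocations usually means creating it once per process, outside the per-request code path, and sharing it for as long as the platform keeps that process warm. A minimal sketch, assuming Python’s built-in `sqlite3` with an in-memory database as a stand-in for a real database service; `get_connection`, `handler` and the `hits` table are hypothetical names.

```python
import sqlite3
from typing import Optional

_conn: Optional[sqlite3.Connection] = None  # survives between invocations in the same process


def get_connection() -> sqlite3.Connection:
    # Open the connection lazily on the first invocation, then reuse it.
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(":memory:")
        _conn.execute("CREATE TABLE hits (path TEXT)")
    return _conn


def handler(event: dict) -> dict:
    conn = get_connection()
    conn.execute("INSERT INTO hits (path) VALUES (?)", (event["path"],))
    (count,) = conn.execute("SELECT COUNT(*) FROM hits").fetchone()
    return {"hits": count}
```

Without this pattern, every invocation would pay the connection setup cost again before doing any useful work.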
Google Cloud Functions implemented cool debugging capabilities for Node.js, but I’m less impressed with IBM’s OpenWhisk architecture, which seems to have a rather inefficient design and is less comprehensive.
Some Gaps Still Exist
Amazon and the other cloud providers offer usability, as usual, but it comes at the cost of poor performance and resource utilization. The most notable gap is concurrency.
We’re not getting any younger and CPUs aren’t getting any faster… In recent years we’ve seen a push for better CPU utilization through concurrency, which has been adopted by all languages: Java/Scala (NIO, Akka), Python (Twisted, gevent), C/C++ (epoll), Go (built-in goroutines), Node.js and NGINX, and it is the motivation behind HTTP/2. This is not a new concept (read about the C10k problem from 1999). Instead of using an expensive thread per request, request states are stored in memory. If and when a request stalls (e.g. waits for I/O), the CPU executes the next task on the same thread. For iguazio’s platform, we developed and open sourced Accelio, which enables highly concurrent, zero-latency messaging, and other major innovations like distributed asynchronous non-volatile memory, to guarantee that micro-services and threads NEVER block. Non-blocking is also the guiding principle of the nuclio serverless platform, allowing it to process 400,000 events per second with a single function processor.
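The event-loop idea can be illustrated with Python’s standard `asyncio` module: a single thread interleaves many requests, switching to another task whenever one awaits I/O. In this sketch `asyncio.sleep` stands in for a database call, and the 10 ms delay and 100 requests are arbitrary assumptions.

```python
import asyncio
import threading
import time

IO_DELAY = 0.01  # assumed 10 ms of I/O wait per request


async def handle_request(i: int) -> tuple:
    # While this task waits on "I/O", the event loop runs other tasks.
    await asyncio.sleep(IO_DELAY)
    return i, threading.get_ident()


async def serve(n: int) -> list:
    # Start all requests at once; gather returns results in submission order.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(serve(100))
elapsed = time.perf_counter() - start
threads = {tid for _, tid in results}
print(f"{len(results)} requests on {len(threads)} thread(s) in {elapsed:.3f}s")
```

Handled one at a time with blocking calls, the same 100 requests would take roughly 100 × 10 ms = 1 s; with the event loop the waits overlap, so the total is close to a single 10 ms wait plus scheduling overhead, all on one thread.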
Function instances are executed in separate micro-services (processes) by design, to provide isolation. This means that if you get many HTTP requests that happen to wait on some resource (like a DB), they will be processed by several blocking micro-services. Meanwhile you pay for the CPU and memory idle time while suffering from greater latency. An I/O operation usually takes around 10 ms. If your function does a lot of processing (e.g. rendering pictures), or if it sits idle most of the time, this blocking may be invisible; but if your function does lots of simple computations, like serving web requests or adding stats to a user session, it can turn into a 100x overhead.
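The 100x figure follows from simple arithmetic. Assume, purely for illustration, that a light request needs 0.1 ms of CPU time and a heavy rendering job needs 500 ms, each with one 10 ms I/O wait. A blocking process is held for the CPU time plus the wait, so the overhead is the ratio of the two; `blocking_overhead` is a hypothetical helper.

```python
def blocking_overhead(cpu_ms: float, io_ms: float) -> float:
    """Ratio of wall time a blocking process is held to the CPU time it actually uses."""
    return (cpu_ms + io_ms) / cpu_ms


# Assumed numbers: a light web request vs. a heavy image-rendering job.
light = blocking_overhead(cpu_ms=0.1, io_ms=10.0)
heavy = blocking_overhead(cpu_ms=500.0, io_ms=10.0)
print(f"light request: {light:.0f}x, heavy job: {heavy:.2f}x")
```

For the light request you pay for about 101 times the CPU time you use, while for the heavy job the same 10 ms wait adds only about 2%, which is why the blocking goes unnoticed for compute-heavy functions.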
Some solutions, like IBM OpenWhisk, record function state transitions in a DB and depend on multiple DB, RPC and HTTP calls per invocation, adding a huge overhead. This means “serverless” platforms will require a lot more servers. Serverless implementers must think about resource efficiency if they want serverless to become mainstream.
There are quite a few challenges in testing and debugging micro-services, especially with AWS Lambda, which is closed source and has proprietary APIs. Let’s say you want to develop and test on a laptop, use breakpoints, write unit tests, or run end-to-end application tests. How would you do that serverlessly? I’m not sure you’d want to pay Amazon on a per-call basis for all those random regression tests. Therefore, open-source APIs and test frameworks that can run on your laptop become mandatory.
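Keeping the business logic in a plain function makes it testable on a laptop without deploying anything or paying per call. The sketch below uses Python’s built-in `unittest`; the `(event, context)` parameters follow the shape of AWS Lambda’s Python handler, but `handler` itself and its event fields are hypothetical.

```python
import json
import unittest


def handler(event, context):
    """Hypothetical HTTP-style function: greets the caller by name."""
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"greeting": f"hello {name}"})}


class HandlerTest(unittest.TestCase):
    def test_greets_by_name(self):
        resp = handler({"body": json.dumps({"name": "ada"})}, context=None)
        self.assertEqual(resp["statusCode"], 200)
        self.assertEqual(json.loads(resp["body"]), {"greeting": "hello ada"})

    def test_defaults_when_body_missing(self):
        resp = handler({}, context=None)
        self.assertEqual(json.loads(resp["body"])["greeting"], "hello world")
```

Saved as, say, `test_handler.py`, this runs with `python -m unittest test_handler`, and the handler can be stepped through with an ordinary debugger and breakpoints.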
Other cloud-native capabilities, such as rolling upgrades or canary deployments, or support for more use cases like batch jobs and continuous services, can help serverless become the common way we deploy cloud-native applications. These features already exist in micro-service frameworks like Kubernetes, and it’s only a matter of time until these patterns converge.
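A canary deployment sends a small share of traffic to a new version before promoting it. A minimal sketch, assuming a hypothetical `route` helper that hashes a request ID so the same caller consistently lands on the same version; the 10% weight is arbitrary.

```python
import hashlib


def route(request_id: str, canary_weight: float) -> str:
    """Return "canary" for roughly canary_weight of request IDs, else "stable"."""
    digest = hashlib.sha256(request_id.encode()).digest()
    # 48 bits fit exactly in a float, so this is a uniform value in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return "canary" if bucket < canary_weight else "stable"


share = sum(route(f"user-{i}", 0.10) == "canary" for i in range(10_000)) / 10_000
print(f"canary share: {share:.3f}")
```

Hashing rather than random sampling keeps each user pinned to one version during the rollout; raising `canary_weight` step by step toward 1.0 completes the upgrade, and setting it back to 0.0 rolls it back.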
Given the overall complexity and fear of vendor lock-in, I believe it is important that we create and maintain a cloud-vendor-neutral serverless ecosystem, one to which different companies can contribute, adding development and testing tools, various triggers and glue, connections to various data feeds or sinks, etc. iguazio plans on making more open source contributions in that space, so stay tuned.
Update (Oct 2017): nuclio, our new open source serverless platform, is now available. Read how it addresses these challenges and how to get started with it quickly.