
BROUGHT TO YOU IN PARTNERSHIP WITH

INTRODUCTION TO SERVERLESS MONITORING

CONTENTS

• Introduction to Serverless Computing
• Common Use Cases for Serverless
• What Is Monitoring? What Is Observability?
• Detecting Performance Problems in Serverless
• Implementation of Monitoring
• Glossary

WRITTEN BY EMRAH SAMDAN, VP OF PRODUCTS AT THUNDRA

Introduction to Serverless Computing
Cloud migration has enabled software companies to outsource their old-school, in-house servers to cloud providers thanks to the infrastructure-as-a-service approach. Even after migrating, IT admins and operations people continued to manage the provisioning, patching, and securing of their cloud servers.

Eventually, cloud providers came up with a new cloud execution model in which they fully manage the provisioning and allocation of these servers. In this model, called function-as-a-service (FaaS) or serverless, cloud vendors run your application in a stateless, ephemeral container when it is triggered by an event.

There are many different event triggers, such as a direct API call, a message written to a queue, or a signal sent by an IoT device. FaaS arose as a business model thanks to its pay-as-you-go billing method: cloud providers charge their customers only for the time a function is actually running. This is particularly useful for companies with unpredictable traffic because they can scale automatically and don't pay for idle resources.

AWS announced its Lambda service as a FaaS solution back in November 2014. Azure Functions and Google Cloud Functions followed in early 2016 and March 2017, respectively.

However, in exchange for the pay-as-you-go service and operational simplicity, software developers lost the ability to monitor their serverless applications. This is because developers upload their code onto cloud platforms that do not provide access to, or control of, the underlying cloud environment. This Refcard looks into the challenges of monitoring and observability within serverless architectures and explains how a monitoring system for a serverless architecture can be implemented.

Common Use Cases for Serverless
As the concept of serverless propagates through the software industry, its applications are seemingly boundless. Serverless now enjoys center stage in fields ranging from APIs all the way to machine learning. This is because serverless not only mitigates the pains associated with managing your own servers but also provides cost-effective and scalable solutions to the industry. Described below are some of the use cases for serverless (this is not intended as an exhaustive list).
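All of these use cases build on the same primitive: an event-triggered, pay-per-invocation function. A minimal AWS Lambda-style handler in Python might look like the following (a sketch; the event shape and field names are illustrative, modeled on a queue-style trigger):

```python
import json

def handler(event, context=None):
    # The platform invokes this function once per triggering event:
    # an API call, a queue message, an IoT signal, and so on.
    records = event.get("Records", [])
    bodies = [json.loads(r["body"]) for r in records if "body" in r]
    # You are billed only for the time this invocation is running.
    return {"statusCode": 200, "processed": len(bodies)}
```

The container that runs this handler is created on demand and may be discarded after the invocation, which is exactly why the monitoring challenges discussed later arise.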

API CREATION AND MANAGEMENT
One of the most popular use cases for serverless technology is building API services. A fundamental problem with APIs is scalability, so developers are choosing serverless for its scalability and pay-as-you-go capabilities.

Today, there are over 21,200 APIs in the world, all with different traffic needs. By running an API on a serverless architecture, developers can ensure that if there are sudden spikes in traffic, resources will scale accordingly to meet the needs of the incoming requests. Furthermore, providers of these APIs can greatly cut costs because serverless adopts the pay-as-you-go ideology: if an API has no traffic, no resources are consumed by the API, so there are no charges. These advantages make serverless a favored tool among new startups, as reported in DigitalOcean's quarterly report.

IOT SUPPORT
The Internet of Things is yet another sector of technology exploring serverless. Because IoT is aimed at "dumb" devices connecting to web-based resources, the simplicity and scalability provided by serverless are appealing.

Unlike conventional APIs, IoT devices do not see many requests made to their web endpoints. For this reason, having dedicated resources leads to unnecessary costs. Since there are no charges when there are no requests, serverless is an ideal choice.

Furthermore, setting up a serverless architecture allows developers and product managers to abstract the server-side layer away. That means IoT providers can simply focus on producing devices and on business logic on the web. Using IoT with serverless is quite rudimentary, as described by Andy Warzon, the founder and CTO of Trek10.

Additionally, cloud providers such as AWS and Azure offer reliable and trusted services for building IoT applications. Due to the reliability and security of these environments, as well as the continuous improvements driven by competition in the field, developing and providing IoT applications becomes easier, more cost-effective, and more reliable.

MACHINE LEARNING
Machine learning engineers face a daunting task even before they write their first line of code: setting up the environment to run that code. With serverless, however, the complexity of environment setup can be offloaded to cloud providers.

Additionally, the architecture that serverless implies also benefits machine learning models during development. Various steps in model training can be broken down and fit into different serverless functions, with each one scaling accordingly to meet the computational needs of that specific step. Engineers can therefore ensure that resources are not wasted and costs are managed effectively.

Many argue that machine learning engineers can achieve the same models by using containers, without worrying about the limitations that the young serverless technologies currently impose. However, the complexity involved in setting up and managing these containers can be even more daunting than developing the machine learning model itself.

Serverless alleviates these complexities, allowing machine learning engineers to focus solely on their models. The other alternative is provisioning servers, which only adds costs and the complexity of initializing and managing the infrastructure. Considering the options available, serverless provides a simple alternative for machine learning engineers, letting them focus more on pushing the bounds of computational science than on fixing the links between their servers.

What Is Monitoring? What Is Observability?
Monitoring has always been part of the life of operations people: the act of keeping an eye on a specific set of metrics to ensure the health of the system. Observability, on the other hand, is the state of a system in which ops people can dig deeper into the system when facing an issue in production.

To summarize, people can monitor their systems, while a system can be observable. In this sense, observability is regarded as a superset of monitoring. In her book Distributed Systems Observability, Cindy Sridharan summarizes this relation.

Observability requires instrumentation that emits actionable information about system behavior. By instrumenting the code, developers aim to understand problems through traces, metrics, and logs when something unexpected happens.

Developers can instrument their code manually, by printing logs and injecting spans in the necessary places, or automatically, by using third-party tools or even by writing their own tools to gather a specific set of information, injecting monitoring code either at runtime or at compile time.
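A minimal sketch of manual instrumentation in Python, combining injected spans with a structured log line (the span helper and field names are assumptions for illustration, not a specific vendor's API):

```python
import json
import time
from contextlib import contextmanager

spans = []  # trace spans collected during one invocation

@contextmanager
def span(name):
    # A manually injected span: measures time spent in one code section.
    start = time.time()
    try:
        yield
    finally:
        spans.append({"name": name,
                      "duration_ms": round((time.time() - start) * 1000, 2)})

def handler(event, context=None):
    with span("parse"):
        values = event.get("values", [])
    with span("process"):
        total = sum(values)
    # A structured log line is far easier to search than free-form text.
    print(json.dumps({"level": "INFO", "total": total, "spans": spans}))
    return total
```

An automated tool does essentially the same thing, but injects the span and logging code for you.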


THREE PILLARS OF OBSERVABILITY
In order to make a system observable, developers need the three pillars of observability to provide the transparency required to understand how a system is working. The pillars are not interchangeable, and they are effective only when all of them are available to developers and ops people.

1. Metrics give you an initial idea of the overall health of your system. Which metrics matter can vary depending on the system. For example, invocation duration is an important metric because a jump in invocation duration usually signals an unhealthy condition in a serverless architecture. Metrics are the starting point of an investigation because they alert developers when an unexpected situation occurs. However, a detailed analysis requires the other pillars.

2. Traces enable developers to measure the time spent in various parts of their applications. Using traces, developers get an overview of how their system is behaving. Local tracing measures time in a closed system, i.e., a single serverless function, while distributed tracing measures time across a distributed architecture, which is particularly useful in serverless environments.

3. Logs constitute the third and oldest of the observability pillars. They are basically the reflection of the print statements you use while debugging. Logs can be helpful when solving an issue, but it is advisable to use different log levels and structured logging in order to find what you're looking for in a pile of logs. Even then, developers need to implement sampling methods to reduce the logging throughput of their systems, both for faster debugging and to minimize storage size.

THE ROLE OF OBSERVABILITY AND MONITORING IN SERVERLESS
Monitoring in a serverless architecture can be cumbersome, with metrics limited to what the cloud provider makes available. For example, AWS Lambda only provides invocation counts, durations, error ratios, concurrent executions, and throttles. There are also logs you can check, but discovering an issue through logs is like looking for a needle in a haystack unless you have structured logging.

Serverless transactions are different from microservices because they are event-driven by nature. Transactions are mainly a chain of asynchronous invocations of serverless functions. In such a case, distributed tracing is crucial to understanding the full lifecycle of a serverless transaction.

The following image depicts a serverless transaction consisting of seven AWS Lambda functions, an SNS topic, an SQS queue, a Kinesis stream, an API Gateway route, an S3 bucket, and a Firehose stream. When a transaction ends in the last Lambda function, another transaction might already be starting in the first Lambda. For this reason, it is very hard to detect a problem that occurred in this chain of invocations. There are some vendor-specific solutions for distributed tracing, but they mostly require developers to write instrumentation code.

All three pillars of observability are required for meaningful and actionable insights. However, monitoring serverless applications is difficult, as you cannot add monitoring tools into the cloud environment due to the black-box nature of serverless. As a result, your monitoring tool needs to run alongside your serverless functions.

This stresses the need for a lightweight monitoring tool that adds little overhead to your primary functions before publishing the collected monitoring data. This data must be sent to and processed by a backend service, but that can only happen at the end of the main function execution. This means the serverless platform should not shut down after the serverless function has executed, but rather after the monitoring data has been sent. This can be hazardous for customer-facing applications and result in longer invocation durations.

SERVERLESS MONITORING AND OBSERVABILITY USE CASES
Although the computing model shifts to a completely new paradigm with serverless, the monitoring and observability objectives haven't changed. Whether monitoring reactively or observing proactively, the first objective is to avoid customer dissatisfaction due to performance degradation or an outage.

Some sub-objectives can be listed as understanding the root cause of errors, detecting bottlenecks, reducing MTTR, and so on.
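The role of metrics as the starting point of an investigation can be sketched as a simple threshold check over recorded invocation durations (the threshold and the data are illustrative):

```python
def find_suspect_invocations(durations_ms, threshold_ms=1000):
    # Metrics signal *when* something looks wrong; traces and logs are
    # then needed to explain *why*. Here we only flag what to inspect.
    return [i for i, d in enumerate(durations_ms) if d > threshold_ms]

suspects = find_suspect_invocations([120, 95, 2400, 110, 1800])
# invocations at indices 2 and 4 exceed the threshold and
# warrant a closer look at their traces and logs
```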


Moreover, another objective arises from the serverless pay-as-you-go model: controlling cost. Serverless systems can be more robust because they're based on managed serverless components, but they have the potential to be expensive when not used properly.

For this reason, monitoring use cases can be grouped into two categories: detecting performance problems during debugging, troubleshooting, and monitoring; and controlling the costs of serverless transactions.

Detecting Performance Problems in Serverless
Serverless architectures have performance problems unique to themselves. These include cold starts, timeouts, and long-running functions caused by problems in downstream services. Traditional problems due to bad coding practices still exist in serverless architectures as well.

Cold starts occur when a function is invoked for the first time. It takes time to initialize the container and load the dependencies your function requires. The frequency and duration of cold starts depend on the programming language you choose, the memory configuration of your Lambda function, and its security configuration. Cold starts are inevitable, but their frequency and duration can be reduced with warm-up modules provided by the community. For customer-facing latency issues, it's important to determine whether the culprit is a cold start or another problem.

Timeouts are also critical to the performance of serverless applications. AWS Lambda functions can execute for up to 15 minutes, but it is advised to set the timeout limit to 10 seconds. This follows the general belief that if a service does not respond within the first 10 seconds, it should be expected to time out. Cloud vendors alert users when timeouts occur, and in such cases the best practice is to check whether there is a problem with downstream services and to apply circuit breakers that limit interaction with the problematic service.

Timeouts are often caused by CPU-intensive operations, but the CPU-intensive part of your function can be difficult to detect. Therefore, local tracing and/or CPU/memory profiling are required to analyze which parts of the function are using system resources and contributing to the invocation duration.

Although the serverless computing model is attractive because of pay-as-you-go billing, it's important to keep an eye on the costs attached to serverless architectures. The cost of a serverless function depends on its total duration and its allocated memory.

Hence, developers should work on decreasing invocation duration, or decreasing memory where possible, to control the costs attached to serverless. Note that invocation duration depends on the allocated memory as well: more memory means more CPU power for Lambda functions, which may ultimately mean a shorter invocation duration. Therefore, developers should fine-tune their allocated memory by monitoring the memory and CPU usage of their functions.

Moreover, developers aim to decrease the time spent in a function by managing external services carefully and by warming up containers to avoid cold starts. For this purpose, they need to closely monitor their serverless architecture and external services.

Implementation of Monitoring
Although cloud vendors provide their own monitoring systems, developers usually need third-party implementations in order to have actionable details that decrease the mean time to resolve (MTTR) in a serverless architecture. Instrumentation, whether manual or automated, is critical to having all three pillars of observability and achieving an aggregated view of your system.

Instrumentation is generally achieved by adding a library that automatically injects code or lets the developer inject instrumentation, because it is not possible to place an agent where the function runs (it runs in a different container at every invocation). Adding this library as a Lambda Layer is the AWS-specific way of doing so, which eases monitoring. However, these libraries generate large amounts of data that eventually become hard to store and process. For this reason, sampling plays an important role in keeping a monitoring system healthy.

WHAT TO WATCH FOR THROUGHOUT THE SOFTWARE DEVELOPMENT CYCLE
Monitoring requirements for serverless applications vary depending on the application's stage. Developers want to fine-tune their applications during development and testing before they proceed to production. Traces and logs are more important at this stage, in order to spot potential bottlenecks before they become serious in production.

Running an instrumented serverless architecture in a development environment on a cloud vendor reveals the problematic spots. Once in production, traces and logs are still useful for troubleshooting erroneous conditions, while metrics are crucial to staying informed about general system health. A typical example is being alerted when the duration of a single function and/or a whole serverless transaction exceeds a threshold, then checking the traces and logs to understand the root cause of the jump in invocation duration.

MANUAL AND AUTOMATED INSTRUMENTATION IN SERVERLESS OBSERVABILITY
Instrumentation is the process whereby serverless architectures are configured to enable monitoring. That means you specify which parts of your architecture you would like to monitor and, depending on the monitoring tool of your choice, perform manual or automated instrumentation, or a combination of both.

What does this mean? Manual instrumentation is when you yourself specify the individual parts of your serverless application you would like monitored. Automatic instrumentation is when you specify the general pattern of your serverless functions, and the tool then monitors all components that lie within the defined pattern.
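Pattern-based automatic instrumentation can be sketched as a wrapper applied to every handler whose name matches a configured pattern (the pattern, the wrapper, and the handler names are illustrative, not any specific tool's API):

```python
import fnmatch
import time

def instrument(fn):
    # The wrapper the tool injects: it records duration without the
    # developer touching the function body.
    def wrapped(*args, **kwargs):
        start = time.time()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.time() - start) * 1000
            print(f"{fn.__name__}: {elapsed_ms:.1f} ms")
    return wrapped

def auto_instrument(handlers, pattern="handle_*"):
    # Instrument every handler whose name matches the configured
    # pattern; everything else is left untouched.
    return {name: instrument(fn) if fnmatch.fnmatch(name, pattern) else fn
            for name, fn in handlers.items()}
```

Manual instrumentation, by contrast, would apply a wrapper like `instrument` (or finer-grained spans) only to the specific functions the developer chooses.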


Both methods have their respective pros and cons, and the method you choose depends on your monitoring goals. For example, manual instrumentation allows ultimate customization of how you monitor your serverless application, but it also increases setup complexity. By contrast, automatic instrumentation greatly simplifies the instrumentation process, sometimes requiring only a single line of configuration to attain detailed trace data illustrating the behavior of your serverless functions.

However, automatic instrumentation alone may not achieve the detailed, customized monitoring you are looking for, so the solution is usually a combination of both methods: automatic instrumentation for overall observability of your serverless architecture, with manual instrumentation providing more in-depth analysis of principal functions.

HOW TO ADD MONITORING AS LAMBDA LAYERS
AWS introduced Lambda Layers at the AWS re:Invent conference at the end of 2018. Since then, tasks such as monitoring have become simpler, as serverless developers can now add the monitoring tool of their choice as a Lambda Layer. Essentially, a Layer is a zipped archive that can contain anything from required libraries to custom runtimes. This package, in the form of a Lambda Layer, can then be shared across multiple functions once it is loaded into your AWS Lambda console.

With Lambda Layers, configuring your monitoring tools has become easier: you simply import your monitoring tool in the form of a Layer on top of your Lambda function. You can further configure the monitoring tool with environment variables, with no need for manual configuration in the code itself.

In AWS, monitoring tools can be added to your Lambda environment via the Lambda console. Within the console, you can click on Layers in the Designer view and simply add the packaged monitoring library of your choice. After ensuring that the correct layer version and package have been loaded, you can initialize your monitoring tool with environment variables. Depending on the monitoring tool you are using, you can pass in your monitoring credentials, perform instrumentation, and even enable sampling of your data. You can achieve comprehensive observability without adding a single line of code.

SYNCHRONOUS VS. ASYNCHRONOUS MONITORING
There are two ways to publish your monitoring data: synchronously or asynchronously. There are various reasons to choose one over the other; however, according to a whitepaper released by AWS in November 2017, it is recommended that monitoring data be published asynchronously.

This is because publishing monitoring data synchronously can add overhead to your serverless functions: it entails longer execution times, since your monitoring data has to be returned at the end of your invocation. Publishing can also fail, or your function may sit behind a secure environment such as an Amazon VPC. In those cases, your monitoring tool cannot access the generated data, and you end up with unreliable or potentially no monitoring data at all.

It is therefore advised to monitor your serverless architecture asynchronously by taking advantage of the cloud vendor's data logging capabilities. For example, your monitoring data could be transferred to AWS CloudWatch; regardless of the execution duration or structure of your serverless function, your monitoring tool can then always access the monitoring data from CloudWatch. Even though synchronous monitoring is simple and easy to configure, asynchronous monitoring has major benefits that should not be ignored. The following image shows the implementation of asynchronous monitoring with AWS Lambda.

SAMPLING MONITORING DATA
Monitoring serverless applications can be costly, especially when trying to achieve all three pillars of observability. Moreover, using structured logs to make searching easier further exacerbates the costs associated with logging. To tackle this problem, it is advised to implement a sampling mechanism that reduces the amount of monitoring data reported. Sampling is wise because not every single trace, metric, and log from every single invocation of a serverless architecture is necessary for keeping the system healthy.

The question is how to sample the monitoring data. Traditional approaches offer rule-based systems that only record, for example, one out of every 1,000 invocations or one invocation every 10 minutes.
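Such a rule-based sampler can be sketched in a few lines of Python (the rate is illustrative):

```python
import itertools

def make_rate_sampler(n=1000):
    # Record only one out of every n invocations.
    counter = itertools.count()
    def should_sample():
        return next(counter) % n == 0
    return should_sample

sampler = make_rate_sampler(3)
decisions = [sampler() for _ in range(6)]
# decisions == [True, False, False, True, False, False]:
# the first of every three invocations is recorded
```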


However, you may miss important invocations with rule-based sampling. To reduce stored data for serverless monitoring effectively, you must apply intelligent sampling rules. For example, you only need the traces when the duration exceeds a certain threshold or when a specific type of error occurs. An intelligent sampling system should be able to:

• Filter out non-erroneous invocations and gather every detail about problematic invocations with specific error types.
• Filter out invocations with expected durations and gather every detail about long-running invocations to reveal bottlenecks.

Glossary
• AWS X-Ray: An application management tool provided by AWS that allows you to monitor your cloud systems, including serverless applications.
• Cold start: The process whereby a serverless container is initialized upon invocation of a function. The setup of these environments leads to latency between the invocation being sent and the actual execution of the function.
• Function: A block of executable code that is uploaded to a function-as-a-service platform. These functions are usually event-driven, executing on the occurrence of another event or running periodically depending on your configuration.
• Invocation: An execution of a serverless function, either triggered by an event or run periodically.
• Lambda Layers: A bundled package of libraries, dependencies, and/or runtimes that can be incorporated into your Lambda functions separately and shared across multiple functions. Lambda Layers are provided by AWS and were introduced at AWS re:Invent 2018.
• Log: The print statements and messages that logging agents emit in a codebase. These could be messages pertaining to relevant errors, simple print statements, or serialized output.
• Metrics: Statistical data related to the computation and performance of a function, ranging from information about cold starts and erroneous invocations all the way to average memory and CPU usage throughout your serverless architecture.
• Trace: A representation charting single functional events in the form of trace spans. Events that result from other events are represented as child spans below the main event. For example, when a function calls another function, the calling function is represented as the parent span and the called function as the child span, and their behavior and data transactions are mapped out accordingly.
• Trace chart: A graphical representation of trace data that shows how much time each part of the system takes to complete during function execution.

Written by Emrah Samdan, VP of Products at Thundra

Emrah Samdan is VP of Products at Thundra. He is on the organizing committee for Serverless Turkey and ServerlessDays Istanbul, and he helps serverless developers gain observability into their applications.

Devada, Inc.
600 Park Offices Drive, Suite 150
Research Triangle Park, NC
888.678.0399 / 919.678.0300

DZone communities deliver over 6 million pages each month to more than 3.3 million software developers, architects, and decision makers. DZone offers something for everyone, including news, tutorials, cheat sheets, research guides, feature articles, source code, and more. "DZone is a developer's dream," says PC Magazine.

Copyright © 2019 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
