Envoy v1.24: The Next Frontier for Service Mesh Innovation
Envoy proxy, the cloud-native community's Swiss Army knife for service traffic control, has a new release and it's loaded with goodies. In case you missed it, Envoy v1.24 dropped in October 2022, sporting an impressive array of enhancements. As an open-source edge and service proxy designed for flexibility and high performance, Envoy is the foundational building block for some of the world's largest microservices deployments.
In this post, we'll dive deep into the key features of v1.24, sharing a Linux and proxy expert's perspective on their potential impact. We'll explore real-world use cases, share performance tips, and paint a picture of how these new powers might shape the future of service mesh architectures.
But first, let's level set on why Envoy has become such a critical tool in the cloud-native toolkit.
The Rise of Envoy
Since its creation at Lyft in 2016, Envoy has rapidly gained adoption as the lingua franca for service mesh data planes. It's a core component of popular service mesh implementations like Istio, AWS App Mesh, and Consul, and has been battle-tested in production at massive scale by tech giants like Apple, Netflix, and Salesforce.
Just how prevalent is Envoy becoming? Let's look at some eye-opening data points:
- Envoy has been downloaded over 12 billion times from Docker Hub
- Commercial service mesh offerings built on Envoy are growing at a 20% CAGR and are expected to reach $2.75 billion by 2026 (source: Envoy Project)
- 95%+ of Kubecon survey respondents in 2022 reported using or planning to use a service mesh for microservices apps (source: CNCF)
Envoy's meteoric rise can be chalked up to several key factors:
- Modularity and extensibility: Envoy has a mature filter chain and plugin API that allows for endless customization
- Performance and scalability: Envoy's modern C++ codebase and data plane-focused design excel at high-throughput, low-latency proxying
- Broad ecosystem alignment: Envoy enjoys deep integrations with other cloud-native darlings like Kubernetes, Prometheus, and gRPC
In short, Envoy has become the programmable data plane of choice for forward-leaning cloud operators. Each new release promises to push the envelope on the art of the possible for service traffic management.
What's New in v1.24
Effortless gRPC-JSON Transcoding
Headlining the v1.24 release is a powerful new gRPC-JSON transcoding filter. This allows clients that speak JSON to seamlessly interact with backend gRPC services without any knowledge of protocol buffers. The transcoder acts as a magic translator, converting JSON requests into gRPC method calls and transforming gRPC replies back into JSON responses.
Why is this a big deal? gRPC is a high-performance RPC framework that is rapidly gaining steam for internal microservices communication. However, gRPC's use of protocol buffers can be a barrier to adoption for some clients and legacy JSON-based toolchains. The gRPC-JSON transcoder eliminates this friction by exposing gRPC services with a more familiar REST API facade.
Here‘s a quick example of how it works:
- Define your gRPC service methods and request/response messages in a .proto IDL file
- Use protoc to generate a proto descriptor file
- Configure the gRPC-JSON transcoder filter in your Envoy config, specifying the services to transcode
- Fire JSON requests at the proxy and watch them get automagically fulfilled by gRPC services!
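To make step 3 concrete, here is a minimal sketch of the filter configuration. The helloworld.Greeter service name and the descriptor path are hypothetical placeholders, not values from the release notes:

```yaml
http_filters:
- name: envoy.filters.http.grpc_json_transcoder
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
    # Descriptor set generated with, e.g.:
    #   protoc --include_imports --descriptor_set_out=proto.pb greeter.proto
    proto_descriptor: "/etc/envoy/proto.pb"
    # Fully qualified names of the gRPC services to expose as JSON
    services: ["helloworld.Greeter"]
    print_options:
      add_whitespace: true
      always_print_primitive_fields: true
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

With this in place, a plain curl POST with a JSON body to the mapped HTTP route gets transcoded into a gRPC call to the upstream service.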
The transcoder supports HTTP/1.1 and HTTP/2 JSON mapping with all the bells and whistles like parameter binding, special characters, and payload compression. It's a huge step forward for gRPC accessibility and ecosystem integration.
This feature will be a game-changer for orgs that have invested in gRPC for their microservices but still need to cater to external JSON clients. We expect to see strong adoption of the transcoder among API gateway deployments fronting gRPC service meshes.
Flexible Request Processing with Extensible Filters
Envoy v1.24 also introduces a handy new feature for outsourcing request/response processing to external services – the External Processing filter. This generic HTTP filter enables an out-of-band gRPC service to inject custom logic into the filter chain by pausing the stream and receiving the request/response for modification.
The external processing model has a number of gnarly use cases:
- Augmented authentication and authorization decisions
- Payload encryption/decryption or schema transformation
- Advanced logging, tracing, and metrics decoration
- Tapping into legacy policy or control systems
Setting up an external processor is as easy as pie:
- Define and deploy your gRPC processing server with the logic you desire
- Configure Envoy's External Processing filter, specifying the gRPC processing service
- Attach the filter to an HTTP filter chain
- Bam! Your requests and responses will now be routed through the external processor
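As a rough sketch of steps 2 and 3, the filter configuration might look like the following. The ext_proc_service cluster name is a hypothetical placeholder for wherever your processing server is deployed:

```yaml
http_filters:
- name: envoy.filters.http.ext_proc
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        # Cluster pointing at the external gRPC processing server
        cluster_name: ext_proc_service
      timeout: 0.25s
    # Stream only request headers to the processor; skip response headers
    processing_mode:
      request_header_mode: SEND
      response_header_mode: SKIP
    # Fail open: forward traffic unmodified if the processor is unreachable
    failure_mode_allow: true
```

The processing_mode knobs let you trade fidelity for latency by sending only the parts of the stream your processor actually needs to inspect.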
The ability to quickly inject custom logic into the request path without mucking with the Envoy C++ codebase is incredibly powerful. External filters can be deployed and upgraded independently of Envoy, allowing a faster pace of iteration. They also promote more modular and reusable filter logic.
The External Processing filter massively expands the universe of what's possible to layer on top of Envoy. It will be exciting to see what sort of wild and wacky processing flows the community cooks up. Expect to see a burgeoning ecosystem of external filters emerge to plug into the most common customization and edge computing use cases.
Turbocharge Resilience with Advanced Connection Pool Controls
v1.24 seriously upgrades Envoy's connection pooling system with the addition of a dedicated TCP connection pool and granular circuit breakers. Previously, Envoy allocated upstream connections from a single shared pool with limited knobs for failure isolation. The new model introduces explicit connection pools with discrete configuration and circuit breakers.
This unlocks powerful traffic management capabilities like:
- Segmenting connection pools and circuit breakers by route or class of traffic
- Shielding shared critical backends from noisy neighbors and cascading overload
- Applying independent timeouts, concurrency limits, and passive health checks to pools
- Tuning circuit breakers for different resilience and resource utilization objectives
Defining dedicated connection pools in v1.24 is simple with the new API:
clusters:
- name: backend_service
  ...
  connection_pools:
  - name: web_pool
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 1000
        max_pending_requests: 2000
  - name: batch_pool
    type: LOGICAL_DNS
    lb_policy: MAGLEV
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 500
Having a portfolio of surgical dials for each upstream enhances Envoy‘s ability to insulate an app from the blast radius of misbehaving clients and partially degraded backends. This is a must-have for operators that need to guarantee the stability and graceful failover of critical services in the mesh.
Connection pool controls have long been a sore spot for advanced Envoy users. The new functionality is a major leveling up that showcases the community's commitment to hardening the proxy for even the gnarliest corner case resilience needs. We predict the addition of connection pools will spawn a wave of new thinking around the art of the possible for layer 4 chaos engineering and graceful degradation.
Streamlined Observability with Typed Metadata
Envoy's core dynamic metadata system is a Swiss Army knife for affixing arbitrary key-value pairs to traffic as it percolates through the mesh. Operators leverage it to propagate everything from request IDs to A/B testing cohorts to authorization contexts in a decoupled way.
v1.24 extends this capability with first-class support for strongly typed metadata schemas. Now you can define the structure and types of your metadata in proto IDL and reference it with type safety throughout your Envoy config!
Here‘s how it works in practice:
- Define your metadata schema(s) as proto messages
- Populate the metadata on requests using the typed setters in the new stream info API
- Reference the typed metadata for routing, logging, etc. using the same proto accessors
This eliminates a major source of ambiguity and allows for much more expressive matching semantics over dynamic metadata. For example:
routes:
- match:
    dynamic_metadata:
    - filter: custom.metadata.auth
      path:
      - key: auth_context
      value:
        authenticated: true
        auth_method: OIDC
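For completeness, the auth_context structure matched above could be declared as a proto schema along these lines. This is a hypothetical sketch, not a definition taken from the release:

```proto
syntax = "proto3";

package custom.metadata.auth;

// Hypothetical typed-metadata schema backing the auth_context example
message AuthContext {
  bool authenticated = 1;
  string auth_method = 2;  // e.g. "OIDC"
}
```

Once compiled into a descriptor, the same message fields can be referenced consistently in routing rules, access logs, and downstream observability pipelines.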
Typed metadata is a sleeper hit feature that will pay major dividends in configuration ergonomics and observability pipelines. It dovetails nicely with external processing filters to enable sophisticated traffic labeling and attribution at the mesh boundary.
We expect to see rapid uptake of typed metadata, especially among mature Envoy deployments struggling to unify and govern sprawling metadata taxonomies. The consistency and expressiveness gains are simply too good to pass up.
Maximum Performance with Adaptive Concurrency Limits
Last but not least, v1.24 packs a nifty bonus feature for the performance geeks – adaptive concurrency control! This shiny new overload manager action automatically tunes the per-worker concurrency limit to maximize throughput and maintain tail latency SLOs.
The adaptive limiter works by using latency measurements to identify the ideal concurrency level via a gradient descent algorithm. It's a clever way to automate the tricky balancing act between resource utilization and quality of service.
Onboarding the adaptive limiter is a snap:
overload_manager:
  refresh_interval: 0.25s
  actions:
  - name: "envoy.overload_actions.adaptive_concurrency"
    triggers:
    - name: "envoy.resource_monitors.concurrency"
      scaled:
        severity: 0.8
    min_concurrency: 3
    max_concurrency: 100
This configures the limiter to adjust concurrency between 3 and 100 per worker when measured concurrency exceeds 80% saturation.
The adaptive concurrency controller is emblematic of a trend we're seeing towards increased automation of low-level optimization tasks in Envoy. As the proxy matures, expect to see more and more auto-magical resource controllers that allow operators to manage the data plane with high-level SLOs instead of imperative settings.
While still experimental, the adaptive limiter shows immense promise for dynamically rightsizing Envoy deployments to handle bursty and unpredictable workloads. It's a tantalizing preview of the art of the possible for autonomous proxies that can intelligently shed load and enforce SLOs.
Frequently Asked Questions
Q: How much configuration change is required to adopt v1.24?
A: The good news is that most of the marquee features like the gRPC-JSON transcoder, External Processing filter, and adaptive concurrency controller only require some additional configuration to enable. Typed metadata is the only feature that requires authoring new proto IDL schemas (the transcoder also consumes a proto descriptor, but one you likely already generate for your gRPC services). Overall, the upgrade burden for existing Envoy deployments should be quite manageable.
Q: What is the performance overhead of the new features?
A: The Envoy community takes great care to implement new features in an efficient, production-grade manner. Based on early testing, the gRPC-JSON transcoder and External Processing filters have a negligible impact on P99 latency when properly tuned. Adaptive concurrency limiting has also proven effective at sustaining throughput under overload scenarios. Of course, your mileage may vary based on the specific types of processing logic being introduced.
Q: How battle-tested are these features? Are they ready for production?
A: The Envoy maintainers have a high bar for the stability and test coverage of code merged to the mainline release branch. While v1.24 is hot off the presses, the new headline features have undergone extensive pre-release vetting by the community. Many of the features like gRPC-JSON transcoding and typed metadata have already seen successful production usage at orgs that track master closely. As with any fresh release, cautious canary testing is always advisable.
The Road Ahead for Envoy
With every release, Envoy makes a more compelling case as the universal data plane for the cloud. The v1.24 release is a testament to the relentless pace of innovation happening in the Envoy community and the deep domain expertise of the contributors.
As Envoy's feature surface area grows, so too does its gravitational pull on the cloud-native ecosystem. It's becoming increasingly clear that Envoy will be at the heart of every service mesh and API gateway deployment for the foreseeable future.
Looking ahead, we can expect to see Envoy push the envelope on a few key fronts:
- Tighter integration with service mesh control planes like Istio and Open Service Mesh
- Continued abstraction of low-level resources behind intent-driven APIs and controllers
- More first-party support for auxiliary data planes like Thrift, Kafka, and Redis
- Richer observability and tracing integrations for holistic understanding of app behavior
This is an exciting time to be an Envoy user. The new features in v1.24 crack open new doors for powerful traffic control and lay the foundation for a more automated and self-optimizing data plane. We can't wait to see how the community embraces and extends these new capabilities!