Career Development

There are several paths to starting a career in software development, including non-traditional routes that are now more accessible than ever. Whether you're interested in front-end, back-end, or full-stack development, we offer more than 10,000 resources that can help you grow your current career or *develop* a new one.

Latest Premium Content

  • Trend Report: Developer Experience
  • Refcard #399: Platform Engineering Essentials
  • Refcard #093: Lean Software Development

DZone's Featured Career Development Resources

6 Books That Changed How I Think About Software Engineering in 2026

By Otavio Santana
Reading is essential for everyone, and especially for software engineers. Our field centers on managing and advancing knowledge. As technologies and architectural paradigms evolve and challenges grow more complex, continuous learning becomes fundamental.

In 2025, I read 34 books spanning philosophy, history, economics, and software engineering. While these subjects may seem unrelated to coding, they all aim to deepen our understanding of systems, whether in societies, economies, or software architectures. This article highlights six books that stood out for software engineers. Each offers lessons beyond technical implementation, covering strategy, leadership, learning, and design: skills that grow in importance as engineers progress in their careers.

Some of these books are rereads. Revisiting valuable books often reveals new insights as our perspectives evolve. What once seemed theoretical may become highly practical when we encounter similar situations in real projects. Let's start with a book that addresses one of the most misunderstood topics in engineering organizations: strategy.

Crafting Engineering Strategy

One of the most impactful books I read in 2025 was Crafting Engineering Strategy: How Thoughtful Decisions Solve Complex Problems by Will Larson. Many engineers assume their organization lacks an engineering strategy. In reality, most organizations already have one; it just might not be effective, explicit, or aligned with the company's goals.

Will Larson, also known for An Elegant Puzzle and Staff Engineer, provides a practical guide to navigating technical and organizational complexity through structured strategy. The book is especially valuable for senior engineers, architects, and engineering leaders who influence decisions beyond code. The author presents a repeatable process for building actionable engineering strategies, from diagnosing problems to communicating and implementing initiatives.
Real-world examples from companies like Stripe, Uber, and Calm show how strategy shapes decisions on platform migrations, API deprecations, and infrastructure investments. Some of the most valuable lessons include:

  • Building durable engineering strategies from first principles
  • Applying techniques such as Wardley Mapping and systems modeling
  • Leading strategic initiatives as a staff+ engineer or engineering executive
  • Learning from real case studies across different industries
  • Improving long-term influence through structured thinking

Engineering strategy is often seen as abstract or reserved for executives. This book clarifies that strategy is the structured alignment of technical decisions with long-term goals. While strategy and technical insight are essential, they are not the only factors in a successful engineering career. Often, the real differentiator is less technical.

Emotional Intelligence

Emotional Intelligence by Daniel Goleman offers an important perspective for software engineers: technical skills alone are not enough. In many organizations, engineers with strong technical capabilities are surprised when others, sometimes with less technical expertise, reach leadership positions faster. It is tempting to assume that the system is unfair. In reality, another factor is often at play: emotional intelligence.

Daniel Goleman's groundbreaking work explores how human behavior is shaped by two complementary systems: the rational mind and the emotional mind. While traditional intelligence (IQ) measures analytical ability, emotional intelligence (EI) includes qualities such as:

  • Self-awareness
  • Self-regulation
  • Empathy
  • Social skills
  • Motivation

These capabilities strongly influence collaboration, conflict resolution, communication, and leadership. Drawing on psychological and neurological research, Goleman explains why some people with high IQs struggle professionally while others with moderate IQs succeed.
Emotional intelligence shapes our ability to build trust, influence others, and navigate complex social environments: skills that grow in importance as engineers move into architectural or leadership roles. Another powerful insight from the book is that emotional intelligence is not fixed at birth. While childhood experiences shape it, EI can be developed throughout adulthood through reflection, feedback, and intentional practice.

Recognizing this capacity for growth changes how we view engineering careers. The most successful engineers are not only technically strong but also understand people, teams, and organizational dynamics. This naturally brings us to the next topic: how engineering teams actually function and succeed in practice.

Leading Effective Engineering Teams

Leading Effective Engineering Teams by Addy Osmani is another standout book from my 2025 reading list. Drawing on over a decade with the Chrome team at Google, Osmani examines what makes engineering teams effective. The book addresses both individual contributors and engineering managers.

One of the key themes of the book is the distinction between efficiency, effectiveness, and productivity: three concepts that are often used interchangeably but actually represent very different things.

  • Efficiency focuses on doing tasks quickly.
  • Productivity measures output.
  • Effectiveness measures whether the work actually delivers meaningful impact.

In engineering teams, optimizing the wrong metric can cause problems. Teams focused solely on productivity may generate large volumes of code without delivering real value. Osmani emphasizes that effective teams are built on trust, accountability, and clear communication. The book offers practical guidance on topics such as hiring, mentoring, career growth, and building a sustainable engineering culture.
Some highlights include:

  • Traits of highly effective engineers and teams
  • Techniques for fostering trust and accountability
  • Strategies to minimize friction in collaboration
  • Systems thinking approaches for daily engineering decisions
  • Methods for improving visibility and recognition within organizations

The most valuable lesson is that engineering excellence is rarely achieved alone. It almost always results from a healthy team culture. Once we understand how teams function, the next natural question becomes: how should we design the systems those teams build? This leads us to a topic that is often misunderstood in software architecture.

Balancing Coupling in Software Design

When software engineers first study architecture, one concept appears repeatedly: coupling. The message is almost always the same: coupling is bad. However, Balancing Coupling in Software Design by Vlad Khononov challenges this simplistic perspective. Coupling is not inherently bad. In fact, it is unavoidable. Every design decision we make introduces some form of coupling. The real challenge is understanding and controlling it.

Khononov explores how coupling affects modularity, system evolution, and long-term maintainability. The book builds upon decades of research in software engineering while adapting those concepts to modern architectural practices such as microservices, domain-driven design, and distributed systems. Rather than treating coupling as something to eliminate, the book presents it as a design dimension that must be balanced. Some key insights include:

  • Understanding different types of coupling in software systems
  • Using coupling intentionally to manage complexity
  • Recognizing trade-offs between modularity and system cohesion
  • Applying design principles that support long-term evolution

This perspective is especially valuable for architects and senior engineers who must balance flexibility, performance, and maintainability.
Even the best design principles are ineffective if engineers cannot continuously learn and adapt. Given the rapid pace of change in our industry, learning is a core engineering skill.

Ultralearning

Ultralearning: The Essential Guide to Mastering Hard Skills and Future-Proofing Your Career by Scott H. Young focuses on one of the most critical abilities for modern professionals: learning efficiently. Software engineers constantly encounter new frameworks, languages, architectures, and methodologies. The challenge is not only learning new technologies but also deciding what is worth learning.

Young introduces the concept of ultralearning, an intense and structured approach to mastering complex skills quickly. The book presents nine principles that help individuals learn deeply and effectively through self-directed education. Some of the ideas explored include:

  • Direct learning through real projects
  • Strategic practice and feedback loops
  • Retrieval-based learning instead of passive reading
  • Experimentation and adaptation of learning strategies

The book highlights historical and modern ultralearners, such as Benjamin Franklin, Richard Feynman, and Judit Polgár, showing that structured self-learning has long driven mastery. For software engineers, this mindset is particularly valuable. The industry evolves rapidly, and those who learn efficiently gain a significant advantage over time. However, learning and design are only part of the equation. Without effective knowledge sharing, teams and organizations struggle to stay aligned.

Docs Like Code

Documentation remains one of the most underestimated aspects of software engineering. In many organizations, teams fall into one of two extremes. Either documentation is almost nonexistent, forcing engineers to rely on meetings and tribal knowledge, or there is an overwhelming amount of documentation that becomes outdated and ignored.
Docs Like Code: Collaborate and Automate to Improve Technical Documentation introduces a more balanced approach. The core idea is simple: treat documentation the same way we treat code. This means applying practices such as:

  • Version control
  • Code reviews
  • Continuous integration
  • Automated validation
  • Collaborative workflows

By integrating documentation into the development lifecycle, teams can ensure that knowledge evolves alongside the codebase. The result is documentation that remains relevant, maintainable, and useful, rather than becoming an abandoned artifact. For engineers focused on system design and long-term maintainability, this approach transforms documentation from a bureaucratic task into an essential engineering practice.

Final Thoughts

Reading remains one of the most powerful habits a software engineer can develop. The books highlighted here address various aspects of engineering growth: strategy, emotional intelligence, team dynamics, architectural design, learning, and documentation. Together, they offer a broader perspective on growing beyond coding to become a more complete engineer. Software engineering is not only about building systems. It also involves understanding complex environments, collaborating with others, making strategic decisions, and continuously learning. Sometimes, the best way to improve as an engineer is simply to start with a good book.
Accelerating Your Software Engineering Career With Open Source and Jakarta EE

By Otavio Santana
For decades, software engineering followed a relatively predictable path: learn the language, master the tools, deliver results, and progress. That model is quietly breaking. Today, engineers are expected to do more than build systems; they are expected to influence decisions, communicate across teams, and demonstrate impact beyond their immediate environment. Yet most career advice still focuses solely on improving technical skills. This creates a gap. In this article, we explore how open source, especially through Jakarta EE, fills that gap, turning everyday engineering work into something visible, scalable, and career-defining.

The Challenge of Modern Software Careers

Once we accept that technical excellence alone is no longer enough, the next question becomes unavoidable: what actually sustains a software engineering career today? The industry has changed in subtle but significant ways. Stability has decreased, expectations have expanded, and the definition of value has shifted. Engineers are no longer evaluated only by their ability to deliver features, but by their capacity to influence decisions, communicate ideas, and operate beyond the boundaries of their immediate team.

This creates a tension. Many engineers continue to invest heavily in technical preparation (learning frameworks, improving coding practices, studying architecture), yet still feel stuck. The issue is not always a lack of effort, but often a mismatch between effort and opportunity. Preparation, in isolation, does not scale if it remains invisible.

Historically, engineering was never just about tools. The term itself comes from ingenium, referring to ingenuity, creative problem-solving, and the capacity to devise solutions under constraint. That older meaning matters because it reminds us that engineering is not simply technical execution; it is the disciplined application of judgment. But judgment alone does not guarantee opportunity. This is where Seneca becomes surprisingly modern.
He is often paraphrased as saying that luck is what happens when preparation meets opportunity. Whether we call it luck, chance, or timing, the principle is the same: opportunity favors those who are already in motion. In the context of a software career, this means waiting to become visible only when the perfect opportunity arises is already too late. We need preparation, certainly, but also visibility and adaptability, because in practice these are what allow preparation to encounter opportunity at all. That is why the real challenge of the modern career is not only becoming good, but becoming discoverable, credible, and ready. And this is exactly where open source and open standards begin to matter.

Open Source and Open Standards as Career Leverage

Open source is often misunderstood. It is frequently treated as a side activity, something optional or even altruistic. But if we examine it more carefully, open source functions as a mechanism for making work visible at scale. It transforms private effort into public evidence. Instead of describing your experience, you expose it. Instead of claiming expertise, you demonstrate it. This distinction matters because traditional career signals (résumés, certifications, interviews) attempt to infer capability. Open source reduces that distance. It allows others to see how you think, how you collaborate, how you respond to criticism, and how you improve an idea over time.

In that sense, open source becomes more than a technical activity. It becomes a form of preparation made visible. And that returns us to Seneca's insight: preparation without contact with the world remains incomplete. It is only when knowledge becomes visible and testable in public that it can truly meet opportunity. But open source alone is only part of the picture. To understand why it can have such a strong effect on a career, we need to add another concept: open standards. Historically, standards have been among the great enablers of civilization.
Shared language allowed cooperation beyond small groups. Writing preserved thought across generations. Standard units of measure made trade, engineering, and science reliable. Human progress did not scale merely because people were talented; it scaled because meaning became shareable.

Software is no exception. As systems become larger and more interconnected, a lack of standards leads to fragmentation, lock-in, and unnecessary complexity. Open standards address this by defining shared expectations independently of a single implementation. They create stability without demanding uniformity from vendors. When open source and open standards work together, something unusual happens. Open source creates transparency, collaboration, and visibility. Open standards create consistency, interoperability, and durability. One opens the door to participation; the other ensures that what is built can endure beyond a single company or framework.

For software engineers, this combination is particularly powerful. It means that contributing is not only about fixing code or adding features. It is also about entering into a wider conversation about how systems should be designed, how technologies should evolve, and how collaboration can scale across organizations. This is why open ecosystems are so valuable for a career. They do not merely improve technical skill; they train judgment, communication, and long-term thinking. And few examples in enterprise Java illustrate this intersection as clearly as Jakarta EE.

Jakarta EE: Where Open Source Meets Enterprise Reality

Jakarta EE represents a convergence of these ideas in the Java ecosystem. At its core, it provides vendor-neutral APIs intended for long-lived enterprise applications. On the surface, that may sound like a technical description. In reality, it reflects a broader philosophy: software should evolve without forcing organizations into permanent dependency on a single implementation.
This matters because enterprise systems are rarely short-lived. They are designed to survive years of evolving requirements, teams, and infrastructure. Without standards, this continuity becomes fragile. With them, systems gain a degree of resilience and predictability. That is why Jakarta EE is more than a framework discussion. It is an example of how open standards and open source can coexist to serve real business needs. It provides a shared foundation while still allowing multiple implementations, vendors, and runtimes to participate. This reduces fragmentation and makes enterprise Java more coherent over time.

For engineers, engaging with Jakarta EE introduces a deeper layer of professional growth. The questions shift from local implementation details to broader design concerns. How should an API behave across environments? How do we preserve compatibility while allowing evolution? How do we create something that remains useful beyond the immediate preferences of one team or one company? These are not only coding questions. They are architectural and even philosophical questions, because they concern continuity, cooperation, and trade-offs over time.

And that brings us back, quietly, to the same principle. If modern careers require preparation, visibility, and adaptability, then Jakarta EE offers a space where all three can be exercised together. It is certainly technical work, but it is also public, collaborative, and durable work. In other words, it is preparation in a form that has a real chance of meeting an opportunity. Still, understanding the value of such an ecosystem is one thing. Applying it to daily life is another.

Applying This in Practice: From Knowledge to Career Movement

Knowing that open source and standards can accelerate a career does not, in itself, change anything. The practical question is how to make this part of one's professional life without turning it into burnout or abstraction. The first answer is consistency.
Many engineers approach open source in bursts of enthusiasm, contributing intensely for a few days or weekends and then disappearing. But careers, much like reputations, are built less by intensity than by continuity. Seneca, in his Stoic way, repeatedly emphasized discipline over impulse. That applies here as well. A small, consistent contribution is often more transformative than an occasional heroic effort.

The second answer is to treat open source as training, not as performance. At the beginning, the work may feel invisible or unpaid, and that can be discouraging. But this is precisely where long-term thinking matters. You are not only contributing code; you are learning to write clearly, to discuss ideas, to review systems critically, and to operate in public. These are career assets that compound.

The third answer is communication. No meaningful open ecosystem works without it. Engineers must learn to explain decisions, respond respectfully, document clearly, and engage across cultures. This is one reason English becomes so important in practice. In software, English functions almost like musical notation: it is the medium through which participation becomes possible at scale. Learning it early is not merely a linguistic advantage; it is access to the broader conversation.

The fourth answer is balance. A career is not strengthened by sacrificing everything to it. One of the oldest philosophical lessons, not only in Stoicism but in ethics more broadly, is that discipline without measure becomes self-destruction. Open source should expand your life, not consume it. Saying no, focusing on what matters, and accepting that no one can master the entirety of IT are signs of maturity, not weakness.

And finally, there is the matter of visibility. Being skilled is essential, but it is not enough if your work never leaves the confines of the private sector. Visibility is not vanity. Properly understood, it is the process by which trust becomes possible.
When people can see what you build, how you reason, and how you contribute, they have something concrete on which to base their confidence. Over time, this changes the nature of career opportunities. Instead of constantly needing to prove yourself from zero, your work begins to speak ahead of you.

Conclusion: Preparation Meeting Opportunity

If there is a single thread connecting all of this, it is the old Stoic insight we started with: opportunity does not belong to those who merely hope for it, but to those who prepare in a way that allows chance to find them. That is why the modern software career cannot be reduced to technical competence alone. Preparation still matters, but it must now be visible and adaptable. Open source gives that preparation a public form. Open standards give it structure and durability. Jakarta EE shows how both can come together in a practical, long-lived, and globally relevant enterprise setting.

The result is more than better code. It is credibility, trust, and a career foundation that extends beyond a single employer or moment in the market. In uncertain times, that may be the closest thing to stability we can build for ourselves. And perhaps Seneca would recognize the pattern immediately: we do not control when opportunity appears, but we can control whether we are ready when it does.
Serverless Glue Jobs at Scale: Where the Bottlenecks Really Are
By Vivek Venkatesan
AI Is Rewriting How Product Managers and Engineers Build Together
By Raman Aulakh
The Human Bottleneck in DevOps: Automating Knowledge with AIOps and SECI
By Dippu Kumar Singh
Building a 300 Channel Video Encoding Server

Snapshot

Organization: NETINT, Supermicro, and Ampere® Computing

Problem: The demand for high-quality live video streaming has surged, putting pressure on operational costs and user expectations. Legacy x86 processors struggle to handle the intensive video processing tasks required for modern streaming.

Solution: NETINT reimagined the video transcoding server by combining their Quadra VPUs with the Ampere® Altra® processor, creating a smaller, faster, and more cost-effective server. This new server architecture allows for advanced video processing capabilities, including AI inference tasks and automated subtitling using OpenAI's Whisper.

Key features:
  • High performance: Capable of simultaneously transcoding multiple video streams (e.g., 95x 1080i30, 195x 720i30).
  • Cost-effective: Reduces operational costs by 80% compared to traditional x86-based solutions.
  • Advanced processing: Supports deinterlacing, software decoding, and AI inference tasks.
  • Flexible control: Managed via FFmpeg, GStreamer, SDK, or NETINT's Bitstreams Edge application interface.

Technical innovations:
  • Custom ASICs: NETINT's proprietary ASICs for high-quality, low-cost video processing.
  • Ampere® Altra® processor: Provides unprecedented efficiency and performance, optimized for dense computing environments.
  • Optimized software: Utilizes the latest FFmpeg releases and Arm64 NEON SIMD instructions for significant performance improvements.

Impact: The collaboration between NETINT, Supermicro, and Ampere has resulted in a groundbreaking live video server that:
  • Increases throughput by 20x compared to software on x86.
  • Operates at a fraction of the cost.
  • Expands system functionality to support video formats not natively supported by NETINT's VPU.
  • Enables accurate, real-time transcription of live broadcasts through automated subtitling.

Introduction

The demand for high-quality live video streaming has grown exponentially in recent years.
In both developed and emerging markets, operational costs are under pressure while user expectations keep rising. This led NETINT to reimagine the video transcoding server, resulting in a live video server, created in collaboration with Supermicro and Ampere Computing, that opens up new video processing capabilities.

A unique aspect of this architecture is that while NETINT VPUs handle intensive video encoding and transcoding, a powerful host CPU can perform additional functions that the VPU doesn't support in hardware, such as deinterlacing and software decoding, as well as AI inference tasks. NETINT recently announced the industry's first automated subtitling using OpenAI's Whisper, optimized for the Ampere® Altra® processor, which enables accurate, real-time transcription of live broadcasts.

Powered by Ampere CPUs, the server performs video deinterlacing and transcoding in a dense, high-performance, and cost-effective manner that is not possible with legacy x86 processors. Video engineers control the server via FFmpeg, GStreamer, SDK, or NETINT's Bitstreams Edge application interface, making it suitable both for replacing existing transcoding resources and for greenfield installations.

This case study discusses how NETINT, Supermicro, and Ampere engineers optimized the system to deliver a reimagined video server that simultaneously transcodes 95x 1080i30 streams, 195x 720i30 streams, 365x 576i30 streams, or a combined 100x 576i, 100x 720i, 10x 1080i, 40x 1080p30, 40x 720p30, and 10x 576p streams in a single Supermicro MegaDC SuperServer ARS-110M-NR 1U server.
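Since the server is driven through standard FFmpeg command lines, a single deinterlace-plus-encode job can be sketched as below. This is a hedged illustration, not NETINT's actual invocation: it synthesizes one second of interlaced-style input with lavfi instead of ingesting a broadcast feed, deinterlaces on the host CPU with `bwdif`, and uses FFmpeg's built-in `mpeg4` encoder so the sketch runs without vendor plugins (on the real server, the encode stage is offloaded to a Quadra VPU through NETINT's FFmpeg integration).

```shell
# Generate 1 s of synthetic 720-line video, interlace it, deinterlace it
# on the CPU, and encode. tinterlace builds interlaced frames from
# progressive input; bwdif=mode=send_field emits one frame per field.
ffmpeg -y -hide_banner -loglevel error \
       -f lavfi -i "testsrc2=size=1280x720:rate=30:duration=1" \
       -vf "tinterlace=mode=interleave_top,bwdif=mode=send_field" \
       -c:v mpeg4 -q:v 5 out_720p.mp4
```

Scaling this to hundreds of streams is then a matter of running many such jobs concurrently, which is exactly where host-CPU deinterlacing capacity becomes the limiting factor.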
This server also expands system functionality by handling video formats not natively supported by NETINT's VPU, such as decoding 96 incoming 1080i30 H.264 or H.265 streams and 320 incoming 1080i MPEG-2 streams on the Ampere® Altra® processor.

"The punchline is that with an Ampere® Altra® Processor and NETINT VPU, a Supermicro 1U server unlocks a whole new world of value," said Alex Liu, Co-founder of NETINT.

NETINT's Vision

Responding to customers' concerns about limited CPU processing and skyrocketing power costs, NETINT built a custom ASIC for one purpose: the highest-quality, lowest-cost video processing and encoding. NETINT reinvented the live video transcoding server by combining NETINT Quadra VPUs with the Ampere® Altra® processor to create a smaller and faster server that costs 80% less to operate and increases throughput by 20x compared to software on x86.

Requirements to Reinvent the Video Server
  • Engineer it smaller and faster.
  • Make it cost 80% less to operate.
  • Increase throughput by 20x.

Why NETINT Chose Ampere Processors

NETINT was already familiar with Ampere Computing's high-performance and low-power processors, which perfectly complement NETINT's Quadra VPUs. The Ampere® Altra® Cloud Native Processor is designed for a new era of computing and an energy-constrained world, delivering unprecedented efficiency and performance. From web and video service infrastructure to CDNs to demanding AI inference, Ampere products are the most efficient dense computing platforms on the market. The benefits of using a Cloud Native Processor like Ampere® Altra®, improved efficiency and scalability, have great synergy with NETINT's high-performance and energy-efficient VPUs.

Problem

Could Ampere® Altra® simultaneously deinterlace 100x 576i, 100x 720i, and 10x 1080i video streams, which legacy x86 processors couldn't, in a cost-effective 1RU form factor?
How Ampere Responded

Engineers from NETINT, Supermicro, and Ampere unlocked the high performance of NETINT's Quadra VPU and the Ampere® Altra® 96-core processor to redefine the live-stream video server. Initial results with Ampere® Altra® using FFmpeg 5.0 were encouraging compared to legacy x86 processors, but didn't meet NETINT's goal of increasing throughput by 20x while reducing costs by 80%. Ampere engineers studied the different deinterlacing filters available in FFmpeg and investigated the Arm64 optimizations in recent FFmpeg releases. An FFmpeg avfilter patch that provides an optimized assembly implementation using Arm64 NEON SIMD instructions delivered a significant performance increase in video deinterlacing, with up to a 2.9x speedup using FFmpeg 6.0 compared to FFmpeg 5.0. On all architectures, and especially on Arm64, using the latest software versions is recommended to take advantage of such performance improvements.

Performance Challenges

NETINT, Supermicro, and Ampere engineers went to work running the full video workload, combining CPU-based video deinterlacing with transcoding on NETINT's Quadra VPUs. Although the deinterlacing-only jobs showed outstanding results, the initial results for the full workload didn't meet the target of simultaneously transcoding 100x 576i, 100x 720i, 10x 1080i, 40x 1080p30, 40x 720p30, and 10x 576p input videos. Combining their broad expertise in hardware and software optimization, the team analyzed and root-caused the problem, ultimately meeting the aggressive requirements while using only 50-60% of the Ampere® Altra® processor's CPU capacity, leaving headroom for future features. Investigating the performance showed that it was initially close to the goal, yet unexpectedly slowed down over time.
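The FFmpeg 5.0-to-6.0 deinterlacing speedup described above is easy to measure on any machine with FFmpeg's built-in `-benchmark` flag. A sketch using synthetic input and a null output so that filter cost dominates (the filter names are standard FFmpeg CPU deinterlacers; absolute numbers depend entirely on the machine and FFmpeg build):

```shell
# Time two CPU deinterlacing filters on 2 s of synthetic 1080-line
# input. '-f null -' discards the output, so the measured cost is
# dominated by the filter itself ('bench:' lines appear on stderr).
for filter in yadif bwdif; do
  echo "filter=$filter"
  ffmpeg -hide_banner -f lavfi \
         -i "testsrc2=size=1920x1080:rate=30:duration=2" \
         -vf "$filter" -benchmark -f null - 2>&1 | grep '^bench' || true
done
```

Running the same loop under two FFmpeg builds (5.0 vs. 6.0) is one way to reproduce the kind of comparison the engineers made.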
We followed the performance methodology outlined in Ampere’s tutorial, “Performance Analysis Methodology for Optimizing Altra Family CPUs,” by first characterizing platform-level performance metrics. Figure 2 shows the mpstat utility data: initially, the system was running within ~4% of the performance target yet at only ~71% overall CPU utilization, with ~36% in user space (mpstat %usr) and ~35% in system-related tasks: kernel time (mpstat %sys), waiting for IO (mpstat %iowait), and soft interrupts (mpstat %soft). The fact that the system was idle ~29% of the time indicated that something was blocking performance. Given the large percentage of time in software interrupts and IO wait, we first investigated interrupts using the softirq tool from BCC, a collection of BPF-based Linux tools for IO analysis, networking, monitoring, and more. The softirq tool traces Linux kernel calls to measure the latency of all the different software interrupts on the system, outputting a histogram of the latency distribution. The BCC tools are powerful and easy to run. softirq showed ~20 microseconds average latency in the driver used by NETINT’s VPU while handling ~40K interrupts/s. As our performance problem was on the order of milliseconds, this ruled out software interrupts as the limiter, so we continued investigating. Next, we used the perf record/perf report utilities to measure various Performance Monitoring Unit (PMU) counters to characterize the low-level details of how the application was running on the CPU, looking to pinpoint the bottleneck(s). As we initially didn’t know what was limiting performance, we collected PMU counter data covering CPU utilization (CPU cycles, instructions, instructions per clock, frontend and backend stalls), cache and memory access, memory bandwidth, and TLB access.
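This kind of platform-level triage is easy to script. The sketch below is illustrative only, not Ampere’s actual tooling: it parses an mpstat-style summary line (the sample line and the 20% idle threshold are invented for demonstration) and flags the “idle yet underperforming” pattern that pointed the team toward a blocking problem.

```python
# Illustrative sketch: parse an mpstat-style "all" CPU line and summarize
# where time goes. Column order follows sysstat's mpstat output.

MPSTAT_FIELDS = ["usr", "nice", "sys", "iowait", "irq", "soft",
                 "steal", "guest", "gnice", "idle"]

def parse_mpstat_all(line: str) -> dict:
    """Map an 'Average: all ...' mpstat line to {field: percent}."""
    values = [float(v) for v in line.split()[2:]]
    return dict(zip(MPSTAT_FIELDS, values))

def diagnose(stats: dict, idle_threshold: float = 20.0) -> str:
    """Flag likely blocking when the CPUs sit idle despite pending work."""
    if stats["idle"] > idle_threshold:
        return "underutilized: something is blocking (IO, locks, interrupts?)"
    return "CPU-bound: profile with perf next"

# Made-up sample matching the ~36% usr / ~29% idle picture described above.
sample = "Average: all 36.00 0.00 15.00 10.00 0.00 10.00 0.00 0.00 0.00 29.00"
stats = parse_mpstat_all(sample)
print(stats["usr"], stats["idle"])   # 36.0 29.0
print(diagnose(stats))
```

A check like this makes the first decision point explicit: high idle time sends you to IO, interrupt, and lock analysis (BCC tools), while a saturated CPU sends you straight to PMU-counter profiling with perf.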
As the system reached ~96% of the performance target after a reboot but degraded to ~60% after running many jobs, we collected perf data both after reboot and once performance had degraded. Analyzing the PMU data for the largest differences between the good- and poor-performance cases, the kernel function alloc_and_insert_iova_range stood out, consuming 40x more CPU cycles in the poor-performance case. Searching the Linux kernel source code with a live-grep tool showed that this function is related to the IOMMU. Rebooting the kernel with the iommu.passthrough=1 option resolved the degradation over time by reducing the TLB miss rate. We were now consistently at ~96% of the performance target: close, but we still needed extra performance to meet our goals. NETINT engineers delivered the final speedup: they picked up additional Arm64 deinterlacing optimizations available in FFmpeg mainline, which met the performance goals while reducing overall CPU utilization to 50-60%, down from ~70%.

The Results

The result of the collaboration between NETINT, Supermicro, and Ampere is the NETINT 300 Channel Live Stream Video Server Ampere Edition, which can simultaneously transcode 95x 1080i30 streams, 195x 720i30 streams, 365x 576i30 streams, or a combined 100x 576i, 100x 720i, 10x 1080i, 40x 1080p30, 40x 720p30, and 10x 576p streams in a Supermicro MegaDC SuperServer ARS-110M-NR 1U server. This server expands the system functionality to run video workloads that require a high-performance CPU in a dense, power-efficient, and cost-effective 1U server.

Call to Action

NETINT’s vision to reimagine the live video server based on customer demands resulted in the NETINT Quadra Video Server Ampere Edition in a Supermicro 1U server chassis, unlocking a whole new world of value for customers who need to run video workloads requiring high-performance CPU processing in addition to video transcoding with NETINT’s VPUs.
Alex Liu and Mark Donnigan from NETINT, Sean Varley from Ampere Computing, and Ben Lee from Supermicro have a webinar available on NETINT’s YouTube channel, “How to Build a Live Streaming Server that delivers 300 HD interlaced channels,” which provides additional information. Other video workloads that run well on this server include AI inference processing, which NETINT recently announced and demonstrated at NAB 2024, where NETINT unveiled the Industry-First Automated Subtitling Feature with OpenAI Whisper running on Ampere.

About the Companies

NETINT

Founded in 2015, NETINT’s big dream of combining the benefits of silicon with the quality and flexibility of software for video encoding using proprietary ASICs is now a reality. As the first commercial vendor of video-processing-specific silicon, NETINT pioneered the video processing unit (VPU). Nearly 100,000 NETINT VPUs are deployed globally, processing over 300 billion minutes of video.

Supermicro

Supermicro is a global technology leader committed to delivering first-to-market innovation for Enterprise, Cloud, AI, Metaverse, and 5G Telco/Edge IT Infrastructure, with a focus on environmentally friendly and energy-saving products. Supermicro uses a building-blocks approach that allows combinations of different form factors, making it flexible and adaptable to various customer needs. Their expertise includes system engineering focused on validation, ensuring that all components work together seamlessly to meet expected performance levels. Additionally, they optimize costs through different configurations, including choices in memory, hard drives, and CPUs, which together make a significant difference in the overall solutions Supermicro provides.

Ampere Computing

Ampere is a semiconductor design company for a new era, leading the future of computing with an innovative approach to CPU design focused on high-performance, energy-efficient AI compute.
As a pioneer in the new frontier of energy-efficient high-performance computing, Ampere is part of the Softbank Group of companies, driving sustainable computing for AI, Cloud, and edge applications. For more information, visit amperecomputing.com. To find more information about optimizing your code on Ampere CPUs, check out our tuning guides in the Ampere Developer Center. You can also get updates and links to more great content like this by signing up for our monthly developer newsletter. If you have questions or comments about this case study, there is an entire community of Ampere users and fans ready to answer at the Ampere Developer community. And be sure to subscribe to our YouTube channel for more developer-focused content. Check out the full Ampere article collection here.

By John Oneill
Apache Spark 4.0: What’s New for Data Engineers and ML Developers

Undoubtedly one of the most anticipated updates in the world of big-data engines, the release of Apache Spark 4.0 is a big step forward. According to the release notes, the release closed more than 5,100 tickets, with contributions from over 390 active contributors. For machine learning and data engineering professionals, the new SQL features, additional Python capabilities, streaming state management, and the newly introduced Spark Connect framework in Spark 4.0 further reinforce the trend toward high-performance, easy-to-use, scalable data analytics.

What’s New: Key Highlights for Practitioners

Lightweight Multi-Language Client: Spark Connect

The most significant improvement in Spark 4.0 is the updated Spark Connect client-server framework. There is a new Python client that is only 1.5MB in size. This release also introduces the spark.api.mode config parameter for switching between classic and Connect modes, as well as richer Python and Scala clients and new Go, Swift, and Rust API client implementations. The impact: data engineering teams can now create thinner, more performant client applications, including simple, streamlined Go or Rust applications that query a Spark cluster. This amplifies deployment versatility and enables the use of Spark in microservices and containerized contexts.
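As a sketch of what opting into Connect mode can look like (the host, port, and file path are placeholders; verify property names against the Spark 4.0 configuration docs for your deployment):

```
# conf/spark-defaults.conf: switch the session API to Spark Connect
spark.api.mode    connect

# A thin client then reaches the cluster over gRPC, e.g. from Python:
#   spark = SparkSession.builder.remote("sc://spark-host:15002").getOrCreate()
```

Because the client speaks a stable gRPC protocol rather than embedding the full Spark runtime, the same cluster can serve clients written in any of the supported languages.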
Innovations in SQL Language and Data Types

Spark 4.0 introduces some of its most substantial new features in SQL:

- SQL scripting and session variables let users implement complex SQL logic using local variables and control structures.
- The new PIPE syntax (|>) makes it possible to write SQL statements in a chained, more legible, functional form.
- A new VARIANT data type, tailored for semi-structured data such as JSON and other map-like structures, enhances schema versatility.
- Collation support (accent/case insensitivity, locale-based ordering) improves the treatment of multilingual string data.

Effect

With these additions, a unified data processing engine for working with structured and semi-structured data becomes possible. More SQL capability translates into easier work for data engineers, who can design systems with more direct approaches and fewer configuration workarounds.

Improving the Developer Experience: Python in the Workspace

- Productivity gains for Python programmers.
- Custom batch and streaming connectors implemented in Python.
- User-Defined Table Functions (UDTFs) written in Python whose output can dynamically return different schemas.

Effect

These improvements let ML developers and data scientists spend less time and effort moving from prototype to production, in particular when writing custom connectors and transformations, without having to use Scala or Java.

Advances in Streaming and State Management

Stream processing in Spark 4.0 gains several enhancements:

- The Arbitrary Stateful Processing v2 API (e.g., transformWithState) lets streaming flows manage complex state logic, timers, TTLs, and schema evolution.
- Queryable state and the State Store Data Source expose streaming state as a table, enhancing visibility for debugging.
Effect

Data engineers working on real-time pipelines now have more advanced techniques for building stateful applications and stream processing, particularly in event-driven scenarios.

Migration and Other Considerations

While Spark 4.0 brings many enhancements, migration deserves care:

- ANSI SQL mode is now enabled by default, so previously forgiving behavior (such as silent overflow or lenient null handling) becomes stricter and can break existing jobs.
- The Java 17 runtime is now required, which may require changes to dependencies.
- Because there are new APIs, such as the Python Data Source API, UDTFs, and the VARIANT data type, organizations should first test migration on less critical workloads to detect compatibility problems.

Tip for teams: adopt Spark 4.0 for new workloads first; once the system is stable, monitor its behavior and then migrate older workloads.

Why This Matters for 2025 and Beyond

Spark 4.0 has been released amid several industry shifts:

- Increased demand for all-in-one data platforms (batch + streaming + machine learning).
- Increased use of semi-structured data (JSON logs, variant schemas).
- Increased use of non-JVM languages (Go, Rust) in the big data domain.
- Increased expectations for observability and developer productivity in data engineering.

Given these trends, Spark 4.0 has established itself as an engine primed for data engineering and machine learning workloads. A well-executed upgrade gives enterprises increased developer velocity, observability, and production stability.

Closing Thoughts

For those in charge of data engineering or machine learning operations, Spark 4.0 marks an important point in time.
This is not just about upgrading to larger clusters and faster jobs, but also about improved APIs, expanded programming language support, enhanced SQL interfaces and streaming tools, and better overall usability. Any migration to Spark 4.0 will require careful planning; however, the return on investment includes improved developer productivity, better integration of data engineering and machine learning workflows, and a more future-proof platform. Begin with a sandbox: explore Spark 4.0, test the new SQL and streaming capabilities, and build the business case from there. Spark 4.0 sits at the center of the future of data engineering.
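To close with a concrete taste of the pipe syntax mentioned earlier, here is a small illustrative query (the table and column names are invented; the operator forms follow the Spark 4.0 SQL pipe syntax):

```sql
-- Conventional nested form
SELECT device, COUNT(*) AS events
FROM logs
WHERE level = 'ERROR'
GROUP BY device;

-- Equivalent chained form with the |> pipe operator
FROM logs
|> WHERE level = 'ERROR'
|> AGGREGATE COUNT(*) AS events GROUP BY device;
```

Each pipe stage reads top to bottom in the order it executes, which is often easier to review than deeply nested subqueries.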

By harshraj bhoite
Why Data Engineers Need to Think Like Product Managers

Introduction

Today, the work of a data engineer is more complex than simply building pipelines and platforms. Data engineers are no longer just builders; they are vital parts of value creation in a data-driven organization. However, many engineers still measure success by the number of completed jobs and created tables rather than by their actual worth to the business. This is where a Product Manager (PM) mindset comes in handy. Adopting a product manager’s way of thinking is not about managing Jira boards and marketing roadmaps. It is about treating data assets as products, complete with customers, a lifecycle, and tangible results.

The Shift From Pipelines to Products

For a long time, data engineering teams have been organized by projects. A business request comes in, the engineers create a data pipeline, data is moved to the data warehouse, and the team moves on to the next request. While this way of working produces data outputs, it does not achieve the desired business outcomes. A product-focused mindset instead asks:

- Who are the consumers of this data?
- What are the business benefits?
- What is the rate of adoption, and what is the feedback?
- How do we sustain, evolve, and responsibly retire it?

These questions effectively transform “data pipelines” into data products: authoritative, discoverable, and reusable components designed for specific functions. This is the core concept behind current approaches such as Data Mesh, in which every domain manages its data as a product. However, even in centralized teams, product thinking can significantly improve alignment and quality.

Data as a Product: A Mental Model

Powerful changes happen when engineers treat data as a product.

Enhanced Ownership and Accountability

Every data product has a steward who ensures its accuracy, usefulness, and life cycle. Just like PMs, data engineers are now responsible for the reliability and impact of their products, datasets, and data pipelines.
There is no more “it’s the source team’s fault”; ownership entails stewardship.

Clear Definition of Customers

A product without users is pointless; data without consumers is noise. The users who access your tables (analysts, data scientists, ML teams) determine how to optimally configure schema, freshness, and accessibility. Data engineers who work with those users build better systems and reduce rework.

Usability and Documentation

Consider a dataset a form of API. If it cannot be understood, it is broken. Product-focused data engineers advocate for discoverable, self-contained, and self-documenting datasets, ensuring every table or stream is accompanied by metadata that provides context and purpose.

Feedback and Iteration

Great product managers iterate on the product based on client feedback. Data engineers should do the same. Usage telemetry, Slack feedback, and dashboard performance metrics provide the NPS for data. A dataset that is neither queried nor trusted is feedback, not failure.

Applying Product Thinking to Data Engineering Workflows

Defining Success Metrics

A product manager defines success in terms of outcomes, such as user engagement and retention. In a similar manner, data engineers can track:

- Adoption: number of consumers, queries per dataset
- Reliability: SLAs, number of incidents resolved
- Value: reports or ML models powered, business KPIs influenced

Success is when a decision is made based on a dataset, not when a job finishes.

Roadmaps, Not Backlogs

Many data teams spend far too long in reactive mode, executing ad hoc requests with little consideration of long-term value. Product thinking introduces roadmaps, which balance innovation and maintenance.
What happens now, next, and later:

- Now: critical fixes concerning data ingestion and quality.
- Next: tiered data models alongside frequently used, reusable features.
- Later: new heuristics for predictive insights and experimentation pipelines.

This approach to allocating time and resources informs stakeholders of the team’s focus and guides the team’s efforts toward planned outcomes.

Data Quality as a User Experience Issue

Users drop buggy and broken products. The same is true for data that is considered unreliable. Data engineers who wear PM hats treat data quality as user experience. Missing values, inconsistent keys, and poor freshness make data untrustworthy and not worth the trouble. “Data UX principles” help here:

- Flag upstream data issues early with unit tests for data.
- Make data health dashboards available for problem areas.
- Communicate changes in a version-controlled manner.

Trustworthy data instills confidence in its users, which helps build products that last.

Collaboration Versus Working Alone

In most organizations, engineers work with little interaction with analysts or business users. PM-style thinking, however, brings different functions together. A data engineer should spend time answering questions such as:

- Which KPIs are critical for marketing and finance?
- How is data accessed and used (dashboards, APIs, machine learning models)?
- What are the pain points for data users?

That understanding allows engineers to create systems that address genuine problems rather than systems that merely move data.

Dataset Deprecation and Lifecycle Management

Product managers sunset features responsibly. Data engineers should do the same. Every dataset follows a lifecycle.
That lifecycle runs from creation through maintenance to, finally, deprecation. Skipping the last stage to “play it safe” just keeps deprecated tables around, which clutters data catalogs and makes accessing data confusing for consumers. Establish deprecation timelines, along with notices regarding versioning and migration. A product that is well packaged and maintained beats one assembled in a hurry and constantly patched.

The Mindset Change: From Builders to Enablers

This change is the most difficult one, not because of technology, but because of attitude. Engineers need to shift from an execution mindset (“build what’s asked”) to an outcome mindset (“deliver what’s needed”). Leaders can help by:

- Measuring engineers on business outcomes instead of counting pipelines.
- Promoting direct communication with data users.
- Supporting teams that create data assets that are self-serve, registered, and easy to discover, so they can be reused.

Data product thinking is what distinguishes mature data organizations from reactive ones.

Why This Matters in 2026

Data ecosystems are growing exponentially. In the absence of structured thinking, organizations devolve into data sprawl, with tables and documentation duplicated and trust rapidly eroded. Thinking like a product manager brings:

- Responsibility: each dataset has a defined purpose and a person who governs it; ownership is instilled.
- Reliability: consumers know what to expect.
- Reuse: teams discover existing assets instead of rebuilding them.
- Value: data products translate into measurable organizational value.

This is the evolution of data engineering: from building pipelines to delivering value at a larger scale.
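The adoption and deprecation signals described in this article are cheap to compute from a warehouse’s query history. The following is a minimal, hypothetical Python sketch (the log format, dataset names, and 180-day threshold are invented for illustration):

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical query-log records: (dataset, user, query timestamp).
QUERY_LOG = [
    ("sales.orders", "analyst_a", datetime(2026, 1, 10)),
    ("sales.orders", "ml_team",   datetime(2026, 1, 11)),
    ("sales.orders", "analyst_a", datetime(2026, 1, 12)),
    ("legacy.tmp_v2", "analyst_b", datetime(2025, 6, 1)),
]

def adoption_metrics(log):
    """Queries and distinct consumers per dataset: the 'NPS for data'."""
    queries, users = defaultdict(int), defaultdict(set)
    for dataset, user, _ in log:
        queries[dataset] += 1
        users[dataset].add(user)
    return {d: {"queries": queries[d], "consumers": len(users[d])}
            for d in queries}

def deprecation_candidates(log, now, max_age_days=180):
    """Datasets with no recent queries are feedback, not failure."""
    last_seen = {}
    for dataset, _, ts in log:
        last_seen[dataset] = max(last_seen.get(dataset, ts), ts)
    cutoff = now - timedelta(days=max_age_days)
    return sorted(d for d, ts in last_seen.items() if ts < cutoff)

now = datetime(2026, 1, 15)
print(adoption_metrics(QUERY_LOG)["sales.orders"])  # {'queries': 3, 'consumers': 2}
print(deprecation_candidates(QUERY_LOG, now))       # ['legacy.tmp_v2']
```

Even a report this simple turns “who uses this table?” and “what can we retire?” from guesswork into a weekly, reviewable artifact.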
Final Thoughts

Data engineering is not defined by cluster size or the number of jobs processed. It is defined by measurable business impact. When data engineers think like product managers, they become value architects rather than pipeline operators. In 2026, the most accomplished data engineers will do more than deliver tables; they will deliver outcomes. Success shifts from the bulk of data processed to trust earned and decisions improved.

By harshraj bhoite
Building Your Own Ledger: A Solo Developer's Answer to Time and Money Chaos

Let’s talk about the parts of our job that they don’t teach in tutorials. You’ve crushed the sprint, architected a beautiful solution, and pushed clean code. Then, the real-world complication hits: untangling the spreadsheet to figure out how many hours that last feature actually took, or manually building an invoice from a chaotic mix of calendar events, timer logs, and scribbled notes. For developers trading time for money (freelancers, contractors, and consultants), this administrative tax is a constant drain on focus and a direct hit to profitability. I built my career on this model. And for over a decade, I accepted the friction. My "system" was a Frankenstein's monster of a time-tracker tab, a project management tab, a calendar tab, and a spreadsheet tab. Data lived in silos. "Billable hours" were an estimate, reconstructed weekly with forensic effort. I kept thinking, "Someone must have solved this." The tools I found fell into two camps: overly simplistic stopwatches that gave me numbers without context, or complex, expensive enterprise platforms with features I didn't need and a price tag that hurt my solo-dev sensibility. So, I did what developers do: I started building a solution for myself, not for a venture capital pitch, but for my own sanity. Three years of nights and weekends later, that solution has become Ceki. It’s a focused platform with a clear premise: to give developers a single, transparent source of truth for where their time goes and what it’s worth, without taking a cut of their earnings.

The Core Problem: Time and Money Are Two Sides of the Same Coin (But Our Tools Treat Them Separately)

The fundamental flaw in most workflows is the disconnect between effort and commerce.
- Time tracking happens in one app (Toggl, Clockify).
- Project and task management might happen in another (Jira, Trello, GitHub Projects).
- Scheduling and client communication live in email and Calendly.
- Invoicing and payment are handled by a fourth service (FreshBooks, PayPal).

You are forced to be the integration layer. You manually translate "4.5 hours on Jira ticket PROJ-123" into a line item on an invoice. This process is error-prone, time-consuming, and mentally exhausting. It creates blind spots: Did you go over budget on that client's feature? Are you on track to hit your revenue goal this month? The answers require a manual data synthesis you'll likely postpone.

The Solo-Dev Built Solution: Principles Over Bloat

Building this for myself meant I could ignore trends and focus on utility. The guiding principles were non-negotiable:

- It must be free for core use. The platform should never be a financial burden on the very people it's designed to help. If you bill by the hour, you shouldn't pay a monthly fee just to count those hours and bill for them. Your money is for you.
- Time must be linked to value immediately. Every logged hour shouldn't just be a number; it must be intrinsically tied to a specific project and a financial agreement (a contract, a budget). This creates a direct line from work to revenue.
- Transparency is the default. The developer and the client (if shared) should have a clear, unambiguous view of progress against the scope. This builds trust and eliminates difficult conversations about overages.
- It must respect focus. The tool should get out of the way. No noisy notifications, no complex setup. A timer that’s one click away, and a dashboard that gives the crucial facts at a glance.

Under the Hood: A Stack Chosen for Clarity, Not Hype

When you're building alone, your stack is your foundation and your constraint. I chose technologies that prioritize developer experience and long-term maintainability:

- Backend: Laravel (PHP).
Its elegant syntax, built-in features for auth, APIs, and data handling, and robust ecosystem let me move fast without sacrificing structure. Eloquent ORM makes the complex relationships between users, projects, contracts, and time entries a joy to manage.
- Frontend: Vue.js with Quasar Framework. Vue's component-based reactivity is a perfect fit for a dynamic, dashboard-heavy application. Quasar provided a comprehensive set of pre-built, responsive UI components, allowing me to build a consistent, app-like interface without becoming a full-time CSS architect.
- Database: PostgreSQL. For anything involving financial data and complex relationships, its reliability and advanced features (like proper JSON support) are worth their weight in gold.

This stack isn't about being the coolest; it's about being effective. It allowed a solo developer to build, test, and iterate a secure, fully-featured application.

How It Works: Closing the Loop Between Coding and Getting Paid

Let’s translate principles into practice. Here’s a concrete workflow for a freelance developer:

1. The Agreement Is the Foundation

Instead of starting with a task list, you start with a Contract. You create a contract with your client for "E-commerce API Development - Phase 1." You define a budget: not just a dollar amount, but a time budget (e.g., 25 hours). This is your shared scope anchor.

2. Work Is Measured Against the Budget

You begin work. Your integrated time tracker isn't a standalone app; it's part of the contract. You start the timer or log time directly to the "Add Payment Processing Endpoint" task. Instantly, your dashboard updates: "Contract Budget: 7.5h used of 25h. Remaining: 17.5h." You're no longer just tracking time; you're tracking burn rate against a defined scope.

3. Invoicing Is a Reporting Function, Not a Chore

The contract period ends. To invoice, you don't compile data.
You simply review the contract dashboard, which now has a complete, immutable log of all tracked time entries, each with a description. You click "Generate Invoice." The system creates a PDF with all entries, totals the hours, applies the agreed rate (which is stored in the contract), and presents the total. You send it. The link between work done and money owed is explicit and unquestionable.

4. Your Profile Becomes Your Verifiable Track Record

Completed contracts (with your client's permission to share) can contribute to a public profile. This isn't a self-written bio; it's a ledger of delivered value: "Built X for Client Y, within a 25-hour budget." For finding new work, this verifiable proof-of-work is more powerful than any resume.

The Business Model: Why "Free" Is Sustainable

The most common question is: "If it's free, how does it last?" This is a core philosophical choice. The platform is built with an "open-core" mentality for the indie developer and small team.

- The core is permanently free: Time tracking, contract/budget management, invoicing, and public profiles will always be free for individual users and small teams. This solves the fundamental pain point without a barrier.
- Sustainability through value-add, not restriction: Future development will focus on advanced features that larger teams or agencies would pay for: enhanced team role management, deep analytics and forecasting reports, white-labeling for agencies, and premium integrations. The free core remains intact and fully functional.

The mission is to build an essential tool for the community, not to extract maximum revenue from individual freelancers. Its success is tied to the success of its users.
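The contract-centric model described in the workflow can be captured in a handful of classes. This is an illustrative sketch of the idea, not Ceki's actual implementation; the budget and task names come from the worked example above, and the hourly rate is invented:

```python
from dataclasses import dataclass, field

@dataclass
class TimeEntry:
    description: str
    hours: float

@dataclass
class Contract:
    """The agreement is the foundation: scope, time budget, and rate."""
    name: str
    budget_hours: float
    hourly_rate: float
    entries: list = field(default_factory=list)

    def log(self, description: str, hours: float) -> None:
        """Time is logged against the contract, never in a silo."""
        self.entries.append(TimeEntry(description, hours))

    @property
    def hours_used(self) -> float:
        return sum(e.hours for e in self.entries)

    @property
    def hours_remaining(self) -> float:
        return self.budget_hours - self.hours_used

    def invoice_total(self) -> float:
        """Invoicing is a reporting function: logged hours x agreed rate."""
        return round(self.hours_used * self.hourly_rate, 2)

c = Contract("E-commerce API Development - Phase 1",
             budget_hours=25, hourly_rate=90)  # rate is a made-up example
c.log("Add Payment Processing Endpoint", 7.5)
print(f"{c.hours_used}h used of {c.budget_hours}h, "
      f"remaining {c.hours_remaining}h")       # 7.5h used of 25h, remaining 17.5h
print(c.invoice_total())                       # 675.0
```

Because every entry lives inside the contract, the burn-rate dashboard and the invoice are just two views over the same ledger; there is nothing to reconcile at month's end.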
For the Skeptical Developer: A Practical Challenge

If you're billing by the hour, try this audit for one week:

- Note the time you spend switching contexts to start/stop a timer.
- Calculate the time you spend at the end of the week or month collating time entries from different sources.
- Add the mental overhead of wondering if your tracked time aligns with the project's budget.

That’s pure overhead. That’s time not spent coding, learning, or even resting. The promise of a unified system isn't about features; it's about reclaiming that cognitive load and turning it back into productive capacity or personal time.

Conclusion: Taking Ownership of Your Professional Infrastructure

As developers, we build automation for our clients every day. It’s time we applied that same principle to the business of our own work. You shouldn't need a suite of expensive apps and manual processes to understand the basic equation of your professional life: Time Invested = Value Delivered = Revenue Earned. Ceki is my attempt to build that automation for myself and share it. It’s a solo developer’s stack applied to the solo developer’s business problem. It’s not about managing you; it’s about giving you the ledger so you can manage your work and your worth with clarity and confidence. The tool is live, it’s free, and it’s built by someone who faces the exact same challenges you do. You can find it at https://ceki.me. Stop integrating disparate tools in your head. Start building with a foundation that connects your effort to your outcome, from the first commit to the final invoice.

By Konstantin Om
How Does a Scrum Master Improve the Productivity of the Development Team?

The role of a Scrum Master is to establish Scrum, and the Scrum Master is accountable for the Scrum Team’s effectiveness. Thus, it is tempting to ask how a Scrum Master can help improve the productivity of the development team. But in a complex working environment like software development, productivity is often not the right measure for the knowledge work software developers do. In simple working environments, productivity means a ratio of output to input. The traditional idea is to know how much is achieved (output) with a given amount of resources (input), largely in numbers, and the focus is on maximizing the output. That is why, in traditional project management of software development projects, stakeholders evaluate the development team’s productivity by lines of code. Even today in Agile project management, stakeholders with a traditional mindset ask for the number of story points per iteration, known as sprint velocity. But productivity in a complex environment like Agile software development is not linear. Factors like customer satisfaction, business value, and project success matter more than working at the highest efficiency. If software does not deliver the intended business value or solve the actual customer problems, there is no use in building it fast; it is a waste of time and money. That said, it does not mean there are no opportunities to improve productivity. There are operational inefficiencies that can hinder productivity, and it is the responsibility of the Scrum Master to address them, because the actions of a Scrum Master have a direct impact on the team’s efficient functioning. In this post, we will look at the four primary ways a Scrum Master can help improve the productivity of the Scrum Team.
I would rather call them ways to ‘improve effectiveness,’ because we also have to ensure the development team delivers software of the highest business value and customer satisfaction in the most effective way.

Four Ways a Scrum Master Improves Development Team Productivity

Here are four ways a Scrum Master can contribute:

1. Facilitating Scrum

Each Scrum event (Sprint, Sprint Planning, Daily Scrum, Sprint Review, and Sprint Retrospective) has a purpose. The official Scrum Guide says, “Each event in Scrum is a formal opportunity to inspect and adapt. Events are used in Scrum to minimize the need for meetings not defined in Scrum.” And it is true. Modern-day complexities in software development, such as customer-centric product development, changing market trends, and competitors' developments, require continuous collaboration among developers, stakeholders, and product owners to inspect and adapt. Too many meetings can hinder the productivity of the developers. By facilitating each Scrum event at the right time and in the right order, the Scrum Master eliminates unnecessary meetings, ensuring the team communicates, inspects, and adapts at the right time to produce the most valuable work. These Scrum events also provide an opportunity to address a team’s operational inefficiencies, resulting in improved productivity. Let’s understand this with an example. The purpose of Sprint Planning is to bring clarity and consensus on what needs to be done by the development team. It must happen at the beginning of the sprint to ensure everyone has a mutually agreed, shared understanding of the Definition of Done (DoD), the Product Goal, the Sprint Goal, the increments to be delivered, and external dependencies. The Scrum Master ensures that all key participants (Product Owner, Developers, and Scrum Master) are present at the Sprint Planning meeting and that their concerns are addressed.
Similarly, for each Scrum ceremony — the Daily Scrum, Sprint Review, and Sprint Retrospective — the Scrum Master ensures it serves its intended purpose. By facilitating these events, the Scrum Master ensures the team works effectively, improving productivity while delivering the most valuable work.

2. Removing Impediments

Scrum focuses on getting feedback from customers early and often. That is why Sprints are short. If there are blockers, obstacles, or other impediments to obtaining early and frequent customer feedback, it is the Scrum Master's responsibility to remove them.

As an example of an impediment, consider a deployment of the Increment that is delayed by external dependencies, such as a bureaucratic deployment process or complex dependency chains with other teams. This delays the customer feedback that could drive improvements in the next Sprint. It is the Scrum Master's responsibility to streamline the deployment process, remove the blockers, and get customer feedback early. This is just one example; hindrances can be anything from an unclear Definition of Done to poor story-point estimates, a lack of required technical resources, or constant context switching.

3. Empowering the Team to Be Self-Organizing

A Scrum Team is a self-organizing team. The developers are the ones who decide:

- What work to do?
- When to do the work?
- How to do the work?
- How do engineers, designers, and testing experts work together?
- Who does the work?
- What technologies to use?
- What architecture and UX to use?

Even the Scrum Master does not dictate how the development team organizes, plans, and manages its work.
The 11th principle of the Agile Manifesto says, "The best architectures, requirements, and designs emerge from self-organizing teams." However, it is definitely the Scrum Master's responsibility to coach the development team in self-organization and cross-functionality. The Scrum Master has to ensure the team collaborates effectively and stays accountable. To achieve this, the Scrum Master can create an environment that fosters open collaboration, where the Scrum Team solves problems independently and feels psychologically safe and encouraged to contribute. This autonomy and accountability remove operational inefficiencies and promote faster decision-making. If the team needs resources, guidance, or other support, the Scrum Master is there as a facilitator and servant leader to provide what the team needs to function optimally. The Scrum Master's involvement varies with the Scrum Team's experience; the best Scrum Teams are expected to self-organize, plan, and identify, adapt to, and resolve their own impediments. It is this fine balance of authority and autonomy that a Scrum Master needs to master.

4. Removing Barriers Between Stakeholders and the Scrum Team

Software development does not go as smoothly as it appears on paper. It is challenging to bring all the stakeholders onto the same page. That is exactly why the Scrum Team has a Scrum Master. They bridge the gap between the Scrum Team, the Product Owner, and the organization. The Scrum Master facilitates collaboration among stakeholders as requested or needed and helps them understand the complexities of each other's work. This improves the flow of work by addressing complex issues, securing necessary resources, and bringing clarity to priorities, needs, and expectations.

Conclusion

Productivity is not the goal of the Scrum Team; effectiveness is.
Ultimately, nothing is more wasteful than building software that no one wants. And undoubtedly, a Scrum Master's actions have a direct impact on the team's productivity, efficiency, and effectiveness. By leading the team in Scrum, addressing operational inefficiencies, and facilitating collaboration among stakeholders, a Scrum Master can help improve the productivity of the development team.

By Sandeep Kashyap
AI Code Generation: The Productivity Paradox in Software Development

Measuring and improving developer productivity has long been a complex and contentious topic in software engineering. With the rapid rise of AI across nearly every domain, it's only natural that the impact of AI tooling on developer productivity has become a focal point of renewed debate. A widely held belief suggests that AI could either render developers obsolete or dramatically boost their productivity, depending on whom you ask. Numerous claims from organizations linking layoffs directly to AI adoption have further intensified this perception, casting AI as both a disruptor and a catalyst. In this article, we'll examine the current landscape and delve into recent studies and surveys that investigate how AI is truly influencing developer productivity.

Studies

Let's explore the findings from the studies below, which assess the impact of AI tooling on developer productivity.

Study #1: Experienced Open-Source Developer Productivity

To evaluate the impact of AI coding assistants on the productivity of experienced open-source developers, a randomized controlled trial (RCT) was conducted from February to June 2025. A total of 16 developers with an average of 5 years of experience completed 246 tasks in mature projects. Tasks were randomly assigned to developers with AI tools either allowed or disallowed. Before starting, developers forecast that AI would reduce task completion time by 24%; after completing the tasks, they estimated a 20% reduction. On the contrary, the study found that allowing AI actually increased task completion time by 19%. Moreover, these results stand in stark contrast to experts' predictions of completion-time reductions of up to ~39%. Below is a summary of the mismatch between predictions and findings: experts and study participants alike misjudged the speedup from AI tooling.
Image courtesy of respective research.

Although the study concludes that AI tooling slowed developers down, this could be due to a variety of factors; five key factors behind the observed slowdown are listed below:

- Over-optimism about AI usefulness (direct productivity loss). Developers are free to use AI tools as they see fit, but their belief that AI boosts productivity is often overly optimistic. They estimate a 20–24% time reduction from AI, even when the actual impact may be neutral or negative, potentially leading to overuse.
- High developer familiarity with repositories (raises developer performance). AI assistance tends to be less helpful, and may even slow developers down, on tasks where they have high prior experience and need fewer external resources. Developers report AI as more beneficial for unfamiliar tasks, suggesting its value lies in bridging knowledge gaps rather than enhancing expert workflows.
- Large and complex repositories (limits AI performance). Developers report that LLM tools struggle in complex environments, often introducing errors during large-scale edits. This aligns with findings that AI performs worse in mature, large codebases than in simpler, greenfield projects.
- Low AI reliability (limits AI performance). Developers accept less than ~44% of AI-generated code, often spending significant time reviewing, editing, or discarding it. Even accepted outputs require cleanup, with ~75% of developers reading every line and ~56% making major changes, leading to notable productivity loss.
- Implicit repository context (limits AI performance, raises developer performance). AI tools often struggle to assist effectively in mature codebases because they lack the developers' tacit, undocumented knowledge. This gap leads to less relevant suggestions, especially in nuanced cases like backward compatibility or context-specific edits.
Due to these factors, the gains from auto-generated code are considerably offset, exposing the significant contrast between perceived/forecast and actual developer productivity. With AI tooling, the developer must also spend additional time prompting, reviewing AI-generated suggestions, and integrating code outputs into complex codebases, adding to the overall completion time. See below for average time spent per activity, with and without AI tooling.

Average time spent per activity. Image courtesy of respective research.

Takeaway: The study reveals a perception gap where AI usage subtly hampers productivity, despite users believing otherwise. While the findings show a slowdown in large, complex codebases, the researchers caution against broad conclusions and emphasize the need for rigorous evaluation as AI tools and techniques continue to evolve. The study should therefore be treated as a data point in the evaluation, not a verdict.

Study #2: GitClear

The GitClear study analyzed ~211 million structured code changes from 2020 to 2024 to assess how AI-assisted coding impacts developer productivity. It categorized changes — added, moved, copied/pasted, and churned lines — using GitClear's Diff Delta model to track short-term velocity versus long-term maintainability. Duplicate block detection was introduced to measure how often AI-generated code repeats existing logic. The methodology links rising output metrics to declining code reuse, revealing hidden costs in perceived productivity gains. Below is the trend of code operations and code churn by year as cited in the report.

GitClear AI Code Quality Research — code operations and code churn by year. Image courtesy of respective research.

The following points can be inferred from the study:

- Increased code output. AI-assisted development led to a significant rise in the number of lines added, up 9.2% YoY in 2024. This could be perceived as an increase in developer productivity through faster code generation and higher task (ticket) completion throughput. However, the key question remains: were the added lines of code required in the first place?
- Decline in refactoring ("moved" code). "Moved" lines — an indicator of refactoring — dropped nearly 40% YoY in 2024, falling below 10% for the first time. This can be attributed to developers accepting AI-generated code as-is and skipping the refactoring effort to save time. Moreover, AI tools rarely suggest refactoring due to limited context windows, which fuels the overall drop.
- Surge in copy-pasted and duplicated code. Copy/pasted lines exceeded moved lines in 2024, with a 17.1% YoY increase. Commits with duplicated blocks (≥5 lines) rose 8x in 2024 compared to 2022; 6.66% of commits now contain such blocks. This, too, can be attributed to developers accepting AI-generated code as-is without much effort to keep the code DRY.
- Increased churn in newly added code. Churn — code revised within 2–4 weeks — increased 20–25% in 2024, i.e., developers are revisiting new code more frequently. This implies that although code output surged with AI tooling, its lower quality means code is revised sooner than it was when no or limited AI tooling was used.

Takeaway: The rise in AI-generated code has led to a parallel increase in copy-pasted fragments, duplication, and churn, while refactoring efforts have notably declined. This trend signals a deterioration in overall code quality. Many organizations still gauge developer productivity by metrics like lines of code added or tasks completed. However, these indicators can be easily inflated by AI, often at the expense of long-term maintainability. The result is bloated codebases with higher duplication, reduced clarity, and an expanded surface area for bugs.
While AI may boost short-term development velocity, the trade-off is accumulating technical debt and diminished code quality — costs that will surface over time as increased maintenance burden and reduced agility.

Surveys

While studies often rely on data-driven methodologies, these approaches can sometimes be questioned for their assumptions or limitations. Surveys, on the other hand, offer direct insight into developer sentiment and can help bridge gaps that traditional studies might overlook. In the sections below, we explore findings from independent surveys that assess the impact of AI tools on developer productivity.

Survey #1: StackOverflow

In its 2025 annual developer survey, Stack Overflow received over 49k responses covering various topics, including AI tooling and its impact. (I was one of the respondents.) Among respondents, overall AI tool usage surged to ~84% from ~76% the previous year. Positive sentiment toward AI tools, however, dropped by ~10 percentage points, signaling a trust deficit among developers — more on this later.

AI tools usage and sentiment. Image courtesy of respective survey results.

Among the respondents, ~46% actively distrust the accuracy of AI tools. Moreover, ~66% said that AI tool solutions are not up to the mark, and ~45% said these solutions require additional debugging time. This clearly means developers need additional effort to understand, debug, and refine AI-generated code, effectively increasing overall task completion time. Although trust in AI tools' ability to handle complex tasks rose by ~6 percentage points, this could reflect genuine tool improvements, or it could be that distrustful developers simply avoid AI tools for complex tasks altogether, given the quality risks, and so report fewer bad experiences.
Given the significant trust deficit in the accuracy of AI tools, the decline in positive sentiment seen in the previous section could well be related.

Trust in AI tools' accuracy and ability to handle complex tasks. Image courtesy of respective survey results.

Frustrations with AI tools, and humans as the ultimate arbiters of quality and correctness. Image courtesy of respective survey results.

Even though AI agent adoption isn't yet mainstream, more than half of the respondents (~52%) cited productivity gains. AI agents could be a space worth watching: they are relatively new, so substantial enhancements could follow in the coming years, and given the contextual information they use to generate code, they look promising compared to simpler AI tools.

AI agents and impact on work productivity. Image courtesy of respective survey results.

Takeaway: The survey revealed a sharp rise in AI tool adoption accompanied by a notable drop in positive sentiment, highlighting a growing trust deficit. A majority of respondents expressed active distrust in AI tool accuracy due to subpar solutions, suggesting that AI-generated code often demands extra effort to refine and validate. This offsets the productivity gain from faster code generation. Interestingly, trust in AI tools' ability to handle complex tasks rose, reflecting cautious optimism rather than full confidence. Developers still see themselves as the ultimate judges of code quality, reinforcing the need for human oversight. Meanwhile, AI agents — though not yet widely adopted — show early promise. Their use of contextual information positions them as a potentially more reliable and efficient evolution of current AI tooling.

Survey #2: Harness

Harness surveyed 500 engineering leaders and practitioners to assess various parameters, including the impact of AI on developer productivity.
Although the surveyed participants showed overall positive sentiment toward AI tooling and its adoption, 92% also highlighted the associated risks. An independent, related observation corroborates these risks.

AI missteps and impact radius. Image courtesy: https://martinfowler.com/articles/exploring-gen-ai/13-role-of-developer-skills.html

Almost two-thirds of respondents said they spend more time debugging AI-generated code and/or resolving security vulnerabilities. AI tooling may also generate code that includes outdated dependencies or insecure coding patterns, requiring developers to spend time updating and patching these vulnerabilities. This significantly increases developer overhead and potentially offsets a considerable part of the productivity gains from AI tooling.

Two-thirds of respondents require more time debugging AI-generated code and/or resolving security vulnerabilities. Image courtesy of respective survey results.

About 59% of developers experience deployment problems when AI tooling is involved, and nearly half say that rework or additional effort offsets the gains.

59% of developers experience deployment problems with AI tooling involved. Image courtesy of respective survey results.

Since 60% of the respondents don't evaluate the effectiveness of the tools, it's quite challenging to relate AI tooling to developer productivity at all.

60% of respondents don't evaluate the effectiveness of AI tooling. Image courtesy of respective survey results.

Takeaway: The survey reveals a nuanced picture of AI's impact on developer productivity. Most respondents expressed optimism about AI tooling but also flagged significant risks. Notably, the majority reported spending more time debugging AI-generated code and addressing security vulnerabilities, contradicting the assumption that AI always boosts efficiency. Deployment issues further compound the overhead, with many encountering frequent rework.
The lack of tool-effectiveness evaluation by many respondents underscores the challenge of accurately measuring productivity gains. Overall, the findings highlight that AI adoption demands careful oversight to avoid offsetting its intended benefits.

Conclusion

The studies and surveys analyzed paint a complex picture of AI's role in software development, revealing that perceived productivity gains often mask deeper issues. While AI tools may accelerate coding tasks, they also introduce duplication, churn, and technical debt — especially in large codebases — undermining long-term maintainability. Trust in AI-generated code remains fragile, with developers frequently needing to debug and refine outputs. This erodes efficiency, offsets gains from faster code generation, and highlights the importance of human oversight. Crucially, coding represents only a fraction of the overall software delivery cycle; improvements in cycle time don't necessarily translate to gains in lead time. Sustainable productivity demands more than speed. It requires thoughtful architecture, strategic reuse, and vigilant monitoring of maintainability metrics. In essence, AI can be a powerful accelerator, but without deliberate human intervention, its benefits risk being short-lived.

References and Further Reads

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
GitClear Code Quality Study — 2024 | 2025
Harness — State of Software Delivery
SO Developer Survey 2025
Role of Developer Skills in Agentic Coding

By Ammar Husain
Improving Developer Productivity With End-to-End GenAI Enablement

This is a very common scenario that every developer can relate to — I am focused on a feature, and suddenly my project buddy requests a PR review or asks for help when a test case is failing. Now, I need to context-switch to help my buddy, or the code review will be delayed. Every engineering team faces the same bottlenecks — context switching, boilerplate work, delayed code reviews, and slow onboarding. The goal is to improve developer enablement and boost productivity through automation.

Generative AI amplifies that goal. From writing user stories to generating test cases, GenAI can automate repetitive tasks and provide real-time guidance. But the challenge is to connect all those capabilities cohesively rather than treat them as isolated tools. Let's build a centralized GenAI-Driven Developer Enablement Hub that connects developers, their codebases, and AI models to accelerate delivery. Think of it as a platform that plugs into your IDEs, CI/CD pipelines, and documentation systems locally.

Component Context Overview

This hub acts as an internal service that consumes your development context (requirements, code, design docs, tickets) and exposes GenAI capabilities through APIs. Developers interact through IDEs, local hooks, or CI/CD integrations.

Context Manager: Indexing and Retrieval

Embeddings and retrieval are required to make GenAI aware of the internal codebase.
```python
from sentence_transformers import SentenceTransformer
import faiss, os, json, numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')
index = faiss.IndexFlatL2(384)

def build_index(repo_path):
    docs, vectors = [], []
    for root, _, files in os.walk(repo_path):
        for f in files:
            if f.endswith('.py'):
                with open(os.path.join(root, f)) as code_file:
                    content = code_file.read()
                docs.append(content)
                vectors.append(model.encode(content))
    index.add(np.array(vectors))
    with open('context_store.json', 'w') as f:
        json.dump(docs, f)

build_index('/my/code/repo')
```

This code snippet builds a searchable index of the internal code repo so GenAI can pull relevant snippets for context when answering developer prompts.

GenAI Orchestrator: Local and Cloud Model Integration

We will run a small LLM locally and switch to a cloud model when complexity and requirements exceed the local model's limits. We will use FastAPI for API development.

```python
from fastapi import FastAPI, Request
import requests, subprocess, os

# FastAPI-based service
app = FastAPI()

@app.post("/generate")
async def generate(req: Request):
    body = await req.json()
    prompt = body["prompt"]
    local = subprocess.run(["ollama", "run", "mistral", prompt],
                           capture_output=True, text=True)
    if len(local.stdout) < 200:  # switch to cloud if the local response is too short
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.getenv('OPENAI_KEY')}"},
            json={"model": "gpt-4o-mini",
                  "messages": [{"role": "user", "content": prompt}]})
        return {"result": response.json()["choices"][0]["message"]["content"]}
    return {"result": local.stdout}
```

Workflow API: Story, Code, Test Case, Documentation Generator

We will have multiple API endpoints to perform different tasks.
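To see what retrieval looks like at query time, here is a minimal, dependency-free sketch of the underlying idea, using made-up three-dimensional vectors and file names in place of real embeddings. With the FAISS index above, you would instead encode the query with the same SentenceTransformer model and call `index.search`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" standing in for model.encode() output (illustrative only)
docs = {
    "auth.py":    [0.9, 0.1, 0.0],
    "billing.py": [0.1, 0.8, 0.3],
    "utils.py":   [0.2, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Rank stored documents by similarity to the query vector, return top k."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.15, 0.05]))  # → ['auth.py']
```

The retrieved snippets are what gets prepended to the developer's prompt, which is how the hub grounds the model's answers in the internal codebase.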
- /storygen – generates a developer-friendly plan of action based on the story details sent as the prompt.
- /review – reviews the code for bugs and best practices.
- /testgen – generates test cases based on the code snippet sent as the prompt.
- /docgen – generates documentation for the code.

```python
# FastAPI's Request cannot be instantiated from a plain dict, so instead of
# constructing fake Request objects, each endpoint forwards its prompt to the
# /generate endpoint of the same service.
def call_generate(prompt: str):
    r = requests.post("http://localhost:8000/generate", json={"prompt": prompt})
    return r.json()

@app.post("/storygen")
async def storygen(req: Request):
    data = await req.json()
    story = data["story"]
    prompt = f"Convert this Jira story into a dev plan with modules, classes, and test strategy:\n{story}"
    return call_generate(prompt)

@app.post("/review")
async def review(req: Request):
    data = await req.json()
    code = data["code"]
    prompt = f"Review this code for bugs and best practices:\n{code}"
    return call_generate(prompt)

@app.post("/testgen")
async def testgen(req: Request):
    data = await req.json()
    code = data["code"]
    prompt = f"Generate pytest cases for the following code:\n{code}"
    return call_generate(prompt)

@app.post("/docgen")
async def docgen(req: Request):
    data = await req.json()
    code = data["code"]
    prompt = f"Generate concise markdown documentation from the following code:\n{code}"
    return call_generate(prompt)
```

Governance Layer: Access Control, Security, Audit

To run this developer enablement hub locally, this section can be skipped. But to build a production-grade enablement hub, we need to think about governance, security, access control, and audit:

- Prompt logging: Log each request (prompt + model response) for audit and improvement.
- Access control: Restrict model endpoints by role (developer, reviewer, admin).
- Local-first privacy: Use local inference (Ollama, LM Studio) for proprietary code.
- Secure secrets: Store API keys and embeddings in Vault/Secrets Manager.
- Feedback loop: Capture developer thumbs-up/down for continuous fine-tuning.
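As a sketch of the prompt-logging point above, an audit trail can be as simple as appending one JSON record per prompt/response pair to an append-only JSON Lines file. The file name and record fields here are illustrative, not a prescribed schema:

```python
import json, time, hashlib

AUDIT_LOG = "genai_audit.jsonl"  # hypothetical log location

def log_interaction(user, role, prompt, response, path=AUDIT_LOG):
    """Append one audit record per prompt/response pair (JSON Lines format)."""
    record = {
        "ts": time.time(),
        "user": user,
        "role": role,  # supports later role-based access reviews
        # A hash makes the log searchable/deduplicable without relying on raw text
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,
        "response": response,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

rec = log_interaction("dev1", "developer", "Review this code ...", "Looks fine.")
```

In practice, you would wire something like this into the /generate endpoint as middleware, redact PII before writing, and ship the file to your central log store.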
Integration With Developer Workflows

Now it's time to integrate the API endpoints into different workflows, for example, a pre-commit hook for test generation, CI/CD integration, and VS Code integration.

Local Pre-Commit Hooks for Test Generation

To align with shift-left testing practice, tests should be generated before the PR is raised. GenAI-generated tests are treated as suggestions whose accuracy and coverage developers need to validate. This is the high-level flow:

1. Developers implement a feature.
2. The pre-commit hook runs /testgen and creates or refreshes tests.
3. Generated tests are written to a temporary folder (tests/generated/).
4. Pytest runs automatically but does not block the commit unless critical failures occur, and it provides a coverage report.
5. CI/CD revalidates tests on PR merge.

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: genai-testgen
        name: Tests suggestion
        entry: bash -c 'python scripts/gen_tests.py || true'
        language: system
        pass_filenames: false
      - id: run-pytest
        name: Run existing unit tests and show coverage
        entry: bash -c 'pytest -q && coverage run -m pytest > /dev/null && coverage report | grep TOTAL | awk "{print \"Current coverage:\", $4}"'
        language: system
        pass_filenames: false
```

If coverage drops below the threshold (e.g., 80%), the developer needs to review the generated tests.

```python
# scripts/gen_tests.py
import requests, glob

code = "".join(open(f).read() for f in glob.glob("src/**/*.py", recursive=True))
r = requests.post("http://localhost:8000/testgen", json={"code": code})
with open("tests/generated/test_generated.py", "w") as f:
    f.write(r.json()["result"])
print("Suggested tests generated -> tests/generated/test_generated.py")
```

CI/CD Integration for Validation

The CI/CD pipeline executes the tests again when a PR is raised.
```yaml
name: CI Validation
on:
  push:
    branches: [ main, '**/*' ]
  pull_request:
    branches: [ main ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - name: Run Unit Tests
        run: pytest -q
```

VS Code Integration With an Extension

This enables developers to access GenAI from their IDE.

```typescript
import * as vscode from 'vscode';
import fetch from 'node-fetch';

export function activate(context: vscode.ExtensionContext) {
  let disposable = vscode.commands.registerCommand('genai.review', async () => {
    const editor = vscode.window.activeTextEditor;
    if (!editor) return;
    const code = editor.document.getText();
    const res = await fetch('http://localhost:8000/review', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ code })
    });
    const data = await res.json();
    vscode.window.showInformationMessage(data.result);
  });
  context.subscriptions.push(disposable);
}
```

Local Deployment

```shell
ollama run mistral
uvicorn app:app --reload
```

Containerized deployment setup:

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Conclusion: Empowering Developers With GenAI

Locally built Developer Enablement Hubs represent the next phase of productivity engineering. Rather than adding isolated GenAI features, this unified platform:

- Automates requirements, coding, testing, and documentation.
- Provides contextual assistance across tools.
- Preserves security and governance.
- Bridges local and cloud environments for flexibility.

As a result, developer productivity will be reflected in the following metrics:

- Lower lead time
- Lower code review lag
- Increased test coverage
- Improved onboarding time
- Increase in GenAI adoption rate

By Nabin Debnath
From Platform Cowboys to Governance Marshals: Taming the AI Wild West

The rapid ascent of artificial intelligence has ushered in an unprecedented era, often likened to a modern-day gold rush. This "AI gold rush," while brimming with potential, also bears a striking resemblance to the chaotic and lawless frontier of the American Wild West. We are witnessing an explosion of AI initiatives — from unmonitored chatbots running rampant to independent teams deploying large language models (LLMs) without oversight — all contributing to skyrocketing budgets and an increasingly unpredictable technological landscape. This unbridled enthusiasm, though undeniably promising for innovation, concurrently harbors significant and often underestimated dangers. The current trajectory of AI development has indeed forged a new kind of "lawless land." Pervasive "shadow deployments" of AI systems, unsecured AI endpoints, and unchecked API calls are running wild, creating a critical lack of visibility into who is developing what, and how. Much like the historical gold rush, this is a full-throttle race to exploit a new resource, with alarmingly little consideration given to inherent risks, essential security protocols, or spiraling costs. The industry is already rife with cautionary tales: the rogue AI agent that inadvertently leaked highly sensitive corporate data, or the autonomous agent that, in a mere five minutes, initiated a thousand unauthorized API calls. These "oops moments" are not isolated incidents; they are becoming distressingly common occurrences in this new, unregulated frontier. This is precisely where the critical role of the platform engineer emerges. In this burgeoning chaos, the platform engineer is uniquely positioned to bring much-needed order, stepping into the role of the new "sheriff." More accurately, given the complexities of AI, they are evolving into the governance marshal. This transformation isn't a mere rebranding; it reflects a profound evolution of the role itself. 
Historically, during the nascent stages of DevOps, platform engineers operated more as "cowboys" — driven by speed, experimentation, and a minimal set of rules. With the maturation of Kubernetes and the advent of widespread cloud adoption, they transitioned into "settlers," diligently building stable, reliable platforms that empowered developers. Now, in the dynamic age of AI, the platform engineer must embrace the mantle of the marshal — a decisive leader singularly focused on instilling governance, ensuring safety, and establishing comprehensive observability across this volatile new frontier.

The Evolution of the Platform Engineer: From Builder to Guardian

This shift in identity signifies far more than just a new job title; it represents a fundamental redefinition of core responsibilities. The essence of the platform engineer's role is no longer solely about deploying and managing infrastructure. It has expanded to encompass the crucial mandate of ensuring that this infrastructure remains safe, stable, and inherently trusted. This new form of leadership transcends traditional hierarchical structures; it is fundamentally about influence — the ability to define and enforce the critical standards upon which all other development will be built. While it may occasionally necessitate saying "no" to risky endeavors, more often, it involves saying "yes" with a clearly defined and robust set of guardrails, enabling innovation within secure parameters.

As a governance marshal, the platform engineer is tasked with three paramount responsibilities:

Gatekeeper of infrastructure: The platform engineer stands as the primary guardian at the very entry point of modern AI infrastructure. Their duty is to meticulously vet and ensure that everything entering the system is unequivocally safe, secure, and compliant with established policies and regulations.
This involves rigorous checks and controls to prevent unauthorized or malicious elements from compromising the ecosystem.

Governance builder: Beyond merely enforcing rules, the platform engineer is responsible for actively designing and integrating governance mechanisms directly into the fabric of the platform itself. This means embedding policies, compliance frameworks, and security protocols as foundational components, rather than afterthoughts. By building governance into the core, they create a self-regulating environment that naturally steers development towards best practices.

Enabler of innovation: Crucially, the ultimate objective of the platform engineer is not to impede progress or stifle creativity. Instead, their mission is to empower teams to build and experiment fearlessly, without the constant dread of catastrophic failures. This role transforms into that of a strategic enabler, turning seemingly impossible technical feats into repeatable, manageable processes through the provision of standardized templates, robust self-service tools, and clearly defined operational pathways. They construct the scaffolding that allows innovation to flourish securely.

Consider the platform engineer not as an obstructionist, but rather as a highly skilled and visionary highway engineer. They are meticulously designing the safe on-ramps, erecting unambiguous signage, and setting appropriate speed limits that enable complex AI workflows to operate at peak efficiency and speed, all while meticulously preventing collisions and catastrophic system failures.

The Governance Arsenal: The AI Marshall Stack

Platform engineers do not enter this challenging new domain unprepared. They possess a sophisticated toolkit — their "governance arsenal" — collectively known as the AI Marshall Stack.
This arsenal is composed of several critical components:

AI gateway: Functioning as a "fortified outpost," the AI gateway establishes a single, secure point of entry for all applications connecting to LLMs and external AI vendors. This strategic choke point is where fundamental controls are implemented, including rate limiting to prevent overload, robust authentication to verify identities, and PII (Personally Identifiable Information) redaction to protect sensitive data before it reaches the models.

Access control: This is "the law" of the AI ecosystem. With granular role-based access control (RBAC), the platform engineer precisely defines and enforces who may use specific AI tools, services, and data, ensuring that only authorized individuals and applications interact with sensitive AI resources and minimizing unauthorized access and misuse.

Rate limiting: The essential "crowd control" mechanism. It guards against financial stampedes and operational overload, preventing scenarios such as a misconfigured or rogue AI agent making thousands of costly API calls within minutes, thereby safeguarding budgets and system stability.

Observability: The "eyes on the street," providing real-time insight into the AI landscape. A significant share of AI-related problems stems not from technical failures but from a lack of visibility. With comprehensive observability, the platform engineer knows precisely who is doing what, when, and how, and can identify misbehaving agents or unexpected API spikes before they escalate into significant damage or costly incidents.

Cost controls: The "bankers" of the AI Marshal Stack.
They prevent financial overruns by setting explicit limits on AI resource consumption, sparing teams the shock of an unexpectedly large cloud bill. Proactive cost monitoring and control mechanisms keep AI initiatives within budgetary constraints and foster responsible resource allocation.

By constructing and deploying these interconnected systems, platform engineers are not merely averting chaos; they are actively fostering an environment where teams can build and innovate with confidence. The greater the trust users have in the underlying AI infrastructure and its governance, the more rapidly and boldly innovation can proceed. Governance, in essence, is the mechanism through which trust is scaled across an organization. Just as robust rules and well-defined structures allowed rudimentary frontier towns to evolve into flourishing cities, comprehensive AI governance is the framework that will let AI move from disparate, one-off experiments to a cohesive, strategically integrated product strategy.

Why the Platform Engineer Is the Right Person for the Job: The AI Marshal's Unique Advantage

Platform engineers are uniquely suited to the role of governance marshal. They have the nuanced context of development cycles, the inherent influence within engineering organizations, and the technical toolkit needed to implement and enforce AI governance effectively. They have lived through and shaped the eras of the "cowboy" and the "settler"; now it is unequivocally their time to become the "marshal." The AI landscape, while transformative, is not inherently lawless, but it desperately requires systematic enforcement and a foundational structure: a leader to build the stable scaffolding that lets developers move with agility and speed without the constant threat of crashing and burning.
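As a concrete illustration of the gateway-level guardrails described here, the sketch below combines a per-caller token-bucket rate limiter with simple regex-based PII redaction, two of the controls a gateway would apply before a prompt reaches a model. This is a minimal sketch, not a production design: the names `RateLimiter` and `redact_pii` and the regex patterns are hypothetical, and a real gateway would use vetted redaction libraries and distributed rate limiting.

```python
import re
import time

class RateLimiter:
    """Token-bucket limiter: each caller earns `rate` tokens/second, up to `burst`."""
    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self.buckets = {}  # caller -> (tokens_remaining, last_refill_time)

    def allow(self, caller: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(caller, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[caller] = (tokens, now)
            return False  # over budget: reject before the costly model call
        self.buckets[caller] = (tokens - 1, now)
        return True

# Naive redaction: mask strings that look like emails or card numbers
# before the prompt leaves the gateway.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<CARD>"),
]

def redact_pii(prompt: str) -> str:
    for pattern, placeholder in PII_PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

limiter = RateLimiter(rate=1.0, burst=2)  # 1 request/sec per caller, bursts of 2
safe_prompt = redact_pii("Reach me at jane@example.com, card 4111111111111111")
```

The design choice to enforce both controls at a single choke point is what makes the gateway pattern work: every application inherits the policy without changing its own code.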
This vital undertaking is not about imposing control for its own sake; it is fundamentally about safeguarding everyone from the inevitable "oops moments" that derail projects, compromise data, and exhaust budgets. It is about actively constructing a superior, inherently safer, and demonstrably smarter AI future for every stakeholder. The call to action for platform engineers is therefore clear and urgent: do not passively wait for others to define the rules of this new frontier. Seize the initiative. Embrace the role of the hero. Build a thriving, resilient AI town where innovation can flourish unencumbered and everyone can contribute and grow without the paralyzing fear of stepping on a hidden landmine.

Final Thoughts

AI doesn't need to be feared. It just needs to be governed. And governance doesn't mean slowing down; it means creating the structures that let innovation thrive. Platform engineers are in the perfect position to lead this shift. We've been cowboys. We've been settlers. Now it's time to become marshals.

So, to all the platform engineers out there: pick up your badge, gather your toolkit, and help tame the AI frontier. The future of safe, scalable, and trusted AI depends on it. Because the Wild West was never meant to last forever. Towns become cities. And with the right governance in place, AI can move from chaos to confidence and unlock its full potential.

Want to dive deeper into the AI Marshal Stack and see how platform engineers can tame the AI Wild West in practice? Watch my full PlatformCon 2025 session to discover how to move from cowboy experiments to marshal-led governance and build the trusted AI foundations your organization needs.

By Hugo Guerrero
Senior Developers, What to Read Next?

Recently, one of my best friends, who is also one of the smartest developers I have the luck to know, asked me what book he should read next to further develop his skills. It took me some time to gather my thoughts, and the result might be useful for others, too. Spoiler alert: I could not find a single book I would call the one to read as a senior developer. Instead, I summarized the books I found good for one reason or another. To be clear, this is a subjective list; feel free to agree or disagree with my choices, and feel free to leave a comment or contact me in any other way to share your thoughts.

First of all, why read books?

We are in 2025: everything important can be summarized in a 160-character message, and nobody has the mental capacity to consume anything longer than four seconds. Jokes aside, it is a valid concern that books can get outdated, and following tech websites can help you stay up to date. (AI, a currently hot topic, is a really good example: I am not aware of many well-written and still up-to-date books on it.) While this is true, I still have two reasons why people should read physical books:

Because physical books usually go much deeper into topics than shorter publications. They tend to present not only the direct results but also the ideas, thoughts, and assumptions behind advice (e.g., how you should write code) or engineering decisions (e.g., why a given framework's API was built a particular way). If you want to learn more, these aspects are far more important than knowing a whole API by heart or knowing best practices without understanding the reasoning behind them.

Because you will remember better: at least in my experience, I can remember which book I read something in far better than which blog I read it on.
I can remember the font, the size of the book, and the number of pages on the left and right side; as a result, when I look for something I read on paper, I usually find it faster than something I read online. This might not apply to everyone, but according to my colleagues, it applies to most of us.

As a side note, I will not link to any online shop for the books, but I will provide the ISBNs so you can look them up in the shop of your choice.

The Core

Let's start with my advice on two books. I know my friend has already read them, but I was unsure whether he has the physical copies, too: Clean Code (ISBN 978-0132350884) and Clean Architecture (ISBN 978-0134494166) by Robert C. Martin. You do not have to agree with everything in them, but I expect every developer on my team to know the concepts and arguments they present. As a side note, the second edition of Clean Code (ISBN 978-0135398579) is due out soon (October 27, 2025), and I am already excited to get it. In general, I think it is a good idea to re-read these books every couple of years.

Technical Books

I do not believe any of the following books will be 100% new to developers who have been around for a while. Still, they may contain aspects you have not thought through yet, so they could be a good addition to your library.

I really liked The Software Craftsman (ISBN 978-0134052502) because it places software development in a different context, discussing internal processes, interviewing, personal dedication, and other aspects. It will not help you with low-level coding details, but it provides insight into the industry you work in, which definitely helps you improve as a developer, too.

Get Your Hands Dirty on Clean Architecture (ISBN 978-1805128373) managed to surprise me. It has a really good section about layered architecture.
It complements the Clean Architecture book really well by detailing code-level aspects, which can be very helpful if you are not used to actually writing code to clean architecture standards.

Kent Beck's Test Driven Development (ISBN 978-0321146533) is simply the best book I have seen on TDD. I really liked how he demonstrates the strength of TDD through multiple examples, even writing an xUnit tool.

The next book does not strictly belong on this list, as it is not meant for senior developers with years, maybe even decades, of experience, but I find its writing style really good. Java by Comparison (ISBN 978-1680502879) aims to help less experienced developers learn best practices and avoid common mistakes. If a junior asks you, as a senior, what to read, this could be a really good pick.

Clean Agile (ISBN 978-0135781869) offers Robert C. Martin's recollections of the early days of the agile movement: how it evolved, the situations in which agile methodology fails to help, and, in general, what it was originally intended for. Reading it can markedly improve the value you add to agile projects, simply by giving you a better understanding of the methodology itself, and perhaps helping your team become more efficient, regardless of how strictly you follow a given ruleset.

Non-Technical Books

The books mentioned so far were written by tech people for tech people, which is ultimately a valid approach because we should learn from each other. The following books do not fall into that category: they are not necessarily written by technical people, and they are meant for more than just technical audiences. I still recommend them.

Humble Pi: When Math Goes Wrong in the Real World (ISBN 978-0593084694) is a super entertaining book; you would expect nothing else if you are familiar with the author.
Still, beyond being entertaining, it draws attention to responsibility, which I find really important for software developers to be aware of. Some mistakes cost a lot of money. Some mistakes cost lives. I do not want to make us all paranoid, but from time to time, everyone should consider what can happen if something in their code malfunctions. The book also taught me interesting details about topics I was not even aware of and how complex they can be. My favorite part was about calendars: everyone knows that dealing with time zones and the various ways of counting days and years can be tricky. But I did not realize just how tricky, or how much of a struggle it was back when sending a message from one European city to another took weeks rather than seconds.

Have you ever felt you are Surrounded by Idiots (ISBN 978-1785042188)? If you work as a developer and have never felt this, please contact me. I mean it. I want to know where you work and how to apply. This book describes people and groups; I assume you are a person and work in some sort of group, so it could help you understand others and the dynamics in which they are most effective. I do not believe it will help you resolve conflicts, but it can help you understand their causes.

The last book I recommend is Thinking, Fast and Slow (ISBN 978-0374533557). It covers many topics that are unrelated, or only marginally related, to software development. Still, understanding how people make decisions and where the limits of rational behavior lie can help you a lot if you want to practice some self-reflection. And I believe most of us developers could practice a bit more of it.

Wrapping Up

Feel free to pick any book from this list, for the reasons I mentioned or for any reason of your own, or pick any other book you believe will help you become a better developer.
My main message is simply this: please consume high-quality sources, because we cannot afford to fall into the same mistakes over and over again.

By Daniel Buza
CNCF Triggers a Platform Parity Breakthrough for Arm64 and x86

The Challenge

Developing open-source software for deployment on the Arm64 architecture requires a robust continuous integration and continuous deployment (CI/CD) environment. Yet there has historically been a disparity between the levels of support for Arm64 and the traditional x86 processor architecture, with Arm64 usually at a disadvantage. Developers of infrastructure components for multiple architectures have certain expectations of their work environments:

Consistency of the tools and methods they use across platforms, so they don't have to adopt different development procedures just to support a less prevalent platform.
Performance from their platforms and support mechanisms, so their deployment schemes don't suffer speed penalties when they choose to support multiple platforms.
Testing coverage, so the very same tests for efficiency, compliance, and security apply to all platforms simultaneously and without substantial differentiation.
Maintainability, enabling developers to automate their integration and redevelopment processes so they apply to all platforms without alteration.

Product managers for these same components have the same requirements, plus at least two more:

Platform coverage capability, so that technical account managers (TAMs) have the skills and readiness they need to respond to customer needs.
Support tiering capability, enabling TAMs and other IT personnel to classify their levels of software support according to their capacity to respond to urgent or emerging customer issues.

The Solution

Working in collaboration with both Ampere and infrastructure provider Equinix, open-source developer Alex Ellis made his Actuated CI/CD platform available to some of the most critical open-source projects in the cloud-native software ecosystem. Actuated takes GitHub self-hosted automation processes, demonstrated by security engineers to be inherently vulnerable to malicious attack, and runs them in microVMs isolated from the public internet.
Implementation

Several key open-source Cloud Native Computing Foundation (CNCF) projects took advantage of an Actuated environment to run all of their GitHub Actions for Arm64. This environment is based on Ampere® Altra® processors made available with the help of infrastructure provider Equinix. The success of this initiative was instrumental in prompting GitHub to implement full support for the Arm64 architecture in GitHub Actions. Now, developers who had been running Arm64 build processes in QEMU emulation environments on x86 architectures can relocate those processes to Arm64 on bare metal.

Self-Hosted Runners for GitHub Actions on Arm64

GitHub dominates the hosting of software projects these days, and the most popular way GitHub-hosted projects generate builds and releases for continuous integration is with the platform's built-in CI toolset, GitHub Actions. The most important role of the GitHub Actions CI/CD platform is automating software development pipelines.

The component responsible for executing any GitHub Actions job is a runner: an agent running on a server, waiting for work and ready to act once given an assignment. It is assigned a job from the workflow and tasked with getting it done. As a complete software deployment platform, GitHub hosts its own runners, each adapted to its specified target environment and architecture. Until recently, however, GitHub did not offer hosted runner environments for Arm64.

Projects that wanted to generate Arm64-native builds did have an option: the self-hosted runner. GitHub users could install an agent on a physical or virtual machine hosted elsewhere and have GitHub Actions dispatch jobs to that host, managed by the project's users. This required project administrators not only to manage the project itself but also to take care of the maintenance and security of the build environment the projects would use.
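For a sense of what "managing the build environment" entails, registering a self-hosted runner follows GitHub's documented flow of downloading the runner agent, configuring it against a repository, and starting it. The organization, repository, and token below are placeholders, not values from any CNCF project.

```shell
# On the build host, after downloading and extracting the actions-runner release:
./config.sh --url https://github.com/ORG/REPO --token <REGISTRATION-TOKEN>
./run.sh   # the agent now polls GitHub Actions for dispatched jobs
```

Everything on that host, including OS patching, cleanup between jobs, and access control, then becomes the project administrator's responsibility.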
In CNCF's case, developers took advantage of credits to Equinix Metal, enabling them to provision bare-metal instances and use them as self-hosted runners for projects. But for a code lab whose projects must be available 24/7/365 to other developers worldwide, the security of self-hosted runners poses a challenge: according to GitHub's documentation, anyone could clone the project repository, modify the Actions jobs, and gain access to the runner node to run arbitrary jobs.

Another problem was ensuring consistency between CI runs. With self-hosted runners, any side effects of CI jobs, such as configuration changes or files left behind, would still be there for subsequent jobs. This posed a problem: when running a CI job to build or test software, you should have a controlled environment, so that the only thing that changes between runs is the software. With self-hosted runners, the environment can drift over time, and in the absence of a cleanup process, runs of the same build job on the same host could generate different results over time.

One way developers bypassed the need for native Arm64 runners was by running virtual Arm64 environments on x86 servers, using QEMU open-source emulation. Emulated environments add a huge performance overhead to software compilation, which runs at a fraction of the pace of compilation on native, non-emulated hardware. Emulation worked well enough for small to medium projects, but if developers had to build something big and important for Arm64, the strain on their virtual environments became so great that builds would fail outright.

"In the past, people were doing builds using QEMU," said Equinix's Developer Partner Manager Ed Vielmetti. "Say you were building a compiler, where the intermediate steps require large amounts of memory and very deep integration with the processor.
That just would not work in an emulated environment."

The Disparity Phenomenon

Unlike the typical enterprise, the CNCF has a special obligation to build its cloud-native components for all the world's major processor architectures. Projects such as the containerd portable container runtime, the etcd key/value data store, the fluentd log data collector, the Falco real-time threat detection tool, and the OpenTelemetry observability and instrumentation toolkit, among dozens of others, are critical dependencies for the cloud-native ecosystem and, as such, must be built for both x86 and Arm64.

To build low-level infrastructure components with Arm64 support, CNCF developers need access to native Arm64 infrastructure. Ironically, this means they need the very class of tools they are trying to create. At first, Ampere and Equinix collaborated with the CNCF to address these gaps by donating Ampere Altra-based servers or setting up Altra-based bare-metal nodes at Equinix facilities. The granularity of the Arm64 server resources Equinix could share was the bare-metal node: a 160-core, dual-socket Ampere Altra system. Ideally, a server like this would be shared among several projects, but that was, at the time, beyond the CNCF's capabilities. This is the problem Ampere and Actuated proposed to solve for the CNCF: allowing multiple projects to run on fewer hosts, providing easy access to build services for more projects while consuming less hardware.

"OpenTelemetry is a full-on, full-time-on CI/CD system," said Antoine Toulmé, Senior Engineering Manager for Blockchain and DLT at Splunk and a maintainer of the OpenTelemetry project. "We were able to leverage [our Ampere server] infrastructure for ourselves, but we weren't able to share it with open source at large."

"We cannot give GitHub runners away," Toulmé said.
"Once we were happy with certifying the downstream distributions to our customers, we opened issues with the OpenTelemetry project saying we would like to see ARM64 support being delivered at the highest level — meaning, it should run for every commit, it should run for main, it should run all the time. And the feedback was, well, great, but there are no ARM64 runners in GitHub. So we're going to need you to work with what we can do here."

Because readily available Arm64 platforms were lacking, developers did not know whether the changes they committed were causing issues on Arm64, as test suites were not run as frequently as they were for x86. Since container orchestration platforms are themselves among the platforms being developed to support Arm64, this became a vicious cycle: releases were gated on passing integration test suites for x86, but not on the same test suites passing for Arm64.

The solution CNCF's developers would discover falls far short of radical or revolutionary; in practice, it is more of a bug fix. Yet it is so simple to implement that it completely compensates for this disparity, not just for the CNCF but for any developer of any platform-level component for any architecture.

Breakthrough: Actuated, Plus Editing One Line of Code

To take the first step toward platform parity between x86 and Arm64, Ampere enlisted the help of Alex Ellis, the creator of a service called Actuated. It runs GitHub Actions jobs in secure, isolated microVMs, instrumented to receive build jobs from GitHub Actions and offering developers visibility into the performance of their build jobs and the load on the shared build systems. Actuated could run all of the CNCF's existing GitHub Actions after a single line of each workflow's configuration file was altered, plus, in some cases, the pasting of a few code snippets; the changes took less than five minutes to implement.
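To give a sense of the scale of that one-line change: in a GitHub Actions workflow, the runner a job targets is selected by the `runs-on` key, so rerouting a job is typically a single edit. The labels and steps below are illustrative, not taken from an actual CNCF workflow.

```yaml
jobs:
  build:
    # Before: a GitHub-hosted x86 runner
    # runs-on: ubuntu-latest
    # After: route the same job to an Actuated Arm64 microVM
    runs-on: actuated-arm64
    steps:
      - uses: actions/checkout@v4
      - run: make build
```

Because nothing else in the workflow changes, the same build steps, caching, and test suites carry over to the new architecture unmodified.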
These changes enabled GitHub-hosted projects to point their build jobs at Actuated's microVM-driven environment on Ampere Altra processors.

"Falco really needed Arm64 GitHub runners to elevate its support for the architecture and enlarge its user base," said Federico Di Pierro, Senior Open Source Engineer at Sysdig and a maintainer of the Falco project. "[Actuated] was the perfect solution for us because it was easy to leverage and relieved any burden for the maintainers. This way, we as maintainers can focus on what really matters for the project, instead of fighting with maintaining and deploying self-hosted infrastructure. Now we are building, testing, and releasing artifacts for ARM64, leveraging Actuated for many of our projects, and it works flawlessly."

Having seen demand for Arm-native build environments increase in recent years, GitHub announced in June 2024 the public beta of Arm64-based hosted runners for GitHub Actions, powered by Ampere compute instances on Microsoft Azure, followed in January 2025 by the public preview of free hosted runners for public repositories. For OpenTelemetry, this meant the end of network loads as high as 10 times its assigned bandwidth caps, caused by OpenTelemetry builds constantly downloading dependencies from Docker Hub repositories.

"Yeah, we were definitely breaking things," OpenTelemetry's Antoine Toulmé said. "We got lucky, because the Arm runners for GitHub shipped. We have moved to ARM runners, we are happy as can be, and nothing is breaking anymore."

Now, for the first time, project maintainers can pay as close attention to the safety and security of Arm64 builds as they do to x86 builds, knowing they are no longer likely to encounter performance degradations or penalties.

"[Actuated] gave us great confidence in the CI builds on ARM64," said Phil Estes, Principal Software Engineer at AWS and a maintainer of the containerd project.
"If the Arm CI breaks now, there's no way we will merge that [pull request] until we figure out why... We have full confidence now that [build failures] are not an issue with flaky hardware [as they sometimes were before]."

For its part, Oracle is continuing its policy of donating $3 million per year in OCI credits for Ampere-powered Arm64 instances to CNCF projects. This generosity, along with the newfound stability of Arm64 platforms catalyzed by Ampere and Equinix and brought about by Actuated, is enabling prominent cloud infrastructure vendors, including Red Hat, SUSE, Canonical, and Mirantis, to provide full support for enterprise customers who choose Arm64 infrastructure.

Parity makes it possible for enterprises to make sensible choices about their computing infrastructure and platforms without incurring penalties simply for choosing an alternative architecture. Large cloud customers are proving that Arm64 can deliver the performance organizations need at reduced expense for their workloads, all with industry-leading energy efficiency. But organizations cannot experience those benefits until they can deploy their workloads on all infrastructure options on a level playing field and measure the results for themselves.

Leveling the Playing Field

In early 2023, few options existed for GitHub-hosted projects that wanted to fully integrate Arm64 into their continuous integration processes. Through this initiative, leveraging an innovative software solution from Actuated with Ampere CPUs hosted by Equinix, we lowered the barrier for CNCF projects to make a start toward parity of support for Arm64 and x86. Key cloud-native projects, including etcd, containerd, OpenTelemetry, Falco, and others, were able to advance their support for Arm64, accelerate their CI runs on native Arm64 infrastructure, and support the growing number of their users taking advantage of Arm64 compute in the cloud.
By the end of this pilot project, the number of options for developers had grown considerably. The CNCF now offers its projects the ability to run GitHub Actions jobs on managed Kubernetes clusters on OCI, using Ampere-powered instances and GitHub's Actions Runner Controller project; with the addition of hosted Arm64 runners to GitHub, it has never been easier for projects to support this fast-growing and exciting architecture for cloud-native applications.

Check out the full Ampere article collection here.

By Scott Fulton III

Top Career Development Experts


Miguel Garcia

VP of Engineering, Factorial

Miguel has a great background in leading teams and building high-performance solutions for the retail sector. He is an advocate of platform design as a service and data as a product.

Gaurav Gaur

Staff Software Engineer

The Latest Career Development Topics

Runtime FinOps: Making Cloud Cost Observable
Treat cloud cost as a real-time system metric tied to deployments. With tagging, CI/CD estimates, and alerts to service owners, teams can catch spend spikes early.
April 15, 2026
by David Iyanu Jonathan
· 1,562 Views
6 Books That Changed How I Think About Software Engineering in 2026
These six reshaped how I think about engineering: strategy, emotional intelligence, team effectiveness, software design coupling, ultralearning, and docs-as-code.
April 9, 2026
by Otavio Santana DZone Core CORE
· 2,344 Views · 1 Like
Accelerating Your Software Engineering Career With Open Source and Jakarta EE
Open source turns preparation into visibility. Combined with open standards like Jakarta EE, it builds credibility, adaptability, and real-world impact.
April 8, 2026
by Otavio Santana DZone Core CORE
· 2,592 Views · 1 Like
Serverless Glue Jobs at Scale: Where the Bottlenecks Really Are
At scale, Glue jobs become shuffle-bound, not CPU-bound. Skew and file strategy dominate runtime. Adding workers helps less than reshaping the workload.
March 13, 2026
by Vivek Venkatesan
· 3,511 Views · 1 Like
AI Is Rewriting How Product Managers and Engineers Build Together
AI breaks the traditional handoff between product and engineering. Success will depend on how PMs and engineers share tradeoffs around cost, latency, and risk.
March 10, 2026
by Raman Aulakh
· 2,759 Views · 2 Likes
2026 Developer Research Report
The results are in from DZone's 2025 Community Survey. See what developers are building, learning, and adopting in this year's report.
March 6, 2026
by Carisse Dumaua
· 3,896 Views · 5 Likes
The Human Bottleneck in DevOps: Automating Knowledge with AIOps and SECI
DevOps pipelines are often automated, yet the operations side remains surprisingly manual. Here’s a framework to reduce toil using AIOps and the SECI model.
February 13, 2026
by Dippu Kumar Singh
· 1,398 Views · 1 Like
DZone's Article Submission Guidelines
Want to be a DZone contributor? We welcome you! See what it takes for your article to get published!
Updated February 4, 2026
by DZone Editorial
· 335,388 Views · 97 Likes
Building a 300 Channel Video Encoding Server
NETINT VPU technology with Ampere® Altra® processors sets new operational cost and efficiency standards.
February 3, 2026
by John Oneill
· 2,175 Views · 1 Like
Apache Spark 4.0: What’s New for Data Engineers and ML Developers
Spark 4.0 brings Spark Connect, enhanced SQL (PIPE, VARIANT), richer Python APIs, and advanced streaming — modernizing Spark for faster, more flexible 2025 workloads.
January 12, 2026
by harshraj bhoite
· 1,731 Views
Why Data Engineers Need to Think Like Product Managers
Data engineers who think like product managers build more valuable, trusted, and user-centric data systems; they focus on outcomes, ownership, and UX, not just pipelines.
January 6, 2026
by harshraj bhoite
· 2,102 Views
Building Your Own Ledger: A Solo Developer's Answer to Time and Money Chaos
This article provides a look at how developers can track time, manage projects, and link work to invoices without juggling multiple tools.
January 5, 2026
by Konstantin Om
· 979 Views
How We Predict Dataflow Job Duration Using ML and Observability Data
How Real-Time Observability Signals and Cloud-Native ML Models Improve Runtime Forecasting, Capacity Planning, and Operational Efficiency in Dataflow Pipelines
December 16, 2025
by Deepika Singh
· 1,033 Views
Building Your Tech Career Like Code: A Systematic AI Approach
Instead of chasing job postings, treat your career like engineering a system: analyze data, define requirements, build a roadmap, validate, and measure progress.
December 16, 2025
by Kevin Jeter
· 1,619 Views · 1 Like
[closed] DZone's 2025 Developer Community Survey
Calling all devs! Take our annual community survey and let your voice guide DZone in 2026.
December 15, 2025
by Carisse Dumaua
· 2,849 Views · 5 Likes
Breaking Into Architecture: What Engineers Need to Know
A short, thought-provoking take on moving from engineer to architect, and how to grow the skills that support that leap.
December 10, 2025
by Syamanthaka B
· 1,427 Views · 2 Likes
How Does a Scrum Master Improve the Productivity of the Development Team?
A Scrum Master facilitates Scrum events, removes impediments, addresses inefficiencies, and fosters collaboration to improve the development team’s productivity.
November 7, 2025
by Sandeep Kashyap
· 3,917 Views · 3 Likes
AI Code Generation: The Productivity Paradox in Software Development
AI boosts coding speed short-term, but long-term gains need human oversight, reuse, and focus on code quality beyond cycle time.
November 5, 2025
by Ammar Husain
· 2,815 Views · 4 Likes
Improving Developer Productivity With End-to-End GenAI Enablement
Build an end-to-end developer enablement hub locally with GenAI, automating everything from requirements to deployment to boost productivity and streamline workflows.
October 30, 2025
by Nabin Debnath
· 2,395 Views · 2 Likes
From Platform Cowboys to Governance Marshals: Taming the AI Wild West
AI feels like the Wild West; platform engineers must become governance marshals to scale trust and turn chaos into safe, sustainable innovation.
October 22, 2025
by Hugo Guerrero
· 2,284 Views · 1 Like