Performance and Agility: A Critical Balancing Act | Straight Talk



Juniper Networks' founder on the importance of infrastructure.

Twenty years ago, with the rapid growth in Internet use providing evidence of its value and with the availability of optical fiber causing a rapid drop in the cost of long-distance bandwidth, Pradeep Sindhu had an important insight. The Internet Protocol, he concluded, would be the only technology capable of providing network connectivity for computers at a worldwide scale. This insight led to the successful founding in 1996 of Juniper Networks, which is now celebrating its 20th anniversary.

Sindhu, now Juniper’s Vice Chairman and Chief Technology Officer, continues his deep involvement in technology. His long tenure in the industry provides him with a fundamentals-based perspective that isn’t affected by what is simply fashionable. And his reading of recent trends has given him some cause for concern. He believes that while the current focus on infrastructure agility is important, continually improving infrastructure performance is equally important. In fact, focusing on agility alone at the expense of performance raises risks, for companies and for the industry.

Before founding Juniper Networks, Sindhu was a Principal Scientist and Distinguished Engineer at Xerox's Palo Alto Research Center (PARC), where he worked on design tools for VLSI and on high-speed interconnects for shared-memory multiprocessors. He was also instrumental in the commercial development of Sun Microsystems' first high-performance multi-processor system family.

Sindhu holds a bachelor's degree in electrical engineering from the Indian Institute of Technology in Kanpur, a master's degree in the same discipline from the University of Hawaii, and both master's and doctorate degrees in computer science from Carnegie Mellon University.

The following is an edited transcript of a conversation Sindhu had with CTO Straight Talk Editor-in-Chief Paul Hemp and Articles Editor Gil Press.

For over 30 years, you’ve been driving innovation in an industry where change is the only constant. When so many components of IT change so frequently, it is sometimes difficult to see the big picture. What do you see as the key developments in the industry over the last decade or so?

The most important recent change is in how progress in information technology is measured. While improvements in the performance of networking, computing, and storage have been and continue to be important, there is another dimension that has become critical: the infrastructure’s “agility.”

Companies like Google and Amazon pioneered agility for compute infrastructure housed in their data centers. They viewed a data center as a single large computer that had a logically centralized “orchestration system” analogous to the operating system of a single computer. This centralized software permitted much better infrastructure agility. It allowed new, unanticipated applications to run on the infrastructure more quickly than was possible on traditional IT infrastructure.

So how do you define agility?

Performance is an attribute that is reasonably well understood—for example, megabits per second for networks, or MIPS for a computer. Agility as applied to infrastructure does not have such a generally understood definition. We can define it as the speed with which one can get infrastructure to do something that was not anticipated when the infrastructure was put into operation—for example, getting an infrastructure to deliver an entirely new service. It should be clear from the definition that making infrastructure agile requires it to be programmable in one manner or another. 

Over the years, the networking industry has done a phenomenal job in improving performance and price-performance, but it has done a relatively poor job in improving network agility. Having seen the benefits of agile infrastructure in data centers, network operators have legitimate reasons for asking how some of the same principles of logically centralized software could be applied to improving the agility of networks. This is the key idea behind software-defined networks, or SDN.

Where do you think the networking industry went wrong in trying to solve the agility problem?

The industry went wrong in calculating that agility was the only problem that needed to be solved. For example, the buzz around SDN went so far as to suggest that all wide-area distributed networks built using specialized computers called “routers” should instead be built with general-purpose computers. What this extreme view ignores is that there is a trade-off between agility and performance—to get maximum agility all network functions must be written in software on general-purpose computers; conversely, to get maximum performance, one must sacrifice agility and specialize the infrastructure to do one network function well. Either extreme has problems.

A better approach is to recognize the trade-off and to use the right engine for each type of network function. Networks are generally understood to have four types of functions: management, control, layer 4-7 services, and packet forwarding. The first two functions are best executed on general-purpose processors because there is little in the computations being performed that would benefit from specialized hardware. Forwarding, on the other hand, benefits hugely in performance and price-performance by having specialized programmable hardware. Finally, L4-L7 services can also benefit from specialization, although the benefits are less dramatic than for forwarding.

What is the relationship between performance and agility?

For any given technology, there is generally an inverse relationship between performance and agility: if you want higher performance, you will need to give up some agility, and vice versa.

Given this inverse relationship, it is easy to conclude that a good way to evaluate infrastructure is to look at the product of agility and performance rather than looking at the two attributes in isolation.
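
As a rough illustration of the product idea, the following Python sketch ranks three infrastructure options by performance multiplied by agility. The option names, the 0-to-1 scoring scale, and every score are hypothetical values chosen for illustration; none of them come from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One way of building a piece of infrastructure (hypothetical)."""
    name: str
    performance: float  # assumed normalized score between 0 and 1
    agility: float      # assumed normalized score between 0 and 1

    @property
    def combined(self) -> float:
        # Evaluate the two attributes together as a product,
        # rather than looking at either one in isolation.
        return self.performance * self.agility


def rank(options):
    """Return the options sorted from highest to lowest combined score."""
    return sorted(options, key=lambda o: o.combined, reverse=True)


# Illustrative scores only: two extremes and one middle ground.
options = [
    Option("fixed-function hardware", performance=1.0, agility=0.1),
    Option("general-purpose software", performance=0.2, agility=1.0),
    Option("programmable specialized hardware", performance=0.8, agility=0.6),
]

for option in rank(options):
    print(f"{option.name}: {option.combined:.2f}")
```

With these made-up numbers, either extreme scores poorly on the product even though it leads on one attribute, which is the sense in which the product is a more balanced measure than either attribute alone.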

What are the implications, in terms of how IT infrastructure is developed and deployed, of the recent slowdown in performance improvements?

We have gotten used to seeing exponential improvements in the performance of compute, storage, and networking. But in the last decade, improvements in compute performance have slowed down significantly compared to the golden age between 1945 and 2005, when a doubling every 18 to 24 months was the norm. The implications of this slowdown are profound, and will impact all aspects of IT. Significantly, performance improvements in storage have accelerated because of new solid-state technologies, and performance improvements in networking continue almost unabated.

Scale-out architectures were invented specifically to keep improving the overall computing throughput delivered to users, despite the technological limits of individual microprocessor chips. The trick was to couple massive numbers of microprocessors through a fast local area network, and to write applications in such a way that they could be spread across the resulting distributed general-purpose computer.

These architectures are now over a decade old (which makes them really old in “Internet Years”), and we are beginning to see the limitations of this approach applied to general-purpose computing. Another way in which performance limitations can be overcome is to specialize computers to solve particular problems when the gains are large enough and the problem is important enough. Two examples will serve to demonstrate that this is already beginning to happen. First, graphics processing units, or GPUs, that were built initially for graphics are now being used to solve problems that involve heavy use of vector processing; scientific computing, columnar databases, and protein folding are all examples of applications that benefit from the use of GPUs. Second, the renewed interest in artificial intelligence through the application of “deep learning” has sparked a race to build specialized learning computers.

I predict that we are entering the era of what could be called “Scale-Out Heterogeneous Computing,” in which both techniques will be applied simultaneously to the problem of building information infrastructure. Further, these techniques will be applied without imposing any penalty on the agility with which applications can be delivered on the infrastructure.

Thus, while the attention of the pundits is focused on the surface layers of information technology—the applications we all use to tremendous benefit every day—there are profound changes taking place in the deeper layers of the infrastructure. Regrettably, many people in the United States no longer consider information infrastructure to be a worthwhile investment, despite its critical importance to the future of the economy. The exact same mistake has been made once before—manufacturing was considered to be “dead money” and was ignored for a long time. We now belatedly realize that manufacturing is critical and investments are needed in order to innovate, but it will take a generation for manufacturing to recover. If we are wise, we won’t make the same mistake in information technology.

It is fashionable these days to say that it’s all about software, without really understanding what “software” truly is and isn’t, and ultimately what the success of software depends on.

The most general way to view software is that it is a formal—that is, machine executable—statement of intent to achieve some result. Once the intent is captured, the result can be achieved quickly through computing machinery, and can be iteratively refined to do better over time. This is where the power of software comes from.

Note that this general description of software doesn’t tie it to any particular computer or instruction set, even though most people today associate software with one particular instruction set because it happens to be in common use today. This is a profound mistake.

The second thing to note is the utter dependence of software on some computing machinery—software does not run on air! In fact, it would be fair to say that if the performance of these machines had not improved at exponential rates between 1945 and 2005, there would have been no computing industry and probably no software industry.

The fact is that the power of software depends on exponential improvements in the underlying machinery. When these improvements are not forthcoming, as is the case now, we are in uncharted territory.

It is almost a cultural phenomenon of the industry: The herd gets excited about a particular area and moves in that direction, often to the detriment of another. How do you see this phenomenon playing out in the interplay between performance and agility?

It is indeed a cultural phenomenon. Wall Street is not excited by infrastructure because the Street is focused on short-term profits. Investors are not excited by infrastructure for the same reason, and industry is less excited about it than it used to be. If this continues, fundamental developments in this area will move to geographies where there is an appetite to invest for the long term.

Do you think that the growing interest and investment in the Internet of Things will make people pay more attention to hardware?

It’s entirely possible, but the IoT space has suffered from its own share of hype. There are two fundamental problems that need to be solved before IoT fulfills its potential. One is delivering power to IoT devices over the air, and the other is providing connectivity to these devices at the right cost point.