
Steve Traugott at Infrastructures.ORG says:

Most IT organizations still install and maintain computers the same way the automotive industry built cars in the early 1900's: An individual craftsman manually manipulates a machine into being, and manually maintains it afterward. This is expensive. The automotive industry discovered first mass production, then mass customization using standard tooling.

Indeed... Most network devices are still configured by hand and manually maintained, with all of the attendant problems (typos, inconsistency of configuration, difficulty making common changes to many systems in parallel, etc.). I'm very interested in taking the same principles that Steve has been codifying and espousing for systems, and applying them to networks.

For the last several years, Steve has been driving this effort, including creating and hosting the Infrastructures mailing list. The list's goal is to develop and discuss the

... standards and practices [that] are the standardized tooling needed for mass customization within IT. This tooling enables:
  • Scalable, flexible, and rapid deployments and changes
  • Cost effective, timely return on IT investment
  • Low labor headcount
  • Secure, trustworthy computing environments
  • Reliable enterprise infrastructures


...and we're having a lively discussion about this topic, including references to this blog, over on the infrastructures list right now. I wound up posting a brain-dump of the furthest I've been able to get with network gear automation thinking to date; some of it might be useful, most of it is probably already obvious to people who have been thinking about network gear more than I have.

Brent, the biggest thing that hit me between the eyes this time was a realization that a global, netwide rollback from a failed change transaction is *more* feasible for network gear than it is for UNIX hosts.

It requires that each network device be controlled and monitored by a "management proxy", a UNIX host physically connected to the device's serial port. This host does all of the heavy lifting, pulling change orders, synchronizing with all of the other management proxy hosts, then kicking off the change, testing for success, and either committing or rolling back depending on test results.

The key is that, unless *all* management proxies can see each other after the change and agree on success, they *all* trigger a rollback to the previous configuration; this will work even if a failed change partitioned the network. The only hard-fail mode I've been able to think of so far is the case where a management proxy host applies a change and then crashes during the test.

There's a lot more to it, and I'd appreciate hearing what other holes people can shoot in this.
For the details, see

I apologize for the noise if this is already a standard feature of someone's management methodology; if so, I'd be curious to hear more.
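
The commit-or-rollback rule in Steve's message can be sketched as a toy simulation. This is only an illustration under stated assumptions, not Steve's implementation: the `Proxy` class, `run_change`, `full_mesh`, and the two-round verdict exchange are hypothetical names and choices made for this sketch, reachability between proxies is assumed to be symmetric, and the hard-fail case of a proxy crashing during its test is not modeled.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Proxy:
    """Hypothetical model of one management proxy and its attached device."""
    name: str
    running: str              # configuration currently active on the device
    saved: str = ""           # configuration to restore on rollback
    test_passed: bool = True  # result of this proxy's local post-change test

    def apply(self, new_config: str) -> None:
        # Keep the old configuration so a rollback is always possible.
        self.saved = self.running
        self.running = new_config

    def rollback(self) -> None:
        self.running = self.saved


def full_mesh(names):
    """Every pair of proxies can reach each other."""
    return {frozenset(pair) for pair in combinations(names, 2)}


def run_change(proxies, new_config, links):
    """Apply a change everywhere, then commit or roll back on each proxy.

    `links` holds the frozenset({a, b}) pairs that can still reach each
    other after the change (assumed symmetric for this sketch).
    """
    for p in proxies:
        p.apply(new_config)

    def sees_all(p):
        return all(frozenset((p.name, q.name)) in links
                   for q in proxies if q is not p)

    # Round 1: each proxy forms a local verdict from its own test result
    # and from whether it can still see every peer.
    verdict = {p.name: p.test_passed and sees_all(p) for p in proxies}

    # Round 2: a proxy commits only if it sees every peer and every
    # verdict is positive; otherwise it rolls back on its own.
    outcome = {}
    for p in proxies:
        if sees_all(p) and all(verdict.values()):
            outcome[p.name] = "commit"
        else:
            p.rollback()
            outcome[p.name] = "rollback"
    return outcome


names = ["a", "b", "c"]
proxies = [Proxy(n, running="v1") for n in names]
# Simulate a change that cuts the path between proxies a and c.
partitioned = full_mesh(names) - {frozenset(("a", "c"))}
print(run_change(proxies, "v2", partitioned))
# prints {'a': 'rollback', 'b': 'rollback', 'c': 'rollback'}
```

Each proxy decides using only what it can observe locally, which is why the rollback still happens everywhere when the change partitions the network: a proxy that cannot see a peer rolls back without waiting, and a proxy that can see everyone learns that some peer's verdict was negative.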



This page contains a single entry by Brent Chapman published on March 10, 2005 5:14 PM.


This weblog is licensed under a Creative Commons License.