You know what storage needs? More doughnuts to flatten us up
Flatter hierarchies, fewer hops: Rockport and the Torus interconnect
The hierarchy of adapters, switches, routers and directors involved in storage networking is unwieldy, complex and costly and needs replacing with a flatter scheme of direct connections between servers and storage devices. That’s the networking message from start-up Rockport Networks.
It says that current fat tree and spine-and-leaf network architectures tend to prevent economical scalability and hinder cloud computing and analytics. It’s better to rethink storage network architecture, indeed network architecture, than to try and software-define current networking hardware to gain relatively marginal scalability, flexibility and cost-savings.
This company has been operating under the radar since being founded in 2012, in Ottawa, by a trio of storage and networking vets: CEO Doug Carwardine, CTO Dan Oprea and COO Michael McLay.
Its idea is that storage and server nodes in a network should be directly interconnected using a Direct Interconnect (torus mesh) of optical cables. A torus mesh is a geometric shape analogy of their interconnection scheme, being a ring with a circular cross-section and a hole inside the middle of the ring – think doughnut.
This is not a new thing. Torus mesh connectivity has been used to interconnect high performance computing environments for years.
Rockport’s CTO Dan Oprea says its torus is a Direct Interconnect scheme but the two terms are not synonymous. You can have either without the other. Supercomputers use torus network schemes which are not Direct Interconnects.
Oprea says Rockport wanted to use a torus because one of its properties is minimising the number of hops between any source node and any destination node, “so you can jump much faster.”
It also wanted a characteristic of Direct Interconnect: the ability to add or drop traffic on every node, which you cannot do with a spine-and-leaf design.
Rockport says Direct Interconnect is widely recognised as the most effective way to connect; it is also recognised as a very complex way to cable things together. The company claims it has solved the cabling challenge, with high radix connectivity and simplified cabling with short runs. Direct interconnect is the most efficient way to interconnect data nodes in the data centre and provides dramatic linear scalability, it says.
Because of this, Rockport's software is going to bring the performance and scalability of its architecture to the larger enterprise market.
A torus. Imagine this is formed from a 3-dimensional grid of networked nodes with links between them.
The torus networking concept
How this torus mesh idea applies to networking is ingenious and this is how your storage correspondent, who is definitely not a network engineer, envisages it.
Imagine a grid of nodes, three to a side and with another grid of 3 x 3 nodes, below them. The upper grid nodes have orange tops and are interconnected by red lines. The bottom grid has yellow tops with the nodes interconnected by blue lines. The nodes vertically in line with one another are connected by green vertical lines, giving us this concept:
2-layer 3x3 node grid with interconnections
Now imagine the lines are network paths between nodes in the grid and any node can communicate with any other node using these logical link lines. You can trace several paths from the top left node to the bottom right node using four intermediate nodes and five hops (links) between them. Got that?
Now imagine the nodes increase and the grid extends to two layers of 30 x 30 nodes. The same basic point applies, only now you’ll have the top left to bottom right node path involve some sixty intermediate nodes and links. Okay? Now extend it to a grid of 100 x 100 nodes and two layers. The paths get longer and longer, meaning 200 or so intermediate nodes and hops, which is clearly unacceptable. So, prepare for an imaginary leap here: suppose we wrap this grid around a column so that the far-left set of nodes now meets the far-right set in our imagined 3D space:
Wrapped node grid forms a quasi-tube
Imagine a cross-section of this tube is the circular cross-section of the torus doughnut tube. Now the network path from the top left node of the grid to the bottom right node has been considerably shortened to just over 100 intermediate nodes and hops.
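To put rough numbers on the wrapping idea, here is a minimal Python sketch that counts minimum hops between the top left node of the upper layer and the bottom right node of the lower layer, first on the flat grid, then with one axis wrapped into a tube, then with both grid axes wrapped. The function names and the choice to leave the two layers unwrapped are this sketch's own assumptions, not anything Rockport has published.

```python
def hops(src, dst, dims, wrap):
    """Minimum hops between two nodes of a grid.

    dims[i] is the node count along axis i; wrap[i] says whether that
    axis is joined end to end (a ring) or left open (a line).
    """
    total = 0
    for a, b, size, ring in zip(src, dst, dims, wrap):
        d = abs(a - b)
        # On a ring you may go either way round, so take the shorter arc.
        total += min(d, size - d) if ring else d
    return total


def corner_to_corner(side, wrap):
    """Hops from top left (upper layer) to bottom right (lower layer)."""
    dims = (2, side, side)  # (layers, rows, columns)
    src = (0, 0, 0)
    dst = (1, side - 1, side - 1)
    return hops(src, dst, dims, wrap)


# Axis order is (layers, rows, columns); the layer axis is never wrapped here.
FLAT = (False, False, False)   # plain two-layer grid
TUBE = (False, False, True)    # left and right edges joined
TORUS = (False, True, True)    # front and rear edges joined as well

for side in (3, 30, 100):
    print(side,
          corner_to_corner(side, FLAT),
          corner_to_corner(side, TUBE),
          corner_to_corner(side, TORUS))
```

For the 100 x 100 case this gives 199 hops on the flat grid and 101 once the grid is rolled into a tube. With both axes wrapped the two corners become near neighbours, so the corner-to-corner path is no longer the worst case; the longest paths then run to nodes about halfway round each ring.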
Now we need another imaginary leap. Ready? The front and rear edges of our grid are now extended as well, so as to draw a circle with the front and rear edges meeting so that the nodes on the edges directly face each other; we now have a three-dimensional torus mesh topology as shown in the blue torus doughnut image above.
Now imagine the 3 dimensional nodes all connected in another dimension of connectivity; a 4 dimensional torus mesh topology. And then go to 5D or 6D. This concept can scale to support literally billions of devices. As dimensions are added, there are much shorter links between previously far members of our grid, meaning fewer hops and lower latency. A typical 4D or 6D torus mesh will have more than 5,000 devices within single digit hops. Oprea says that the more equal the dimensions (node count) are on either edge of a grid, the better the torus Direct Interconnect network; 10 x 10 grids are better than 5 x 15 ones for example.
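The effect of adding dimensions, and of keeping them balanced, can be checked with a short calculation. The sketch below counts how many nodes sit within a given number of hops of any one node in a torus with arbitrary dimension sizes. The example sizes (six dimensions of five nodes each, and a 4 x 25 grid as an unbalanced 100-node comparison) are illustrative assumptions chosen for this sketch, not Rockport configurations.

```python
def distance_histogram(dims):
    """hist[h] = number of nodes exactly h hops from any given node."""
    hist = [1]  # zero dimensions: just the node itself, at distance 0
    for size in dims:
        # Distances around one ring of `size` nodes.
        ring = [0] * (size // 2 + 1)
        for offset in range(size):
            ring[min(offset, size - offset)] += 1
        # Hop counts add across dimensions, so combine by convolution.
        merged = [0] * (len(hist) + len(ring) - 1)
        for i, a in enumerate(hist):
            for j, b in enumerate(ring):
                merged[i + j] += a * b
        hist = merged
    return hist


def nodes_within(dims, max_hops):
    """Nodes reachable in at most max_hops hops (including the start node)."""
    return sum(distance_histogram(dims)[: max_hops + 1])


def diameter(dims):
    """Longest shortest path in the torus, in hops."""
    return len(distance_histogram(dims)) - 1


six_d = (5,) * 6                       # hypothetical 6D torus, 15,625 nodes
print(nodes_within(six_d, 9))          # nodes within single-digit hops
print(diameter((10, 10)), diameter((4, 25)))   # balanced vs unbalanced, 100 nodes each
```

Under these assumptions the 6D example has 14,025 of its 15,625 nodes within nine hops of any starting node, and the balanced 10 x 10 grid has a worst case of 10 hops against 14 for a 4 x 25 grid with the same node count.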
Here is a Rockport 2D configuration with a 6 x 7 grid (42 data nodes);
Rockport 2D config scheme with logical links, not physical cables.
Now we’ll extend it to 3D;
Three dimensions with logical links and 126 nodes
And to scale beyond that, 4D;
4 dimensions with logical links and 378 nodes
Finally, here is a Rockport Direct Interconnect concept diagram using IT racks;
Rockport torus concept. Important point; the green lines are notional, indicative interconnect paths and not actual cables.
Ditch the switch
The networking nodes can be servers and filer arrays in scale-out cluster form with the data messages dealing with file IO requests. Each node will have Rockport software running in a chip with ports to the Direct Interconnect mesh. There is no longer any need for switches or routers, so rack space devoted to these can be recovered and used for processors running virtual machines, meaning more chargeable resource and income for cloud service providers.
Today, the Rockport solution can perform at 200Gbit/s, made up of 8 x 25Gbit/s ports per data node, with internodal hop delays of 80 nanoseconds: a PCIe-class bandwidth rate. Rockport contrasts this with a server-to-server connection using three switches and taking up to 3.12msecs. In effect the network is moved closer to compute.
Rockport’s software operates at layer 2, the data link layer or node-to-node data transfer, in the OSI model. The network is not sniff-able and has inherent security. Rockport says its network can scale from 80Gbps to 200Gbps to 800Gbps without any switching hardware. It can also scale up to more than 160,000 nodes in a single mesh.
If a server (node) goes down, then data is redirected. The network is fault-tolerant, with up to 8 paths per node for a 4D mesh and 12 paths for a 6D mesh. New nodes are auto-discovered, the system auto-configures, and scales up or down as needed. The Rockport software provides shortest path computations and traffic distribution with flow-control to avoid network hot spots. Pass-through features reduce internodal hop delay. A Fabric Manager provides network management functions with the network configured through programming (software).
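The path counts follow from the geometry: each node has a link in both directions along every dimension, hence two per dimension. The sketch below lists a node's neighbours in a torus and uses a plain breadth-first search to find a shortest route that avoids failed nodes. The search is a generic textbook technique standing in for Rockport's own shortest path computation, which the company has not detailed, and all names here are hypothetical.

```python
from collections import deque


def neighbours(node, dims):
    """Nodes one hop away: a step back and forward on each axis, wrapping round."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(node)
            n[axis] = (n[axis] + step) % size
            result.add(tuple(n))
    result.discard(node)  # only relevant for an axis of size 1
    return result


def shortest_path(src, dst, dims, failed=frozenset()):
    """Breadth-first search that skips failed nodes.

    Returns the list of nodes from src to dst, or None if dst is cut off.
    """
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours(node, dims):
            if nxt not in previous and nxt not in failed:
                previous[nxt] = node
                queue.append(nxt)
    return None


dims_4d = (4, 4, 4, 4)
print(len(neighbours((0, 0, 0, 0), dims_4d)))   # links per node in 4D
print(len(neighbours((0,) * 6, (3,) * 6)))      # links per node in 6D

# Knock out both two-hop routes along the first axis and look for a detour.
down = {(1, 0, 0, 0), (3, 0, 0, 0)}
route = shortest_path((0, 0, 0, 0), (2, 0, 0, 0), dims_4d, failed=down)
print(len(route) - 1, "hops")
```

With at least three nodes per dimension, the 4D case yields eight neighbours and the 6D case twelve. In the failure example the direct two-hop routes are blocked and the search settles on a four-hop detour through another dimension.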
Scale out, not scale up
Rockport has an office in Palo Alto and is working on a funding round and developing OEM-type processes. Its product technology has been validated in early proof-of-concept installations. The software will be sold on a subscription basis with an optional perpetual pricing scheme. The Rockport scheme provides, it says, both significant CAPEX and OPEX savings.
The first application focus is scale-out file storage systems which have a need for consistent and reliable performance as the number of nodes and users scale out and up.
ESG Founder and senior analyst Steve Duplessie says of Rockport’s scheme: “It’s not the hardware anymore. Switching is a software function. The whole idea of the core/edge hierarchical switch architecture seems way last century with today’s CPU capabilities. Why not have a flat horizontal network that gets fatter when necessary - based on application requirements? Scale out networking – no need for scale up.”
Oprea says servers could well need 1Tbit/s interconnect bandwidth by 2020. The Rockport torus Direct Interconnect scheme can ramp up bandwidth merely by adding network link end-points to these servers, with no need for additional switches, routers, etc. The bandwidth can ramp up from 10Gbit/s to 25Gbit/s through 40Gbit/s and 100Gbit/s and on to 1Tbit/s, driven by software and with less than 1 microsecond needed to cross the network from any source node to any target node because of the hop-reduction characteristic identified by Oprea.
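As a back-of-the-envelope check on these figures, the sketch below combines the 80 nanosecond per-hop delay and 8 x 25Gbit/s port layout quoted earlier with the 1 microsecond and 1Tbit/s targets. It assumes, for illustration only, that per-hop delay is constant and is the only contributor to crossing time, and that node bandwidth is simply ports multiplied by port rate.

```python
HOP_DELAY_NS = 80        # per-hop delay quoted by Rockport
PORTS_PER_NODE = 8       # ports per data node today
PORT_RATE_GBITS = 25     # Gbit/s per port today


def node_bandwidth_gbits(ports=PORTS_PER_NODE, rate=PORT_RATE_GBITS):
    """Aggregate bandwidth per node, assuming ports simply add up."""
    return ports * rate


def crossing_time_ns(hop_count, hop_delay_ns=HOP_DELAY_NS):
    """Time to cross hop_count hops if hop delay is the only cost."""
    return hop_count * hop_delay_ns


def max_hops_within(budget_ns, hop_delay_ns=HOP_DELAY_NS):
    """Largest whole number of hops that fits inside a latency budget."""
    return budget_ns // hop_delay_ns


def ports_needed(target_gbits, rate_gbits):
    """Ports required to reach a target bandwidth, rounded up."""
    return -(-target_gbits // rate_gbits)


print(node_bandwidth_gbits())        # Gbit/s per node today
print(max_hops_within(1000))         # hops that fit in 1 microsecond
print(crossing_time_ns(12))          # ns for a 12-hop crossing
print(ports_needed(1000, 25), ports_needed(1000, 100))
```

On these assumptions a sub-microsecond crossing allows for up to 12 hops, and reaching 1Tbit/s per server would take 40 ports at 25Gbit/s or 10 at 100Gbit/s.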
My thinking is that this scheme could conceivably be of interest to disk drive manufacturers with directly addressed, object-storing disk drives – think Seagate and WD – as well as to scale-out filers like IBM, EMC, HP and NetApp products... also flash array orgs like SanDisk, SolidFire and others. Torus-based storage networking could flatten network hierarchies and finally deliver software-defined networking. ®