Welcome to the Chaos
April 4, 2024

AI’s High Speed Chase in Networking

Ned and Chris discuss the evolution of data center networking technologies, particularly focusing on the rivalry between Ethernet and InfiniBand in the context of AI and high-performance computing. 

Ethernet vs. InfiniBand:

This episode takes listeners into the world of data center networking, as Ned and Chris dissect the critical role of Ethernet and InfiniBand technologies in AI's rapid evolution. They explore how these networking giants influence the performance and efficiency of AI workloads and high-performance computing, highlighting their implications for future tech innovations.


Transcript

00:00:00
Ned: [laugh] It was broken when I found it.


00:00:02
Chris: Just like my spirit.


00:00:05
Ned: No, no, I did that. That was all me.


00:00:07
Chris: Oh. This—actually, a lot of things are starting to make a lot more sense.


00:00:11
Ned: [laugh].


00:00:11
Chris: Cue the maniacal laughter.


00:00:12
Ned: [maniacal laughter].


Ned: Hello alleged human, and welcome to the Chaos Lever podcast. My name is Ned, and I’m definitely not a robot. I’m a real human person who likes to take orange spheres and deposit them in round—oops, I enjoy retrieving said sphere and… throwing it at the ground repeatedly, just like any other normal human person. With me is Chris, who is also here.


00:00:48
Chris: As always, your grasp of sport is inspiring.


00:00:53
Ned: There’s just one. Remember that. It’s ‘Sport.’


00:00:56
Chris: There’s just one sport.


00:00:57
Ned: [laugh].


00:00:59
Chris: I will go to sport and have snack.


00:01:03
Ned: I had a moment recently where we were watching college basketball, and I was saying things that made sense and picking up on plays and stuff, and I started getting really worried about myself.


00:01:16
Chris: [laugh] What’s happening? Who am I? What have I become?


00:01:21
Ned: [laugh] You have an identity that you’ve sort of honed over the last 40-some-odd years, and it’s just being disrupted. I don’t like it. Change is awful.


00:01:31
Chris: It is dumb.


00:01:33
Ned: Computers were a mistake. Apropos of nothing. So, I thought today, we could dig into some AI stuff. And before everyone backs away and runs for the nearest exit, we’re not going to be talking about the impact of AI on society. It’s not one of those. We’re going to do some deep technical bullshit.


00:01:55
Chris: So, I should put away the fedora and pull out the hard hat?


00:01:58
Ned: Yeah. We’re going to go plumbing the depths, spelunking into the deep history of data center networking, and then take that into the modern era of what we’re dealing with today, and something I like to call the Ultra Mega Super Ethernet Six Sigma Praetorian Cycle.


00:02:19
Chris: Extreme.


00:02:21
Ned: [laugh]. Times two, XD.


00:02:23
Chris: Dot AI.


00:02:25
Ned: Why are some letters cooler than others? Like just—I think we all know, like, X and Z are cool, but, like, why?


00:02:32
Chris: Well, X is no longer cool.


00:02:35
Ned: Well, I mean—okay. Valid point. I rescind the X. But like, Z and Q. I do like—Q is a pretty cool letter also. But why? Like, why are these cool? Is it because they’re not used as much?


00:02:47
Chris: That’s probably part of it, yeah. It’s like it’s a rarity.


00:02:52
Ned: Mm-hm.


00:02:53
Chris: It’s worth a lot in Scrabble.


00:02:55
Ned: Yeah, I guess maybe we could use Scrabble as a good—


00:02:59
Chris: And as y—yeah, as you know, all the cool kids are deep into Scrabble.


00:03:03
Ned: Okay. So, my wife and I went to Cancun, and while we were at the resort, there was a couple playing Scrabble in the pool. Like, they had the Scrabble tiles, and they were doing Scrabble on the edge of the pool while standing in the pool. And that just—I didn’t get it.


00:03:24
Chris: Humans, man.


00:03:25
Ned: Just the worst. [sigh] Anyway, here on Chaos Lever, we’ve discussed before how numbers being thrown around by current technology are patently ludicrous. There was a time when ten-megabit ethernet was the fastest thing around. And of course, there’s the apocryphal quote, “640k ought to be enough for anybody.” I don’t think he actually said that, but it doesn’t matter.


00:03:53
Chris: That’s why it’s apocryphal.


00:03:54
Ned: Yes, I know. In case anybody doesn’t know what that means, now they do. I distinctly remember, as a child—well, eh child; we’ll go ‘child’—when we added a 40 megabyte hard drive to our Apple IIGS, suddenly making it far more capable of storing my artistic masterpieces that I created in Paint Pro Plus. You should have seen my dream house. Holy cow. But Moore’s law is the thing. And the number of transistors on a chip has, more or less, kept doubling every 18 months, along with everything else continually embiggening every year or so.


00:04:38
Chris: Some would argue faster, but yeah.


00:04:40
Ned: Sometimes. It’s 2024—I think—and if I look at the current stats for different components of technology, they are hitting numbers that just cease to be meaningful to me at all. These numbers are [bazaca 00:04:54] if I can use a very technical term. So, a few examples: Intel’s Emerald Rapids XCC model CPU will have 61 billion transistors across 64 cores. Seagate’s Exos Mozaic hard drive will have 30 terabytes of capacity. That’s spinning rust, baby, right there. Nimbus Data has a 100 terabyte SSD that fits in the three-and-a-half inch form factor. And Broadcom’s Tomahawk 5 chip does 51.2 terabits at line rate. Yeah.
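
To put that last number in context, here’s some quick back-of-the-envelope arithmetic showing what 51.2 terabits of line rate looks like when it’s carved up into front-panel ports. The port speeds are standard Ethernet rates; the pairings are just arithmetic, not a spec sheet.

```python
# Rough arithmetic only: how many ports of each common speed a
# 51.2 Tb/s switching ASIC could drive at line rate.
LINE_RATE_GBPS = 51_200  # 51.2 Tb/s expressed in Gb/s

for port_speed_gbps in (800, 400, 100):
    ports = LINE_RATE_GBPS // port_speed_gbps
    print(f"{port_speed_gbps} GbE: {ports} ports")

# 800 GbE: 64 ports
# 400 GbE: 128 ports
# 100 GbE: 512 ports
```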


00:05:34
Chris: What?


00:05:36
Ned: [laugh] Like, it hurts. It hurts a little bit. Once we start talking in the billions and trillions, the human mind just completely fails to begin to start to even slightly comprehend what we’re talking about. Honestly, most of us don’t really do well with numbers beyond a dozen.


00:05:54
Chris: Yep, and we’ve actually proven that scientifically. You know why the phone numbers in America, including the area code, have ten digits?


00:06:05
Ned: Because that’s about as many digits as we can reliably remember?


00:06:08
Chris: That would be correct, sir.


00:06:11
Ned: [laugh] That’s the same reason we can’t keep more than, like, five concurrent things in our head, just one pops out.


00:06:18
Chris: Yep.


00:06:18
Ned: Forget it.


00:06:20
Chris: It’s also why you shorten things into less numbers like ‘seventeen-thirty-four’ instead of ‘one-thousand-seven-hundred-and-thirty-four.’ Or ‘one-seven-three-four,’ I guess is a better way to put it.


00:06:31
Ned: Right, because it’s easier to put 17 and 34 into two baskets—


00:06:35
Chris: Than four digits, right.


00:06:36
Ned: Than four digits into four baskets.


00:06:38
Ned: This is fascinating. Wait, what were we talking about [laugh]? Oh, yes. So, beyond our frail human limitations, we are also hitting the limit of what our protocols and software are capable of handling when it comes to raw bandwidth and storage. Sure, my CPU can do 14 gigafloopsies, and my NIC can drive 400 paquitos per second, but do I actually have anything that, number one, needs that kind of power—no—and two, can effectively use it? Also no. Think about the latest Tesla Model S Plaid. That car, in theory, can go zero to sixty in 1.99 seconds. Now, imagine that Tesla is placed inside a container that’s only 40 feet long. Can the Tesla still technically hit zero to sixty in 1.99 seconds? Yeah, sure, technically. Will it ever be able to go over ten miles an hour before crashing into a wall? No. What if we put it in a suburban development? Maybe we can hit 35, but maybe we could also hit Sally who’s trying to cross the street, and knowing Tesla, the car wouldn’t even slow down; it would speed up.


00:07:55
Chris: Oh, Elon, everything you make is gold [sigh].


00:08:00
Ned: Somebody referred to escaping Tesla as like a machine from Saw. It’s just designed to kill you.


00:08:08
Chris: [laugh].


00:08:09
Ned: And I feel like that’s pretty accurate.


00:08:11
Chris: I like it.


00:08:11
Ned: But we’ll put that to the side. It’s a jigsaw contraption, I think they call it. If you really want to see what the Tesla Model S Plaid is capable of, you need to place it in an environment suited to speed, otherwise, it’s just the thing you can point at and acknowledge it’s a death machine. This tortured metaphor can be applied to our current state of networking and the drive by AI to increase capacity and performance. For most workloads, the current capabilities of our modern data center networks are absolutely fine. Your average web application or middleware component doesn’t need and will not benefit from 400 gigabit Ethernet to the NIC. Most physical hosts will saturate the memory before anything else because programmers are lazy and RAM is cheap. I think we just lost some listeners. For those workloads, the data center network doesn’t need anything special at any of the OSI layers: Ethernet, IP, TCP, et cetera. They can all just do their thing, and any decent data center network fabric will have TCP/IP offload at the server NICs, which is nice for the CPU. But once you set up a Clos fabric with ECMP—don’t worry, I’ll get to what those things mean—you’re in pretty good shape. Until AI.


00:09:38
Chris: Ugh.


00:09:39
Ned: I know. Listen, you knew it was going to be about AI. I said it at the top. Don’t complain to us, listener, about how AI is in every news story, and you’re so sick of hearing about it, and AI is a big scam promoted by big tech since they need to shift more units, and the bottom fell out of crypto. We know. We know. Still, AI is driving some cool stuff in tech, and since we can’t have nice things, at least we can have cool things? So—


00:10:07
Chris: Uh, I’ll allow it.


00:10:09
Ned: All right. So, let’s get into AI networking, and the Ultra Ethernet Consortium, and a sea of alphabet soup, and we’ll try to navigate this whole thing together.


00:10:20
Chris: I just came back from a Zscaler presentation, so I’m already alphabet-souped out. What have you got that’s words?


00:10:26
Ned: [laugh] Very little, I’m afraid. All right, AI networking requirements. The requirements for AI networking, and for the high-performance computing that came before it, are different from your typical application’s in some very key ways. We don’t have to get into all the grimy details—and we kind of did some of that in a previous episode, which I reran last week; you’re welcome; also because I was lazy—but it’s probably easier to understand the requirements if we know a bit about how the AI model training and inference actually works. When you want to train a model, the process happens over a series of epochs where the training data is passed through the model in batches; each pass of a batch is called an iteration. I don’t want to get too bogged down here in the terminology—at least I didn’t use a bunch of acronyms—but believe me, there’s going to be a lot more terminology later. But the key thing to understand is sort of the cyclical nature of the training iterations and how each iteration cannot start until the previous one completes. So, as far as the network is concerned, things get very bursty. To start an iteration, all the batch data must be loaded into memory on every node that’s in the cluster. Then they all do their number-crunching thing and then return a result. It’s a massively parallelized operation, and it only completes once all the nodes have converged. Now, the nodes here are GPUs or other processors running the calculation. Any delay in transmitting data to a node or collecting a response means that every other node in the cluster is sitting there waiting before they can start the next iteration. That’s bad. An epoch can be made up of thousands of iterations, and a full training session can be hundreds of epochs. Any communication issues between the nodes literally cost time and money. So, what you need is incredibly fast, high-bandwidth, low-latency connectivity with zero data loss. If you want to know why InfiniBand is popular with AI workloads, this is why. InfiniBand provides high bandwidth, which is about 800 gigabit per second currently, with low latency and lossless data transmission. It’s kind of like the SAN fabric of AI; that combination is why people preferred Fibre Channel over something like iSCSI.
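
If it helps to picture that cycle, here’s a toy sketch of the epoch/iteration loop described above. Every name and number in it is a made-up stand-in, not any real framework’s API; the point is just that each iteration is gated on the slowest node.

```python
# Toy model of the training cycle: epochs made of iterations, and every
# iteration gated on every node finishing. All names and numbers are
# hypothetical stand-ins, not a real framework's API.
import random

NODES = 8        # GPUs (or other accelerators) in the cluster
ITERATIONS = 4   # iterations per epoch (real runs: thousands)
EPOCHS = 2       # real runs: hundreds

def load_batch(node: int) -> float:
    """Simulated time to push the batch into this node's memory (bursty network I/O)."""
    return random.uniform(0.8, 1.2)

def compute(node: int) -> float:
    """Simulated time this node spends crunching its slice of the batch."""
    return random.uniform(9.0, 11.0)

wall_clock = 0.0
for epoch in range(EPOCHS):
    for iteration in range(ITERATIONS):
        # Every node must receive its data and finish computing before the
        # iteration can close, so the iteration takes as long as the
        # slowest node: one laggard stalls the entire cluster.
        per_node_time = [load_batch(n) + compute(n) for n in range(NODES)]
        wall_clock += max(per_node_time)

print(f"simulated wall-clock time: {wall_clock:.1f} (arbitrary units)")
```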


00:13:06
Chris: That’s not the only reason why, but yeah.


00:13:08
Ned: It’s one of the big reasons. And, you know, if you talk to the SAN nerds, they’ll defend Fibre Channel to the end of the earth. But anyway. So, this is kind of like that, but for data transmission over a network. InfiniBand also natively supports RDMA through its host adapters. RDMA is Remote Direct Memory Access, which allows internode communication to skip the CPU entirely, and let applications on one physical server directly access the memory on another server. That’s useful. So, InfiniBand is awesome, but it’s also super expensive. There’s really only one company that makes it, and you won’t guess who it is, and at about 40,000 nodes in the InfiniBand fabric—which seems like a lot, but it isn’t—it starts to fall down. So, to provide an alternative to InfiniBand for AI and HPC workloads, the Ultra Ethernet Consortium was founded to use Ethernet as the underlying protocol for AI workloads. However, some tweaks need to be made, and to understand those tweaks, we’re going to need to talk about data center networking. Are you ready?


00:14:23
Chris: I’m not sure I’ve ever been more ready for anything.


00:14:26
Ned: Oh, yay. Warm hugs. Very fuzzy. [laugh] I said ‘warm fugs’ and that doesn’t make any sense. Might be the title of this show. Modern data center architecture is awash in a sea of acronyms that you may have heard, but not know exactly what they mean. We’ll try to clarify a few of them. Most of the modern data center network designs are meant to deal with the immense scale and complex topologies that are required for both the underlay and overlay networking. But it wasn’t always this way. Two important concepts to keep in mind are link aggregation and loops. Loops in the network are bad. So, just imagine, Switch A sends a packet to Switch B who sends a packet to Switch C, who sends it back to Switch A, and the packet never reaches the intended host.


00:15:20
Chris: Right. The idea is it’s supposed to go somewhere.


00:15:22
Ned: Yes.


00:15:23
Chris: Not nowhere.


00:15:25
Ned: [laugh] You want an acyclic graph, to use fancy technical terms. To solve that problem, early switching technology used something called spanning tree, an algorithm intended to select a root node and build out a non-looping tree for all the other connected devices in the topology. Whole tomes could and have been written on spanning tree, but it can be summarized thusly: spanning tree sucks, and modern data centers don’t use it. It leads to a lot of expensive or inefficient paths, especially when a link goes down, and then spanning tree has to reconverge. And dear Lord, good luck.


00:16:11
Chris: Are you implying paths go down on my network?


00:16:14
Ned: Never. I would never. Aggregation is the other big thing. I might have multiple links between switches in my fabric. That was really important before the advent of, like, 40 gig and 80 gig and 100 gig links. And I might have multiple links from those switches to my hosts. This provides redundancy in case of a link failure—that’s good—but if you’re using spanning tree, the protocol only allows for one active link at a time, and that’s no good. I want to use all my bandwidth. So, LAG—or Link Aggregation—was created to let you bond multiple links together. LAG works, but you can’t do it across multiple switches most of the time, so if I have two links from Switch A to Server A and two links from Switch B to Server A, I can’t bond those four links together in a single LAG, unless my switches are in a stack, and those suck for a whole other host of reasons. Try to do a firmware upgrade. Good luck. The load balancing algorithms for LAG aren’t that great either, and its control protocol, LACP, kind of stinks. So, as data centers got bigger and the number of hosts and switches increased, something else had to come along. And here we have spine-leaf network topologies. Traditional networking topologies—especially if you went to the Cisco school of networking—used a three-tier system of core, aggregation, and access layer switching. Spine-leaf condenses things down to a spine layer and a leaf layer, and it’s interesting how they’re connected. Every leaf in the topology has a connection to every spine. So, if you have four spine switches and eight leaf switches, every single one of those leaf switches will have four connections back, one to each of the spine switches. Each server connected to the fabric is only three hops from the next server at most. It can go up to the leaf, maybe to the spine, back down to a leaf to the other server. So, that means the number of hops is extremely predictable. Switches in older deployments would have bridged connections, meaning everything is happening at layer 2, which is also where spanning tree lives. The introduction of spine-leaf architectures also required creating a routed environment between switches, usually OSPF or BGP. We don’t need to get into BGP; just know that the paths between hosts are now routed instead of switched. Things are happening at layer 3. Now that we have a routed network, we can replace spanning tree with a different approach called ECMP. Ah, we got there. That’s Equal Cost Multi-Path routing. When there are redundant paths to the same destination, which there will be in a spine-leaf architecture, and they have the same routing cost, which almost all of them will, ECMP will split up sessions across the paths. I guess it’s kind of all there in the name: equal cost, meaning the cost is the same, and multi-path since it uses multiple paths. Yeah.
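
To make the per-flow part of ECMP concrete, here’s a toy version of the usual trick: hash the flow’s five-tuple and use the result to pick one of the equal-cost paths. Real switch ASICs use their own hash functions and field selections, so treat this purely as an illustration.

```python
# Toy illustration of per-flow ECMP: hash the flow's five-tuple and use
# the result to pick one of the equal-cost paths. Every packet of the
# same flow hashes to the same path. Real switch ASICs use their own
# hash functions and field selections; this is only a sketch.
import hashlib

PATHS = ["spine-1", "spine-2", "spine-3", "spine-4"]

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    five_tuple = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(five_tuple).digest()
    index = int.from_bytes(digest[:4], "big") % len(PATHS)
    return PATHS[index]

# Two different flows may land on different spines...
print(ecmp_path("10.0.0.1", "10.0.1.1", 49152, 443))
print(ecmp_path("10.0.0.2", "10.0.1.1", 49153, 443))
# ...but a single elephant flow stays glued to one path no matter how
# big it gets, which is exactly the problem for AI traffic.
print(ecmp_path("10.0.0.1", "10.0.1.1", 49152, 443))
```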


00:19:32
Chris: I can sort of see it. Yeah.


00:19:35
Ned: [laugh] ECMP uses a hash to identify a flow, or session, in the network, and it uses the same path for a given flow, which is good from a consistency standpoint, but it isn’t great if that single flow causes network congestion. You might want that flow to span multiple links. ECMP, by default, doesn’t do that. But if you have a bunch of flows, it’ll break them up across paths. Your average spine-leaf architecture with modern, say, 64-port switches can accommodate about 2000 servers in a non-blocking architecture. That’s a lot of servers. Two-thousand. You can add significantly more capacity by selecting switches with more ports, or by creating a three-tier Clos layout—that’s C-L-O-S, which is someone’s last name, not an acronym—with a super spine between the leaf-spine pods.
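
That “about 2000 servers” figure falls straight out of port math. A quick sketch, assuming 64-port switches and a non-blocking two-tier design where half of each leaf’s ports face servers and half face spines:

```python
# Back-of-the-envelope port math for a non-blocking two-tier spine-leaf
# fabric built from 64-port switches. Assumes half of each leaf's ports
# face servers and half face spines (1:1 oversubscription).
PORTS_PER_SWITCH = 64

server_ports_per_leaf = PORTS_PER_SWITCH // 2  # 32 servers per leaf
uplinks_per_leaf = PORTS_PER_SWITCH // 2       # 32 uplinks -> needs 32 spines
max_leaves = PORTS_PER_SWITCH                  # each spine can attach 64 leaves

max_servers = max_leaves * server_ports_per_leaf
print(f"spines: {uplinks_per_leaf}, leaves: {max_leaves}, servers: {max_servers}")
# spines: 32, leaves: 64, servers: 2048 -- i.e., "about 2000"
```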


00:20:36
Chris: And so, those four paragraphs are a wonderful reason that I use regularly to help myself fall asleep.


00:20:47
Ned: [laugh] Pretty much.


00:20:48
Chris: No, I mean, what’s crazy is this, all of what you’re talking about is the history of networking is actually necessary and revolutionary in its time, and some of the most boring shit I’ve ever heard in my life.


00:21:00
Ned: So, I did a lot of reading for this—part of the reason this episode is, like, two weeks late—and just getting through some of these white papers was brutal. I am condensing down, like, hundreds and hundreds of pages to small sections. So, you’re welcome, everybody.


00:21:19
Chris: This is why I did servers and storage and not networking.


00:21:22
Ned: Yes. Me too. At the end of the day, what we have is a non-blocking network topology that uses ECMP to provide per-flow load balancing across equal links. And now you know what all that means. This architecture works really well for 90% or more of workloads, and completely falls down when AI is brought into the mix. And the reason why is due to the nature of AI traffic flows in a cluster. The cluster is going to have super bursty traffic that needs to be delivered without loss or congestion on the network. Sometimes that’s going to be loading data for the next run, and you want to get that data loaded onto the high-bandwidth memory banks in each GPU as quickly as possible. Massive flows happen from source servers to multiple nodes at the same time. At the end of each iteration, the GPUs need to have a little powwow to decide what to do next. The CUDA communicator or its equivalent will call for an All-Reduce or All-Gather, and all the nodes need to report in. And as I said earlier, the cluster cannot move on to the next iteration until all the nodes have performed their portion of that group action. Congestion, latency, or retries on the network cause delays, and those delays leave GPUs idle. And as we know, those things cost millions of dollars. Leaving a GPU idle is, like, a sin, I think? Like, the Pope might condemn you?
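
Here’s a rough, entirely made-up-numbers sketch of why those stalls matter: the whole cluster waits on the slowest participant every iteration, and a tiny per-iteration delay multiplies across hundreds of thousands of iterations into real GPU-hours.

```python
# Rough sketch of why network hiccups during All-Reduce are so costly:
# every iteration waits for the slowest participant, and those stalls
# multiply across the whole run. All numbers below are made up purely
# for illustration, including the cost per GPU-hour.
ITERATIONS_PER_EPOCH = 2_000
EPOCHS = 300
GPUS = 1_000
GPU_COST_PER_HOUR = 3.0        # assumed $/GPU-hour, illustrative only

network_stall_s = 0.05         # extra wait per iteration from congestion/retries

iterations = ITERATIONS_PER_EPOCH * EPOCHS
idle_gpu_hours = iterations * network_stall_s * GPUS / 3600
print(f"idle GPU-hours from a 50 ms stall per iteration: {idle_gpu_hours:,.0f}")
print(f"illustrative cost: ${idle_gpu_hours * GPU_COST_PER_HOUR:,.0f}")
# A tiny per-iteration stall turns into thousands of wasted GPU-hours
# at this scale.
```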


00:22:54
Chris: It’s definitely in the Bible, yeah.


00:22:56
Ned: Yeah, it’s the 11th commandment, you know. If he hadn’t dropped the stone tablet, we would all know that. So, how does ECMP handle congestion and flows on the network? It’s going to pick a different path for each flow, but like I said before, each individual flow follows a single path. When you’ve got a massive flow from the cluster hitting the network, congestion occurs and packet drops are possible. The early congestion protocols involved stopping transmission and starting back at some random interval with exponential backoff. Go-Back-N algorithms allowed packets to be sent without confirmation up to a certain point, and then if that confirmation failed to appear, you would go back N packets in the sequence and try to send them again. Since AI clusters want to use RDMA—a native feature in InfiniBand—the Ethernet Working Group introduced two competing standards—of course they did—iWarp and RoCE. I don’t know what iWarp stands for. I didn’t look it up. I don’t know why the I isn’t capitalized. I didn’t look it up. RoCE is RDMA over Converged Ethernet, which, yes, is an acronym inside an acronym, and I’m sorry. Ultimately, RoCEv2 won out over iWarp, and so it’s become the standard for RDMA. RDMA requires a lossless network, which is not delivered by conventional congestion controls, so priority-based flow control, or PFC, was created to deliver Quality of Service—or QoS—for RoCEv2. And that is a real sentence [laugh]. PFC—
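
For anyone who hasn’t bumped into Go-Back-N before, here’s a toy sender showing the behavior described above: send a window of packets without waiting, and if an acknowledgment goes missing, rewind and resend the whole window, even packets that already arrived. Purely illustrative; there’s no real wire protocol here.

```python
# Toy Go-Back-N sender. Transmit up to `window` unacknowledged packets;
# if an ACK doesn't come back for one of them, go back to the oldest
# unacked packet and resend everything from there, including packets
# the receiver may already have. Illustration only.
import random

def go_back_n(total_packets: int, window: int, loss_rate: float = 0.1) -> int:
    random.seed(42)
    base = 0            # oldest unacknowledged sequence number
    transmissions = 0
    while base < total_packets:
        end = min(base + window, total_packets)
        transmissions += end - base          # (re)send the whole window
        acked_through = base
        for seq in range(base, end):
            if random.random() < loss_rate:  # this packet was lost
                break                        # everything sent after it is wasted
            acked_through = seq + 1
        base = acked_through                 # no progress means a full resend
    return transmissions

print(go_back_n(total_packets=100, window=8))
# Prints more than 100: every loss drags its whole window back with it.
```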


00:24:43
Chris: It upsets me that I understand it.


00:24:45
Ned: I’m so sorry. PFC is notoriously fiddly and a huge pain in the ass to configure. Ask me how I know [whispers] don’t ask. In 2015, teams from Mellanox and Microsoft drafted a paper that established DCQCN as a replacement for PFC—it’s better because it has more letters, I guess—and other congestion controls like QCN, DCTCP, and TCP-Bolt. I swear, these are all real things and not something that ChatGPT hallucinated. We’re almost there. I swear. The goal of DCQCN—this is the one that replaced Priority Flow Control—was to provide, “Fast convergence to fairness, achieve high link utilization, and ensure low queue buildup, and low queue oscillations.” That’s a direct quote from the paper. And it worked up to a point. Up to a point. You’re noticing a trend here [laugh]. Ultimately, the per-flow nature of ECMP is a problem, and the congestion control provided by DCQCN was insufficient. And thus we arrive—finally—we’re pulling into the station of the Ultra Ethernet Consortium and its goals as outlined in their version 1.0 proposal.
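
To give a flavor of what DCQCN-style congestion control is doing, here’s a drastically simplified sketch: ECN-marked packets trigger congestion notifications (CNPs) back to the sender, which cuts its rate sharply and then creeps back up when the notifications stop. The real algorithm maintains an alpha estimate and several recovery stages; this ignores all of that, and the numbers are arbitrary.

```python
# Drastically simplified, DCQCN-flavored rate control: cut the sending
# rate multiplicatively when congestion notifications (CNPs) arrive, and
# recover additively when they stop. The real algorithm tracks an alpha
# estimate and has multiple recovery stages; none of that is modeled here.
LINE_RATE_GBPS = 400.0

def next_rate(rate: float, event: str) -> float:
    if event == "cnp":                        # ECN marks were seen downstream
        return rate * 0.5                     # multiplicative decrease (simplified)
    return min(LINE_RATE_GBPS, rate + 10.0)   # additive increase (simplified)

rate = LINE_RATE_GBPS
for event in ["cnp", "cnp", "quiet", "quiet", "quiet", "cnp", "quiet"]:
    rate = next_rate(rate, event)
    print(f"{event:>5}: {rate:.0f} Gb/s")
```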


00:26:09
Chris: All that for 1.0?


00:26:11
Ned: 1.0. Yeah, there are five big improvements that the UEC hopes to make with their version 1.0 of the standard: packet spraying over multipath, flexible delivery order, modern congestion control, end-to-end telemetry, and larger scale with sustained reliability. And this is why I provided all the previous background because if you don’t know where we’re coming from, it’s hard to know why these objectives matter at all. When I started reading the standard—because I was going to just do a quick thing on UEC—I was like, “What the hell is packet spraying? It sounds disgusting.”


00:26:53
Chris: It’s not a great name.


00:26:55
Ned: No, but it was the beginning of my sojourn down this immense rabbit hole of networking. Now, I can answer the question, though. It still sounds disgusting, but packet spraying. Remember that ECMP does per flow load balancing, and it’s not very efficient for AI flows? Packet spraying allows ECMP to use all available links for all current flows, essentially spraying the packets across all the links. That’s why it’s called packet spraying.
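
Compared to the per-flow hashing sketched earlier, spraying just changes the granularity of the load-balancing decision from per-flow to per-packet. A toy contrast for a single elephant flow crossing four equal-cost paths; nothing vendor-specific here.

```python
# Toy contrast between per-flow ECMP and packet spraying for a single
# elephant flow crossing four equal-cost paths. It only shows the change
# in load-balancing granularity; real implementations differ.
from collections import Counter
import hashlib

PATHS = ["spine-1", "spine-2", "spine-3", "spine-4"]
FLOW = ("10.0.0.1", "10.0.1.1", 49152, 443)

def per_flow(packet_index: int) -> str:
    # Per-flow ECMP: the path depends only on the flow, so the index is
    # ignored and every packet of this flow lands on the same link.
    digest = hashlib.sha256("|".join(map(str, FLOW)).encode()).digest()
    return PATHS[int.from_bytes(digest[:4], "big") % len(PATHS)]

def sprayed(packet_index: int) -> str:
    # Packet spraying: spread successive packets of the same flow across
    # every available equal-cost path.
    return PATHS[packet_index % len(PATHS)]

packets = range(10_000)
print("per-flow:", Counter(per_flow(i) for i in packets))
print("sprayed: ", Counter(sprayed(i) for i in packets))
# per-flow piles all 10,000 packets onto one spine; sprayed splits them
# 2,500 per spine.
```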


00:27:27
Chris: I get it now.


00:27:29
Ned: It’s still disgusting.


00:27:31
Chris: I get stuff.


00:27:33
Ned: That does lead to new problems—because of course it does—especially around packet order. Even though all the links used by ECMP should be equal cost, that does not guarantee order of delivery. Usually, when packets arrive out of order, it’s up to the NIC to correctly sequence them, or sometimes request retransmission of packets, even if they’ve already been received. The new standard allows packets to arrive out of order and be sent up to the application layer for proper assembly. And this is accomplished by a UEC extension in the software API layer. If an application cannot support out-of-order delivery, it can request in-order delivery and the network will honor that. But the idea is, get this library incorporated into CUDA, and now suddenly, you can just support this out-of-order packet delivery thing. Now, just spraying packets down all links willy-nilly is not a great strategy, so the UEC standard includes some new congestion control mechanisms and better telemetry to be able to quickly detect and adjust to congestion events in the network. They say congestion can happen in one of three ways: at the sender, in the links between nodes, and at the receiver.
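
Here’s a toy reorder buffer of the sort an out-of-order-tolerant endpoint might keep: accept packets in whatever order the sprayed paths deliver them, and release contiguous data as soon as the next expected sequence number shows up. This is only meant to illustrate the idea; it’s not the actual UEC API.

```python
# Toy reorder buffer: accept packets in whatever order they arrive and
# release contiguous data to the application as soon as the next expected
# sequence number is present. Illustrative only, not the UEC API.
class ReorderBuffer:
    def __init__(self):
        self.expected = 0
        self.pending = {}            # seq -> payload, waiting on earlier packets

    def receive(self, seq: int, payload: str) -> list[str]:
        """Return any payloads that can now be delivered in order."""
        self.pending[seq] = payload
        deliverable = []
        while self.expected in self.pending:
            deliverable.append(self.pending.pop(self.expected))
            self.expected += 1
        return deliverable

buf = ReorderBuffer()
for seq, data in [(1, "B"), (0, "A"), (3, "D"), (2, "C")]:
    print(f"got seq {seq}: deliver {buf.receive(seq, data)}")
# got seq 1: deliver []
# got seq 0: deliver ['A', 'B']
# got seq 3: deliver []
# got seq 2: deliver ['C', 'D']
```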


00:28:55
Chris: And allergies.


00:28:57
Ned: Oh. Oh, that’s four, yes. We all have those, I guess. The sender can minimize congestion through scheduling algorithms, just kind of having an understanding of how much data it’s already sent and what the capability of the network is. The use of packet spraying in between nodes can help reduce the likelihood that a single flow will saturate a link, which leaves the receiver as the final piece. The UEC standard basically allows a receiver to issue credits to senders and say, “This is how much traffic you can send at once. Don’t send more than that because I won’t be able to handle it.” This prevents the receiver from being overwhelmed by multiple senders transmitting too much data all at once, something which can happen often during the All-Reduce and All-Gather operations on the AI network. The UEC is focused primarily on the transport layer of the OSI model, although it relies on features being implemented in the application stack and layers 2 and 3 of the network. The good news is, existing data center switches should be able to implement the standard that the UEC is pushing, but to get the full benefit, new firmware will probably need to be rolled out to some components. Vendors participating in UEC include all of the major networking vendors like Broadcom, Arista, and Cisco, along with chipmakers like AMD and Intel, and server OEMs like HPE, Lenovo, Dell, and SuperMicro. Unsurprisingly, Nvidia is not a member. That’s probably because UEC is a direct attack on their InfiniBand stronghold.
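
And here’s a toy version of that receiver-credit idea: the receiver only grants as much as its buffer can absorb, so a crowd of senders bursting at once (the classic incast pattern during All-Reduce and All-Gather) can’t collectively overrun it. Purely illustrative; the real scheme lives in the transport protocol, not in Python objects.

```python
# Toy receiver-driven credit scheme: the receiver hands out credits that
# cap how much each sender may have in flight, so many senders bursting
# at once cannot collectively overrun it. Illustrative only.
class Receiver:
    def __init__(self, buffer_bytes: int):
        self.available = buffer_bytes

    def grant(self, requested: int) -> int:
        """Give the sender at most what the receive buffer can still absorb."""
        granted = min(requested, self.available)
        self.available -= granted
        return granted

    def consumed(self, nbytes: int):
        """Data drained from the buffer; credits become available again."""
        self.available += nbytes

rx = Receiver(buffer_bytes=1_000_000)
senders = {f"gpu-{i}": 400_000 for i in range(4)}   # each wants to send 400 KB now
for name, want in senders.items():
    print(f"{name}: asked {want}, granted {rx.grant(want)}")
# The first two senders get their full 400 KB, the third gets the
# remaining 200 KB, and the fourth must wait until data is drained and
# credits come back.
```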


00:30:37
Chris: They took their 800 gigabits per second and went home.


00:30:40
Ned: [laugh] They kind of did. So, will it work? Will UEC be enough to break the Nvidia stronghold? Quite possibly. A recent post from the AI team at Facebook detailed two training clusters they built out for training their Llama 3 models. Each cluster was made up of 24,000 GPUs with the same storage and compute for both. The big difference? One was using Ethernet with RoCEv2, and the other was using InfiniBand, and both were running at 400 gigabit per second. And they were able to get comparable performance out of both systems, and this is before the rollout of UEC’s new standards. So yeah, that’s like—that’s good. But that’s not to say that Nvidia is holding steady. At GTC a couple of weeks ago, Jensen Huang talked about the improvements they are making to NVLink 5, and about some of the data processing capabilities they’re adding directly to their NVLink switches to accelerate AI workflows. This is definitely an arms race, and I don’t think we’re going to see a clear winner emerge. But at least InfiniBand has some legitimate competition, which might keep prices at least somewhat reasonable, and drive innovation for both standards.


00:32:04
Chris: Right. And if we look forward a few years, once this becomes, let’s call it 2.0—


00:32:12
Ned: Sure.


00:32:13
Chris: What you’re going to end up with is a situation where how you build a model and how you train the model are two different things. One is the cutting edge where you need this kind of speed and this kind of technology, and the other is the rest of us schmucks who will not.


00:32:29
Ned: Right.


00:32:30
Chris: And that goes right to the point that you just made. I mean, we’re talking orders of magnitude more expensive for InfiniBand at the moment. Like—


00:32:37
Ned: Yes.


00:32:38
Chris: —just staggeringly more expensive. Once we get to a point where you can get close enough—and of course, by close enough, we’re still talking about numbers that are so large that we cannot comprehend them—but you’re going to see wider acceptance of these different models being trained with, you know, the Toyota Camry versus the Cadillac version of this hardware. It’s just a Toyota Camry that can go the speed of light.


00:33:08
Ned: [laugh].


00:33:08
Chris: I don’t know where to take this metaphor.


00:33:11
Ned: No, I see what you’re saying, and it’s pretty telling that some of the biggest contributors to UEC are the cloud service providers that are going to be building these giant AI clusters. They want alternatives. They want to do these kinds of bake-offs. They want to know, can I save myself millions of dollars by implementing UEC with Arista switches instead of paying Nvidia whatever ridiculous price it is for their full DGX cluster? And if the answer is yes, they’re going to save millions of dollars.


00:33:45
Chris: Which is good. Millions of dollars—if anybody’s taking notes at home, millions of dollars… good.


00:33:53
Ned: Hey, thanks for listening or something. I guess you found it worthwhile enough if you made it all the way to the end, so congratulations to you, friend. You accomplished something today. Now, you can go sit on the couch, put an ice pack on your head, and try to forget everything I just said. You’ve earned it. You can find more about this show by visiting our LinkedIn page, just search ‘Chaos Lever,’ or go to our website, chaoslever.com where you’ll find show notes, blog posts, and general tomfoolery. We’ll be back next week to see what fresh hell is upon us. Ta-ta for now.


00:34:30
Chris: Also don’t InfiniBand cables break a lot?


00:34:33
Ned: When I’m around, yeah.


00:34:35
Chris: [laugh] Stop vacuuming in the data center.


00:34:37
Ned: Snip, snip, bitches [laugh].