Acer’s leading gaming brand, Predator, is all about maximizing performance, particularly for gaming. In the modern era, that now extends into content creation, streaming, video editing, and all the other workloads that drive the need for high performance. As we’ve seen several times over the years, just throwing more cores at the problem isn’t the solution: bottlenecks appear elsewhere in the system. Despite this, Acer is preparing a mind-boggling solution.

The Acer Predator X is the new dual-Xeon workstation, with ECC memory and multiple graphics cards, announced today at IFA 2018. The system is aimed at the multi-taskers that do everything: gaming, content creation, streaming, the lot. With this being one of Acer’s flagship products, we expect it to be specced to the hilt: maximum cores, maximum capacity. Therein lies the first rub: if Acer is going all out, this is going to cost something crazy:

  • 2 x Intel Xeon Platinum 8180 ($10,009 x 2)
  • 12 x 16 GB ECC RDIMM ($200 x 12)
  • 2 x NVIDIA Quadro RTX 8000 ($000s x 2)
  • Some Storage
  • Some big power supply
  • Some custom chassis
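
Quick math on just the parts with known list prices: 2 x $10,009 for the CPUs plus 12 x $200 for the memory comes to $22,418 before a single GPU, drive, power supply, or chassis is added.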

Of course, Acer is aiming this product at the next generation of processors (read: Cascade Lake-SP), and so none of the specifications have been locked in yet. However, there’s a fundamental aspect to dual-CPU systems that needs to be addressed.

Dual-CPU systems have what is known as Non-Uniform Memory Access (NUMA): despite each CPU having direct access to its own pool of memory, without a NUMA-aware operating system or software in place, memory for one process on one CPU can be allocated on the memory of the other CPU, causing additional latency. We tested this way back in 2013, and the situation has not improved since. Most software assumes all the cores and memory are identical, so adding additional latency causes performance to tank. Tank hard.
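
To make the problem concrete, here is a minimal sketch (ours, not anything Acer has shown) of what NUMA-aware allocation looks like on Windows: the code asks which NUMA node the current thread is running on and commits memory there, rather than letting pages land on whichever socket happens to have free space. The 64 MB size is arbitrary.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        PROCESSOR_NUMBER proc;
        USHORT node = 0;

        /* Ask the OS which core this thread is on, and which NUMA node owns it. */
        GetCurrentProcessorNumberEx(&proc);
        if (!GetNumaProcessorNodeEx(&proc, &node)) return 1;

        /* Commit 64 MB with a preference for the local node, so accesses
           stay on this socket's memory controller. */
        void *buf = VirtualAllocExNuma(GetCurrentProcess(), NULL, 64ull << 20,
                                       MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE,
                                       node);
        if (buf == NULL) return 1;

        printf("64 MB committed, preferred NUMA node %u\n", (unsigned)node);
        VirtualFree(buf, 0, MEM_RELEASE);
        return 0;
    }

Very little software goes to this kind of trouble, which is exactly why the operating system's scheduler and allocator end up making the placement decision instead.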

Back in those 2013 articles, even scientific software was not built for multi-CPU hardware, and often performed worse than a single CPU with fewer cores. More recently, we’ve seen even single-socket systems with a NUMA-like environment such as the 32-core Threadripper 2 show performance deficits against monolithic solutions. Only in very specific scenarios (lightweight ray tracing being the best example) does performance improve.

When I approached the person Acer put on stage to promote this new hardware for the Predator brand about these issues, he didn’t really have a clue what I was talking about. At first he confused it with having ECC, and describing the difference between bandwidth and latency seemed to go nowhere. If Acer wants to promote this as a Windows machine, which I’m 99.9% sure they will, they really need to have some software wrapper in place to enumerate cores and apply core affinity. Otherwise people will shell out a lot of money for, in a lot of cases, worse performance.
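
For illustration, that wrapper would not need to be complicated. A rough sketch of the core of it (Windows-only, node 0 hard-coded, error handling trimmed): enumerate the NUMA nodes the OS reports, then pin a thread to one node's processor group so both its compute and, with the default allocation policy, its memory stay on one socket.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        ULONG highest = 0;
        if (!GetNumaHighestNodeNumber(&highest)) return 1;
        printf("System reports %lu NUMA node(s)\n", highest + 1);

        /* Fetch node 0's processor mask and bind this thread to it. */
        GROUP_AFFINITY affinity = {0};
        if (!GetNumaNodeProcessorMaskEx(0, &affinity)) return 1;
        if (!SetThreadGroupAffinity(GetCurrentThread(), &affinity, NULL)) return 1;

        printf("Thread pinned to processor group %u, mask 0x%llx\n",
               (unsigned)affinity.Group, (unsigned long long)affinity.Mask);
        return 0;
    }

A shipping utility might do this per application: game and GPU threads on one socket, the stream encoder on the other.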

But hey, maybe Acer is going after the VM gaming market? Right?

One thing I was told is that Acer will be offering configurable variants, so you might be able to use a pair of Xeon Silver CPUs instead. Or remove that piece of 'leather' from the front of the chassis.

37 Comments

  • zodiacfml - Thursday, August 30, 2018 - link

    Nah. They don't look like they want to sell it.
    Just look at the PC case: it looks like an entry-level desktop gaming box. It has no ventilation at all except a restrictive 120mm mesh at the back.
  • s.yu - Friday, August 31, 2018 - link

    That thing looks impossibly tacky and cheap for all the high-end components.
  • abufrejoval - Friday, August 31, 2018 - link

    That is the reason I'd like them to scale GPGPU or APU clusters rather than throwing CPU cores and GPU cores into a big, big compute lake that just cooks into steam with evaporating returns.

    If you take the type of SoCs AMD produced for the Chinese market, perhaps use HBM2 for the HSA parts and add a bit of DDRx for "storage", you'd make it much, much easier for the programmer to scale the game engine or the HPC code.

    Hardware-independent coding is dead, because Moore's law is dead, so if close-to-the-metal coding is required anyway, at least make it easier by using "Lego block" hardware.
  • abufrejoval - Friday, August 31, 2018 - link

    Or quite simply imagine a Threadripper made up of 2700G dies with HBM2.
  • kneelbeforezod - Saturday, September 1, 2018 - link

    Lots of people have use cases for dual-socket systems and dual-socket workstations aren't uncommon.

    > "without a NUMA-aware operating system or software in place, memory for one process on one CPU can be allocated on the memory of the other CPU, causing additional latency."

    CPU affinity and process groups are completely configurable. There doesn't have to be any additional latency if you don't want it, but if you want your application to access all your memory, you have to accept the physical fact of life that memory further away will be slower than memory that is closer. This is a perfectly acceptable trade-off for many applications.
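
    As a sketch of how little it takes (Linux here; the 0-27 core range for socket 0 is an assumption, actual numbering is platform-dependent):

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>

        int main(void)
        {
            cpu_set_t set;
            CPU_ZERO(&set);
            /* Assumed layout: cores 0-27 = socket 0 on a 2P Xeon 8180 box. */
            for (int cpu = 0; cpu < 28; cpu++)
                CPU_SET(cpu, &set);
            if (sched_setaffinity(0, sizeof(set), &set) != 0) {  /* 0 = this process */
                perror("sched_setaffinity");
                return 1;
            }
            puts("Bound to socket 0; first-touch allocations now stay local");
            return 0;
        }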

    > "We tested this way back in 2013, and the situation has not improved since."

    Pretty sure it has improved. Interconnects are faster and memory is lower latency now than in 2013, and software has come a long way as well, largely thanks to Threadripper 1, which was the first high-end desktop chip to bring this server-focused architecture to the desktop in a single socket.

    > "Most software assumes all the cores and memory are identical, so adding additional latency causes performance to tank. Tank hard"

    Define "most software". Do you mean games? Or do you mean operating system kernels, web servers, rendering applications, databases, physics simulations, AI/deep learning training? Most machines in the data center are dual-socket and if this caused any significant performance degradation that would not be the case.

    Symmetric Multiprocessing goes back to 1962, and developers are pretty damn good at making software work over multiple CPUs (when they have to). If an application doesn't scale over dozens of cores it's not the fault of the hardware.

    > "even scientific software was not built for multi-CPU hardware, and often performed worse than a single CPU with fewer cores"

    File a bug report and in the meantime limit that application to a single CPU.

    > " the 32-core Threadripper 2 show performance deficits against monolithic solutions"

    The 2990WX is a weird case. It's a server CPU put on a consumer board, and it is starved of memory bandwidth in some cases. That said, you can configure it to act like a single die. Of course, this system isn't using a 2990WX; it's using Xeon 8180s, which each have two more memory channels.
  • SvenNilsson - Sunday, September 16, 2018 - link

    I agree this is insane. I have owned two dual-socket machines, and the added latency for memory synchronization really kills gaming performance. Multi-socket only makes sense for virtualization, where you can put one virtual machine on each physical CPU chip.
  • woogitboogity - Saturday, June 13, 2020 - link

    The title of this article does the math beautifully:

    "Dual Xeon Processor" (expensive as hell with limited server applications)
    +
    "Predator X System" (high-end gaming consumer targeted product name)
    =
    "Please no" (because we know from Skull Trail this will end badly)
