One of the key battlegrounds of the next decade is going to be storage: density, speed, and demand. Naturally, all the major players in the space want to promote their own technologies over those of their competitors, and Kioxia (formerly Toshiba Memory) is no different. During its plenary talk at this year's International Electron Devices Meeting (IEDM), the company promoted its BiCS flash product family, as well as its upcoming XL-Flash technology. What was interesting during this talk was a graph that seems to slam the long-term prospects of upcoming Storage Class Memory (SCM) technologies like 3D XPoint from Intel and Micron.

Memory (DRAM) vs Storage (Flash) vs 'Storage-Class' Memory (SCM)

At its most basic level, memory relies on a property of a cell that can be measured and converted into data. A simple DRAM cell stores electrons, and the presence or absence of those electrons determines whether the value of that cell is a 1 or a 0. Flash storage has gone through several changes over the last couple of decades, with floating gate and charge trap technologies helping drive its manufacturing and scaling. New types of memory, at various stages of development, manufacture, and shipping, rely on the resistance of the medium in the cell, or the spin of the medium in the cell, rather than on stored charge.

Traditionally it is easy to think of each cell as a straightforward 0 or 1, on or off, with two distinct detection levels. However, depending on the type of materials used, it can be possible to detect multiple levels within a single cell. The industry moved from 1 bit per cell (0 or 1) to 2 bits per cell (00, 01, 10, 11) to 3 bits per cell (000, 001, 010, etc.) and beyond, with the leading storage products now at four bits per cell and looking at even more. DRAM-type memory has always been a 1-bit-per-cell medium, while storage has steadily increased the number of bits per cell. Moving to more bits per cell gets extra storage capacity, in effect, for ‘free’; however, it requires the materials to have tighter tolerances and the detection circuitry to be more precise, and one way to achieve both is to increase the size of the cell, which decreases overall density. The more bits per cell, the harder it becomes to distinguish between levels, because n bits per cell require 2^n distinct levels. It’s an interesting conundrum.
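To put numbers on the 2^n problem, here is a minimal Python sketch. It assumes, purely for illustration, that a cell has a fixed sensing window (normalized to 1.0) that is split evenly between levels; real devices have non-uniform level placement and guard bands, so only the trend matters here. The function names are hypothetical.

```python
def levels(bits_per_cell: int) -> int:
    # n bits per cell need 2**n distinguishable states
    return 2 ** bits_per_cell

def level_spacing(bits_per_cell: int, window: float = 1.0) -> float:
    # Assumption: the usable window is divided evenly, so 2**n levels
    # leave 2**n - 1 gaps between neighbouring levels.
    return window / (levels(bits_per_cell) - 1)

for n, name in [(1, "SLC"), (2, "MLC"), (3, "TLC"), (4, "QLC"), (5, "PLC")]:
    print(f"{name}: {levels(n):2d} levels, spacing {level_spacing(n):.3f} of the window")
```

Each extra bit doubles the number of levels and roughly halves the gap between them, which is why the materials and the sensing circuitry have to get more precise with every step.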

Kioxia’s current BiCS flash technology stacks multiple layers of cells in a vertical tower, and then repeats that design in the x-y directions to increase capacity. Kioxia currently ships a lot of 3-bit-per-cell and 4-bit-per-cell products, and the company is looking at 5 bits per cell for special applications. The BiCS family has also been increasing its layer count, from 32 to 48 to 64 and now 96 layers, with 128+ layers expected in the future. Compared with other scaling methods, adding layers is fairly easy.
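A rough way to see how layers and bits per cell compound is sketched below in Python. It assumes, as a simplification, that the x-y footprint of each cell stays the same across generations, so the bits stored above a given patch of die scale as layers multiplied by bits per cell; real generations also change cell pitch, die size, and periphery, which this ignores. The helper name is hypothetical.

```python
# Simplifying assumption: fixed x-y cell footprint across generations,
# so bits above a given die area scale as layers * bits per cell.
def relative_bits_per_footprint(layers: int, bits_per_cell: int) -> int:
    return layers * bits_per_cell

BASELINE = relative_bits_per_footprint(32, 3)  # 32-layer, 3 bits per cell

for layers, bits in [(32, 3), (48, 3), (64, 3), (96, 3), (96, 4), (128, 4)]:
    ratio = relative_bits_per_footprint(layers, bits) / BASELINE
    print(f"{layers:3d} layers x {bits} bits/cell -> {ratio:.2f}x the 32-layer baseline")
```

Under this assumption, going from 32-layer 3-bit cells to 96-layer 4-bit cells multiplies the bits per footprint by four.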

Kioxia is also building a new type of flash called XL-Flash, which adds yet another layer of parallelism to the concept of flash.

Storage class memory is slightly different from traditional flash memory. Memory works at a ‘bit’ level of access, while flash works at a ‘page’ and ‘block’ level. While DRAM can access and modify each individual bit, any write to flash requires a whole page to be written at once. Every read-modify-write operation therefore has to read the full page, change the relevant bits, and rewrite the page in full. This increases wear on the drive (the number of program/erase cycles), and many techniques, such as wear levelling and spare storage area, are used to manage that wear. Memory, by contrast, needs to work at a bit level, with each bit selectable and adjustable, so ‘storage class memory’ must be able to act like memory at all times, and then be used for storage when possible. The benefit of memory is meant to be its seemingly infinite (>10^18) cycle lifetime and low access latency; however, it isn’t always that easy.
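The difference in write granularity can be sketched in a few lines of Python. This is a toy model under stated assumptions: the 16 KiB page size is a hypothetical value chosen for illustration, memory is treated as bit-addressable as described above, and the sketch ignores that real NAND cannot overwrite in place (a controller writes the updated page to an already-erased location and erases whole blocks later). It only counts how many bits get programmed to change one bit.

```python
PAGE_BYTES = 16 * 1024  # hypothetical page size, for illustration only

def rmw_bit_on_page(page: bytes, bit_index: int) -> tuple[bytes, int]:
    """Flip one bit on a page-granular device.

    Returns the new page image and the number of bits programmed.
    """
    buf = bytearray(page)                # 1. read the full page
    byte, bit = divmod(bit_index, 8)
    buf[byte] ^= 1 << bit                # 2. change the one bit of interest
    return bytes(buf), len(buf) * 8      # 3. program the whole page again

def flip_bit_in_memory(cells: bytearray, bit_index: int) -> int:
    """Flip one bit in bit-addressable memory; only that cell is written."""
    byte, bit = divmod(bit_index, 8)
    cells[byte] ^= 1 << bit
    return 1

page = bytes(PAGE_BYTES)
new_page, flash_bits = rmw_bit_on_page(page, 10)
memory_bits = flip_bit_in_memory(bytearray(PAGE_BYTES), 10)
print(f"flash programs {flash_bits} bits, memory writes {memory_bits} bit")
```

The ratio between the two counts is a crude picture of why page-level writes add wear that bit-level memory avoids.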

3D stacked storage-class memory cells work a little differently from flash. The easiest example here is 3D XPoint, which uses a phase change material to alter the resistance of a memory cell, and is accessed through an ovonic selector switch. The memory is built up by alternating the direction of word lines and bit lines, which retains the bit-addressable nature of SCM. To add more layers, additional word and bit lines are added, with a new layer of cells in between.
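To make the alternating-line layout concrete, here is a simplified geometric sketch in Python. It assumes a hypothetical arrangement in which deck d of cells sits between line levels d and d + 1, with even levels running along x (one line per row) and odd levels running along y (one line per column), so neighbouring decks share a level of lines. The names Line, lines_for_cell and line_levels_needed are illustrative only, and this is not Intel or Micron's actual layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    level: int      # vertical position of this layer of lines (0 = bottom)
    direction: str  # "x" or "y"; alternates from one level to the next
    index: int      # which line within that level

def lines_for_cell(deck: int, row: int, col: int) -> tuple[Line, Line]:
    """Return the two lines that select the cell at (deck, row, col).

    Assumption: deck d sits between line levels d and d + 1. Even levels
    run along x (one line per row), odd levels along y (one per column).
    """
    selected = []
    for level in (deck, deck + 1):
        if level % 2 == 0:
            selected.append(Line(level, "x", row))
        else:
            selected.append(Line(level, "y", col))
    return selected[0], selected[1]

def line_levels_needed(decks: int) -> int:
    # In this sketch each extra deck of cells needs one extra level of lines.
    return decks + 1

print(lines_for_cell(0, 3, 5))
print(lines_for_cell(1, 3, 5))  # shares the level-1 line with deck 0
```

Every cell is still picked out by exactly one crossing of two lines, which is what keeps the array bit-addressable as decks are added.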

Is 3D SCM the Future?

Why does Kioxia think that 3D Stacked SCM isn’t the future? I’ll go straight to the graph in question.

Here we have two lines showing relative cost per bit against the number of layers. Each line is normalized to a single layer of itself, not to each other. The function that generates this graph takes into account the number of layers (x-axis), the effective complexity of adding additional layers, the x-y area lost to more complex control circuitry, and the yield lost by adding more layers. Plugging in numbers gives an effective cost per bit as the layers add up.

Now, 3D NAND is a proven technology. We have seen 90+ layers from multiple vendors in the market, and no-one is denying that adding layers is an effective way to go here, as the area loss is near to zero and the yield loss is similarly extremely low. This is because some of the etch-and-fill steps in the manufacturing process can cover many layers at once.

But for 3D stacked SCM technologies, we still haven’t seen them expand beyond a single-layer device in the market. Kioxia’s data shows that while the cost per bit of its BiCS flash approaches an asymptotic value beyond about 10 layers, 3D stacked SCM will at best only reduce to 60% of the cost per bit of a single layer, for a 4-5 layer device, with cost per bit rising from there. This is down to the increased cost per layer, the area loss required, and the yield decrease that comes from using complicated cell technologies that don’t have the benefit of decades of improvements. Building 3D stacked memory is a painstaking, layer-upon-layer process, and each additional step decreases yield.

For anyone interested, the equation for this graph is as follows:

Where n = the number of layers, Cf is the cost for the common layer, Cv is the cost per extra layer, A is the area penalty for adding a layer, and Y is the yield penalty for a single layer.

It should be noted that we were not able to take photographs of the slides during the plenary talk. I made a quick note of the graph and the formula, and went back to Kioxia with suggested numbers for each of these variables to recreate the graphs. They replied that I was very close with the following:

Predicted Graph Values (AnandTech)
Technology | Cf (common layer cost) | Cv (extra layer cost) | A (area loss of extra layer) | Y (yield loss of extra layer)
NAND | 0.95 | 0.05 | ~0 | ~0
3D SCM | 0.70 | 0.30 | 0.02 | 0.06

When plugging in numbers, it was clear that Cf + Cv had to equal 1, so what we basically end up looking at is the ratio of the cost of adding a single layer to the cost of the common layer of a design. The area and yield terms drive the upswing of the curve, and the balance between these terms determines where the minimum falls as well as how quickly the curve rises.

In the case of 3D SCM, the cost per bit at around 12 layers became the same as the cost per bit of a single layer, which is the crux of Kioxia’s commentary: if SCM were ever to hit the number of layers that NAND flash does, it would become prohibitively expensive (around 50x the cost per bit of a single layer for a 64-layer SCM device).
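Since the formula itself is not reproduced here, the Python sketch below uses one plausible form of it, which is my assumption and not Kioxia's published equation: processing cost grows as Cf + n·Cv, usable area shrinks by a factor (1 - A) per extra layer, and yield shrinks by (1 - Y) per extra layer, with cost per bit normalized so a single layer comes out at 1.0. The function name and the choice to treat NAND's "~0" values as exactly zero are also assumptions.

```python
def relative_cost_per_bit(n: int, cf: float, cv: float,
                          area_loss: float, yield_loss: float) -> float:
    """Cost per bit of an n-layer device, normalized to one layer of itself.

    ASSUMED model, not Kioxia's published formula:
      processing cost ~ cf + n * cv  (fixed cost plus a per-layer cost)
      usable bits     ~ n * (1 - area_loss) ** (n - 1)
      good-die yield  ~ (1 - yield_loss) ** (n - 1)
    With cf + cv = 1, a single-layer device comes out at 1.0.
    """
    cost = cf + n * cv
    good_bits = n * ((1 - area_loss) * (1 - yield_loss)) ** (n - 1)
    return cost / good_bits

# Values from the predicted-values table; NAND's "~0" is taken as exactly 0.
NAND = dict(cf=0.95, cv=0.05, area_loss=0.0, yield_loss=0.0)
SCM = dict(cf=0.70, cv=0.30, area_loss=0.02, yield_loss=0.06)

for n in (1, 2, 4, 5, 8, 12, 16, 32, 64):
    print(f"{n:2d} layers: NAND {relative_cost_per_bit(n, **NAND):.3f}, "
          f"3D SCM {relative_cost_per_bit(n, **SCM):.2f}")

best = min(range(1, 65), key=lambda n: relative_cost_per_bit(n, **SCM))
print("cheapest 3D SCM stack in this model:", best, "layers")
```

With these values the assumed model lands in the same ballpark as the quoted figures: the 3D SCM curve bottoms out at roughly 0.61 around four layers and exceeds 50x at 64 layers, while NAND keeps falling toward Cv. The crossover back to 1.0 comes at about 14 layers rather than 12, a reminder that this is an approximation rather than the formula shown in the talk.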

Now of course, if we were to take the side of 3D stacked SCM vendors, they would likely point out that even if today's price predictions make more than four stacked layers look cost-prohibitive, those predictions don't take into account potential advancements in the technology. The ability to offer both high-density memory at performance within an order of magnitude of DRAM and extremely low-latency storage in a single product indicates its utility, rather than a lack of optimization for either.

From my own hands-on experience, I can clearly see the benefit of SCM in the memory space: it offers a very large pool of data to work from at a lower cost per GB than traditional DRAM, while also having a warranty that covers 100% access over the warranty period. As a storage medium, it offers immediate fast access, but the cost per GB is rather high. For storage at least, flash is going to be king of capacity for a long while yet.

23 Comments

  • Anymoore - Monday, December 30, 2019

    For SLC, SCM such as 3D XPoint is cheaper. Also, the diameter of the NAND channel cannot be shrunk below ~100 nm, whereas SCM is expected to go below ~20 nm. In other words, SCM achieves the same density with fewer layers than 3D NAND.
  • Billy Tallis - Monday, December 30, 2019

    You cannot use density as a proxy for cost when comparing 3D NAND against something like 3D XPoint, even if you stipulate that you're talking about designs with the same layer count. Adding layers to 3D NAND is simpler and involves fewer process steps than adding layers to 3D XPoint, so 3D NAND doesn't need to shrink the horizontal cell dimensions as much as 3D XPoint in order for 3D NAND to remain way ahead on cost-effectiveness.
  • Anymoore - Thursday, January 2, 2020

    As SLC, 3D NAND is in fact, more expensive per bit.
  • Billy Tallis - Thursday, January 2, 2020

    The Samsung 983 ZET 960GB enterprise SLC SSD is cheaper than the Intel Optane 905P 960GB consumer 3DXP drive, by about 22%. Intel's Optane drives are also far more than 3x the price of consumer TLC SSDs, or 4x the price of consumer QLC SSDs. You must be basing your assertion on something other than real-world prices.
  • peevee - Monday, December 30, 2019

    There will always be cases which are capacity-bound and speed-bound, and for hybrid cases there are memory/cache hierarchies. 3DXPoint over MLC over QLC is just another hierarchy.
  • nandnandnand - Monday, December 30, 2019

    This is basically propaganda for Toshiba/Kioxia.

    There are many post-NAND candidates. Only one has to succeed to smash that graph.
  • Eliadbu - Tuesday, December 31, 2019

    Flash storage is not going anywhere soon like hard drive won't phase out from the world anytime soon. but Kioxia seems to be overlooking technology improvements that may help 3d Xpoint to overcome the challenges of being both dense and cost-effective. And even if 3D Xpoint won't be the fitting technology there are plenty of other candidates to replace flash memory, they seem to put all the eggs in the flash basket and try to dismiss any other initiatives.
  • Fujikoma - Tuesday, December 31, 2019

    As written in the third paragraph:
    2-bits per cell (00, 01, 10, 00)

    Shouldn't that be:
    2-bits per cell (00, 01, 10, 11)
  • Billy Tallis - Tuesday, December 31, 2019

    Fixed. Thanks for pointing it out.
  • jjj - Tuesday, December 31, 2019

    This is complete nonsense. XPont does not scale well but that does not say much about any other memory.
    NAND scaling is slow already, it's far from ideal at this point.
    The right memory scales well, horizontal, vertical, bits per cell. At some point, someone will make that.
