[Edaily Reporter Han Kwangbeom] SKTelecom (017670) has partnered with NetApp, an intelligent data infrastructure company, to address the performance degradation that AI workloads suffer in virtualized environments.
NetApp announced on the 25th that it had successfully completed a joint proof of concept (PoC) with SKTelecom to support the expansion of enterprise AI infrastructure while maintaining high performance even in virtualized environments.
This project was conducted to verify the interoperability between SKTelecom’s virtualized cloud infrastructure “Petasus AI Cloud”—designed for building and operating AI data centers—and NetApp’s disaggregated storage architecture solution, the “AFX System.”
Using NVIDIA’s GPUDirect Storage (GDS) technology, the two companies achieved more than 99% of the performance of running directly on physical servers, even in a virtual machine (VM) environment.
Previously, running AI workloads in a virtual machine environment incurred additional processing overhead because multiple tasks shared server resources, making performance degradation inevitable. This has been a major constraint on the adoption of high-performance AI infrastructure in industries requiring ultra-low latency, such as electronic design automation (EDA), finance, manufacturing, and telecommunications.
Through this PoC, the two companies have effectively resolved the performance constraints associated with virtualization by optimizing their software stack and infrastructure design. Specifically, they achieved a data transfer rate of 32.7 GB/s—the same level in both virtual machine and physical server environments—while reducing CPU utilization by 40–50%. By reducing unnecessary processing overhead, they improved overall computational efficiency, allowing GPUs to focus more on AI training and inference.
At the heart of these achievements lies the ONTAP-based NetApp AFX architecture. Because it allows performance and capacity to scale independently, AI infrastructure can be flexibly optimized as workloads change, while the system consistently delivers data management and security features that have been validated globally.
Furthermore, this project confirmed that by storing and utilizing the “KV cache”—data temporarily stored and referenced by AI while generating responses—on NetApp AFX, AFX can serve as high-performance, scalable storage that complements GPU memory. In essence, NetApp AFX’s high-bandwidth, ultra-low-latency, disaggregated storage architecture helped reduce data access times and increase throughput during the AI inference process.
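The idea of external storage complementing GPU memory for the KV cache can be illustrated with a minimal Python sketch. It is a conceptual toy, not NetApp's or SKTelecom's implementation: the class name `TieredKVCache`, its methods, and the use of in-memory dictionaries as stand-ins for GPU memory and external storage are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small fast tier that spills to a larger tier.

    Hypothetical illustration only. `fast` stands in for GPU memory and
    `external` stands in for external storage; both are plain dictionaries.
    """

    def __init__(self, fast_capacity):
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # least recently used entry comes first
        self.external = {}         # entries evicted from the fast tier

    def put(self, key, value):
        # Drop any older copy in the slow tier, then store in the fast tier.
        self.external.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._spill()

    def get(self, key):
        if key in self.fast:
            # Hit in the fast tier: mark the entry as most recently used.
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.external:
            # Hit in the slow tier: promote the entry instead of recomputing it.
            value = self.external.pop(key)
            self.fast[key] = value
            self._spill()
            return value
        return None  # miss in both tiers

    def _spill(self):
        # Move least recently used entries out until the fast tier fits.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.external[old_key] = old_value


cache = TieredKVCache(fast_capacity=2)
cache.put("session-a", "kv-a")
cache.put("session-b", "kv-b")
cache.put("session-c", "kv-c")  # "session-a" spills to the external tier
```

In this sketch, an entry that no longer fits in the fast tier is kept in the slower tier rather than discarded, so a later request can fetch it back instead of regenerating it. How quickly that fetch completes is where storage bandwidth and latency would matter in a real system.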
The two companies, which have collaborated as storage partners for over a decade, plan to continue their partnership based on the success of this PoC, focusing on integrating AI data center solutions, conducting technical validation, and identifying opportunities to jointly serve enterprise AI customers. Moving forward, they plan to further develop AFX-based scalable storage technology into next-generation memory technology to enable faster and more efficient storage and utilization of temporary data.
PD Prasad, Vice President of AI Data Infrastructure at NetApp, stated, “By bridging the performance gap between physical servers and virtual machines, enterprises will be able to perform AI training and inference more quickly and efficiently in cloud environments, thereby improving operational efficiency and expanding the scope of innovation.”
Jeong Min-young, Head of AI DC Solutions at SKTelecom, said, “Through our collaboration with NetApp, we have succeeded in significantly reducing performance degradation in virtualized environments, a major technical challenge for AI cloud infrastructure, while simultaneously validating the Petasus AI Cloud as a production-ready platform for next-generation AI workloads.” He added, “The Petasus AI Cloud, validated through this PoC, is expected to provide tangible value to enterprises seeking to secure both operational efficiency and cost-effectiveness in virtualized environments while reliably supporting high-performance AI training and inference environments.”