CERN Accelerating science

Improved Metrics Collection and Correlation for the CERN Cloud Storage Test Framework

Date published: 
Sunday, 1 September, 2013
Document type: 
Summer student report
C. Lindqvist
Storage space is one of the most important ingredients that the European Organization for Nuclear Research (CERN) needs for its experiments and operation. Part of the Data & Storage Services (IT-DSS) group’s work at CERN is focused on testing and evaluating the cloud storage system that is provided by the openlab partner Huawei, Huawei Universal Disk Storage System (UDS). As a whole, the system consists of both software and hardware. The objective of the Huawei-CERN partnership is to investigate the performance of the cloud storage system. Among the interesting questions are the system’s scalability, reliability and ability to store and retrieve files. During the tests, possible bugs and malfunctions can be discovered and corrected. Different versions of the storage software that runs inside the storage system can also be compared to each other. The nature of testing and benchmarking a storage system gives rise to several small tasks that can be done during a short summer internship. In order to test the storage system a test framework developed by the DSS group is used. The framework consists of various types of file transfer tests, client and server monitoring programs and log file analysis programs. Part of the work done was additions to the existing framework and part was developing new tools. Metrics collection was the central theme. Metrics are to be understood as system statistics, such as memory consumption or processor usage. Memory usage and disk reads/writes were added to the existing client real-time monitoring framework. CPU and memory usage, network traffic (bytes received/sent) and the number of processes running are collected from a client computer before and after a daily test. Two other additions are visualization for storage system log files, as well as a new monitoring tool for the storage system. This report is divided into parts describing each part of the framework that was improved or added, the problem and the final solution. A short description of the code and the architecture are also included.