This document presents the results of performance testing carried out on the EnSimS X3290 computer. The X3290 is specially configured to achieve an optimum balance between single-job completion time and overall simulation throughput. The benchmark and energy consumption tests were designed to verify this claim.
The X3290 is based on the Dell T620 freestanding server range. It has two Intel Xeon E5-2690 processors, currently the top-of-the-range option, with a base speed of 2.9GHz and a maximum Turbo Boost speed of 3.8GHz. In total there are 16 physical cores and 32 execution threads (through Hyper-Threading). More technical specifications of the computer can be found in the online document or the User's Manual.
This document contains three parts: standard E+ benchmark testing results, DesignBuilder typical model benchmark results, and power consumption benchmark results.
The jEPlus standard E+ benchmark test is designed for the comparison of CPU models, E+ versions and OS versions. It comprises 28 models from the E+ examples, combined with 8 different sets of weather data. In total there are 224 simulations in the benchmark set. More details of the models can be found on the EnergyPlus Benchmark Model Set page.
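The size of the benchmark set follows from pairing every model with every weather data set. A minimal sketch of that job matrix is given here; the model and weather names are placeholders, since the actual file names are not listed in this document.

```python
from itertools import product

# Placeholder identifiers (assumptions); the real file names are not given here.
models = [f"model_{i:02d}" for i in range(1, 29)]      # 28 E+ example models
weather_sets = [f"weather_{j}" for j in range(1, 9)]   # 8 sets of weather data

# Every model is simulated once with every weather data set.
jobs = list(product(models, weather_sets))

print(len(jobs))  # 224
```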
The benchmark set was run using only one thread on each machine tested. In this way, the average simulation time of each job gives a good estimate of the best simulation speed achievable on the hardware-software platform. The test result is shown in the chart below.
From the chart, it is clear that the Intel Xeon E5-2690 has single-core performance fairly similar to that of the top-of-the-range Intel i7 desktop chip (i7-3770K, 3.5-3.9GHz, 2012).
The benchmark set was run again using all available threads on the tested computer. The total elapsed time for simulating the 224 jobs is reported on the chart below. This test shows the overall “crunching” power of the hardware-software platform.
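The two ways of running the set (one thread, then all available threads) can be illustrated with a small timing harness. This is only a sketch and not the jEPlus implementation: `run_job` is a hypothetical stand-in that sleeps instead of launching a simulation, and the job count and worker counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_job(job_id):
    """Hypothetical stand-in for launching one E+ simulation."""
    time.sleep(0.01)  # pretend work; a real run would start an external process
    return job_id

def run_benchmark(jobs, workers):
    """Run all jobs on `workers` parallel slots; return (elapsed seconds, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_job, jobs))
    return time.perf_counter() - start, results

jobs = list(range(16))
serial_time, serial_results = run_benchmark(jobs, workers=1)
parallel_time, parallel_results = run_benchmark(jobs, workers=8)

# One worker: the average time per job estimates the best single-job speed.
print(f"average per job (1 worker): {serial_time / len(jobs):.3f} s")
# Many workers: the total elapsed time reflects overall throughput.
print(f"total elapsed (8 workers): {parallel_time:.3f} s")
```

A thread pool is enough for this purpose because each real job would run as a separate external process; the pool only limits how many run at once.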
The result shows that the X3290 is by far the fastest computer tested. It is about 3.5 times as powerful as the i7-3770K based desktop computer, a figure broadly in line with the results from the next benchmark test suite. We have previously tested a 32-thread Amazon EC2 instance (2 x E5-2670, 2.6-3.3GHz), although the result is not included in the chart. Its total simulation time was 6'57", which is about 35% slower than the X3290. This may be caused by a combination of lower processor speed, differences in E+ version, and the computer's BIOS configuration.
This benchmark set is designed to represent typical building models that practitioners create using DesignBuilder. It aims to represent more realistic scenarios, where simulation times and simulation output levels can vary widely.
Twenty-four models from the Design Optimisation Competition (DOC2012) were used in this benchmark set. They represent a good range of simulation complexity. PCMs, natural ventilation calculations and daylight-based lighting controls are design options known to be time-consuming to simulate, and all are represented in the model set. The models use a number of different construction types, built forms, and glazing arrangements. Unfortunately, detailed HVAC is not represented in the set. More details on the models included in this benchmark set can be found in Table 1.
The EnergyPlus models were exported from DesignBuilder, and then converted to E+ version 8. Orientation was added as an additional parameter to create more simulation cases. A UK (London) weather file was used in all tests. The chart below shows the distribution of the average simulation time of each model when executed on the X3290 using 8 parallel threads. These simulations were done with no output files generated. In the full test, we set three output levels: no output, DesignBuilder's default output setting at hourly level, and DesignBuilder's all output setting at hourly level. These are used to test the impact of disk I/O on simulation performance.
Single simulation job completion time and the impact of computer loading are shown in the chart below. Please note that the metric has been normalized, using the X3290 with 8 threads as the reference point. The higher the index value, the faster a job finishes simulating. The difference between 1 thread and 8 threads on the X3290 can be explained by Turbo Boost, which raises the processor's clock speed from 3.3 GHz (8 threads) to 3.8 GHz (1 thread). The slowdown of individual jobs beyond 16 threads is caused by virtualization: with only 16 physical cores, the additional threads share cores through Hyper-Threading. Even so, Hyper-Threading enhances overall performance rather than reducing it. If it brought no benefit, doubling the number of parallel jobs from 16 to 32 would double the completion time of each job; in fact, job completion time for 32 threads is only 55% (instead of 100%) longer than that for 16 threads.
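The Hyper-Threading argument can be checked with a little arithmetic. The sketch below uses only the 55% figure quoted above and assumes that throughput is proportional to the number of parallel jobs divided by the time each job takes; the variable names are illustrative.

```python
# Relative job completion time, normalised to the 16-thread case.
time_16 = 1.00
time_32 = 1.55          # 55% longer than with 16 threads, as reported above

# Throughput is proportional to (parallel jobs) / (time per job).
throughput_16 = 16 / time_16
throughput_32 = 32 / time_32

gain = throughput_32 / throughput_16 - 1
print(f"throughput gain from 16 to 32 threads: {gain:.0%}")  # 29%
```

Allowing for rounding, this agrees with the 28% throughput gain reported in the throughput results.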
The desktop and laptop processors (i7-3770K and i7-2620m) are both competent when running single simulations.
DesignBuilder and E+ simulations can potentially generate large amounts of output data. Disk I/O can become a bottleneck for the system, especially when the computer is running multiple jobs in parallel. It is therefore important to test how the X3290 copes with simulation jobs that have a high output level.
The benchmark set was configured to generate three levels of output: none, default DesignBuilder options at hourly level, and all options at hourly level. The average volumes of disk I/O are 2MB, 240MB and 440MB per job, respectively. The chart below shows that data output has only a moderate impact on simulation completion time, and that this impact depends little on the number of parallel jobs running. We can therefore conclude that the X3290's file system copes well with typical DesignBuilder workloads.
As a further note, we also tested requesting all possible output from the benchmark models, in which case each job generates 200GB of data on average. It is highly improbable that such extreme output would be required in practice.
The X3290 is designed to crunch through large numbers of simulations quickly. For such tasks, even high-end desktops and laptops struggle to cope. The chart below shows the throughput of the computers when running the benchmark set. The results are presented as the number of jobs processed per hour. The X3290 with 32 threads can process over 40 jobs per hour, which is 3.4x the throughput of a high-end desktop, or 9.4x that of a decent laptop. Using all 32 threads on the X3290 gives 28% more throughput than using 16 threads, which again shows the benefit of Hyper-Threading.
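To put these ratios in practical terms, the sketch below estimates how long a batch of simulations would take on each machine. The batch size of 1000 is hypothetical, and 40 jobs per hour is used as a conservative value for the X3290 since the measured figure is "over 40". Real batches will differ from the DOC2012 benchmark models, so the numbers are indicative only.

```python
# Figures quoted above (DOC2012 benchmark models).
x3290_jobs_per_hour = 40.0   # "over 40", so this is a conservative value
desktop_ratio = 3.4          # X3290 throughput / desktop throughput
laptop_ratio = 9.4           # X3290 throughput / laptop throughput

batch_size = 1000            # hypothetical number of simulations

x3290_hours = batch_size / x3290_jobs_per_hour
desktop_hours = x3290_hours * desktop_ratio
laptop_hours = x3290_hours * laptop_ratio

print(f"X3290:   {x3290_hours:.0f} h")    # 25 h
print(f"desktop: {desktop_hours:.0f} h")  # 85 h
print(f"laptop:  {laptop_hours:.0f} h")   # 235 h
```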
The X3290 provides a built-in function to monitor power consumption. The table below summarizes the energy consumption figures when using different thread settings. Please note that the consumption per job and the throughput per kilowatt-hour figures are based on the DOC2012 benchmark models. In real life such figures will vary with the simulation jobs.
| Computer | Metric | Idle | 1 Thread | 8 Threads | 16 Threads | 24 Threads | 32 Threads |
|---|---|---|---|---|---|---|---|
| 2xE5-2690 | Power rating | 90 watts | 154 watts | 266 watts | 378 watts | 392 watts | 406 watts |
| 2xE5-2690 | Consumption per job | - | 63.1 Wh | 15.2 Wh | 12.0 Wh | 10.6 Wh | 10.1 Wh |
| 2xE5-2690 | Throughput per kWh | - | 15.9 jobs | 65.8 jobs | 83.1 jobs | 94.6 jobs | 99.3 jobs |
| i7-3770K | Power rating | 45 watts | 80 watts | 165 watts | - | - | - |
| i7-3770K | Consumption per job | - | 23.7 Wh | 13.9 Wh | - | - | - |
| i7-3770K | Throughput per kWh | - | 33.7 jobs | 72.1 jobs | - | - | - |
The X3290 running 32 threads is the most energy-efficient configuration tested. It uses about 27% less electricity per job than the i7-3770K based desktop computer running 8 threads (10.1 Wh against 13.9 Wh). On the other hand, the idle power of the X3290 is quite high compared to the desktop computer (90 watts against 45 watts). We are working on a solution to further reduce energy consumption, especially when the server is not in demand.
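The relationships between the figures in the table can be reproduced with a few lines of arithmetic. The sketch below takes the power ratings and consumption per job from the table and assumes that the power rating is the average draw while the jobs run; the function names are illustrative.

```python
# Figures taken from the table above.
x3290_32 = {"power_w": 406, "wh_per_job": 10.1}    # X3290, 32 threads
desktop_8 = {"power_w": 165, "wh_per_job": 13.9}   # i7-3770K, 8 threads

def jobs_per_kwh(wh_per_job):
    """Number of jobs completed per kilowatt-hour of electricity."""
    return 1000 / wh_per_job

def jobs_per_hour(power_w, wh_per_job):
    """Throughput implied by average power draw and energy per job."""
    return power_w / wh_per_job

saving = 1 - x3290_32["wh_per_job"] / desktop_8["wh_per_job"]
x3290_per_kwh = jobs_per_kwh(x3290_32["wh_per_job"])
x3290_per_hour = jobs_per_hour(**x3290_32)

print(f"energy saving per job: {saving:.0%}")        # 27%
print(f"X3290 jobs per kWh:    {x3290_per_kwh:.1f}")   # 99.0
print(f"X3290 jobs per hour:   {x3290_per_hour:.1f}")  # 40.2
```

The jobs per kWh value computed this way differs slightly from the 99.3 in the table, presumably because the published consumption figures are rounded. The implied throughput of about 40 jobs per hour matches the throughput result reported earlier.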
We have tested the new X3290 with an E+ benchmark set and a DesignBuilder benchmark set. The results from both sets are consistent. The dual Xeon E5-2690 processors are capable of running single simulations at a speed close to that of a top-of-the-range desktop processor. When the workload is high, the true power of the server shines through. It is also a more energy-efficient solution for running large numbers of simulations.