In this guided research, we attempt to extend the data-mining pipeline of SG++ to support the estimation of probability density functions using the combination technique. The idea is to use the combination technique to break the main problem down into multiple smaller tasks, aiming for computational speedups.
We expect this to substantially improve the performance of the current probability density estimation in the SG++ project and to create further opportunities for efficiency gains through parallelization. We observe a good approximation of the original method's results and promising speed, which together lay out a roadmap for future improvements. We present the performance of the method for grid levels 3 and 5, each with 1 and 4 threads, and compare it with the SG++ datadriven miner. From our experiments, we conclude that the proposed method will provide the expected speed improvement when executed with a large number of threads. The reason is the mismatch between the number of independent processes that attempt to run in parallel and the number of threads available: at most 4 threads were provided, whereas the number of parallel processes can reach a few hundred. This mismatch adds synchronization overhead and leaves the independence of the subproblems unexploited.