Journal Articles

  1. Jaśkowski, W., Liskowski, P., Szubert, M., and Krawiec, K. The Performance Profile: A Multi-Criteria Performance Evaluation Method for Test-Based Problems. International Journal of Applied Mathematics and Computer Science 26, 1 (2016), 215–229.
    • Abstract
    • In test-based problems, solutions produced by search algorithms are typically assessed using average outcomes of interactions with multiple tests. This aggregation leads to information loss, which can render different solutions apparently indistinguishable and hinder comparison of search algorithms. In this paper we introduce the performance profile, a generic, domain-independent, multi-criteria performance evaluation method that mitigates this problem by characterizing the performance of a solution with a vector of outcomes of interactions with tests of various difficulty (see the sketch below). To demonstrate the usefulness of this gauge, we employ it to analyze the behavior of Othello and Iterated Prisoner’s Dilemma players produced by five (co)evolutionary algorithms, as well as players known from previous publications. Performance profiles reveal interesting differences between the players that escape the scalar performance measure of expected utility. In particular, they allow us to observe that evolution with random sampling produces players that cope well with mediocre opponents, while the coevolutionary and temporal difference learning strategies play better against high-grade opponents. We postulate that performance profiles improve our understanding of the characteristics of search algorithms applied to arbitrary test-based problems, and can prospectively help design better methods for interactive domains.

    • BibTeX
    • @article{Jaskowski_2016_AMCS,
        author = {Ja{\'s}kowski, Wojciech and Liskowski, Pawe{\l} and Szubert, Marcin and Krawiec, Krzysztof},
        journal = {International Journal of Applied Mathematics and Computer Science},
        number = {1},
        pages = {215--229},
        title = {The Performance Profile: A Multi-Criteria Performance Evaluation Method for Test-Based Problems},
        volume = {26},
        year = {2016}
      }
      
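    • Example
    • As a concrete reading of the method (a minimal sketch, not the paper's code), the profile bins a pool of test opponents by a precomputed difficulty rating and records the mean interaction outcome per bin; the play function and the difficulty ratings are assumed inputs here.

      import numpy as np

      def performance_profile(player, tests, difficulty, play, n_bins=10):
          # Bin the tests (opponents) by difficulty: the inner edges split
          # the observed difficulty range into n_bins equal-width intervals.
          edges = np.linspace(min(difficulty), max(difficulty), n_bins + 1)[1:-1]
          bin_of = np.digitize(difficulty, edges)
          profile = np.full(n_bins, np.nan)
          for b in range(n_bins):
              # Mean outcome (e.g., a win rate in [0, 1]) within each bin.
              outcomes = [play(player, t)
                          for t, k in zip(tests, bin_of) if k == b]
              if outcomes:
                  profile[b] = np.mean(outcomes)
          return profile  # a vector of outcomes, one per difficulty bin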
  2. Jaśkowski, W. and Szubert, M. Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation. IEEE Transactions on Computational Intelligence and AI in Games (2016), to appear.
    • Abstract
    • One weakness of coevolutionary algorithms observed in knowledge-free learning of strategies for adversarial games has been their poor scalability with respect to the number of parameters to learn. In this paper, we investigate to what extent this problem can be mitigated by using Covariance Matrix Adaptation Evolution Strategy, a powerful continuous optimization algorithm. In particular, we employ this algorithm in a competitive coevolutionary setup, which we denote as Co-CMA-ES (see the sketch below). We apply it to learn position evaluation functions for the game of Othello and find that, in contrast to plain (co)evolution strategies, Co-CMA-ES learns faster, finds superior game-playing strategies and scales better. Its advantages become especially apparent for large parameter spaces of thousands of dimensions. For Othello, combining Co-CMA-ES with experimentally tuned, derandomized systematic n-tuple networks significantly improved the current state of the art. Our best strategy outperforms all the other Othello 1-ply players published to date by a large margin, regardless of whether the round-robin tournament among them involves a fixed set of initial positions or the standard initial position but randomized opponents. These results show the large potential of CMA-ES-driven coevolution, which could presumably be exploited in other games as well.

    • BibTeX
    • @article{Jaskowski_2016_TCIAIG,
        author = {Ja{\'s}kowski, Wojciech and Szubert, Marcin},
        journal = {IEEE Transactions on Computational Intelligence and AI in Games},
        title = {{Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation}},
        pages = {to appear},
        year = {2016}
      }
      
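    • Example
    • A minimal sketch of the competitive setup, built around the pycma package as a stand-in for any CMA-ES implementation; play(w1, w2) is an assumed scoring function (e.g., the score of strategy w1 against w2 over both colors), and the round-robin fitness shown here is one plausible instantiation, not necessarily the paper's exact evaluation protocol.

      import cma  # pycma; an assumed stand-in for the paper's CMA-ES

      def co_cma_es(n_weights, n_generations, play):
          es = cma.CMAEvolutionStrategy(n_weights * [0.0], 0.5)
          for _ in range(n_generations):
              pop = es.ask()
              # Competitive, relative fitness: score each candidate by
              # round-robin games within the current population.
              scores = [sum(play(w, v) for v in pop if v is not w)
                        for w in pop]
              es.tell(pop, [-s for s in scores])  # CMA-ES minimizes
          return es.result.xbest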
  3. Jaśkowski, W., Szubert, M., and Gawron, P. A Hybrid MIP-based Large Neighborhood Search Heuristic for Solving the Machine Reassignment Problem. Annals of Operations Research 242, 1 (2016), 33–62.
    • Abstract
    • We present a hybrid metaheuristic approach for the machine reassignment problem, which was proposed for the ROADEF/EURO Challenge 2012. The problem is a combinatorial optimization problem that can be viewed as a highly constrained version of the multidimensional bin packing problem. Our algorithm, which took third place in the challenge, consists of two components: a fast greedy hill climber and a large neighborhood search, which uses mixed integer programming to solve subproblems (see the sketch below). We show that the hill climber, although simple, is an indispensable component that allows us to achieve high-quality results, especially for large instances of the problem. In the experimental part we analyze two subproblem selection methods used by the large neighborhood search algorithm and compare our approach with the two best entries in the competition, observing that none of the three algorithms dominates the others on all available instances.

    • BibTeX
    • @article{Jaskowski_2016_ROADEF,
        author = {Ja{\'s}kowski, Wojciech and Szubert, Marcin and Gawron, Piotr},
        journal = {Annals of Operations Research},
        publisher = {Springer},
        title = {{A Hybrid MIP-based Large Neighborhood Search Heuristic for Solving the Machine Reassignment Problem}},
        year = {2016},
        volume = {242},
        number = {1},
        pages = {33--62}
      }
      
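    • Example
    • The two-component structure reduces to a short control loop: run the greedy hill climber first, then repeatedly pick a subproblem, fix the rest of the assignment, and re-optimize the subproblem exactly. A minimal sketch follows; hill_climb, select_subproblem, solve_mip and cost are assumed components, so this illustrates the control flow only.

      import time

      def hybrid_lns(solution, hill_climb, select_subproblem,
                     solve_mip, cost, time_limit):
          # Phase 1: a fast greedy hill climber for quick initial gains.
          solution = hill_climb(solution)
          # Phase 2: large neighborhood search; re-optimize one small
          # subproblem at a time with a MIP solver, all else held fixed.
          deadline = time.time() + time_limit
          while time.time() < deadline:
              sub = select_subproblem(solution)     # e.g., a few machines
              candidate = solve_mip(solution, sub)  # exact on the subproblem
              if cost(candidate) < cost(solution):
                  solution = candidate
          return solution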
  4. Szubert, M., Jaśkowski, W., and Krawiec, K. On Scalability, Generalization, and Hybridization of Coevolutionary Learning: A Case Study for Othello. IEEE Transactions on Computational Intelligence and AI in Games 5, 3 (2013), 214–226.
    • Abstract
    • This study investigates different methods of learning to play the game of Othello. The main questions posed concern the scalability of algorithms with respect to the search space size and their capability to generalize and produce players that fare well against various opponents. The considered algorithms represent strategies as n-tuple networks (see the sketch below), and employ self-play temporal difference learning (TDL), evolutionary learning (EL) and coevolutionary learning (CEL), and hybrids thereof. To assess the performance, three different measures are used: score against an a priori given opponent (a fixed heuristic strategy), against opponents trained by other methods (round-robin tournament), and against the top-ranked players from the online Othello League. We demonstrate that although evolutionary-based methods yield players that fare best against a fixed heuristic player, it is the coevolutionary temporal difference learning (CTDL), a hybrid of coevolution and TDL, that generalizes better and proves superior when confronted with a pool of previously unseen opponents. Moreover, CTDL scales well with the size of representation, attaining better results for larger n-tuple networks. By showing that a strategy learned in this way wins against the top entries from the Othello League, we conclude that it is one of the best 1-ply Othello players obtained to date without explicit use of human knowledge.

    • BibTeX
    • @article{Szubert_2013_TCIAIG,
        author = {Szubert, Marcin and Ja{\'s}kowski, Wojciech and Krawiec, Krzysztof},
        journal = {IEEE Transactions on Computational Intelligence and AI in Games},
        number = {3},
        pages = {214--226},
        title = {{On Scalability, Generalization, and Hybridization of Coevolutionary Learning: A Case Study for Othello}},
        volume = {5},
        year = {2013}
      }
      
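    • Example
    • For readers unfamiliar with the representation, evaluating a board with an n-tuple network admits a compact generic sketch (illustrative, not the paper's implementation): each tuple observes a fixed sequence of board locations, encodes their contents as a base-3 index, and looks that index up in its own weight table.

      def ntuple_value(board, tuples, luts):
          # board[loc] in {0, 1, 2} for empty, black, white; tuples[i] is a
          # sequence of locations and luts[i] its weight table of size
          # 3 ** len(tuples[i]). The board value is the sum of lookups.
          total = 0.0
          for locations, lut in zip(tuples, luts):
              index = 0
              for loc in locations:
                  index = 3 * index + board[loc]
              total += lut[index]
          return total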
  5. Szubert, M., Jaśkowski, W., and Krawiec, K. Learning Board Evaluation Function for Othello by Hybridizing Coevolution with Temporal Difference Learning. Control & Cybernetics 40, 3 (2011), 805–831.
    • Abstract
    • Hybridization of global and local search techniques has already produced promising results in the fields of optimization and machine learning. It is commonly presumed that approaches employing this idea, like memetic algorithms combining evolutionary algorithms and local search, benefit from the complementarity of constituent methods and maintain the right balance between exploration and exploitation of the search space. While such extensions of evolutionary algorithms have been intensively studied, hybrids of local search with coevolutionary algorithms have not received much attention. In this paper we attempt to fill this gap by presenting Coevolutionary Temporal Difference Learning (CTDL), which works by interlacing global search provided by competitive coevolution and local search by means of temporal difference learning (see the sketch below). We verify CTDL by applying it to the board game of Othello, where it learns board evaluation functions represented by a linear architecture of a weighted piece counter. The results of a computational experiment show CTDL's superiority over the coevolutionary algorithm and temporal difference learning alone, both in terms of the performance of elaborated strategies and the computational cost. To further exploit CTDL's potential, we extend it with an archive that keeps track of selected well-performing solutions found so far and uses them to improve search convergence. The overall conclusion is that the fusion of various forms of coevolution with a gradient-based local search can be highly beneficial and deserves further study.

    • BibTeX
    • @article{Szubert_2011_CC,
        author = {Szubert, Marcin and Ja{\'s}kowski, Wojciech and Krawiec, Krzysztof},
        journal = {Control \& Cybernetics},
        number = {3},
        pages = {805--831},
        title = {{Learning Board Evaluation Function for Othello by Hybridizing Coevolution with Temporal Difference Learning}},
        volume = {40},
        year = {2011}
      }
      
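    • Example
    • The local-search half of CTDL is plain gradient-descent TD(0) on the evaluation function. Below is a minimal sketch, assuming V(s) = tanh(w . x) with x = features(s) (for a weighted piece counter, simply the board contents per square); it shows the shape of the update rule, not the authors' code.

      import numpy as np

      def td0_game_update(w, states, outcome, features, alpha=0.01):
          # One pass over a finished game; V(s) = tanh(w . features(s)).
          for s, s_next in zip(states[:-1], states[1:]):
              x = features(s)
              v = np.tanh(w @ x)
              target = np.tanh(w @ features(s_next))
              # delta = V(s') - V(s); (1 - v^2) is the tanh derivative.
              w += alpha * (target - v) * (1.0 - v * v) * x
          # Correct the last state toward the true outcome in {-1, 0, +1}.
          x = features(states[-1])
          v = np.tanh(w @ x)
          w += alpha * (outcome - v) * (1.0 - v * v) * x
          return w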
  6. Krawiec, K., Jaśkowski, W., and Szubert, M. Evolving Small-board Go Players Using Coevolutionary Temporal Difference Learning with Archives. International Journal of Applied Mathematics and Computer Science 21, 4 (2011), 717–731.
    • Abstract
    • We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique that interweaves two search processes operating in intra-game and inter-game modes. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution according to which new strategies are generated and added to the sample. We analyze CTDL’s sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL, and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects the search performance, and find that the archive-based approach is superior to the other techniques considered here and produces strategies that outperform a handcrafted weighted piece counter strategy and simple liberty-based heuristics. This encouraging result can potentially be generalized not only to other strategy representations used for small-board Go, but also to various games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.

    • BibTeX
    • @article{Krawiec_2011_AMCS,
        address = {Warsaw, Poland},
        author = {Krawiec, Krzysztof and Ja\'{s}kowski, Wojciech and Szubert, Marcin},
        issue_date = {Number 4 / December 2011},
        journal = {International Journal of Applied Mathematics and Computer Science},
        number = {4},
        numpages = {15},
        pages = {717--731},
        publisher = {Versita},
        title = {{Evolving Small-board Go Players Using Coevolutionary Temporal Difference Learning with Archives}},
        volume = {21},
        year = {2011}
      }
      

Conference Articles

  1. Szubert, M., Kodali, A., Ganguly, S., Das, K., and Bongard, J. Reducing Antagonism between Behavioral Diversity and Fitness in Semantic Genetic Programming. Proceedings of the 18th Annual Conference on Genetic and Evolutionary Computation - GECCO 2016, ACM, 2016, 797–804.
    • Abstract
    • Maintaining population diversity has long been considered fundamental to the effectiveness of evolutionary algorithms. Recently, with the advent of novelty search, there has been an increasing interest in sustaining behavioral diversity by using both fitness and behavioral novelty as separate search objectives. However, since the novelty objective explicitly rewards diverging from other individuals, it can antagonize the original fitness objective that rewards convergence toward the solution(s). As a result, fostering behavioral diversity may prevent proper exploitation of the most interesting regions of the behavioral space, and thus adversely affect the overall search performance. In this paper, we argue that an antagonism between behavioral diversity and fitness can indeed exist in semantic genetic programming applied to symbolic regression. Minimizing error draws individuals toward the target semantics, but promoting novelty, defined as a distance in the semantic space, scatters them away from it. We introduce a less conflicting novelty metric, defined as an angular distance between two program semantics with respect to the target semantics (see the sketch below). The experimental results show that this metric, in contrast to the other considered diversity-promoting objectives, makes it possible to consistently improve the performance of genetic programming regardless of whether it employs a syntactic or a semantic search operator.

    • BibTeX
    • @inproceedings{Szubert_2016_GECCO,
        address = {Denver, USA},
        author = {Szubert, Marcin and Kodali, Anuradha and Ganguly, Sangram and Das, Kamalika and Bongard, Joshua},
        booktitle = {Proceedings of the 18th Annual Conference on Genetic and Evolutionary Computation - GECCO 2016},
        organization = {SIGEVO},
        pages = {797--804},
        publisher = {ACM},
        publisher_address = {New York, NY, USA},
        title = {Reducing Antagonism between Behavioral Diversity and Fitness in Semantic Genetic Programming},
        year = {2016}
      }
      
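    • Example
    • The proposed metric has a direct formulation: the angle between two program semantics (vectors of outputs on the training cases) as seen from the target semantics. A minimal sketch:

      import numpy as np

      def angular_distance(sem_a, sem_b, target):
          # Shift both semantics so that the target is the origin, then
          # take the angle between the resulting vectors.
          u = np.asarray(sem_a, dtype=float) - target
          v = np.asarray(sem_b, dtype=float) - target
          cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
          return np.arccos(np.clip(cos, -1.0, 1.0))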
  2. Kriegman, S., Szubert, M., Bongard, J., and Skalka, C. Evolving Spatially Aggregated Features From Satellite Imagery for Regional Modeling. Proceedings of the 14th International Conference on Parallel Problem Solving from Nature - PPSN XIV, Springer, 2016, to appear.
    • Abstract
    • Satellite imagery and remote sensing provide explanatory variables at relatively high resolutions for modeling geospatial phenomena, yet regional summaries are often desirable for analysis and actionable insight. In this paper, we propose a novel method of inducing spatial aggregations as a component of the machine learning process, yielding regional model features whose construction is driven by model prediction performance rather than prior assumptions. Our results demonstrate that Genetic Programming is particularly well suited to this type of feature construction because it can automatically synthesize appropriate aggregations, as well as better incorporate them into predictive models compared to other regression methods we tested. In our experiments we consider a specific problem instance and real-world dataset relevant to predicting snow properties in high-mountain Asia.

    • BibTeX
    • @inproceedings{Kriegman_2016_PPSN,
        author = {Kriegman, Sam and Szubert, Marcin and Bongard, Josh and Skalka, Christian},
        booktitle = {Proceedings of the 14th International Conference on Parallel Problem Solving from Nature - PPSN XIV},
        series = {Lecture Notes in Computer Science},
        title = {Evolving Spatially Aggregated Features From Satellite Imagery for Regional Modeling},
        publisher = {Springer},
        year = {2016},
        pages = {to appear}
      }
      
  3. Szubert, M., Kodali, A., Ganguly, S., Das, K., and Bongard, J.C. Semantic Forward Propagation for Symbolic Regression. Proceedings of the 14th International Conference on Parallel Problem Solving from Nature - PPSN XIV, Springer, 2016, to appear.
    • Abstract
    • In recent years, a number of methods have been proposed that attempt to improve the performance of genetic programming by exploiting information about program semantics. One of the most important developments in this area is semantic backpropagation. The key idea of this method is to decompose a program into two parts — a subprogram and a context — and calculate the desired semantics of the subprogram that would make the entire program correct, assuming that the context remains unchanged. In this paper we introduce Forward Propagation Mutation, a novel operator that relies on the opposite assumption — instead of preserving the context, it retains the subprogram and attempts to place it in the semantically right context. We empirically compare the performance of semantic backpropagation and forward propagation operators on a set of symbolic regression benchmarks. The experimental results demonstrate that semantic forward propagation produces smaller programs that achieve significantly higher generalization performance.

    • BibTeX
    • @inproceedings{Szubert_2016_PPSN,
        author = {Szubert, Marcin and Kodali, Anuradha and Ganguly, Sangram and Das, Kamalika and Bongard, Joshua C},
        booktitle = {Proceedings of the 14th International Conference on Parallel Problem Solving from Nature - PPSN XIV},
        series = {Lecture Notes in Computer Science},
        title = {Semantic Forward Propagation for Symbolic Regression},
        publisher = {Springer},
        year = {2016},
        pages = {to appear}
      }
      
  4. Jaśkowski, W., Szubert, M., Liskowski, P., and Krawiec, K. High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: A Case Study in SZ-Tetris. Proceedings of the 17th Annual Conference on Genetic and Evolutionary Computation - GECCO 2015, ACM, 2015, 567–573.
    • Abstract
    • SZ-Tetris, a restricted version of Tetris, is a difficult reinforcement learning task. Previous research showed that, similarly to the original Tetris, value function-based methods such as temporal difference learning do not work well for SZ-Tetris. The best performance in this game was achieved by employing direct policy search techniques, in particular the cross-entropy method in combination with handcrafted features. Nonetheless, a simple heuristic hand-coded player scores even higher. Here we show that it is possible to equal its performance with CMA-ES (Covariance Matrix Adaptation Evolution Strategy). We demonstrate that further improvement is possible by employing a systematic n-tuple network, a knowledge-free function approximator, and VD-CMA-ES, a linear variant of CMA-ES for high-dimensional optimization. Last but not least, we show that a large systematic n-tuple network (involving more than 4 million parameters) allows the classical temporal difference learning algorithm to obtain average performance similar to VD-CMA-ES, but at 20 times lower computational expense, leading to the best policy for SZ-Tetris known to date. These results enrich the current understanding of the difficulty of SZ-Tetris, and shed new light on the capabilities of particular search paradigms when applied to representations of various characteristics and dimensionality.

    • BibTeX
    • @inproceedings{Jaskowski_2015_GECCO,
        author = {Ja{\'s}kowski, Wojciech and Szubert, Marcin and Liskowski, Pawe{\l} and Krawiec, Krzysztof},
        booktitle = {Proceedings of the 17th Annual Conference on Genetic and Evolutionary Computation - GECCO 2015},
        series = {GECCO '15},
        location = {Madrid, Spain},
        title = {{High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: A Case Study in SZ-Tetris}},
        year = {2015},
        pages = {567--573},
        publisher = {ACM}
      }
      
  5. Szubert, M., Jaśkowski, W., Liskowski, P., and Krawiec, K. The Role of Behavioral Diversity and Difficulty of Opponents in Coevolving Game-Playing Agents. Proceedings of the 18th European Conference on Applications of Evolutionary Computation - EvoApplications 2015, Springer, 2015, 394–405.
    • Abstract
    • Generalization performance of learning agents depends on the training experience to which they have been exposed. In game-playing domains, that experience is determined by the opponents faced during learning. This analytical study investigates two characteristics of opponents in competitive coevolutionary learning: behavioral diversity and difficulty (performance against other players). To assess diversity, we propose a generic intra-game behavioral distance measure that could be adapted to other sequential decision problems. We monitor both characteristics in two-population coevolutionary learning of Othello strategies, attempting to explain their relationship with the generalization performance achieved by the evolved solutions. The main observation is the existence of a non-obvious trade-off between difficulty and diversity, with the latter being essential for obtaining high generalization performance.

    • BibTeX
    • @inproceedings{Szubert_2015_EvoGames,
        author = {Szubert, Marcin and Ja{\'s}kowski, Wojciech and Liskowski, Pawe{\l} and Krawiec, Krzysztof},
        booktitle = {Proceedings of the 18th European Conference on Applications of Evolutionary Computation - EvoApplications 2015},
        pages = {394--405},
        publisher = {Springer},
        series = {Lecture Notes in Computer Science},
        title = {{The Role of Behavioral Diversity and Difficulty of Opponents in Coevolving Game-Playing Agents}},
        volume = {9028},
        year = {2015}
      }
      
  6. Szubert, M. and Jaśkowski, W. Temporal Difference Learning of N-Tuple Networks for the Game 2048. Proceedings of the 2014 Conference on Computational Intelligence and Games - CIG 2014, IEEE, 2014, 373–380.
    • Abstract
    • The highly addictive stochastic puzzle game 2048 has recently invaded the Internet and mobile devices, stealing countless hours of players’ lives. In this study we investigate the possibility of creating a game-playing agent capable of winning this game without incorporating human expertise or performing game tree search. For this purpose, we employ three variants of temporal difference learning to acquire i) action value, ii) state value, and iii) afterstate value functions for evaluating player moves at 1-ply (see the sketch below). To represent these functions we adopt n-tuple networks, which have recently been successfully applied to Othello and Connect 4. The conducted experiments demonstrate that the learning algorithm using afterstate value functions is able to consistently produce players winning over 97% of games. These results show that n-tuple networks combined with an appropriate learning algorithm have large potential, which could be exploited in other board games.

    • BibTeX
    • @inproceedings{Szubert_2014_CIG,
        author = {Szubert, Marcin and Ja{\'s}kowski, Wojciech},
        booktitle = {Proceedings of the 2014 Conference on Computational Intelligence and Games - CIG 2014},
        pages = {373--380},
        publisher = {IEEE},
        series = {CIG 2014},
        title = {{Temporal Difference Learning of N-Tuple Networks for the Game 2048}},
        year = {2014}
      }
      
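    • Example
    • The winning variant, afterstate learning, updates the value of the deterministic board reached after a move but before the random tile appears. In the sketch below, slide (apply a move, returning the afterstate and reward), add_random_tile, legal_moves, and a value function V with value/update methods are all assumed helpers; only the shape of the update is meant to be accurate.

      def afterstate_td_update(V, board, action, alpha=0.0025):
          afterstate, _ = slide(board, action)      # deterministic part
          next_board = add_random_tile(afterstate)  # stochastic part
          # Evaluate the best next move from the successor state...
          best = max((slide(next_board, a)[1]
                      + V.value(slide(next_board, a)[0])
                      for a in legal_moves(next_board)), default=0.0)
          # ...and move this afterstate's value toward that estimate.
          V.update(afterstate, alpha * (best - V.value(afterstate)))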
  7. Jaśkowski, W., Szubert, M., and Liskowski, P. Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello. Proceedings of the 17th European Conference on Applications of Evolutionary Computation - EvoApplications 2014, Springer, 2014, 301–312.
    • Abstract
    • We compare Temporal Difference Learning (TDL) with Coevolutionary Learning (CEL) on Othello. Apart from using three popular single-criteria performance measures: i) generalization performance or expected utility, ii) average results against a hand-crafted heuristic, and iii) the result of a head-to-head match, we compare the algorithms using performance profiles. This multi-criteria performance measure characterizes a player’s performance in the context of opponents of various strength. The multi-criteria analysis reveals that although the generalization performance of players produced by the two algorithms is similar, TDL is much better at playing against strong opponents, while CEL copes better against weak ones. We also find that TDL produces less diverse strategies than CEL. Our results confirm the usefulness of performance profiles as a tool for comparing learning algorithms for games.

    • BibTeX
    • @inproceedings{Jaskowski_2014_EvoGames,
        author = {Ja{\'s}kowski, Wojciech and Szubert, Marcin and Liskowski, Pawe{\l}},
        booktitle = {Proceedings of the 17th European Conference on Applications of Evolutionary Computation - EvoApplications 2014},
        pages = {301--312},
        publisher = {Springer},
        series = {Lecture Notes in Computer Science},
        title = {{Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello}},
        volume = {8602},
        year = {2014}
      }
      
  8. Szubert, M., Jaśkowski, W., Liskowski, P., and Krawiec, K. Shaping Fitness Function for Evolutionary Learning of Game Strategies. Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation - GECCO 2013, ACM, 2013, 1149–1156.
    • Abstract
    • In evolutionary learning of game-playing strategies, fitness evaluation is based on playing games with certain opponents. In this paper we investigate how the performance of these opponents and the way they are chosen influence the efficiency of learning. For this purpose we introduce a simple method for shaping the fitness function by sampling the opponents from a biased performance distribution. We compare the shaped function with existing fitness evaluation approaches that sample the opponents from an unbiased performance distribution or from a coevolving population. In an extensive computational experiment we employ these methods to learn Othello strategies and assess both the absolute and relative performance of the elaborated players. The results demonstrate the superiority of the shaping approach, which can be explained by means of performance profiles, an analytical tool that evaluates the evolved strategies using a range of variably skilled opponents.

    • BibTeX
    • @inproceedings{Szubert_2013_GECCO,
        author = {Szubert, Marcin and Ja\'{s}kowski, Wojciech and Liskowski, Pawe{\l} and Krawiec, Krzysztof},
        booktitle = {Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation - GECCO 2013},
        location = {Amsterdam, The Netherlands},
        numpages = {8},
        pages = {1149--1156},
        publisher = {ACM},
        address = {New York, NY, USA},
        series = {GECCO '13},
        title = {{Shaping Fitness Function for Evolutionary Learning of Game Strategies}},
        year = {2013}
      }
      
  9. Jaśkowski, W., Liskowski, P., Szubert, M., and Krawiec, K. Improving Coevolution by Random Sampling. Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation - GECCO 2013, ACM, 2013, 1141–1148.
    • Abstract
    • Recent developments cast doubts on the effectiveness of coevolutionary learning in interactive domains. A simple evolution with fitness evaluation based on games with random strategies has been found to generalize better than competitive coevolution. In an attempt to investigate this phenomenon, we analyze the utility of random opponents for one- and two-population competitive coevolution applied to learning strategies for the game of Othello. We show that if coevolution uses a two-population setup and also engages random opponents, it is capable of producing strategies as good as those of evolution with random sampling under the expected utility performance measure. To investigate the differences between the analyzed methods, we introduce the performance profile, a tool that measures a player’s performance against opponents of various strength. The profiles reveal that evolution with random sampling produces players coping well with mediocre opponents, but playing relatively poorly against stronger ones. This finding explains why, in the round-robin tournament, evolution with random sampling is one of the worst methods among all those considered in this study.

    • BibTeX
    • @inproceedings{Jaskowski_2013_GECCO,
        author = {Ja{\'s}kowski, Wojciech and Liskowski, Pawe{\l} and Szubert, Marcin and Krawiec, Krzysztof},
        booktitle = {Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation - GECCO 2013},
        location = {Amsterdam, The Netherlands},
        numpages = {8},
        pages = {1141--1148},
        publisher = {ACM},
        address = {New York, NY, USA},
        series = {GECCO '13},
        title = {{Improving Coevolution by Random Sampling}},
        year = {2013}
      }
      
  10. Szubert, M. and Krawiec, K. Autonomous Shaping via Coevolutionary Selection of Training Experience. Proceedings of the 12th International Conference on Parallel Problem Solving from Nature - PPSN XII, Springer, 2012, 215–224.
    • Abstract
    • To acquire expert skills in a sequential decision making domain that is too vast to be explored thoroughly, an intelligent agent has to be capable of inducing crucial knowledge from the most representative parts of it. One way to shape the learning process and guide the learner in the right direction is the effective selection of such parts that provide the best training experience. To realize this concept, we propose a shaping method that orchestrates the training by iteratively exposing the learner to subproblems generated autonomously from the original problem. The main novelty of the proposed approach consists in equating the learning process with a search in subproblem space and in employing a coevolutionary algorithm to perform this search. Each individual in the population encodes a sequence of subproblems that is evaluated by confronting the learner trained on it with other learners shaped in this way by particular individuals. When applied to the game of Othello, temporal difference learning on the best found subproblem sequence yields substantially better players than learning on the entire problem at once.

    • BibTeX
    • @inproceedings{Szubert_2012_PPSN,
        author = {Szubert, Marcin and Krawiec, Krzysztof},
        booktitle = {Proceedings of the 12th International Conference on Parallel Problem Solving from Nature - {PPSN} {XII}},
        pages = {215--224},
        publisher = {Springer},
        address = {Berlin, Heidelberg},
        series = {Lecture Notes in Computer Science},
        title = {{Autonomous Shaping via Coevolutionary Selection of Training Experience}},
        volume = {7492},
        year = {2012}
      }
      
  11. Krawiec, K. and Szubert, M. Learning N-tuple Networks for Othello by Coevolutionary Gradient Search. Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation - GECCO 2011, ACM, 2011, 355–362.
    • Abstract
    • We propose Coevolutionary Gradient Search, a blueprint for a family of iterative learning algorithms that combine elements of local search and population-based search. The approach is applied to learning Othello strategies represented as n-tuple networks, using different search operators and modes of learning. We focus on the interplay between the continuous, directed, gradient-based search in the space of weights, and the fitness-driven, combinatorial, coevolutionary search in the space of entire n-tuple networks. In an extensive experiment, we assess both the objective and relative performance of the algorithms, concluding that the hybridization of search techniques improves convergence. The best algorithms not only learn faster than the constituent methods alone but also produce top-ranked strategies in the online Othello League.

    • BibTeX
    • @inproceedings{Krawiec_2011_GECCO,
        address = {New York, NY, USA},
        author = {Krawiec, Krzysztof and Szubert, Marcin},
        booktitle = {Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation - GECCO 2011},
        location = {Dublin, Ireland},
        numpages = {8},
        pages = {355--362},
        publisher = {ACM},
        series = {GECCO '11},
        title = {{Learning N-tuple Networks for Othello by Coevolutionary Gradient Search}},
        year = {2011}
      }
      
  12. Krawiec, K. and Szubert, M. Coevolutionary Temporal Difference Learning for Small-Board Go. Proceedings of the IEEE Congress on Evolutionary Computation - CEC 2010, IEEE, 2010, 18–23.
    • Abstract
    • In this paper we apply Coevolutionary Temporal Difference Learning (CTDL), a hybrid of coevolutionary search and reinforcement learning proposed in our earlier study, to evolve strategies for playing the game of Go on small boards (5×5). CTDL works by interlacing exploration of the search space provided by one-population competitive coevolution and exploitation by means of temporal difference learning. Despite using a simple representation of strategies (a weighted piece counter), CTDL proves able to evolve players that defeat solutions found by its constituent methods. The results of the conducted experiments indicate that our algorithm is superior to pure coevolution and pure temporal difference learning, both in terms of the performance of the elaborated strategies and the computational cost. This demonstrates the existence of a synergistic interplay between the components of CTDL, which we also briefly discuss in this study.

    • BibTeX
    • @inproceedings{Krawiec_2010_CEC,
        author = {Krawiec, Krzysztof and Szubert, Marcin},
        title = {{Coevolutionary Temporal Difference Learning for Small-Board Go}},
        booktitle = {Proceedings of the IEEE Congress on Evolutionary Computation - CEC 2010},
        address = {Barcelona},
        pages = {18--23},
        publisher = {IEEE},
        year = {2010}
      }
      
  13. Szubert, M., Jaśkowski, W., and Krawiec, K. Coevolutionary Temporal Difference Learning for Othello. Proceedings of the 5th International Conference on Computational Intelligence and Games - CIG 2009, IEEE Press, 2009, 104–111.
    • Abstract
    • This paper presents Coevolutionary Temporal Difference Learning (CTDL), a novel way of hybridizing coevolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with temporal difference learning. The coevolutionary part of the algorithm provides for exploration of the solution space, while the temporal difference learning performs its exploitation by local search. We apply CTDL to the board game of Othello, using a weighted piece counter to represent players’ strategies. The results of an extensive computational experiment demonstrate CTDL’s superiority when compared to coevolution and reinforcement learning alone, particularly when coevolution maintains an archive to provide historical progress. The paper investigates the role of the relative intensity of coevolutionary search and temporal difference search, which turns out to be an essential parameter. The formulation of CTDL also leads to the introduction of a Lamarckian form of coevolution, which we discuss in detail.

    • BibTeX
    • @inproceedings{Szubert_2009_CIG,
        author = {Szubert, Marcin and Ja{\'s}kowski, Wojciech and Krawiec, Krzysztof},
        booktitle = {Proceedings of the 5th International Conference on Computational Intelligence and Games - CIG 2009},
        numpages = {8},
        pages = {104--111},
        publisher = {IEEE Press},
        location = {Milano, Italy},
        series = {CIG'09},
        title = {{Coevolutionary Temporal Difference Learning for Othello}},
        year = {2009}
      }
      

Theses

  1. Szubert, M. Coevolutionary Shaping for Reinforcement Learning. 2014.
    • Abstract
    • Shaping is an important animal training technique that originates from behavioral psychology. The main motivation behind this technique is to enable animals to perform tasks that are too difficult to be learned directly. Shaping typically consists in starting from related simpler tasks and progressively increasing their difficulty. As a result, the learner can be exposed to an appropriate training experience and gradually refine its skills. By providing a pedagogical sequence of training tasks, shaping is expected to guide the learner towards the behavior of ultimate interest. This thesis investigates the concept of shaping in reinforcement learning, a machine learning paradigm closely related to human and animal learning. In this paradigm, an agent learns a decision-making policy for a sequential decision task through repeated trial-and-error interactions with an environment. Although shaping has already been applied to improve the effectiveness of reinforcement learning, most of the existing approaches rely on manually designed training environments and thus require a substantial amount of domain knowledge and human intervention. In this thesis we propose a unified shaping framework and introduce novel shaping approaches that avoid incorporating domain knowledge into the learning process. To this end, we rely mainly on competitive coevolutionary algorithms, which autonomously realize shaping by coevolving learners against their training environments. We investigate a hybrid of coevolution with self-play temporal difference learning and analyze this combination in the context of its generalization performance and scalability with respect to the search space size. Next, we design a novel measure of task difficulty and use it to devise a set of shaping methods that provide training tasks from a precomputed task pool according to either a static or a dynamic difficulty distribution. Finally, we formalize the problem of optimal shaping and design a coevolutionary method that optimizes training experience for a temporal difference learning algorithm. The proposed shaping methods are experimentally verified in nontrivial sequential decision making domains, including the benchmark problem of cart pole balancing and the board games of Othello and small-board Go. We demonstrate that shaping can provide significant empirical benefits compared to conventional unshaped reinforcement learning, either by improving the final performance or by facilitating faster convergence.

    • BibTeX
    • @phdthesis{Szubert_2014_PHD,
        author = {Szubert, Marcin},
        school = {Poznan University of Technology},
        title = {Coevolutionary Shaping for Reinforcement Learning},
        year = {2014}
      }
      
  2. Szubert, M. Coevolutionary Reinforcement Learning and its Application to Othello. 2009.
    • Abstract
    • This thesis focuses on the fields of evolutionary computation and machine learning. We present Coevolutionary Temporal Difference Learning, a novel way of hybridizing coevolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with the temporal difference algorithm. Both constituent methods are capable of learning without human expertise and have shown promising results that indicate their complementary advantages. Nevertheless, their hybrids have not received much attention yet, and therefore seem to be innovative. The coevolutionary part of our algorithm provides for exploration of the solution space, while the temporal difference learning performs its exploitation by local search. We apply the proposed method to the board game of Othello, using a weighted piece counter to represent players’ strategies. The formulation of Coevolutionary Temporal Difference Learning also leads to introducing a Lamarckian form of coevolution, which we discuss in detail. The results of an extensive computational experiment demonstrate that Coevolutionary Temporal Difference Learning is superior to coevolution and reinforcement learning alone, particularly when coevolution maintains an archive to guarantee historical progress. We investigate the role of the relative intensity of coevolutionary search and temporal difference search, which turns out to be an essential parameter. Additionally, we verify how the initialization of strategies influences the quality of evolved solutions. The conclusions point to the need for further investigation of the coevolutionary reinforcement learning approach, and we propose a few potential directions for future work. Finally, this thesis also presents the design and implementation of the co-Evolutionary Computation in Java library, based on ECJ, a well-known evolutionary computation framework. The developed software is easy to extend and allows flexible experiment definition.

    • BibTeX
    • @mastersthesis{Szubert_2009_MSC,
        author = {Szubert, Marcin},
        school = {Poznan University of Technology},
        title = {{Coevolutionary Reinforcement Learning and its Application to Othello}},
        year = {2009}
      }
      