see also for CMA-ES papers
Update May 23, 2014
1 William P. Thurston. On proof and progress in mathematics. In: arXiv math.HO (1994).
2 Sebastien Bubeck. Theory of Convex Optimization for Machine Learning. monograph (2014).
3 Kevin Jamieson, Matthew Malloy, Robert Nowak, and Sebastien Bubeck. lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits. In: arXiv stat.ML (2013).
4 Philippe Preux, Remi Munos and Michal Valko. Bandits attack function optimization. In: IEEE Congress on Evolutionary Computation (2014).
5 Razvan Pascanu and Yoshua Bengio. Revisiting natural gradient for deep networks. In: arXiv cs.LG (2014).
6 Dirk Sudholt. How Crossover Speeds Up Building-Block Assembly in Genetic Algorithms. In: arXiv cs.NE (2014).

Update December 11, 2013
1 Jorg Bremer and Michael Sonnenschein. Constraint-handling for Optimization with Support Vector Surrogate Models. A Novel Decoder Approach. In: to appear.
2 Tobias Glasmachers and Urun Dogan. Accelerated Coordinate Descent with Adaptive Coordinate Frequencies. In: ACML (2013).
3 Thomas Back, Christophe Foussette and Peter Krause. Contemporary Evolution Strategies. Chapter 4. In: Springer Book (2014).
4 Ali Ahrari and Masoud Shariat-Panahi. An improved evolution strategy with adaptive population size. In: Optimization journal (2013).

Update October 6, 2013
1 Jeremy Bensadon. Black-box optimization using geodesics in statistical manifolds. In: arXiv math (2013).
2 Tobias Glasmachers. A natural evolution strategy with asynchronous strategy updates. In: GECCO (2013).
3 Freek Stulp and Olivier Sigaud. Policy Improvement: Between Black-Box Optimization and Episodic Reinforcement Learning. In: JFPDA (2013).
4 Frank Hutter, Holger Hoos and Kevin Leyton-Brown. An Evaluation of Sequential Model-Based Optimization for Expensive Blackbox Functions. In: GECCO-BBOB (2013).
5 Ouassim Ait ElHara, Anne Auger and Nikolaus Hansen. A Median Success Rule for Non-Elitist Evolution Strategies: Study of Feasibility. In: GECCO (2013).
6 Tom Schaul and Yann LeCun. Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients. In: arXiv cs (2013).
7 Ata Kaban, Jakramate Bootkrajang and Robert J. Durrant. Towards Large Scale Continuous EDA: A Random Matrix Theory Perspective. In: GECCO (2013).
