References

doi:10.48550/arXiv.2307.10262

Abadi, Martin, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, et al. 2016. “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems.” arXiv e-Prints, March, arXiv:1603.04467.

Aggarwal, Charu, ed. 2007. Data Streams – Models and Algorithms. Springer-Verlag.

Arlot, Sylvain, and Alain Celisse. 2010. “A Survey of Cross-Validation Procedures for Model Selection.” Statistics Surveys 4: 40–79.

Bartz, Eva, Thomas Bartz-Beielstein, Martin Zaefferer, and Olaf Mersmann, eds. 2022. Hyperparameter Tuning for Machine and Deep Learning with R: A Practical Guide. Springer.

Bartz-Beielstein, Thomas. 2023a. “PyTorch Hyperparameter Tuning with SPOT: Comparison with Ray Tuner and Default Hyperparameters on CIFAR10.” https://github.com/sequential-parameter-optimization/spotpython/blob/main/notebooks/14_spot_ray_hpt_torch_cifar10.ipynb.

———. 2023b. “Hyperparameter Tuning Cookbook: A guide for scikit-learn, PyTorch, river, and spotpython.” arXiv e-Prints, July. https://doi.org/10.48550/arXiv.2307.10262.

———. 2024a. “Evaluation and Performance Measurement.” In Online Machine Learning: A Practical Guide with Examples in Python, edited by Eva Bartz and Thomas Bartz-Beielstein, 47–62. Singapore: Springer Nature Singapore.

———. 2024b. “Hyperparameter Tuning.” In Online Machine Learning: A Practical Guide with Examples in Python, edited by Eva Bartz and Thomas Bartz-Beielstein, 125–40. Singapore: Springer Nature Singapore.

———. 2024c. “Introduction: From Batch to Online Machine Learning.” In Online Machine Learning: A Practical Guide with Examples in Python, edited by Eva Bartz and Thomas Bartz-Beielstein, 1–11. Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-99-7007-0_1.

———. 2025. “Kriging (Gaussian Process Regression): The Complete Python Code for the Example.” https://sequential-parameter-optimization.github.io/Hyperparameter-Tuning-Cookbook/006_num_gp.html.

Bartz-Beielstein, Thomas, Jürgen Branke, Jörn Mehnen, and Olaf Mersmann. 2014. “Evolutionary Algorithms.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (3): 178–95.

Bartz-Beielstein, Thomas, Martina Friese, Martin Zaefferer, Boris Naujoks, Oliver Flasch, Wolfgang Konen, and Patrick Koch. 2011. “Noisy Optimization with Sequential Parameter Optimization and Optimal Computational Budget Allocation.” In Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation, 119–20. New York, NY, USA: ACM.

Bartz-Beielstein, Thomas, and Lukas Hans. 2024. “Drift Detection and Handling.” In Online Machine Learning: A Practical Guide with Examples in Python, edited by Eva Bartz and Thomas Bartz-Beielstein, 23–39. Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-99-7007-0_3.

Bartz-Beielstein, Thomas, Christian Lasarczyk, and Mike Preuss. 2005. “Sequential Parameter Optimization.” In Proceedings 2005 Congress on Evolutionary Computation (CEC’05), Edinburgh, Scotland, edited by B. McKay et al., 773–80. Piscataway, NJ: IEEE Press.

Bartz-Beielstein, Thomas, and Martin Zaefferer. 2022. “Hyperparameter Tuning Approaches.” In Hyperparameter Tuning for Machine and Deep Learning with R: A Practical Guide, edited by Eva Bartz, Thomas Bartz-Beielstein, Martin Zaefferer, and Olaf Mersmann, 67–114. Springer.

Bifet, Albert. 2010. Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. Vol. 207. Frontiers in Artificial Intelligence and Applications. IOS Press.

Bifet, Albert, and Ricard Gavaldà. 2007. “Learning from Time-Changing Data with Adaptive Windowing.” In Proceedings of the 2007 SIAM International Conference on Data Mining (SDM), 443–48.

———. 2009. “Adaptive Learning from Evolving Data Streams.” In Proceedings of the 8th International Symposium on Intelligent Data Analysis: Advances in Intelligent Data Analysis VIII, 249–60. IDA ’09. Berlin, Heidelberg: Springer-Verlag.

Bifet, Albert, Geoff Holmes, Richard Kirkby, and Bernhard Pfahringer. 2010a. “MOA: Massive Online Analysis.” Journal of Machine Learning Research 11: 1601–4.

———. 2010b. “MOA: Massive Online Analysis.” Journal of Machine Learning Research 11: 1601–4.

Bischl, Bernd, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, et al. 2023. “Hyperparameter Optimization: Foundations, Algorithms, Best Practices, and Open Challenges.” WIREs Data Mining and Knowledge Discovery 13 (2): e1484.

Bohachevsky, I. O. 1986. “Generalized Simulated Annealing for Function Optimization.” Technometrics 28 (3): 209–17.

Box, G. E. P. 1957. “Evolutionary Operation: A Method for Increasing Industrial Productivity.” Applied Statistics 6: 81–101.

Box, G. E. P., and J. S. Hunter. 1957. “Multi-Factor Experimental Designs for Exploring Response Surfaces.” The Annals of Mathematical Statistics 28 (1): 195–241.

Box, G. E. P., and K. B. Wilson. 1951. “On the Experimental Attainment of Optimum Conditions.” Journal of the Royal Statistical Society. Series B (Methodological) 13 (1): 1–45.

Chen, Chun Hung. 2010. Stochastic Simulation Optimization: An Optimal Computing Budget Allocation. World Scientific.

Chen, Ricky T. Q., Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. 2018. “Neural Ordinary Differential Equations.” arXiv e-Prints, June, arXiv:1806.07366.

Coello, Carlos A. Coello, Silvia González Brambila, Josué Figueroa Gamboa, and Ma. Guadalupe Castillo Tapia. 2021. “Multi-Objective Evolutionary Algorithms: Past, Present, and Future.” In, edited by Panos M. Pardalos, Varvara Rasskazova, and Michael N. Vrahatis, 137–62. Cham: Springer International Publishing.

Del Castillo, E., D. C. Montgomery, and D. R. McCarville. 1996. “Modified Desirability Functions for Multiple Response Optimization.” Journal of Quality Technology 28: 337–45.

Derringer, G., and R. Suich. 1980. “Simultaneous Optimization of Several Response Variables.” Journal of Quality Technology 12: 214–19.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv e-Prints, October, arXiv:1810.04805.

Domingos, Pedro M., and Geoff Hulten. 2000. “Mining High-Speed Data Streams.” In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 20-23, 2000, edited by Raghu Ramakrishnan, Salvatore J. Stolfo, Roberto J. Bayardo, and Ismail Parsa, 71–80. ACM.

Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, et al. 2020. “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” arXiv e-Prints, October, arXiv:2010.11929.

Dredze, Mark, Tim Oates, and Christine Piatko. 2010. “We’re Not in Kansas Anymore: Detecting Domain Changes in Streams.” In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 585–95.

Emmerich, Michael T. M., and André H. Deutz. 2018. “A Tutorial on Multiobjective Optimization: Fundamentals and Evolutionary Methods.” Natural Computing 17 (3): 585–609.

Forrester, Alexander, András Sóbester, and Andy Keane. 2008. Engineering Design via Surrogate Modelling. Wiley.

Friedman, Jerome H. 1991. “Multivariate Adaptive Regression Splines.” The Annals of Statistics 19 (1): 1–67.

Gaber, Mohamed Medhat, Arkady Zaslavsky, and Shonali Krishnaswamy. 2005. “Mining Data Streams: A Review.” SIGMOD Rec. 34: 18–26.

Gama, João, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. 2004. “Learning with Drift Detection.” In Advances in Artificial Intelligence – SBIA 2004, edited by Ana L. C. Bazzan and Sofiane Labidi, 286–95. Berlin, Heidelberg: Springer Berlin Heidelberg.

Gama, João, Raquel Sebastião, and Pedro Pereira Rodrigues. 2013. “On Evaluating Stream Learning Algorithms.” Machine Learning 90 (3): 317–46.

Gramacy, Robert B. 2020. Surrogates. CRC Press.

Harrington, E. C. 1965. “The Desirability Function.” Industrial Quality Control 21: 494–98.

Hartung, Joachim, Bärbel Elpert, and Karl-Heinz Klösener. 1995. Statistik. Oldenbourg.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2017. The Elements of Statistical Learning. 2nd ed. Springer.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Deep Residual Learning for Image Recognition.” arXiv e-Prints, December, arXiv:1512.03385.

———. 2016. “Identity Mappings in Deep Residual Networks.” arXiv e-Prints, March, arXiv:1603.05027.

Hoeglinger, Stefan, and Russel Pears. 2007. “Use of Hoeffding Trees in Concept Based Data Stream Mining.” 2007 Third International Conference on Information and Automation for Sustainability, 57–62.

Ikonomovska, Elena. 2012. “Algorithms for Learning Regression Trees and Ensembles on Evolving Data Streams.” PhD thesis, Jozef Stefan International Postgraduate School.

Ikonomovska, Elena, João Gama, and Sašo Džeroski. 2011. “Learning Model Trees from Evolving Data Streams.” Data Mining and Knowledge Discovery 23 (1): 128–68.

Jain, Sarthak, and Byron C. Wallace. 2019. “Attention is not Explanation.” arXiv e-Prints, February, arXiv:1902.10186.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning with Applications in R. 7th ed. Springer.

Johnson, M. E., L. M. Moore, and D. Ylvisaker. 1990. “Minimax and Maximin Distance Designs.” Journal of Statistical Planning and Inference 26 (2): 131–48.

Jones, Donald R., Matthias Schonlau, and William J. Welch. 1998. “Efficient Global Optimization of Expensive Black-Box Functions.” Journal of Global Optimization 13 (4): 455–92.

Karl, Florian, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, et al. 2023. “Multi-Objective Hyperparameter Optimization in Machine Learning—an Overview.” ACM Trans. Evol. Learn. Optim. 3 (4).

Keane, Andrew J., and Prasanth B. Nair. 2005. Computational Approaches for Aerospace Design: The Pursuit of Excellence. Wiley.

Keller-McNulty, Sallie, ed. 2004. Statistical Analysis of Massive Data Streams: Proceedings of a Workshop. Washington, DC: Committee on Applied and Theoretical Statistics, National Research Council; National Academies Press.

Kidger, Patrick. 2022. “On Neural Differential Equations.” arXiv e-Prints, February, arXiv:2202.02435.

Kohavi, Ron. 1995. “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection.” In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, 1137–43. IJCAI’95. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Kuhn, Max. 2016. “Desirability: Function Optimization and Ranking via Desirability Functions.”

Lewis, R. M., V. Torczon, and M. W. Trosset. 2000. “Direct Search Methods: Then and Now.” Journal of Computational and Applied Mathematics 124 (1–2): 191–207.

Li, Lisha, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2016. “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.” arXiv e-Prints, March, arXiv:1603.06560.

Lippe, Phillip. 2022. “UvA Deep Learning Tutorials.” https://github.com/phlippe/uvadlc_notebooks/tree/master.

Liu, Liyuan, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. 2019. “On the Variance of the Adaptive Learning Rate and Beyond.” arXiv e-Prints, August, arXiv:1908.03265.

Manapragada, Chaitanya, Geoffrey I. Webb, and Mahsa Salehi. 2018. “Extremely Fast Decision Tree.” In KDD 2018: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, edited by Chih-Jen Lin and Hui Xiong, 1953–62. United States of America: Association for Computing Machinery (ACM). https://doi.org/10.1145/3219819.3220005.

Masud, Mohammad, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani M. Thuraisingham. 2011. “Classification and Novel Class Detection in Concept-Drifting Data Streams Under Time Constraints.” IEEE Transactions on Knowledge and Data Engineering 23 (6): 859–74.

Meignan, David, Sigrid Knust, Jean-Marc Frayet, Gilles Pesant, and Nicolas Gaud. 2015. “A Review and Taxonomy of Interactive Optimization Methods in Operations Research.” ACM Transactions on Interactive Intelligent Systems, September.

Micchelli, Charles A. 1986. “Interpolation of Scattered Data: Distance Matrices and Conditionally Positive Definite Functions.” Constructive Approximation 2 (1): 11–22. https://doi.org/10.1007/BF01893414.

Močkus, J. 1974. “On Bayesian Methods for Seeking the Extremum.” In Optimization Techniques IFIP Technical Conference, 400–404.

Montgomery, D. C. 2001. Design and Analysis of Experiments. 5th ed. New York, NY: Wiley.

Montiel, Jacob, Max Halford, Saulo Martiello Mastelini, Geoffrey Bolmier, Raphael Sourty, Robin Vaysse, Adil Zouitine, et al. 2021. “River: Machine Learning for Streaming Data in Python.”

Morris, Max D., and Toby J. Mitchell. 1995. “Exploratory Designs for Computational Experiments.” Journal of Statistical Planning and Inference 43 (3): 381–402. https://doi.org/10.1016/0378-3758(94)00035-T.

Mourtada, Jaouad, Stephane Gaiffas, and Erwan Scornet. 2019. “AMF: Aggregated Mondrian Forests for Online Learning.” arXiv e-Prints, June, arXiv:1906.10529. https://doi.org/10.48550/arXiv.1906.10529.

Myers, Raymond H., Douglas C. Montgomery, and Christine M. Anderson-Cook. 2016. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley & Sons.

Nelder, J. A., and R. Mead. 1965. “A Simplex Method for Function Minimization.” The Computer Journal 7 (4): 308–13.

Nino, Esmeralda, Juan Rosas Rubio, Samuel Bonet, Nazario Ramirez-Beltran, and Mauricio Cabrera-Rios. 2015. “Multiple Objective Optimization Using Desirability Functions for the Design of a 3D Printer Prototype.”

“NIST/SEMATECH e-Handbook of Statistical Methods.” 2021.

Olsson, Donald M., and Lloyd S. Nelson. 1975. “The Nelder-Mead Simplex Procedure for Function Minimization.” Technometrics 17 (1): 45–51.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.

Poggio, T., and F. Girosi. 1990. “Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks.” Science 247 (4945): 978–82. https://doi.org/10.1126/science.247.4945.978.

Pontryagin, L. S. 1987. Mathematical Theory of Optimal Processes. Routledge.

Putatunda, Sayan. 2021. Practical Machine Learning for Streaming Data with Python. Springer.

Raymer, Daniel P. 2006. Aircraft Design: A Conceptual Approach. AIAA.

Rummel, R. J. 1976. “Understanding Correlation.” https://www.hawaii.edu/powerkills/UC.HTM.

Sacks, J., W. J. Welch, T. J. Mitchell, and H. P. Wynn. 1989. “Design and Analysis of Computer Experiments.” Statistical Science 4 (4): 409–35.

Santner, T. J., B. J. Williams, and W. I. Notz. 2003. The Design and Analysis of Computer Experiments. Berlin, Heidelberg, New York: Springer.

Street, W. Nick, and YongSeog Kim. 2001. “A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification.” In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 377–82. KDD ’01. New York, NY, USA: Association for Computing Machinery.

Tay, Yi, Mostafa Dehghani, Dara Bahri, and Donald Metzler. 2020. “Efficient Transformers: A Survey.” arXiv e-Prints, September, arXiv:2009.06732.

Vapnik, V. N. 1998. Statistical Learning Theory. Wiley.

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” arXiv e-Prints, June, arXiv:1706.03762.

Wang, Zhiqiang. 2007. “Two Postestimation Commands for Assessing Confounding Effects in Epidemiological Studies.” The Stata Journal 7 (2): 183–96.

Weihe, Karsten, Ulrik Brandes, Annegret Liebers, Matthias Müller-Hannemann, Dorothea Wagner, and Thomas Willhalm. 1999. “Empirical Design of Geometric Algorithms.” In SCG ’99: Proceedings of the Fifteenth Annual Symposium on Computational Geometry, 86–94. New York, NY: Association for Computing Machinery.

Wiegreffe, Sarah, and Yuval Pinter. 2019. “Attention is not not Explanation.” arXiv e-Prints, August, arXiv:1908.04626.

Wikipedia contributors. 2024. “Partial Correlation — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=Partial_correlation&oldid=1253637419.