Researchers have developed linear models that predict solar irradiance forecast errors using only solar variability at a specific location. This variability is measured by the standard deviation of hourly changes in the clear sky index, essentially capturing how much solar conditions fluctuate hour to hour.
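To illustrate how lightweight the calculation is, here is a minimal Python sketch (not the repository code) that derives the predictor from one year of hourly data. The clear-sky threshold and the linear coefficients below are placeholder assumptions, not the fitted values reported by the authors.

```python
# Minimal sketch (not the authors' code): compute the variability metric that
# drives the linear error models -- the standard deviation of hourly changes
# in the clear-sky index -- from one year of hourly data.
import numpy as np

def clear_sky_index(ghi, ghi_clear, min_clear=50.0):
    """Ratio of measured GHI to clear-sky GHI, ignoring very low-sun hours."""
    mask = ghi_clear > min_clear              # W/m^2 threshold; an assumption
    kc = np.full_like(ghi, np.nan, dtype=float)
    kc[mask] = ghi[mask] / ghi_clear[mask]
    return kc

def variability(kc):
    """Std of hour-to-hour changes of the clear-sky index (the paper's predictor)."""
    return np.nanstd(np.diff(kc))

def predicted_rmse(sigma_dkc, a=0.5, b=0.05):
    """Hypothetical linear mapping error ~ a * variability + b; a and b are
    placeholders, NOT the coefficients fitted in the paper."""
    return a * sigma_dkc + b

# Usage with synthetic data standing in for one year of hourly GHI:
rng = np.random.default_rng(0)
ghi_clear = np.tile(np.clip(1000 * np.sin(np.linspace(0, np.pi, 24)), 0, None), 365)
ghi = ghi_clear * np.clip(rng.normal(0.8, 0.2, ghi_clear.size), 0, 1.1)
sigma = variability(clear_sky_index(ghi, ghi_clear))
print(f"sigma(dkc) = {sigma:.3f}, illustrative predicted error = {predicted_rmse(sigma):.3f}")
```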
Why This Matters
Traditional solar forecasting requires complex numerical weather models, machine learning algorithms, and significant computational resources. This new approach needs only one year of solar data and basic calculations to assess forecast difficulty at any location worldwide.
Key Results
The models show strong correlations between solar variability and forecast errors:
Intra-day forecasts (1-6h): Correlation coefficients 0.82-0.84
Day-ahead forecasts (24h): Correlation coefficient 0.72
Validation across 60 global sites and comparison with published results from SURFRAD network stations confirm the models' accuracy.
Practical Applications
Solar Developers: Quick site assessment for forecast difficulty before investing in forecasting infrastructure or solar farms.
Grid Operators: Better understanding of solar variability impacts and verification of existing forecasting systems.
Researchers: Standardized benchmarking tool for new forecasting methodologies.
The Impact
This approach enables faster project development, more accurate financial modeling, and better grid integration planning. By making forecast difficulty assessment accessible without extensive forecasting expertise, it particularly benefits smaller developers and emerging markets.
The methodology works globally using satellite data, making it valuable for solar development in regions with limited ground measurement infrastructure.
Data and code available at: https://github.com/Laboratoire-Piment/solar-predict-rmse.git
I've experienced a troubling phenomenon firsthand: an article I had little hope for exploded with hundreds of citations, while another piece I considered my best work remained in obscurity. After twenty years of research, one thing is clear: in a rapidly evolving academic ecosystem, publishing alone is no longer sufficient.
The growth has been brutal. In the international journals I work with, submissions have surged by over 30% in recent years. Rejection rates now reach 70% in some disciplines. Even more striking, the geography of research has been radically transformed: where the United States dominated fifteen years ago, China now accounts for over 40% of global submissions in many fields.
This quantitative explosion masks deeper transformations. The emergence of AI-assisted writing tools facilitates large-scale manuscript production. Some countries have implemented financial incentives that can reach $43,000 for a publication in Nature or Science, peaks observed in China between 2008 and 2016.
The result? Between 10% and 30% of scientific articles remain uncited several years after publication, depending on the discipline. This isn't always a quality issue: it's often a discoverability problem. In this saturated environment, visibility has become a scientific skill in its own right.
Visibility work begins well before submission. A title of 8-15 words, precise without jargon, an explicit abstract, and keywords covering the terminological variants of your field significantly improve findability via Google Scholar. Search engines first index the metadata you provide them.
Journal selection deserves strategic reflection that's often overlooked. The question isn't "which journal is most prestigious?" but "where does my audience actually read?" Publishing in a highly ranked journal but in a field distant from yours may flatter the ego, but sometimes has less impact than a second-tier journal that's central to your niche.
Metrics like CiteScore (measuring average citations over 4 years) or the "Cites per document" indicator on SCImago help choose a journal actually read by your peers. Better to publish "in the conversation" than "beside the conversation."
The open access movement has created unprecedented opportunities for research visibility. Repository platforms like arXiv (multidisciplinary), medRxiv (health), or EarthArXiv (Earth sciences) accelerate the circulation of ideas. In many fields, a preprint signals scientific openness, generates early feedback, and can initiate citations before formal publication.
For researchers in Europe, national agreements often waive open access publication fees. Simply linking your institutional email and your ORCID identifier can trigger these benefits automatically.
This "green" strategy ensures worldwide free availability of your work. Concretely: even if your article is published in a journal with a $3,000 annual subscription, anyone in the world can freely access your repository version after the embargo period.
A principle I now apply: one idea equals one article, adaptable according to disciplines. Multi-subject manuscripts get lost in the crowd. I prefer short formats for targeted results - they're often more read and cited than very dense articles.
It's about doing "academic SEO": aligning title, abstract, keywords, and subheadings with your audience's typical queries. Make your figures self-sufficient with explanatory captions and clear licenses (like Creative Commons) to encourage reuse.
Systematically deposit your datasets and scripts following FAIR principles (Findable, Accessible, Interoperable, Reusable). A Data Management Plan from the beginning of a project facilitates this approach. A GitHub repository with DOI via Zenodo increases reuse and mentions, making you discoverable by completely different audiences.
For licenses, favor Creative Commons (CC-BY for example) which allows reuse with attribution. Open access isn't limited to "Green" (repository deposit) or "Gold" (direct paid publication): the emerging "Diamond" model offers free access for everyone.
Publication day isn't the end of the story; it's the beginning of your research's public life. I now apply a reproducible plan:
Immediate: Deposit the accepted version in institutional repository with ORCID synchronization. This step takes 10 minutes and ensures permanent archiving.
Professional: Targeted announcement on LinkedIn in 3-4 sentences explaining the question your article answers, a key result, and why it matters. This is where you'll reach decision-makers and industry leaders who can transform your results into action.
General public: A short popularization post on Medium or a blog explaining the "why" and potential uses. Some research deserves to influence public debate.
Academic: Sharing on ResearchGate and Academia.edu while respecting publisher policies.
Necessary vigilance: disseminate widely only work of genuine quality; otherwise we add to the very information noise we denounce.
Beyond individual strategies, structural reforms are necessary. Researchers excel at fundamental research (Nobel Prizes and Fields Medals attest to this), yet they are increasingly absent from the editorial boards of major applied journals, particularly in strategic fields. Solutions could include:
Revising evaluation and reward policies to put quality at the center, with minimum thresholds over two years rather than a race for quantity
Recruiting dedicated administrative staff to free researchers from management tasks
Recognizing editorial functions in career evaluation
Training in "publication literacy" from the PhD level
In a saturated landscape where we must navigate between global quantitative explosion and quality maintenance, visibility becomes an impact multiplier. An article in a top journal is good. An article that people find, read, use, and build upon for their own research is infinitely better.
Visibility isn't academic vanity: it's the necessary condition for years of research to find their social, economic, and scientific utility.
Informative title 8-15 words, abstract with audience keywords
Journal strategy: scope > blind prestige
Metadata preparation and data management
Immediate deposit of accepted version in repository + ORCID synchronization
Preparation of dissemination versions
LinkedIn for decision-makers (3-4 sentences, why important)
Medium/blog for popularization
Academic networks for peers
Data Management Plan from the beginning
GitHub repository with Zenodo DOI for reuse
Explicit licenses (Creative Commons CC-BY recommended)
Respect FAIR principles: Findable, Accessible, Interoperable, Reusable
arXiv (multidisciplinary), medRxiv (health), EarthArXiv (Earth sciences)
Signals openness and scientific priority
Generates early feedback and anticipated citations
Green: repository deposit after embargo period
Gold: direct open access publication (with APC)
Diamond: free access without fees (emerging model)
Quan, W., Chen, B., & Shu, F. (2017). Publish or impoverish: An investigation of the monetary reward system of science in China (1999-2016). arXiv preprint arXiv:1707.01162.
Evans, J. A., & Reimer, J. (2009). Open access and global participation in science. Science, 323(5917), 1025.
Larivière, V., Haustein, S., & Mongeon, P. (2015). The oligopoly of academic publishers in the digital era. PLOS One, 10(6), e0127502.
Tennant, J. P., et al. (2016). The academic, economic and societal impacts of Open Access. F1000Research, 5, 632.
LQL‑Equiv is a free & open‑source software (GNU‑based) written in MATLAB and distributed as a standalone executable, developed by Cyril Voyant & Daniel Julian. It computes voxel‑wise Equivalent Dose in 2 Gy fractions (EQD₂) and Biologically Effective Dose (BED) using a Linear‑Quadratic‑Linear (LQL) model that explicitly accounts for fraction size, overall treatment time, and cellular repopulation effects, outperforming standard LQ-based calculators.
Theoretical Basis: Integrates the Astrahan LQL framework for high‑dose per fraction regimens (> dₜ), Dale’s repopulation corrections, and Thames’s multi‑fractionation modeling, implemented in an algorithm that minimizes a custom cost function to compute accurate EQD₂ and BED across complex radiotherapy scenarios.
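For orientation, here is a minimal worked example of the underlying dose-equivalence formulas in their standard LQ form with a Dale-style repopulation term (Python). It is only a reference sketch: LQL-Equiv itself adds the Astrahan linear extension above the transition dose and solves a cost-function minimization, and the α, Tₖ and Tₚₒₜ defaults below are generic textbook assumptions.

```python
# Illustrative sketch only: standard LQ-based BED/EQD2 with a Dale-style
# repopulation term. LQL-Equiv additionally applies the Astrahan LQL extension
# above the transition dose d_t and a cost-function minimization; this is a
# simplified reference, not the software's algorithm.
import math

def bed_lq(n, d, alpha_beta, alpha=0.3, T=None, Tk=28.0, Tpot=3.0):
    """BED = n*d*(1 + d/(alpha/beta)) - ln(2)*(T - Tk)/(alpha*Tpot) for T > Tk."""
    bed = n * d * (1.0 + d / alpha_beta)
    if T is not None and T > Tk:                 # repopulation only after kick-off
        bed -= math.log(2.0) * (T - Tk) / (alpha * Tpot)
    return bed

def eqd2(bed, alpha_beta):
    """Equivalent dose in 2 Gy fractions: EQD2 = BED / (1 + 2/(alpha/beta))."""
    return bed / (1.0 + 2.0 / alpha_beta)

# Example: 20 x 3 Gy, alpha/beta = 10 Gy (tumour), 26-day overall treatment time.
b = bed_lq(n=20, d=3.0, alpha_beta=10.0, T=26.0)
print(f"BED = {b:.1f} Gy, EQD2 = {eqd2(b, 10.0):.1f} Gy")   # 78.0 Gy and 65.0 Gy
```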
Clinical Relevance: Validation studies report dose discrepancies of up to ~25 % compared to conventional LQ-based models, which is particularly relevant in hypo- and hyper-fractionated protocols and in the presence of treatment interruptions, a difference largely driven by tumor repopulation dynamics in prostate cancer cases.
Interface & Deployment: LQL-Equiv is distributed as a MATLAB® standalone GUI application requiring only the MATLAB Runtime on Windows (no full MATLAB license needed). The interface exposes a small set of essential adjustable parameters (e.g., the α/β ratio, the repopulation kick-off time Tₖ, and the potential doubling time Tₚₒₜ), ensuring usability and a focus on reproducibility.
Regulatory Scope: LQL‑Equiv is intended for research use and secondary validation only, not as a clinically certified tool. Users must verify outputs and remain responsible for clinical interpretation; the developers disclaim liability for misuse.
In summary:
Validated performance: deviations typically < 25 % compared to standard computations.
Fully open source, with a GUI and adjustable biological parameters.
Already cited in Google Scholar, documented on ResearchGate, and archived on Zenodo.
Designed for medical physicists and clinical researchers in radiotherapy to support accurate and personalized treatment evaluation.
Resources:
Forecasting future solar power plant production is essential to continue the development of photovoltaic energy and increase its share in the energy mix for a more sustainable future. Accurate solar radiation forecasting greatly improves the maintenance of the balance between energy supply and demand as well as grid management performance. This study assesses the influence of input selection on short-term global horizontal irradiance (GHI) forecasting across two contrasting Algerian climates: arid Ghardaïa and coastal Algiers. Eight feature selection methods (Pearson, Spearman, Mutual Information (MI), LASSO, SHAP (GB and RF), and RFE (GB and RF)) are evaluated using a Gradient Boosting model over horizons from one to six hours ahead. Input relevance depends on both the location and the forecast horizon. At t+1, MI achieves the best results in Ghardaïa (nMAE = 6.44%), while LASSO performs best in Algiers (nMAE = 10.82%). At t+6, SHAP- and RFE-based methods yield the lowest errors in Ghardaïa (nMAE = 17.17%), and RFE-GB leads in Algiers (nMAE = 28.13%). Although performance gaps between methods remain moderate, relative improvements reach up to 30.28% in Ghardaïa and 12.86% in Algiers. These findings confirm that feature selection significantly enhances accuracy (especially at extended horizons) and suggest that simpler methods such as MI or LASSO can remain effective, depending on the climate context and forecast horizon.
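To make the comparison concrete, here is a hedged scikit-learn sketch of how three of the selectors named above (mutual information, LASSO, and RFE around a Gradient Boosting model) can be run side by side. It is not the study's pipeline: the feature matrix is a synthetic stand-in for the lagged GHI and exogenous inputs.

```python
# Sketch of the kind of feature-selection comparison described above
# (not the study's exact pipeline): rank candidate inputs with mutual
# information, LASSO, and RFE wrapped around a Gradient Boosting model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import mutual_info_regression, RFE
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 12))           # stand-in for lagged GHI + exogenous inputs
y = 0.8 * X[:, 0] + 0.4 * X[:, 3] + rng.normal(scale=0.2, size=2000)  # t+h target

mi = mutual_info_regression(X, y)                                  # MI ranking
lasso = LassoCV(cv=5).fit(X, y)                                    # nonzero coefs = selected
rfe = RFE(GradientBoostingRegressor(), n_features_to_select=4).fit(X, y)

print("MI top-4      :", np.argsort(mi)[::-1][:4])
print("LASSO selected:", np.flatnonzero(lasso.coef_))
print("RFE selected  :", np.flatnonzero(rfe.support_))
```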
Clear-sky models are widely used in solar energy for many applications such as quality control, resource assessment, satellite-based irradiance estimation, and forecasting. However, their use in forecasting and nowcasting is associated with a number of challenges. Synchronization errors, reliance on the clear-sky index (the ratio of global horizontal irradiance to its cloud-free counterpart), and the high sensitivity of clear-sky models to errors in aerosol optical depth at low solar elevation limit their added value in real-time applications. This paper explores the feasibility of short-term forecasting without relying on a clear-sky model. We propose a clear-sky-free forecasting approach using Extreme Learning Machine (ELM) models. ELM learns daily periodicity and local variability directly from raw Global Horizontal Irradiance (GHI) data. It eliminates the need for clear-sky normalization, simplifying the forecasting process and improving scalability. Our approach is a non-linear adaptive statistical method that implicitly learns the irradiance in cloud-free conditions, removing the need for a clear-sky model and the related operational issues. Deterministic and probabilistic results are compared to traditional benchmarks, including ARMA with McClear-generated clear-sky data and quantile regression for probabilistic forecasts. ELM matches or outperforms these methods, providing accurate predictions and robust uncertainty quantification. This approach offers a simple, efficient solution for real-time solar forecasting. By overcoming the limitations of the usual multiplicative clear-sky stationarization scheme, it provides a flexible and reliable framework for modern energy systems.
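As a rough illustration of the ELM principle described above (random hidden layer, closed-form output weights, raw GHI lags as inputs), here is a self-contained numpy sketch. The hidden size, ridge factor, and lag count are guesses, and this is not the paper's implementation.

```python
# Minimal Extreme Learning Machine sketch (numpy only), illustrating forecasting
# of raw GHI without clear-sky normalization. Toy model: hyperparameters are
# assumptions, not the values used in the paper.
import numpy as np

def make_lagged(ghi, n_lags=24, horizon=1):
    """Build (lagged inputs, target at t+horizon) pairs from an hourly GHI series."""
    X = np.column_stack([ghi[i:len(ghi) - n_lags - horizon + i + 1] for i in range(n_lags)])
    y = ghi[n_lags + horizon - 1:]
    return X, y

class ELM:
    def __init__(self, n_hidden=200, ridge=1e-2, seed=0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))   # random, never trained
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                             # random hidden layer
        # Output weights by ridge-regularized least squares (the only trained part)
        self.beta = np.linalg.solve(H.T @ H + self.ridge * np.eye(self.n_hidden), H.T @ y)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Usage with a synthetic hourly GHI series:
t = np.arange(24 * 365)
ghi = np.clip(800 * np.sin(np.pi * (t % 24) / 24), 0, None) \
      * np.random.default_rng(2).uniform(0.5, 1.0, t.size)
X, y = make_lagged(ghi)
model = ELM().fit(X[:-500], y[:-500])
print("relative MAE:", np.mean(np.abs(model.predict(X[-500:]) - y[-500:])) / y[-500:].mean())
```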
This work presents a robust framework for quantifying solar irradiance variability and forecastability through the Stochastic Coefficient of Variation (sCV) and the Forecastability (F). Traditional metrics, such as the standard deviation, fail to isolate stochastic fluctuations from deterministic trends in solar irradiance. By considering clear-sky irradiance as a dynamic upper bound of measurement, sCV provides a normalized, dimensionless measure of variability that theoretically ranges from 0 to 1. F extends sCV by integrating temporal dependencies via maximum autocorrelation, thus linking sCV with F. The proposed methodology is validated using synthetic cyclostationary time series and experimental data from 68 meteorological stations in Spain. Our comparative analyses demonstrate that sCV and F proficiently encapsulate multi-scale fluctuations, while addressing significant limitations inherent in traditional metrics. This comprehensive framework enables a refined quantification of solar forecast uncertainty, supporting improved decision-making in flexibility procurement and operational strategies. By assessing variability and forecastability across multiple time scales, it enhances real-time monitoring capabilities and informs adaptive energy management approaches, such as dynamic outage management and risk-adjusted capacity allocation.
Accurate solar energy output prediction is fundamental to integrating renewable energy sources into electrical grids, maintaining system stability, and enabling effective energy management. However, conventional error metrics—such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Skill Scores (SS)—fail to capture the multidimensional complexity of solar irradiance forecasting. These metrics lack sensitivity to forecastability, rely on arbitrary baselines (e.g., clear-sky models), and are poorly adapted to operational needs.
To address these limitations, this study introduces the NICE^k metrics (Normalized Informed Comparison of Errors, with k = 1, 2, 3, Σ), a novel evaluation framework offering a robust, interpretable, and multidimensional assessment of forecasting models. Each NICE^k score corresponds to a specific L^k norm: NICE^1 emphasizes average errors, NICE^2 highlights large deviations, NICE^3 focuses on outliers, and NICE^Σ combines all three dimensions.
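As a conceptual sketch only (the exact "informed comparison" normalization is defined in the paper and is not reproduced here), the following Python snippet shows how L^k-norm error scores for k = 1, 2, 3 and their sum can be computed; dividing by a reference forecast's error norm is purely an illustrative assumption.

```python
# Conceptual sketch of L^k-norm error scores in the spirit of NICE^k.
# The actual NICE^k normalization is defined in the paper; normalizing by a
# reference (e.g., persistence) error norm here is only an illustrative choice.
import numpy as np

def lk_error(err, k):
    """L^k-style error magnitude: (mean(|e|^k))^(1/k)."""
    return np.mean(np.abs(err) ** k) ** (1.0 / k)

def nice_like(y_true, y_pred, y_ref):
    """Illustrative scores for k = 1, 2, 3 plus their sum (the 'Sigma' idea)."""
    scores = {k: lk_error(y_pred - y_true, k) / lk_error(y_ref - y_true, k)
              for k in (1, 2, 3)}
    scores["sum"] = sum(scores.values())
    return scores
```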
The methodology combines synthetic Monte Carlo simulations with real-world data from the Spanish SIAR network, encompassing 68 meteorological stations in diverse climatic regions. Forecasting models evaluated include autoregressive approaches, Extreme Learning Machines, and smart persistence. Results show that theoretical and empirical NICE^k values converge only when strong statistical assumptions are met (e.g., R² ≈ 1.0 for NICE^2). Most importantly, the composite metric NICE^Σ consistently outperforms conventional metrics in discriminating between models (e.g., p-values < 0.05 for NICE^Σ vs > 0.05 for nRMSE or nMAE).
Across increasing forecast horizons, NICE^Σ yields consistently significant p-values (from 10⁻⁶ to 0.004), while nRMSE and nMAE often fail to reach statistical significance. Furthermore, traditional metrics (nRMSE, nMAE, nMBE, R²) cannot reliably distinguish between models in head-to-head comparisons. In contrast, the NICE^k family demonstrates superior statistical discrimination (p < 0.001), broader variance distributions, and better inter-study comparability.
This study confirms the theoretical and empirical validity of the NICE^k framework and highlights its operational relevance. It establishes NICE^k as a robust, unified, and interpretable alternative to conventional metrics for evaluating deterministic solar forecasting models.