Reinforcement Learning for Continuous PV Power Forecasting with Reward-Based Model Updates
Co-Supervised by: Fereshteh Jafari
If you are interested in this topic or have further questions, do not hesitate to contact fereshteh.jafari@unibe.ch.
Background / Context
Traditional supervised learning approaches for PV power prediction optimize for statistical accuracy metrics (MAE, RMSE) but do not directly consider the operational context in which predictions are used. In real-world energy systems, the cost of prediction errors varies with the situation: underestimating solar generation during peak demand incurs different penalties than overestimating it during low-demand periods. Reinforcement Learning (RL) offers a framework to directly optimize for operational objectives while continuously adapting to changing conditions. By designing reward functions that incorporate both prediction accuracy and operational costs, RL agents can learn to make predictions that are not just statistically optimal but also operationally valuable. This approach is particularly promising for online learning scenarios where the model must adapt to non-stationary environments and changing system dynamics, and where reward functions have no simple relationship to standard accuracy metrics.
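To make this asymmetry concrete, the sketch below shows one possible operationally weighted reward. It is a minimal illustration only: the penalty rates, the `demand_level` labels, and the function name are assumptions, not values from any real tariff scheme.

```python
# Minimal sketch of an operationally weighted reward (all rates are assumed).
UNDER_PENALTY_PEAK = 2.0  # cost per kWh of underestimation during peak demand
OVER_PENALTY_LOW = 1.5    # cost per kWh of overestimation during low demand
BASE_PENALTY = 1.0        # cost per kWh of any other error

def operational_reward(predicted_kw, actual_kw, demand_level):
    """Return the negative operational cost of a forecast error.

    demand_level is a stand-in for real system state: 'peak', 'low', or 'normal'.
    """
    error = predicted_kw - actual_kw
    if demand_level == "peak" and error < 0:    # underestimated at peak demand
        rate = UNDER_PENALTY_PEAK
    elif demand_level == "low" and error > 0:   # overestimated at low demand
        rate = OVER_PENALTY_LOW
    else:
        rate = BASE_PENALTY
    return -rate * abs(error)
```

An agent maximizing such a reward is pushed toward the errors the system can tolerate, rather than toward symmetric accuracy alone.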
Research Question(s) / Goals
The research aims to investigate whether reinforcement learning can provide superior PV power predictions compared to traditional supervised learning approaches by:
- Developing RL frameworks that optimize for operational objectives rather than just statistical accuracy
- Analyzing adjustments of energy prices to separate their systematic and random components (see the decomposition sketch after this list)
- Designing reward functions that capture prediction errors and their operational costs
- Enabling continuous model adaptation through online policy updates
- Handling non-stationary environments with changing weather patterns and system characteristics
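As a starting point for the price-decomposition goal above, one deliberately simple baseline is to take the mean daily price profile as the systematic part and the residual as the random part. The sketch below assumes hourly prices with a dominant daily cycle; the function and variable names are illustrative, not a fixed design.

```python
import numpy as np

def decompose_prices(prices, period=24):
    """Split an hourly price series into a systematic component (mean daily
    profile) and a random component (residual). Assumes at least one full
    period of data; seasonal-decomposition or state-space models are
    natural refinements of this baseline.
    """
    prices = np.asarray(prices, dtype=float)
    n_full = len(prices) - len(prices) % period
    profile = prices[:n_full].reshape(-1, period).mean(axis=0)  # systematic part
    reps = len(prices) // period + 1
    systematic = np.tile(profile, reps)[: len(prices)]
    random_part = prices - systematic                           # residual part
    return systematic, random_part
```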
Approach / Methods
The student will:
- Formulate online PV prediction as a reinforcement learning problem with appropriate state, action, and reward definitions (a minimal environment sketch follows this list)
- Implement and compare different RL algorithms (e.g., value-based and policy-gradient methods)
- Design reward functions incorporating prediction accuracy, operational costs, and system constraints
- Develop online learning strategies (temporal-difference methods, dynamic programming, Markov chain models, etc.; see the actor-critic sketch after this list)
- Conduct extensive experimental evaluation comparing RL approaches with supervised learning baselines
- Analyze convergence properties and stability under different environmental conditions
- Investigate reward shaping techniques for improved learning efficiency
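To make the MDP formulation in the first bullet concrete, here is one possible environment sketch. The state contents, units, and interface are assumptions for illustration; `reward_fn` could be, for instance, a demand-aware reward like the one sketched in the Background section, with the demand level bound to each time step.

```python
import numpy as np

class PVForecastEnv:
    """One possible MDP formulation of online PV forecasting (illustrative).

    State:  last observed PV power plus current weather features.
    Action: the power forecast for the next time step (continuous, kW).
    Reward: reward_fn(forecast_kw, actual_kw), e.g. a negative operational cost.
    """

    def __init__(self, power, weather, reward_fn):
        self.power = np.asarray(power, dtype=float)      # observed PV output series
        self.weather = np.asarray(weather, dtype=float)  # one feature row per step
        self.reward_fn = reward_fn
        self.t = 0

    def reset(self):
        self.t = 0
        return self._state()

    def _state(self):
        # Last observed power concatenated with current weather features.
        return np.concatenate(([self.power[self.t]], self.weather[self.t]))

    def step(self, forecast_kw):
        actual = self.power[self.t + 1]                  # realized next-step output
        reward = self.reward_fn(forecast_kw, actual)
        self.t += 1
        done = self.t + 1 >= len(self.power)             # no next value to forecast
        return self._state(), reward, done
```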
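For the online-update bullet, one concrete temporal-difference instance is a linear actor-critic: the critic learns a value estimate with a TD(0) update, and the Gaussian-policy actor is adjusted in proportion to the TD error. This is a sketch under assumed hyperparameters, not a recommended configuration; deep function approximators would replace the linear models in practice.

```python
import numpy as np

class LinearActorCritic:
    """Minimal online actor-critic with linear function approximation."""

    def __init__(self, state_dim, sigma=0.5, lr_actor=1e-4, lr_critic=1e-3, gamma=0.95):
        self.w = np.zeros(state_dim)  # actor: forecast mean mu(s) = w . s
        self.v = np.zeros(state_dim)  # critic: value estimate V(s) = v . s
        self.sigma, self.gamma = sigma, gamma
        self.lr_a, self.lr_c = lr_actor, lr_critic

    def act(self, state):
        # Gaussian exploration around the current mean forecast.
        return self.w @ state + self.sigma * np.random.randn()

    def update(self, state, action, reward, next_state, done):
        # TD(0) error: one-step bootstrapped advantage estimate.
        target = reward + (0.0 if done else self.gamma * (self.v @ next_state))
        delta = target - self.v @ state
        self.v += self.lr_c * delta * state                           # critic step
        grad_logpi = (action - self.w @ state) / self.sigma**2 * state
        self.w += self.lr_a * delta * grad_logpi                      # actor step

def run_episode(env, agent):
    """Online training loop against the environment sketched above."""
    state, done = env.reset(), False
    while not done:
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state, done)  # update every step
        state = next_state
```

Because each update uses only the most recent transition, the agent keeps adapting as weather patterns and system characteristics drift, which is exactly the non-stationarity targeted above.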
Expected Contributions / Outcomes
- Novel RL formulation for online PV power prediction with theoretical justification
- Comprehensive experimental evaluation demonstrating improved operational performance
- Analysis of reward function design and its impact on learning behavior
- Investigation of RL agent behavior under different weather patterns and seasonal variations
- Framework for online model updates using prediction error feedback
- Potential publication in renewable energy or machine learning conferences/journals
Required Skills / Prerequisites
- Background in machine learning and reinforcement learning theory
- Familiarity with Python and deep learning frameworks (PyTorch, TensorFlow)
- Understanding of Markov Decision Processes and RL algorithms
- Background in optimization theory and control systems
Possible Extensions
- Integration with energy market dynamics and pricing signals
- Transfer learning between different PV installations
Further Reading / Starting Literature
- Sutton, R. S., & Barto, A. G. (2018). “Reinforcement learning: An introduction.” MIT Press.
- Li, Y. (2017). “Deep reinforcement learning: An overview.” arXiv preprint arXiv:1701.07274.
- Raza, M. Q., & Khosravi, A. (2015). “A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings.” Renewable and Sustainable Energy Reviews, 50, 1352-1372.