Reinforcement Learning for Continuous PV Power Forecasting with Reward-Based Model Updates
Co-Supervised by: Fereshteh Jafari
If you are interested in this topic or have further questions, do not hesitate to contact fereshteh.jafari@unibe.ch.
Background / Context
Traditional supervised learning approaches for PV power prediction optimize for statistical accuracy metrics (MAE, RMSE) but do not directly consider the operational context in which predictions are used. In real-world energy systems, the cost of prediction errors varies with the situation: underestimating solar generation during peak demand incurs different penalties than overestimating it during low-demand periods. Reinforcement Learning (RL) offers a framework to directly optimize for operational objectives while continuously adapting to changing conditions. By designing reward functions that incorporate both prediction accuracy and operational costs, RL agents can learn to make predictions that are not just statistically optimal but also operationally valuable. This approach is particularly promising for online learning scenarios where the model must adapt to non-stationary environments and changing system dynamics, and where reward functions have no simple relationship to standard accuracy metrics.
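To make this asymmetry concrete, the sketch below shows one possible operationally weighted reward. It is a minimal illustration only: the penalty rates, the `demand_level` labels, and the function name are assumptions, not values from any real tariff scheme.

```python
# Minimal sketch of an operationally weighted reward (all rates are assumed).
UNDER_PENALTY_PEAK = 2.0  # cost per kWh of underestimation during peak demand
OVER_PENALTY_LOW = 1.5    # cost per kWh of overestimation during low demand
BASE_PENALTY = 1.0        # cost per kWh of any other error

def operational_reward(predicted_kw, actual_kw, demand_level):
    """Return the negative operational cost of a forecast error.

    demand_level is a stand-in for real system state: 'peak', 'low', or 'normal'.
    """
    error = predicted_kw - actual_kw
    if demand_level == "peak" and error < 0:    # underestimated at peak demand
        rate = UNDER_PENALTY_PEAK
    elif demand_level == "low" and error > 0:   # overestimated at low demand
        rate = OVER_PENALTY_LOW
    else:
        rate = BASE_PENALTY
    return -rate * abs(error)
```

An agent maximizing such a reward is pushed toward the errors the system can tolerate, rather than toward symmetric accuracy alone.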
Research Question(s) / Goals
The research aims to investigate whether reinforcement learning can provide superior PV power predictions compared to traditional supervised learning approaches by:
- Developing RL frameworks that optimize for operational objectives rather than just statistical accuracy
- Analyzing adjustments of energy prices to separate their systematic and random components (see the decomposition sketch after this list)
- Designing reward functions that capture prediction errors and their operational costs
- Enabling continuous model adaptation through online policy updates
- Handling non-stationary environments with changing weather patterns and system characteristics
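As a starting point for the price-decomposition goal above, one deliberately simple baseline is to take the mean daily price profile as the systematic part and the residual as the random part. The sketch below assumes hourly prices with a dominant daily cycle; the function and variable names are illustrative, not a fixed design.

```python
import numpy as np

def decompose_prices(prices, period=24):
    """Split an hourly price series into a systematic component (mean daily
    profile) and a random component (residual). Assumes at least one full
    period of data; seasonal-decomposition or state-space models are
    natural refinements of this baseline.
    """
    prices = np.asarray(prices, dtype=float)
    n_full = len(prices) - len(prices) % period
    profile = prices[:n_full].reshape(-1, period).mean(axis=0)  # systematic part
    reps = len(prices) // period + 1
    systematic = np.tile(profile, reps)[: len(prices)]
    random_part = prices - systematic                           # residual part
    return systematic, random_part
```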
Approach / Methods
The student will:
- Formulate online PV prediction as a reinforcement learning problem with appropriate state, action, and reward definitions (a minimal environment sketch follows this list)
- Implement and compare different RL algorithms (e.g., value-based and policy-gradient methods)
- Design reward functions incorporating prediction accuracy, operational costs, and system constraints
- Develop online learning strategies (temporal-difference methods, dynamic programming, Markov chain models, etc.; see the actor-critic sketch after this list)
- Conduct extensive experimental evaluation comparing RL approaches with supervised learning baselines
- Analyze convergence properties and stability under different environmental conditions
- Investigate reward shaping techniques for improved learning efficiency
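To make the MDP formulation in the first bullet concrete, here is one possible environment sketch. The state contents, units, and interface are assumptions for illustration; `reward_fn` could be, for instance, a demand-aware reward like the one sketched in the Background section, with the demand level bound to each time step.

```python
import numpy as np

class PVForecastEnv:
    """One possible MDP formulation of online PV forecasting (illustrative).

    State:  last observed PV power plus current weather features.
    Action: the power forecast for the next time step (continuous, kW).
    Reward: reward_fn(forecast_kw, actual_kw), e.g. a negative operational cost.
    """

    def __init__(self, power, weather, reward_fn):
        self.power = np.asarray(power, dtype=float)      # observed PV output series
        self.weather = np.asarray(weather, dtype=float)  # one feature row per step
        self.reward_fn = reward_fn
        self.t = 0

    def reset(self):
        self.t = 0
        return self._state()

    def _state(self):
        # Last observed power concatenated with current weather features.
        return np.concatenate(([self.power[self.t]], self.weather[self.t]))

    def step(self, forecast_kw):
        actual = self.power[self.t + 1]                  # realized next-step output
        reward = self.reward_fn(forecast_kw, actual)
        self.t += 1
        done = self.t + 1 >= len(self.power)             # no next value to forecast
        return self._state(), reward, done
```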
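For the online-update bullet, one concrete temporal-difference instance is a linear actor-critic: the critic learns a value estimate with a TD(0) update, and the Gaussian-policy actor is adjusted in proportion to the TD error. This is a sketch under assumed hyperparameters, not a recommended configuration; deep function approximators would replace the linear models in practice.

```python
import numpy as np

class LinearActorCritic:
    """Minimal online actor-critic with linear function approximation."""

    def __init__(self, state_dim, sigma=0.5, lr_actor=1e-4, lr_critic=1e-3, gamma=0.95):
        self.w = np.zeros(state_dim)  # actor: forecast mean mu(s) = w . s
        self.v = np.zeros(state_dim)  # critic: value estimate V(s) = v . s
        self.sigma, self.gamma = sigma, gamma
        self.lr_a, self.lr_c = lr_actor, lr_critic

    def act(self, state):
        # Gaussian exploration around the current mean forecast.
        return self.w @ state + self.sigma * np.random.randn()

    def update(self, state, action, reward, next_state, done):
        # TD(0) error: one-step bootstrapped advantage estimate.
        target = reward + (0.0 if done else self.gamma * (self.v @ next_state))
        delta = target - self.v @ state
        self.v += self.lr_c * delta * state                           # critic step
        grad_logpi = (action - self.w @ state) / self.sigma**2 * state
        self.w += self.lr_a * delta * grad_logpi                      # actor step

def run_episode(env, agent):
    """Online training loop against the environment sketched above."""
    state, done = env.reset(), False
    while not done:
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state, done)  # update every step
        state = next_state
```

Because each update uses only the most recent transition, the agent keeps adapting as weather patterns and system characteristics drift, which is exactly the non-stationarity targeted above.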
Expected Contributions / Outcomes
- Novel RL formulation for online PV power prediction with theoretical justification
- Comprehensive experimental evaluation demonstrating improved operational performance
- Analysis of reward function design and its impact on learning behavior
- Investigation of RL agent behavior under different weather patterns and seasonal variations
- Framework for online model updates using prediction error feedback
- Potential publication in renewable energy or machine learning conferences/journals
Required Skills / Prerequisites
- Background in machine learning and reinforcement learning theory
- Familiarity with Python and deep learning frameworks (PyTorch, TensorFlow)
- Understanding of Markov Decision Processes and RL algorithms
- Background in optimization theory and control systems
Possible Extensions
- Integration with energy market dynamics and pricing signals
- Transfer learning between different PV installations
Further Reading / Starting Literature
- Sutton, R. S., & Barto, A. G. (2018). “Reinforcement learning: An introduction.” MIT Press.
- Li, Y. (2017). “Deep reinforcement learning: An overview.” arXiv preprint arXiv:1701.07274.
- Raza, M. Q., & Khosravi, A. (2015). “A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings.” Renewable and Sustainable Energy Reviews, 50, 1352-1372.