Time Series Analysis for Energy Data - Spring 2023
ENVIRON 790.30 Time Series Analysis for Energy Data
Spring 2023
Course Overview
Class Hours
T-Th 12:00 to 1:15 LSRC A247
To join class on Zoom click here.
Passcode: F2022
Instructor
Luana Medeiros Marangon Lima
Office: Gross Hall - 102K
E-mail: luana.marangon.lima@duke.edu
Office hours: Thursdays 10:00-11:00am (Gross Hall 102K or Luana’s Zoom), or by appointment.
Teaching Assistant
Yu Hai
E-mail: yu.hai@duke.edu
Office hours: TBD
Communication
We will use Slack for communication. I will add all students to the Slack workspace I’ve created for the class. Using Slack will ensure I never miss a message from you and will also keep us one text message away! You may use Slack on your computer and/or phone.
Click here to join our Slack workspace.
Course Description
Time series and forecasting methods continue to improve thanks to gains in computing power and the ability to handle larger data sets. This course will focus on time series analysis, modeling and forecasting, with emphasis on energy and environment applications.
Throughout the course we will use real data sets from the US Energy Information Administration (EIA), the National Oceanic and Atmospheric Administration (NOAA) and the National Renewable Energy Laboratory (NREL). This course will use R for most statistical analysis. Lectures will feature R syntax and/or demonstrations using the RStudio user interface. Note that R and RStudio work on Windows, Linux, and Mac operating systems.
Energy analytics usually involves getting data, parsing the data and transforming the data to a state where you can actually apply time series analysis. This work is better done in Python, therefore the course will also cover a short introduction to Python.
Upon completion of the course, students will be able to use R to carry out basic statistical modeling and analysis and fit a model to data. The goal of the course is to enable students to learn from data in order to gain useful predictions and insights.
Course Format and Grading
The course consists of lectures in which we will discuss theory and applications. We will learn the time series concepts through data analysis projects. During the classes we will also dedicate some time to learning the statistical packages in R related to the topic, as well as to small group problem solving. Aside from the in-class problems, there will be a set of assignments, a forecasting competition and a final project. Grades will be based on:
Component | Percentage |
---|---|
Assignments - A1 to A7 | 70% |
A8 - Forecasting Competition | 10% |
Final Project | 20% |
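As a rough illustration of how the weights in the table combine, here is a minimal Python sketch. It assumes each assignment counts equally within the 70% share, which the syllabus does not state. The function name `course_grade` and the sample scores are hypothetical.

```python
# Course-level weights taken from the grading table (fractions of the final grade).
WEIGHTS = {"assignments": 0.70, "competition": 0.10, "project": 0.20}


def course_grade(assignment_scores, competition_score, project_score):
    """Return the weighted course grade on a 0-100 scale.

    Assumption (not stated in the syllabus): all assignments count
    equally within the assignments share.
    """
    assignments_avg = sum(assignment_scores) / len(assignment_scores)
    return (
        WEIGHTS["assignments"] * assignments_avg
        + WEIGHTS["competition"] * competition_score
        + WEIGHTS["project"] * project_score
    )


# Hypothetical scores, for illustration only.
example = course_grade(
    [90, 85, 100, 95, 80, 90, 90], competition_score=88, project_score=92
)
print(round(example, 2))
```

With these made-up scores the assignment average is 90, so the weighted total is 0.70 × 90 + 0.10 × 88 + 0.20 × 92.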
The assignments involve applying concepts and tools learned in class to a specific data set or problem. Students may work together and help each other. However, the assignments are to be submitted individually. The assignments schedule at the end of this syllabus shows tentative due dates.
Policy on late submissions: Assignments are due at 11:59pm. Assignments submitted at least 2 hours after the deadline will receive a deduction of 1 point out of 100 per hour. After that, there will be a deduction of 5 points out of 100 per day.
The final project could take several forms. If you have an interesting dataset, you may choose to work with it using existing methods and software tools to run your time series analysis. Another idea is to take some previously published data and analysis and use it as a starting point. You could simply take the data and do your own analysis. Or you may reproduce part of the published analysis, but in this case you will need to go further and try different models and analyses with the data. Make sure you clearly state the difference between what you have done and what was done previously. Students are encouraged to work in teams of two or three for a project.
There will be two short presentations of your final project. In the first, you will present the data set you will use, what you plan to do with it, and the project motivation. In the second, you will show the class the main results of your analysis. Aside from the presentations, you are required to submit a final report written as a research paper. Describe the data sets, tools used and results. If the data set has been used before, show what else you have done with it and compare your results with previously published results.
The final project grading will be weighted as follows:
Component | Percentage |
---|---|
Proposal Presentation | 20% |
Final Presentation | 40% |
Report | 40% |
Class Etiquette
You should take responsibility for your education. I expect students to attend every class and get to class on time. If you must enter the class late, please do so quietly. Refrain from using phones and tablets for social media during class. Some classes will involve coding on your laptop. I expect you to focus on the assignment and refrain from any web browsing that may disrupt the progress of your work. Your classmates deserve your respect and support. We will likely have students from many different backgrounds and countries in this class and you should all feel comfortable and make each other comfortable while participating.
Nicholas School Honor Code
All activities of Nicholas School students, including those in this course, are governed by the Duke Community Standard, which states: “Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and nonacademic endeavors, and to protect and promote a culture of integrity. To uphold the Duke Community Standard:
- I will not lie, cheat, or steal in my academic endeavors;
- I will conduct myself honorably in all my endeavors; and
- I will act if the Standard is compromised.”
Please add the following affirmation to the end of all assignments, and sign your name beside it: “I have adhered to the Duke Community Standard in completing this assignment.”
Land Acknowledgment
“What is now Durham was originally the territory of several Native nations, including Tutelo (TOO-tee-lo) and Saponi (suh-POE-nee) - speaking peoples. Many of their communities were displaced or killed through war, disease, and colonial expansion. Today, the Triangle is surrounded by contemporary Native nations, the descendants of Tutelo, Saponi, and other Indigenous peoples who survived early colonization. These nations include the Haliwa-Saponi (HALL-i-wa suh-POE-nee), Sappony (suh-POE-nee), and Occaneechi (oh-kuh-NEE-chee) Band of Saponi. North Carolina’s Research Triangle is also home to a thriving urban Native American community who represent Native nations from across the United States. Together, these Indigenous nations and communities contribute to North Carolina’s ranking as the state with the largest Native American population east of Oklahoma.”
Course Modules
The class topics are divided into eleven modules. There will be readings and/or recordings associated with each module.
M1 - Getting started, Intro to TSA, R and RStudio
M2 - Autocovariance and autocorrelation
M3 - Trend and seasonality
M4 - Missing data and outliers
M5 - ARIMA Models
M6 - Seasonal ARIMA Models
M7 - Intro to forecasting
M8 - Model Performance
M9 - State space models
M10 - Advanced forecasting models
M11 - Model based scenario generation
Class Proposed Schedule
The proposed schedule below is subject to change. My initial plan is to cover all the material listed here but I might modify it if extra time is needed for some particular topics. I will update this table as needed during the semester.
Lecture | Module | Date | Topic | Homework |
---|---|---|---|---|
L1 | M1 | Jan 12 | Introductions and Course Overview | Join Slack Workspace |
L2 | M1 | Jan 17 | Intro to Time Series Analysis, Intro to R and RStudio, Github, R Markdown | A01 |
L3 | M2 | Jan 19 | Autocovariance and autocorrelation function | |
L4 | M2 | Jan 24 | Partial autocorrelation function; ACF and PACF in R | |
L5 | M2 / M3 | Jan 26 | ACF, PACF and plots in R; Trend Component Estimation | A02 |
L6 | M3 | Jan 31 | Seasonal Component - Stochastic vs Deterministic Trend | |
L7 | M3 | Feb 2 | Trend and Seasonal component estimation in R | A03 |
L8 | M3 | Feb 7 | Stationarity Tests: Mann Kendall, Spearman, Augmented Dickey Fuller | A04 |
L9 | M4 | Feb 9 | Outlier types, detection, how to handle missing data | A05 |
L10 | M4 | Feb 14 | A3 Solution; Finish outliers in R | |
L11 | M5 | Feb 16 | Intro to the Traditional Box & Jenkins Models - ARIMA family; Stationary Models: AR and MA process | |
L12 | M5 | Feb 21 | A4 Solution; AR and MA order (poll); ARIMA(p,d,q) Models | |
L13 | M5 | Feb 23 | ARIMA(p,d,q); Fitting ARIMA Models in R | A06 |
L14 | M6 | Feb 28 | A5 Solution; Seasonal ARIMA and Periodic ARMA Models | A07 |
L15 | M6 / M7 | Mar 2 | Finish SARIMA in R; Intro to Forecasting; Averaging Techniques | |
L16 | M7 | Mar 7 | Forecasting with ARIMA Models; Forecasting in R | |
L17 | M8 | Mar 9 (remote asynchronous) | Watch recordings for M8; Work on Project, Team building | Project Proposal (2-3 slides) |
- | - | Mar 14 | Spring break - no class | |
- | - | Mar 16 | Spring break - no class | |
L18 | M8 | Mar 21 | Review Model Diagnostics; Review Residual Analysis and Model Selection; Model Performance in R | A08 |
L19 | M9 | Mar 23 | Model Performance in R; State-Space Models; Bayesian Statistics | A08 |
L20 | M9 | Mar 28 | State Space Models in R; Go over Forecasting Competition; Forecasting higher frequency time series | A09 - part I |
L21 | M10 | Mar 30 | Advanced Forecasting Models in R | A09 - part II |
L22 | M11 | Apr 4 | TBATS models in R; Scenario Generation | Work on project/competition |
L23 | M11 | Apr 6 | Scenario Generation in R; Course Recap; Course Evaluation | Work on project/competition |
L24 | - | Apr 11 | Final Project Presentations; Presentation Schedule | Work on project |
- | - | Apr 13 | MEM Symposium - no class | Work on project |
L25 | - | Apr 18 | Final Project Presentations; Presentation Schedule | Submit project and competition knitted files |
Assignments Schedule
You should use Sakai to submit your work. Assignments should be submitted using the Assignments tab. This is a tentative schedule.
Assignment | Due Date |
---|---|
A01 - Course Set Up (Sakai link, GitHub link) | Jan 24 |
A02 - ACF & PACF (Sakai link, GitHub link) | Feb 3 |
A03 - Trend and Seasonality (Sakai link, GitHub link) | Feb 10 |
A04 - Stationarity Tests (Sakai link, GitHub link) | Feb 20 |
A05 - Decomposition (Sakai link, GitHub link) | Feb 27 |
A06 - AR and MA models (Sakai link, GitHub link) | Mar 6 |
A07 - ARIMA models (Sakai link, GitHub link) | Mar 20 |
A08 - Model performance (Sakai link, GitHub link) | Mar 27 |
A09 - Forecasting competition (Sakai link, GitHub link, Kaggle link) | Enter competition and create GitHub repo by Apr 3 (Part 1); first model by Apr 7 (Part 2); final submission + knitted file by Apr 28 (Part 3) |
Final Project Presentation & Short report | Apr 28 |