Understanding the local electricity market

📖 Background You work for an energy company in Australia. Your company builds solar panel arrays and then sells the energy they produce to industrial customers. The company wants to expand to the city of Melbourne in the state of Victoria.

Prices and demand for electricity change every day. Customers pay for the energy received using a formula based on the local energy market's daily price.

Your company's pricing committee wants your team to estimate energy prices for the next 12-18 months to use those prices as the basis for contract negotiations.

In addition, the VP of strategy is researching investing in storage capacity (i.e., batteries) as a new source of revenue. The plan is to store some of the energy produced by the solar panels when pricing conditions are unfavorable and sell it by the next day on the open market if the prices are higher.

💾 The data

You have access to over five years of energy price and demand data (source):

  • "date" - from January 1, 2015, to October 6, 2020.
  • "demand" - daily electricity demand in MWh.
  • "price" - recommended retail price in AUD/MWh.
  • "demand_pos_price" - total daily demand at a positive price in MWh.
  • "price_positive" - average positive price, weighted by the corresponding intraday demand in AUD/MWh.
  • "demand_neg_price" - total daily demand at a negative price in MWh.
  • "price_negative" - average negative price, weighted by the corresponding intraday demand in AUD/MWh.
  • "frac_neg_price" - the fraction of the day when the demand traded at a negative price.
  • "min_temperature" - minimum temperature during the day in Celsius.
  • "max_temperature" - maximum temperature during the day in Celsius.
  • "solar_exposure" - total daily sunlight energy in MJ/m^2.
  • "rainfall" - daily rainfall in mm.
  • "school_day" - "Y" if that day was a school day, "N" otherwise.
  • "holiday" - "Y" if the day was a state or national holiday, "N" otherwise.

Note: The price was negative during some intraday intervals, so energy producers were paying buyers rather than vice-versa.

import numpy as np 
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt
import xgboost as xgb
from xgboost import plot_importance, plot_tree
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import RandomizedSearchCV
# --- Set defaults of the notebook ---
sns.set(font="'Source Code Pro', monospace")
plt.rcParams["font.family"] = "'Source Code Pro', monospace"
import warnings
warnings.filterwarnings("ignore")
# Color Palettes
treasure_colors = ["#703728", "#c86b25", "#dc9555", "#fed56f", "#c89a37"]
pirate_colors = ["#010307", "#395461", "#449FAF", "#B1F4FC", 
                 "#F4D499", "#835211"]
df_electricity = pd.read_csv('./data/energy_demand.csv', parse_dates=['date'],index_col =["date"])
df_electricity.head()
demand price demand_pos_price price_positive demand_neg_price price_negative frac_neg_price min_temperature max_temperature solar_exposure rainfall school_day holiday
date
2015-01-01 99635.030 25.633696 97319.240 26.415953 2315.790 -7.240000 0.020833 13.3 26.9 23.6 0.0 N Y
2015-01-02 129606.010 33.138988 121082.015 38.837661 8523.995 -47.809777 0.062500 15.4 38.8 26.8 0.0 N N
2015-01-03 142300.540 34.564855 142300.540 34.564855 0.000 0.000000 0.000000 20.0 38.2 26.5 0.0 N N
2015-01-04 104330.715 25.005560 104330.715 25.005560 0.000 0.000000 0.000000 16.3 21.4 25.2 4.2 N N
2015-01-05 118132.200 26.724176 118132.200 26.724176 0.000 0.000000 0.000000 15.0 22.0 30.7 0.0 N N

💪 Competition challenge

Create a report that covers the following:

  1. How do energy prices change throughout the year? Are there any patterns by season or month of the year?
  2. Build a forecast of daily energy prices the company can use as the basis of its financial planning.
  3. Provide guidance on how much revenue the energy storage venture could generate per year using retail prices and a 70MWh storage system.

⌛️ Exploratory Data Analysis!

color_pal = ["#F8766D", "#D39200", "#93AA00", "#00BA38", "#00C19F", "#00B9E3", "#619CFF", "#DB72FB"]
_ = df_electricity["price"].plot(style='.', figsize=(15,5), color=color_pal[0], title='Daily electricity price in Victoria from January 2015 to October 2020')
plt.ylim(0,300)
plt.ylabel("Price")
Text(0, 0.5, 'Price')

Insights 📍 : The distribution of daily electricity price

Based on the plot above show:

  1. The distribution of daily electricity price in Victoria from January 2015 to October 2020 .
  2. There is a significant electricity price change especially in 2016 to 2020.
(df_electricity.describe()
 .style
 .highlight_max(axis=0,color="#c07fef")
 .highlight_min(axis=0,color="#00FF00")
 .set_caption("Statistics of Electricity in Australia for 2015-2020"))
Statistics of Electricity in Australia for 2015-2020
demand price demand_pos_price price_positive demand_neg_price price_negative frac_neg_price min_temperature max_temperature solar_exposure rainfall
count 2106.000000 2106.000000 2106.000000 2106.000000 2106.000000 2106.000000 2106.000000 2106.000000 2106.000000 2105.000000 2103.000000
mean 120035.476503 76.079554 119252.305055 76.553847 783.171448 -2.686052 0.008547 11.582289 20.413200 14.743373 1.505944
std 13747.993761 130.246805 14818.631319 130.114184 3578.920686 19.485432 0.039963 4.313711 6.288693 7.945527 4.307897
min 85094.375000 -6.076028 41988.240000 13.568986 0.000000 -342.220000 0.000000 0.600000 9.000000 0.700000 0.000000
25% 109963.650000 38.707040 109246.250000 39.117361 0.000000 0.000000 0.000000 8.500000 15.525000 8.200000 0.000000
50% 119585.912500 66.596738 119148.082500 66.869058 0.000000 0.000000 0.000000 11.300000 19.100000 12.700000 0.000000
75% 130436.006250 95.075012 130119.477500 95.130181 0.000000 0.000000 0.000000 14.600000 23.900000 20.700000 0.800000
max 170653.840000 4549.645105 170653.840000 4549.645105 57597.595000 0.000000 0.625000 28.000000 43.500000 33.300000 54.600000
fig, ax = plt.subplots(4, 1, figsize = (15, 20))
ax[0].tick_params(axis='both', which='major', labelsize=13)
ax[1].tick_params(axis='both', which='major', labelsize=13)
ax[2].tick_params(axis='both', which='major', labelsize=13)
ax[3].tick_params(axis='both', which='major', labelsize=13)


df_electricity.reset_index().plot(x='date', y='demand_pos_price', lw = 1, ax = ax[0])
ax[0].set_title("Daily electricity demand in Victoria from January 2015 to October 2020")
ax[0].set_ylabel("Demand Positive Price [MWh]")
ax[0].get_legend().remove()


df_electricity.reset_index().plot(x='date', y='demand_neg_price', lw = 1, color='red', ax = ax[1])
ax[1].set_ylabel("Demand Negative Price [MWh]")
ax[1].get_legend().remove()


df_electricity.reset_index().plot(x='date', y='price_positive', lw = 1, color='red', ax = ax[2])
ax[2].set_ylabel("Price Positive [AUD$/MWh]")
ax[2].get_legend().remove()


df_electricity.reset_index().plot(x='date', y='price_negative', lw = 1, color='red', ax = ax[3])
ax[3].set_ylabel("Price Negative [AUD$/MWh]")
ax[3].get_legend().remove()

Insights 📍 : Daily electricity demand in Victoria from January 2015 to October 2020

Based on the plot above show:

  1. Demand positive price shows some seasonality based on plot shown above</span>. Maybe this feature may benefit our model.
  2. Demand negative price We can see there is a significant difference electricity price through maximum price which is very far from the mean and standard deviation.
  3. Positive price mostly with small values except for some values that is greater than 1000.
  4. Price negative ocurred in each year with variation of values.

Make a correlation plot

def heatmap(x, y, size):
    fig, ax = plt.subplots()
    
    # Mapping from column names to integer coordinates
    x_labels = [v for v in sorted(x.unique())]
    y_labels = [v for v in sorted(y.unique())]
    x_to_num = {p[1]:p[0] for p in enumerate(x_labels)} 
    y_to_num = {p[1]:p[0] for p in enumerate(y_labels)} 
    
    size_scale = 500
    ax.scatter(
        x=x.map(x_to_num), # Use mapping for x
        y=y.map(y_to_num), # Use mapping for y
        s=size * size_scale, # Vector of square sizes, proportional to size parameter
        marker='s' # Use square as scatterplot marker
    )
    
    # Show column labels on the axes
    ax.set_xticks([x_to_num[v] for v in x_labels])
    ax.set_xticklabels(x_labels, rotation=45, horizontalalignment='right')
    ax.set_yticks([y_to_num[v] for v in y_labels])
    ax.set_yticklabels(y_labels)
    
data = df_electricity
columns = df_electricity.columns
corr = data[columns].corr()
corr = pd.melt(corr.reset_index(), id_vars='index') # Unpivot the dataframe, so we can get pair of arrays for x and y
corr.columns = ['x', 'y', 'value']
heatmap(
    x=corr['x'],
    y=corr['y'],
    size=corr['value'].abs()
)

Insights 📍 : Correlation Matrix

Based on the plot above show:

  1. Price target has higher correlation with price_positive, demand, demand_pos_price and frac_negative_price.
plt.figure(figsize=(18, 12))
for i, (combi, df) in enumerate(df_electricity.groupby([df_electricity.index.year])):
    ax = plt.subplot(2, 3, i+1, ymargin=0.5)
    #print(df.num_sold.values.shape, df.num_sold.values)
    ax.plot(df.price)
    ax.set_title(combi)
    plt.xticks(rotation=45)
plt.tight_layout(h_pad=3.0)
plt.suptitle('Energy prices change for 2015-2020', y=1.03)
plt.show()

Insights 📍 : Electricity price change

Based on the plot above show:

  1. There is a significant increase of electricity price in December 2015 after a stable price during January-September.> 2. The electricity price tends to be stable in range 50-150 AUD except in February where there is an increase of energy price.> 2. For 2016, significant increase of price occured in January and March followed by stable price during April-December> 3. For 2018-2019, There is a significant increase of electricity price around January-February.
plt.figure(figsize=(18, 12))
for i, (combi, df) in enumerate(df_electricity.groupby([df_electricity.index.year,"holiday"])):
    ax = plt.subplot(3,4, i+1, ymargin=0.5)
    #print(df.num_sold.values.shape, df.num_sold.values)
    ax.plot(df.price)
    ax.set_title(combi)
    plt.xticks(rotation=45)
plt.suptitle('Energy prices change for 2015-2020 on holiday Season', y=1.03)
Text(0.5, 1.03, 'Energy prices change for 2015-2020 on holiday Season')

Insights 📍 : Electricity prices change for 2015-2020 on holiday Season

Based on the plot above show:

  1. There is an unstable price pattern of energy price in holiday season.> 2. For annual Non-Holiday Season, There is an increase of energy price in the beginning of the year
plt.figure(figsize=(18, 12))
for i, (combi, df) in enumerate(df_electricity.groupby([df_electricity.index.year,"school_day"])):
    ax = plt.subplot(3,4, i+1, ymargin=0.5)
    #print(df.num_sold.values.shape, df.num_sold.values)
    ax.plot(df.price)
    ax.set_title(combi)
    plt.xticks(rotation=40)
plt.suptitle('Energy prices change for 2015-2020 on School Season', y=1.03)
Text(0.5, 1.03, 'Energy prices change for 2015-2020 on School Season')

Insights 📍 : Electricity price change for 2015-2020 on School Season

Based on the plot above show:

  1. There is a significant increase of electricity price in the end of 2015 on school season> 2. There is a similar pattern of the increase of electricity price on school season in February-March in 2018, 2019, 2020> 2. For annual non-holiday Season, There is an increase of electricity price in the beginning of the year in 2016, 2018, 2019

Splitting the data based on time

split_date = '01-Jan-2019'
df_train = df_electricity.loc[df_electricity.index <= split_date].copy()
df_test = df_electricity.loc[df_electricity.index > split_date].copy()
def create_features(df, label=None):
    """
    Creates time series features from datetime index
    """
    df['date'] = df.index
    df['hour'] = df['date'].dt.hour
    df['dayofweek'] = df['date'].dt.dayofweek
    df['quarter'] = df['date'].dt.quarter
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['dayofyear'] = df['date'].dt.dayofyear
    df['dayofmonth'] = df['date'].dt.day
    df['weekofyear'] = df['date'].dt.isocalendar().week.astype(np.int64) 
    df["school_day"] = pd.get_dummies(df["school_day"])
    df["school_day"] = pd.get_dummies(df["school_day"])
    df['price_7_days_lag'] = df['price'].shift(7)
    df['price_15_days_lag'] = df['price'].shift(15)
    df['price_30_days_lag'] = df['price'].shift(30)
    df['price_7_days_mean'] = df['price'].rolling(window = 7).mean()
    df['price_15_days_mean'] = df['price'].rolling(window = 15).mean()
    df['price_30_days_mean'] = df['price'].rolling(window = 30).mean()
    df['price_7_days_std'] = df['price'].rolling(window = 7).std()
    df['price_15_days_std'] = df['price'].rolling(window = 15).std()
    df['price_30_days_std'] = df['price'].rolling(window = 30).std()
    df['price_7_days_max'] = df['price'].rolling(window = 7).max()
    df['price_15_days_max'] = df['price'].rolling(window = 15).max()
    df['price_30_days_max'] = df['price'].rolling(window = 30).max()
    df['price_7_days_min'] = df['price'].rolling(window = 7).min()
    df['price_15_days_min'] = df['price'].rolling(window = 15).min()
    df['price_30_days_min'] = df['price'].rolling(window = 30).min()
    
 
   
    cols = ['hour','dayofweek','dayofyear','quarter','month','year','dayofmonth','weekofyear','price_positive','demand_pos_price','demand','demand_neg_price','price_negative','frac_neg_price']
    #cols = ['hour','dayofweek','quarter','month','year','dayofyear','dayofmonth','weekofyear','demand_pos_price',]
    
    for d in ('7', '15', '30'):
        for c in ('lag', 'mean', 'std', 'max', 'min'):
            cols.append(f'price_{d}_days_{c}')
    X = df[cols]
    if label:
        y = df[label]
        return X, y
    return X
X_train, y_train = create_features(df_train, label='price')
X_test, y_test = create_features(df_test, label='price')
reg = xgb.XGBRegressor(n_estimators=100,eta=0.1)
fitted_xgb_model = reg.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        early_stopping_rounds=50,
       verbose=True) 
[0]	validation_0-rmse:78.10894	validation_1-rmse:232.43314
[1]	validation_0-rmse:71.00056	validation_1-rmse:226.20828
[2]	validation_0-rmse:64.59680	validation_1-rmse:220.67780
[3]	validation_0-rmse:58.83662	validation_1-rmse:215.75482
[4]	validation_0-rmse:53.66943	validation_1-rmse:211.30708
[5]	validation_0-rmse:49.02540	validation_1-rmse:207.36438
[6]	validation_0-rmse:44.84807	validation_1-rmse:203.70424
[7]	validation_0-rmse:41.09291	validation_1-rmse:200.41937
[8]	validation_0-rmse:37.70780	validation_1-rmse:197.01787
[9]	validation_0-rmse:34.63074	validation_1-rmse:193.86409
[10]	validation_0-rmse:31.85113	validation_1-rmse:190.97063
[11]	validation_0-rmse:29.33829	validation_1-rmse:188.28787
[12]	validation_0-rmse:27.05840	validation_1-rmse:185.79985
[13]	validation_0-rmse:24.98889	validation_1-rmse:183.49159
[14]	validation_0-rmse:23.10552	validation_1-rmse:181.34113
[15]	validation_0-rmse:21.39121	validation_1-rmse:179.33682
[16]	validation_0-rmse:19.83474	validation_1-rmse:178.55098
[17]	validation_0-rmse:18.41598	validation_1-rmse:177.83699
[18]	validation_0-rmse:17.11959	validation_1-rmse:177.18620
[19]	validation_0-rmse:15.93410	validation_1-rmse:176.59129
[20]	validation_0-rmse:14.84714	validation_1-rmse:176.04675
[21]	validation_0-rmse:13.85011	validation_1-rmse:175.54684
[22]	validation_0-rmse:12.93431	validation_1-rmse:175.08743
[23]	validation_0-rmse:12.09328	validation_1-rmse:174.66530
[24]	validation_0-rmse:11.32025	validation_1-rmse:173.82794
[25]	validation_0-rmse:10.60783	validation_1-rmse:173.03954
[26]	validation_0-rmse:9.95008	validation_1-rmse:172.29640
[27]	validation_0-rmse:9.34216	validation_1-rmse:171.59558
[28]	validation_0-rmse:8.77918	validation_1-rmse:170.93398
[29]	validation_0-rmse:8.25816	validation_1-rmse:170.30927
[30]	validation_0-rmse:7.77358	validation_1-rmse:169.71977
[31]	validation_0-rmse:7.32364	validation_1-rmse:169.16220
[32]	validation_0-rmse:6.90481	validation_1-rmse:168.63498
[33]	validation_0-rmse:6.51409	validation_1-rmse:168.13646
[34]	validation_0-rmse:6.14908	validation_1-rmse:167.66449
[35]	validation_0-rmse:5.80786	validation_1-rmse:167.21773
[36]	validation_0-rmse:5.48851	validation_1-rmse:166.79488
[37]	validation_0-rmse:5.18929	validation_1-rmse:166.39388
[38]	validation_0-rmse:4.90825	validation_1-rmse:166.01436
[39]	validation_0-rmse:4.64466	validation_1-rmse:165.65491
[40]	validation_0-rmse:4.39683	validation_1-rmse:165.31377
[41]	validation_0-rmse:4.16372	validation_1-rmse:164.99037
[42]	validation_0-rmse:3.94428	validation_1-rmse:164.68390
[43]	validation_0-rmse:3.73747	validation_1-rmse:164.39325
[44]	validation_0-rmse:3.54249	validation_1-rmse:164.11774
[45]	validation_0-rmse:3.35867	validation_1-rmse:163.85646
[46]	validation_0-rmse:3.18519	validation_1-rmse:163.60869
[47]	validation_0-rmse:3.02139	validation_1-rmse:163.37366
[48]	validation_0-rmse:2.86655	validation_1-rmse:163.15067
[49]	validation_0-rmse:2.72040	validation_1-rmse:162.93970
[50]	validation_0-rmse:2.58236	validation_1-rmse:162.73958
[51]	validation_0-rmse:2.45181	validation_1-rmse:162.54944
[52]	validation_0-rmse:2.32823	validation_1-rmse:162.36912
[53]	validation_0-rmse:2.21122	validation_1-rmse:162.19797
[54]	validation_0-rmse:2.10046	validation_1-rmse:162.03554
[55]	validation_0-rmse:1.99561	validation_1-rmse:161.88142
[56]	validation_0-rmse:1.89622	validation_1-rmse:161.73517
[57]	validation_0-rmse:1.80210	validation_1-rmse:161.59613
[58]	validation_0-rmse:1.71306	validation_1-rmse:161.46413
[59]	validation_0-rmse:1.62860	validation_1-rmse:161.33887
[60]	validation_0-rmse:1.54868	validation_1-rmse:161.21994
[61]	validation_0-rmse:1.47296	validation_1-rmse:161.10706
[62]	validation_0-rmse:1.40104	validation_1-rmse:160.99974
[63]	validation_0-rmse:1.33303	validation_1-rmse:160.89801
[64]	validation_0-rmse:1.26847	validation_1-rmse:160.80133
[65]	validation_0-rmse:1.20725	validation_1-rmse:160.70961
[66]	validation_0-rmse:1.14925	validation_1-rmse:160.62248
[67]	validation_0-rmse:1.09428	validation_1-rmse:160.53976
[68]	validation_0-rmse:1.04232	validation_1-rmse:160.46124
[69]	validation_0-rmse:0.99303	validation_1-rmse:160.38670
[70]	validation_0-rmse:0.94642	validation_1-rmse:160.31592
[71]	validation_0-rmse:0.90220	validation_1-rmse:160.24872
[72]	validation_0-rmse:0.86030	validation_1-rmse:160.18489
[73]	validation_0-rmse:0.82075	validation_1-rmse:160.12427
[74]	validation_0-rmse:0.78316	validation_1-rmse:160.06668
[75]	validation_0-rmse:0.74781	validation_1-rmse:160.01205
[76]	validation_0-rmse:0.71458	validation_1-rmse:159.96014
[77]	validation_0-rmse:0.68281	validation_1-rmse:159.91087
[78]	validation_0-rmse:0.65270	validation_1-rmse:159.86403
[79]	validation_0-rmse:0.62447	validation_1-rmse:159.81957
[80]	validation_0-rmse:0.59772	validation_1-rmse:159.77733
[81]	validation_0-rmse:0.57273	validation_1-rmse:159.73723
[82]	validation_0-rmse:0.54923	validation_1-rmse:159.69914
[83]	validation_0-rmse:0.52638	validation_1-rmse:159.66295
[84]	validation_0-rmse:0.50531	validation_1-rmse:159.62856
[85]	validation_0-rmse:0.48540	validation_1-rmse:159.59593
[86]	validation_0-rmse:0.46696	validation_1-rmse:159.56493
[87]	validation_0-rmse:0.44928	validation_1-rmse:159.53546
[88]	validation_0-rmse:0.43296	validation_1-rmse:159.50746
[89]	validation_0-rmse:0.41758	validation_1-rmse:159.48086
[90]	validation_0-rmse:0.40334	validation_1-rmse:159.45570
[91]	validation_0-rmse:0.38988	validation_1-rmse:159.43173
[92]	validation_0-rmse:0.37735	validation_1-rmse:159.40901
[93]	validation_0-rmse:0.36581	validation_1-rmse:159.38736
[94]	validation_0-rmse:0.35467	validation_1-rmse:159.36688
[95]	validation_0-rmse:0.34369	validation_1-rmse:159.34740
[96]	validation_0-rmse:0.33422	validation_1-rmse:159.32890
[97]	validation_0-rmse:0.32500	validation_1-rmse:159.31133
[98]	validation_0-rmse:0.31654	validation_1-rmse:159.29463
[99]	validation_0-rmse:0.30846	validation_1-rmse:159.27872
plt.plot(fitted_xgb_model.evals_result()['validation_0']
['rmse'])
plt.plot(fitted_xgb_model.evals_result()['validation_1']
['rmse'])
plt.ylabel('RMSE', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(['Train', 'Val'], loc='upper right')
<matplotlib.legend.Legend at 0x7f0267b412e0>

We can see the comparison between training rmse and validation rmse score where the is an underfitting problem in our case where maybe due to outlier electricity price in 2019-2020

feature_important = reg.get_booster().get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by = "score", ascending=False)
data.nlargest(10, columns="score").plot(kind='barh', figsize = (20,10)) ## plot top 10 features
<AxesSubplot:>

Based on the feature importances extracted from the model, price_positive feature, demand_neg_price, dayofweek,dayofyear and dayofyear show the most 5th of feature importances.

df_test['price_prediction'] = reg.predict(X_test)
price_all = pd.concat([df_test, df_train], sort=False)
_ = price_all[['price','price_prediction']].plot(figsize=(15, 5))
plt.ylim(0,1000)
(0.0, 1000.0)
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
_ = price_all[['price','price_prediction']].plot(ax=ax,style=['-','.'])
ax.set_xbound(lower='01-05-2019', upper='02-10-2020')
ax.set_ylim(0, 2000)
plot = plt.suptitle('May 2019 Forecast vs Actuals')

Insights 📍 : Prediction error price between real price and prediction price

Our model is generalizable for the next 12-18 months based on the plot above. In Machine Learninig, It is a mandatory to make our model to be generalizable as possible so that it can predict the real/actual value in real situation. There is a few errors around february and in the beginining of January 2020. If it can predict the same value for actual demand, it is a overfitting problem. We have to take into account this case to make better prediction that can mimic the actual value with fewer errors.

mean_squared_error(y_true=df_test['price'],y_pred=df_test['price_prediction'])
25369.714397883905
mean_absolute_error(y_true=df_test['price'],y_pred=df_test['price_prediction'])
10.73348890658493
df_test['error'] = df_test['price'] - df_test['price_prediction']
df_test['abs_error'] = df_test['error'].apply(np.abs)
error_by_day = df_test.groupby(['year','month','dayofmonth']).mean()[['price','price_prediction','error','abs_error']]
(error_by_day.sort_values('error', ascending=True).head(10)
.style
.bar(subset="error",color="#c07fef",vmax=error_by_day.error.quantile(0.95))
.highlight_max(axis=0,color="#fef70c")
)
price price_prediction error abs_error
year month dayofmonth
2019 1 22 278.777743 350.801331 -72.023587 72.023587
12 30 295.829202 342.720764 -46.891562 46.891562
2020 1 30 1044.447303 1078.613281 -34.165979 34.165979
10 2 -6.076028 21.710295 -27.786323 27.786323
3 -1.983471 21.918243 -23.901714 23.901714
1 23 -1.761423 19.056923 -20.818346 20.818346
2019 2 2 240.954524 257.686920 -16.732396 16.732396
8 31 79.347198 91.819000 -12.471802 12.471802
2020 8 30 9.421019 20.588486 -11.167467 11.167467
1 4 28.042231 38.049507 -10.007276 10.007276
(error_by_day.sort_values('abs_error', ascending=False).head(10)
.style
.bar(subset="abs_error",color="#c07fef",vmax=error_by_day.abs_error.quantile(0.95))
.highlight_max(axis=0,color="#fef70c")
)
price price_prediction error abs_error
year month dayofmonth
2019 1 24 4549.645105 1078.613281 3471.031824 3471.031824
2020 1 31 2809.437516 812.882507 1996.555008 1996.555008
2019 3 1 1284.799876 812.882507 471.917369 471.917369
1 25 906.437232 645.617493 260.819740 260.819740
22 278.777743 350.801331 -72.023587 72.023587
12 30 295.829202 342.720764 -46.891562 46.891562
2020 1 30 1044.447303 1078.613281 -34.165979 34.165979
10 2 -6.076028 21.710295 -27.786323 27.786323
3 -1.983471 21.918243 -23.901714 23.901714
2019 8 13 267.347650 246.398193 20.949457 20.949457

Insights 📍 : Notice anything about the over forecasted days?

  • #1 worst day - February 2nd, 2020.
  • #6 worst day - March 25, 2019.

Looks like our model influenced by outliers .

(error_by_day.sort_values('abs_error', ascending=True).head(10)
.style
.bar(subset="abs_error",color="#c07fef",vmax=error_by_day.abs_error.quantile(0.95))
.highlight_max(axis=0,color="#fef70c")
)
price price_prediction error abs_error
year month dayofmonth
2019 7 29 95.873753 95.874405 -0.000652 0.000652
8 20 61.926547 61.927246 -0.000699 0.000699
2020 8 6 69.433709 69.434418 -0.000708 0.000708
9 9 37.015434 37.016582 -0.001149 0.001149
8 15 52.515043 52.512608 0.002436 0.002436
2019 10 20 87.533212 87.530640 0.002573 0.002573
2020 4 2 61.537020 61.540165 -0.003145 0.003145
14 40.838554 40.842064 -0.003510 0.003510
2019 6 9 69.561306 69.557457 0.003849 0.003849
2020 3 23 44.639439 44.643665 -0.004226 0.004226

Revenue generated per year using retail prices and a 70MWh storage system by the energy storage venture

predicton_vs_Actual = df_test.rename(columns={"price":"PRICE","price_prediction":"PRICE_PREDICTION"})
predicton_vs_Actual.head()
demand PRICE demand_pos_price price_positive demand_neg_price price_negative frac_neg_price min_temperature max_temperature solar_exposure ... price_30_days_std price_7_days_max price_15_days_max price_30_days_max price_7_days_min price_15_days_min price_30_days_min PRICE_PREDICTION error abs_error
date
2019-01-02 106470.675 92.202011 106470.675 92.202011 0.0 0.0 0.0 18.4 22.2 26.3 ... NaN NaN NaN NaN NaN NaN NaN 92.063560 0.138451 0.138451
2019-01-03 118789.605 127.380303 118789.605 127.380303 0.0 0.0 0.0 15.9 29.5 27.6 ... NaN NaN NaN NaN NaN NaN NaN 126.187721 1.192581 1.192581
2019-01-04 133288.460 121.020997 133288.460 121.020997 0.0 0.0 0.0 18.0 42.6 27.4 ... NaN NaN NaN NaN NaN NaN NaN 121.093582 -0.072585 0.072585
2019-01-05 97262.790 83.493520 97262.790 83.493520 0.0 0.0 0.0 17.4 21.2 12.9 ... NaN NaN NaN NaN NaN NaN NaN 83.704323 -0.210802 0.210802
2019-01-06 93606.215 65.766407 93606.215 65.766407 0.0 0.0 0.0 14.6 22.1 30.9 ... NaN NaN NaN NaN NaN NaN NaN 65.620796 0.145611 0.145611

5 rows × 40 columns

(predicton_vs_Actual.sample(5)
 .style
 .background_gradient(axis=0,subset=["PRICE","PRICE_PREDICTION"],cmap="Blues")
 .set_caption("Comparison of Real price and Prediction price")
)                       
Comparison of Real price and Prediction price
demand PRICE demand_pos_price price_positive demand_neg_price price_negative frac_neg_price min_temperature max_temperature solar_exposure rainfall school_day holiday date hour dayofweek quarter month year dayofyear dayofmonth weekofyear price_7_days_lag price_15_days_lag price_30_days_lag price_7_days_mean price_15_days_mean price_30_days_mean price_7_days_std price_15_days_std price_30_days_std price_7_days_max price_15_days_max price_30_days_max price_7_days_min price_15_days_min price_30_days_min PRICE_PREDICTION error abs_error
date
2019-10-17 00:00:00 116690.040000 92.498044 116690.040000 92.498044 0.000000 0.000000 0.000000 9.500000 15.200000 9.600000 4.400000 1 N 2019-10-17 00:00:00 0 3 4 10 2019 290 17 42 167.981136 108.661677 149.599176 107.013610 111.958930 115.165759 14.385468 26.376231 28.347460 123.261328 167.981136 192.485035 90.164236 54.720115 54.720115 92.161629 0.336415 0.336415
2020-02-28 00:00:00 110006.970000 50.306353 110006.970000 50.306353 0.000000 0.000000 0.000000 13.100000 20.100000 17.600000 0.000000 1 N 2020-02-28 00:00:00 0 4 1 2 2020 59 28 9 48.658203 79.018494 78.272342 50.750555 54.706821 174.695236 6.434453 9.322584 529.975511 60.554702 79.502644 2809.437516 42.236094 42.236094 14.235635 50.108658 0.197695 0.197695
2019-05-26 00:00:00 106153.430000 63.903082 106153.430000 63.903082 0.000000 0.000000 0.000000 9.500000 15.100000 8.600000 6.600000 1 N 2019-05-26 00:00:00 0 6 2 5 2019 146 26 21 76.294168 87.036533 60.918383 90.270987 89.690794 95.045620 17.449081 13.195141 13.561886 115.894456 115.894456 121.029642 63.903082 63.903082 63.903082 63.706802 0.196280 0.196280
2019-03-19 00:00:00 123290.835000 113.644774 123290.835000 113.644774 0.000000 0.000000 0.000000 16.900000 23.400000 12.700000 0.000000 1 N 2019-03-19 00:00:00 0 1 1 3 2019 78 19 12 76.971548 135.206447 110.489593 110.805143 100.527496 148.568061 11.182552 17.085113 215.716053 126.785957 126.785957 1284.799876 95.251188 74.374932 74.374932 113.493477 0.151297 0.151297
2019-04-02 00:00:00 114927.205000 119.397551 114927.205000 119.397551 0.000000 0.000000 0.000000 7.400000 25.900000 16.600000 0.000000 1 N 2019-04-02 00:00:00 0 1 2 4 2019 92 2 14 95.567894 111.149795 153.784296 105.271113 105.146212 103.555577 17.997891 15.698139 17.232821 131.333531 131.333531 135.206447 83.989538 80.292927 74.374932 118.808243 0.589308 0.589308
(predicton_vs_Actual.describe()
 .style
 .highlight_max(axis=0,color="#c07fef")
 .highlight_min(axis=0,color="#00FF00")
 .set_caption("Price for the next 12-18 months dataset")
)
Price for the next 12-18 months dataset
demand PRICE demand_pos_price price_positive demand_neg_price price_negative frac_neg_price min_temperature max_temperature solar_exposure rainfall school_day hour dayofweek quarter month year dayofyear dayofmonth weekofyear price_7_days_lag price_15_days_lag price_30_days_lag price_7_days_mean price_15_days_mean price_30_days_mean price_7_days_std price_15_days_std price_30_days_std price_7_days_max price_15_days_max price_30_days_max price_7_days_min price_15_days_min price_30_days_min PRICE_PREDICTION error abs_error
count 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 644.000000 637.000000 629.000000 614.000000 638.000000 630.000000 615.000000 638.000000 630.000000 615.000000 638.000000 630.000000 615.000000 638.000000 630.000000 615.000000 644.000000 644.000000 644.000000
mean 117643.164884 92.987010 115960.088610 93.920500 1683.076273 -2.628288 0.018763 11.258851 20.047516 14.121429 1.571118 0.579193 0.000000 3.000000 2.310559 5.920807 2019.434783 164.804348 15.633540 24.015528 93.691424 94.312974 95.825203 93.212968 93.538990 91.768173 54.150249 72.852308 87.392116 200.741524 322.505861 494.818502 52.455595 43.836577 36.683321 83.745338 9.241676 10.733489
std 13719.536422 221.045838 16058.351649 220.815258 5468.830434 14.501710 0.061677 4.178613 6.275934 7.527379 4.622587 0.494072 0.000000 2.001555 1.038818 3.204041 0.496114 97.726483 8.830309 13.916315 222.139909 223.477412 225.979776 105.016925 78.052549 55.971449 204.237108 204.005225 183.989880 555.624866 798.388712 1010.230907 26.400018 23.519784 21.738763 83.260063 159.133995 159.040192
min 86891.230000 -6.076028 41988.240000 14.558266 0.000000 -304.150000 0.000000 1.700000 9.600000 1.300000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 2019.000000 1.000000 1.000000 1.000000 -1.761423 -1.761423 -1.761423 23.314900 26.399112 30.011994 2.643165 5.364451 8.586690 33.031411 37.881752 53.063441 -6.076028 -6.076028 -6.076028 14.941895 -72.023587 0.000652
25% 107860.791250 47.951427 106688.311250 48.534078 0.000000 0.000000 0.000000 8.300000 15.300000 8.200000 0.000000 0.000000 0.000000 1.000000 1.000000 3.000000 2019.000000 81.750000 8.000000 12.000000 48.336165 48.542669 49.852245 48.849650 50.822983 52.430808 10.068998 12.243217 15.547400 64.270729 75.818450 103.916022 31.208768 21.755296 18.875436 47.338225 -0.289771 0.100342
50% 115552.020000 70.942903 115005.677500 72.009555 0.000000 0.000000 0.000000 11.000000 18.700000 11.700000 0.000000 1.000000 0.000000 3.000000 2.000000 6.000000 2019.000000 162.000000 16.000000 24.000000 72.065599 72.815311 74.405598 75.313011 83.410650 87.232646 15.213009 18.972156 23.018682 100.676262 125.099031 134.747615 49.910197 44.176312 42.328652 71.584919 -0.011184 0.268921
75% 127321.842500 101.439833 127115.777500 101.665942 0.000000 0.000000 0.000000 13.800000 23.200000 19.300000 1.050000 1.000000 0.000000 5.000000 3.000000 8.000000 2020.000000 242.250000 23.000000 35.000000 101.944171 102.608515 103.262500 103.681588 105.118976 105.963637 24.736949 32.092289 38.685609 134.488422 162.491995 243.810160 75.512742 63.903082 52.525320 101.335741 0.233059 0.588213
max 170653.840000 4549.645105 170653.840000 4549.645105 57597.595000 0.000000 0.625000 25.100000 43.500000 32.000000 54.600000 1.000000 0.000000 6.000000 4.000000 12.000000 2020.000000 365.000000 31.000000 52.000000 4549.645105 4549.645105 4549.645105 902.337815 506.831433 318.015562 1663.267896 1140.727552 817.222368 4549.645105 4549.645105 4549.645105 118.261945 89.427805 80.292927 1078.613281 3471.031824 3471.031824
train_df_result =df_train.groupby("year")["price","demand_pos_price","demand_neg_price","frac_neg_price","price_negative","demand"].agg(["mean","max","min"]).stack()
(train_df_result
 .style
 .background_gradient(axis=0,subset=["demand","price"],cmap="Blues")
 .set_caption("Comparison of Real Demand and Prediction Demand")
) 
Comparison of Real Demand and Prediction Demand
price demand_pos_price demand_neg_price frac_neg_price price_negative demand
year
2015 mean 35.068136 124491.479493 170.690384 0.001598 -2.305297 124662.169877
max 188.086125 158052.890000 19480.250000 0.229167 0.000000 158052.890000
min 13.279841 84331.030000 0.000000 0.000000 -318.660000 95093.295000
2016 mean 50.094252 121028.504249 961.034508 0.010303 -4.438947 121989.538757
max 545.737820 160285.015000 29110.575000 0.333333 0.000000 160285.015000
min 6.869135 65215.145000 0.000000 0.000000 -289.190000 90227.480000
2017 mean 94.740161 118910.605411 42.443274 0.000400 -2.460516 118953.048685
max 213.339432 154632.335000 4666.465000 0.041667 0.000000 155060.610000
min 19.865114 85094.375000 0.000000 0.000000 -342.220000 85094.375000
2018 mean 94.648823 118438.166507 372.398726 0.003881 -1.643921 118810.565233
max 1210.137920 165070.595000 22839.285000 0.250000 0.000000 165070.595000
min 14.673588 73658.870000 0.000000 0.000000 -157.920000 88903.065000
2019 mean 78.560979 98933.060000 0.000000 0.000000 0.000000 98933.060000
max 78.560979 98933.060000 0.000000 0.000000 0.000000 98933.060000
min 78.560979 98933.060000 0.000000 0.000000 0.000000 98933.060000
test_df_result =df_test.groupby("year")["price","demand_pos_price","demand_neg_price","frac_neg_price","price_negative","demand","price_prediction"].agg(["mean","max","min"]).stack()
(test_df_result
 .style
 .background_gradient(axis=0,subset=["price","price_prediction"],cmap="Blues")
 .set_caption("Comparison of Real Demand and Prediction Demand")
) 
Comparison of Real Demand and Prediction Demand
price demand_pos_price demand_neg_price frac_neg_price price_negative demand price_prediction
year
2019 mean 117.387745 117288.371772 703.457431 0.007612 -1.718283 117991.829203 106.173088
max 4549.645105 168894.845000 17410.610000 0.166667 0.000000 168894.845000 1078.613281
min 19.170951 80859.020000 0.000000 0.000000 -89.465035 90145.615000 21.854008
2020 mean 61.266055 114233.320500 2956.580768 0.033259 -3.811295 117189.901268 54.589256
max 2809.437516 170653.840000 57597.595000 0.625000 0.000000 170653.840000 1078.613281
min -6.076028 41988.240000 0.000000 0.000000 -304.150000 86891.230000 14.941895
test_df_result =df_test.groupby("year")["price","demand_pos_price","demand_neg_price","frac_neg_price","price_negative","demand","price_prediction"].agg(["mean","max","min"]).stack()
(test_df_result
 .style
 .background_gradient(axis=0,subset=["price","price_prediction"],cmap="Blues")
 .set_caption("Comparison of Real Demand and Prediction Demand")
) 
Comparison of Real Demand and Prediction Demand
price demand_pos_price demand_neg_price frac_neg_price price_negative demand price_prediction
year
2019 mean 117.387745 117288.371772 703.457431 0.007612 -1.718283 117991.829203 106.173088
max 4549.645105 168894.845000 17410.610000 0.166667 0.000000 168894.845000 1078.613281
min 19.170951 80859.020000 0.000000 0.000000 -89.465035 90145.615000 21.854008
2020 mean 61.266055 114233.320500 2956.580768 0.033259 -3.811295 117189.901268 54.589256
max 2809.437516 170653.840000 57597.595000 0.625000 0.000000 170653.840000 1078.613281
min -6.076028 41988.240000 0.000000 0.000000 -304.150000 86891.230000 14.941895
df_electricity["total_price"] = df_electricity["demand"]*df_electricity["price"]
df_electricity.groupby(df_electricity.index.year)["price"].agg(["mean","max","min"]).unstack()
      date
mean  2015      35.068136
      2016      50.094252
      2017      94.740161
      2018      94.648823
      2019     117.281370
      2020      61.266055
max   2015     188.086125
      2016     545.737820
      2017     213.339432
      2018    1210.137920
      2019    4549.645105
      2020    2809.437516
min   2015      13.279841
      2016       6.869135
      2017      19.865114
      2018      14.673588
      2019      19.170951
      2020      -6.076028
dtype: float64
demand_negative_price = df_electricity[df_electricity["price_negative"] < 0]
demand_negative_price
demand price demand_pos_price price_positive demand_neg_price price_negative frac_neg_price min_temperature max_temperature solar_exposure rainfall school_day holiday total_price
date
2015-01-01 99635.030 25.633696 97319.240 26.415953 2315.790 -7.240000 0.020833 13.3 26.9 23.6 0.0 N Y 2.554014e+06
2015-01-02 129606.010 33.138988 121082.015 38.837661 8523.995 -47.809777 0.062500 15.4 38.8 26.8 0.0 N N 4.295012e+06
2015-01-07 153514.820 48.312309 149498.715 49.639712 4016.105 -1.100000 0.020833 18.9 37.4 20.7 0.0 N N 7.416655e+06
2015-01-18 97728.750 17.008681 95473.965 20.911790 2254.785 -148.260000 0.020833 15.3 19.5 23.4 0.0 N N 1.662237e+06
2015-02-13 136070.620 18.736971 133078.540 26.322857 2992.080 -318.660000 0.020833 16.1 32.4 14.9 0.0 Y N 2.549551e+06
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-10-01 106641.790 34.654671 95349.610 41.651658 11292.180 -24.426925 0.125000 9.4 19.5 21.2 1.8 N N 3.695636e+06
2020-10-02 99585.835 -6.076028 41988.240 26.980251 57597.595 -30.173823 0.625000 12.8 26.0 22.0 0.0 N N -6.050864e+05
2020-10-03 92277.025 -1.983471 44133.510 32.438156 48143.515 -33.538025 0.583333 17.4 29.4 19.8 0.0 N N -1.830288e+05
2020-10-04 94081.565 25.008614 88580.995 26.571687 5500.570 -0.163066 0.062500 13.5 29.5 8.4 0.0 N N 2.352850e+06
2020-10-05 113610.030 36.764701 106587.375 39.616015 7022.655 -6.511550 0.083333 9.1 12.7 7.3 12.8 N N 4.176839e+06

181 rows × 14 columns

demand_positive_price = df_electricity[df_electricity["price_negative"] >= 0]
demand_positive_price
demand_negative= (demand_negative_price.groupby(demand_negative_price.index.year)["total_price"].agg(["sum","max","min"])).reset_index()
demand_negative

This is the condition when the demand for negative price per year. You can see on 2020 where the minimum price show a negative value.

demand_positive = demand_positive_price.groupby(demand_positive_price.index.year)["total_price"].agg(["sum","max","min"]).reset_index()
demand_positive

This is the condition when the demand for positive price per year. You can see on per year where the minimum price show a an appreciation from the market that is shown on the positive electiricity price.

result = demand_negative.merge(demand_positive,on="date",suffixes =["_demand_pos","_demand_neg"])
result
result["profit"] =result["sum_demand_pos"]-result["sum_demand_neg"]
result

We then substract the result when the demand is positive and demand is negative based on the total price feature that we have been created. Most of the years shows a good result.

(result.sort_values("profit",ascending=False)
 .style
 .background_gradient(axis=0,subset=["profit"],cmap="Blues")
 .set_caption("Revenue Generated per year in the daily electiricity in Victoria Australia")
) 

This is the result of revenue generated per year. We can see a good revenue generated per year. If you find this notebook usefull, Please show your appreciation by upvoting this notebook. Thank you.

References

Here is a few references for creating advanced visualization using matplotlib 1.

1. You can see a lot of data visualization plots here Data Visualization with Python!