dispaset.preprocessing package¶
Submodules¶
dispaset.preprocessing.data_check module¶
This file gathers the different functions used in DispaSET to check the input data
__author__ = 'Sylvain Quoilin (sylvain.quoilin@ec.europa.eu)'
- dispaset.preprocessing.data_check.check_AvailabilityFactors(plants, AF)[source]¶
Function that checks the validity of the provided availability factors and warns if a default value of 100% is used.
- dispaset.preprocessing.data_check.check_FlexibleDemand(flex)[source]¶
Function that checks the validity of the provided flexible demand time series
- dispaset.preprocessing.data_check.check_MinMaxFlows(df_min, df_max)[source]¶
Function that checks that there is no incompatibility between the minimum and maximum flows
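The kind of consistency check performed here can be sketched as follows (an illustrative stand-alone version, not the DispaSET source; the function name is hypothetical):

```python
import pandas as pd

# Illustrative sketch of a min/max flow consistency check (hypothetical name,
# not the actual DispaSET implementation).
def check_min_max_flows(df_min: pd.DataFrame, df_max: pd.DataFrame) -> bool:
    """Return True if no minimum flow exceeds the corresponding maximum flow."""
    # Align the two tables on both axes before comparing, so that
    # missing lines/timestamps do not produce spurious violations.
    df_min, df_max = df_min.align(df_max, join="inner")
    return bool((df_min <= df_max).all().all())

flows_min = pd.DataFrame({"Line1": [0.0, 10.0]})
flows_max = pd.DataFrame({"Line1": [50.0, 50.0]})
print(check_min_max_flows(flows_min, flows_max))  # True: all minima below maxima
```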
- dispaset.preprocessing.data_check.check_NonNaNKeys(plants, NonNaNKeys)[source]¶
Checking that the given keys contain no NaN values
- Parameters:
plants – plants dataframe
NonNaNKeys – list of NonNaN keys
- dispaset.preprocessing.data_check.check_StrKeys(plants, StrKeys)[source]¶
Checking that the given keys contain string values
- Parameters:
plants – plants dataframe
StrKeys – list of Str keys
- dispaset.preprocessing.data_check.check_chp(config, plants)[source]¶
Function that checks the CHP plant characteristics
- dispaset.preprocessing.data_check.check_clustering(plants, plants_merged)[source]¶
Function that checks that the installed capacities are still equal after the clustering process
- Parameters:
plants – Non-clustered list of units
plants_merged – clustered list of units
- dispaset.preprocessing.data_check.check_df(df, StartDate=None, StopDate=None, name='')[source]¶
Function that checks the time series provided as inputs
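The checks applied to an input time series can be sketched as follows (an illustrative version under assumed semantics, not the actual DispaSET function; the name check_time_series is hypothetical):

```python
import logging
import pandas as pd

# Illustrative sketch of a time-series input check (assumed semantics,
# not the DispaSET source).
def check_time_series(df: pd.DataFrame, start=None, stop=None, name="") -> bool:
    """Warn about common defects in an input time series and return validity."""
    ok = True
    if start is not None and df.index[0] > pd.Timestamp(start):
        logging.warning("%s: data starts after the simulation start date", name)
        ok = False
    if stop is not None and df.index[-1] < pd.Timestamp(stop):
        logging.warning("%s: data ends before the simulation stop date", name)
        ok = False
    if df.isna().any().any():
        logging.warning("%s: time series contains NaN values", name)
        ok = False
    return ok

idx = pd.date_range("2016-01-01", "2016-12-31 23:00", freq="h")
demand = pd.DataFrame({"AT": 1.0}, index=idx)
print(check_time_series(demand, "2016-01-01", "2016-12-31", name="Demand"))  # True
```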
- dispaset.preprocessing.data_check.check_h2(config, plants)[source]¶
Function that checks the H2 (p2h) unit characteristics
- dispaset.preprocessing.data_check.check_heat(config, plants)[source]¶
Function that checks the heat only unit characteristics
- dispaset.preprocessing.data_check.check_heat_demand(plants, data, zones_th)[source]¶
Function that checks the validity of the heat demand profiles
- Parameters:
plants – List of plants
data – Dataframe with the heat demand time series
zones_th – list with the heating zones
- dispaset.preprocessing.data_check.check_keys(plants, keys, unit)[source]¶
Checking mandatory keys
- Parameters:
plants – plants dataframe
keys – list of keys
unit – string denoting type of units being checked
- dispaset.preprocessing.data_check.check_p2h(config, plants)[source]¶
Function that checks the p2h unit characteristics
- dispaset.preprocessing.data_check.check_reserves(Reserve2D, Reserve2U, Load)[source]¶
Function that checks the validity of the reserve requirement time series
- Parameters:
Reserve2D – DataFrame of 2D reserves
Reserve2U – DataFrame of 2U reserves
Load – DataFrame of loads
- dispaset.preprocessing.data_check.check_simulation_environment(SimulationPath, store_type='pickle', firstline=7)[source]¶
Function to test the validity of the Dispa-SET inputs
- Parameters:
SimulationPath – Path to the simulation folder
store_type – choose between: "list", "excel", "pickle"
firstline – Number of the first line in the data (only if store_type == 'excel')
- dispaset.preprocessing.data_check.check_sto(config, plants, raw_data=True)[source]¶
Function that checks the storage plant characteristics
- dispaset.preprocessing.data_check.check_temperatures(plants, Temperatures)[source]¶
Function that checks the presence and validity of the temperatures profiles for units with temperature-dependent characteristics
- Parameters:
plants – List of all units
Temperatures – Dataframe of input temperatures
- dispaset.preprocessing.data_check.check_units(config, plants)[source]¶
Function that checks the power plant characteristics
dispaset.preprocessing.data_handler module¶
- dispaset.preprocessing.data_handler.GenericTable(headers, varname, config, default=None)[source]¶
This function loads the tabular data stored in csv files and assigns the proper values to each pre-specified column. If not found, the default value is assigned.
- Parameters:
headers – List with the column headers to be read
varname – Variable to be read
config – Config variable
default – Default value to be applied if no data is found
- Returns:
Dataframe with the time series for each unit
- dispaset.preprocessing.data_handler.NodeBasedTable(varname, config, default=None)[source]¶
This function loads the tabular data stored in csv files relative to each zone of the simulation.
- Parameters:
varname – Variable name (as defined in config)
config – Dispa-SET config data
default – Default value to be applied if no data is found
- Returns:
Dataframe with the time series for each unit
- dispaset.preprocessing.data_handler.UnitBasedTable(plants, varname, config, fallbacks=['Unit'], default=None, RestrictWarning=None)[source]¶
This function loads the tabular data stored in csv files and assigns the proper values to each unit of the plants dataframe. If the unit-specific value is not found in the data, the script can fall back on more generic data (e.g. fuel-based, technology-based, zone-based) or on the default value. The order in which the data should be loaded is specified in the fallbacks list. For example, [‘Unit’,’Technology’] means that the script will first try to find a perfect match for the unit name in the data table. If not found, a column with the unit technology as header is searched for. If not found, the default value is assigned.
- Parameters:
plants – Dataframe with the units for which data is required
varname – Variable name (as defined in config)
config – Dispa-SET config file
fallbacks – List with the order of the data sources.
default – Default value to be applied if no data is found
RestrictWarning – Only display the warnings if the unit belongs to the list of technologies provided in this parameter
- Returns:
Dataframe with the time series for each unit
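The fallback lookup described above can be sketched as follows (an illustrative helper, not the DispaSET source; the function name and exact matching rules are assumptions):

```python
import pandas as pd

# Illustrative sketch of the fallback lookup for one unit (hypothetical
# helper, not the actual UnitBasedTable implementation).
def lookup_with_fallback(unit: pd.Series, table: pd.DataFrame,
                         fallbacks=("Unit", "Technology"), default=None) -> pd.Series:
    """Pick the time series for one unit, trying each fallback level in order."""
    for level in fallbacks:
        # 'Unit' matches on the unit name itself; other levels (e.g.
        # 'Technology', 'Zone') match on the corresponding plant attribute.
        key = unit.name if level == "Unit" else unit[level]
        if key in table.columns:
            return table[key]
    # Nothing matched: fall back to a constant default value.
    return pd.Series(default, index=table.index)

table = pd.DataFrame({"COMC": [0.9, 0.8]})   # only technology-level data
plant = pd.Series({"Technology": "COMC"}, name="Unit_1")
print(lookup_with_fallback(plant, table).tolist())  # [0.9, 0.8]
```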
- dispaset.preprocessing.data_handler.define_parameter(sets_in, sets, value=0)[source]¶
Function to define a DispaSET parameter and fill it with a constant value
- Parameters:
sets_in – List with the labels of the sets corresponding to the parameter
sets – dictionary containing the definition of all the sets (must comprise those referenced in sets_in)
value – Default value to attribute to the parameter
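A minimal sketch of such a parameter definition is shown below (the dict layout with 'sets'/'val' keys is an assumption about the DispaSET convention, not taken from the source):

```python
import numpy as np

# Illustrative sketch: build a constant-filled parameter from set definitions
# (the {'sets', 'val'} layout is an assumed convention).
def define_parameter(sets_in, sets, value=0):
    """Return a parameter dict: set labels plus a constant-filled value array."""
    # One array dimension per set, sized by the length of each set.
    shape = [len(sets[s]) for s in sets_in]
    return {"sets": sets_in, "val": np.full(shape, value, dtype=float)}

sets = {"u": ["Unit_1", "Unit_2"], "h": list(range(4))}
param = define_parameter(["u", "h"], sets, value=1)
print(param["val"].shape)  # (2, 4)
```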
- dispaset.preprocessing.data_handler.export_yaml_config(ExcelFile, YAMLFile)[source]¶
Function that loads the DispaSET excel config file and dumps it as a yaml file.
- Parameters:
ExcelFile – Path to the Excel config file
YAMLFile – Path to the YAML config file to be written
- dispaset.preprocessing.data_handler.load_config(ConfigFile, AbsPath=True)[source]¶
Wrapper function around load_config_excel and load_config_yaml
- dispaset.preprocessing.data_handler.load_config_excel(ConfigFile, AbsPath=True)[source]¶
Function that loads the DispaSET excel config file and returns a dictionary with the values
- Parameters:
ConfigFile – String with (relative) path to the DispaSET excel configuration file
AbsPath – If true, relative paths are automatically changed into absolute paths (recommended)
- dispaset.preprocessing.data_handler.load_config_yaml(filename, AbsPath=True)[source]¶
Loads YAML file to dictionary
- dispaset.preprocessing.data_handler.load_geo_data(path, header=None)[source]¶
Load geo data for individual zones.
- Parameters:
path – absolute path to the geo data file
header – load header
- dispaset.preprocessing.data_handler.load_time_series(config, path, header='infer')[source]¶
Function that loads time series data, checks the compatibility of the indexes and guesses when no exact match between the required index and the data is present
- Parameters:
config – Dispa-SET config
path – path to the desired time series
header – list of header names
- Returns:
reindexed time series
- dispaset.preprocessing.data_handler.merge_series(plants, oldplants, data, method='WeightedAverage', tablename='')[source]¶
Function that merges the times series corresponding to the merged units (e.g. outages, inflows, etc.)
- Parameters:
plants – Pandas dataframe with final units after clustering (must contain ‘FormerUnits’)
oldplants – Pandas dataframe with the original units
data – Pandas dataframe with the time series and the original unit names as column header
method – Select the merging method (‘WeightedAverage’/’Sum’)
tablename – Name of the table being processed (e.g. ‘Outages’), used in the warnings
- Return merged:
Pandas dataframe with the merged time series when necessary
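The two merging methods can be sketched as follows (an illustrative stand-alone version with hypothetical names; capacity weighting is an assumed choice of weights, not confirmed by the source):

```python
import pandas as pd

# Illustrative sketch of merging the time series of clustered units
# (hypothetical helper, not the actual merge_series implementation).
def merge_time_series(data: pd.DataFrame, former_units, weights=None,
                      method="WeightedAverage") -> pd.Series:
    """Merge the columns of the units clustered into one new unit."""
    cols = data[list(former_units)]
    if method == "Sum":
        return cols.sum(axis=1)
    # WeightedAverage: weight each former unit, e.g. by installed capacity.
    w = pd.Series(weights, index=list(former_units))
    return (cols * w).sum(axis=1) / w.sum()

data = pd.DataFrame({"U1": [1.0, 0.0], "U2": [0.0, 1.0]})
merged = merge_time_series(data, ["U1", "U2"], weights=[300.0, 100.0])
print(merged.tolist())  # [0.75, 0.25]
```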
- dispaset.preprocessing.data_handler.read_Participation(sheet, rowstart, colstart, rowstop, colapart=1)[source]¶
Creates a dict entry for each technology, adding 0 for False and 1 for True (first value without CHP, second with CHP)
- Parameters:
sheet – Excel sheet to load data from
rowstart – Row to start reading the data
colstart – Column to start reading the data
rowstop – Row to stop reading the data
colapart – Number of columns apart to read the data
- dispaset.preprocessing.data_handler.read_truefalse(sheet, rowstart, colstart, rowstop, colstop, colapart=1)[source]¶
Function that reads a two-column format with a list of strings in the first column and a list of True/False values in the second column. The list of strings associated with a True value is returned.
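The selection logic can be sketched on plain rows, without the Excel-reading machinery (an illustrative version; the actual function reads cell ranges from a sheet):

```python
# Illustrative sketch of the two-column True/False selection
# (operates on plain (name, flag) pairs, not an Excel sheet).
def read_truefalse(rows):
    """Return the strings from a two-column (name, flag) layout flagged True."""
    # Keep the labels whose flag cell evaluates to a True-like value.
    return [name for name, flag in rows if flag in (True, 1, "TRUE", "True")]

rows = [("NUC", True), ("WIN", False), ("GAS", 1)]
print(read_truefalse(rows))  # ['NUC', 'GAS']
```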
dispaset.preprocessing.preprocessing module¶
This is the main file of the DispaSET pre-processing tool. It comprises a single function that generates the DispaSET simulation environment.
@author: S. Quoilin
- dispaset.preprocessing.preprocessing.build_simulation(config, mts_plot=None, MTSTimeStep=24)[source]¶
A function that builds different simulation environments based on the hydro scheduling option in the config file. Hydro scheduling options:
- Off - Hydro scheduling turned off; normal call of the BuildSimulation function
- Zonal - Zonal variation of hydro scheduling; if zones are not individually specified in a list (e.g. zones = [‘AT’,’DE’]), hydro scheduling is imposed on all active zones from the config file
- Regional - Regional variation of hydro scheduling; if zones from a specific region are not individually specified in a list (e.g. zones = [‘AT’,’DE’]), hydro scheduling is imposed on all active zones from the config file simultaneously
- Parameters:
config – Read config file
mts_plot – If mts_plot = True, an indicative plot with the temporarily computed reservoir levels is displayed
MTSTimeStep – Run the mid-term scheduling with a different time step (to speed things up). If unspecified, the old MTS formulation is used
- Return SimData:
Simulation data for unit-commitment module
- dispaset.preprocessing.preprocessing.mid_term_scheduling(config, TimeStep=None, mts_plot=None)[source]¶
This function reads the DispaSET config file, searches for active zones, loads data for each zone individually and solves the model using UCM_h_simple.gms
- Parameters:
config – Read config file
TimeStep – Time step (1, 2, 3, 4, 6, 8, 12 or 24): number of hours to be considered at once.
mts_plot – If mts_plot = True, an indicative plot with the temporarily computed reservoir levels is displayed
- Return profiles:
Newly computed profile levels
dispaset.preprocessing.utils module¶
This file gathers different functions used in the DispaSET pre-processing tools
@author: Sylvain Quoilin
- dispaset.preprocessing.utils.EfficiencyTimeSeries(config, plants, Temperatures)[source]¶
Function that calculates an efficiency time series for each unit. In the case of generation units, the efficiency is constant in time (for now). In the case of p2h units, the efficiency is defined as the COP, which can be temperature-dependent or not. If it is temperature-dependent, the formula is:
COP = COP_nom + coef_a * (T-T_nom) + coef_b * (T-T_nom)^2
- Parameters:
config – Dispa-SET config file
plants – Pandas dataframe with the original list of units
Temperatures – Dataframe with the temperature for all relevant units
- Returns:
Dataframe with a time series of the efficiency for each unit
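The COP formula above can be applied directly to a temperature series; a minimal sketch (the function name and signature are hypothetical, the formula is the one documented above):

```python
import pandas as pd

# Illustrative sketch of the documented temperature-dependent COP formula
# (hypothetical helper name, not the DispaSET source).
def cop_series(T: pd.Series, cop_nom: float, t_nom: float,
               coef_a: float = 0.0, coef_b: float = 0.0) -> pd.Series:
    """COP = COP_nom + coef_a * (T - T_nom) + coef_b * (T - T_nom)^2."""
    dT = T - t_nom
    return cop_nom + coef_a * dT + coef_b * dT ** 2

T = pd.Series([-10.0, 0.0, 10.0])
print(cop_series(T, cop_nom=3.0, t_nom=0.0, coef_a=0.05).tolist())  # [2.5, 3.0, 3.5]
```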
- dispaset.preprocessing.utils.adjust_capacity(inputs, tech_fuel, scaling=1, value=None, singleunit=False, write_gdx=False, dest_path='')[source]¶
Function used to modify the installed capacities in the Dispa-SET generated input data. The function updates the Inputs.p file in the simulation directory at each call.
- Parameters:
inputs – Input data dictionary OR path to the simulation directory containing Inputs.p
tech_fuel – tuple with the technology and fuel type for which the capacity should be modified
scaling – Scaling factor to be applied to the installed capacity
value – Absolute value of the desired capacity (! Applied only if scaling != 1 !)
singleunit – Set to true if the technology should remain lumped in a single unit
write_gdx – boolean defining if Inputs.gdx should be also overwritten with the new data
dest_path – Simulation environment path to write the new input data. If unspecified, no data is written!
- Returns:
New SimData dictionary
- dispaset.preprocessing.utils.adjust_flexibility(inputs, flex_units, slow_units, flex_ratio, singleunit=False, write_gdx=False, dest_path='')[source]¶
Function used to modify the share of the flexible capacity in the Dispa-SET input data. The function updates the Inputs.p file in the simulation directory at each call.
- Parameters:
inputs – Input data dictionary OR path to the simulation directory containing Inputs.p
flex_units – Dispa-SET units table filtered with only the flexible ones
slow_units – Dispa-SET units table filtered with only the slow ones
flex_ratio – Target flexibility ratio (single number for all zones)
singleunit – Set to true if the technology should remain lumped in a single unit
write_gdx – boolean defining if Inputs.gdx should be also overwritten with the new data
dest_path – Simulation environment path to write the new input data. If unspecified, no data is written!
- Returns:
New SimData dictionary
- dispaset.preprocessing.utils.adjust_ntc(inputs, value=None, write_gdx=False, dest_path='')[source]¶
Function used to modify the net transfer capacities in the Dispa-SET generated input data. The function updates the Inputs.p file in the simulation directory at each call.
- Parameters:
inputs – Input data dictionary OR path to the simulation directory containing Inputs.p
value – Absolute value of the desired net transfer capacity
write_gdx – boolean defining if Inputs.gdx should be also overwritten with the new data
dest_path – Simulation environment path to write the new input data. If unspecified, no data is written!
- Returns:
New SimData dictionary
- dispaset.preprocessing.utils.adjust_unit_capacity(SimData, u_idx, scaling=1, value=None, singleunit=False)[source]¶
Function used to modify the installed capacities in the Dispa-SET generated input data. The function updates the Inputs.p file in the simulation directory at each call.
- Parameters:
SimData – Input data dictionary
u_idx – names of the units to be scaled
scaling – Scaling factor to be applied to the installed capacity
value – Absolute value of the desired capacity (! Applied only if scaling != 1 !)
singleunit – Set to true if the technology should remain lumped in a single unit
- Returns:
New SimData dictionary
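The rescaling logic can be sketched as follows (an illustrative stand-alone version operating on a units table; the interaction between scaling and value here is an assumption, not the documented behaviour):

```python
import pandas as pd

# Illustrative sketch of rescaling the capacity of selected units
# (hypothetical helper, not the actual adjust_unit_capacity implementation).
def scale_capacity(units: pd.DataFrame, u_idx, scaling=1.0, value=None) -> pd.DataFrame:
    """Rescale the installed capacity of the selected units."""
    units = units.copy()
    if value is not None:
        # Rescale so the selected units sum to the requested absolute value.
        scaling = value / units.loc[u_idx, "PowerCapacity"].sum()
    units.loc[u_idx, "PowerCapacity"] *= scaling
    return units

units = pd.DataFrame({"PowerCapacity": [100.0, 200.0]}, index=["U1", "U2"])
print(scale_capacity(units, ["U1", "U2"], value=600.0)["PowerCapacity"].tolist())
# [200.0, 400.0]
```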
- dispaset.preprocessing.utils.clustering(plants_in, method='Standard', Nslices=20, PartLoadMax=0.1, Pmax=30)[source]¶
Merge excessively disaggregated power units.
- Parameters:
plants_in – Pandas dataframe with each power plant and their characteristics (following the DispaSET format)
method – Select clustering method (‘Standard’/’LP’/None)
Nslices – Number of slices used to fingerprint each power plant's characteristics. The slices categorize the plant data (fewer slices means that plants will be aggregated more easily)
PartLoadMax – Maximum part-load capability for the unit to be clustered
Pmax – Maximum power for the unit to be clustered
- Returns:
A list with the merged plants and the mapping between the original and merged units
@author: Matthias Zech
- dispaset.preprocessing.utils.create_agg_dict(df_, method='Standard')[source]¶
This function returns a dictionary with the proper aggregation method for each column of the units table, depending on the clustering method
Author: Matthias Zech
- dispaset.preprocessing.utils.group_plants(plants, method, df_grouped=False, group_list=None)[source]¶
This function returns the final dataframe with the merged units and their characteristics
- Parameters:
plants – Pandas dataframe with each power plant and their characteristics (following the DispaSET format)
method – Select clustering method (‘Standard’/’LP’/None)
df_grouped – Set to True if this plants dataframe has already been grouped and contains the column “FormerIndexes”
group_list – List of columns whose values must be identical in order to group two units
- Returns:
A list with the merged plants and the mapping between the original and merged units
- dispaset.preprocessing.utils.incidence_matrix(sets, set_used, parameters, param_used)[source]¶
This function generates the incidence matrix of the lines within the nodes. A particular case is considered for the node “Rest Of the World”, which is not explicitly defined in DispaSET
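The construction can be sketched as follows (an illustrative version; the " -> " naming of lines follows the Dispa-SET tables, but the sign convention shown is an assumption):

```python
import numpy as np

# Illustrative sketch of a line/node incidence matrix: -1 at the origin
# node, +1 at the destination (assumed sign convention).
def incidence_matrix(lines, nodes):
    """Build the line-by-node incidence matrix from 'A -> B' line names."""
    A = np.zeros((len(lines), len(nodes)))
    for i, line in enumerate(lines):
        origin, dest = line.split(" -> ")
        A[i, nodes.index(origin)] = -1
        A[i, nodes.index(dest)] = 1
    return A

nodes = ["AT", "DE", "FR"]
lines = ["AT -> DE", "DE -> FR"]
A = incidence_matrix(lines, nodes)
print(A)
```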
- dispaset.preprocessing.utils.interconnections(Simulation_list, NTC_inter, Historical_flows)[source]¶
Function that checks for the possible interconnections of the zones included in the simulation. If an interconnection occurs between two of the zones defined by the user for the simulation, it extracts the NTC between those two zones. If an interconnection occurs between one of the selected zones and a country outside the simulation, it extracts the physical flows; it does so for each pair (country inside, country outside) and sums them together, creating the interconnection of this country with the RoW.
- Parameters:
Simulation_list – List of simulated zones
NTC_inter – Day-ahead net transfer capacities (pd dataframe)
Historical_flows – Historical flows (pd dataframe)
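The splitting of internal NTCs versus aggregated RoW flows can be sketched as follows (an illustrative re-implementation under the " -> " line-naming convention; the helper name is hypothetical):

```python
import pandas as pd

# Illustrative sketch of the zone/RoW splitting logic (hypothetical helper,
# not the actual interconnections implementation).
def split_interconnections(zones, ntc, flows):
    """Keep NTCs between simulated zones; sum external flows into RoW lines."""
    internal = {line: ntc[line] for line in ntc.columns
                if all(z in zones for z in line.split(" -> "))}
    row = {}
    for line in flows.columns:
        z_from, z_to = line.split(" -> ")
        if (z_from in zones) != (z_to in zones):  # exactly one end is simulated
            inside = z_from if z_from in zones else z_to
            name = f"{inside} -> RoW" if z_from in zones else f"RoW -> {inside}"
            # Sum all flows of this zone with non-simulated countries.
            row[name] = row.get(name, 0) + flows[line]
    return pd.DataFrame(internal), pd.DataFrame(row)

ntc = pd.DataFrame({"AT -> DE": [1000.0], "AT -> CH": [500.0]})
flows = pd.DataFrame({"AT -> CH": [100.0], "AT -> IT": [50.0]})
internal, row_flows = split_interconnections(["AT", "DE"], ntc, flows)
print(list(internal.columns), row_flows["AT -> RoW"].tolist())
```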
- dispaset.preprocessing.utils.pd_timestep(hours)[source]¶
Function that converts time steps in hours into pandas frequencies (e.g. ‘1h’, ‘15min’, …)
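A minimal sketch of such a conversion (illustrative only; the exact frequency strings returned by the DispaSET function are not confirmed by the source):

```python
# Illustrative sketch of converting a time step in hours into a pandas
# frequency alias (assumed output format).
def pd_timestep(hours: float) -> str:
    """Convert a time step in hours into a pandas frequency string."""
    if hours >= 1:
        return f"{int(hours)}h"
    # Sub-hourly steps are expressed in minutes.
    return f"{int(hours * 60)}min"

print(pd_timestep(1))     # '1h'
print(pd_timestep(0.25))  # '15min'
```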
- dispaset.preprocessing.utils.select_units(units, config)[source]¶
Function returning a new list of units by removing the ones that have unknown technology, zero capacity, or unknown zone
- Parameters:
units – Pandas dataframe with the original list of units
config – Dispa-SET config dictionary
- Returns:
New list of units