Before uploading data to EnergyDataDK, data owners must provide specific information about their data.
This information is crucial for ensuring smooth data usage by both users and data owners.
Most of the information, unless otherwise specified, is visible to all users.
This guide consists of two parts: one on datasets and one on datastreams.
A dataset is essentially a collection of interrelated datastreams, so not much information is needed to set one up. A dataset must have a name, so it can be identified; an MQTT topic prefix, to identify the dataset to a data broker; and a description, to specify details about the data contained within the dataset. Optionally, a picture can be added to make the dataset visually easier to identify in the dataset overview.
Datasets must have a unique name and MQTT topic prefix. The latter is essentially the first level of a hierarchy.
In this example of an MQTT topic: denmark/hovedstaden/lyngby, the prefix would be denmark.
The prefix must consist of only hyphens, underscores, and alphanumeric characters.
Datasets must also have a proper description that covers the source, period, irregularities, and usage of the data, and that names a point of contact (POC).
Optionally, an image can be uploaded to graphically represent the dataset.
The name of the dataset will be the primary means by which EnergyDataDK users identify what data the dataset contains. The name should be intuitive for the data owner as well as for internal and external users. We suggest including a project, company, or lab name, as well as the type of information.
Here are some examples:
MQTT topics are a fundamental part of how the MQTT protocol routes messages between publishers and subscribers. They act as “addresses” that define where each message should be delivered. MQTT topics are hierarchical, with levels separated by slashes (/), so you may consider the prefix to be the first level of the hierarchy.
For example, if our MQTT topic were usa/california/san-francisco/silicon-valley, then usa would be our MQTT topic prefix.
A topic prefix is a single string of alphanumeric characters, underscores, and hyphens. Furthermore, since MQTT topics are case sensitive, it’s recommended to only use lowercase letters. The topic prefix is only visible to dataset owners. You can read more about its usage in the API description.
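The prefix rules above can be checked before a dataset is created. The following is a minimal sketch, not the platform's own validation: the function names are hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: only ASCII letters, digits, underscores, and hyphens are allowed.
# The check actually applied by EnergyDataDK may differ.
PREFIX_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_prefix(prefix: str) -> bool:
    """Return True if the prefix is a single, non-empty topic level."""
    return PREFIX_PATTERN.fullmatch(prefix) is not None


def is_recommended_prefix(prefix: str) -> bool:
    """Return True if the prefix is valid and contains no uppercase letters."""
    return is_valid_prefix(prefix) and prefix == prefix.lower()
```

Under these assumptions, `usa` passes both checks, `My_Lab-01` is valid but not recommended because of its uppercase letters, and `usa/california` is rejected because a prefix cannot contain a slash.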
The dataset description can be edited by data owners after the dataset’s creation and should be updated as soon as possible when the above information is available, or if anything changes.
The dataset consists of synthetic data generated for demonstration purposes. It contains randomly generated records representing different data types commonly used in structured datasets.
Dataset Structure:
- Alphanumeric Data: 2 independent datastreams containing randomly generated text strings.
- Integer Data: 3 independent datastreams with randomly generated numbers.
- Boolean Values: 1 datastream representing True/False values.
The dataset includes 100 records spanning from April 1st, 2023, to April 5th, 2023. There are missing values between 14:00 and 18:00 on the 4th of April due to the server maintenance carried out at that time. Data is recorded at hourly intervals. To use the dataset, the user must sign an NDA. For further details regarding the dataset and the NDA, please contact: example@email.com
Adding a picture is an optional feature, but it makes it easier to identify datasets.
If you have many datasets, avoid using the same picture for all of them, as that would defeat the purpose.
The picture should be intuitive both for the data owner and users with access to the data.
A datastream is essentially a channel where data from a sensor, measurement device or similar is received.
Each observation in the channel is a tuple consisting of a timestamp, indicating when the observation occurred, and the value that was measured.
All timestamps in EnergyDataDK are in UTC time.
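Because observations are (timestamp, value) pairs in UTC, locally recorded times need converting before upload. The sketch below illustrates one way to do that with the Python standard library; the `Observation` type and `make_observation` helper are hypothetical and not part of any EnergyDataDK client.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Union


class Observation(NamedTuple):
    """Hypothetical container for one observation: UTC timestamp plus value."""
    timestamp: datetime
    value: Union[int, float, str]


def make_observation(local_time: datetime, value: Union[int, float, str]) -> Observation:
    """Convert a timezone-aware local time to UTC and pair it with the value."""
    if local_time.tzinfo is None:
        # A naive datetime carries no offset, so it cannot be converted safely.
        raise ValueError("timestamp must be timezone-aware")
    return Observation(local_time.astimezone(timezone.utc), value)


# Example: a reading taken at 14:00 Danish summer time (UTC+02:00).
danish_summer_time = timezone(timedelta(hours=2))
obs = make_observation(datetime(2023, 4, 4, 14, 0, tzinfo=danish_summer_time), 21.5)
print(obs.timestamp.isoformat())  # 2023-04-04T12:00:00+00:00
```

Rejecting naive timestamps is a deliberate choice in this sketch: guessing the offset would silently shift the data by one or two hours.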
Each datastream is assigned a name, an MQTT topic suffix, and a data type, and is described by a number of mandatory tags (metadata) that qualify the data.
Datastreams must have a unique name and MQTT topic suffix. The latter is essentially the part of the MQTT topic beyond the prefix (first level) of the hierarchical structure. So in this example: denmark/hovedstaden/lyngby, the suffix would be hovedstaden/lyngby.
The suffix must only consist of hyphens, underscores, slashes and alphanumeric characters.
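The relationship between a full topic, its prefix, and its suffix can be sketched as follows. This is an illustration under the same assumptions as the rules above (ASCII alphanumerics only); the function names are hypothetical, and the platform's own validation may differ.

```python
import re
from typing import Tuple

# Assumption: ASCII letters, digits, underscores, hyphens, and slashes only.
SUFFIX_PATTERN = re.compile(r"[A-Za-z0-9_/-]+")


def split_topic(topic: str) -> Tuple[str, str]:
    """Split a full MQTT topic into (prefix, suffix) at the first slash."""
    prefix, _, suffix = topic.partition("/")
    return prefix, suffix


def is_valid_suffix(suffix: str) -> bool:
    """Return True if the suffix is non-empty and uses only allowed characters."""
    return SUFFIX_PATTERN.fullmatch(suffix) is not None


prefix, suffix = split_topic("denmark/hovedstaden/lyngby")
print(prefix)  # denmark
print(suffix)  # hovedstaden/lyngby
```

Note that this pattern only checks the character set; it does not reject leading, trailing, or doubled slashes, which a stricter check might also want to exclude.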
The type of data (integer, double, or string) in the datastream must be declared.
There is a fixed number of mandatory fields that must be filled out, and you can additionally add a virtually unlimited number of extra metadata fields.
As with the dataset, it’s important to choose the name carefully, so that any user can easily understand what data is recorded in the stream.
For example, if our MQTT topic were usa/california/san-francisco/silicon-valley, then california/san-francisco/silicon-valley would be our MQTT topic suffix.
Each datastream has a number of mandatory fields which qualify the data contained therein.
You can also add a virtually unlimited number of custom fields.
Here you should enter more detailed information about the datastream that isn’t already clear from its name.
Here are some CC licenses, which describe the terms of use. They are listed below from most to least permissive.
The geographical coordinates of where the data is collected. You can enter just a region or address in the text field; the system will offer matches to your query with their corresponding geo-location coordinates.
The name of the installation where the data collection takes place.
The name of the organization responsible for the data collection.
The name of the project the data is being collected for.
This categorizes the datastream by its subject. Since it will likely be similar to the search term used to find a particular datastream, it should be clear and concise.
Here are a few examples: “Solar energy”, “CO2 emission”, “district heating”, etc.
The unit of measurement for the data in the datastream. The system will suggest an option based on your input.
You have the option of adding a virtually unlimited number of extra metadata fields to your datastream, besides the mandatory ones. This could be anything you or other users may find relevant.
Keep in mind that the naming must be very clear and intuitive, since these will be nonstandard fields. You may also want to consider adding some documentation about these metadata fields in the description of the dataset.