Python OpenET Client Documentation

Caution

This project is not affiliated with OpenET and has been developed independently.

Installation

python -m pip install openet-client

If you want support for spatial operations that help with wrapping the geodatabase API, also run

python -m pip install openet-client[spatial]

This will attempt to install geopandas and fiona, which are required for spatial processing. These packages may fail to install (especially on Windows) because of their external dependencies. We recommend using a conda environment and the conda packages to simplify the install. In that case, install geopandas with conda, which pulls in the necessary dependencies, instead of running the above command.

You may also download the repository and run python setup.py install to use the package, replacing python with the full path to your python interpreter, if necessary.
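
Because the spatial extras are optional, it can help to confirm that they are importable before relying on the spatial helpers. The following is a minimal sketch using only the standard library; the function name has_spatial_support is hypothetical and is not part of openet-client.

```python
import importlib.util


def has_spatial_support() -> bool:
    """Return True when geopandas and fiona both appear importable.

    Hypothetical helper for illustration; it only checks that the
    packages can be located, not that they import without errors.
    """
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("geopandas", "fiona")
    )


if __name__ == "__main__":
    if has_spatial_support():
        print("Spatial extras look available.")
    else:
        print("geopandas and/or fiona not found; spatial helpers may not work.")
```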

Getting Started

The OpenET Client Python package provides convenience functions that make it easier to request and retrieve data from OpenET’s API. These include tools to match your spatial data to OpenET’s, send requests for raster data, and manage the download process.

Core functionality of the package includes:

  • Automatic matching of any geopandas-compatible dataset (including Shapefiles) with OpenET geodatabase API fields to make downloading timeseries data easier

  • Fully automated retrieval and attachment of ET values back to input datasets - provide the dataset and API parameters and receive the results back as a field on the input data

  • Send multiple requests to the raster API and have the client manage waiting for and downloading the data when it’s ready, or have your code continue doing other work and manually trigger a download check later on.

Note

Using this client still requires knowing some information about the OpenET API itself, including the API endpoints you wish to use and the parameters you want to send to the API endpoints. We recommend familiarizing yourself with the documentation for the OpenET API itself to use this Python package. While the API documentation expects you to handle everything related to sending requests to the API and understanding the response, this package handles those tasks for you, but you need to know which requests you want to send.

The OpenET API’s documentation can be found at https://open-et.github.io/docs/build/html/index.html

Using the Client with the OpenET Geodatabase API

The geodatabase API is supported, including functions that allow pulling ET data for spatial objects.

import os
import geopandas
import openet_client

features = "PATH TO YOUR SPATIAL DATA" # must be a format geopandas supports, which is most spatial data
df = geopandas.read_file(features)

client = openet_client.OpenETClient()
client.token = os.environ["OPENET_TOKEN"]
result = client.geodatabase.get_et_for_features(params={
        "aggregation": "mean",
        "feature_collection_name": "CA",
        "model": "ensemble_mean",
        "variable": "et",
        "start_date": 2018,
        "end_date": 2018
    },
    features=df,
    feature_type=openet_client.geodatabase.FEATURE_TYPE_GEOPANDAS,
    output_field="et_2018_mean_ensemble_mean",
    endpoint="timeseries/features/stats/annual"
)

More documentation for this portion of the API will be forthcoming, but note that, like the raster API, you provide a set of parameters that will be sent directly to OpenET based on the endpoint. The function get_et_for_features takes many additional parameters that indicate what kind of data you’re providing as an input, in this case a geopandas data frame. You also can provide different geodatabase feature endpoints.

This function then calculates the centroid of each feature, finds the fields in OpenET that are associated with those centroids, then downloads the ET data for those fields based on the params you provide. It attaches the ET to a data frame as a new field with the name specified in output_field. Note that for large features, it does not currently retrieve ET for multiple fields and aggregate them to the larger area. For that functionality, use the raster functionality.

This function also caches the field IDs for the features to avoid future lookups that use API quota. Rerunning the same features with different params will run significantly faster and use significantly fewer API requests behind the scenes.
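
Since repeat runs over the same features are described as cheap, one natural pattern is to prepare several parameter sets up front and run them one after another. The sketch below only builds the params dictionaries and output field names, following the naming used in the example above; build_annual_requests is a hypothetical helper, not part of the package, and the default values are simply those from the example.

```python
def build_annual_requests(years, model="ensemble_mean", variable="et",
                          aggregation="mean", collection="CA"):
    """Build (params, output_field) pairs, one per year.

    Hypothetical helper: it mirrors the params dictionary from the
    example above and does not contact the OpenET API.
    """
    requests_to_run = []
    for year in years:
        params = {
            "aggregation": aggregation,
            "feature_collection_name": collection,
            "model": model,
            "variable": variable,
            "start_date": year,
            "end_date": year,
        }
        # Same naming pattern as "et_2018_mean_ensemble_mean" above
        output_field = f"{variable}_{year}_{aggregation}_{model}"
        requests_to_run.append((params, output_field))
    return requests_to_run


annual_requests = build_annual_requests([2018, 2019])
for params, output_field in annual_requests:
    print(output_field, params["start_date"], params["end_date"])
```

Each pair could then be passed to get_et_for_features as the params and output_field arguments, with the other arguments as shown in the example above.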

Geodatabase API Access Class and Methods

class openet_client.Geodatabase(client)
feature_ids_list(params=None)

The base OpenET Method - sends the supplied params to metadata/openet/region_of_interest/feature_ids_list and returns the requests.Response object

Parameters:

params

Returns:

get_et_for_features(params, features, feature_type, output_field=None, geometry_field='geometry', endpoint='timeseries/features/stats/annual', wait_time=5000, batch_size=40, return_type='joined', join_type='outer')

Takes features in one of several data formats (geopandas or geojson; the format is specified by the user rather than detected by inspection), gets their coordinate values, looks up the OpenET field IDs for each coordinate pair, retrieves the ET data, and returns it as a geopandas data frame with the results in the specified output_field

Parameters:
  • params

  • features

  • endpoint – which features endpoint should it use?

  • return_type

    How should we return the data? Options are “raw” to return just the JSON from OpenET, “list” to return a list of dictionaries with the OpenET data, “pandas” to return a pandas data frame of the results, or “joined” to return the data joined back to the input data. “joined” is the default.

  • join_type – When merging results back in, what type of join should we use? Defaults to “outer” so that records are retained even if no results come back for them. This is also useful behavior when we have multiple timeseries records, such as for monthly results, but it can duplicate input records (not always desirable). To change the behavior, change this to any value supported by pandas.merge or change the return_type so no join occurs.

Returns:

get_et_for_openet_feature_list(feature_ids, endpoint, params, wait_time=5000, batch_size=40)

Retrieve ET for a list of OpenET Feature IDs and return the raw JSON data. To handle retrieving for spatial data you already have, use get_et_for_features instead.

Parameters:
  • feature_ids – a list of strings containing OpenET feature IDs

  • endpoint – The OpenET endpoint to run the request against. No default

  • params – The parameters as specified in the OpenET documentation (https://open-et.github.io)

  • wait_time – How long to wait between requests, in milliseconds, to avoid hitting a rate limit. Defaults to 5000 (5 seconds)

  • batch_size – How large of batches should we use by default? Defaults to 40

Returns:

A list of dictionaries as returned from JSON by the OpenET API
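
To make the batch_size and wait_time parameters concrete, here is a minimal sketch of batching a list of IDs and pausing between batches. It is an illustration of the general technique, not the package's actual implementation; batched, process_in_batches and the handle_batch callable are hypothetical names, and the wait is assumed to be in milliseconds.

```python
import time


def batched(items, batch_size=40):
    """Yield consecutive slices of items with at most batch_size entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(feature_ids, handle_batch, batch_size=40,
                       wait_time_ms=5000, sleep=time.sleep):
    """Run handle_batch on each batch, pausing between batches.

    handle_batch is a hypothetical callable that takes a list of IDs
    and returns a list of result dictionaries.
    """
    results = []
    batches = list(batched(feature_ids, batch_size))
    for index, batch in enumerate(batches):
        results.extend(handle_batch(batch))
        if index < len(batches) - 1:
            # Pause only between batches, not after the last one
            sleep(wait_time_ms / 1000)
    return results
```

With 95 IDs and the defaults, this produces batches of 40, 40 and 15 IDs with two pauses in between.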

get_feature_ids(features, field=None, wait_time=5000)

An internal method used to get a list of coordinate pairs and return the feature ID. Values come back as a dictionary where the input item in the list (coordinate pair shown as DD Longitude space DD latitude) is a dictionary key and the value is the OpenET featureID

Parameters:
  • features

  • field – when field is defined, features will be a pandas data frame with a field that has the coordinate values to use. In that case, results will be joined back to the data frame as the field openet_feature_id.

  • wait_time – how long in ms should we wait between subsequent requests?

Returns:
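
The dictionary described above is keyed by a string of the form "longitude latitude" in decimal degrees. The sketch below shows one way to build and parse such keys. Both helpers are hypothetical, the exact number formatting the package uses is an assumption, and the feature ID values are placeholders rather than real OpenET IDs.

```python
def coordinate_key(longitude, latitude):
    """Format a coordinate pair as 'longitude latitude' (assumed format)."""
    return f"{longitude} {latitude}"


def parse_coordinate_key(key):
    """Split a 'longitude latitude' key back into two floats."""
    lon_text, lat_text = key.split(" ")
    return float(lon_text), float(lat_text)


# Placeholder values standing in for a get_feature_ids result
feature_ids = {
    coordinate_key(-120.5, 37.5): "PLACEHOLDER_FEATURE_ID_A",
    coordinate_key(-120.25, 37.25): "PLACEHOLDER_FEATURE_ID_B",
}

for key, feature_id in feature_ids.items():
    longitude, latitude = parse_coordinate_key(key)
    print(longitude, latitude, feature_id)
```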

Using the Client with the OpenET Raster API

Examples

One common need is issuing a raster export command and then waiting to proceed until the image is available and downloaded to the current machine. To do that:

Warning

The current approach will make all exported rasters public in order to be able to automatically download them - do not proceed to use this code if that isn’t acceptable.

import openet_client

# arguments are in the form of a dictionary with keys and
# values conforming to https://open-et.github.io/docs/build/html/ras_export.html
# In the future, geometry may accept OGR or GEOS objects and create the string itself
arguments = {
    'start_date': '2016-01-01',
    'end_date': '2016-03-20',
    'geometry': '-120.72612533471566,37.553211935016215,-120.72612533471566,37.474782294423676,-120.59703597924691,37.474782294423676,-120.59703597924691,37.553211935016215',
    'filename_suffix': 'client_test',
    'variable': 'et',
    'model': 'ensemble',
    'units': 'metric'
}

client = openet_client.OpenETClient("your_open_et_token_value_here")

# note that the path matches OpenET's raster export endpoint
client.raster.export(arguments, synchronous=True)  # synchronous says to wait for it to download before proceeding
print(client.raster.downloaded_raster_paths)  # get the paths to the downloaded rasters (will be a list, even for a single raster)
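
The geometry value in the example above is a flat, comma-separated string of longitude,latitude pairs. If your coordinates start out as a list of pairs, a small helper can compose that string. This is a sketch only; geometry_string is a hypothetical name and is not provided by the package.

```python
def geometry_string(coordinates):
    """Flatten (longitude, latitude) pairs into 'lon,lat,lon,lat,...'."""
    return ",".join(str(value) for pair in coordinates for value in pair)


# The four corners used in the example above
corners = [
    (-120.72612533471566, 37.553211935016215),
    (-120.72612533471566, 37.474782294423676),
    (-120.59703597924691, 37.474782294423676),
    (-120.59703597924691, 37.553211935016215),
]
geometry = geometry_string(corners)
print(geometry)
```

The result can be used as the 'geometry' value in the arguments dictionary.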

Batching it

You may also want to queue up multiple rasters, then wait to download them all. To do that, run the raster.export commands with synchronous=False (the default), then issue a call to wait_for_rasters.

import openet_client

client = openet_client.OpenETClient("your_open_et_token_value_here")
arguments1 = {}  # some set of arguments, similar to the first example
arguments2 = {}  # same
client.raster.export(arguments1)
client.raster.export(arguments2)
client.raster.wait_for_rasters()  # this will keep running until all rasters are downloaded - it will wait up to a day by default, but that's configurable by providing a `max_time` argument in seconds
print(client.raster.downloaded_raster_paths)  # a list with all downloaded rasters
# or
rasters = client.raster.registry.values()  # get all the Raster objects including remote URLs and local paths

Doing work while you wait + manual control

You might also not want to wait around for the rasters to export, but still have control over the process. Here’s how to manually control the flow:

import openet_client

client = openet_client.OpenETClient("your_open_et_token_value_here")
arguments = {}  # some set of arguments, similar to the first example
my_raster = client.raster.export(arguments)

# ... any other code you like here - the OpenET API will do its work and make the raster ready - or not, depending on your place in their queue ...

client.raster.check_statuses()  # check the API's all_files endpoint to see which rasters are ready
if my_raster.status == openet_client.raster.STATUS_AVAILABLE:  # check that the raster we want is now ready
    client.raster.download_available_rasters()  # try to download the ones that are ready and not yet downloaded (from this session)

Raster API Class and Methods

class openet_client.raster.Raster(request_result)

Internal object for managing raster exports - tracks current status, the remote URL and the local file path once it exists. Users of this package shouldn’t need to instantiate this object directly in most cases.

download_file(retry_interval=20, max_wait=600)

Attempts to download a raster, assuming it’s ready for download. Will make multiple attempts over a few minutes because sometimes it takes a while for the permissions to propagate, so the first few responses may give a 403 error. We then have a timeout (max_wait) where if we exceed that value, we exit anyway without downloading.

Downloads the file to a tempfile path - the user may move the file after that if they wish.

Parameters:
  • retry_interval – time in seconds between repeated attempts

  • max_wait – How long, in seconds should we wait for the correct permissions before stopping attempts to download.

Returns:
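
The retry behavior described above (repeat at a fixed interval, give up once a maximum wait is exceeded) can be sketched generically as follows. This is not the package's implementation; retry_until_ready and the attempt callable are hypothetical, and the sleep and clock arguments exist only so the loop can be exercised without real waiting.

```python
import time


def retry_until_ready(attempt, retry_interval=20, max_wait=600,
                      sleep=time.sleep, clock=time.monotonic):
    """Call attempt() until it returns True or max_wait seconds pass.

    attempt is a hypothetical callable that returns True on success
    (for example, once a download stops answering with a 403).
    """
    deadline = clock() + max_wait
    while True:
        if attempt():
            return True
        # Stop if another wait would take us past the deadline
        if clock() + retry_interval > deadline:
            return False
        sleep(retry_interval)
```

A caller would pass a function that tries the download once and reports whether it succeeded.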

class openet_client.raster.RasterManager(client)

The manager that becomes the .raster attribute on the OpenETClient object. Handles submitting raster export requests and polling for completed exports.

As constructed, it could slow down if it handles many thousands of raster exports, or if the all_files OpenET endpoint lists a large number of files.

Generally speaking, you won’t create this object yourself, but you can set client.raster.wait_interval to the length of time, in seconds, that the manager should wait between polling the all_files endpoint for new exports when waiting for new rasters.

property available_rasters

Which rasters have we marked as ready to download, but haven’t yet been retrieved?

Returns:

check_statuses(rasters=None)

Updates the status information on each raster only - does not attempt to download them.

Parameters:

rasters – the list of rasters to update the status of - if not provided, defaults to all rasters that are queued and not downloaded

Returns:

None

download_available_rasters()

Attempts to download all available rasters individually

Returns:

export(params=None, synchronous=False, public=True, transform=False)

Handles the raster/export endpoint for OpenET. Optionally waits for the raster to be exported and downloaded before proceeding. See documentation examples for usage details.

Parameters:
  • params – A dictionary of arguments with keys matching the raster/export endpoint’s parameters and values matching the requirements for the values of those keys. If the “geometry” key of this value is a GEOS or an OGR object with a .coords attribute, then the correct geometry string to send to the API will be composed for you. Otherwise, it is your responsibility to match the values with the API’s requirements.

  • synchronous – Whether or not to wait for the raster to export and be downloaded before exiting this function and proceeding

  • public – Whether or not to make the raster public - at this point, keeping this as True is required for all the features of this package to work, but if you just want to use the package to submit a bunch of raster jobs, but not to download those rasters, then you may set this to False.

  • transform – If a GEOS or OGR object with transform and coords attributes is passed in as part of params[“geometry”], such as an object from GeoDjango, then you can optionally specify that this function handle a simple transformation to WGS84 (EPSG 4326) for you. For more complicated transformations or datum transformations, it may be best to handle the transformation before running this function. Setting it to False doesn’t control whether a GEOS/OGR object is used, only whether its coordinates are transformed first.

Returns:

Raster object - when synchronous, the local_file attribute will have the path to the downloaded raster on disk - otherwise it will have the status of the raster

property queued_rasters

Which rasters are we still waiting for?

Returns:

wait_for_rasters(uuid=None, max_time=86400)

When we want to just wait until the rasters are ready, we call this method, which polls the all_files endpoint at set intervals and checks which rasters are done.

It then updates the statuses on individual rasters and calls the download functions for individual rasters any time some are ready to download, but won’t exit until all rasters are available and have had at least one download attempt.

It is recommended to use this after queueing a batch of rasters for running so that they may be exporting all at the same time before waiting - it will wait until all are exported before returning flow control to the calling function. Running this after each raster export will result in much longer runtimes (because exports will not run in parallel).

Parameters:
  • uuid

  • max_time – Maximum time in seconds to wait for all rasters to complete - defaults to 86400 (a day)

Returns:

Raster Timeseries Data

class openet_client.timeseries.RasterTimeSeries(raster_manager)
point_sample(longitude, latitude, start_date, end_date, interval='monthly', make_lookup=False, **params)

A general function to retrieve the timeseries data from OpenET for a specific coordinate and date range. Uses the Raster API endpoint at raster/timeseries/point to retrieve data. This function retrieves the full timeseries between the start and the end date, but convenience functions for retrieving a single value are also available as single_month_point_sample or single_day_point_sample

This function accepts named keyword arguments for the most common items, and all other parameters to the endpoint can be provided as additional keyword arguments to this function. You do not need to include the keys lon or lat. If provided, they will be overwritten by the values of the longitude and latitude arguments (note that this function uses the full words longitude and latitude).

This function returns a list of dictionaries, as provided by the OpenET API. Each dictionary represents a single observation/value and will have a key time, whose value indicates the timepoint of the observation. The dictionaries will also have a second key for the variable returned (defaults to et) - see the OpenET API documentation for options.

Parameters:
  • longitude – Longitude portion of the coordinate to retrieve data from, in decimal degrees. See OpenET API documentation for any additional specifications for this item (param lon to the OpenET API)

  • latitude – Latitude portion of the coordinate to retrieve data from, in decimal degrees

  • start_date – The date to start the sample. Can be a Python standard library datetime.datetime object, an arrow.Arrow object or a string in “YYYY-MM-DD” format.

  • end_date – The date to end the sample. Can be a Python standard library datetime.datetime object, an arrow.Arrow object or a string in “YYYY-MM-DD” format.

  • interval – The time step to use in the timeseries. The OpenET API documentation doesn’t specify valid values here, but monthly and daily are both known allowed values. When using the monthly timestep, the returned timeseries will use dates for the first of every month within the timeseries. See the return description below for more details

  • make_lookup

    By default, the API returns a list of dictionaries that each have a key for time and the variable requested (e.g. et). To find specific values in that list you would need to search all objects. If, instead, you want to rely on a known structure of the time values (e.g., that with monthly data, values will look like 2018-01-01 then 2018-02-01), you can set make_lookup to True. This flag changes the return type of the function into a dictionary, where the keys are the dates and the values are the data in the variable field of the original dictionaries (e.g. the et value).

    For example, if their API returned the following data for a request, by default, this function would return a similar Python representation of the data:

    [
            {
                    "time": "2018-01-01",
                    "et": 30
            },
            {
                    "time": "2018-02-01",
                    "et": 52
            },
    ]
    

    If make_lookup is set to True then this function will instead return the following dictionary

    {
            "2018-01-01": 30,
            "2018-02-01": 52,
    }
    

  • params – Additional keyword arguments that the OpenET API allows can be provided to this function and they will be passed along to the API. Do not provide a keyword params to this function. Instead, provide keyword arguments that match the OpenET API’s parameter names

Returns:

See make_lookup above for return behavior, which is dependent on the value of make_lookup. Either a list of dictionaries (loaded from JSON) by default, or a dictionary when make_lookup == True
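
The transformation that make_lookup describes amounts to turning a list of records into a dictionary keyed by time. If you already have the default list form, the same result can be produced by hand. In this sketch, records_to_lookup is a hypothetical helper, and the variable key is assumed to be et as in the example data above.

```python
def records_to_lookup(records, variable="et"):
    """Map each record's 'time' value to its value for the given variable."""
    return {record["time"]: record[variable] for record in records}


# The example data shown for make_lookup above
records = [
    {"time": "2018-01-01", "et": 30},
    {"time": "2018-02-01", "et": 52},
]
lookup = records_to_lookup(records)
print(lookup["2018-02-01"])
```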

single_day_point_sample(longitude, latitude, date, **params)

The point_sample function can return an arbitrary timeseries for a single point. This function instead returns only the ET value specified, or other variable if an argument is provided for the API. Optional additional parameters may be sent to the same endpoint as keyword arguments to this function - the same as other functions in this same timeseries module. Returns a single value for the day specified.

If you need to retrieve multiple values, you may stay within your OpenET API quotas better with well-constructed use of the point_sample, but for scattered or small numbers of samples, this function may be easier to use.

Parameters:
  • longitude

    Longitude portion of the coordinate to retrieve data from, in decimal degrees. See OpenET API documentation for any additional specifications for this item (param lon to the OpenET API)

  • latitude – Latitude portion of the coordinate to retrieve data from, in decimal degrees

  • date – The day to obtain the sample for. The date may be a Python standard library datetime.datetime object, an arrow.Arrow object or a string in “YYYY-MM-DD” format.

  • params – Additional keyword arguments that the OpenET API allows can be provided to this function and they will be passed along to the API. Do not provide a keyword params to this function. Instead, provide keyword arguments that match the OpenET API’s parameter names. Note that in this function, you should not provide keyword arguments for interval, lat, lon, start_date, or end_date, as these values will be created automatically by this function.

Returns:

A single value (may be a string, check the type before using) for the variable of interest, in the units returned by the API for the day specified. By default, the API will return values for ET, from the ensemble model, in metric units. To change these parameters, pass the appropriate additional keyword arguments for the API into this function.
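
Because the returned value may be a string, it is worth converting it explicitly before doing arithmetic with it. The following is a minimal sketch; as_float is a hypothetical helper, and returning None for unusable values is a design choice made for this illustration.

```python
def as_float(value):
    """Convert a returned sample to float, or None if it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # Covers None and non-numeric strings
        return None


for sample in ("1.5", 2, None, "n/a"):
    print(repr(sample), as_float(sample))
```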

single_month_point_sample(longitude, latitude, date, **params)

The point_sample function can return an arbitrary timeseries for a single point. This function instead returns only the ET value specified, or other variable if an argument is provided for the API. Optional additional parameters may be sent to the same endpoint as keyword arguments to this function - the same as other functions in this same timeseries module. Returns a single value for the month specified.

If you need to retrieve multiple values, you may stay within your OpenET API quotas better with well-constructed use of the point_sample, but for scattered or small numbers of samples, this function may be easier to use.

Parameters:
  • longitude

    Longitude portion of the coordinate to retrieve data from, in decimal degrees. See OpenET API documentation for any additional specifications for this item (param lon to the OpenET API)

  • latitude – Latitude portion of the coordinate to retrieve data from, in decimal degrees

  • date – The month to obtain the sample for. The date may be a Python standard library datetime.datetime object, an arrow.Arrow object or a string in “YYYY-MM-DD” format.

  • params – Additional keyword arguments that the OpenET API allows can be provided to this function and they will be passed along to the API. Do not provide a keyword params to this function. Instead, provide keyword arguments that match the OpenET API’s parameter names. Note that in this function, you should not provide keyword arguments for interval, lat, lon, start_date, or end_date, as these values will be created automatically by this function.

Returns:

A single value (may be a string, check the type before using) for the variable of interest, in the units returned by the API for the month specified. By default, the API will return values for ET, from the ensemble model, in metric units. To change these parameters, pass the appropriate additional keyword arguments for the API into this function.
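
The documentation above says only that start_date and end_date are created automatically for the month. One plausible way to derive a month's bounds from a single date is sketched below; this is an assumption about the approach, not the package's confirmed behavior, and month_bounds is a hypothetical helper.

```python
import calendar
import datetime


def month_bounds(date):
    """Return (first_day, last_day) of the date's month as YYYY-MM-DD strings.

    Accepts a 'YYYY-MM-DD' string or any object with year and month
    attributes, such as datetime.datetime.
    """
    if isinstance(date, str):
        date = datetime.datetime.strptime(date, "%Y-%m-%d")
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(date.year, date.month)[1]
    first = datetime.date(date.year, date.month, 1)
    last = datetime.date(date.year, date.month, last_day)
    return first.isoformat(), last.isoformat()


print(month_bounds("2018-02-15"))
```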

Sending Your Own Requests to the API

Client Class and Methods

class openet_client.OpenETClient(token=None)
send_request(endpoint, method='get', disable_encoding=False, **kwargs)

Handles sending most requests to the API - you provide the endpoint and the args. Since the API is in the process of switching from GET to POST requests, we have logic that switches between those depending on the request method

Parameters:
  • endpoint – The text path to the OpenET endpoint - e.g. raster/export - skip the base URL.

  • method – “get” or “post” (case sensitive) - should match what the API supports for the endpoint

  • kwargs – The arguments to send (via get or post) to the API

Returns:

requests.Response object of the results.
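
To illustrate the difference between the two methods, the sketch below shows how the same arguments would typically travel in the query string for a GET request and in the request body for a POST request. It does not send anything and is not the package's implementation; describe_request is a hypothetical helper, BASE_URL is a placeholder rather than the real OpenET base URL, and how the package encodes a POST body is not stated in this documentation.

```python
from urllib.parse import urlencode, urljoin

# Placeholder only; this is NOT the real OpenET base URL
BASE_URL = "https://api.example.invalid/"


def describe_request(endpoint, method="get", **kwargs):
    """Describe where the arguments would go for a GET or POST request."""
    if method not in ("get", "post"):
        # Mirrors the case-sensitive method names described above
        raise ValueError("method must be 'get' or 'post'")
    url = urljoin(BASE_URL, endpoint.lstrip("/"))
    if method == "get":
        query = urlencode(kwargs)
        full_url = f"{url}?{query}" if query else url
        return {"method": "GET", "url": full_url, "body": None}
    return {"method": "POST", "url": url, "body": kwargs}


print(describe_request("raster/export", variable="et"))
print(describe_request("raster/export", method="post", variable="et"))
```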

The Data Download Cache
