CosmiQ Works GeoData API reference

cw-geodata class and function list

cw_geodata.raster_image.image.get_geo_transform(…)

Get the geotransform for a raster image source.

cw_geodata.vector_label.polygon.affine_transform_gdf(…)

Perform an affine transformation on a GeoDataFrame.

cw_geodata.vector_label.polygon.convert_poly_coords(geom)

Georegister geometry objects currently in pixel coords or vice versa.

cw_geodata.vector_label.polygon.geojson_to_px_gdf(…)

Convert a geojson or set of geojsons from geo coords to px coords.

cw_geodata.vector_label.polygon.georegister_px_df(df)

Convert a dataframe of geometries in pixel coordinates to a geo CRS.

cw_geodata.vector_label.polygon.get_overlapping_subset(gdf)

Extract a subset of geometries in a GeoDataFrame that overlap with im.

cw_geodata.vector_label.graph.geojson_to_graph(geojson)

Convert a geojson of path strings to a network graph.

cw_geodata.vector_label.graph.get_nodes_paths(…)

Extract nodes and paths from a vector file.

cw_geodata.vector_label.graph.process_linestring

cw_geodata.vector_label.mask.boundary_mask([…])

Create a pixel mask labeling object boundaries.

cw_geodata.vector_label.mask.contact_mask(df)

Create a pixel mask labeling closely juxtaposed objects.

cw_geodata.vector_label.mask.df_to_px_mask(df)

Convert a dataframe of geometries to a pixel mask.

cw_geodata.vector_label.mask.footprint_mask(df)

Convert a dataframe of geometries to a pixel mask.

cw_geodata.utils.geo.geometries_internal_intersection(…)

Get the intersection geometries between all geometries in a set.

cw_geodata.utils.geo.list_to_affine(xform_mat)

Create an Affine from a list or array formatted as [a, b, d, e, xoff, yoff].

cw_geodata.utils.geo.split_multi_geometries(gdf)

Split apart MultiPolygon or MultiLineString geometries.

Raster/Image functionality

Image submodule

cw_geodata.raster_image.image.get_geo_transform(raster_src)

Get the geotransform for a raster image source.

Parameters

raster_src (str, rasterio.DatasetReader, or osgeo.gdal.Dataset) – Path to a georeferenced raster image, or an opened rasterio.DatasetReader or osgeo.gdal.Dataset object.

Returns

transform – An affine transformation object mapping pixel coordinates to the image’s location in its CRS.

Return type

affine.Affine
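For context, a GDAL dataset reports its geotransform through GetGeoTransform() as (c, a, b, f, d, e), while affine.Affine stores the same six coefficients as (a, b, c, d, e, f); affine.Affine.from_gdal(*gt) converts between the two. The stdlib-only sketch below is not get_geo_transform's implementation. AffineCoeffs, from_gdal and pixel_to_geo are hypothetical names used to show the reordering and how the coefficients map a pixel (col, row) to CRS coordinates.

```python
# Sketch: how the six coefficients of a north-up geotransform map a pixel
# (col, row) to CRS coordinates. Stdlib only; not cw_geodata code.
from typing import NamedTuple, Tuple


class AffineCoeffs(NamedTuple):
    """Affine order: x' = a*col + b*row + c, y' = d*col + e*row + f."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float


def from_gdal(gt: Tuple[float, float, float, float, float, float]) -> AffineCoeffs:
    """Reorder a GDAL GetGeoTransform() tuple (c, a, b, f, d, e) into affine order."""
    c, a, b, f, d, e = gt
    return AffineCoeffs(a, b, c, d, e, f)


def pixel_to_geo(t: AffineCoeffs, col: float, row: float) -> Tuple[float, float]:
    """Map a pixel location to coordinates in the raster's CRS."""
    return (t.a * col + t.b * row + t.c, t.d * col + t.e * row + t.f)


# Hypothetical 0.5 m north-up raster with its upper-left corner at (500000, 4100000).
gdal_gt = (500000.0, 0.5, 0.0, 4100000.0, 0.0, -0.5)
transform = from_gdal(gdal_gt)
print(pixel_to_geo(transform, 0, 0))      # (500000.0, 4100000.0)
print(pixel_to_geo(transform, 100, 200))  # (500050.0, 4099900.0)
```

The negative e coefficient is what makes row numbers increase southward in a north-up image.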

Vector/Label functionality

Polygon submodule

cw_geodata.vector_label.polygon.affine_transform_gdf(gdf, affine_obj, inverse=False, geom_col='geometry', precision=None)

Perform an affine transformation on a GeoDataFrame.

Parameters
  • gdf (geopandas.GeoDataFrame, pandas.DataFrame, or str) – A GeoDataFrame, pandas DataFrame with a "geometry" column (or a different column containing geometries, identified by geom_col - note that this column will be renamed "geometry" for ease of use with geopandas), or the path to a saved file in .geojson or .csv format.

  • affine_obj (list or affine.Affine) – An affine transformation to apply to the geometries in gdf, in the form of an [a, b, d, e, xoff, yoff] list or an affine.Affine object.

  • inverse (bool, optional) – If True, perform the inverse transformation. Defaults to False.

  • geom_col (str, optional) – The column in gdf corresponding to the geometry. Defaults to 'geometry'.

  • precision (int, optional) – Decimal precision to round the geometries to. If not provided, no rounding is performed.

cw_geodata.vector_label.polygon.convert_poly_coords(geom, raster_src=None, affine_obj=None, inverse=False, precision=None)

Georegister geometry objects currently in pixel coords or vice versa.

Parameters
  • geom (shapely.geometry.shape or str) – A shapely.geometry.shape or a WKT-formatted geometry string, in pixel coordinates (or in geospatial coordinates if inverse=True).

  • raster_src (str, rasterio.DatasetReader, or osgeo.gdal.Dataset, optional) – Path to a raster image with georeferencing data to apply to geom. Alternatively, an opened rasterio.DatasetReader or osgeo.gdal.Dataset object can be provided. Required if not using affine_obj.

  • affine_obj (list or affine.Affine, optional) – An affine transformation to apply to geom in the form of an [a, b, d, e, xoff, yoff] list or an affine.Affine object. Required if not using raster_src.

  • inverse (bool, optional) – If True, perform the inverse affine transformation, going from geospatial coordinates to pixel coordinates.

  • precision (int, optional) – Decimal precision for the polygon output. If not provided, rounding is skipped.

Returns

out_geom – A geometry in the same format as the input, with its coordinates transformed into pixel or geospatial coordinates as requested.

Return type

shapely.geometry.shape or str (matching the input)
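convert_poly_coords applies the affine matrix to every vertex of geom, or its inverse when inverse=True. The sketch below shows that arithmetic on a plain list of (x, y) tuples instead of a shapely geometry, assuming the shapely.affinity.affine_transform convention for the [a, b, d, e, xoff, yoff] list (x' = a*x + b*y + xoff, y' = d*x + e*y + yoff). transform_ring is a hypothetical helper, not part of cw_geodata.

```python
# Sketch of pixel <-> geo conversion for a ring of vertices using the
# [a, b, d, e, xoff, yoff] layout. Stdlib only; transform_ring is hypothetical.
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float]


def transform_ring(ring: Sequence[Coord], matrix: Sequence[float],
                   inverse: bool = False,
                   precision: Optional[int] = None) -> List[Coord]:
    a, b, d, e, xoff, yoff = matrix
    if inverse:
        # Invert x' = a*x + b*y + xoff, y' = d*x + e*y + yoff.
        det = a * e - b * d
        if det == 0:
            raise ValueError("matrix is not invertible")
        ia, ib, id_, ie = e / det, -b / det, -d / det, a / det
        ixoff = -(ia * xoff + ib * yoff)
        iyoff = -(id_ * xoff + ie * yoff)
        a, b, d, e, xoff, yoff = ia, ib, id_, ie, ixoff, iyoff
    out = []
    for x, y in ring:
        nx, ny = a * x + b * y + xoff, d * x + e * y + yoff
        if precision is not None:
            nx, ny = round(nx, precision), round(ny, precision)
        out.append((nx, ny))
    return out


# Hypothetical 0.5 m north-up transform; the ring is a 10 x 10 pixel square.
matrix = [0.5, 0.0, 0.0, -0.5, 500000.0, 4100000.0]
px_square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
geo_square = transform_ring(px_square, matrix)
back = transform_ring(geo_square, matrix, inverse=True, precision=6)
print(geo_square[2])  # (500005.0, 4099995.0)
print(back == px_square)  # True
```

The round trip with inverse=True recovers the original pixel coordinates, which is the relationship between the forward and inverse modes described above.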

cw_geodata.vector_label.polygon.geojson_to_px_gdf(geojson, im_path, precision=None)

Convert a geojson or set of geojsons from geo coords to px coords.

Parameters
  • geojson (str) – Path to a geojson. This function will also accept a pandas.DataFrame or geopandas.GeoDataFrame with a column named 'geometry' in this argument.

  • im_path (str) – Path to a georeferenced image (i.e. a GeoTIFF) that geolocates to the same geography as the geojson(s). If a directory, the bounds of each GeoTIFF will be loaded in and all overlapping geometries will be transformed. This function will also accept an osgeo.gdal.Dataset or rasterio.DatasetReader with georeferencing information in this argument.

  • precision (int, optional) – The decimal precision for output geometries. If not provided, the vertex locations won’t be rounded.

Returns

output_df – A pandas.DataFrame with all geometries in geojson that overlapped with the image at im_path converted to pixel coordinates. Additional columns are included with the filename of the source geojson (if available) and images for reference.

Return type

pandas.DataFrame

cw_geodata.vector_label.polygon.georegister_px_df(df, im_fname=None, affine_obj=None, crs=None, geom_col='geometry', precision=None)

Convert a dataframe of geometries in pixel coordinates to a geo CRS.

Parameters
  • df (pandas.DataFrame) – A pandas.DataFrame with polygons in the column identified by geom_col (by default, "geometry").

  • im_fname (str, optional) – A filename or rasterio.DatasetReader object containing an image that has the same bounds as the pixel coordinates in df. If not provided, affine_obj and crs must both be provided.

  • affine_obj (list or affine.Affine, optional) – An affine transformation to apply to the geometries in df, in the form of an [a, b, d, e, xoff, yoff] list or an affine.Affine object. Required if not using im_fname.

  • crs (dict, optional) – The coordinate reference system for the output GeoDataFrame. Required if not providing a raster image to extract the information from. Format should be {'init': 'epsg:xxxx'}, replacing xxxx with the EPSG code.

  • geom_col (str, optional) – The column containing geometry in df. If not provided, defaults to "geometry".

  • precision (int, optional) – The decimal precision for output geometries. If not provided, the vertex locations won’t be rounded.

cw_geodata.vector_label.polygon.get_overlapping_subset(gdf, im=None, bbox=None, bbox_crs=None)

Extract a subset of geometries in a GeoDataFrame that overlap with im.

Notes

This function uses the rtree spatial index, which is much faster than comparing each object directly for overlap but slightly less accurate, because the index compares bounding boxes rather than exact geometries.

Parameters
  • gdf (geopandas.GeoDataFrame or str) – A geopandas.GeoDataFrame instance or a path to a geojson.

  • im (rasterio.DatasetReader or str, optional) – An image object loaded with rasterio or a path to a georeferenced image (i.e. a GeoTIFF).

  • bbox (list or shapely.geometry.Polygon, optional) – A bounding box (either a shapely.geometry.Polygon or a [left, bottom, right, top] list) from an image. Has no effect if im is provided (bbox is inferred from the image instead). If bbox is passed and im is not, a bbox_crs should be provided to ensure correct geolocation; if it isn’t, bbox will be assumed to have the same CRS as gdf.

Returns

output_gdf – A geopandas.GeoDataFrame with all geometries in gdf that overlapped with the image at im. Coordinates are kept in the CRS of gdf.

Return type

geopandas.GeoDataFrame
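An R-tree query of the kind described in the Notes compares bounding boxes, not exact shapes, which is the usual source of such false positives. The brute-force stdlib sketch below (hypothetical names, no spatial index) applies the same box test using the (minx, miny, maxx, maxy) order that shapely's .bounds returns. In the example, the triangle is returned even though only its bounding box, not the triangle itself, reaches the query box.

```python
# Sketch of the bounding-box test behind an R-tree overlap query.
# Stdlib only; names are hypothetical, not cw_geodata's.
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
Ring = Sequence[Tuple[float, float]]


def ring_bounds(ring: Ring) -> Box:
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(b1: Box, b2: Box) -> bool:
    # Boxes that merely touch are counted as overlapping.
    return b1[0] <= b2[2] and b2[0] <= b1[2] and b1[1] <= b2[3] and b2[1] <= b1[3]


def bbox_subset(features: Dict[str, Ring], bbox: Box) -> List[str]:
    """Return IDs of features whose bounding box overlaps bbox."""
    return [fid for fid, ring in features.items()
            if boxes_overlap(ring_bounds(ring), bbox)]


features = {
    "small": [(1, 1), (3, 1), (3, 3), (1, 1)],
    "triangle": [(0, 0), (10, 0), (0, 10), (0, 0)],
    "far": [(50, 50), (60, 50), (60, 60), (50, 50)],
}
# The triangle's hypotenuse (x + y = 10) never reaches this box, but its
# bounding box (0, 0, 10, 10) does, so it is reported as a candidate.
print(bbox_subset(features, (8, 8, 12, 12)))  # ['triangle']
```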

Graph submodule

class cw_geodata.vector_label.graph.Edge(nodes, edge_weight=None)

An object to hold edge attributes.

nodes

Node instances connected by the edge.

Type

2-tuple of Node objects

weight

The weight of the edge.

Type

int or float

get_node_idxs()

Return the Node.idx for the nodes in the edge.

set_edge_weight(normalize_factor=None, inverse=False)

Calculate and set the edge weight based on the Euclidean distance between nodes.

Note

This method does not account for spherical deformation (i.e. does not use the Haversine equation). It is a simple linear distance.

Parameters
  • normalize_factor (int or float, optional) – A number to multiply (or divide, if inverse=True) the Euclidean distance by. Defaults to None (no normalization).

  • inverse (bool, optional) – If True, the Euclidean distance weight will be divided by normalize_factor instead of multiplied by it.
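The weighting rule above can be sketched with two minimal stand-in classes. SketchNode and SketchEdge are hypothetical, not the cw_geodata Node and Edge classes; they only mirror the documented normalize_factor and inverse semantics and the planar (non-Haversine) distance.

```python
# Sketch of Euclidean edge weighting with optional normalization.
# Hypothetical stand-in classes, not cw_geodata's Node/Edge implementations.
import math
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Number = Union[int, float]


@dataclass
class SketchNode:
    idx: int
    x: Number
    y: Number


@dataclass
class SketchEdge:
    nodes: Tuple[SketchNode, SketchNode]
    weight: Optional[float] = None

    def set_edge_weight(self, normalize_factor: Optional[Number] = None,
                        inverse: bool = False) -> None:
        n1, n2 = self.nodes
        # Straight-line distance in the nodes' coordinate units; no Haversine.
        dist = math.hypot(n2.x - n1.x, n2.y - n1.y)
        if normalize_factor is not None:
            dist = dist / normalize_factor if inverse else dist * normalize_factor
        self.weight = dist


edge = SketchEdge((SketchNode(0, 0, 0), SketchNode(1, 3, 4)))
edge.set_edge_weight()
print(edge.weight)  # 5.0
edge.set_edge_weight(normalize_factor=2, inverse=True)
print(edge.weight)  # 2.5
```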

class cw_geodata.vector_label.graph.Node(idx, x, y)

An object to hold node attributes.

idx

The numerical index of the node. Used as a unique identifier when the nodes are added to the graph.

Type

int

x

Numeric x location of the node, in either a geographic CRS or in pixel coordinates.

Type

int or float

y

Numeric y location of the node, in either a geographic CRS or in pixel coordinates.

Type

int or float

class cw_geodata.vector_label.graph.Path(edges=None, properties=None)

An object to hold Edge objects with common properties.

edges

A list of Edge objects.

Type

list of Edge objects

properties

A dictionary of property: value pairs that provide relevant metadata about edges along the path (e.g. road type, speed limit, etc.)

Type

dict

add_data(property, value)

Add a property: value pair to the Path.properties attribute.

add_edge(edge)

Add an edge to the path.

set_edge_weights(data_key=None, inverse=False, overwrite=True)

Calculate edge weights for all edges in the Path.

cw_geodata.vector_label.graph.geojson_to_graph(geojson, graph_name=None, retain_all=True, valid_road_types=None, road_type_field='type', edge_idx=0, first_node_idx=0, weight_norm_field=None, inverse=False, workers=1, verbose=False)

Convert a geojson of path strings to a network graph.

Parameters
  • geojson (str) – Path to a geojson file (or any other OGR-compatible vector file) to load network edges and nodes from.

  • graph_name (str, optional) – Name of the graph. If not provided, the graph will be named 'unnamed'.

  • retain_all (bool, optional) – If True, the entire graph will be returned even if some parts are not connected; if False, only the largest connected component is returned. Defaults to True.

  • valid_road_types (list of ints, optional) –

    The road types to permit in the graph. If not provided, it’s assumed that all road types are permitted. The possible values are integers 1-7, which map as follows:

    1: Motorway
    2: Primary
    3: Secondary
    4: Tertiary
    5: Residential
    6: Unclassified
    7: Cart track
    

  • road_type_field (str, optional) – The name of the property in the vector data that delineates road type. Defaults to 'type'.

  • edge_idx (int, optional) – The first index to use for an edge. This can be set to a higher value so that a graph’s edge indices don’t overlap with existing values in another graph.

  • first_node_idx (int, optional) – The first index to use for a node. This can be set to a higher value so that a graph’s node indices don’t overlap with existing values in another graph.

  • weight_norm_field (str, optional) – The name of a field in geojson to pass to argument data_key in Path.set_edge_weights(). Defaults to None, in which case no weighting is performed (weights are calculated solely using Euclidean distance).

  • workers (int, optional) – Number of parallel processes to run. Defaults to 1. Should not be greater than the number of CPUs available.

  • verbose (bool, optional) – Verbose print output. Defaults to False.

Returns

G – A networkx.MultiDiGraph containing all of the nodes and edges from the geojson (or only the largest connected component if retain_all=False). Edge weights are based on the straight-line (Euclidean) distance between nodes, optionally normalized using weight_norm_field.

Return type

networkx.MultiDiGraph
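The overall conversion can be pictured as: every distinct vertex becomes a node, every pair of consecutive vertices in a line string becomes a weighted edge, and features whose road type is not in valid_road_types are skipped. The stdlib sketch below is not geojson_to_graph's implementation. Plain dicts stand in for GeoJSON features and for networkx, lines_to_graph is a hypothetical name, and connectivity for retain_all=False is assumed to ignore edge direction.

```python
# Sketch: road line strings -> weighted graph, in the spirit of geojson_to_graph.
# Stdlib only; dicts stand in for GeoJSON features and networkx.
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Coord = Tuple[float, float]
EdgeT = Tuple[int, int, float]  # (from_node, to_node, weight)


def lines_to_graph(features: Iterable[dict],
                   valid_road_types: Optional[Set[int]] = None,
                   road_type_field: str = "type",
                   first_node_idx: int = 0,
                   retain_all: bool = True) -> Tuple[Dict[Coord, int], List[EdgeT]]:
    node_ids: Dict[Coord, int] = {}
    edges: List[EdgeT] = []

    def node_id(pt: Coord) -> int:
        # Vertices shared by several line strings become a single node.
        if pt not in node_ids:
            node_ids[pt] = first_node_idx + len(node_ids)
        return node_ids[pt]

    for feat in features:
        road_type = feat["properties"].get(road_type_field)
        if valid_road_types is not None and road_type not in valid_road_types:
            continue
        coords = feat["coordinates"]
        # One Euclidean-weighted edge per pair of consecutive vertices.
        for p, q in zip(coords[:-1], coords[1:]):
            edges.append((node_id(tuple(p)), node_id(tuple(q)), math.dist(p, q)))

    if not retain_all:
        # Keep only the largest connected component, ignoring edge direction.
        adj: Dict[int, Set[int]] = defaultdict(set)
        for u, v, _ in edges:
            adj[u].add(v)
            adj[v].add(u)
        seen: Set[int] = set()
        best: Set[int] = set()
        for start in node_ids.values():
            if start in seen:
                continue
            comp: Set[int] = set()
            stack = [start]
            while stack:
                n = stack.pop()
                if n not in comp:
                    comp.add(n)
                    stack.extend(adj[n] - comp)
            seen |= comp
            if len(comp) > len(best):
                best = comp
        node_ids = {pt: i for pt, i in node_ids.items() if i in best}
        edges = [e for e in edges if e[0] in best]
    return node_ids, edges


roads = [
    {"properties": {"type": 1}, "coordinates": [(0, 0), (3, 4), (3, 10)]},
    {"properties": {"type": 5}, "coordinates": [(3, 10), (6, 10)]},
    {"properties": {"type": 7}, "coordinates": [(100, 100), (100, 101)]},
]
nodes, edges = lines_to_graph(roads, retain_all=False)
print(len(nodes), edges)  # 4 [(0, 1, 5.0), (1, 2, 6.0), (2, 3, 3.0)]
```

Offsetting first_node_idx, as the parameter description suggests, keeps node indices from colliding when several graphs are later combined.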

cw_geodata.vector_label.graph.get_nodes_paths(vector_file, first_node_idx=0, node_gdf=geopandas.GeoDataFrame(), valid_road_types=None, road_type_field='type', workers=1, verbose=False)

Extract nodes and paths from a vector file.

Parameters
  • vector_file (str) – Path to an OGR-compatible vector file containing line segments (e.g., a JSON response from the Overpass API, or a SpaceNet GeoJSON).

  • first_path_idx (int, optional) – The first index to use for a path. This can be set to a higher value so that a graph’s path indices don’t overlap with existing values in another graph.

  • first_node_idx (int, optional) – The first index to use for a node. This can be set to a higher value so that a graph’s node indices don’t overlap with existing values in another graph.

  • node_gdf (geopandas.GeoDataFrame, optional) – A geopandas.GeoDataFrame containing nodes to add to the graph. New nodes will be added to this object incrementally during the function call.

  • valid_road_types (list of ints, optional) –

    The road types to permit in the graph. If not provided, it’s assumed that all road types are permitted. The possible values are integers 1-7, which map as follows:

    1: Motorway
    2: Primary
    3: Secondary
    4: Tertiary
    5: Residential
    6: Unclassified
    7: Cart track
    

  • road_type_field (str, optional) – The name of the attribute containing road type information in vector_file. Defaults to 'type'.

  • workers (int, optional) – Number of worker processes to use for parallelization. Defaults to 1. Should not exceed the number of CPUs available.

  • verbose (bool, optional) – Verbose print output. Defaults to False.

Returns

nodes, paths

  • nodes (list) – A list of Node objects to be added to the graph.

  • paths (list) – A list of Path objects containing the Edge and Node objects to be added to the graph.

Return type

tuple of dicts

cw_geodata.vector_label.graph.linestring_to_edges(linestring, node_gdf)

Collect nodes in a linestring and add them to an edge.

Parameters
  • linestring (shapely.geometry.LineString) – A shapely.geometry.LineString object to extract nodes and edges from.

  • node_gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame containing a shapely.geometry.point.Point for every node to be added to the graph.

Returns

edges – A list of Edge objects from linestring.

Return type

list

cw_geodata.vector_label.graph.parallel_linestring_to_path(feature)

Read in a feature line from a fiona-opened shapefile and get the edges.

Parameters

feature (dict) – An item from a fiona.open iterable with the key 'geometry' containing a shapely.geometry.LineString or a shapely.geometry.MultiLineString.

Returns

A list of Path objects containing all edges in the LineString or MultiLineString.

Notes

This function depends on node_series and valid_road_types, which are passed by an initializer as items in var_dict.

Mask submodule

cw_geodata.vector_label.mask.boundary_mask(footprint_msk=None, out_file=None, reference_im=None, boundary_width=3, boundary_type='inner', burn_value=255, **kwargs)

Create a pixel mask labeling object boundaries.

Notes

This function requires creation of a footprint mask before it can operate; therefore, if there is no footprint mask already present, it will create one. In that case, additional arguments for footprint_mask() (e.g. df) must be passed.

Parameters
  • footprint_msk (numpy.array, optional) – A filled-in footprint mask created using footprint_mask(). If not provided, one will be made by calling footprint_mask() before creating the boundary mask, and the required arguments for that function must be provided as kwargs.

  • out_file (str, optional) – Path to an image file to save the output to. Must be compatible with rasterio.DatasetReader. If provided, a reference_im must be provided (for metadata purposes).

  • reference_im (rasterio.DatasetReader or str, optional) – An image to extract necessary coordinate information from: the affine transformation matrix, the image extent, etc. If provided, affine_obj and shape are ignored.

  • boundary_width (int, optional) – The width of the boundary to be created in pixels. Defaults to 3.

  • boundary_type ("inner" or "outer", optional) – Where to draw the boundaries: within the object ("inner") or outside of it ("outer"). Defaults to "inner".

  • burn_value (int, optional) – The value to use for labeling objects in the mask. Defaults to 255 (the max value for uint8 arrays). The mask array will be set to the same dtype as burn_value. Ignored if burn_field is provided.

  • **kwargs (optional) – Additional arguments to pass to footprint_mask() if one needs to be created.

Returns

boundary_mask (numpy.array) – A pixel mask with 0s for non-object pixels and the same value as the footprint mask burn_value for the boundaries of each object. With the default boundary_type="inner", the boundaries are drawn within the edge of each object.
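The footprint-to-boundary step can be pictured with binary morphology: an inner boundary is the set of object pixels removed by eroding the footprint boundary_width times, and an outer boundary is the set of background pixels added by dilating it. The sketch below assumes a 4-neighbour (cross-shaped) structuring element and plain nested lists; it illustrates the idea rather than boundary_mask's actual implementation, and boundary_from_footprint is a hypothetical name. With SciPy, the same steps would typically use scipy.ndimage.binary_erosion and binary_dilation.

```python
# Sketch of inner/outer boundary extraction from a binary footprint mask.
# Stdlib only; the cross-shaped neighbourhood is an assumption.
from typing import List

Mask = List[List[int]]

_CROSS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def _morph(mask: Mask, erode: bool) -> Mask:
    """One step of 4-neighbour erosion (erode=True) or dilation (erode=False)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [mask[r][c]]
            for dr, dc in _CROSS:
                rr, cc = r + dr, c + dc
                # Pixels outside the image are treated as background.
                vals.append(mask[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0)
            out[r][c] = int(all(vals)) if erode else int(any(vals))
    return out


def boundary_from_footprint(footprint: Mask, boundary_width: int = 3,
                            boundary_type: str = "inner",
                            burn_value: int = 255) -> Mask:
    fp = [[int(bool(v)) for v in row] for row in footprint]
    inner = boundary_type == "inner"
    morphed = fp
    for _ in range(boundary_width):
        morphed = _morph(morphed, erode=inner)
    out = []
    for fp_row, m_row in zip(fp, morphed):
        if inner:
            # Object pixels removed by erosion form the inner boundary.
            out.append([burn_value if f and not m else 0 for f, m in zip(fp_row, m_row)])
        else:
            # Background pixels added by dilation form the outer boundary.
            out.append([burn_value if m and not f else 0 for f, m in zip(fp_row, m_row)])
    return out


square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
inner = boundary_from_footprint(square, boundary_width=1, burn_value=1)
outer = boundary_from_footprint(square, boundary_width=1, boundary_type="outer", burn_value=1)
print(sum(map(sum, inner)), sum(map(sum, outer)))  # 8 12
```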

cw_geodata.vector_label.mask.contact_mask(df, out_file=None, reference_im=None, geom_col='geometry', affine_obj=None, shape=(900, 900), out_type='int', contact_spacing=10, burn_value=255)

Create a pixel mask labeling closely juxtaposed objects.

Notes

This function identifies pixels in an image that do not correspond to objects, but fall within contact_spacing of two or more labeled objects.

Parameters
  • df (pandas.DataFrame or geopandas.GeoDataFrame) – A pandas.DataFrame or geopandas.GeoDataFrame instance with a column containing geometries (identified by geom_col). If the geometries in df are not in pixel coordinates, then affine_obj or reference_im must be passed to provide the transformation to convert them.

  • out_file (str, optional) – Path to an image file to save the output to. Must be compatible with rasterio.DatasetReader. If provided, a reference_im must be provided (for metadata purposes).

  • reference_im (rasterio.DatasetReader or str, optional) – An image to extract necessary coordinate information from: the affine transformation matrix, the image extent, etc. If provided, affine_obj and shape are ignored.

  • geom_col (str, optional) – The column containing geometries in df. Defaults to "geometry".

  • affine_obj (list or affine.Affine, optional) – Affine transformation to use to convert from geo coordinates to pixel space. Only provide this argument if df is a geopandas.GeoDataFrame with coordinates in a georeferenced coordinate space. Ignored if reference_im is provided.

  • shape (tuple, optional) – An (x_size, y_size) tuple defining the pixel extent of the output mask. Ignored if reference_im is provided.

  • out_type ('float' or 'int') –

  • contact_spacing (int or float, optional) – The desired maximum distance between adjacent polygons to be labeled as contact. contact_spacing will be in the same units as df’s geometries, not necessarily in pixel units.

  • burn_value (int or float, optional) – The value to use for labeling objects in the mask. Defaults to 255 (the max value for uint8 arrays). The mask array will be set to the same dtype as burn_value.
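The Notes above define contact pixels as background pixels within contact_spacing of two or more objects. The brute-force sketch below applies that definition directly to objects given as sets of (row, col) pixels, so here contact_spacing is in pixel units, unlike the real function, which uses the units of df's geometries. contact_pixels is a hypothetical name, and measuring "within contact_spacing" as the Euclidean distance to an object's nearest pixel is an assumption.

```python
# Brute-force sketch of contact-pixel labeling on pixel sets. Stdlib only.
import math
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def contact_pixels(objects: List[Set[Pixel]], shape: Tuple[int, int],
                   contact_spacing: float, burn_value: int = 255) -> List[List[int]]:
    rows, cols = shape
    occupied = set().union(*objects)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if (r, c) in occupied:
                continue  # pixels inside an object are never contact pixels
            # Count objects with at least one pixel within contact_spacing.
            near = sum(
                1 for obj in objects
                if any(math.dist((r, c), p) <= contact_spacing for p in obj)
            )
            if near >= 2:
                out[r][c] = burn_value
    return out


# Two 3-pixel-tall objects separated by a two-column gap (columns 2 and 3).
a = {(r, c) for r in range(3) for c in (0, 1)}
b = {(r, c) for r in range(3) for c in (4, 5)}
mask = contact_pixels([a, b], (3, 7), contact_spacing=2, burn_value=1)
for row in mask:
    print(row)  # each row: [0, 0, 1, 1, 0, 0, 0]
```

Column 6 stays 0 because only one object lies within the spacing there.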

cw_geodata.vector_label.mask.df_to_px_mask(df, channels=['footprint'], out_file=None, reference_im=None, geom_col='geometry', affine_obj=None, shape=(900, 900), out_type='int', burn_value=255, **kwargs)

Convert a dataframe of geometries to a pixel mask.

Parameters
  • df (pandas.DataFrame or geopandas.GeoDataFrame) – A pandas.DataFrame or geopandas.GeoDataFrame instance with a column containing geometries (identified by geom_col). If the geometries in df are not in pixel coordinates, then affine_obj or reference_im must be passed to provide the transformation to convert them.

  • channels (list, optional) –

    The mask channels to generate. There are three values that this can contain:

    • "footprint": Create a full footprint mask, with 0s at pixels that don’t fall within geometries and burn_value at pixels that do.

    • "boundary": Create a mask with geometries outlined. Use boundary_width to set how thick the boundary will be drawn.

    • "contact": Create a mask with regions between two or more closely juxtaposed geometries labeled. Use contact_spacing to set the maximum spacing between polygons to be labeled.

    Each channel corresponds to its own plane along the last axis of the output mask.

  • out_file (str, optional) – Path to an image file to save the output to. Must be compatible with rasterio.DatasetReader. If provided, a reference_im must be provided (for metadata purposes).

  • reference_im (rasterio.DatasetReader or str, optional) – An image to extract necessary coordinate information from: the affine transformation matrix, the image extent, etc. If provided, affine_obj and shape are ignored.

  • geom_col (str, optional) – The column containing geometries in df. Defaults to "geometry".

  • affine_obj (list or affine.Affine, optional) – Affine transformation to use to convert from geo coordinates to pixel space. Only provide this argument if df is a geopandas.GeoDataFrame with coordinates in a georeferenced coordinate space. Ignored if reference_im is provided.

  • shape (tuple, optional) – An (x_size, y_size) tuple defining the pixel extent of the output mask. Ignored if reference_im is provided.

  • burn_value (int or float, optional) – The value to use for labeling objects in the mask. Defaults to 255 (the max value for uint8 arrays). The mask array will be set to the same dtype as burn_value.

  • kwargs – Additional arguments to pass to boundary_mask or contact_mask. See those functions for requirements.

Returns

mask – A pixel mask with 0s for non-object pixels and burn_value at object pixels. mask dtype will coincide with burn_value. Shape will be (shape[0], shape[1], len(channels)), with channels ordered per the provided channels list.

Return type

numpy.array
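The documented output shape, (shape[0], shape[1], len(channels)), puts channels on the last axis in the order they were requested. The stdlib sketch below illustrates that layout with hypothetical per-channel planes; with NumPy arrays the equivalent would be numpy.stack([planes[ch] for ch in channels], axis=-1). stack_channels is not part of cw_geodata.

```python
# Sketch of the channel-last layout: rows x cols x len(channels). Stdlib only.
from typing import Dict, List

Plane = List[List[int]]


def stack_channels(planes: Dict[str, Plane], channels: List[str]) -> List[List[List[int]]]:
    """Combine 2D planes into a rows x cols x len(channels) nested list."""
    rows, cols = len(planes[channels[0]]), len(planes[channels[0]][0])
    # Innermost list holds one value per channel, in the order given by `channels`.
    return [[[planes[ch][r][c] for ch in channels] for c in range(cols)]
            for r in range(rows)]


planes = {
    "footprint": [[255, 255], [0, 0]],
    "boundary": [[255, 0], [0, 0]],
}
stacked = stack_channels(planes, ["boundary", "footprint"])
print(stacked[0][1])  # [0, 255]
```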

cw_geodata.vector_label.mask.footprint_mask(df, out_file=None, reference_im=None, geom_col='geometry', do_transform=False, affine_obj=None, shape=(900, 900), out_type='int', burn_value=255, burn_field=None)

Convert a dataframe of geometries to a pixel mask.

Parameters
  • df (pandas.DataFrame or geopandas.GeoDataFrame) – A pandas.DataFrame or geopandas.GeoDataFrame instance with a column containing geometries (identified by geom_col). If the geometries in df are not in pixel coordinates, set do_transform=True and pass affine_obj or reference_im to provide the transformation to convert them.

  • out_file (str, optional) – Path to an image file to save the output to. Must be compatible with rasterio.DatasetReader. If provided, a reference_im must be provided (for metadata purposes).

  • reference_im (rasterio.DatasetReader or str, optional) – An image to extract necessary coordinate information from: the affine transformation matrix, the image extent, etc. If provided, affine_obj and shape are ignored.

  • geom_col (str, optional) – The column containing geometries in df. Defaults to "geometry".

  • do_transform (bool, optional) – Should the values in df be transformed from geospatial coordinates to pixel coordinates? Defaults to False. If True, either reference_im or affine_obj must be provided as a source for the required affine transformation matrix.

  • affine_obj (list or affine.Affine, optional) – Affine transformation to use to convert from geo coordinates to pixel space. Only provide this argument if df is a geopandas.GeoDataFrame with coordinates in a georeferenced coordinate space. Ignored if reference_im is provided or if do_transform=False.

  • shape (tuple, optional) – An (x_size, y_size) tuple defining the pixel extent of the output mask. Ignored if reference_im is provided.

  • out_type ('float' or 'int') –

  • burn_value (int or float, optional) – The value to use for labeling objects in the mask. Defaults to 255 (the max value for uint8 arrays). The mask array will be set to the same dtype as burn_value. Ignored if burn_field is provided.

  • burn_field (str, optional) – Name of a column in df that provides values for burn_value for each independent object. If provided, burn_value is ignored.

Returns

mask – A pixel mask with 0s for non-object pixels and burn_value at object pixels. mask dtype will coincide with burn_value.

Return type

numpy.array
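Burning polygons into a mask comes down to deciding which pixels each polygon covers. The sketch below assumes geometries already in pixel coordinates (the do_transform=False case) and tests each pixel centre with the even-odd (ray-casting) rule. Centre-based burning broadly matches the default behaviour of rasterio.features.rasterize (all_touched=False), but this is not a claim about footprint_mask's internals. burn_values stands in for burn_field, and all names are hypothetical.

```python
# Sketch of polygon rasterization by testing pixel centres. Stdlib only.
from typing import List, Optional, Sequence, Tuple

Ring = Sequence[Tuple[float, float]]  # exterior ring in pixel coords (x=col, y=row)


def point_in_ring(x: float, y: float, ring: Ring) -> bool:
    """Even-odd (ray-casting) point-in-polygon test."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(rings: List[Ring], shape: Tuple[int, int], burn_value: int = 255,
              burn_values: Optional[List[int]] = None) -> List[List[int]]:
    rows, cols = shape
    mask = [[0] * cols for _ in range(rows)]
    for i, ring in enumerate(rings):
        # Per-object values play the role of burn_field; later objects overwrite earlier ones.
        value = burn_values[i] if burn_values is not None else burn_value
        for r in range(rows):
            for c in range(cols):
                if point_in_ring(c + 0.5, r + 0.5, ring):  # test the pixel centre
                    mask[r][c] = value
    return mask


square = [(1, 1), (3, 1), (3, 3), (1, 3), (1, 1)]
for row in rasterize([square], (4, 4), burn_value=1):
    print(row)
```

The square spans pixel columns and rows 1 to 2, so the printed mask has a 2 x 2 block of 1s in the middle.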

Utility functions

Geo utility submodule

cw_geodata.utils.geo.geometries_internal_intersection(polygons)

Get the intersection geometries between all geometries in a set.

Parameters

polygons (list-like) – A list-like containing geometries. These will be placed in a geopandas.GeoSeries object to take advantage of rtree spatial indexing.

Returns

intersect_list – A list of geometric intersections between the geometries in polygons, in the same CRS as the input.

Return type

list
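One reading of "the intersection geometries between all geometries in a set" is the list of non-empty pairwise intersections. The sketch below assumes that reading and, to stay stdlib-only, uses axis-aligned rectangles (minx, miny, maxx, maxy) instead of arbitrary polygons. Shapes that only touch along an edge are skipped here, whereas shapely would return a line for them. All names are hypothetical.

```python
# Sketch of pairwise intersections within one set of shapes, using
# axis-aligned rectangles as a stand-in for polygons. Stdlib only.
from itertools import combinations
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def rect_intersection(r1: Rect, r2: Rect) -> Optional[Rect]:
    minx, miny = max(r1[0], r2[0]), max(r1[1], r2[1])
    maxx, maxy = min(r1[2], r2[2]), min(r1[3], r2[3])
    if minx < maxx and miny < maxy:
        return (minx, miny, maxx, maxy)
    return None  # disjoint, or touching only along an edge or corner


def internal_intersections(rects: List[Rect]) -> List[Rect]:
    out = []
    # Each unordered pair is tested once; a spatial index would prune far pairs.
    for r1, r2 in combinations(rects, 2):
        inter = rect_intersection(r1, r2)
        if inter is not None:
            out.append(inter)
    return out


print(internal_intersections([(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 11, 11)]))  # [(1, 1, 2, 2)]
```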

cw_geodata.utils.geo.get_subgraph(G, node_subset)

Create a subgraph from G. Code almost directly copied from osmnx.

Parameters
  • G (networkx.MultiDiGraph) – A graph to be subsetted

  • node_subset (list-like) – The subset of nodes to induce a subgraph of G

Returns

G2 – The subgraph of G that includes node_subset

Return type

networkx.MultiDiGraph
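An induced subgraph keeps the chosen nodes and only those edges whose endpoints are both kept. The stdlib sketch below shows that rule on a dict-of-sets adjacency; induced_subgraph is a hypothetical helper, not get_subgraph itself. In networkx, G.subgraph(node_subset).copy() produces an independent induced subgraph.

```python
# Sketch of an induced subgraph on a dict-of-sets adjacency. Stdlib only.
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of successor nodes


def induced_subgraph(G: Graph, node_subset: Iterable[Hashable]) -> Graph:
    keep = set(node_subset)
    # Keep only the chosen nodes and the edges whose both endpoints are kept.
    return {n: {m for m in succ if m in keep} for n, succ in G.items() if n in keep}


G = {1: {2, 3}, 2: {3}, 3: {1}, 4: {1}}
print(induced_subgraph(G, [1, 3]))  # {1: {3}, 3: {1}}
```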

cw_geodata.utils.geo.list_to_affine(xform_mat)

Create an Affine from a list or array formatted as [a, b, d, e, xoff, yoff].

Parameters

xform_mat (list or numpy.array) – A list of values to convert to an affine object.

Returns

aff – An affine transformation object.

Return type

affine.Affine
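The [a, b, d, e, xoff, yoff] layout matches shapely.affinity.affine_transform, whereas the affine.Affine constructor takes its coefficients as (a, b, c, d, e, f), with c = xoff and f = yoff. Passing such a list straight to Affine(*xform_mat) would therefore misplace the offsets. The sketch below shows the reordering with hypothetical helper names; it does not describe how list_to_affine itself handles ordering or GDAL-style inputs.

```python
# Sketch: reorder a shapely-style [a, b, d, e, xoff, yoff] list into the
# (a, b, c, d, e, f) order used by affine.Affine. Stdlib only.
from typing import Sequence, Tuple

Coeffs = Tuple[float, float, float, float, float, float]


def shapely_list_to_affine_order(m: Sequence[float]) -> Coeffs:
    """Reorder [a, b, d, e, xoff, yoff] into (a, b, c, d, e, f)."""
    a, b, d, e, xoff, yoff = m
    return (a, b, xoff, d, e, yoff)


def apply_shapely_list(m: Sequence[float], x: float, y: float) -> Tuple[float, float]:
    a, b, d, e, xoff, yoff = m
    return (a * x + b * y + xoff, d * x + e * y + yoff)


def apply_affine_coeffs(t: Coeffs, x: float, y: float) -> Tuple[float, float]:
    a, b, c, d, e, f = t
    return (a * x + b * y + c, d * x + e * y + f)


m = [0.5, 0.0, 0.0, -0.5, 500000.0, 4100000.0]
coeffs = shapely_list_to_affine_order(m)
print(coeffs)  # (0.5, 0.0, 500000.0, 0.0, -0.5, 4100000.0)
print(apply_shapely_list(m, 10, 20) == apply_affine_coeffs(coeffs, 10, 20))  # True
```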

cw_geodata.utils.geo.split_multi_geometries(gdf, obj_id_col=None, group_col=None, geom_col='geometry')

Split apart MultiPolygon or MultiLineString geometries.

Parameters
  • gdf (geopandas.GeoDataFrame or str) – A geopandas.GeoDataFrame or path to a geojson containing geometries.

  • obj_id_col (str, optional) – If one exists, the name of the column that uniquely identifies each geometry (e.g. the "BuildingId" column in many SpaceNet datasets). This will be tracked so multiple objects don’t get produced with the same ID. Note that the object ID column will be renumbered on output. If passed, group_col must also be provided.

  • group_col (str, optional) – A column to identify groups for sequential numbering (for example, 'ImageId' for sequential numbering of 'BuildingId'). Must be provided if obj_id_col is passed.

  • geom_col (str, optional) – The name of the column in gdf that corresponds to geometry. Defaults to 'geometry'.

Returns

A geopandas.GeoDataFrame that’s identical to the input, except with the multipolygons split into separate rows, and the object ID column renumbered (if one exists).

Return type

geopandas.GeoDataFrame
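The splitting and renumbering described above can be sketched with plain dicts standing in for GeoDataFrame rows and for shapely geometries (a multipart geometry is a dict with a "parts" list). split_multi_rows is hypothetical, and both the dict geometry format and numbering from 1 are assumptions. With geopandas, GeoDataFrame.explode() performs the row-splitting part.

```python
# Sketch of splitting multipart geometries into rows and renumbering object IDs
# within each group. Stdlib only; dicts stand in for GeoDataFrame rows.
from typing import Dict, Hashable, List, Optional


def split_multi_rows(rows: List[dict], obj_id_col: Optional[str] = None,
                     group_col: Optional[str] = None,
                     geom_col: str = "geometry") -> List[dict]:
    if (obj_id_col is None) != (group_col is None):
        raise ValueError("obj_id_col and group_col must be provided together")
    out = []
    for row in rows:
        geom = row[geom_col]
        # A multipart geometry contributes one output row per part.
        parts = geom["parts"] if geom["type"].startswith("Multi") else [geom]
        for part in parts:
            new_row = dict(row)  # copy the other attributes unchanged
            new_row[geom_col] = part
            out.append(new_row)
    if obj_id_col is not None:
        # Renumber IDs sequentially within each group (starting at 1 is an assumption).
        counters: Dict[Hashable, int] = {}
        for row in out:
            key = row[group_col]
            counters[key] = counters.get(key, 0) + 1
            row[obj_id_col] = counters[key]
    return out


poly = {"type": "Polygon", "coordinates": [(0, 0), (1, 0), (1, 1), (0, 0)]}
rows = [
    {"ImageId": "img1", "BuildingId": 1,
     "geometry": {"type": "MultiPolygon", "parts": [poly, poly]}},
    {"ImageId": "img1", "BuildingId": 2, "geometry": poly},
    {"ImageId": "img2", "BuildingId": 1, "geometry": poly},
]
result = split_multi_rows(rows, obj_id_col="BuildingId", group_col="ImageId")
print([(r["ImageId"], r["BuildingId"]) for r in result])
# [('img1', 1), ('img1', 2), ('img1', 3), ('img2', 1)]
```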