Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -486,6 +486,7 @@ Same-CRS tiles skip reprojection entirely and are placed by direct coordinate al
| [Gradient](xrspatial/morphology.py) | Dilation minus erosion (edge detection) | Standard (morphology) | ✅️ | ✅️ | ✅️ | ✅️ |
| [White Top-hat](xrspatial/morphology.py) | Original minus opening (isolate bright features) | Standard (morphology) | ✅️ | ✅️ | ✅️ | ✅️ |
| [Black Top-hat](xrspatial/morphology.py) | Closing minus original (isolate dark features) | Standard (morphology) | ✅️ | ✅️ | ✅️ | ✅️ |
| [Sieve](xrspatial/sieve.py) | Remove small connected clumps from classified rasters | GDAL sieve | ✅️ | ✅️ | 🔄 | 🔄 |

-------

Expand Down
7 changes: 7 additions & 0 deletions docs/source/reference/zonal.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,13 @@ Regions

xrspatial.zonal.regions

Sieve
=====
.. autosummary::
:toctree: _autosummary

xrspatial.sieve.sieve

Trim
====
.. autosummary::
Expand Down
261 changes: 261 additions & 0 deletions examples/user_guide/48_Sieve_Filter.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,261 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sieve Filter: Removing Small Raster Clumps\n",
"\n",
"Classification outputs often contain salt-and-pepper noise: tiny clumps of 1-3 pixels that don't represent real features. The `sieve` function removes these by replacing connected regions smaller than a given threshold with the value of their largest spatial neighbor.\n",
"\n",
"This is the xarray-spatial equivalent of GDAL's `gdal_sieve.py`, and it pairs naturally with classification functions like `natural_breaks()` or `reclassify()` and with `polygonize()` for cleaning results before vectorization."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.colors import ListedColormap\n",
"\n",
"from xrspatial.sieve import sieve\n",
"from xrspatial.classify import natural_breaks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate a Noisy Classified Raster\n",
"\n",
"We'll create a synthetic classified raster with three land-cover classes and scatter some salt-and-pepper noise across it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"rows, cols = 80, 100\n",
"\n",
"# Build a base classification with three broad zones\n",
"base = np.ones((rows, cols), dtype=np.float64)\n",
"base[:, 40:70] = 2.0\n",
"base[30:60, :] = 3.0\n",
"base[30:60, 40:70] = 2.0\n",
"\n",
"# Add salt-and-pepper noise: randomly flip ~8% of pixels\n",
"noise_mask = np.random.random((rows, cols)) < 0.08\n",
"noise_vals = np.random.choice([1.0, 2.0, 3.0], size=(rows, cols))\n",
"noisy = base.copy()\n",
"noisy[noise_mask] = noise_vals[noise_mask]\n",
"\n",
"# Sprinkle some NaN (nodata) pixels\n",
"noisy[0:3, 0:3] = np.nan\n",
"noisy[77:, 97:] = np.nan\n",
"\n",
"raster = xr.DataArray(noisy, dims=['y', 'x'], name='landcover')\n",
"print(f'Raster shape: {raster.shape}')\n",
"print(f'Unique values (excl. NaN): {np.unique(raster.values[~np.isnan(raster.values)])}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cmap = ListedColormap(['#2ecc71', '#3498db', '#e74c3c'])\n",
"\n",
"fig, ax = plt.subplots(figsize=(8, 5))\n",
"im = ax.imshow(raster.values, cmap=cmap, vmin=0.5, vmax=3.5, interpolation='nearest')\n",
"ax.set_title('Noisy classified raster')\n",
"cbar = fig.colorbar(im, ax=ax, ticks=[1, 2, 3])\n",
"cbar.ax.set_yticklabels(['Class 1', 'Class 2', 'Class 3'])\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Sieve: Remove Single-Pixel Noise\n",
"\n",
"The simplest use case: set a threshold so isolated pixels are absorbed by their surroundings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sieved = sieve(raster, threshold=4)\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
"for ax, data, title in zip(axes, [raster, sieved], ['Before sieve', 'After sieve (threshold=4)']):\n",
" im = ax.imshow(data.values, cmap=cmap, vmin=0.5, vmax=3.5, interpolation='nearest')\n",
" ax.set_title(title)\n",
"fig.colorbar(im, ax=axes, ticks=[1, 2, 3], shrink=0.8)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connectivity: 4 vs 8\n",
"\n",
"With 4-connectivity (rook), only pixels sharing an edge are considered connected. With 8-connectivity (queen), diagonally adjacent pixels also form part of the same region. This affects which clumps are identified as \"small.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sieved_4 = sieve(raster, threshold=6, neighborhood=4)\n",
"sieved_8 = sieve(raster, threshold=6, neighborhood=8)\n",
"\n",
"fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n",
"for ax, data, title in zip(\n",
" axes,\n",
" [raster, sieved_4, sieved_8],\n",
" ['Original', '4-connectivity (threshold=6)', '8-connectivity (threshold=6)'],\n",
"):\n",
" im = ax.imshow(data.values, cmap=cmap, vmin=0.5, vmax=3.5, interpolation='nearest')\n",
" ax.set_title(title)\n",
"fig.colorbar(im, ax=axes, ticks=[1, 2, 3], shrink=0.8)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selective Sieving with `skip_values`\n",
"\n",
"Sometimes certain class values should never be removed, even if their regions are small. Use `skip_values` to protect specific categories from merging while still allowing other small regions to be cleaned up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Protect class 3 from sieving\n",
"sieved_skip = sieve(raster, threshold=10, skip_values=[3.0])\n",
"sieved_noskip = sieve(raster, threshold=10)\n",
"\n",
"fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n",
"for ax, data, title in zip(\n",
" axes,\n",
" [raster, sieved_noskip, sieved_skip],\n",
" ['Original', 'threshold=10 (no skip)', 'threshold=10 (skip class 3)'],\n",
"):\n",
" im = ax.imshow(data.values, cmap=cmap, vmin=0.5, vmax=3.5, interpolation='nearest')\n",
" ax.set_title(title)\n",
"fig.colorbar(im, ax=axes, ticks=[1, 2, 3], shrink=0.8)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Practical Example: Clean Up a Classification\n",
"\n",
"Generate a continuous surface, classify it with `natural_breaks`, and then sieve the result to remove small artifacts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a smooth surface with some high-frequency variation\n",
"y = np.linspace(0, 4 * np.pi, rows)\n",
"x = np.linspace(0, 4 * np.pi, cols)\n",
"Y, X = np.meshgrid(y, x, indexing='ij')\n",
"surface = np.sin(Y) * np.cos(X) + 0.4 * np.random.randn(rows, cols)\n",
"\n",
"surface_da = xr.DataArray(surface, dims=['y', 'x'])\n",
"classified = natural_breaks(surface_da, k=5)\n",
"\n",
"# Sieve the classification\n",
"cleaned = sieve(classified, threshold=8)\n",
"\n",
"fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n",
"axes[0].imshow(surface, cmap='terrain', interpolation='nearest')\n",
"axes[0].set_title('Continuous surface')\n",
"axes[1].imshow(classified.values, cmap='tab10', interpolation='nearest')\n",
"axes[1].set_title('natural_breaks (k=5)')\n",
"axes[2].imshow(cleaned.values, cmap='tab10', interpolation='nearest')\n",
"axes[2].set_title('After sieve (threshold=8)')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Threshold Selection\n",
"\n",
"The right threshold depends on pixel resolution and the minimum feature size you care about. Here's a comparison across threshold values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"thresholds = [2, 5, 15, 50]\n",
"fig, axes = plt.subplots(1, len(thresholds), figsize=(5 * len(thresholds), 5))\n",
"\n",
"for ax, t in zip(axes, thresholds):\n",
" result = sieve(classified, threshold=t)\n",
" ax.imshow(result.values, cmap='tab10', interpolation='nearest')\n",
" ax.set_title(f'threshold={t}')\n",
"\n",
"plt.suptitle('Effect of sieve threshold on classified raster', y=1.02)\n",
"plt.tight_layout()\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbformat_minor": 4,
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
1 change: 1 addition & 0 deletions xrspatial/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,7 @@
from xrspatial.hydro import stream_link_d8, stream_link_dinf, stream_link_mfd # noqa
from xrspatial.hydro import stream_order # noqa: unified wrapper
from xrspatial.hydro import stream_order_d8, stream_order_dinf, stream_order_mfd # noqa
from xrspatial.sieve import sieve # noqa
from xrspatial.sky_view_factor import sky_view_factor # noqa
from xrspatial.slope import slope # noqa
from xrspatial.surface_distance import surface_allocation # noqa
Expand Down
Loading
Loading