Cloud Masking: Why a Binary Choice Breaks a Continuous Problem

TL;DR: Clouds and haze form a continuum. Hard thresholds flip pixels off or on and create gaps or contamination. Good cloud masking uses probabilities, cloud type awareness, and geometry. Pair masking with multi-date strategies when you must keep a clear nominal date.

The spectrum problem

Real scenes rarely split into only cloud and no cloud. You will meet thin cirrus, low stratus and fog, broken cumulus fields, stratocumulus decks, bright snow and ice, salt flats, bright sand, sunglint on water, smoke and dust, and adjacency effects near cloud edges. Many modern detectors output a per-pixel probability for cloud, and some for shadow as well. The moment you select a threshold on that probability, you inherit a trade-off between coverage and contamination.

Cloud shadows are another source of ambiguity. Shadowed pixels carry a low signal, not zero signal. Over dark water and evergreen forests, shadows are easily confused with natural variability unless you account for solar geometry, view angle, terrain, and cloud height.

Cloud types that break simple masks

Deep convective cumulus and cumulonimbus. Bright tops with strong SWIR absorption and sharp texture. Large shadows with long displacement that depends on solar zenith and cloud-top height. Parallax over high clouds can misplace shadow projections if height is wrong.

Stratocumulus decks. Textured sheets with semi-transparent breaks. Edge pixels and subpixel gaps cause adjacency brightening and partial transmission. Conservative dilation removes fringes but erases good pixels. Permissive thresholds leave bright edges that bias reflectance.

Thin cirrus and subvisible cirrus. High altitude ice clouds transmit light and reduce contrast while leaving apparent structure below. On Sentinel-2, the 1.38 µm cirrus band (Band 10) helps, but performance varies with column water vapor and surface altitude. Cirrus can survive strict blue or SWIR tests and still depress indices.

Low stratus and fog. Spectrally similar to bright surfaces, especially over snow, salt pans, and beaches. Thermal bands help on Landsat. Sentinel-2 has no thermal, so reliance shifts to blue, coastal aerosol, SWIR, and texture tests plus terrain context.

Broken cumulus over bright deserts. Bright sand and dry lake beds saturate thresholds. Masks confuse bright land with cloud unless SWIR absorption and texture cues are strong. Dilation near edges can remove valid bright pixels.

Smoke and dust. Aerosol plumes vary from optically thin to opaque. They change blue and red reflectance, reduce contrast, and introduce wavelength-dependent scattering. Smoke plumes over dark water can be mistaken for thin cloud, while dust over bright land can pass as cloud edge.

Snow and ice. High albedo with distinct spectral behavior. NDSI separates snow from cloud, but its thresholds are scene-dependent and terrain-cast shadows complicate both snow and cloud detection. In patchy snow, the adjacency effect and BRDF changes cause mislabels.
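As a minimal sketch of the NDSI test mentioned above: NDSI is the normalized difference of green and SWIR (around 1.6 µm) reflectance. The function names and the 0.4 cut-off below are illustrative assumptions, not values taken from any particular processor, and the threshold should be treated as scene-dependent.

```python
def ndsi(green: float, swir: float) -> float:
    """Normalized Difference Snow Index from green and SWIR (~1.6 um) reflectance."""
    total = green + swir
    if total == 0:
        return 0.0  # avoid division by zero on no-data pixels
    return (green - swir) / total


def looks_like_snow(green: float, swir: float, threshold: float = 0.4) -> bool:
    """Illustrative test: snow is bright in green and dark in SWIR.

    The 0.4 default is an assumed starting point, not a universal constant.
    """
    return ndsi(green, swir) > threshold


# Snow: bright in green, dark in SWIR. Water cloud: bright in both.
snow_pixel = looks_like_snow(green=0.80, swir=0.10)
cloud_pixel = looks_like_snow(green=0.70, swir=0.50)
```

A single index like this is only one cue. Ice cloud is also darker in SWIR, which is one reason terrain and texture context still matter.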

Water with sunglint and foam. Specular reflection and wave foam create bright streaks that trip cloud tests. Glint strength depends on wind, solar geometry, and sensor view angle. Standard deglinting helps, but bright water near cloud edges is still risky.

Cloud shadows in practice

Shadow detection is geometry first. Project each cloud object using solar azimuth and zenith, sensor geometry, and an estimate of cloud-top height. Over mountains you must include a DEM to avoid putting shadows on the wrong slope. Typical errors:

Height underestimation: projects shadows too short and misses the true dark area.
Height overestimation: projects shadows beyond the dark patch and flags valid pixels.
Terrain and parallax: high relief and oblique view angles shift apparent cloud location and elongate shadows unpredictably.

Mitigations include height brackets, a search along the solar vector, and spectral checks that prefer low NIR and low green with preserved SWIR texture. On water, require low red and low SWIR as well as low NIR, so that turbid water, which stays bright in red, is not classified as shadow.
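The height-bracket idea can be sketched as follows. This is a simplified illustration under stated assumptions: flat terrain, a nadir view (no parallax correction), solar azimuth measured clockwise from north, and a north-up raster. The function names are hypothetical. Each cloud height in the bracket yields one candidate shadow offset along the solar vector.

```python
import math


def shadow_offset_m(cloud_height_m, solar_zenith_deg, solar_azimuth_deg):
    """Horizontal (east, north) offset in metres from a cloud to its shadow.

    Assumes flat terrain and a nadir view; azimuth is clockwise from north.
    """
    distance = cloud_height_m * math.tan(math.radians(solar_zenith_deg))
    az = math.radians(solar_azimuth_deg)
    # The shadow falls on the side opposite the sun.
    return (-distance * math.sin(az), -distance * math.cos(az))


def candidate_offsets_px(heights_m, solar_zenith_deg, solar_azimuth_deg, pixel_m):
    """Shadow offsets as (row, col) pixel shifts for a bracket of heights."""
    offsets = []
    for h in heights_m:
        east, north = shadow_offset_m(h, solar_zenith_deg, solar_azimuth_deg)
        # Rows grow southward in a north-up raster, so north is negative rows.
        offsets.append((round(-north / pixel_m), round(east / pixel_m)))
    return offsets


# Sun due south at 45 degrees zenith, 10 m pixels, heights bracketed 500-2000 m.
offsets = candidate_offsets_px([500, 1000, 2000], 45.0, 180.0, 10.0)
```

In a fuller pipeline each candidate offset would be scored with the spectral checks, and the best-matching dark patch kept. Over relief, the flat-terrain assumption fails and a DEM-aware projection is needed.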

Bright-surface and look-alike traps

Deserts, beaches, gypsum, salt pans: high reflectance and low texture fool cloud tests. SWIR absorption and edge gradients help.
Urban roofs and concrete: bright, spectrally flat patches resemble optically thick cloud in visible bands.
Snow on conifer forest: mixed pixels toggle between snow and shadow across dates and create flicker in time series.
Glint and whitecaps: glint masks reduce some false positives but do not fix bright water at cloud edges.

Algorithms and what they rely on

Fmask and CFMask (Landsat): spectral tests including thermal, morphological dilation, and shadow projection. They work well when thermal data is available but remain scene-dependent.
Sentinel-2 Sen2Cor SCL: scene classification with cloud, cirrus, shadow, snow, and vegetation classes. Performance depends on atmosphere and surface type.
s2cloudless (probability): per-pixel cloud probability from a gradient-boosted (LightGBM) classifier over multi-band features. Requires a threshold and often some post-processing, and does not detect shadows.

All three use a threshold at some point. The strictness of that threshold controls gaps versus contamination.
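To make the gaps-versus-contamination trade-off concrete, here is a small sketch that sweeps a threshold over per-pixel cloud probabilities and reports how much of the scene survives and how dirty the surviving pixels are. The data are made up for illustration, and `sweep_thresholds` is a hypothetical helper, not part of any of the tools above.

```python
def sweep_thresholds(cloud_prob, truly_cloudy, thresholds):
    """For each threshold, report coverage and contamination.

    coverage      = fraction of all pixels kept (probability below threshold)
    contamination = fraction of kept pixels that are actually cloudy
    """
    results = []
    for t in thresholds:
        kept = [is_cloud for p, is_cloud in zip(cloud_prob, truly_cloudy) if p < t]
        coverage = len(kept) / len(cloud_prob)
        contamination = sum(kept) / len(kept) if kept else 0.0
        results.append((t, coverage, contamination))
    return results


# Toy data (made up): a probability and a reference label per pixel.
prob = [0.05, 0.10, 0.20, 0.35, 0.45, 0.60, 0.80, 0.95]
truth = [False, False, False, True, False, True, True, True]

table = sweep_thresholds(prob, truth, thresholds=[0.3, 0.5, 0.7])
```

On this toy data, raising the threshold from 0.3 to 0.7 lifts coverage from 37.5% to 75% while contamination among kept pixels rises from zero to one in three.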

Thresholds in practice

Pipelines tend to pick one of three stances:

Conservative threshold: removes thin cloud, edges, and much haze. Quality is high, coverage is low. Time series become gappy, especially in wet seasons.
Permissive threshold: keeps coverage high but admits thin cloud and veiling haze. Indices drop, textures blur, and small changes disappear.
Scene-adaptive threshold: adjusts to illumination, surface type, and atmosphere. Results are better, but tuning must be stable across sensors and seasons. Dilation to remove fringes increases gaps unless carefully limited.
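The cost of dilation is easy to underestimate. The sketch below is a plain-Python binary dilation with a square window, written only for illustration. It shows how quickly a single flagged pixel grows as the radius increases. The helper names and the toy scene are assumptions.

```python
def dilate(mask, radius):
    """Binary dilation of a 2-D list-of-lists mask with a square window."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Flag every pixel within `radius` of a flagged pixel.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    out[rr][cc] = True
    return out


def masked_fraction(mask):
    """Fraction of pixels flagged in the mask."""
    flat = [v for row in mask for v in row]
    return sum(flat) / len(flat)


# A single one-pixel cloud in a 7x7 scene.
scene = [[False] * 7 for _ in range(7)]
scene[3][3] = True

loss_by_radius = {r: masked_fraction(dilate(scene, r)) for r in (0, 1, 2)}
```

Here the masked area grows from 1 pixel to 9 and then 25 of 49 as the radius goes from 0 to 2, which is why broken cumulus fields with many small clouds lose so much coverage under aggressive dilation.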

Haze and aerosols are not cloud

Urban haze and pollution. Fine-mode aerosols reduce contrast in blue and green and bias vegetation and water indices downward. Pixels are often still usable if you estimate aerosol optical thickness or use dehazing and harmonization.

Smoke. Spectrally complex with absorbing and non-absorbing phases. Over bright land, smoke can look like thin cloud. Over water, smoke reduces NIR and red but keeps some texture. Mislabeling smoke as cloud creates holes you do not want.

Dust. Coarse-mode scattering increases red and SWIR brightness and changes color ratios. Deserts with dust and thin cloud combined are the hardest cases for fixed thresholds.

The right question is not mask or keep. It is how much weight to give this pixel today, and whether you can supplement it with a nearby clean observation.
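One way to turn that question into arithmetic is to weight each observation instead of dropping it. The sketch below is an assumption-laden illustration, not a description of any specific product: the weight is (1 - cloud probability) multiplied by an exponential recency decay with an assumed five-day half-life, and all names are hypothetical.

```python
def pixel_weight(cloud_prob, age_days, half_life_days=5.0):
    """Weight for one observation: clearer and fresher pixels count more.

    Both the (1 - p) form and the exponential recency decay are assumptions.
    """
    clear_weight = 1.0 - cloud_prob
    recency = 0.5 ** (age_days / half_life_days)
    return clear_weight * recency


def weighted_value(observations):
    """Weighted mean of (value, cloud_prob, age_days) tuples; None if no weight."""
    total_w = 0.0
    total_v = 0.0
    for value, cloud_prob, age_days in observations:
        w = pixel_weight(cloud_prob, age_days)
        total_w += w
        total_v += w * value
    return total_v / total_w if total_w > 0 else None


# Today's pixel is hazy (p = 0.6); a clean observation exists from 5 days ago.
ndvi = weighted_value([(0.40, 0.6, 0), (0.70, 0.0, 5)])
```

The hazy pixel still contributes, but the clean observation from five days earlier pulls the estimate toward its value instead of leaving a hole.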

Temporal consistency and cross-date clues

Even simple temporal filters help. If a pixel is labeled cloud on one date but is clear on the dates before and after, and its reflectance matches those clear observations, favor the clear state and downweight the cloud label. Edge pixels that toggle on and off across dates are a sign to reduce the dilation radius or switch to adaptive dilation. Parallax and fast cloud motion cause misalignment; register dates carefully before comparing.
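A minimal sketch of such a filter for one pixel across three co-registered dates. It makes one assumption explicit: the cloud label is only downweighted when the pixel's reflectance also agrees with the clear neighbouring dates, so a genuine one-day cloud is left alone. The thresholds, the damping factor, and the function name are illustrative.

```python
def temporal_cloud_weight(prev, curr, nxt, prob_clear=0.2, tol=0.05, damp=0.5):
    """Downweight an isolated cloud label that the neighbours do not support.

    Each argument is a (reflectance, cloud_prob) pair for the same pixel on
    three co-registered dates. Returns an adjusted cloud probability for curr.
    All thresholds are illustrative assumptions.
    """
    (r_prev, p_prev), (r_curr, p_curr), (r_next, p_next) = prev, curr, nxt
    neighbours_clear = p_prev < prob_clear and p_next < prob_clear
    if not neighbours_clear:
        return p_curr  # no clear reference to compare against
    expected = (r_prev + r_next) / 2.0
    if abs(r_curr - expected) <= tol:
        # Reflectance agrees with the clear dates: likely a false alarm.
        return p_curr * damp
    return p_curr


# Bright roof flagged as cloud, but reflectance matches the clear dates.
false_alarm = temporal_cloud_weight((0.30, 0.05), (0.31, 0.70), (0.30, 0.10))
# Genuine cloud: reflectance jumps well above the clear dates.
real_cloud = temporal_cloud_weight((0.10, 0.05), (0.45, 0.70), (0.11, 0.10))
```

The first call halves the cloud probability because the surface looks unchanged. The second leaves it untouched.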

When a binary mask still helps

Binary outputs are useful when you must hide obvious cloud and shadow in viewers, meet regulatory schemas that require a cloud bit, or create seasonal basemaps where long windows absorb gaps. If you publish a binary bit, keep the underlying probabilities available for analysts who need to revisit decisions.
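Publishing the bit and the probability together can be as simple as the sketch below. The layout (a 0 to 100 probability byte next to a one-bit flag) and the function names are hypothetical, chosen only to show that a stored probability lets an analyst re-threshold later.

```python
def encode_qa(cloud_prob, threshold=0.5):
    """Return (cloud_bit, prob_byte) for one pixel.

    cloud_bit is the published binary flag; prob_byte keeps the probability
    as 0-100 so analysts can re-threshold later. Layout is hypothetical.
    """
    prob_byte = round(cloud_prob * 100)
    cloud_bit = 1 if cloud_prob >= threshold else 0
    return cloud_bit, prob_byte


def rethreshold(prob_byte, threshold):
    """Recompute the flag from the stored probability with a new threshold."""
    return 1 if prob_byte / 100 >= threshold else 0


# Published as clear at the default threshold, but flagged under a stricter one.
bit, stored = encode_qa(0.42)
stricter = rethreshold(stored, 0.3)
```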

ClearSKY in practice

We are satellite agnostic. We pull from multiple constellations, often hundreds of passes, and let the freshest observations speak louder. SAR carries structure through cloud, optical adds color and indices, and the output is tied to a single date.

In cloudy seasons we prioritise same-day observations and then pull from nearby prior days when weather blocks optical coverage. SAR adds structure under cloud and reduces the need for aggressive thresholds. This approach keeps a clear nominal date while avoiding long seasonal windows.

Quick view: cloud type vs masking pitfalls

Cloud or condition	What breaks masks	Typical false positives	Useful cues to fix
Deep cumulus and cumulonimbus	Long displaced shadows, parallax, bright tops	Dark water or forest labeled as shadow	Shadow geometry with height brackets, SWIR absorption tests, DEM-aware projection
Stratocumulus decks	Edge pixels, subpixel gaps, adjacency brightening	Bright sand or urban roofs at edges	Moderate dilation, texture and gradient checks, adaptive thresholds
Thin cirrus	Partial transmission with low contrast	Haze or smoke over water	1.38 µm cirrus band on S2, multi-date consistency, spectral slope tests
Low stratus and fog	Spectral confusion with bright surfaces	Snow, salt flats, beaches	Thermal on Landsat, SWIR and coastal aerosol on S2, terrain context
Smoke and dust	Variable spectra and texture	Thin cloud classification over land and water	Blue-to-SWIR ratios, temporal persistence, plume morphology
Snow and ice	High albedo with complex terrain shadows	Cloud over mountains, cloud edges over snow	NDSI, terrain shadow modeling, BRDF-aware thresholds
Water with sunglint	Specular streaks and whitecaps	Cloud over oceans and lakes	Glint modeling, NIR and SWIR checks, wind and geometry awareness

Quick view: approaches

Approach	Coverage	Risk	Good for
Conservative mask	Low	Few artifacts, many gaps	Strict regulatory pipelines, small areas
Permissive mask	High	Haze and edge contamination	Visual basemaps, rapid screening
Scene-adaptive mask	Medium to high	Tuning drift across scenes	Regional programs, mixed surfaces
Probability plus temporal filter	High	Needs per-pixel history and bookkeeping	Operational monitoring, change detection

Even with strong algorithms, most hard failures happen at cloud edges, over bright land and glint, and under thin cirrus. Recognising these patterns and treating cloud masking as a probabilistic decision rather than an on-off switch is the difference between a clean time series and a flickering one.