Anomaly Detection and Approximate Similarity Searches of Transients in Real-time Data Streams
Authors
P. D. Aleo, A. W. Engel, G. Narayan, C. R. Angus, K. Malanchev, K. Auchettl, V. F. Baldassare, A. Berres, T. J. L. de Boer, B. M. Boyd, K. C. Chambers, K. W. Davis, N. Esquivel, D. Farias, R. J. Foley, A. Gagliano, C. Gall, H. Gao, S. Gomez, M. Grayling, D. O. Jones, C. -C. Lin, E. A. Magnier, K. S. Mandel, T. Matheson, S. I. Raimundo, V. G. Shah, M. D. Soraisam, K. M. de Soto, S. Vicencio, V. A. Villar, R. J. Wainscoat
Abstract
We present LAISS (Lightcurve Anomaly Identification and Similarity Search), an automated pipeline to detect anomalous astrophysical transients in real-time data streams. We deploy our anomaly detection model on the nightly ZTF Alert Stream via the ANTARES broker, identifying a manageable $\sim$1-5 candidates per night for expert vetting and coordinating follow-up observations. Our method leverages statistical light-curve and contextual host-galaxy features within a random forest classifier, tagging transients of rare classes (spectroscopic anomalies), of uncommon host-galaxy environments (contextual anomalies), and of peculiar or interaction-powered phenomena (behavioral anomalies). Moreover, we demonstrate the power of a low-latency ($\sim$ms) approximate similarity search method to find transient analogs with similar light-curve evolution and host-galaxy environments. We use analogs for data-driven discovery, characterization, (re-)classification, and imputation in retrospective and real-time searches. To date we have identified $\sim$50 previously known and previously missed rare transients from real-time and retrospective searches, including but not limited to: SLSNe, TDEs, SNe IIn, SNe IIb, SNe Ia-CSM, SNe Ia-91bg-like, SNe Ib, SNe Ic, SNe Ic-BL, and M31 novae. Lastly, we report the discovery of 325 total transients, all observed between 2018-2021 and absent from public catalogs ($\sim$1% of all ZTF Astronomical Transient reports to the Transient Name Server through 2021). These methods enable a systematic approach to finding the "needle in the haystack" in large-volume data streams. Because of its integration with the ANTARES broker, LAISS is built to detect exciting transients in Rubin data.
Concepts
The Big Picture
Imagine trying to find a single misprinted page inside every book ever published, while new books arrive by the thousands every night. That’s roughly the challenge facing astronomers searching for rare cosmic explosions in modern sky surveys.
The Zwicky Transient Facility (ZTF) is a wide-field camera in California that scans the entire visible sky every few nights, generating about a million alerts per night. Each alert flags a point of light that changed: something brightened, faded, or flickered where it wasn’t before. Hidden in that flood are supernovae, stars being torn apart by black holes, and phenomena we haven’t yet named.
The problem isn’t finding things that go bump in the night. The problem is finding the right bumps.
A team of researchers built LAISS (Lightcurve Anomaly Identification and Similarity Search), an automated pipeline that sifts through these nightly alerts and delivers a short list of 1 to 5 genuinely unusual objects, ready for telescope follow-up. In astronomy, these short-lived events are called “transients”: objects that appear, change dramatically, and fade over days to months. Most are ordinary. LAISS hunts the ones that aren’t.
Key Insight: LAISS combines machine learning anomaly detection with millisecond-speed similarity search to find rare cosmic transients in live data streams. It has already turned up 325 transients that were absent from public catalogs.
How It Works
LAISS has two main engines. The first is an anomaly detection model, a random forest classifier (many decision trees that vote together) trained to flag transients that look unusual compared to the broad population of known objects. The second is an approximate similarity search that finds the closest known analogs to any flagged event, in milliseconds, across a database of thousands of light curves.

For each transient, LAISS extracts two broad categories of features. Statistical light-curve features (a light curve is a graph of brightness over time) capture the shape of how a source brightens and fades: rise time, peak magnitude, color evolution, variability statistics. Contextual host-galaxy features describe the environment: stellar mass, star formation rate, galaxy morphology. Between the event itself and its surroundings, the system gets a fairly complete picture.
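As a rough illustration of what "light-curve plus host-galaxy features" can mean in practice, the sketch below turns a single-band light curve into a few summary statistics and then appends host values. The function names, feature names, and host keys are hypothetical and chosen for illustration; they are not the feature set used by LAISS.

```python
from statistics import pstdev

def light_curve_features(times, mags):
    """Summarise one single-band light curve (times in days, brightness in magnitudes)."""
    if len(times) != len(mags) or len(times) < 2:
        raise ValueError("need at least two matched (time, magnitude) points")
    # Smaller magnitude means brighter, so the peak is the minimum magnitude.
    peak_idx = min(range(len(mags)), key=lambda i: mags[i])
    return {
        "peak_mag": mags[peak_idx],
        "rise_time": times[peak_idx] - times[0],      # first detection to peak
        "decline_time": times[-1] - times[peak_idx],  # peak to last detection
        "amplitude": max(mags) - min(mags),
        "mag_std": pstdev(mags),                      # simple variability statistic
    }

def with_host_context(lc_features, host):
    """Append contextual host-galaxy values (hypothetical keys) to the feature dict."""
    merged = dict(lc_features)
    merged.update({f"host_{key}": value for key, value in host.items()})
    return merged

# Made-up example data, not a real transient.
times = [0.0, 3.0, 6.0, 10.0, 20.0, 35.0]
mags = [20.5, 19.2, 18.4, 18.0, 18.9, 20.1]
host = {"log_stellar_mass": 10.3, "log_sfr": -0.2}
features = with_host_context(light_curve_features(times, mags), host)
```

The resulting flat dictionary is the kind of fixed-length record that a classifier or a similarity index can consume, which is why the event and its environment are combined into one vector.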
The classifier sorts anomalies into three types:
- Spectroscopic anomalies: transients belonging to rare classes (classes that are ultimately defined by their spectra), such as superluminous supernovae or tidal disruption events
- Contextual anomalies: events in unexpected environments, like a core-collapse supernova in a supposedly “dead” elliptical galaxy
- Behavioral anomalies: objects with peculiar light-curve shapes, including those powered by collisions between ejected material and surrounding gas shells, producing multiple peaks or extended plateaus
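To make the "many decision trees that vote together" idea concrete, here is a deliberately tiny random forest written from scratch. It assumes a binary labelling (0 for ordinary, 1 for anomalous), uses depth-1 trees, and runs on made-up two-feature data (rise time in days, peak absolute magnitude). All of these are simplifying assumptions for illustration, not the configuration of the LAISS model.

```python
import random

def train_stump(X, y, rng):
    """Fit a depth-1 tree on a bootstrap sample, splitting on one random feature."""
    n = len(X)
    rows = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample rows with replacement
    feat = rng.randrange(len(X[0]))               # random feature for this tree
    best = None
    for i in rows:                                # try each sampled value as a threshold
        thr = X[i][feat]
        for sign in (1, -1):                      # which side of the split votes "anomalous"
            errors = sum((sign * (X[j][feat] - thr) > 0) != bool(y[j]) for j in rows)
            if best is None or errors < best[0]:
                best = (errors, feat, thr, sign)
    return best[1:]                               # (feature index, threshold, sign)

def train_forest(X, y, n_trees=200, seed=0):
    rng = random.Random(seed)
    return [train_stump(X, y, rng) for _ in range(n_trees)]

def anomaly_score(forest, x):
    """Fraction of trees that vote 'anomalous' for feature vector x."""
    votes = sum(1 for feat, thr, sign in forest if sign * (x[feat] - thr) > 0)
    return votes / len(forest)

# Synthetic training set: [rise_time_days, peak_absolute_magnitude] (hypothetical features).
ordinary = [[10.0 + (i % 10), -19.5 + 0.1 * (i % 7)] for i in range(30)]
rare = [[50.0 + 2.0 * i, -21.0 - 0.1 * i] for i in range(10)]
X = ordinary + rare
y = [0] * 30 + [1] * 10

forest = train_forest(X, y)
score_ordinary = anomaly_score(forest, [8.0, -18.5])   # fast and faint
score_rare = anomaly_score(forest, [70.0, -22.0])      # slow and very luminous
```

The vote fraction acts as a continuous anomaly score, so a single threshold on it controls how many objects are passed on for human vetting.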
Once LAISS flags a candidate, approximate nearest-neighbor algorithms (techniques that trade a small amount of exactness for enormous speed) return a ranked list of historical analog transients. Astronomers get an instant prior on what they’re looking at before any spectrum is taken.
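The trade of exactness for speed can be sketched with one common family of approximate nearest-neighbor methods, random-hyperplane hashing. This is a swapped-in illustrative technique, not necessarily the algorithm LAISS uses; the class name, the analog names, and the assumption that feature vectors are already standardized (centred on zero) are all hypothetical.

```python
import math
import random
from collections import defaultdict

class HyperplaneIndex:
    """Toy locality-sensitive-hashing index built from random hyperplanes."""

    def __init__(self, dim, n_planes=4, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in self.planes)

    def add(self, name, vec):
        self.buckets[self._key(vec)].append((name, vec))

    def query(self, vec, k=3):
        # Only the matching bucket is scanned, so true neighbours that hashed
        # elsewhere can be missed; that is the price paid for speed.
        candidates = self.buckets.get(self._key(vec), [])
        ranked = sorted(candidates, key=lambda item: math.dist(item[1], vec))
        return [(name, math.dist(stored, vec)) for name, stored in ranked[:k]]

# Made-up standardized feature vectors for a small reference library.
library = {
    "analog_a": [0.9, -1.2, 0.3],
    "analog_b": [1.0, -1.0, 0.4],
    "analog_c": [-1.5, 0.8, -0.7],
    "analog_d": [-0.2, 2.1, 1.3],
}
index = HyperplaneIndex(dim=3)
for name, vec in library.items():
    index.add(name, vec)

matches = index.query([0.9, -1.2, 0.3], k=2)   # query with a vector identical to analog_a
```

A query touches one bucket instead of the whole library, so lookup cost stays small as the library grows. Production libraries usually combine several such partitions to recover neighbours that a single hash would miss.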

The whole pipeline runs on the nightly ZTF Alert Stream through the ANTARES broker, which receives ZTF’s million-per-night alerts and applies filters before they reach scientists. By focusing on extragalactic transients, the system keeps the false positive rate low enough that only a handful of candidates surface each night.
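The nightly reduction from a flood of alerts to a handful of candidates can be pictured as a filter, a threshold, and a cap. The sketch below assumes hypothetical alert fields (`has_host_galaxy` as a stand-in for "extragalactic", `anomaly_score` from an upstream classifier) and an arbitrary threshold of 0.5; none of these are taken from the ANTARES or LAISS code.

```python
def nightly_shortlist(alerts, score_threshold=0.5, max_candidates=5):
    """Return the highest-scoring extragalactic alerts for expert vetting."""
    # Keep only alerts associated with a host galaxy (assumed proxy for extragalactic).
    extragalactic = [a for a in alerts if a["has_host_galaxy"]]
    # Keep only alerts whose anomaly score clears the threshold.
    flagged = [a for a in extragalactic if a["anomaly_score"] >= score_threshold]
    # Rank by score and cap the list so it stays manageable for humans.
    flagged.sort(key=lambda a: a["anomaly_score"], reverse=True)
    return flagged[:max_candidates]

# Made-up alerts for illustration.
alerts = [
    {"id": "a1", "has_host_galaxy": True, "anomaly_score": 0.91},
    {"id": "a2", "has_host_galaxy": False, "anomaly_score": 0.99},
    {"id": "a3", "has_host_galaxy": True, "anomaly_score": 0.12},
    {"id": "a4", "has_host_galaxy": True, "anomaly_score": 0.64},
]
shortlist_ids = [a["id"] for a in nightly_shortlist(alerts)]
```

At about a million alerts per night, a shortlist of 1 to 5 objects means that only a few alerts in a million reach a human expert.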
Why It Matters
The scale problem will only get worse. The Vera C. Rubin Observatory in Chile will survey the entire southern sky every few nights and is expected to issue up to roughly 10 million alerts per night, about ten times ZTF's volume.
Only about 1% of transients will ever receive spectroscopic follow-up, the gold standard for classifying what an explosion actually is. That makes automated systems like LAISS a necessity, not a convenience. Without them, genuinely novel events will slip through unnoticed, filed as ordinary supernovae or simply ignored.
The numbers back this up. Running LAISS on ZTF data from 2018 to 2021, the team found 325 transients that were absent from public catalogs, roughly 1% of all ZTF reports to the Transient Name Server through 2021. Across real-time and retrospective searches, they also identified about 50 rare transients, some already known and some previously missed: superluminous supernovae, tidal disruption events, several subtypes driven by ejecta–circumstellar matter collisions (for example SNe IIn and SNe Ia-CSM), and M31 novae.
The similarity search earns its keep when something truly new shows up. If a transient matches no known class, astronomers can still pull up the closest known objects in milliseconds and use them to plan follow-up observations. That turns an unknown into a starting point.
Bottom Line: LAISS found 325 previously uncatalogued transients and identified ~50 rare events, some already known and some previously missed. It runs every night in real time and is already built for the coming wave of Rubin Observatory data.
IAIFI Research Highlights
LAISS puts machine learning anomaly detection and approximate nearest-neighbor search directly into a live observational astronomy workflow, not as a post-hoc analysis step but as part of nightly operations.
Random forest classifiers and millisecond-latency similarity search can run at scale on messy, heterogeneous scientific data. The filtering is tight enough that only the most promising candidates reach human experts.
By recovering rare transient classes (tidal disruption events, superluminous supernovae, interaction-powered explosions), LAISS expands what astronomers can observe of extreme astrophysical processes: black hole accretion, massive stellar death, and circumstellar matter physics.
LAISS is integrated into the ANTARES broker and built for Rubin Observatory compatibility, so it's ready for the LSST era. The full paper is available at [arXiv:2404.01235](https://arxiv.org/abs/2404.01235).
Original Paper Details
2404.01235