Samantha Chappell
Oct 23, 2019

Visualizing 1.7 Billion Stars in the Galaxy at the Speed of Light

At a data conference on the University of Southern California campus, I came across the OmniSci booth and was immediately struck by a geo heat map of building footprints in New York City. While we talked about the OmniSci platform and its real-time visualization and query capabilities, it came up that I had previously worked in academic astronomy. OmniSci just so happened to have its hands on data from Gaia, a space observatory whose goal is to create a map of our Galaxy. From there, the conversation quickly turned to collaborating on visualizing stellar positions and drawing insights from this star data.

Mapping Stellar Positions with Gaia

Located 1.5 million km from the Earth (past the orbit of the Moon), Gaia travels with the Earth, constantly on the other side of the planet from the Sun. While shooting a telescope into space is a challenge, it’s worth it to be able to observe around the clock and not have to worry about a planet’s atmosphere getting in the way.

Gaia’s main mission is to map the positions of stars in the Milky Way. Its latest data release contains 1.7 billion stars, a majority of which have measured velocities and full three-dimensional positions. Fun fact: while 1.7 billion data points is certainly impressive, this only accounts for between 1% and 2% of the stars in our Galaxy. Most of the roughly 100 billion stars are less bright than our Sun or sit behind gas and dust, making them too dim to be detected.

The Gaia dataset is more than just stellar positions. 1.3 billion stars have velocities in Right Ascension and Declination (x and y in the plane of the sky) and parallax (a proxy for distance, or a z position), and 7.2 million have measured radial velocities (velocity toward or away from the Sun). Gaia has also fit radius, temperature, extinction (how much dust is in front of a star), and luminosity for over 70 million stars. The table below summarizes the dataset completeness:

The Dataset in 7 Features

Here are some useful definitions to keep in mind while reading this blog post:

  • Proper Motion: Angular velocity in RA and Dec (x and y), in units of angle over time (Gaia reports milliarcseconds per year). Velocities are fit by tracking the positions of stars over time.
  • Radial Velocity: Speed toward/away from the Solar System. Measured by comparing observed stellar spectra (light passed through a prism) to known profiles of stars. The size and direction of the shift between the spectra is due to the Doppler Effect (light shifted to longer or shorter wavelengths).
  • Parallax: A proxy for distance. Similar to how our eyes judge distance, the apparent position of a star shifts as the Earth (and Gaia with it) travels along its orbit around the Sun. The greater this angular shift, the shorter the distance.

Note: Due to the specifics of Gaia’s data pipeline, a more involved treatment is required to calculate accurate distances from this dataset. For 3D stellar positions and star data analysis, check back for a future blog post.

  • Luminosity: How much light a star emits, measured in terms of the Sun’s luminosity.
  • Magnitude: Apparent brightness of a star, dependent on a star’s luminosity, distance, and how much gas and dust is between us and the star. This is what is directly observed (luminosity is produced by a fit).
  • Effective Temperature: Characteristic temperature of a star, in units of Kelvin. Effective temperature of the Sun is 5,777 K. Fun fact: the temperature of a star varies across its layers and only the innermost layers are hot enough to undergo fusion, powering the star as a whole.
  • Extinction: How much gas and dust lies between us and a star, expressed as the number of magnitudes by which the star is dimmed.
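
To make the first three definitions concrete, here is a minimal Python sketch of the textbook relations behind them. The function names and sample values are illustrative assumptions and are not part of the Gaia pipeline or the OmniSci dashboard. As the note above says, simply inverting the parallax is only a rough first estimate for this dataset, so treat the distance function as an approximation.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in km/s
KM_S_PER_ARCSEC_YR_PC = 4.74047     # 1 au per year expressed in km/s


def naive_distance_pc(parallax_mas):
    """Distance in parsecs from a parallax in milliarcseconds (d = 1 / p).

    This is the simple inversion only; it ignores measurement error.
    """
    if parallax_mas <= 0:
        raise ValueError("the naive inversion needs a positive parallax")
    return 1000.0 / parallax_mas


def radial_velocity_km_s(observed_nm, rest_nm):
    """Non-relativistic Doppler velocity; positive means moving away."""
    return SPEED_OF_LIGHT_KM_S * (observed_nm - rest_nm) / rest_nm


def tangential_velocity_km_s(proper_motion_mas_yr, distance_pc):
    """Speed across the sky implied by a proper motion at a given distance."""
    proper_motion_arcsec_yr = proper_motion_mas_yr / 1000.0
    return KM_S_PER_ARCSEC_YR_PC * proper_motion_arcsec_yr * distance_pc


# Illustrative values only, not taken from the Gaia catalog.
distance = naive_distance_pc(10.0)                      # 10 mas parallax
v_across = tangential_velocity_km_s(50.0, distance)     # 50 mas/yr proper motion
v_along = radial_velocity_km_s(656.50, 656.28)          # redshifted spectral line
```

A larger parallax gives a smaller distance, and a line observed at a longer wavelength than its rest wavelength gives a positive (receding) radial velocity.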

Creating Stellar Insights with an Interactive Galaxy Map

With the galaxy visualization dashboard constructed, we can examine the star data in context and draw insights.

Galaxy Rotation

Figure 1

By highlighting the negative/positive radial velocity bins, we can see the rotation of the disk and central bulge of the Galaxy (figure 1). Different radial velocity bins also pick out co-moving parts of the Galactic disk, revealing the effect of spiral arms and the complicated way stars orbit in the Milky Way. Overall stellar motions, while still not well understood, are what support the Galaxy and its structure. Similarly, different bins of velocity in RA and Dec also demonstrate global motions. The lasso tool can be used to examine velocity trends (figure 2) and thus large-scale motion and rotation in the Magellanic Clouds (the satellite galaxies below our Galactic disk). While these dwarf galaxies are small and far less massive, we can still see and categorize the organized, global motions that determine the galaxies’ shapes.

Figure 2
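
For readers who want to reproduce the idea of radial velocity bins outside a dashboard, here is a minimal Python sketch. The bin edges and sample velocities are made up for illustration, and the function is a hypothetical helper rather than anything provided by OmniSci. Negative values mean a star is approaching the Sun and positive values mean it is receding.

```python
from bisect import bisect_right
from collections import Counter


def bin_radial_velocities(velocities_km_s, edges_km_s):
    """Count stars per radial-velocity bin.

    Bin i covers [edges[i], edges[i + 1]). Missing values (None) and
    values outside the outermost edges are skipped.
    """
    counts = Counter()
    for velocity in velocities_km_s:
        if velocity is None:
            continue  # most stars have no measured radial velocity
        i = bisect_right(edges_km_s, velocity) - 1
        if 0 <= i < len(edges_km_s) - 1:
            counts[(edges_km_s[i], edges_km_s[i + 1])] += 1
    return counts


# Illustrative sample: approaching stars are negative, receding are positive.
sample = [-120.0, -30.0, -5.0, 0.0, 12.0, 45.0, 210.0, None]
edges = [-200.0, -50.0, 0.0, 50.0, 200.0]
counts = bin_radial_velocities(sample, edges)
```

Selecting only the bins below zero or only those above zero is the code equivalent of highlighting the approaching or receding side of the rotating disk.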

Galaxy Evolution

Figure 3

Looking at the dimmest stars (log, base 10, of solar luminosity), we see that these stars are cooler than the Sun (5,777 Kelvin) and are smaller (radii in terms of log solar radius). This shows that dimmer stars in the Galaxy are mostly main sequence stars, like the Sun, but smaller in mass. Stars begin and spend roughly 90% of their lifetime on the main sequence, fusing hydrogen into helium at their core. On the main sequence, everything scales with mass: smaller mass means a cooler temperature, smaller radius, and a dimmer star.

We can also see that in terms of stellar positions, this population of main sequence stars is spread out, not as confined to the disk (figure 3). This suggests the population is composed of older stars. Star formation is generally restricted to the disk of the Galaxy. If stars live long enough, they can travel away from it, and their positions and velocities are no longer determined by the physics of the disk. This phenomenon is also seen in the less peaked distributions of RA and Dec velocities. Further study of this population’s motions would produce further insights into stellar dynamics within the Galaxy and the physics that determines them.

Star Formation

Figure 4

In contrast to the dimmest stars, the most luminous stars are more confined to the disk (where there are also gaps due to gas and dust, things that go hand-in-hand with star formation). Interestingly, while this population does have stars with temperatures greater than that of the Sun, most still have lower temperatures (figure 4). This suggests a population dominated by red giants, stars that have moved off the main sequence and are larger and cooler due to the inner workings of their fusion.

With the presence of stars hotter than the Sun, we know that this population also has some fraction of main sequence stars that are more massive than the Sun and potentially, some supergiants (red giants but more massive). These more massive stars burn brighter and quicker than the Sun, making them indicators of more recent star formation. A more detailed fit would be required to determine ages, but already we can get insights about the Milky Way’s history and narrow in on populations of interest.
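
The reasoning that a star which is both very luminous and cooler than the Sun must be large follows from the Stefan-Boltzmann law, L = 4πR²σT⁴. In solar units it reduces to R = sqrt(L) × (T_sun / T)². The sketch below applies that relation. It is an illustration with made-up inputs, not the fit Gaia uses to derive its radii, and the function name is hypothetical.

```python
import math

T_SUN_K = 5777.0  # solar effective temperature quoted in this post


def radius_in_solar_radii(luminosity_solar, teff_k):
    """Radius implied by the Stefan-Boltzmann law, in solar radii.

    luminosity_solar is in units of the Sun's luminosity (not its log).
    """
    if luminosity_solar <= 0 or teff_k <= 0:
        raise ValueError("luminosity and temperature must be positive")
    return math.sqrt(luminosity_solar) * (T_SUN_K / teff_k) ** 2


# Illustrative inputs only.
sun_like = radius_in_solar_radii(1.0, 5777.0)      # exactly one solar radius
cool_bright = radius_in_solar_radii(100.0, 4000.0)  # luminous but cool
hot_bright = radius_in_solar_radii(100.0, 10000.0)  # same luminosity, hotter
```

At the same luminosity, the cooler star comes out several times larger than the hotter one, which is why a luminous, cool population points to giants rather than main sequence stars.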

Gas and Stardust

Figure 5

The largest extinction bins show the location of large amounts of gas and dust (figure 5). Gas and dust absorb and scatter starlight, reddening it and causing stars to appear dimmer. We can see from the positions that gas and dust are confined to the Galactic disk. Though gas and dust make star visualization more difficult, they are both an ingredient and a product of star formation and death. Their abundance suggests a large number of stellar deaths, probably of massive stars, in the Galactic disk and bulge. A closer examination of the substructures in the gas/dust distribution (filament-like structures) would produce even more insights into the star formation and death history of the Galaxy.
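
As a rough illustration of how much extinction dims a star, the sketch below uses the standard magnitude relations: A magnitudes of extinction reduce the received flux by a factor of 10^(-0.4 A), and the apparent magnitude is m = M + 5 log10(d / 10 pc) + A. The function names and inputs are illustrative assumptions. Real extinction depends on the wavelength band, which this sketch ignores.

```python
import math


def dimming_factor(extinction_mag):
    """Fraction of a star's flux that survives a given extinction."""
    return 10.0 ** (-0.4 * extinction_mag)


def apparent_magnitude(absolute_mag, distance_pc, extinction_mag=0.0):
    """Apparent magnitude from absolute magnitude, distance, and extinction."""
    if distance_pc <= 0:
        raise ValueError("distance must be positive")
    return absolute_mag + 5.0 * math.log10(distance_pc / 10.0) + extinction_mag


# Illustrative inputs only.
surviving_flux = dimming_factor(2.5)              # 2.5 mag of dust
clear_view = apparent_magnitude(4.83, 100.0)      # no dust in the way
dusty_view = apparent_magnitude(4.83, 100.0, 1.0)  # 1 mag of dust
```

Larger magnitudes mean fainter stars, so adding extinction always pushes the apparent magnitude up and can move a star below a survey's detection limit.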

Star and Galaxy Data Analysis

Due to its size and range, Gaia’s dataset is both incredibly important and challenging to examine. While traditional galaxy data analysis tools may take light years to interact with, by utilizing OmniSci and the computational power of GPUs, we are able to handle both the scale of galaxy data and zoom in on details and subpopulations of interest with ease. Immerse, when used as galaxy map software, allows us to gain insights and point the way for future work into the motion, composition, and history of the Milky Way, the Galaxy we call home. In my next blog post, I’ll take a deeper dive into stellar positions and velocities by using a clustering algorithm to analyze stellar dynamics and Galactic physics.

Samantha Chappell

Samantha Chappell is a data scientist with a background in astronomy (MS from UCLA). She believes everything is data and has worked with a range of datasets, from stars to parking tickets to scraped perfume websites. She has experience with data mining, Bayesian statistics, and hypothesis testing. She is also the author of multiple publications, including in Science.