Google Summer of Code: OpenAstronomy
This summer, I was delighted to have the opportunity to work with the Astropy community, a sub-organisation under the umbrella organisation OpenAstronomy. The project that I aimed to tackle was originally titled “Seamless Combination of SkyCoord, Table, WCS, and FITS”. The goal was to reinforce the foundation of Astropy as a single cohesive package rather than a collection of functionally independent modules, by achieving seamless interoperability between three special and powerful constituents of Astropy (coordinates, time and units) and the underlying Astropy Table writers. This mostly zeroed in on working with the astropy.io.fits Table writer, since the FITS standard is rigorously detailed and intricate while simultaneously providing immense flexibility to the astronomical community.
I managed to solve #3685, which dealt with storing astropy.units.Quantity as a normal column in FITS, prior to being selected for GSoC. This was a fairly simple task and it allowed me to gain a little insight into the actual work for the summer and a better understanding of the Astropy codebase.
One important thing to notice about the venerable issue #3000 is that it was opened 3 years ago, around the time when FITS WCS Paper IV, “Representations of Time Coordinates in FITS”, was published. This long-standing issue has been a major concern to the community ever since, because solving it requires a detailed understanding of:
- The FITS Standard
- World Coordinate System (WCS)
- Astronomical Time (SOFA)
- FITS WCS Papers, particularly Paper IV, FITS Time Standard
- The astropy.time.Time class and its mapping to the FITS standard
Thus, most of my efforts this summer went into solving this issue, and I feel extremely happy saying that I successfully did so. Moreover, with the help of my mentors and the community, I managed to take this project a step further, for which I received an overwhelmingly positive response.
My mentors and I had decided while writing the proposal that solving #5626 would be a secondary task, to be done only if time permitted. As predicted, #3000 itself took most of the summer, and hence #5626 will be solved later on.
Project Details and Code
Time as a dimension in astronomical data presents challenges in its representation in FITS files. The standard has therefore been extended to describe rigorously the time coordinate in the World Coordinate System framework.
Time is intrinsically a coordinate (space-time as (x, y, z, t)) and hence the FITS time paper formulates the representation of the time axis, or possibly multiple time axes, into the FITS World Coordinate System (WCS).
astropy.time.Time is the Astropy class that represents Astronomical Time as:
>>> from astropy.time import Time
>>> t = Time(["1999-01-01T00:00:00"], format='isot', scale='utc', location=(1, 2, 3))
This involves the following object attributes (metadata), listed in decreasing order of importance:
- Precision (for string formats)
The data components consist of:
- Time values in the specified format or (jd1, jd2)
The FITS standard is rather long and detailed (22 journal pages), and does not precisely map to the Astropy Time object (though mostly it is a reasonable match). Abiding by the rules set by the FITS standard requires mapping these data components and object attributes to the appropriate FITS table columns and keywords. Thus, a well-defined protocol has been developed to allow the storage of Time columns in FITS, while allowing the object to “round-trip” through the file with no loss of data or attributes. Allowing Time columns to be written as time coordinate columns in FITS tables thus involves storing time values in a way that ensures retention of precision.
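To see why precision needs care here, consider a small standard-library sketch (not Astropy code) that compares storing a Julian Date as a single 64-bit float against keeping it as a pair of floats, in the spirit of (jd1, jd2). The date and the one-microsecond offset are illustrative values chosen for this example.

```python
SECONDS_PER_DAY = 86400.0

# Illustrative values: a Julian Date in 2017 and two instants 1 microsecond apart.
jd_whole = 2457990.0
frac_a = 0.25
frac_b = 0.25 + 1e-6 / SECONDS_PER_DAY

# Single-float representation: whole days and day fraction summed into one number.
single_a = jd_whole + frac_a
single_b = jd_whole + frac_b

# Two-float representation: whole days and day fraction kept separately.
pair_a = (jd_whole, frac_a)
pair_b = (jd_whole, frac_b)

# The single float cannot tell the two instants apart; the pair can.
single_collapses = single_a == single_b
pair_distinguishes = pair_a != pair_b

# The offset recovered from the pair, converted back to microseconds.
recovered_us = (pair_b[1] - pair_a[1]) * SECONDS_PER_DAY * 1e6

print("single float collapses the two instants:", single_collapses)
print("pair keeps them distinct:", pair_distinguishes)
print("offset recovered from the pair (microseconds):", recovered_us)
```

Near a present-day Julian Date, adjacent 64-bit floats are tens of microseconds apart, so a one-microsecond difference vanishes in the single-float form but survives when the fraction is stored on its own.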
For further details, refer to Astropy Native Objects. This is the documentation that I have written, in order to guide the user regarding the new feature.
In general, a “mixin” column may contain multiple data components as well as object attributes beyond the standard Column attributes like
The following are the pull requests that I worked on during the summer:
- Allow Time to be written as a normal column to FITS files, that round-trips on reading
- Expanding FITS column to accept coordinate type keywords
- Using “serialize_method” instead of astropy_native for writing
I would especially like to point out that PR #6176 (listed first above) surpassed all records set in Astropy to date. With a whopping 315 comments, it is now the most commented PR in Astropy (GitHub slowed down drastically because of this, and I finally got a chance to meet the angry unicorn). This PR was a crucial part of the project, and most of my time during the summer was invested in making it a beauty. The FITS time standard was devised in order to describe time coordinates in an unambiguous and complete manner. However, the complexity of the standard is an impediment to its consistent adoption by the entire community. This PR is an attempt at achieving that consistency, so that time can be stored and interpreted with absolute certainty. The feature frees users from worrying about data representation so that they can focus on their specific work. A detailed explanation (accompanied by intermittent frustration) can be found here.
The second listed PR, #6359, was also pretty gruelling because here I had to understand astropy.io.fits, an extremely intricate, beautiful yet abstruse module that is almost a decade old (originally PyFITS). I faced the same difficulty while dealing with #6176. Hands down, the people who wrote it have an extreme level of patience.
The last PR affects dev and was necessary to provide the right read/write defaults for the user. The discussion about it is in issue #6427.
- Allow Quantity to be written as a normal column to a FITS file
- Allow Quantity to be written as a normal column to VOTable
- TimeFITS changes to handle corner cases
- Extending the add_column(s) API by adding a “name(s)” parameter #5911
- Allow FITS tables with time columns (not written by Astropy) to be read by io.fits
- HIERARCH ASTROPY convention to store “FORMAT” of TIME columns
- Addition of “TCAPF” time column keyword to handle astropy formats; enabling 100% round-tripping
Out of these, the first is an extension of issue #3000, which dealt with saving Time to FITS and reading it back. This PR deals with reading arbitrary FITS files that conform to the FITS Time Standard (which involves various aspects). Thus, it is now possible to read time coordinate columns from a large chunk of astronomical datasets (Chandra, XMM, HST and more). This is an important contribution to Astropy and also the first of its kind, as such a generalised read feature covering a large subset of astronomical datasets is not supported by any other software (that we know of). It is ready for a final review and will soon be merged.
The other two PRs are actually conflicting ones. We need to decide between two options for storing the format attribute of Time, which has no mapping in FITS. One is to introduce a new Astropy FITS keyword, which would require adding it to the FITS registry so that it won’t be used for anything else. The other is to use the HIERARCH convention to create a separate Astropy namespace for Astropy-specific metadata. Either choice requires the agreement of the entire community and hence will probably take time to merge.
We sought help from quite a lot of people to come up with the best possible solution for each issue, and so kept the astronomical community’s interests in mind while working on this project.
FITS Time Standard
The purpose of this standard is to describe time, on all scales and precisions known in astronomical datasets, in an unambiguous, complete, and self-consistent manner. For more details and fun facts refer here. Also, I have extensively described the FITS Time Standard and its details within the Astropy documentation, under FITS table with time columns.
World Coordinate System
For this you can directly refer to FITS WCS Paper.
The successful completion of the intended work has opened up new avenues to extend my work. I will be working on them after the GSoC period ends and I hope to contribute more to this community.
A few of these include:
- Use GreenBank convention to store vector location
- Make use of TCRVL (reference value) and TCDLT (delta time) for calculation of Time coordinate values
- Simplify reading of time coordinate keywords: avoid repetitive keyword checks
- Store “Mixins” using ECSV technique
- Extend this to SkyCoord and EarthLocation; these don’t have a precise standard
It is overwhelming for me to try and write the perfect token of appreciation for my mentors, because words cannot express my heartfelt gratitude and respect for them. I have been showered with warmth and appreciation this entire summer and it is with a heavy heart that I say now that it is about to end.
The most rewarding part of this journey? That would definitely be the fact that Tom, Moritz and Marten will be much more than just mentors for me. To know such amazing people, learn from them, laugh with them, make them hear my jibber-jabber (I tend to do that :)) has been more than an honour. I hope we stay in touch and get to work again soon.