INF2186: My favourite metadata tool

My favourite metadata tool remains perpetually open in a browser tab at work: Klokan Technologies’ Bounding Box Tool, an easy-to-use utility that generates bounding box coordinates (in other words, the latitude and longitude values that enclose a given area) on the fly. This metadata is important for capturing the spatial extent of items and describing them in a consistent manner. For every paper map I catalogue into an original MARC record, I record the extent in the 034 and 255 fields, and for every geospatial dataset in the MDL data inventory, the bounding box is entered into its own field in the metadata record. Given the different storage requirements of these two databases, it is very convenient that users are offered a choice of 12 (!) different syntax encoding schemes for capturing coordinates – I personally use MARC VTLS (which pops the coordinates into the appropriate subfields for quick copying and pasting) and CSV for these respective applications.
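To make the idea of one bounding box expressed in two syntax encoding schemes concrete, here is a minimal Python sketch. It is not the tool's own code or its exact output: the coordinate values are illustrative, the west, south, east, north order for CSV is an assumption, and the MARC-style string simply places signed decimal degrees into the 034 subfields $d (west), $e (east), $f (north), and $g (south).

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """A bounding box in decimal degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float


def to_csv(box: BoundingBox) -> str:
    # Assumed ordering: west, south, east, north.
    return ",".join(f"{value:.6f}" for value in box)


def _signed(value: float) -> str:
    # Signed, zero-padded decimal degrees, e.g. -079.639000
    return f"{value:+011.6f}"


def to_marc_034_subfields(box: BoundingBox) -> str:
    # 034 subfields: $d westernmost, $e easternmost,
    # $f northernmost, $g southernmost.
    return (
        f"$d{_signed(box.west)}$e{_signed(box.east)}"
        f"$f{_signed(box.north)}$g{_signed(box.south)}"
    )


# Illustrative values only, roughly enclosing Toronto.
toronto = BoundingBox(west=-79.639, south=43.581, east=-79.115, north=43.855)
print(to_csv(toronto))
print(to_marc_034_subfields(toronto))
```

The same four numbers feed both outputs; only the encoding changes, which is what makes copying between a MARC catalogue and a CSV-based inventory painless.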

While I’ve been using the Bounding Box Tool for several years, it was only in this course that I learned the term “syntax encoding scheme”. The flexibility that Klokan continues to build into the tool makes it fantastic for anyone working with geospatial resources and catalogues.

INF2186: Geospatially Speaking

A few months after I was hired to work at MDL, I started taking undergraduate classes in geographic information systems with the fantastic Don Boyes. My work on the Don Valley Project taught me about one particular subset of tools in ArcGIS, and through Prof. Boyes’ classes, I learned so much about representation, cartography, analysis, and decision-making using GIS. I also learned much more about why geospatial metadata is really, really important to those working with GIS, and these lessons resonate daily in the reference work I do in the library. Geospatial datasets are frequently packaged with their accompanying metadata, and for very good reasons.

What does geospatial metadata tell us? Like many other metadata schemas that we reviewed in class, the core metadata elements typically tell us who produced the data, when, and for what reason. These identifying elements help us assess the reliability of the data contained therein, and allow us to pursue any questions we may have with the authoring party. They also describe how the data were created or collected, which allows users to evaluate their accuracy and precision for the purposes of analysis or selecting between multiple datasets. For example, a raster elevation dataset compiled from information collected by satellite would be more accurate and precise than the historic factory outlines I digitized from georeferenced maps in my previous post. However, given that no other digital maps of industry in the Lower Don Watershed exist, our datasets remain valuable, as long as the means of their production are understood.

Some additional metadata elements describe particular spatial parameters of datasets. The first of these properties, the dataset’s spatial extent, is typically autogenerated by GIS software – it is used to generate a bounding box that visually describes the distribution and limits of the information contained within it, and can be used to help assess whether a dataset is useful given one’s area of interest. The other two important parameters, the projection and the datum, are critical for the proper use of a dataset: if this spatial reference information is missing, it can be quite a challenge to get a layer in alignment with others on a map, as one can be forced to guess. Without definitively knowing the projection and datum of a geospatial dataset, one cannot be confident that the data observed on the map is actually associated with a given place on the ground.
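As a rough illustration of how a spatial extent can be derived and then used to judge relevance, the following Python sketch computes a bounding box from a handful of points and checks whether it overlaps an area of interest. The function names and coordinates are hypothetical, and the sketch assumes both boxes use the same coordinate system and do not cross the antimeridian; GIS software does considerably more than this.

```python
def extent(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical longitude/latitude points from a small dataset.
factory_points = [(-79.36, 43.65), (-79.35, 43.66), (-79.34, 43.64)]
dataset_box = extent(factory_points)

# Hypothetical area of interest supplied by a researcher.
area_of_interest = (-79.40, 43.63, -79.35, 43.70)

print(dataset_box)
print(intersects(dataset_box, area_of_interest))
```

A check like this only answers whether the dataset might cover the right place; it says nothing about whether the layer will line up correctly, which is where the projection and datum come in.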

Geospatial datasets also frequently contain attribute information, linking locations on the ground to quantitative or qualitative data about these locations. A good data dictionary contains information about these attributes and the different values that can be expected within them, and is very useful for understanding the contents of a dataset and what can be done with it. This attribute information can be contained within the metadata file itself, or in an accompanying text document.

There are many, many other characteristics that are described within complete geospatial metadata files, including licensing (very important when distributing data within an academic library!), update history and frequency, and spatial resolution, to name just a few. Metadata is typically stored according to FGDC or ISO 19115/19139 standards, and many GIS programs contain XML metadata parsers and editors. Knowing how to create, evaluate, and maintain metadata is very useful for anyone working in an environment where geospatial data is being used – it helps develop a culture of trust between people and towards the information they work with, and increases the quality of research and analysis work (while reducing the duplication of efforts) when data can be effectively searched, discovered, and evaluated. It’s also good for us to think critically about the data we produce, peruse, and consume anyway – metadata is of great use to such reflection.
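For readers curious about what parsing this kind of XML involves, here is a minimal Python sketch that reads the geographic bounding box out of an ISO 19139-style fragment using only the standard library. The sample fragment and its coordinate values are made up for illustration, and a real metadata record is far larger; this is a sketch of the idea rather than a full parser.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 XML.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Made-up fragment for illustration; values are not from a real record.
SAMPLE = """
<gmd:EX_GeographicBoundingBox
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:westBoundLongitude><gco:Decimal>-79.40</gco:Decimal></gmd:westBoundLongitude>
  <gmd:eastBoundLongitude><gco:Decimal>-79.33</gco:Decimal></gmd:eastBoundLongitude>
  <gmd:southBoundLatitude><gco:Decimal>43.63</gco:Decimal></gmd:southBoundLatitude>
  <gmd:northBoundLatitude><gco:Decimal>43.70</gco:Decimal></gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
"""

BOUNDS = (
    "westBoundLongitude",
    "eastBoundLongitude",
    "southBoundLatitude",
    "northBoundLatitude",
)


def read_bounding_box(xml_text):
    """Return a dict of the four bounds, or None if no box is present."""
    root = ET.fromstring(xml_text)
    tag = "{%s}EX_GeographicBoundingBox" % NS["gmd"]
    # iter() also considers the root element itself.
    box = next(root.iter(tag), None)
    if box is None:
        return None
    result = {}
    for name in BOUNDS:
        element = box.find(f"gmd:{name}/gco:Decimal", NS)
        if element is None or element.text is None:
            return None  # treat an incomplete box as missing
        result[name] = float(element.text)
    return result


print(read_bounding_box(SAMPLE))
```

Returning None for an incomplete box is a deliberate choice in this sketch: a half-recorded extent is arguably more misleading than none at all.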

In my subsequent posts, I will discuss what this course helped me learn about metadata discovery and interoperability, as there is much I didn’t know about the systems I work with on a daily basis.

INF2186: My First Metadata

I first learned the word “metadata” in 2008, when I was hired at what is now known as the Map & Data Library at the University of Toronto as a research assistant on the Don Valley Historical Mapping Project. After mastering the basics of georeferencing and digitizing in ArcGIS, I dutifully warped approximately one hundred historic maps of Toronto into place on the digital street grid, traced the course of the Don River and the wharves of the changing waterfront, and labelled six hundred points with the names and addresses of businesses and factories operating in the late 19th and early 20th centuries. As the project wrapped up one year after I started, we published several thematic digital map layers in two different formats, allowing viewers to overlay the environmental and industrial history of the lower Don Valley on the contemporary city in ArcGIS or Google Earth. However, how and why should a reader of these maps trust the data in them, considering a) very few of the locations mapped still exist in the present, and b) there is sometimes a mismatch between the location of features mapped at different times? (For a greater explanation and visualization of this problem, please see pages 50-52 of Marcel Fortin and Jennifer Bonnell’s chapter “Reinventing the Map Library”, part of their excellent edited volume Historical GIS Research in Canada, available as a free ebook from the University of Calgary Press.)

As it was up to me to communicate the contingencies of this research process and the individual sources themselves to users of these geospatial datasets, I learned that the answer to both of these questions lay in the metadata that would accompany them. Through the metadata creation tool included as part of our GeoNetwork installation, I created ISO 19139-compliant metadata for all the layers we created. These records contained information about the rationale of the dataset within the broader scope of the project, credits for those who worked on it, bibliographic details about the accompanying sources, and explanations about what given database entries mean in a historical context. As we worked from cartographic and textual sources to assemble our datasets (some of which conflicted with each other), we could not state that a business definitively existed at certain points in time, only that it appeared on the map at certain points in time. These important definitions and their implications were contained in the metadata.

Since the project wrapped up in 2009, I have worked on several more historical GIS projects at UofT, including a complete mapping of the Los Angeles streetcar network in the 1920s and a growing collection of historical hydrography datasets showing the changing shoreline and lost rivers of Toronto. This first project taught me the importance of geospatial metadata in the context of using GIS for historical research. In my next post, I will discuss how frustrating it can be to work in a GIS without metadata.

 

Works cited

Bonnell, Jennifer, and Marcel Fortin. 2014. Historical GIS Research in Canada. Calgary: University of Calgary Press.

INF2186: Metadata and print design

I started design school in 2004, and while I did quite well across my courses, the work that made me happiest was some of the least glamorous: I preferred the “boring” but highly detailed work of print production over art direction. (Hey, good production designers are hard to find.) Many assignments were submitted to instructors as oversize colour laser prints, which (as in the previous post) required communicating in standard languages and formats with print shops. One of the assignments I unexpectedly enjoyed the most involved the art direction and production of a 64-page annual report for a corporation or nonprofit organization, transcribing the previous year’s content into a coherent publication. I realized I had a knack for formatting the names of staff members and donors, which took up pages and pages towards the end of the publication. In retrospect, coming up with a scheme for the visual hierarchy on these pages was similar to the development of a metadata schema: one must decide which details must appear on the page (for example, name, title, and location) and differentiate them typographically before embarking on lots of data entry into the software, applying the different type styles to each “field” consistently. One must also consider the limits of such fields and how data entered into them impacts the broader design – for example, when envisioning business cards for members of an organization, it is always smart to start with the individual with the longest name or job title, and ensure that the design accommodates their details before moving on to those with less information to convey in the same amount of space.

It’s amusing to reflect on these similarities now that I have the extra-long job title of “Original Cataloguer & Reference Specialist”. The library’s time-tracking software truncates this to “Original Cat”. Mew!

INF2186: Last century

This is the first in a series of posts for my capstone assignment for INF2186: Metadata Schemas and Applications, a graduate course taught by Prof. Lynne Howarth in the Faculty of Information at the University of Toronto. Over the remainder of the semester, I will be blogging about my past and present experiences working with metadata, something I have spent a lot of time thinking about over the last ten years.

My first part-time job in high school was in a photo lab. Despite being an entirely boring photographer, I was a decent photo printer, and spent much of the next seven years working in photo shops in downtown Toronto, until I was ultimately laid off in 2006 thanks to the near total eclipse of the film industry by digital photography. For the first few years I worked both the counter and machines, and was responsible for the intake of thousands and thousands of rolls of film and other photographic orders, with each task deposited into its own paper envelope. Each of these envelopes required a methodical recording of both the necessary instructions and the identifying details of what was dropped off, allowing the company to better search for the submitted original. The metadata schema for photo processing required that we record a customer’s name and phone number, the ISO, format, exposure count, and polarity of the medium dropped off, and the size and quantity of prints required. Customers were given a receipt with a unique identifier pertaining to their order at the end of the transaction. I compulsively filled out all of the fields on my envelopes, instead of leaving a note to look at the customers’ other orders for instructions, and later committed keyboard shortcuts, prices, and service SKUs to memory. As a printer, I really appreciated when others attended to filling these forms out. When it came to filling out these envelopes, I didn’t realize my obsessive tendencies would instill good habits in my metadata-filled future…
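Looking back, the envelope could be sketched as a small record type. The Python below is purely illustrative: the field names are my own hypothetical labels for the details listed above, not anything the photo lab actually used, and the check simply reports which fields were left blank.

```python
from dataclasses import dataclass, fields


@dataclass
class FilmOrderEnvelope:
    """Hypothetical record mirroring the fields on a photo lab envelope."""
    order_id: str          # unique identifier printed on the receipt
    customer_name: str
    phone: str
    iso: int               # film speed
    film_format: str       # e.g. "35mm"
    exposure_count: int
    polarity: str          # e.g. "negative" or "positive"
    print_size: str        # e.g. "4x6"
    print_quantity: int


def missing_fields(record):
    """List the envelope fields that are absent or blank in a dict."""
    return [
        f.name
        for f in fields(FilmOrderEnvelope)
        if record.get(f.name) in (None, "")
    ]


# An envelope filled out in a hurry (made-up values).
hasty = {
    "order_id": "A1001",
    "customer_name": "J. Smith",
    "phone": "555-0100",
    "iso": 400,
    "film_format": "35mm",
    "exposure_count": 24,
    "print_size": "4x6",
    "print_quantity": 1,
}
print(missing_fields(hasty))
```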