This paper details the organization of a "content profile" for a large longitudinal collection of interactive project prototypes of singular provenance. A content profile analyzes and summarizes the aggregate file metadata associated with a collection to inform digital preservation strategies. We describe the qualitative and quantitative methods used to organize a profile of a 14 TB data set containing around 10.5 million files and 5,000 file extensions. The work extends the use of a content profile toward the historical characterization and interpretation of software development records. It also prefigures further challenges associated with the historical analysis of large, interdisciplinary data sets.
E. Kaltman, R. Lorelli, A. Larson and E. Wolfe, "Organizing a Content Profile for a Large, Heterogeneous Collection of Interactive Projects," 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 2231-2239, doi: 10.1109/BigData52589.2021.9671904.