Genealogy as a Form of Data Analysis

This paper is loosely based on the Wikipedia article entitled “Data Analysis” and the book Mastering Genealogical Proof.

Genealogists accumulate raw data and analyze its patterns and trends toward establishing a Genealogical Proof. In the genealogical community, evidence is generally understood as pieces of data that have been collected, sifted, and arranged. Evidence, positive or negative, is acquired by examining and modelling data using generally accepted processes. One tool that supports such a process is the computer application Evidentia. Other processes for developing evidence come from the legal and forensic professions (e.g., DNA analysis).

Each data point a genealogist uses is inspected, cleansed, transformed, and modelled. Most serious genealogists use the Genealogical Proof Standard. While this standard is more qualitative than quantitative, the result is the same: actionable information used to make decisions.

In simple terms, the Genealogical Proof Standard process is: formulate a research question, gather data sources, consider the information in those sources, formulate evidence from that information, and finally construct a proof statement. The process is iterative, since in genealogy no statement of proof is ever truly final.

While traditional data analysis is generally thought of as quantitative, it has much in common with the genealogical research process, and its steps are analogous to those used by genealogy professionals. Data analysis begins with a research question, proceeds to compiling source information, and ends by generating actionable conclusions.

Research Question

Sometimes thought of as a hypothesis, the research question is the starting point for both genealogical research and data analysis. A genealogist might ask, “Who was Joan Jones’ mother?” A data analyst might ask, “How is product A better than product B?” Both arrive at their answers in much the same way.

Data Collection, Processing, and Cleaning

To answer the research question, both genealogists and data analysts collect, process, and classify data relevant to the issue. Analysts treat almost any data bearing on the question as relevant, but genealogists often go further, collecting source material not only about the issue itself but also about its surrounding context. Data analysts stay more focused on the question itself, locating only data relevant to products A and B.

The main difference between traditional and genealogical data analysis is that genealogists have much fuzzier information to deal with. Sources such as local and regional history books may include data relevant to their question. Such sources are generally not relevant to a data analyst working on a product research project, unless it involves cultural appropriation (e.g., the Korean carmaker Hyundai’s Tucson vehicle). 😊

Exploratory Analysis

Genealogists often explore different sets of data to glean information and evidence relevant to their questions. A traditional data analyst does the same, though with more focus on specific items than on general ones.

Modelling and Algorithms

There are no “real” algorithms for genealogists to apply to their findings. There is, however, a Genealogical Data Model (GDM), which was constructed to help genealogists apply their data to real-world projects. The GDM was originally designed as a basis for software, but since its completion no software has adopted it, with the exception of The Master Genealogist, which used large parts of it.

Data Products and Communications

Genealogists use a proof model to present their data and the evidence they have formulated from it. A traditional data analyst uses a tool such as business intelligence software to present findings. The only real difference between the two lies in the form their presentations take.

NPM

Surname Saturday: Richard Mellen

Looking at the directory entry for Richard Mellen in Robert Charles Anderson’s The Great Migration Directory, I found a reference to Ernest Flagg’s Genealogical Notes on the Founding of New England.

In Flagg’s book I found two pages of information about the first couple of generations of Richard’s family. Much of the material is copied from Thomas Bellows Wyman’s Genealogies and Estates of Charlestown (see volume 2).

What I find interesting is that Flagg makes no reference to Simon Mellen, an alleged son of Richard. Wyman’s Genealogies include Simon in the entry for Richard, but Wyman based that conclusion on the assumption of a direct familial relationship between the two. Many online trees connect the two, but I do not think there is any factual evidence either way. I have researched both families extensively and have written about them both separately and together; see Richard Mellen, a 3-Generation Study, and the Simon Mellen genealogy for further information.

The only reason I included Simon in my coverage of Richard’s family was to address the possibility that they were related. My educated guess is still that they were not father and son. They may have been brothers or cousins, but we still do not know how, or whether, they were connected.

Thoughts?

NPM

Review: Ancestors and Descendants of Daniel Burbank by H. D. Burbank

Burbank, Henry DeLore. The Ancestors and Descendants: Lieut. Daniel & Mary (Marks) Burbank, Williamstown, Massachusetts. West Jordan, Utah: Privately printed by H. D. Burbank, 1983.

This 562-page tome covers mostly the descendants of Daniel Burbank, born 4 April 1736, died 27 September 1802, and his wife Mary Marks, born 18 July 1740, died 25 February 1808.

The first four generations, beginning with the immigrant ancestor John¹ (born about 1611, died 3 April 1683), are mostly covered in Sedgley, and this volume treads the same ground. In fact, Mr. Sedgley is credited as the inspiration for the current volume. I also covered most of this information in my paper of ten years ago.

The main advantage of this volume is its extensive coverage of the line from Daniel⁵ down to many present-day descendants. It contains an abundance of biographical and local-history anecdotes, and it gives many details about the families connected to the main line of descent.

No real sources are given, although on examining the text the reader may be able to figure out where a statement came from. The numbering system is easy to follow. An every-name index, set in the same type size as the body of the book, makes locating a particular person or family easy.

NPM

Review: Genealogy of the Burbank Family by George Burbank Sedgley

Sedgley, George Burbank. Genealogy of the Burbank Family and the Families of Bray, Wellcome, Sedgley (Sedgeley) and Welch. Farmington, Me.: Printed by the Knowlton & McLeary Co., 1928.

Sedgley goes far deeper into the Burbank family than I do in my paper of ten years ago, John Burbank of Rowley, Massachusetts and Some of His Descendants.

His research into the family’s early generations is exceptional, covering original records such as deeds and town papers. He gives extensive extracts and transcriptions, along with discussions of their content. Although Sedgley lists few explicit sources, he provides enough hints for the reader to start tracking them down in their respective repositories. I, on the other hand, relied more on published sources, which are more prone to error and possibly less reliable.

One of the primary differences between Sedgley’s book and my paper is that Sedgley accepts as fact the marriage of Lydia Burbank, who was born on 7 April 1644, to Abraham Foster about 1655. I noted that the whole family group is suspect because of the age difference between the two.

Another difference concerns the family of Mary Burbank in Arundel, Maine. Mary was born about 1733 and married John Fairfield in 1751. Sedgley orders the children differently than I do in my listing. I agree with Sedgley’s statement that good records on Arundel families are hard to find. My own family has origins there, so knowing where to look is important.

Overall, I see this genealogy as a good starting point for research on the Burbank family. It is well written and discusses a great number of sources.

NPM

Review: J. Horace Round’s Family Origins and Other Studies

I’ve been reading Family Origins, by J. Horace Round.[1] It is an interesting book, not only because it discusses an area of genealogy I’m keen to learn more about, but also because it has a thoughtful introduction about “historical genealogy,” a subject that has recently received a lot of attention as “historical biography.”

Mr. Round (1854–1928), an Englishman, was a prolific author of texts on early British genealogies; in Family Origins he focuses primarily on those reaching back to Norman times and the conquest of England by William the Conqueror. In this particular introduction, Round describes a “new” school of genealogists who take pride in documenting and citing their research. He also discusses the historical bases for genealogical research in England, with passing reference to American genealogy. These discussions pre-date even Donald Lines Jacobus, the premier American genealogist of the 20th century.

Family Origins dissects, deconstructs, and straightens out various pedigrees going back to Norman and medieval times. Knowledge of archaic Latin and French may be helpful in reading some of the quoted passages, though Round’s explication of the texts may make it unnecessary.

The text goes into some detail on the importance not only of names but also of places. In historical genealogy as Mr. Round practiced it, one must know where a name originated, because names were often taken from the place where the family lived. For the peerages Mr. Round discusses, these places are sometimes in Normandy, then part of France, and much of his research focuses there.

An example of Mr. Round’s diligence in the study of genealogy is the following quote from page 107:

“It is … of real importance for the critical study of genealogy, to collect and set on record, cases in which evidence has been forged or falsely alleged to exist, for the purpose of affording proof of a wholly fictitious pedigree.”

This statement pre-dates even E. S. Mills and the Genealogical Proof Standard, the benchmarks against which we measure our work today. It speaks directly to the ethical goals of the Board for Certification of Genealogists, which it also pre-dates.

Round seems to take delight in demolishing various pedigrees found in the Burke peerages and their brethren. He also takes on other genealogists’ work and dissects it live, in front of the reader, as if he were there discussing it with you. That is the kind of genealogical writing I like at the moment. This book makes good bedtime reading, giving the lessons time to sink in.

1. Round, J. Horace. Family Origins and Other Studies. London, 1930. Reprint, Baltimore, Md.: Clearfield Company, 1998.