Genealogy as a Form of Data Analysis

This paper is loosely based on the Wikipedia article entitled “Data Analysis” and the book Mastering Genealogical Proof.

Genealogists use raw data to accumulate and analyze patterns and trends toward establishing a Genealogical Proof. In the genealogical community, evidence is generally understood as pieces of data assembled through collection, sifting, and arrangement. Evidence, positive or negative, is acquired by examining and modeling data using generally accepted processes. One such process is to use the computer application Evidentia. Other processes that enable the development of evidence are those used in the legal and forensics professions (e.g., DNA analysis).

Each point of data genealogists use is inspected, cleansed, transformed, and modelled. Most serious genealogists use the Genealogical Proof Standard. While this standard is more qualitative than quantitative, the result is the same: actionable information used to formulate decisions.

Put simply, the Genealogical Proof Standard involves formulating a research question, gathering data sources, considering the information in those sources, formulating evidence from that information, and finally constructing a proof statement. The process is generally iterative, since there is no such thing as a final statement of proof in genealogy.

While traditional data analysis is generally thought to be quantitative, there is much similarity to the genealogical research process. The steps in data analysis are analogous to the process used by genealogy professionals. Data analysis begins with a research question, followed by compiling source information, and finally, generating actionable conclusions.

Research Question

Sometimes thought of as a hypothesis, the research question is the beginning of both genealogical research and data analysis. Genealogists formulate a question by asking something such as “Who was Joan Jones’ mother?” Data analysts ask, “How is product A better than product B?” The answers come in basically the same way for both.

Data Collection, Processing, and Cleaning

To answer the research question, both genealogists and data analysts collect, process, and classify data relevant to the issue. Analysts see almost all data that bears on the question as relevant, but genealogists often go further, collecting source material relevant not only to the issue but also to its surroundings. Data analysts, on the other hand, are more focused on the question itself, locating only data relevant to products A and B.

The difference between traditional and genealogical data analysis is that genealogists have much fuzzier information to deal with. Items like local and regional history books may include data about their question. Such items are generally not relevant to a data analyst focused on a product research project, unless it involves cultural appropriation, e.g., the Korean carmaker Hyundai’s Tucson vehicle. 😊

Exploratory Analysis

Genealogists often explore different sets of data to glean information and evidence relevant to their questions. A traditional data analyst does the same, focusing more on specific items than on general ones.

Modelling and Algorithms

There are no “real” algorithms for genealogists to apply to their data findings. There is, however, a Genealogical Data Model, which was constructed to help genealogists apply their data to real-world projects. The Genealogical Data Model was originally constructed to be a basis for software, but since it was completed, no software has used the GDM (except for The Master Genealogist, which used large parts of it).

Data Products and Communications

Genealogists use a proof model to present data and their formulation of the evidence they’ve compiled. A traditional data analyst uses a tool such as business intelligence software to present their findings. The only real difference between the two is the format in which the findings are presented.

NPM

Crafting a research question

Many good genealogy programs can help you get started crafting a good research question with their to-do features. RootsMagic has a good one.

The questions to ask before adding a new task are:

  • Who are you going to research?
  • What do you want to learn about the person?
  • Where was the person you are researching?
  • When was the person there?

Additionally, you might ask: Why was the person there at that time? This might seem like an existential question, but it is a good idea to add context to your family history.

These five questions get you started on the way to learning more about your ancestor.

The who is simple enough. The what can include any number of items, such as where and when they were born, when they immigrated, where they emigrated from, whom they lived with, married, or divorced, and so on. Where and when are a bit more complex due to the possible lack of information.

For instance, Lydia Peirce Gorton was born on 28 January 1822. I’ve got her birth date but no birthplace. I want to know where she was born, so I ask, “Where were Lydia Peirce Gorton’s parents, Daniel and Lydia (Peirce) Gorton, when Lydia was born in 1822?” The who, what, and when parts of this question are answered, but the best part is still unanswered: where?
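The question above can be sketched as a small record with one slot per prompt, which makes the missing piece easy to spot. This is only an illustration: the `ResearchQuestion` class and its `unanswered` method are hypothetical names invented here, not a feature of RootsMagic or any other genealogy program.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ResearchQuestion:
    """One research task split into the five prompts (hypothetical structure)."""
    who: Optional[str] = None
    what: Optional[str] = None
    where: Optional[str] = None
    when: Optional[str] = None
    why: Optional[str] = None  # optional context for the family history

    def unanswered(self) -> List[str]:
        """Return the names of the prompts that have no value yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# The Lydia Peirce Gorton question: who, what, and when are known.
question = ResearchQuestion(
    who="Lydia Peirce Gorton",
    what="birthplace",
    when="28 January 1822",
)

print(question.unanswered())  # the prompts still to be researched
```

Whatever the record reports as unanswered is where the research effort should go next.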

The records I’ve got so far say different things: that she was born in Massachusetts, in New York, or in Vermont. Most likely she was born in Vermont, though. I can make this hypothesis because her older brother was born there, and a few original records say so. This leads me to focus my question even more on Vermont records. Massachusetts records are very complete for the time, and there is no indication her siblings were born there. New York state records, on the other hand, are problematic, so they will have to wait for a while.

In this particular question, I ask: Why weren’t the parents in the records for Lydia’s potential birthplace? Were they there, just not recorded anywhere? These questions lead me to ask about the area where they may have been, to find out more about possible record sources. I also learn about the culture in that area, why the records may not exist, and what the economic conditions were during that period.

The process of crafting a specific question to be answered is key to great research. Answering the question is done during the research phase of the project. I’ll write more about the research project later this month.

Thoughts?

NPM

5 Steps to Great Research

There are five related steps to take to get good results from your research. We create a specific question to be answered, a research plan using the question, a research log, and a research report. Optionally, we create a biographical sketch from information in the research report.

Steps to Create a Research Question

First, we craft a question to answer. Use these four elements (who, where, when, and what) to focus on the specific items you want to learn more about. Being as specific as you can goes a long way toward getting reliable results from your research.

Steps to Create a Research Plan

Next, we examine the research question and gather more information about the subject we are interested in. We find sources relevant to the person, place, and time span involved. Sources such as locality guides, histories, and archives catalogs can provide good results for further searches.

Steps to Create a Research Log

After we have looked into each of the record types in the research plan, we can start actively searching for the best records available to us. We want to focus on relevant records that are likely to answer the research question. Prioritize the research items to gather information from the easiest to the hardest and organize your research plan accordingly.
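As a rough sketch of the easiest-to-hardest ordering, the research items can be given a hand-assigned difficulty score and sorted on it. The record sets and scores below are invented purely for illustration, and `ResearchItem` and `prioritize` are hypothetical names, not part of any genealogy software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResearchItem:
    record_set: str
    difficulty: int  # assumed scale: 1 = easiest to obtain, higher = harder


def prioritize(items: List[ResearchItem]) -> List[ResearchItem]:
    """Return the items ordered from easiest to hardest."""
    return sorted(items, key=lambda item: item.difficulty)


# Illustrative plan only; the scores are a personal judgment call.
plan = [
    ResearchItem("State archive manuscript collection", 3),
    ResearchItem("Online census index", 1),
    ResearchItem("County probate files on microfilm", 2),
]

ordered = prioritize(plan)
for rank, item in enumerate(ordered, start=1):
    print(rank, item.record_set)
```

The sort leaves the original plan untouched, so the scores can be revised and the list reordered as the research progresses.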

Steps to Create a Research Report

When we have completely researched the question, we can then create the research report. I am a fan of the write as you cite method. This means when I am researching, I am also drafting parts of the research report. It is not a step back, but it is not a speedy process either. Take time to really look at the records and save time in the long run so you do not have to go back and revisit them.

Steps to Create a Biographical Sketch

The final element of great research is to make a biographical sketch. There are many ways to create a sketch. I have written a few posts about this topic, but one of the recommended ways is to use the NEHGS Register style. Whole books have been written about writing a family history sketch, so I will leave that choice to you.

Working Wednesday: Gigging at Fiverr

I just put up two gigs on Fiverr (see below) and so far, so good.

The vibe is like how GenLighten operated five years ago before it changed to a subscription model and shut down. There you were able to put up an offering for all to view and buy. At Fiverr you do the same in a comparable way.

The experience at Fiverr is much better, though, as you can see more stats about how well your offers are being received. You can also create links for marketing, which you could not easily do at GenLighten.

Right now, the competition in the Genealogy category seems all right. Most entries in the genealogy / family history section are good, and some not so much. I can’t comment on the quality of the deliverables, though, since I’m a seller, not a buyer.

Here are my two current gig listings:
Full-scale genealogy research
Obituary search

I am thinking about adding a third gig, for proofreading, editing, and writing family histories.

Thoughts?

NPM


One Way to Cite U.S. Census Images with Zotero

Using the manuscript item type, enter the Author as Last, First. Then, in the Archive field, enter the city, county, state, and other location information. Finally, enter the census title in the Location in Archive field.

For example:

Author Last: Mellen

Author First: John

Date: 1752

Archive: Cambridge, Massachusetts, page 269, NARA micropublication M252, roll 20

Location in Archive: 1800 U.S. Census

The first citation will look like:

John Mellen (n.d.), 1800 U.S. Census, Cambridge, Massachusetts, page 269, NARA micropublication M252, roll 20.

The subsequent citations will be simply the surname, unfortunately. You can work around this by copying and pasting more of the citation to differentiate between different persons of the same name in the document. In the above example, the result would look like:

Mellen, 1800 U.S. Census, Cambridge.

Also, the “(n.d.)” field in the first citation is not necessary and can be removed, at the risk of breaking the link to the main data in Zotero. An alternative is to use the person’s birth year in the Date field to differentiate similarly named persons.
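To make the pattern concrete, here is a minimal sketch that assembles the two citation strings shown above from the same fields. It is plain string formatting, not Zotero’s own code or API. The `CensusEntry` class and both functions are hypothetical names, and taking the first comma-separated part of the Archive field as the place in the short form is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CensusEntry:
    first: str
    last: str
    archive: str              # contents of the Archive field
    location_in_archive: str  # census title
    date: Optional[str] = None


def first_citation(entry: CensusEntry) -> str:
    """Full form; an empty Date field shows up as 'n.d.'."""
    date = entry.date if entry.date else "n.d."
    return (f"{entry.first} {entry.last} ({date}), "
            f"{entry.location_in_archive}, {entry.archive}.")


def short_citation(entry: CensusEntry) -> str:
    """Short form: surname, census title, and place (assumed to come first in Archive)."""
    place = entry.archive.split(",")[0].strip()
    return f"{entry.last}, {entry.location_in_archive}, {place}."


entry = CensusEntry(
    first="John",
    last="Mellen",
    archive="Cambridge, Massachusetts, page 269, NARA micropublication M252, roll 20",
    location_in_archive="1800 U.S. Census",
)

print(first_citation(entry))
print(short_citation(entry))
```

Filling in `date` with a birth year replaces “n.d.” in the full form, which matches the alternative described above.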

NPM