
Madeline 2.0 Pedigree Drawing Engine Documentation

The Madeline 2.0 Pedigree Drawing Engine is a pedigree drawing program designed to handle large and complex pedigrees with an emphasis on readability and aesthetics. The program reads input files specified on the command line and generates pedigree drawings without user interaction. Pedigree output in scalable vector graphics (SVG) format can be viewed in browsers with native SVG rendering support such as Firefox 1.5+ and Opera 9.0+, or in vector graphics editors such as Inkscape.

How the program works

Like its predecessor Madeline 0.935, Madeline 2.0 uses a recursive algorithm to draw nuclear families. However, unlike its predecessor, the new program is much better at handling complex pedigrees with multiple descent trees, individuals with multiple mates, or consanguineous matings.

For complex pedigrees we use a hybrid algorithm in which consanguineous loops (CLs) are drawn as cyclic graphs whenever possible. We resort to acyclic graphs when matings can no longer be connected without line crossings. We apply a similar approach to avoid line crossings in matings between far-flung descendants of different founding groups (DFGs). In both cases, we reorder siblings within nuclear families so that mated individuals are as close as possible to their respective CL or DFG spouses. This ensures that within any given nuclear family, up to two reordered CL- or DFG-mated siblings (the leftmost and the rightmost) can have direct, non-crossing lines connecting them to their spouses. Spouses in the remaining CL or DFG matings are drawn using dashed icons at additional locations on the graph as required to avoid crossing lines. Multiple mates and twin groups can constrain sibling reordering, which may prevent non-crossing lines from being drawn between spouses. Small circled numbers can be displayed to show birth order in reordered sibships.

Madeline 2.0 Pedigree Drawing Engine vs. Madeline 0.935

The Madeline 2.0 Pedigree Drawing Engine is currently a non-interactive program that is executed from a shell environment. It produces pedigree drawing output in SVG format without user interaction. A number of flags can be passed to the program on the command line in order to customize the output.

Madeline 0.935 features an interpreter and command language which allows the program to be used either interactively or in batch mode. The program has functionality for converting data between formats, querying pedigree data sets, and also drawing pedigrees. However, the pedigree drawing capabilities in Madeline 0.935 are limited.

For the present time, you may wish to use one version or the other depending on your specific need.

We intend to expand the capabilities of Madeline 2.0 in the future by adding an interpreter and command language similar to that used in Madeline v. 0.935. Eventually, Madeline 2.0 will become much more powerful than Madeline 0.935.

Because we believe that the drawing engine is a unique and useful tool, we have decided to release this foundational version to the research community before completing all of the planned enhancements. We hope you enjoy our work!

Command Line Arguments

The program is executed from a shell prompt by typing madeline2. General usage is:

madeline2 [option]... [file]...

The program accepts both short-hand flags prefixed by a single dash, such as "-c", and longer flags prefixed by double dashes, such as "--color".

Currently available options are as follows:

-b --bw

This is an override flag. By default, a pedigree with two or more "Affected..." columns is printed in color, because color shading provides better contrast when an icon is split into pie-shaped sections. Use this flag to force such pedigrees to be printed in black and white instead.

-c --color

This is an override flag. By default, a pedigree with a single "Affected" column is printed in black and white, because this is what is typically used in publications and other printed media. Use this flag to force such pedigrees to be printed in color instead.

-d --debug

Print run-time progress messages. This is primarily of interest to developers. The set of messages printed may vary from release to release (and maybe even on different days of the week if you check the code out of the SVN repository!).

-e --embedded

Produce an XML file that can be embedded in another XML document. Use this flag when creating web-based services consisting of compound XHTML+SVG documents.

-f --font

Specifies the font to be used for the display of pedigree labels on the drawing. The default is "sans".

This flag may be especially useful for Unicode text consisting of labels with extended Latin or non-Latin text. At the time of this writing, the Firefox browser in particular does not handle font substitutions correctly, which can result in the display of square boxes when a font is missing glyphs.

-z --font-size

Set the font size (in points) to be used for the display of labels on drawings.

-h --help

Prints brief help documentation and then exits.

-l --labels

Specifies the path to a file containing a list of labels to be displayed on the pedigree drawing.

-L --Labels

Specifies labels to be displayed on the pedigree using a single string of space-delimited labels, e.g.:

madeline2 -L "IndividualID DOB D7S1204 D7S889" fam012-chr07.data

Be sure to enclose the labels string in single or double quotes as shown above.

-n --noiconlabels

Specifies that affection status levels (codes) will not be printed on the icons.

The default is to print the code representing the affection level. We believe this is the right choice for lab work. However, it is not the usual choice for publication-ready drawings where only two levels (affected and unaffected) are to be presented.

-N --nolabeltruncation

This option prevents truncation of wide labels on pedigree drawings.

Normally, wide labels are truncated and shown with an ellipsis ("...") at the end. With this option, wide labels are shown in their entirety. Warning: this may result in overlapping labels, which are unsuitable for drawings intended for publication but can nevertheless be quite useful on pedigree drawings used internally as part of a lab's workflow.

-o --outputprefix

Specifies the output prefix to be prepended to file names. Output file names are based on the FamilyIds present in the pedigree table.

-s --sort

Specifies a field in the pedigree table to use for sorting siblings.

By default, siblings are sorted by date of birth (the DOB field) if that field is present. When DOB is not present, siblings within sibships are sorted by IndividualId instead. You can change this behaviour by specifying a column of your choice.

-v --version

Prints the version of the program and exits.
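The default sibling order described under -s --sort can be illustrated with a short Python sketch. This is not Madeline's implementation: `sort_siblings` is a hypothetical helper, rows are assumed to be dicts of column name to string, "." is assumed to be the missing-value indicator, and DOB values are assumed to be ISO-style strings that sort correctly as text.

```python
MISSING = "."  # assumed missing-value indicator


def sort_siblings(siblings, columns, sort_field=None):
    """Order a sibship as described for -s/--sort (hypothetical helper).

    siblings:   list of dicts mapping column name -> string value
    columns:    column names present in the pedigree table
    sort_field: the column given with -s/--sort, or None for the default
    """
    if sort_field is None:
        # Default: DOB when that column exists, otherwise IndividualId.
        sort_field = "DOB" if "DOB" in columns else "IndividualId"

    def key(sib):
        value = sib.get(sort_field, MISSING)
        # Missing values sort after every real value.
        return (value == MISSING, value)

    return sorted(siblings, key=key)
```

For example, with DOB present, a sibling whose DOB is "." is placed last.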

Core Fields

For pedigree construction, Madeline minimally requires the following fields of information:

FamilyId
IndividualId
Gender
Father
Mother

Optional "Core" Fields

These fields are optional, but are often thought of as belonging to the "core" set:

Affected
Sampled
DZTwin
MZTwin
DOB
Proband
Deceased
Consultand
Carrier

Additional Optional "Affected" Fields

With Madeline 2.0, you can easily display more than one "Affected..." column. Every column beginning with the word "Affected" is automatically treated as an affection status column. Each icon is then divided into pie-shaped sections and shaded accordingly. The number of "Affected..." columns determines the number of pie-shaped sections. Typical column names might include:

Affected_Glaucoma
Affected_NPS
Affected_Heart
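The way "Affected..." columns drive the drawing (one pie-shaped section per column, and color by default once there are two or more such columns, subject to the -b and -c overrides) can be sketched in Python. All names here are hypothetical. The code assumes the "affected" prefix is matched case-insensitively, as the lower-case XML examples suggest, and assumes equal-sized slices in column order. Madeline's actual geometry may differ.

```python
def affected_columns(column_names):
    """Columns treated as affection status columns.

    Assumption: matched on the prefix "affected", ignoring case.
    """
    return [c for c in column_names if c.lower().startswith("affected")]


def pie_sections(column_names):
    """Equal pie slices, one per Affected column: (name, start, end) in degrees."""
    cols = affected_columns(column_names)
    if not cols:
        return []
    width = 360.0 / len(cols)
    return [(name, i * width, (i + 1) * width) for i, name in enumerate(cols)]


def use_color(column_names, force_bw=False, force_color=False):
    """Default color choice for the -b/--bw and -c/--color flags.

    Two or more Affected columns -> color, otherwise black and white.
    If both flags are given, -b wins here; that precedence is an assumption.
    """
    if force_bw:
        return False
    if force_color:
        return True
    return len(affected_columns(column_names)) >= 2
```

With the three sample columns above, each icon would be split into three 120-degree sections and drawn in color unless -b is given.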

Other Optional Fields

A pedigree data table may contain as many additional columns as you want. In most cases, Madeline 2.0 determines the type of data in each column automatically. Columns may be either phenotype columns or genotype columns. Here are some examples:

BMI
AgeDX
D12S1234
D12S779
D12S834

Pedigree File Formats

Madeline 2.0 handles flat files as well as a wide array of common XML-based file formats. In all of these files, data must be encoded using the Unicode UTF-8 transformation format, of which ASCII is a subset. Note that UTF-8 is the standard encoding format for XML and that Madeline will handle any of the scripts and symbol blocks present in the Unicode Standard.

File formats are described below:

Madeline Flat File Format

This is basically the same as in the old version of Madeline, except that now the default column names have been modified in some cases to be more meaningful.

There are two parts to the data file: a header containing the column labels, and a rectangular data body containing the data.

Column labels appear at the top of the file, separated by white space or line breaks.

At least one blank line must separate the header from the data body.

The data body must be byte rectangular. This just means that each line must be exactly the same number of bytes in length.

Individual columns are separated by one or more whitespace characters.

Here's an example:

FamilyID
IndividualID
Gender
Father  
Mother  
MZTwin  
DZTwin 
Affected
Sampled
Proband
Deceased
DXAge
DOB
D17S1301
D17S1304
D17S674

E0078  S00646  F  U0078A  U0078B  .  .  A  Y  Y  .  57     1948-07-04  149/153  154/162  116/116 
E0078  S00675  M  U0078A  U0078B  .  .  A  Y  .  .  60-62  1941-09     149/149  154/162  116/122 
E0078  S00795  F  U0078C  S00646  .  .  I  Y  .  .  .      1972-02-29  153/153  154/158  116/122 
E0078  U0078A  M  .       .       .  .  I  N  .  Y  .      1914        .        .        .       
E0078  U0078B  F  .       .       .  .  I  N  .  .  .      1923-03-15  .        .        .       
E0078  U0078C  M  .       .       .  .  U  N  .  .  .      1947-11-21  .        .        .
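The layout rules above can be checked with a short parser: a label header, at least one blank line, then a byte-rectangular, whitespace-separated body. The Python sketch below is not Madeline's reader, and `parse_flat_file` is a hypothetical name. It assumes one label per header line, optionally followed by a one-letter type qualifier, and no spaces inside data fields. Byte length is measured on the UTF-8 encoding, so a non-ASCII character makes a line longer in bytes than it looks. This may be one reason the XML formats suit non-ASCII data better.

```python
def parse_flat_file(text, check_rectangular=True):
    """Parse Madeline flat-file text into (labels, rows).

    labels: list of (name, qualifier-or-None) tuples
    rows:   list of lists of field strings
    """
    lines = text.split("\n")
    labels = []
    i = 0
    # Header: one label per line (plus optional qualifier) until a blank line.
    while i < len(lines) and lines[i].strip():
        tokens = lines[i].split()
        labels.append((tokens[0], tokens[1] if len(tokens) > 1 else None))
        i += 1
    # Skip the blank line(s) separating header from body.
    while i < len(lines) and not lines[i].strip():
        i += 1
    body = [line for line in lines[i:] if line.strip()]
    if check_rectangular:
        # "Byte rectangular": every body line has the same length in bytes.
        lengths = {len(line.encode("utf-8")) for line in body}
        if len(lengths) > 1:
            raise ValueError("data body is not byte rectangular: %s" % sorted(lengths))
    rows = [line.split() for line in body]
    for row in rows:
        if len(row) != len(labels):
            raise ValueError("expected %d fields, got %d" % (len(labels), len(row)))
    return labels, rows
```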

The Madeline flat file format is a very convenient choice for reasonably small files, for test files, and for data sets that contain only ASCII characters. For larger data sets and for data sets that contain non-ASCII Unicode characters, one of the supported XML formats described later is preferable.

Column Type Qualifiers

Column labels may be followed by an optional one-letter column type qualifier. Older versions of Madeline often required column type qualifiers, so if you need to maintain compatibility with them (such as v. 0.935 or v. 0.936), retaining the qualifiers does no harm. In Madeline 2+, however, column type qualifiers are rarely required because the program normally identifies column types automatically. One exception is paired allele columns used in place of genotype columns; this case is described below.

Paired Allele Columns

The single-letter "A" column type qualifier indicates paired allele columns. Paired allele columns must always occur as identically named pairs, and each pair is automatically combined into a single genotype column when Madeline reads the data file. Below, the data shown earlier is reproduced in a data file with six allele columns instead of three genotype columns. Note the use of column type qualifiers in the header:

FamilyID
IndividualID
Gender
Father  
Mother  
MZTwin  
DZTwin 
Affected
Sampled
Proband
Deceased
DXAge
DOB
D17S1301 A
D17S1301 A
D17S1304 A
D17S1304 A
D17S674 A
D17S674 A

E0078  S00646  F  U0078A  U0078B  .  .  A  Y  Y  .  57     1948-07-04  149 153  154 162  116 116 
E0078  S00675  M  U0078A  U0078B  .  .  A  Y  .  .  60-62  1941-09     149 149  154 162  116 122 
E0078  S00795  F  U0078C  S00646  .  .  I  Y  .  .  .      1972-02-29  153 153  154 158  116 122 
E0078  U0078A  M  .       .       .  .  I  N  .  Y  .      1914        .   .    .   .    .   .   
E0078  U0078B  F  .       .       .  .  I  N  .  .  .      1923-03-15  .   .    .   .    .   .   
E0078  U0078C  M  .       .       .  .  U  N  .  .  .      1947-11-21  .   .    .   .    .   .

Paired allele columns occur in the Linkage file format (http://linkage.rockefeller.edu/soft/linkage/), which is also used by the Genehunter program (http://linkage.rockefeller.edu/soft/gh/). To make it easy to read data from these legacy file formats, Madeline 2+ can read paired allele columns. Duplicate column names are not allowed in any other context.
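Combining paired allele columns into genotype columns can be sketched as a single pass over the header. The following Python sketch is illustrative only. `combine_allele_pairs` is a hypothetical helper, and it assumes the two members of a pair are adjacent, as in the example above. It also assumes a genotype becomes missing (".") when either allele is missing.

```python
MISSING = "."


def combine_allele_pairs(labels, row):
    """Merge adjacent, identically named "A" columns into genotype columns.

    labels: list of (name, qualifier) tuples, e.g. ("D17S1301", "A")
    row:    list of field strings, same length as labels
    Returns (new_labels, new_row) with "allele1/allele2" genotype values.
    """
    new_labels, new_row = [], []
    i = 0
    while i < len(labels):
        name, qualifier = labels[i]
        if qualifier == "A":
            # Paired allele columns must come as identically named pairs.
            if i + 1 >= len(labels) or labels[i + 1] != (name, "A"):
                raise ValueError("unpaired allele column: %s" % name)
            a1, a2 = row[i], row[i + 1]
            new_labels.append(name)
            new_row.append(MISSING if MISSING in (a1, a2) else "%s/%s" % (a1, a2))
            i += 2
        else:
            new_labels.append(name)
            new_row.append(row[i])
            i += 1
    return new_labels, new_row
```

Applied to the first data row above, the allele pair 149 153 becomes the genotype 149/153, matching the genotype-column version of the file.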

Tab Delimited Format

In a tab-delimited file, the very first row is assumed to contain the column labels (with optional column type qualifiers). Here is an example of data formatted as a tab-delimited file, with "\t" representing the tab character:

FamilyID\tIndividualID\tGender\tFather\tMother\tMZTwin\tDZTwin\tAffected\tSampled\tProband\tDeceased\tDXAge\tDOB\tD17S1301\tD17S1304\tD17S674
E0078\tS00646\tF\tU0078A\tU0078B\t.\t.\tA\tY\tY\t.\t57\t1948-07-04\t149/153\t154/162\t116/116
E0078\tS00675\tM\tU0078A\tU0078B\t.\t.\tA\tY\t.\t.\t60-62\t1941-09\t149/149\t154/162\t116/122
E0078\tS00795\tF\tU0078C\tS00646\t.\t.\tI\tY\t.\t.\t.\t1972-02-29\t153/153\t154/158\t116/122
E0078\tU0078A\tM\t.\t.\t.\t.\tI\tN\t.\tY\t.\t1914\t.\t.\t.
E0078\tU0078B\tF\t.\t.\t.\t.\tI\tN\t.\t.\t.\t1923-03-15\t.\t.\t.
E0078\tU0078C\tM\t.\t.\t.\t.\tU\tN\t.\t.\t.\t1947-11-21\t.\t.\t.

Currently only tabs are supported as the delimiter character. Although hardly a “human readable” format, the tab-delimited format can be useful for importing data into Madeline from other software.
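Reading such a file needs little more than a tab-aware CSV reader. The Python sketch below uses the standard `csv` module and is not Madeline's own code. `read_tab_delimited` is a hypothetical name. Quoting is turned off so that quote characters are kept literally. Rows whose field count differs from the header are rejected; this catches stray trailing tabs, which would otherwise add an empty extra column.

```python
import csv
import io


def read_tab_delimited(text):
    """Read tab-delimited text whose first row holds the column labels."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip blank lines
    header, body = rows[0], rows[1:]
    records = []
    for n, row in enumerate(body, start=1):
        if len(row) != len(header):
            raise ValueError("data row %d: expected %d fields, got %d"
                             % (n, len(header), len(row)))
        records.append(dict(zip(header, row)))
    return records
```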

Madeline-XML Format

This is just one of several XML file formats supported by Madeline. The first line of a Madeline XML format file should contain an XML header:

<?xml version="1.0" standalone="no"?>

A Madeline-XML file then begins with the following tag:

<madeline-xml>

Tables are indicated with the table tag:

<table>

Tables are organized by row and rows are indicated by row tags:

<row>

Data elements are indicated by the single-letter “d” tags:

<d>

Column labels must appear on the first row of the table. The following is an excerpt of a Madeline XML table:

<?xml version="1.0" standalone="no"?>
<madeline-xml>
     <table>
	<row>
		<d>familyid</d>
		<d>individualid</d>
		<d>gender</d>
		<d>father</d>
		<d>mother</d>
		<d>affected</d>
		<d>dztwin</d>
		<d>mztwin</d>
		<d>姓</d>
		<d>名字</d>
		<d>村</d>
		<d>乡</d>
		<d>省</d>
	</row>
	<row>
		<d>uni0002zh</d>
		<d>indv0001</d>
		<d>男</d>
		<d>.</d>
		<d>.</d>
		<d>a</d>
		<d>.</d>
		<d>.</d>
		<d>周</d>
		<d>祿山</d>
		<d>.</d>
		<d>.</d>
		<d>.</d>
	</row>
	...
     </table>
</madeline-xml>

Multiple data tables within a single file are supported. For example, you could place a pedigree table and a corresponding map table together in a single Madeline-XML file. As the example shows, Unicode characters such as the Chinese labels and values above are easy to embed in XML files.
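The table/row/d structure maps directly onto a generic XML parser. The following Python sketch uses the standard `xml.etree.ElementTree` module and is not Madeline's reader; `read_madeline_xml` is a hypothetical name. It returns one list of records per `<table>`, taking the first `<row>` of each table as the column labels.

```python
import xml.etree.ElementTree as ET


def read_madeline_xml(xml_text):
    """Return one list of dicts per <table> in a Madeline-XML document."""
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.iter("table"):
        # Each <row> becomes a list of <d> texts; empty <d/> reads as "".
        rows = [[(d.text or "").strip() for d in row.findall("d")]
                for row in table.findall("row")]
        if not rows:
            continue
        header = rows[0]  # column labels come from the first row
        tables.append([dict(zip(header, r)) for r in rows[1:]])
    return tables
```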

OASIS OpenDocument XML Format

Madeline recognizes the OpenDocument XML standard for spreadsheet data published by the Organization for the Advancement of Structured Information Standards (OASIS, www.oasis-open.org). This format can be produced by OpenOffice.org v. 2.0 or newer, as well as by other software.

As with the Madeline-XML format, column labels must appear on the first row of the table. Multiple tables (which appear as individual sheets in OpenOffice.org) can be included in a single file.

W3C XHTML Table Format

Madeline can read data directly from tables in XHTML files. This means, for example, that you can put sample data tables on a web page and Madeline can read the data directly from the URL of the web page. When reading XHTML files, Madeline simply ignores tags that are not related to the data tables themselves. As before, column labels must appear on the first row of the table. The following is an excerpt from an XHTML data table:

<table>
	<tbody>
		<tr>
			<td>familyid</td>
			<td>individualid</td>
			<td>gender</td>
			<td>father</td>
			<td>mother</td>
			<td>affected</td>
			<td>dztwin</td>
			<td>mztwin</td>
			<td>姓</td>
			<td>名字</td>
			<td>村</td>
			<td>乡</td>
			<td>省</td>
		</tr>
		<tr>
			<td>uni0002zh</td>
			<td>indv0001</td>
			<td>男</td>
			<td>.</td>
			<td>.</td>
			<td>a</td>
			<td>.</td>
			<td>.</td>
			<td>周</td>
			<td>祿山</td>
			<td>.</td>
			<td>.</td>
			<td>.</td>
		</tr>
		...
	</tbody>
</table>
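Reading tables out of an XHTML page mainly involves skipping unrelated markup and coping with the XHTML namespace. The Python sketch below illustrates this with `xml.etree.ElementTree`. It is not Madeline's code: `read_xhtml_tables` and `local_name` are hypothetical names. It also assumes well-formed XHTML; tag-soup HTML would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a namespace prefix such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]


def read_xhtml_tables(xhtml_text):
    """Extract every <table>; all other tags in the page are ignored."""
    root = ET.fromstring(xhtml_text)
    tables = []
    for table in root.iter():
        if local_name(table.tag) != "table":
            continue
        rows = []
        for tr in table.iter():  # finds rows inside <tbody> as well
            if local_name(tr.tag) != "tr":
                continue
            rows.append(["".join(cell.itertext()).strip()
                         for cell in tr if local_name(cell.tag) in ("td", "th")])
        if rows:
            # The first row supplies the column labels.
            tables.append([dict(zip(rows[0], r)) for r in rows[1:]])
    return tables
```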

Microsoft OpenXML Format

Madeline can also read the Office Open XML workbook format developed by Microsoft for its Office suite. This format is now an ECMA standard.

Extending Madeline to Recognize Other XML Formats

Madeline can be easily extended to recognize additional XML formats. The XMLTagManager class in XMLTagManager.h stores a vector of XMLTagName objects which define the relevant tags needed to describe data tables in XML. It is trivial to define relevant tags and add new XMLTagName objects to the existing vector. Please email us at madelinesoftware@umich.edu if you would like to have a widely-used XML format added into the source code distribution in the next release of the program.

Data Compression And Archive Formats

Madeline automatically recognizes and decompresses data files that have been compressed using one of the following common formats:

PKZIP Format
GZIP Format
BZIP2 Format

Recognition is based on scanning files, not on file extensions, so it does not matter how files are named.

It also does not matter which compression format is used. For example, an OpenOffice.org “.ods” spreadsheet file is a PKZIP archive containing, among other things, the spreadsheet data in a file called “content.xml”. Madeline can read “content.xml” as an uncompressed XML file, or as a gzip- or bzip2-compressed file, just as easily as it can read it from a PKZIP archive.
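Recognizing a format by scanning the file rather than its name usually means checking the leading "magic" bytes. The Python sketch below illustrates the idea and is not Madeline's detection code; `sniff_format` and `decompress` are hypothetical names. It relies on three well-known signatures: PKZIP local file headers begin with `PK\x03\x04`, gzip streams with `\x1f\x8b`, and bzip2 streams with `BZh`. Zip archives are only identified here, because choosing which member to read is a separate step.

```python
import bz2
import gzip


def sniff_format(data):
    """Identify a compression format from its leading bytes."""
    if data.startswith(b"PK\x03\x04"):  # PKZIP local file header
        return "zip"
    if data.startswith(b"\x1f\x8b"):  # gzip magic number
        return "gzip"
    if data.startswith(b"BZh"):  # bzip2 stream header
        return "bzip2"
    return "plain"


def decompress(data):
    """Return decompressed bytes for gzip/bzip2 input, else data unchanged."""
    kind = sniff_format(data)
    if kind == "gzip":
        return gzip.decompress(data)
    if kind == "bzip2":
        return bz2.decompress(data)
    return data  # plain data, or a zip archive needing member selection
```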

Other compression formats, such as 7-zip, rar (a proprietary format used on Windows), and sit (used on Apple Macintosh computers), are not supported. Nor are other archive formats, such as tar.

Madeline's behaviour with regard to each of the supported formats is described below.

PKZIP Format

The PKZIP format is a combined archive and compression format, which allows multiple source files to be stored in a single zip file.

OpenOffice.org OASIS spreadsheet files are actually just PKZIP archives containing a number of individual XML files. These files have an “.ods” file extension. When opening a .ods file, Madeline automatically reads the main data file called “content.xml” and ignores all other files in the archive.

If Madeline opens a PKZIP archive and does not find a file called “content.xml”, the program tries to read the first file in the archive, regardless of its name, as the data table. You can therefore include a manifest or other supplementary files in non-OASIS zip archives used with Madeline, as long as the first file in the archive is always the data file itself.
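This member-selection rule can be sketched with Python's standard `zipfile` module. The helper name `read_zip_data` is hypothetical and the code is not Madeline's. Directory entries are skipped when looking for the "first file", and archive order is taken from `namelist()`.

```python
import io
import zipfile


def read_zip_data(data):
    """Return the data-file bytes from a PKZIP archive held in memory.

    Uses "content.xml" when present (as in .ods files), otherwise the
    first file stored in the archive, whatever its name.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
        if not names:
            raise ValueError("archive contains no files")
        target = "content.xml" if "content.xml" in names else names[0]
        return archive.read(target)
```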

From a command line, you can use the zip utility to compress or decompress PKZIP files.

GZIP Format

The gzip format is a common compression format used on *nix systems. From a command line, you can use the gzip utility to compress or decompress gzip files.

BZIP2 Format

The bzip2 format achieves a higher compression ratio than the gzip format. From a command line, you can use the bzip2 utility to compress or decompress bzip2 files.

Accessing Data Files over a Network

Data files in any of the supported formats can be accessed over the internet via HTTP and secure HTTPS. Simply provide the URL of the data file as an argument to the program. Be sure to include “http://” or “https://” as a prefix so that Madeline can identify the file as a network resource.
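The prefix rule amounts to a simple dispatch between a network fetch and a local file read. The following Python sketch illustrates it with the standard `urllib.request` module. The helper names are hypothetical. Treating the prefix case-insensitively is an assumption, not documented Madeline behaviour.

```python
import urllib.request


def is_network_source(path):
    """True when the argument names an http:// or https:// resource."""
    return path.lower().startswith(("http://", "https://"))


def read_source(path):
    """Return the raw bytes of a local file or of an http(s) URL."""
    if is_network_source(path):
        with urllib.request.urlopen(path) as response:
            return response.read()
    with open(path, "rb") as handle:
        return handle.read()
```

The bytes returned this way could then go through the same format detection as a local file.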

Accessing Data From a MySQL Database

Madeline can also read data directly from a MySQL data table.

Data Elements

A “data element” is any single item of data, such as the contents of a single cell in a data table. A data element might be an individual or family identifier, a clinical measurement on a patient, a date of birth, a genotype, or some other form of information.

Because there are different kinds of data, different “containers” may be used to hold them. In object-oriented software design, these containers are called “classes”. Classes are both containers for data and sets of methods that operate on the data.

For example, a “date” class might have a method that allows one to subtract today's date from a date of birth in order to arrive at a person's current age (in days). Such a method is only useful when operating on dates. It would not make sense to use such a method when operating on genotypes. For this reason, the method is associated with dates only as part of a date class.

Classes of Data

Madeline implements a virtual base class called “Data” which represents a generic container for a data element. From this base class, a number of concrete data classes are derived. Some of the derived classes, such as String, are themselves sub-classed to create more specialized data containers, as shown below:

Data:
  Boolean:
    Gender
    LivingDead
    Proband
    Sampled
    Consultand
    Carrier
  Number
  Date
  String:
    Affected
    Genotype
    Haplotype

Boolean, Number, Date, and String are all derived directly from Data. Gender, LivingDead, Proband, Sampled, Consultand, and Carrier are subclassed from Boolean, while Affected, Genotype, and Haplotype are subclassed from String.

All concrete data classes share certain characteristics. For example, all data elements can hold missing values. Madeline interprets a reasonable set of default “indicator values” as being missing values. For example, an isolated dot (period or full-stop character), “.”, is interpreted as being a missing value by all of the classes, while a label such as “0/0” is interpreted as being a missing genotype only by the Genotype class. The user can also easily define additional missing value indicators.

In addition to missing value indicators, some classes interpret certain non-missing values as having special meanings. For example, the Gender class automatically interprets “M”, “m”, and “男” to mean “male”, while “F”, “f”, and “女” are understood as indicating “female”. Here again the user can easily define additional indicators as required. For example, if a data set were coded using “homme” for males and “femme” for females, the user could easily add these definitions to the list of indicator values already known to the program.

Note that the program is a modern Unicode-based program. As illustrated by the Chinese “男” and “女” gender labels above, the user is not limited to ASCII-based encodings, but instead can use any symbols or scripts defined in Unicode.

When sorting a list of data values, missing values always appear at the end of the list.

The individual data classes are described below.

The Boolean Class

The Boolean class is used to hold true, false, or missing values.

“T”, “t”, and “真” are recognized by default as indicating “true”.

“F”, “f”, and “假” are recognized by default as indicating “false”.

When sorting a list of Boolean values, false values are sorted before true values.

The Gender Class

The Gender class, derived from Boolean, is used to hold male, female, or missing (gender unknown) values.

“M”, “m”, “♂”, “男” and “雄” are recognized by default as indicating “male”.

“F”, “f”, “♀”, “女” and “雌” are recognized by default as indicating “female”.

When Gender values are cast to Boolean, males are false and females are true.

When sorting a list of Gender values, males are sorted before females.
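The Gender rules can be captured in a small lookup table: default indicator values, user-added indicators, the Boolean cast (male false, female true), and the sort order with missing values last. The Python sketch below is not Madeline's Gender class. All names are hypothetical, and unrecognized codes are treated as missing.

```python
MALE, FEMALE = False, True  # cast to Boolean: male -> False, female -> True

# Default indicator values listed above; users can extend the mapping.
GENDER_INDICATORS = {
    "M": MALE, "m": MALE, "♂": MALE, "男": MALE, "雄": MALE,
    "F": FEMALE, "f": FEMALE, "♀": FEMALE, "女": FEMALE, "雌": FEMALE,
}


def parse_gender(value, indicators=GENDER_INDICATORS):
    """Return False (male), True (female) or None (missing/unknown)."""
    return indicators.get(value.strip())


def gender_sort_key(value):
    """Males before females, missing values last."""
    g = parse_gender(value)
    return (g is None, g is True)
```

A data set coded with "homme" and "femme" could be handled by extending the mapping, e.g. `dict(GENDER_INDICATORS, homme=MALE, femme=FEMALE)`.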

The LivingDead Class

Named after a 1968 horror movie, the LivingDead class is a Boolean-derived class that tracks whether an individual is deceased or living.

“Y” and “y” are recognized as coding for the “deceased” state.

“N” and “n” are recognized as coding for the “living” state.

In practice, missing entries are assumed to be living.

The Proband Class

The Proband class is also a Boolean-derived class that tracks whether an individual is a proband in a pedigree.

“Y” and “y” are recognized as coding for the “is a proband” state.

“N” and “n” are recognized as coding for the “not a proband” state.

In practice, missing entries are assumed not to be probands.

The Number Class

The Number class is used to hold discrete numeric values, ranged numeric values, and missing numeric values. Discrete values can be flagged as approximations by preceding them with “~”, but using ranged numeric values for approximations is preferred.

Ranged numeric values are indicated by square brackets, such as “[27 → 32]”.

Ranged numeric values are permitted because they often occur in medical research. For example, a patient's intraocular pressure (IOP) may have been recorded as a range of values by the examining doctor or nurse. Rather than lose this information, Madeline allows you to store ranged values directly. Note that ranges include the endpoints.

While the program displays ranges using an arrow “→” as the range separator, it also accepts the letter “r” and, in some cases, a plain dash “-” as alternate separators, because these are easier to type. For example, entering “[27 r 32]” is equivalent to entering “[27 → 32]”.

In some situations, such as when values are needed for further statistical processing, the mean of the range is used. For this reason, the program may display the mean of a range in addition to the range itself, e.g. “x̄ = 0.75 [-1 → 2.5]”.

Mathematical operations are permitted on ranged values just as they are on discrete values. It is, however, up to the user to ensure that mathematical operations on ranged values are meaningful. For example, a range [1.5 → 3.2] representing inches may be converted to centimeters by multiplying by 2.54. In contrast, taking the square root of the range [-2 → 3] results in a domain error, which sets the result to MISSING because the Number class does not implement complex numbers.
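The two examples can be reproduced with interval arithmetic on the endpoints. In the following Python sketch, a range is simplified to an inclusive `(low, high)` tuple and MISSING to `None`. This representation and the function names are our own assumptions, not the Number class.

```python
import math


def scale(rng, factor):
    """Multiply both endpoints by a constant, e.g. inches -> cm with 2.54."""
    if rng is None:
        return None
    low, high = rng[0] * factor, rng[1] * factor
    return (min(low, high), max(low, high))  # a negative factor swaps the ends


def range_sqrt(rng):
    """Square root of a range; None (MISSING) if any part is negative."""
    if rng is None or rng[0] < 0:
        return None  # domain error: complex results are not supported
    return (math.sqrt(rng[0]), math.sqrt(rng[1]))
```

Scaling (1.5, 3.2) by 2.54 gives approximately (3.81, 8.128), while `range_sqrt((-2, 3))` yields `None`.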

Numbers may be entered using ordinary Western digits or digits from other scripts. For example, entering the Arabic-Indic digits “١٢٣” is equivalent to entering “123”.

When sorting a list of Numbers, discrete values appear before ranged values. Ranged values with equivalent means are themselves sorted according to the width of the range, with narrower ranges appearing first. For example, in the following sorted list the mean value of all entries is 0.75:

1.      0.75
2.      x̄ = 0.75 [-0.5 → 2]
3.      x̄ = 0.75 [-1 → 2.5]
4.      x̄ = 0.75 [-2 → 3.5]
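The parsing and ordering rules for Numbers can be sketched in Python as follows. This is not Madeline's Number class, and several points are assumptions. Values are assumed to be ordered first by their mean, with "discrete before ranged" and "narrower before wider" acting as tie-breakers. Accepted separators are assumed to be "→", "r", or a dash surrounded by spaces, so that negative endpoints still parse. A leading "~" is simply dropped. Python's `float()` already accepts Unicode decimal digits such as Arabic-Indic ones.

```python
import re

MISSING = None
# "[low SEP high]" where SEP is "→", "r", or " - " (assumed forms).
_RANGE = re.compile(r"^\[\s*(\S+)\s*(?:→|r| - )\s*(\S+)\s*\]$")


def parse_number(text):
    """Return (low, high), with low == high for discrete values, or None."""
    text = text.strip()
    if text in (".", ""):
        return MISSING
    m = _RANGE.match(text)
    if m:
        low, high = float(m.group(1)), float(m.group(2))
        return (min(low, high), max(low, high))
    value = float(text.lstrip("~"))  # "~" marks an approximation
    return (value, value)


def number_sort_key(value):
    """Mean first; then discrete before ranged; narrower first; missing last."""
    if value is MISSING:
        return (1, 0.0, 0, 0.0)
    low, high = value
    return (0, (low + high) / 2, int(high > low), high - low)
```

Sorting the parsed values of "[-2 → 3.5]", "0.75", "[-1 r 2.5]" and "[-0.5 - 2]" with this key reproduces the list above.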

The Date Class

The Date class shares a number of features with the Number class. Dates are internally stored as Julian day numbers, and proleptic Julian dates are supported. Dates are delimited by curly braces and normally displayed in ISO-8601 compatible “YYYY-MM-DD” format. For example, {1927-12-31} represents Saturday, December 31, 1927.

The program treats the Gregorian calendar as having begun on Friday, October 15, 1582, the previous day having been Thursday, October 4, 1582, the last day of the Julian calendar. Note that many European countries, even some Catholic ones, did not adopt the new calendar immediately after Pope Gregory XIII promulgated it.
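Storing dates as Julian day numbers with a switch at the 1582 calendar change can be sketched with the widely used integer conversion formulas. The following Python sketch is not Madeline's Date class, and the function names are hypothetical. Dates from October 15, 1582 onward are read as Gregorian and earlier dates as (proleptic) Julian. The ten skipped days are rejected.

```python
def to_jdn(year, month, day):
    """Julian day number of a civil date, switching calendars in 1582."""
    if (1582, 10, 5) <= (year, month, day) <= (1582, 10, 14):
        raise ValueError("date falls in the 1582 calendar gap")
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if (year, month, day) >= (1582, 10, 15):
        return jdn - y // 100 + y // 400 - 32045  # Gregorian calendar
    return jdn - 32083  # Julian calendar


WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


def weekday(jdn):
    """Day of the week for a Julian day number."""
    return WEEKDAYS[(jdn + 1) % 7]
```

The formulas give consecutive day numbers across the switch: October 4, 1582 (Thursday) and October 15, 1582 (Friday) are one day apart. They also confirm that {1927-12-31} falls on a Saturday.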

Ranged dates are supported. Ranged dates are a huge convenience as they make it both possible and easy to store incomplete dates.

For example, a proband may only recall that her “mother was born in 1927”. A date entered into the program as {1927} is automatically recognized as the ranged date {1927.01.01 → 1927.12.31}. Similarly, a date entered as {1927/05} is automatically recognized as the entire month of May, 1927: {1927.05.01 → 1927.05.31}.
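Expanding an incomplete date into a ranged date can be sketched with Python's `calendar.monthrange`, which returns the month length. The function `expand_date` is hypothetical. It assumes "-", "/" or "." may separate the parts, and it returns ISO-style endpoints. Because `calendar` uses the Gregorian calendar, this sketch is only valid for dates after October 1582.

```python
import calendar
import re

_PARTIAL = re.compile(r"^\{?(\d{4})(?:[-/.](\d{1,2}))?(?:[-/.](\d{1,2}))?\}?$")


def expand_date(text):
    """Turn a possibly incomplete date into an inclusive (first, last) range."""
    m = _PARTIAL.match(text.strip())
    if not m:
        raise ValueError("unrecognized date: %r" % text)
    year = int(m.group(1))
    if m.group(2) is None:  # year only -> the whole year
        first, last = (year, 1, 1), (year, 12, 31)
    elif m.group(3) is None:  # year and month -> the whole month
        month = int(m.group(2))
        days = calendar.monthrange(year, month)[1]
        first, last = (year, month, 1), (year, month, days)
    else:  # a complete date -> a one-day range
        first = last = (year, int(m.group(2)), int(m.group(3)))
    return tuple("%04d-%02d-%02d" % d for d in (first, last))
```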

Numerous other variations are possible. For example, if a veteran recalls only that he began experiencing symptoms “after WWII but before the Korean War”, this could be recorded as “{1945.05.08 r 1950.06.25}”.

Like the Number class, the Date class understands non-Western number forms in common use in the world today.

The Date class also has foundational infrastructure to support other calendar systems, such as the Gregorian-based Thai Buddhist calendar system and the Islamic Hijri system. Full support for these and other calendar systems may be implemented in future versions of the program.

As with the Number class, when dates are sorted in a list, discrete dates appear before ranged dates, and narrower ranges appear before wider ranges when the mean of the ranges is equivalent.

The String Class

The String class is used to store string elements, such as IDs and names.

The Affected Class

The Affected class is used to store encoded affection status values. These values are stored as strings. Although any number of categorical levels of affection status may be stored, the program interprets certain affection labels as representing meaningful states of “affected”, “unaffected”, or “missing”.

By default, the program treats “A” and “a” as representing the “affected” state and “U” and “u” as representing the “unaffected” state.

Note that it is both possible and useful to encode other affection states. For example, a study team may decide to use “RA” to mean “reported affected” when a study participant reports a family member as being affected but there is no blood sample, medical record, or doctor's report confirming this. When the time for analysis arrives, the team's statistician can easily tell Madeline to treat all “RA” values as meaning “affected”, “unaffected”, or “missing”, depending upon the goals and requirements of the analysis.
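The reinterpretation step can be thought of as a lookup table that the analyst overrides. The Python sketch below is not Madeline's Affected class; all names are hypothetical. Treating unknown codes as missing unless they are mapped is an assumption of the sketch.

```python
AFFECTED, UNAFFECTED, MISSING = "affected", "unaffected", "missing"

# Default interpretation of stored affection codes.
DEFAULT_AFFECTION_MAP = {
    "A": AFFECTED, "a": AFFECTED,
    "U": UNAFFECTED, "u": UNAFFECTED,
    ".": MISSING,
}


def interpret_affection(code, overrides=None):
    """Map a stored code to affected/unaffected/missing.

    overrides lets an analyst reinterpret study-specific codes such as "RA".
    """
    table = dict(DEFAULT_AFFECTION_MAP)
    if overrides:
        table.update(overrides)
    return table.get(code.strip(), MISSING)
```

For one analysis, `interpret_affection("RA", {"RA": AFFECTED})` counts reported cases as affected, while the stored value remains "RA".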

Sorted lists of Affected values are sorted alphanumerically just as Strings are. This gives the user the flexibility to declare numerous affection states, such as the “RA” state given above. Such flexibility would not be possible if Affected values were treated like the boolean values of the Boolean or Gender classes.

The Genotype Class

The Genotype class is used to store genotypes. Genotypes are recognized as a pair of numbers separated by a forward-slash character, e.g. “102/108”.

SNP genotypes coded using A, C, G, and T are also recognized. SNP alleles are stored internally as A=1, C=2, G=3, and T=4. The program will eventually be able to export SNP genotypes in either alphabetic or numeric format as required.

When creating a sorted list, genotypes are ordered numerically by first allele, then by second allele. In a mixed list, SNP genotypes sort before numerically equivalent non-SNP genotypes, because the system treats SNP genotypes as distinct from other genotypes. The following list (which mixes SNP and non-SNP genotypes and should not occur in practice) illustrates this:

1.      A/A
2.      A/C
3.      C/C
4.      2/2
5.      C/G
6.      2/3
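The genotype ordering above can be reproduced with a tuple sort key in Python. This sketch is not Madeline's Genotype class. The function names are hypothetical. Accepting lower-case SNP letters and treating "." and "0/0" as the only missing genotypes are assumptions. The numeric allele codes for SNPs follow the A=1, C=2, G=3, T=4 scheme described above.

```python
SNP_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}  # internal SNP allele numbering
MISSING_GENOTYPES = {".", "0/0"}


def parse_genotype(text):
    """Return (allele1, allele2, is_snp), or None for a missing genotype."""
    text = text.strip()
    if text in MISSING_GENOTYPES:
        return None
    first, second = text.split("/")
    if first.upper() in SNP_CODES and second.upper() in SNP_CODES:
        return (SNP_CODES[first.upper()], SNP_CODES[second.upper()], True)
    return (int(first), int(second), False)


def genotype_sort_key(text):
    """First allele, then second; SNPs before equal non-SNPs; missing last."""
    g = parse_genotype(text)
    if g is None:
        return (1, 0, 0, 0)
    a1, a2, is_snp = g
    return (0, a1, a2, 0 if is_snp else 1)
```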

The Haplotype Class

The Haplotype class is used to store haplotypes. The following symbols, consistent with Gonçalo Abecasis' program Merlin, are recognized for encoding recombination information:

   : No recombination information available.
   | No recombination.
   / Maternal recombination.
   \ Paternal recombination.
   + Recombination from both sides.

Consistent with Merlin, the first allele always represents the allele from the mother and the second allele that from the father.

When creating a sorted list, Haplotypes are sorted numerically by first allele, followed by the recombination symbol sorted in the order shown in the table above, and then by the second allele. For example:

1.      1:2
2.      1:4
3.      1|2
4.      1/2
5.      1\2
6.      1+8
7.      2:1
8.      2|1
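Sorting haplotypes by first allele, then recombination symbol, then second allele is straightforward once each value is split into its three parts. The Python sketch below is illustrative only; the function names and the regular expression are our own.

```python
import re

# Recombination symbols in the sort order given in the table above.
SYMBOL_ORDER = {":": 0, "|": 1, "/": 2, "\\": 3, "+": 4}
_HAPLOTYPE = re.compile(r"^(\d+)([:|/\\+])(\d+)$")


def parse_haplotype(text):
    """Split e.g. "1|2" into (maternal allele, symbol, paternal allele)."""
    m = _HAPLOTYPE.match(text.strip())
    if not m:
        raise ValueError("not a haplotype: %r" % text)
    return int(m.group(1)), m.group(2), int(m.group(3))


def haplotype_sort_key(text):
    """First allele, then recombination symbol, then second allele."""
    maternal, symbol, paternal = parse_haplotype(text)
    return (maternal, SYMBOL_ORDER[symbol], paternal)
```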

The implementation of the Haplotype class in Madeline is not yet complete.

Types of Individuals

For the most part, you can use any prefix you like on the identifier of an individual. However, the following characters have special meanings when used as the first character of an IndividualId:

Prefix	Description
^	Indicates a marriage with no offspring.
&	Indicates a marriage with no offspring due to infertility.
@	Indicates a spontaneous abortion, ectopic pregnancy, or terminated pregnancy.
!	Indicates a virtual individual inserted by the program. Examples are unrecorded fathers or mothers who nevertheless need to be shown on pedigree drawings.
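Because only the first character matters, the special prefixes reduce to a dictionary lookup. The following Python sketch is hypothetical and not Madeline code. The descriptions are copied from the table above.

```python
# Special first characters of an IndividualId, from the table above.
SPECIAL_PREFIXES = {
    "^": "marriage with no offspring",
    "&": "marriage with no offspring due to infertility",
    "@": "spontaneous abortion, ectopic pregnancy, or terminated pregnancy",
    "!": "virtual individual inserted by the program",
}


def individual_type(individual_id):
    """Return the special meaning of an ID's first character, or None."""
    return SPECIAL_PREFIXES.get(individual_id[:1])
```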

Glossary

A

Ancestral Founder. An ancestral founder is a founder at the top of a descent tree. Algorithmically, the ancestral founders are the parents of an individual who is missing all four grandparents.
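The algorithmic definition translates directly into code. In this minimal sketch, a pedigree is assumed to be a dictionary mapping each individual to a `(mother, father)` pair, with `None` for an unrecorded parent; all names are hypothetical. Since the program can insert virtual individuals for unrecorded parents (the `!` prefix), the sketch simply skips individuals with only one recorded parent.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: individual id -> (mother id, father id).
# None means the parent is not recorded in the pedigree.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def is_founder(ped: Pedigree, individual: str) -> bool:
    """A founder has no recorded parents (ids absent from ped count as founders)."""
    mother, father = ped.get(individual, (None, None))
    return mother is None and father is None


def ancestral_founders(ped: Pedigree) -> Set[str]:
    """Return the parents of every individual who is missing all four grandparents."""
    result: Set[str] = set()
    for mother, father in ped.values():
        if mother is None or father is None:
            continue  # founders, and individuals with only one recorded parent
        # Both parents being founders means all four grandparents are missing.
        if is_founder(ped, mother) and is_founder(ped, father):
            result.update((mother, father))
    return result


ped: Pedigree = {
    "GM": (None, None), "GF": (None, None),  # couple at the top of the tree
    "C1": ("GM", "GF"),                      # their child
    "SP": (None, None),                      # married-in spouse (a founder)
    "G1": ("C1", "SP"),                      # grandchild
}
print(sorted(ancestral_founders(ped)))  # → ['GF', 'GM']
```

Here `SP` is a founder but not an ancestral founder, because the grandchild `G1` still has recorded grandparents on the `C1` side.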

Attached Individual. An attached individual is an individual who is connected to parents or spouse(s) and children. Opposite of unattached individual.

B

Bilineal Pedigree. A bilineal pedigree is a pedigree in which two descent trees are joined together by at least one mating.

C

Circle. A circle represents a female on a pedigree drawing.

Consanguinity. Consanguinity is a mating between two members of the same descent tree. See consanguineous loop.

Consanguineous Loop. A consanguineous loop is a mating between two members of the same descent tree. Common consanguineous loops are uncle-niece matings and first-cousin matings.
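One common way to test whether a mating is consanguineous is to check whether the two mates share an ancestor, counting each mate as one of his or her own ancestors so that parent-child matings are also caught. The sketch below applies that common-ancestor test to the same hypothetical `(mother, father)` dictionary; it illustrates the definition and is not necessarily how Madeline detects loops.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: individual id -> (mother id, father id).
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def ancestors_inclusive(ped: Pedigree, individual: str) -> Set[str]:
    """Return the individual plus every recorded ancestor (iterative walk up the tree)."""
    seen: Set[str] = set()
    stack = [individual]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for parent in ped.get(current, (None, None)):
            if parent is not None:
                stack.append(parent)
    return seen


def is_consanguineous(ped: Pedigree, mate1: str, mate2: str) -> bool:
    """Mates are treated as consanguineous if their ancestor sets overlap."""
    return bool(ancestors_inclusive(ped, mate1) & ancestors_inclusive(ped, mate2))


ped: Pedigree = {
    "GM": (None, None), "GF": (None, None),
    "UNCLE": ("GM", "GF"), "MOTHER": ("GM", "GF"),  # full sibs
    "DAD": (None, None),                            # married-in father
    "NIECE": ("MOTHER", "DAD"),
}
print(is_consanguineous(ped, "UNCLE", "NIECE"))  # → True
print(is_consanguineous(ped, "MOTHER", "DAD"))   # → False
```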

Consultand. A consultand is an individual who has sought genetic counseling or testing. Distinguish from proband.

D

Descent Tree. A descent tree is a v-shaped tree of descendants originating from a pair of ancestral founders.

Diamond. A diamond represents an individual of unknown gender on a pedigree drawing. A diamond enclosing a numeral represents an aggregate of sibs shown in a collapsed representation on a pedigree drawing.

Dizygotic Twin. Dizygotic twins are two or more full sibs who share the same date of birth. In human studies, these are also called fraternal twins. In animal studies, these are litter mates.

E

F

Female. A female is represented by a circle on a pedigree drawing.

Founder. A founder is an individual with no parents.

G

H

Half Sib. A half sib is a brother or sister who shares only one parent with his or her siblings.

I

Icon. An icon is a circle, square, or diamond representing an individual on a pedigree drawing.

Individual. An individual is a person in a pedigree, represented on a pedigree drawing by a circle if female, a square if male, and a diamond if gender is unknown.

J

K

L

Loop. See consanguineous loop.

M

Male. A male is represented by a square on a pedigree drawing.

Marriage. See mating.

Mate. A mate is one of the partners in the union of a male and female which produces offspring.

Mating. A mating is the union of a male and female which produces offspring. This is often, but not exclusively, a marriage.

Monozygotic Twin. Monozygotic twins are two or more genetically identical individuals derived from the splitting of a single zygote.

N

Non-founder. A non-founder is an individual who is connected to his or her parents. Opposite of founder.

Nuclear Family. A nuclear family consists of two parents and one or more offspring.

O

Offspring. An offspring is the child of a parent.

P, Q

Parent. A parent is an individual with one or more offspring.

Pedigree. A pedigree is a collection of genetically related individuals organized into one or more descent trees.

Proband. A proband is the first affected member of a pedigree coming to medical attention. Distinguish from consultand.

R

S

Sib, Sibling, Sibship. A sib or sibling is a full brother or sister in the sibship (group of siblings) of a simple nuclear family. In a collapsed representation, an aggregate of two or more sibs may be shown by a single diamond enclosing a numeral indicating the number of sibs.

Singleton. A singleton is an unattached individual. A singleton pedigree is a pedigree that contains a single unattached individual.

Spouse. See mate.

Square. A square represents a male on a pedigree drawing.

T

Terminal Individual. A terminal individual is an attached individual who has no offspring. Only terminal individuals can have gender status set to unknown or missing.

Twin. A twin is either a monozygotic twin or dizygotic twin. Madeline also recognizes the case of twins of unknown zygosity who are identified automatically based on having the same date of birth.
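Grouping by date of birth is straightforward to sketch. The example below assumes, hypothetically, that each record carries a mother, a father, and a date-of-birth string, and that only full sibs (same mother and same father) sharing a date of birth form a twin group, in line with the glossary's description of dizygotic twins. The function name and data layout are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record: individual id -> (mother id, father id, date of birth).
Records = Dict[str, Tuple[Optional[str], Optional[str], Optional[str]]]


def twins_of_unknown_zygosity(records: Records) -> List[List[str]]:
    """Group full sibs sharing a date of birth; groups of two or more are twins."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for individual, (mother, father, dob) in records.items():
        if mother is None or father is None or dob is None:
            continue  # cannot be placed in a sibship or has no birth date
        groups[(mother, father, dob)].append(individual)
    return [sorted(members) for members in groups.values() if len(members) >= 2]


records: Records = {
    "A": ("M", "F", "1990-05-01"),
    "B": ("M", "F", "1990-05-01"),
    "C": ("M", "F", "1992-03-15"),
    "D": ("M2", "F", "1990-05-01"),  # half sib: different mother
}
print(twins_of_unknown_zygosity(records))  # → [['A', 'B']]
```

Dates are compared here as exact strings, so real data would need a consistent date format before grouping.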

U, V, W

Unattached Individual. An unattached individual is an individual who is connected to no one: no parents, no spouse, and no children. Opposite of attached individual.
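The founder, attached, terminal, and unattached definitions can be checked together. This sketch assumes the same hypothetical `(mother, father)` dictionary. As a simplification, it treats mates as connected only through shared offspring, so childless marriages (the `^` and `&` prefixes) are not modeled, and every referenced parent is assumed to have its own record.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: individual id -> (mother id, father id).
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def classify_individuals(ped: Pedigree) -> Dict[str, Set[str]]:
    """Label each individual using the glossary terms."""
    # Collect each individual's recorded offspring.
    children: Dict[str, Set[str]] = {individual: set() for individual in ped}
    for individual, parents in ped.items():
        for parent in parents:
            if parent is not None:
                children.setdefault(parent, set()).add(individual)

    labels: Dict[str, Set[str]] = {}
    for individual, (mother, father) in ped.items():
        has_parents = mother is not None or father is not None
        has_children = bool(children[individual])
        tags = {"non-founder" if has_parents else "founder"}
        if has_parents or has_children:
            tags.add("attached")
            if not has_children:
                tags.add("terminal")  # attached but with no offspring
        else:
            tags.add("unattached")    # no parents, no mates, no children
        labels[individual] = tags
    return labels


ped: Pedigree = {"M1": (None, None), "F1": (None, None), "C1": ("M1", "F1"), "X1": (None, None)}
for individual, tags in classify_individuals(ped).items():
    print(individual, sorted(tags))
```

In this example `C1` comes out as an attached, terminal non-founder, and `X1` as an unattached founder, which is also a singleton.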