Madeline Version 0.935 Tutorial

by Edward H. Trager <ehtrager@umich.edu> (June 2004)

© 2004 by the Regents of the University of Michigan ALL RIGHTS RESERVED


Introduction

This tutorial will take you through the entire process of preparing and analysing a data set using Madeline.

The data used in this tutorial are based on a set of real data provided by Dr. Charles Krafchak of the Kellogg Eye Center in Ann Arbor. They have been intentionally modified to better serve the didactic goals of this tutorial. The real data were used in the PPCD3 study by Shimizu et al. ("A Locus for Posterior Polymorphous Corneal Dystrophy (PPCD3) Maps to Chromosome 10", American Journal of Medical Genetics (in press), 2004).

All the files mentioned in this tutorial can be found in the tutorial subdirectory of the software distribution. A number of the files have been placed in separate subdirectories for clarity of presentation. In order to work through the whole tutorial, copy each file as needed into a separate working directory of your own creation. This tutorial assumes that you are comfortable working from a UNIX/Linux command line environment.

Preparing Genetic Maps

The GeneticMaps subdirectory of the tutorial directory contains two lists. For chromosome 10, there is a list of 26 markers (chr10markers.list). For chromosome 20, there is a list of 27 markers (chr20markers.list).

A quick and convenient way to obtain reasonably good genetic maps for these markers is to use the Marshfield Clinic's Build Your Own Map online resource. After entering the desired chromosome number in the online form, simply copy and paste the list of markers into the form and press Submit Form.

Marshfield Build Your Own Map screenshot

The Marshfield server runs Crimap against their data and returns comprehensive sex-averaged, male, and female maps to your browser. Copy and paste the results into a file. Repeat the process for the chromosome 20 markers. In the GeneticMaps subdirectory, we have saved these two files as MarshfieldChr10FrameworkMap.txt and MarshfieldChr20FrameworkMap.txt.

Now we can use Madeline's convert command to convert these files to Madeline format. Here is the command and results for chromosome 10:

M>convert marshfield file 'MarshfieldChr10FrameworkMap.txt' to 'chr10.map'
Converting input file "MarshfieldChr10FrameworkMap.txt"
to Madeline-formatted output files,
"chr10.map" and "chr10.map.mfh" ...
================
Converting file
================
This is a map of chromosome 10 markers ...
Converting Marshfield map ...
chr10.map created ...
================
Recognizing file
================
HEADER block spans lines 1 to 4.
DATA block spans lines 6 to 31.
Skipping a total of 5 lines at top.
There are 4 non-empty header lines and 26 data lines.
Data records are 102 bytes long.

 # . Field Name  Start End   Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. CHROMOSOME      1     2     2     0     2 N
  2. ORDINAL         5     6     2     0     1 N
  3. MARKERNAME      8    15     8     0     7 C
  4. POSITION       23    28     6     2     1 N
  5. THETA          30    36     7     5     4 N
  6. DISTANCE       41    45     5     2     5 N
  7. POSITION_F     51    56     6     2     1 N
  8. THETA_F        58    64     7     5     4 N
  9. DISTANCE_F     69    73     5     2     5 N
 10. POSITION_M     79    84     6     2     1 N
 11. THETA_M        86    92     7     5     4 N
 12. DISTANCE_M     97   101     5     2     1 N
Binary recognition header file ("chr10.map.mfh") written.
This appears to be a MAP TABLE which can be opened using:

        load "chr10.map.mfh"

M>

Madeline converts the map table to chr10.map and also creates a binary .mfh header file to go along with it. chr10.map is a human-readable text file, whereas chr10.map.mfh is a binary index file that Madeline uses as a guide to optimize table access. Madeline typically requires that a .mfh file accompany every data file.

Notice how Madeline also provides you with recombination fractions in addition to the inter-marker distances. The recombination fraction and inter-marker distance on any given row in the table refer to that fraction and distance respectively between the current marker and the marker that follows on the next row in the map:

CHROMOSOME N  ORDINAL    N  MARKERNAME C
POSITION   N  THETA      N  DISTANCE   N
POSITION_F N  THETA_F    N  DISTANCE_F N
POSITION_M N  THETA_M    N  DISTANCE_M N

10   1 D10S249          2.13 0.11168    11.36       4.63 0.05320     5.34       0.00 0.16739    17.41 
10   2 D10S591         13.49 0.05488     5.51       9.97 0.01530     1.53      17.41 0.09397     9.51 
10   3 D10S189         19.00 0.10013    10.15      11.50 0.12218    12.47      26.92 0.07249     7.30 
10   4 D10S547         29.15 0.08662     8.75      23.97 0.10950    11.13      34.22 0.06837     6.88 
10   5 D10S191         37.90 0.07737     7.80      35.10 0.14281    14.69      41.10 0.01060     1.06 
10   6 D10S548         45.70 0.06365     6.40      49.79 0.08458     8.54      42.16 0.04260     4.27 
10   7 D10S197         52.10 0.05300     5.32      58.33 0.08468     8.55      46.43 0.02129     2.13 
10   8 D10S213         57.42 0.03216     3.22      66.88 0.04289     4.30      48.56 0.02119     2.12 
10   9 D10S208         60.64 0.03186     3.19      71.18 0.05320     5.34      50.68 0.01070     1.07 
10  10 D10S1780        63.83 0.02139     2.14      76.52 0.04270     4.28      51.75 0.00000     0.00 
10  11 D10S578         65.97 0.04250     4.26      80.80 0.06375     6.41      51.75 0.02129     2.13 
10  12 D10S220         70.23 0.01599     1.60      87.21 0.02139     2.14      53.88 0.01060     1.06 
10  13 D10S567         71.83 0.01070     1.07      89.35 0.02149     2.15      54.94 0.00000     0.00 
10  14 D10S539         72.90 0.02667     2.67      91.50 0.05320     5.34      54.94 0.00000     0.00 
10  15 D10S1790        75.57 0.05181     5.20      96.84 0.05102     5.12      54.94 0.05320     5.34 
10  16 D10S561         80.77 0.00000     0.00     101.96 0.00000     0.00      60.28 0.00000     0.00 
10  17 D10S1652        80.77 0.01729     1.73     101.96 0.01280     1.28      60.28 0.02129     2.13 
10  18 D10S581         82.50 0.08545     8.63     103.24 0.10702    10.87      62.41 0.07346     7.40 
10  19 D10S537         91.13 0.22424    24.14     114.11 0.26860    30.01      69.81 0.16695    17.36 
10  20 D10S583        115.27 0.08904     9.00     144.12 0.13367    13.70      87.17 0.04299     4.31 
10  21 D10S192        124.27 0.04448     4.46     157.82 0.06699     6.74      91.48 0.02159     2.16 
10  22 D10S597        128.73 0.09619     9.74     164.56 0.14630    15.07      93.64 0.04309     4.32 
10  23 D10S190        138.47 0.09001     9.10     179.63 0.11661    11.88      97.96 0.06099     6.13 
10  24 D10S587        147.57 0.10176    10.32     191.51 0.09570     9.69     104.09 0.11064    11.25 
10  25 D10S217        157.89 0.12762    13.05     201.20 0.09599     9.72     115.34 0.15746    16.30 
10  26 D10S212        170.94       .        .     210.92       .        .     131.64       .        . 
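The relationship between the DISTANCE and THETA columns can be checked by hand. The sketch below assumes that the recombination fractions follow the Kosambi map function (the Marshfield maps are reported in Kosambi cM); this tutorial does not state which formula Madeline actually uses, so treat that as an assumption. The function name kosambi_theta is ours, not a Madeline command.

```python
import math

def kosambi_theta(distance_cm):
    """Recombination fraction for a map distance given in Kosambi centimorgans."""
    morgans = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * morgans)

# (marker, sex-averaged position in cM) for the first three rows of chr10.map
positions = [("D10S249", 2.13), ("D10S591", 13.49), ("D10S189", 19.00)]

for (name, pos), (_, next_pos) in zip(positions, positions[1:]):
    distance = next_pos - pos        # DISTANCE: gap to the marker on the next row
    theta = kosambi_theta(distance)  # THETA: recombination fraction over that gap
    print(f"{name:10s} {distance:6.2f} {theta:8.5f}")
```

Under this assumption the computed values agree with the THETA column above to about four decimal places.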

For chromosome 20, notice that Marshfield reports D20S482 as being a cryptic duplicate of GATA149E11, which is the marker reported in the map. It is therefore necessary to replace the word "Unknown" in the D number column with "D20S482" prior to running Madeline's convert command:

Chromosome 20

    Some of your markers are 'cryptic' duplicates.
    Listed below: these duplicates with the marker on the map

      'cryptic' duplicate                 marker on the map
    GATA51D03      D20S482            GATA149E11     Unknown        

        Comprehensive genetic map (distances in Kosambi cM)

     Marker         D number        Sex-averaged         Female       

  1  AFM248yc5      D20S117                 2.83               0.00 ...  
                                    3.42               0.51           
  2  AFMa131wf1     D20S199                 6.25               0.51 ...  
                                    2.72               1.09           
  3  AFMa175vb1     D20S842                 8.97               1.60 ...  
                                    0.56               0.00           
  4  AFM308we1      D20S193                 9.53               1.60 ...  
                                    1.67               1.61           
  5  AFM234tf10     D20S889                11.20               3.21 ...  
                                    0.92               0.00           
  6  GATA149E11     Unknown                12.12               3.21 ...  
                                    4.53               3.21           
  7  AFM023ta1      D20S95                 16.65               6.42 ...
  .      .             .             .       .          .       . 
  .      .             .             .       .          .       . 
  .      .             .             .       .          .       . 

Now the two map files can be combined into one file. In a text editor, simply copy the rectangular table from chr20.map and paste it at the bottom of chr10.map. Alternatively, here's one way you could obtain the same result using simple UNIX commands:

  cat chr10.map > chr10.20.map
  grep "^20" chr20.map >> chr10.20.map
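If you prefer a script to shell commands, the same merge can be sketched in Python. This is only an illustration: the helper name merge_maps and the tiny stand-in files are made up for the example, and the real map files have more columns than shown here.

```python
import tempfile
from pathlib import Path

def merge_maps(first, second, prefix, merged):
    """Copy `first` whole, then append the lines of `second` that start with `prefix`.

    Returns the number of lines appended from `second`.
    """
    head = Path(first).read_text()
    if head and not head.endswith("\n"):
        head += "\n"
    rows = [line for line in Path(second).read_text().splitlines()
            if line.startswith(prefix)]
    Path(merged).write_text(head + "".join(row + "\n" for row in rows))
    return len(rows)

# Demonstration with tiny stand-in files (not the real tutorial maps).
with tempfile.TemporaryDirectory() as tmp:
    chr10 = Path(tmp, "chr10.map")
    chr20 = Path(tmp, "chr20.map")
    chr10.write_text("CHROMOSOME N\nMARKERNAME C\n\n10 D10S249\n10 D10S591\n")
    chr20.write_text("CHROMOSOME N\nMARKERNAME C\n\n20 D20S117\n20 D20S199\n")
    appended = merge_maps(chr10, chr20, "20", Path(tmp, "chr10.20.map"))
    merged_text = Path(tmp, "chr10.20.map").read_text()
    print(appended, "rows appended")
```

As with the grep command, only lines beginning with the chromosome number are copied from the second file, so its header block is not duplicated.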

If you want, you can run UNIX utilities or programs without ever quitting Madeline using the system command, like this:

  M> system 'cat chr10.map > chr10.20.map ; grep "^20" chr20.map >> chr10.20.map'

Note: To temporarily leave Madeline to complete interactive tasks outside of Madeline, just do this:

  M> system 'bash'
 

... which starts a (Linux) bash shell. Just type "exit" when you are ready to leave Bash and return to Madeline.

Now within Madeline simply run the recognize command on the merged map file, "chr10.20.map". This creates the .mfh guide file that is required before we can open and manipulate the table:

M>recognize 'chr10.20.map'
Starting to recognize file "chr10.20.map" to "chr10.20.map.mfh" ...
HEADER block spans lines 1 to 4.
DATA block spans lines 6 to 58.
Skipping a total of 5 lines at top.
There are 4 non-empty header lines and 53 data lines.
Data records are 102 bytes long.

 # . Field Name  Start End   Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. CHROMOSOME      1     2     2     0     2 N
  2. ORDINAL         5     6     2     0     1 N
  3. MARKERNAME      8    15     8     0     7 C
  4. POSITION       23    28     6     2     1 N
  5. THETA          30    36     7     5     4 N
  6. DISTANCE       41    45     5     2     5 N
  7. POSITION_F     51    56     6     2     1 N
  8. THETA_F        58    64     7     5     4 N
  9. DISTANCE_F     69    73     5     2     5 N
 10. POSITION_M     79    84     6     2     1 N
 11. THETA_M        86    92     7     5     4 N
 12. DISTANCE_M     97   101     5     2     1 N
Binary recognition header file ("chr10.20.map.mfh") written.
This appears to be a MAP TABLE which can be opened using:

        load "chr10.20.map.mfh"
M>

As Madeline suggests, let's try the load command followed by a list map command:

M>load 'chr10.20.map.mfh'
Marker maps based on chr10.20.map.mfh are now installed.
M>list map for chromosome 20

                    Map Position (Kosambi cM)
                  -----------------------------
Ch Or Marker Name Sex-avg.   Female     Male
-- -- ----------- --------- --------- ---------
20  1 D20S117        2.8300    0.0000    5.4800
20  2 D20S199        6.2500    0.5100   11.4800
20  3 D20S842        8.9700    1.6000   16.2600
20  4 D20S193        9.5300    1.6000   17.3400
20  5 D20S889       11.2000    3.2100   19.4800
20  6 D20S482       12.1200    3.2100   21.2600
20  7 D20S95        16.6500    6.4200   27.0000
20  8 D20S115       21.1500   11.7800   30.6300
20  9 D20S189       30.5600   23.4400   37.7600
20 10 D20S186       32.3000   25.8400   38.8200
20 11 D20S604       32.9400   26.9100   38.8200
20 12 D20S66        34.2200   29.0500   38.8200
20 13 D20S910       35.5100   29.0500   42.0300
20 14 D20S852       36.5800   30.1200   43.1000
20 15 D20S104       37.6500   32.2600   43.1000
20 16 D20S98        37.6500   32.2600   43.1000
20 17 D20S875       38.7200   34.3900   43.1000
20 18 D20S118       39.2500   35.4600   43.1000
20 19 D20S912       46.7100   49.2100   44.2900
20 20 D20S195       50.8100   56.2800   45.4700
20 21 D20S107       55.7400   64.9700   46.6600
20 22 D20S119       61.7700   74.7500   49.0200
20 23 D20S178       66.1600   81.2700   51.4100
20 24 D20S196       75.0100   96.8200   53.7700
20 25 D20S100       84.7800  112.0300   58.0400
20 26 D20S171       95.7000  120.9100   71.5600
20 27 D20S173       98.0900  120.9100   76.0100
M>

OK, we now have genetic maps ready for use! The next step is to prepare the pedigree data.

Preparing the Pedigree Data

Conceptually, the pedigree data can be divided into three parts: family structure data, phenotype data, and genotype data.

In well-designed database systems, these three types of data are often stored in separate tables. A good system will provide ways for you to compile the data into files required for analysis. Madeline also provides functions to compile and merge these data components. These data and the functions to handle the data are described below.

Preparing the Data: Family & Phenotype Data

In this example, the family structure, affection status, and age of diagnosis data are contained in the familystructure.data file in the FamilyStructure subdirectory. This file contains columns for family ID, individual ID (STUDYID), gender, father, mother, monozygotic and dizygotic twin status, affection status, and age of diagnosis for affected individuals. The single capital letters following the column labels in the header of the file indicate the type of data that the column contains (C=character, X=gender, N=numeric):

FAMID C
STUDYID C
SEX X
FATHER C
MOTHER C
MZTWIN C
DZTWIN C
AFFECTED C
AGEDX N

F0099  S00925         M U0001C         U0001D         . . A 47 
F0099  S00926         F U0001C         U0001D         . . A 38 
F0099  S00951         M U0001C         U0001D         . . A 45 
F0099  S00973         F U0001I         S00981         . . U  .  
  .      .            .    .              .           . . .  .
  .      .            .    .              .           . . .  .
  .      .            .    .              .           . . .  .

Since there are no twins present, the columns for MZTWIN and DZTWIN contain nothing but dots "." as a missing value indicator.

Note: If twins were present, the first twin pair would be coded using "A", the second pairing using "B", and so on ...

In this file, the identifiers of sampled individuals begin with "S" while those of unsampled individuals begin with "U". Madeline does not require this: the program does not care how you label individuals, as long as the labels are unique. It is, however, a useful labeling convention.
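To make the file layout concrete, here is a minimal Python sketch that reads the header block and the data rows of such a file and checks that the individual labels are unique. It assumes whitespace-separated columns with "." for missing values, which holds for the excerpt above. Madeline's own recognize command works from column positions, so this is not a description of how Madeline parses the file, and parse_table is a name invented for the example.

```python
SAMPLE = """\
FAMID C
STUDYID C
SEX X
FATHER C
MOTHER C
MZTWIN C
DZTWIN C
AFFECTED C
AGEDX N

F0099  S00925         M U0001C         U0001D         . . A 47
F0099  S00926         F U0001C         U0001D         . . A 38
F0099  S00951         M U0001C         U0001D         . . A 45
F0099  S00973         F U0001I         S00981         . . U  .
"""

def parse_table(text):
    """Split a header-plus-data text table into (fields, rows).

    fields: list of (name, type_letter) pairs from the header block.
    rows:   list of dicts keyed by field name; "." becomes None.
    """
    lines = text.splitlines()
    fields = []
    i = 0
    # Header block: one "NAME TYPE" pair per line, ended by a blank line.
    while i < len(lines) and lines[i].strip():
        name, type_letter = lines[i].split()
        fields.append((name, type_letter))
        i += 1
    rows = []
    for line in lines[i:]:
        values = line.split()
        if len(values) != len(fields):
            continue  # skip blank lines and rows with an unexpected column count
        rows.append({name: (None if value == "." else value)
                     for (name, _), value in zip(fields, values)})
    return fields, rows

fields, rows = parse_table(SAMPLE)
ids = [row["STUDYID"] for row in rows]
labels_are_unique = len(ids) == len(set(ids))
print(len(fields), "fields,", len(rows), "rows, unique labels:", labels_are_unique)
```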

After running the recognize command, one can open the pedigree table and create a drawing of the pedigree:

M>recognize 'familystructure.data'
Starting to recognize file "familystructure.data" to "familystructure.data.mfh" ...
HEADER block spans lines 1 to 9.
DATA block spans lines 11 to 48.
Skipping a total of 10 lines at top.
There are 9 non-empty header lines and 38 data lines.
Data records are 63 bytes long.

The gender field has been identified.
The individual, father, and mother ID fields have been identified.
 # . Field Name  Start End   Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. FAMID           1     5     5     0     2 C
  2. STUDYID         8    13     6     0     9 C
  3. SEX            23    23     1     0     1 X
  4. FATHER         25    30     6     0     9 C
  5. MOTHER         40    45     6     0     9 C
  6. MZTWIN         55    55     1     0     1 C
  7. DZTWIN         57    57     1     0     1 C
  8. AFFECTED       59    59     1     0     1 C
  9. AGEDX          61    62     2     0     1 N
Binary recognition header file ("familystructure.data.mfh") written.
This appears to be a PEDIGREE TABLE which can be opened using:

        open "familystructure.data.mfh"

M>open 'familystructure.data.mfh'
  8. AFFECTED has 3 levels.
Pedigree table "familystructure.data.mfh" opened with        38 records
NOTE: Pedigree F0099 has 1 unconnected individual.
Pedigrees reconstructed in 0.0000 seconds
Checking simple Mendelian inheritance in nuclear families... :
==============================================================
Inheritance inconsistency:      PEDIGREE        MOTHER  FATHER  MARKER
--------------------------      --------        ------  ------  ------
==============================================================

================================================
Summary of Mendelian Inheritance Inconsistencies
                   by Marker
================================================
 #      MARKERNAME                              NUCLEAR FAMILIES
----    --------------------------------        ----------------
------------------------------------------------
Inconsistencies present among 0 of 0 markers.
================================================

  1.FAMID      Co__1    4.FATHER     Co__4    7.DZTWIN     Co__7
  2.STUDYID    Co__2    5.MOTHER     Co__5    8.AFFECTED   Co__8+
  3.SEX        Co__3    6.MZTWIN     Co__6    9.AGEDX      Po__1
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................          1         0         1
Individuals .................         38         0        38
 + In database ..............         38         0        38
 |  + Attached ..............         37         0        37
 |  + Childless spouses .....          0         0         0
 |  + Unattached ............          1         0         1
 + Not in database ..........          0         0         0
M>draw pedigrees for #true
Drawing pedigree F0099, U0001B's subtree (subtree 1 of 1) ...
Printing drawing scaled to 0.78.

1 pedigree in result set.

M>

Notice above how the program marks what it considers to be core fields, including the affection status field, with "C", while age of diagnosis, AGEDX, is marked with "P" for phenotype. In this file there are eight core fields and one phenotype field. If genotype fields were present, they would be designated with "G".

If Madeline has been installed correctly, it should automatically call gv or another Postscript viewing program to view the resulting pedigree drawing, madeline.pedigree.ps:

Family F0099 Pedigree Drawing

This family structure and phenotype data will need to be merged with the genotype data. But first let's take a look at the genotype data:

Preparing the Data: Genotype Data

Marker data are normally stored in a table format where each row contains the alleles for one marker typed on one individual. To see all of the genotypes for one individual, you have to scan across multiple rows. In Madeline, this type of table is called a decomposed table. In contrast, a composed table contains all the genotypes for one individual in one row. To make things simple, we have included just two markers in the decomposed.sample file in the DecomposedGenotypeData subdirectory:

FAMID C
STUDYID C
MARKERNAME C
ALLELE1 N
ALLELE2 N

F0099   S00925  D10S1652  293   297
F0099   S00925  D10S1780  232   238
F0099   S00926  D10S1652  293   297
F0099   S00926  D10S1780  232   238
F0099   S00951  D10S1652  0     0
F0099   S00951  D10S1780  232   232
F0099   S00973  D10S1652  293   295
  .       .        .       .     .
  .       .        .       .     .
  .       .        .       .     .

Madeline can quickly convert a decomposed table to the composed format using the compose command. As always, you must first recognize the table before you can perform any other operation on it. Notice how the program recognizes what type of table you are operating on, and even suggests you use the compose command:

M>recognize 'decomposed.sample'
Starting to recognize file "decomposed.sample" to "decomposed.sample.mfh" ...
HEADER block spans lines 1 to 5.
DATA block spans lines 7 to 60.
Skipping a total of 6 lines at top.
There are 5 non-empty header lines and 54 data lines.
Data records are 35 bytes long.

 # . Field Name  Start End   Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. FAMID           1     5     5     0     3 C
  2. STUDYID         9    14     6     0     2 C
  3. MARKERNAME     17    24     8     0     2 C
  4. ALLELE1        27    29     3     0     3 N
  5. ALLELE2        33    35     3     0     0 N
Binary recognition header file ("decomposed.sample.mfh") written.
This appears to be a DECOMPOSED TABLE which can be converted using:

        compose "decomposed.sample.mfh"

M>compose 'decomposed.sample.mfh' to 'composed.sample'
Composing "decomposed.sample.mfh" to "composed.sample" ...
Composed file has been created
(Remember to specify the Madeline ".mfh" files when merging composed
tables with family structure or other tables)
M>

Here's what the resulting composed.sample file looks like:


FAMID C
STUDYID C
D10S1652 C
D10S1780 C

F0099 S00925 293/297   232/238  
F0099 S00926 293/297   232/238  
F0099 S00951 .         232/232  
F0099 S00973 293/295   232/236  
F0099 S00976 285/293   232/232  
  .     .       .         .   
  .     .       .         .   
  .     .       .         .   
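Conceptually, composing is a pivot from one row per individual and marker to one row per individual. The Python sketch below illustrates the idea on the first few records of the sample. It is not Madeline's implementation; the function name compose is reused only for readability, and treating a genotype as missing whenever either allele is 0 is an assumption based on the S00951 row above.

```python
from collections import defaultdict

def compose(records):
    """Pivot (famid, studyid, marker, allele1, allele2) rows into one row per individual."""
    markers = sorted({marker for _, _, marker, _, _ in records})
    table = defaultdict(dict)
    for famid, studyid, marker, a1, a2 in records:
        # Assumption: an allele coded 0 means untyped, written as "." in the output.
        genotype = "." if 0 in (a1, a2) else f"{a1}/{a2}"
        table[(famid, studyid)][marker] = genotype
    rows = [(famid, studyid, *[genotypes.get(m, ".") for m in markers])
            for (famid, studyid), genotypes in table.items()]
    return markers, rows

records = [
    ("F0099", "S00925", "D10S1652", 293, 297),
    ("F0099", "S00925", "D10S1780", 232, 238),
    ("F0099", "S00926", "D10S1652", 293, 297),
    ("F0099", "S00926", "D10S1780", 232, 238),
    ("F0099", "S00951", "D10S1652", 0, 0),
    ("F0099", "S00951", "D10S1780", 232, 232),
    ("F0099", "S00973", "D10S1652", 293, 295),
    ("F0099", "S00973", "D10S1780", 232, 236),
]

markers, rows = compose(records)
for row in rows:
    print(" ".join(row))
```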

Notice above how alleles from the original file have now been combined into genotypes separated by forward slash "/" characters. You can probably guess that the next command to use is called merge:

M>merge 'familystructure.data.mfh' , 'composed.sample.mfh' to 'merged.sample' in physical order
Physical order specified for merging fields
Merging 2 tables to "merged.sample" ...
Building field and record trees ...
Writing 38 records to merged.sample ...
Writing Madeline binary header file "merged.sample.mfh" ...
2 tables merged to merged.sample in     0.00 seconds
(Remember to use the Madeline ".mfh" file when opening the merged table)
M>

And voila! Here's what the merged file looks like:


FAMID C
STUDYID C
SEX C
FATHER C
MOTHER C
MZTWIN C
DZTWIN C
AFFECTED C
AGEDX N
D10S1652 C
D10S1780 C

F0099 S00925 M U0001C U0001D     A 47 293/297 232/238
F0099 S00926 F U0001C U0001D     A 38 293/297 232/238
F0099 S00951 M U0001C U0001D     A 45         232/232
F0099 S00973 F U0001I S00981     U .  293/295 232/236
F0099 S00976 F U0001C U0001D     I .  285/293 232/232
F0099 S00981 F U0001C U0001D     I .  293/297 232/238
F0099 S00989 M U0001C U0001D     I .  285/297 232/232
F0099 S01031 F U0001E S00976     A 39 281/285 232/238
  .     .    .    .      .       .  .    .       .  
  .     .    .    .      .       .  .    .       .  
  .     .    .    .      .       .  .    .       .  

The data --family structure, phenotype, and genotype-- are now in a single file. Since the merge command has already taken care of creating the accompanying .mfh header file, we can now open the data using open.
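In database terms, the merge step resembles a join on the family and individual identifiers. The following Python sketch shows that idea for two small record lists. It assumes that every individual in the family structure table is kept and that individuals without a genotype row receive "." for each marker; the exact rules Madeline applies, including the meaning of "in physical order", are not covered here. The helper name merge_tables and the U0001C record are illustrative.

```python
def merge_tables(family_rows, genotype_rows, key=("FAMID", "STUDYID")):
    """Attach genotype columns to family rows; missing genotypes become '.'."""
    genotypes_by_key = {tuple(row[k] for k in key): row for row in genotype_rows}
    extra_fields = [f for f in genotype_rows[0] if f not in key] if genotype_rows else []
    merged = []
    for row in family_rows:
        match = genotypes_by_key.get(tuple(row[k] for k in key), {})
        combined = dict(row)
        for field in extra_fields:
            combined[field] = match.get(field, ".")
        merged.append(combined)
    return merged

family_rows = [
    {"FAMID": "F0099", "STUDYID": "S00925", "SEX": "M", "AFFECTED": "A"},
    # Illustrative unsampled individual with no row in the genotype table.
    {"FAMID": "F0099", "STUDYID": "U0001C", "SEX": "M", "AFFECTED": "."},
]
genotype_rows = [
    {"FAMID": "F0099", "STUDYID": "S00925", "D10S1652": "293/297", "D10S1780": "232/238"},
]

merged = merge_tables(family_rows, genotype_rows)
for row in merged:
    print(row)
```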

Preparing the Data: Complete Data

Since the merged.sample file that we prepared above contains only two markers, we are now going to switch and use the file complete.data in the CompleteData subdirectory for the rest of the tutorial. The necessary pre-processing steps of composition and merging have already been completed for you, and the complete.data file contains data on all chromosome 10 and chromosome 20 markers. You will only need to run the recognize command since the binary .mfh file has not been provided:

M>recognize 'complete.data'
Starting to recognize file "complete.data" to "complete.data.mfh" ...
...
Binary recognition header file ("complete.data.mfh") written.
This appears to be a PEDIGREE TABLE which can be opened using:

        open "complete.data.mfh"

M>open 'complete.data.mfh'
  8. AFFECTED has 3 levels.
Calculating allele frequencies for  10. D10S1652...
Calculating allele frequencies for  11. D10S1780...
Calculating allele frequencies for  12. D10S1790...
...
Calculating allele frequencies for  61. D20S95...
Calculating allele frequencies for  62. D20S98...
Pedigree table "complete.data.mfh" opened with        38 records
NOTE: Pedigree F0099 has 1 unconnected individual.
Pedigrees reconstructed in 0.0100 seconds
Checking simple Mendelian inheritance in nuclear families... :
==============================================================
Inheritance inconsistency:      PEDIGREE        MOTHER  FATHER  MARKER
--------------------------      --------        ------  ------  ------
INHERITANCE #0001:      F0099   S00926  U0001G  D10S1652
==============================================================

================================================
Summary of Mendelian Inheritance Inconsistencies
                   by Marker
================================================
 #      MARKERNAME                              NUCLEAR FAMILIES
----    --------------------------------        ----------------
 10.    D10S1652                1
------------------------------------------------
Inconsistencies present among 1 of 53 markers.
================================================

  1.FAMID      Co__1   22.D10S220    Go_13   43.D20S171    Go_34
  2.STUDYID    Co__2   23.D10S249    Go_14   44.D20S173    Go_35
  3.SEX        Co__3   24.D10S537    Go_15   45.D20S178    Go_36
  4.FATHER     Co__4   25.D10S539    Go_16   46.D20S186    Go_37
  5.MOTHER     Co__5   26.D10S547    Go_17   47.D20S189    Go_38
  6.MZTWIN     Co__6   27.D10S548    Go_18   48.D20S193    Go_39
  7.DZTWIN     Co__7   28.D10S561    Go_19   49.D20S195    Go_40
  8.AFFECTED   Co__8+  29.D10S567    Go_20   50.D20S196    Go_41
  9.AGEDX      Po__1   30.D10S578    Go_21   51.D20S199    Go_42
 10.D10S1652   Go__1   31.D10S581    Go_22   52.D20S482    Go_43
 11.D10S1780   Go__2   32.D10S583    Go_23   53.D20S604    Go_44
 12.D10S1790   Go__3   33.D10S587    Go_24   54.D20S66     Go_45
 13.D10S189    Go__4   34.D10S591    Go_25   55.D20S842    Go_46
 14.D10S190    Go__5   35.D10S597    Go_26   56.D20S852    Go_47
 15.D10S191    Go__6   36.D20S100    Go_27   57.D20S875    Go_48
 16.D10S192    Go__7   37.D20S104    Go_28   58.D20S889    Go_49
 17.D10S197    Go__8   38.D20S107    Go_29   59.D20S910    Go_50
 18.D10S208    Go__9   39.D20S115    Go_30   60.D20S912    Go_51
 19.D10S212    Go_10   40.D20S117    Go_31   61.D20S95     Go_52
 20.D10S213    Go_11   41.D20S118    Go_32   62.D20S98     Go_53
 21.D10S217    Go_12   42.D20S119    Go_33
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................          1         0         1
Individuals .................         38         0        38
 + In database ..............         38         0        38
 |  + Attached ..............         37         0        37
 |  + Childless spouses .....          0         0         0
 |  + Unattached ............          1         0         1
 + Not in database ..........          0         0         0
1 INHERITANCE INCONSISTENCY M>

Let's take a look at what's going on here. When we opened the file above, Madeline first calculated allele frequencies using gene counting, without taking family relationships into account. Madeline then examined the file for various types of errors and warning conditions, including simple Mendelian inheritance errors on autosomal markers. As seen above, Madeline discovered one Mendelian inheritance error and reported it in red. The program prompt has consequently changed to reflect the number of warning and error conditions encountered:

1 INHERITANCE INCONSISTENCY M>

These warning and error conditions are also summarized in the log file, madeline.err. In addition to the error, there is a note about an unattached individual.
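Gene counting that ignores family relationships amounts to tallying every observed allele and dividing by the total. Here is a minimal Python sketch of that calculation, applied to the eight D10S1780 genotypes shown in the merged.sample excerpt earlier. It illustrates the general technique only; Madeline's exact handling of missing or partially typed genotypes is not described in this tutorial, and the function name is ours.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies by simple gene counting.

    genotypes: iterable of "a/b" strings, with "." for a missing genotype.
    Every typed individual contributes two alleles; relatedness is ignored.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype == ".":
            continue
        counts.update(genotype.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

d10s1780 = ["232/238", "232/238", "232/232", "232/236",
            "232/232", "232/238", "232/232", "232/238"]
frequencies = allele_frequencies(d10s1780)
print(frequencies)
```

Because relatives share alleles, frequencies estimated this way from a single large pedigree can differ from population frequencies.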

Preparing the Data: Mendelian Inheritance Errors

The first order of business is to examine the inheritance error. Fortunately, Madeline makes this task easy. In order to examine the problem, we first toggle off all markers, and then toggle back on only the ones with Mendelian inheritance issues. Here are the commands:

1 INHERITANCE INCONSISTENCY M>toggle off output flags for 10-62
1 INHERITANCE INCONSISTENCY M>toggle on output flags for _IsMendelianInconsistent
1 INHERITANCE INCONSISTENCY M>list fields
  1.FAMID      Co__1   22.D10S220    G       43.D20S171    G
  2.STUDYID    Co__2   23.D10S249    G       44.D20S173    G
  3.SEX        Co__3   24.D10S537    G       45.D20S178    G
  4.FATHER     Co__4   25.D10S539    G       46.D20S186    G
  5.MOTHER     Co__5   26.D10S547    G       47.D20S189    G
  6.MZTWIN     Co__6   27.D10S548    G       48.D20S193    G
  7.DZTWIN     Co__7   28.D10S561    G       49.D20S195    G
  8.AFFECTED   Co__8+  29.D10S567    G       50.D20S196    G
  9.AGEDX      Po__1   30.D10S578    G       51.D20S199    G
 10.D10S1652   Go__1   31.D10S581    G       52.D20S482    G
 11.D10S1780   G       32.D10S583    G       53.D20S604    G
 12.D10S1790   G       33.D10S587    G       54.D20S66     G
 13.D10S189    G       34.D10S591    G       55.D20S842    G
 14.D10S190    G       35.D10S597    G       56.D20S852    G
 15.D10S191    G       36.D20S100    G       57.D20S875    G
 16.D10S192    G       37.D20S104    G       58.D20S889    G
 17.D10S197    G       38.D20S107    G       59.D20S910    G
 18.D10S208    G       39.D20S115    G       60.D20S912    G
 19.D10S212    G       40.D20S117    G       61.D20S95     G
 20.D10S213    G       41.D20S118    G       62.D20S98     G
 21.D10S217    G       42.D20S119    G
1 INHERITANCE INCONSISTENCY M>

Now let's toggle off a few other unneeded fields and create a pedigree drawing. Notice how AFFECTED has a little plus sign, "+", next to it. This indicates that this field will be used to shade the circles and squares on the pedigree drawing. We can turn off the output flag for this field without affecting the icon flag (represented by the plus "+" sign):

1 INHERITANCE INCONSISTENCY M>toggle off output flags for 1,sex,father,mother,mztwin-affected

NOTE: Core fields will be included in output if required by a
specific "write" format regardless of toggle status.  The
"draw" command, in contrast, will respect the toggle settings
for core fields.
1 INHERITANCE INCONSISTENCY M>list fields
  1.FAMID      C       22.D10S220    G       43.D20S171    G
  2.STUDYID    Co__1   23.D10S249    G       44.D20S173    G
  3.SEX        C       24.D10S537    G       45.D20S178    G
  4.FATHER     C       25.D10S539    G       46.D20S186    G
  5.MOTHER     C       26.D10S547    G       47.D20S189    G
  6.MZTWIN     C       27.D10S548    G       48.D20S193    G
  7.DZTWIN     C       28.D10S561    G       49.D20S195    G
  8.AFFECTED   C    +  29.D10S567    G       50.D20S196    G
  9.AGEDX      Po__1   30.D10S578    G       51.D20S199    G
 10.D10S1652   Go__1   31.D10S581    G       52.D20S482    G
 11.D10S1780   G       32.D10S583    G       53.D20S604    G
 12.D10S1790   G       33.D10S587    G       54.D20S66     G
 13.D10S189    G       34.D10S591    G       55.D20S842    G
 14.D10S190    G       35.D10S597    G       56.D20S852    G
 15.D10S191    G       36.D20S100    G       57.D20S875    G
 16.D10S192    G       37.D20S104    G       58.D20S889    G
 17.D10S197    G       38.D20S107    G       59.D20S910    G
 18.D10S208    G       39.D20S115    G       60.D20S912    G
 19.D10S212    G       40.D20S117    G       61.D20S95     G
 20.D10S213    G       41.D20S118    G       62.D20S98     G
 21.D10S217    G       42.D20S119    G
1 INHERITANCE INCONSISTENCY M>draw pedigrees for _IsMendelianInconsistent
Drawing pedigree F0099, U0001B's subtree (subtree 1 of 1) ...
Printing drawing scaled to 0.79.


1 pedigree in result set.

1 INHERITANCE INCONSISTENCY M>

In this simple example, we have an inheritance problem on only one marker in a single pedigree. While not really necessary here, you can easily imagine the huge convenience that commands like "toggle on output flags for _IsMendelianInconsistent" and "draw pedigrees for _IsMendelianInconsistent" provide when dealing with larger data sets with inheritance issues on multiple markers in numerous families.

Here's the fragment of the pedigree drawing where the inheritance issue occurs. Note how Madeline automatically highlights Mendelian inconsistencies in red:

Fragment of a pedigree drawing highlighting Mendelian inheritance errors

Of course for real data, we might have to go back and examine our gels to determine if a mistake was made when calling alleles. In our example here, we'll just go back and look at our original decomposed database table in the DecomposedGenotypeData subdirectory where D10S1652 is shown with allele calls of 281 and 297 for individual S01057:

  .       .        .       .     .
  .       .        .       .     .
  .       .        .       .     .
F0099   S01055  D10S1780  232   240
F0099   S01056  D10S1652  293   299
F0099   S01056  D10S1780  238   240
F0099   S01057  D10S1652  281   297
F0099   S01057  D10S1780  232   232
F0099   S01058  D10S1652  295   297
F0099   S01058  D10S1780  236   238
F0099   S01059  D10S1652  289   293
  .       .        .       .     .
  .       .        .       .     .
  .       .        .       .     .

If we correct this in our data file, then it becomes clear that we will have a system of four alleles and the unsampled father U0001G must be a heterozygote with alleles 281 and 299:

Correcting Mendelian inheritance errors
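The rule behind a simple Mendelian check on one marker in a nuclear family can be sketched in a few lines of Python. This is only an illustration of the principle, not Madeline's implementation: the function name and the mother's genotype are hypothetical, and only the father's 281/299 genotype is taken from the text above.

```python
def is_mendelian_consistent(child, father, mother):
    """Return True if the child could have received one allele from each parent.

    Each genotype is a pair of allele sizes, e.g. (281, 299).
    """
    first, second = child
    return (first in father and second in mother) or (
        second in father and first in mother
    )


father = (281, 299)   # heterozygote inferred in the text above
mother = (293, 297)   # hypothetical genotype, for illustration only

print(is_mendelian_consistent((281, 297), father, mother))  # one allele from each parent
print(is_mendelian_consistent((295, 297), father, mother))  # 295 is carried by neither parent
```

A check of this kind only catches genotypes that are impossible given the parents. It says nothing about genotypes that are possible but unlikely, which is why the tools mentioned below remain useful.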

Madeline is extremely useful for detecting and cleaning up many kinds of errors that can occur while preparing data sets for analysis. If this were a real project, we would want to use a program like PedCheck or Merlin to look for additional types of Mendelian inheritance errors, such as unlikely double crossovers, that Madeline does not detect. Naturally Madeline's write command can provide you with the file formats required by these programs.

Examining an Unattached Individual

Now let's go back and examine the unattached individual:

M>view data SEX, AFFECTED, AGEDX, _PercentGenotyped for _IsUnattached

F0099   S01077  F	U       .       0

1 individual in 1 pedigree matched as follows:

Individuals ..............          1
 + In database ...........          1
 |  + Attached ...........          0
 |  + Childless spouses ..          0
 |  + Unattached .........          1
 + Not in database .......          0
M>

She is unaffected and we have no genotype data on her. Here's a similar command that would reveal the same information, albeit less concisely:

M>view record for _IsUnattached
S01077 in F0099 (rec. no.    24) * unconnected *
CORE FIELDS:
F0099 S01077 F ...... ...... . . U
PHENOTYPE FIELDS:
..
GENOTYPE FIELDS:
....... ....... ....... ....... ....... ....... ....... ....... .......
....... ....... ....... ....... ....... ....... ....... ....... .......
....... ....... ....... ....... ....... ....... ....... ....... .......
....... ....... ....... ....... ....... ....... ....... ....... .......
....... ....... ....... ....... ....... ....... ....... ....... .......
....... ....... ..... ....... ..... ....... ....... .......

1 individual in 1 pedigree matched as follows:

Individuals ..............          1
 + In database ...........          1
 |  + Attached ...........          0
 |  + Childless spouses ..          0
 |  + Unattached .........          1
 + Not in database .......          0
M>

Since we have no genotype data on her, we will ignore her. We will want to verify that she is not included in output files for analysis.

Creating Files for Analysis

Now that our pedigree data are to the best of our knowledge clean, we are ready to create files for analysis.

First we load the genetic maps table that we prepared earlier:

M>load 'chr10.20.map.mfh'
Marker maps based on chr10.20.map.mfh are now installed.
M>

If you have already opened the corrected pedigree data file, you will see a notice about how the genotype fields are now ordered according to this map. If not, go ahead and open the pedigree file now:

M>open 'complete.data.mfh'
  8. AFFECTED has 3 levels.
Calculating allele frequencies for  10. D10S1652...
...
Calculating allele frequencies for  62. D20S98...
Pedigree table "complete.data.mfh" opened with        38 records
NOTE: Pedigree F0099 has 1 unconnected individual.
Genotype fields now ordered according to current map
Pedigrees reconstructed in 0.0000 seconds
Checking simple Mendelian inheritance in nuclear families... :
==============================================================
Inheritance inconsistency:      PEDIGREE        MOTHER  FATHER  MARKER
--------------------------      --------        ------  ------  ------
==============================================================

================================================
Summary of Mendelian Inheritance Inconsistencies
                   by Marker
================================================
 #      CHR.    RANK    MARKERNAME                              NUCLEAR FAMILIES
----    ----    ----    --------------------------------        ----------------
------------------------------------------------
Inconsistencies present among 0 of 53 markers.
================================================

  1.FAMID      Co__1   22.D10S220    Go_12   43.D20S171    Go_52
  2.STUDYID    Co__2   23.D10S249    Go__1   44.D20S173    Go_53
  3.SEX        Co__3   24.D10S537    Go_19   45.D20S178    Go_49
  4.FATHER     Co__4   25.D10S539    Go_14   46.D20S186    Go_36
  5.MOTHER     Co__5   26.D10S547    Go__4   47.D20S189    Go_35
  6.MZTWIN     Co__6   27.D10S548    Go__6   48.D20S193    Go_30
  7.DZTWIN     Co__7   28.D10S561    Go_16   49.D20S195    Go_46
  8.AFFECTED   Co__8+  29.D10S567    Go_13   50.D20S196    Go_50
  9.AGEDX      Po__1   30.D10S578    Go_11   51.D20S199    Go_28
 10.D10S1652   Go_17   31.D10S581    Go_18   52.D20S482    Go_32
 11.D10S1780   Go_10   32.D10S583    Go_20   53.D20S604    Go_37
 12.D10S1790   Go_15   33.D10S587    Go_24   54.D20S66     Go_38
 13.D10S189    Go__3   34.D10S591    Go__2   55.D20S842    Go_29
 14.D10S190    Go_23   35.D10S597    Go_22   56.D20S852    Go_40
 15.D10S191    Go__5   36.D20S100    Go_51   57.D20S875    Go_43
 16.D10S192    Go_21   37.D20S104    Go_41   58.D20S889    Go_31
 17.D10S197    Go__7   38.D20S107    Go_47   59.D20S910    Go_39
 18.D10S208    Go__9   39.D20S115    Go_34   60.D20S912    Go_45
 19.D10S212    Go_26   40.D20S117    Go_27   61.D20S95     Go_33
 20.D10S213    Go__8   41.D20S118    Go_44   62.D20S98     Go_42
 21.D10S217    Go_25   42.D20S119    Go_48
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................          1         0         1
Individuals .................         38         0        38
 + In database ..............         38         0        38
 |  + Attached ..............         37         0        37
 |  + Childless spouses .....          0         0         0
 |  + Unattached ............          1         0         1
 + Not in database ..........          0         0         0
M>

Notice how the genotype "G" fields have now been renumbered according to the genetic maps.

Note: If you already had the pedigree table open when you issued the load command, you can use list fields to display the list of fields.

Since we are not going to use age of diagnosis in our analysis, we want to toggle it off. Note that we can use the toggle command without specifying off or on. When we do this, fields that are "on" get turned "off", and vice versa. Also note that we don't really have to capitalize the field names in a command because the interpreter is not sensitive to capitalization:

M>toggle output flag for agedx
Genotype fields now ordered according to current map
M>list fields
  1.FAMID      Co__1   22.D10S220    Go_12   43.D20S171    Go_52
  2.STUDYID    Co__2   23.D10S249    Go__1   44.D20S173    Go_53
  3.SEX        Co__3   24.D10S537    Go_19   45.D20S178    Go_49
  4.FATHER     Co__4   25.D10S539    Go_14   46.D20S186    Go_36
  5.MOTHER     Co__5   26.D10S547    Go__4   47.D20S189    Go_35
  6.MZTWIN     Co__6   27.D10S548    Go__6   48.D20S193    Go_30
  7.DZTWIN     Co__7   28.D10S561    Go_16   49.D20S195    Go_46
  8.AFFECTED   Co__8+  29.D10S567    Go_13   50.D20S196    Go_50
  9.AGEDX      P       30.D10S578    Go_11   51.D20S199    Go_28
 10.D10S1652   Go_17   31.D10S581    Go_18   52.D20S482    Go_32
 11.D10S1780   Go_10   32.D10S583    Go_20   53.D20S604    Go_37
 12.D10S1790   Go_15   33.D10S587    Go_24   54.D20S66     Go_38
 13.D10S189    Go__3   34.D10S591    Go__2   55.D20S842    Go_29
 14.D10S190    Go_23   35.D10S597    Go_22   56.D20S852    Go_40
 15.D10S191    Go__5   36.D20S100    Go_51   57.D20S875    Go_43
 16.D10S192    Go_21   37.D20S104    Go_41   58.D20S889    Go_31
 17.D10S197    Go__7   38.D20S107    Go_47   59.D20S910    Go_39
 18.D10S208    Go__9   39.D20S115    Go_34   60.D20S912    Go_45
 19.D10S212    Go_26   40.D20S117    Go_27   61.D20S95     Go_33
 20.D10S213    Go__8   41.D20S118    Go_44   62.D20S98     Go_42
 21.D10S217    Go_25   42.D20S119    Go_48
M>

We are going to create files for analyzing chromosome 10, so let's turn off all chromosome 20 markers:

M>toggle off output flags for chromosome 20
Genotype fields now ordered according to current map
M>list fields
  1.FAMID      Co__1   22.D10S220    Go_12   43.D20S171    G
  2.STUDYID    Co__2   23.D10S249    Go__1   44.D20S173    G
  3.SEX        Co__3   24.D10S537    Go_19   45.D20S178    G
  4.FATHER     Co__4   25.D10S539    Go_14   46.D20S186    G
  5.MOTHER     Co__5   26.D10S547    Go__4   47.D20S189    G
  6.MZTWIN     Co__6   27.D10S548    Go__6   48.D20S193    G
  7.DZTWIN     Co__7   28.D10S561    Go_16   49.D20S195    G
  8.AFFECTED   Co__8+  29.D10S567    Go_13   50.D20S196    G
  9.AGEDX      P       30.D10S578    Go_11   51.D20S199    G
 10.D10S1652   Go_17   31.D10S581    Go_18   52.D20S482    G
 11.D10S1780   Go_10   32.D10S583    Go_20   53.D20S604    G
 12.D10S1790   Go_15   33.D10S587    Go_24   54.D20S66     G
 13.D10S189    Go__3   34.D10S591    Go__2   55.D20S842    G
 14.D10S190    Go_23   35.D10S597    Go_22   56.D20S852    G
 15.D10S191    Go__5   36.D20S100    G       57.D20S875    G
 16.D10S192    Go_21   37.D20S104    G       58.D20S889    G
 17.D10S197    Go__7   38.D20S107    G       59.D20S910    G
 18.D10S208    Go__9   39.D20S115    G       60.D20S912    G
 19.D10S212    Go_26   40.D20S117    G       61.D20S95     G
 20.D10S213    Go__8   41.D20S118    G       62.D20S98     G
 21.D10S217    Go_25   42.D20S119    G
M>

As a first pass, it makes sense to do a non-parametric multipoint analysis so that we don't have to make any assumptions about the mode of inheritance. There are a number of programs we could use: Genehunter, Allegro, Merlin, and Simwalk2 come immediately to mind. Since we have an extended pedigree, we should probably investigate its complexity before making a decision. Recall that the complexity of a pedigree is simply 2n-f where n=non-founders and f=founders.
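As a quick check of the 2n-f formula, here is a minimal Python sketch. The function names and the four-person example pedigree are hypothetical; the "." missing-value symbol follows the convention used in the data listings in this tutorial, and a founder is assumed to be anyone with both parents missing.

```python
def pedigree_complexity(non_founders, founders):
    """Complexity of a pedigree, defined as 2n - f."""
    return 2 * non_founders - founders


def count_founders(records):
    """Count founders and non-founders from (id, father, mother) records.

    An individual is treated as a founder when both parents are missing (".").
    """
    founders = sum(1 for _, father, mother in records
                   if father == "." and mother == ".")
    return founders, len(records) - founders


# Hypothetical nuclear family: two founders and two children.
family = [
    ("P1", ".", "."),
    ("P2", ".", "."),
    ("C1", "P1", "P2"),
    ("C2", "P1", "P2"),
]
f, n = count_founders(family)
print(pedigree_complexity(n, f))    # 2*2 - 2 = 2

# A pedigree with 27 non-founders and 11 founders:
print(pedigree_complexity(27, 11))  # 2*27 - 11 = 43
```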

Investigating Pedigree Complexity

Madeline's query functions operate at the level of the individual. Nevertheless, Madeline provides a number of attributes, such as _NumberInPedigree, _NumberOfFounders, and _Complexity, that tell you about the pedigree as a whole:

M>view data _NumberInPedigree, _NumberOfFounders, _NumberOfNonFounders, _Complexity for _Complexity>=20

F0099   S00925  38      11      27      43
F0099   S00926  38      11      27      43
F0099   S00951  38      11      27      43
F0099   S00973  38      11      27      43
  .       .      .       .       .       .
  .       .      .       .       .       .
  .       .      .       .       .       .
F0099   U0001I  38      11      27      43
F0099   U0001J  38      11      27      43
F0099   U0001K  38      11      27      43

38 individuals in 1 pedigree matched as follows:

Individuals ..............         38
 + In database ...........         38
 |  + Attached ...........         37
 |  + Childless spouses ..          0
 |  + Unattached .........          1
 + Not in database .......          0
M>

Of course here we have but one pedigree, so everyone in our data set matched the query criteria! If we had a column for the proband, PROBAND, then the following query would be preferred, since it would return just one row for each matching pedigree and we would be able to instantly identify complex pedigrees across large data sets:

M>view data _Complexity for _IsProband and _Complexity>=20

Creating Files for Simwalk2

Since the complexity of this pedigree is high, we are going to analyse it using Simwalk2. For the Simwalk format, Madeline requires that we execute a write locus file ... command separately from the write pedigree file ... command. Here is the first command:

M>write locus file to 'chr10.loc' in simwalk format
Locus file "chr10.loc" has been written.

And here is the second command. Since Simwalk2 is very finicky about file names, we try to keep the names very short:

M>write pedigree file to 'ped.dat' in simwalk format
NOTE: Simwalk batch file, "BATCH2.DAT.ped.dat", has been created.
      Edit this file to change the parameters of your analysis.

NOTE: Simwalk map file "ped.dat.map" has been created.
Writing pedigree data to "ped.dat"
Individual U0001A in pedigree F0099 has missing values for all genotype variables
Individual U0001B in pedigree F0099 has missing values for all genotype variables
Individual U0001C in pedigree F0099 has missing values for all genotype variables
Individual U0001D in pedigree F0099 has missing values for all genotype variables
Individual U0001E in pedigree F0099 has missing values for all genotype variables
Individual U0001F in pedigree F0099 has missing values for all genotype variables
Individual U0001G in pedigree F0099 has missing values for all genotype variables
Individual U0001H in pedigree F0099 has missing values for all genotype variables
Individual U0001I in pedigree F0099 has missing values for all genotype variables
Individual U0001J in pedigree F0099 has missing values for all genotype variables
Individual U0001K in pedigree F0099 has missing values for all genotype variables
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................          1         0         1
Individuals .................         37         1        38
 + In database ..............         37         1        38
 |  + Attached ..............         37         0        37
 |  |  + With data ..........         26         0        26
 |  |  + Without data .......         11         0        11
 |  |  + Marked for exclusion          0         0         0
 |  + Childless spouses .....          0         0         0
 |  + Unattached ............          0         1         1
 + Not in database ..........          0         0         0
M>

From the summary table it is evident that the unattached individual was not included. The unsampled individuals who are required to maintain the pedigree structure were included even though they have no genotype data.

Running Simwalk

Madeline creates files for Simwalk version 2. We used version 2.83 to prepare this tutorial. As Madeline warns (in the NOTE lines of the output above), we need to manually edit the control file, BATCH2.DAT. Note that by default Madeline assumes you want to do a non-parametric analysis (batch item #01). At a minimum, we need to tell Simwalk the name of the locus file (batch item #10). It might also be a good idea to change the title to something more informative (batch item #03). Other than these, it looks like Madeline got the rest correct and the defaults are fine:

01                       ! batch item number
3                        ! analysis: 1=Haplotype; 2=LOD; 3=NPL; 4=IBD 5=Mistyping

02                       ! batch item number
33                       ! integer label for this run of the program

03                       ! batch item number
PUT YOUR ANALYSIS TITLE HERE

09                       ! batch item number
ped.dat.map              ! name of map file

10                       ! batch item number
chr10.loc                ! name of locus file

11                       ! batch item number
ped.dat                  ! name of pedigree file

12                       ! batch item number
F                        ! symbol for female (case insensitive)
M                        ! symbol for   male (case insensitive)

13                       ! batch item number
Y                        ! is trait listed in locus and pedigree files?

16                       ! label for affected individuals
A                        ! must match LOCUS and PEDIGREE files

18                       ! batch item number
0                        ! number of quantitative variables in pedigree file

48                       ! batch item number
10000                    ! number of unconditional simulations for p-values
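If you want to double-check a control file after editing it, a short script can read the items back. The Python sketch below assumes only the layout visible in the listing above: blocks separated by blank lines, an item number on the first line of each block, and "!" starting a comment. It is not based on the Simwalk2 documentation, and the function name is hypothetical.

```python
import re

# Three items copied from the control file listing above.
BATCH_EXCERPT = """\
03                       ! batch item number
PUT YOUR ANALYSIS TITLE HERE

10                       ! batch item number
chr10.loc                ! name of locus file

12                       ! batch item number
F                        ! symbol for female (case insensitive)
M                        ! symbol for   male (case insensitive)
"""


def parse_batch_items(text):
    """Map each batch item number to its list of values, comments removed."""
    items = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        number = lines[0].split("!")[0].strip()
        items[number] = [line.split("!")[0].strip() for line in lines[1:]]
    return items


items = parse_batch_items(BATCH_EXCERPT)
print(items["10"])  # name of the locus file
print(items["03"])  # analysis title, still the placeholder here
```

Because everything after a "!" is treated as a comment, a title containing "!" would be cut short by this sketch.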

Note that BATCH2.DAT is actually a symbolic link to a uniquely-named file (BATCH2.DAT.ped.dat in this example). This is done so that you can use Madeline to produce multiple Simwalk files (for different chromosomes, for example) in one directory without repeatedly clobbering the control file. The symbolic link will always point to the latest control file that you created.
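The general pattern described here, a fixed name that always points at the most recently written uniquely-named file, can be sketched in Python as follows. This illustrates the idea only and is not how Madeline itself is implemented; the pedigree file names chr10.dat and chr20.dat and the function names are hypothetical, and the sketch assumes a UNIX/Linux file system.

```python
import os
import tempfile


def repoint_symlink(link_path, target_name):
    """Make link_path a symbolic link to target_name, replacing any older link."""
    temporary_link = link_path + ".tmp"
    if os.path.lexists(temporary_link):
        os.remove(temporary_link)
    os.symlink(target_name, temporary_link)
    os.replace(temporary_link, link_path)  # rename over the old link


def demo():
    """Write two control files in turn and report where BATCH2.DAT points."""
    with tempfile.TemporaryDirectory() as workdir:
        link = os.path.join(workdir, "BATCH2.DAT")
        for pedigree_file in ("chr10.dat", "chr20.dat"):
            control_name = "BATCH2.DAT." + pedigree_file
            with open(os.path.join(workdir, control_name), "w") as handle:
                handle.write("! control file for " + pedigree_file + "\n")
            repoint_symlink(link, control_name)
        return os.readlink(link), sorted(os.listdir(workdir))


target, names = demo()
print(target)  # the control file written last
print(names)   # both control files survive alongside the link
```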

Once you have edited the control file, start Simwalk by typing simwalk2 at your terminal's command prompt from within the directory containing your Simwalk files:

%> simwalk2

Because of the complexity of the pedigree, expect to wait an hour or two before Simwalk is done.

Examining the Results

With Madeline, you can quickly and easily obtain professional, publication-ready plots of analysis results.

Converting Simwalk Results to Madeline Format

Simwalk2 produces a number of brilliantly-named result files. The file called STATS-33.ALL contains the overall results, while the files named STATS-33.001, STATS-33.002, STATS-33.003, and so on contain the individual results for each pedigree. In our example data there is only one pedigree, so we only have STATS-33.001 and STATS-33.ALL, which in this case provide the same results. (For the impatient, we have provided the Simwalk2 results in the SimwalkResults subdirectory.) If you look at the result files, you'll see that we have some pretty juicy results, so we definitely want to plot them.

In the case of Simwalk2, Madeline provides a streamlined way to convert the results files to Madeline format. This works for both parametric and non-parametric Simwalk2 results:

M>convert simwalk file 'STATS-33.ALL' to 'chr10.results'
Converting input file "STATS-33.ALL"
to Madeline-formatted output files,
"chr10.results" and "chr10.results.mfh" ...
================
Converting file
================
This appears to be an NPL analysis of chromosome 10 ...
Graphing files "chr10.results" and "chr10.results.map" created ...
=================
Recognizing files
=================
HEADER block spans lines 1 to 7.
DATA block spans lines 9 to 34.
Skipping a total of 8 lines at top.
There are 7 non-empty header lines and 26 data lines.
Data records are 68 bytes long.

 # . Field Name  Start End   Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. MARKERNAME      1    10    10     0     5 C
  2. POSITION       16    23     8     4     4 N
  3. STAT_A         28    32     5     3     4 N
  4. STAT_B         37    41     5     3     4 N
  5. STAT_C         46    50     5     3     4 N
  6. STAT_D         55    59     5     3     4 N
  7. STAT_E         64    68     5     3     0 N
Binary recognition header file ("chr10.results.mfh") written.
This appears to be an ANALYSIS RESULTS TABLE which can be opened using:

        graph open "chr10.results.mfh"

--------------------
Associated map file:
--------------------
HEADER block spans lines 1 to 4.
DATA block spans lines 6 to 31.
Skipping a total of 5 lines at top.
There are 4 non-empty header lines and 26 data lines.
Data records are 31 bytes long.

 # . Field Name  Start End   Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. CHROMOSOME      1     2     2     0     2 N
  2. ORDINAL         5     6     2     0     4 N
  3. MARKERNAME     11    18     8     0     5 C
  4. POSITION       24    31     8     4     0 N
Binary recognition header file ("chr10.results.map.mfh") written.
This appears to be a MAP TABLE which can be opened using:

        load "chr10.results.map.mfh"

M>

Notice that Madeline creates both an analysis results table and an accompanying map table based on the Haldane map provided in the Simwalk results. Simwalk provides five different statistics called STAT A, B, C, D and E. We are now ready to graph the results.
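If you ever need to read the converted results table outside of Madeline, the field layout reported above (1-based, inclusive Start and End columns in 68-byte records) is enough to slice each record. The following Python sketch assumes that layout; the function name is hypothetical and the sample record is built from invented numbers purely to exercise the code, not taken from the actual results.

```python
# (field name, start column, end column), as reported by the convert command.
FIELDS = [
    ("MARKERNAME", 1, 10),
    ("POSITION", 16, 23),
    ("STAT_A", 28, 32),
    ("STAT_B", 37, 41),
    ("STAT_C", 46, 50),
    ("STAT_D", 55, 59),
    ("STAT_E", 64, 68),
]


def parse_record(line):
    """Slice one fixed-width record into a dict; numeric fields become floats."""
    record = {}
    for name, start, end in FIELDS:
        text = line[start - 1:end].strip()
        record[name] = text if name == "MARKERNAME" else float(text)
    return record


# Invented sample record, padded to match the 68-byte layout.
sample = (
    "D10S249".ljust(10) + " " * 5
    + "0.0000".rjust(8) + " " * 4
    + "0.112".rjust(5) + " " * 4
    + "0.205".rjust(5) + " " * 4
    + "0.318".rjust(5) + " " * 4
    + "0.401".rjust(5) + " " * 4
    + "0.527".rjust(5)
)

print(len(sample))
print(parse_record(sample))
```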

Graphing the Results

For your convenience, in the SimwalkResults subdirectory we have also included the chr10.results and chr10.results.map files created in the previous step.

We can now use the various graph command options to plot the results. Reading from the Simwalk results file or documentation, we see that statistic A is most powerful at detecting linkage to a recessive trait, statistic B is most powerful at detecting linkage to a dominant trait, and statistics C, D and E are more general statistics indicating whether a few founder-alleles are overly represented among the affecteds. Statistic E is the NPL_all statistic from Genehunter and is the one we will plot in this example.

Plotting a graph requires a table with a GraphPositionField for the horizontal axis, and a GraphScoreField for the LOD scores on the vertical axis. Since Madeline assigns a label of "SCORE" to the GraphScoreField by default, this is the first thing we change:

M>?GraphScoreField
"SCORE"
M>GraphScoreField="STAT_E"
M>

We now use the graph load, open, and plot commands:

M>graph load 'chr10.results.map.mfh'
Marker maps based on chr10.results.map.mfh are now installed.
M>graph open 'chr10.results.mfh'
Low=0.00 High=186.20 Range=186.20 Magnitude=1
Stt=0.00 End =190.00 NewRange=190.00
TickBasis=10.00
i=2 MajorTick is now 20.00
i=9 Adj. end: 190.00 rem=10
Low=0.11 High=4.00 Range=3.89 Magnitude=-1
Stt=0.10 End =4.00 NewRange=3.90
TickBasis=0.10
i=0 MajorTick is now 1.00
i=9 Adj. stt: 0.10 rem=0.1
M>graph plot
Graph printed to "madeline.graph.ps"
Calling external viewer using the command "gv madeline.graph.ps" ...
M>

Here is our initial plot:

Initial plot of chromosome 10 NPL results

Notice that simply loading a genetic map using graph load or load (these two commands are equivalent) is all you have to do to get Madeline to place the "raining down" marker names on the graph.

Now let's polish the plot a little. We need to add some white space at the top so that the "raining down" marker names don't intersect with the highest peak. It also wouldn't be a bad idea to provide a more informative title, change the vertical axis label, and end the horizontal axis at 190 instead of 200 to make the graph look more centered.

Let's look up all the variables and commands related to graphing, so we won't get lost or mistype anything:

M>lookup 'graph'
graph is a command.
GraphAnnotations is an associative array. It accepts character string keys and maps them to character string values.
GraphDrawingFile is an internal variable. Its current value is "madeline.graph.ps".
GraphPositionField is an internal variable. Its current value is "POSITION".
GraphScoreField is an internal variable. Its current value is "STAT_E".
GraphTitle is an internal variable. Its current value is "Multipoint Analysis".
GraphXAxisLabel is an internal variable. Its current value is "Map Position (cM)".
GraphXAxisMajorTick is an internal variable. Its current value is 20.000.
GraphXAxisMaximum is an internal variable. Its current value is 200.000.
GraphXAxisMinimum is an internal variable. Its current value is 0.000.
GraphXAxisMinorTick is an internal variable. Its current value is 10.000.
GraphYAxisLabel is an internal variable. Its current value is "LOD Score".
GraphYAxisMajorTick is an internal variable. Its current value is 1.000.
GraphYAxisMaximum is an internal variable. Its current value is 4.000.
GraphYAxisMinimum is an internal variable. Its current value is 0.000.
GraphYAxisMinorTick is an internal variable. Its current value is 0.500.
M>

So here we go:

M>GraphDrawingFile="chr10.graph.ps"
M>GraphTitle="Chromosome 10 NPL Analysis"
M>GraphYAxisLabel="NPL_all Statistic"
M>GraphXAxisMaximum-=10
M>GraphYAxisMaximum+=1.5
M>

Changing the name of the graph file (first assignment statement above) is optional. Notice the last two assignments. If you are familiar with C, C++, Perl or other programming languages, you may well have seen assignment statements of this form before. It would be just fine to do this:

M>GraphXAxisMaximum=190

... or, if you like typing a lot, even this:

M>GraphXAxisMaximum=GraphXAxisMaximum-10

... but Madeline also provides the "+=" and "-=" (and also "*=" and "/=") assignment operators for expression brevity.
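For readers who have not met these operators before, the same arithmetic in Python looks like this. The variable names are hypothetical Python stand-ins for Madeline's internal variables, initialized with the values reported by the lookup command above.

```python
# Starting values as reported by "lookup 'graph'".
graph_x_axis_maximum = 200.0
graph_y_axis_maximum = 4.0

graph_x_axis_maximum -= 10    # same as: graph_x_axis_maximum = graph_x_axis_maximum - 10
graph_y_axis_maximum += 1.5   # same as: graph_y_axis_maximum = graph_y_axis_maximum + 1.5

print(graph_x_axis_maximum)   # 190.0
print(graph_y_axis_maximum)   # 5.5
```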

Here then is our revised plot:

Revised plot of chromosome 10 NPL results

There are a number of ways to embellish a plot using other options of the graph command. Refer to the documentation for more information.

We hope you have enjoyed using Madeline in this tutorial. An analysis of the chromosome 20 data is left as an exercise. Enjoy!


2004.07.15.ET. End of document.