HL7 V3 CG GEXPR, R1 HL7 Version 3 Standard: Clinical Genomics; Gene Expression, Release 1 Informative Ballot 1 - May 2012
HL7 V3DAM GEXPR, R1 HL7 Version 3 Domain Analysis Model: Clinical Genomics; Gene Expression, Release 1 Last Ballot: Informative Ballot 2 - May 2010
ANSI/HL7 V3 CGPED, R1-2007 HL7 Version 3 Standard: Clinical Genomics; Pedigree, Release 1 7/5/2007
| Responsible Group | Clinical Genomics Work Group HL7 |
| Co-Chair, Facilitator & Primary Contributor | Amnon Shabo (Shvo), Ph.D. shabo@il.ibm.com IBM Research Lab in Haifa |
| Co-Chair | Kevin S. Hughes, M.D., FACS KSHUGHES@PARTNERS.ORG Massachusetts General Hospital, Partners HealthCare |
| Co-Chair | Mollie Ullman-Cullere MULLMANCULLERE@PARTNERS.ORG Harvard-Partners Center for Genetics and Genomics |
| Co-Chair and RCRIM liaison | Phil Pochon Phil.Pochon@covance.com Covance |
| Past Co-Chair | Scott Whyte scott.whyte@chw.edu Catholic Healthcare West |
| Vocabulary Facilitator | Joyce Hernandez joyce_hernandez@merck.com Merck Research Laboratories |
| Publishing Facilitator | Grant Wood grant.wood@imail.org Intermountain Healthcare Clinical Genetics Institute |
| Meeting Minutes | Marta Jaremek Marta.Jaremek@siemens.com Siemens AG |
| Project Management Facilitator | Rick Haddorff Haddorff.Richard@Mayo.edu Mayo Clinic |
| Vocabulary contributor | Yan Heras Yan.Heras@imail.org Intermountain Healthcare |
| Contributor | Scott Bolte Scott.Bolte@med.ge.com GE Healthcare |
| Implementation guide editor | Joann Larsen Joann.Larsen@kp.org Kaiser Permanente |
| Modeling and Vocabulary Contributor | Perry Mar PMAR@PARTNERS.ORG Partners HealthCare - Clinical Informatics Research and Development |
| Modeling Contributor | Allen Hobbs allen.hobbs@kp.org Kaiser Permanente |
| Modeling Contributor | Michael Miller Rosetta Biosoftware |
| Contributor | Mark Sires msires@gte.net Sires Consulting |
| Modeling & Vocabulary Contributor | Shosh Israel, Ph.D. israels@hadassah.org.il Hadassah University Hospital |
| Modeling & Vocabulary Contributor | Chuanbo Xu Chuanbo_Xu@perlegen.com Perlegen |
| MnM Facilitator (retired) | Charlie Mead, MD MSc Booz Allen Hamilton |
| Storyboard Contributor | Pravat Das Mayo Clinic |
| Storyboard Contributor | Jennifer Fostel, Ph.D. NIH - National Center for Toxicogenomics |
| Storyboard Contributor | Brent Gendleman 5AM Solutions |
| Storyboard Contributor | Jim Holeman HP Nonstop Enterprise Division |
| Storyboard Contributor | Anajane Smith Fred Hutchinson Cancer Research Center |
| Storyboard Contributor | Derek Walker Fred Hutchinson Cancer Research Center |
| Storyboard Contributor | Sue-Jane Wang, Ph.D. US. Food and Drug Administration |
| Storyboard Contributor | Mathieu Wiepert Mayo Clinic |
| Storyboard Contributor | Katie Wittrup Mayo Clinic |
HTML Generated: 2012-04-01T15:36:26
Content Last Edited: 2009-12-02T14:33:13
HL7® Version 3 Standard, © 2007 Health Level Seven® International All Rights Reserved.
HL7 and Health Level Seven are registered trademarks of Health Level Seven International. Reg. U.S. Pat & TM Off.
Use of these materials is governed by HL7 International's IP Compliance Policy.
As has been done with topics in a number of other Normative domains, the Pedigree topic in the Clinical Genomics domain has been removed from the V3 Ballot Web site beginning with the January 2010 ballot. This material has been removed because a Normative version of this material is available in the HL7 V3 Normative Edition, beginning with the 2008 release, and there are currently no ongoing ballots for this topic. This statement has been inserted as a placeholder to direct readers to approved versions of this domain content.
The HL7 Version 3 Normative Edition is available to HL7 members as a free download on the V3 Messaging Standard page of the HL7 web site. Non-members can purchase a copy of the Normative Edition online at the HL7 Store. For those who are not implementing and merely want to review the ballot version of this content, draft content remains available in previous ballot cycle versions of the V3 Ballot Web Site. The Version 3 Ballot Site Archive provides links to previous ballot web sites. Readers wishing to review the domain material are directed to the September 2009 ballot web site or earlier.
The Clinical Genomics SIG has so far developed three topics: (1) Pedigree (Family History), (2) Genotype and (3) GeneticVariation. The Pedigree Topic was approved as normative in May 2007 (after being part of the Clinical Genomics DSTU package). The Genotype Topic was approved as DSTU in May 2005 and two updates have been approved since then. The current goal of the Clinical Genomics Working Group is to bring the Genotype Topic to normative as well. However, due to the broad scope of the Genotype DSTU, the decision is to progress to Normative in a stepwise approach so that each focal area of the DSTU will be balloted as a Normative Topic, containing a constrained R-MIM of the DSTU models.
The stepwise approach is based on identifying those areas in the Genotype Topic that (1) are most relevant to our stakeholders and (2) have actually been experimented with since the DSTU passed ballot. As a result of these considerations, the group chose the genetic variation area to be the first area to progress to normative. Indeed, the Clinical Genomics domain includes a new topic called "Genetic Variation" whose model is a constraining of the Genotype models. Therefore, readers interested in standard specifications for genetic variation should use the GeneticVariation Topic and refer to the Genotype DSTU only as an overarching model describing how the various types of core genomic data could be associated with phenotypic data.
Since its formation, the Clinical Genomics Working Group has been developing HL7 V3 standards to enable the exchange of interrelated clinical and personalized genomic data between interested parties. In many cases the exchange of genomic data is done between disparate organizations (healthcare providers, genetic labs, research facilities, etc.) and acceptable standards are crucial for the usefulness of genomic data in healthcare practice. It is envisioned that the use of genomic data in healthcare practice will become ubiquitous.
The Clinical Genomics domain addresses requirements for the interrelation of clinical and genomic data at the individual level. Much of the genomic data is still generic; for example, the human genome is in fact the DNA sequences believed to be the common sequences in every human being. The vision of 'personalized medicine' is based on those correlations that make use of personal genomic data such as the SNPs (Single Nucleotide Polymorphisms) that differentiate any two persons and occur about every thousand bases. Besides normal differences, health conditions such as drug sensitivities, allergies and others could be attributed to the individual SNPs or to differences in gene expression and proteomics.
The emphases of the Clinical Genomics domain are the personalization of the genomic data and the 'intelligent' linking to relevant clinical information. These links are probably the main source from which geneticists (genomicists?) and clinicians could benefit. The cases where genomic data are used in healthcare practice vary in complexity and extent of the data used, since the current testing methods are still very expensive and not widely used. We can see simple testing like identifying genes and mutations as well as full sequencing of alleles and the use of micro-arrays to identify the expression of a vast number of genes in each individual. Naturally, the Clinical Genomics Working Group has been focusing on tests that are routinely done in healthcare, while preparing the information infrastructure standard for more futuristic cases.
At first sight, genomic data sets seem to be yet another type of observation. While this is of course true, a few characteristics might distinguish them from typical observations such as blood pressure or potassium level:
The amount of data: potentially it could be the entire human genome
The personalization of the data is evolving as new discoveries are constantly made
The complexity of the data: not only the DNA sequences (...AGCT...) need to be represented, but also SNPs, annotations (automatic and manual), gene expression, protein translation, and more
The emerging standard formats being used by bioinformatics communities, for example: BSML (Bioinformatic Sequence Markup Language), MAGE-ML (Microarray and GeneExpression Markup Language)
Various standard organizations and many stakeholders are involved
The clinical-genomic correlations are represented in a variety of different ways depending on the point of view (clinical research, pharmaceutical or healthcare)
The core Genotype model is the GeneticLocus model. It consists of various types of genomic data relating to a specific DNA locus including sequencing, expression and proteomic data. Within the GeneticLocus model we have utilized existing bioinformatics markups to represent raw data received from genomic facilities. Examining and constraining these markups is a work in progress and thus this part of the GeneticLocus model is considered informative as well.
The FamilyHistory model is aimed at describing a patient's pedigree with genomic data and thus utilizes the GeneticLocus model (as a CMET) to carry the genomic data for the patient's relatives.
Diagram

Preface:
NOTE: THIS DIM IS NOT UNDER BALLOT AND CONSISTS OF PORTIONS OF THE DEPRECATED DSTU AS WELL AS NEW STRUCTURES TO ACCOMMODATE NON-LOCUS SPECIFIC DATA, ALL WITHIN AN UMBRELLA OF "GENOME" AS THE ULTIMATE ORGANIZER. THIS MODEL IS PRESENTED FOR THE PURPOSE OF INTERNAL DISCUSSION ONLY.
In previous ballot cycles, the Clinical Genomics DIM used to be an aggregation of two core models within the DSTU Genotype Topic (GeneticLocus and GeneticLoci), along with the Pedigree Topic model (Family History). Since then, the Pedigree Topic has already been approved as normative and the DSTU Genotype Topic has been frozen and is going to be deprecated. As part of our effort to move the DSTU to Normative, the Clinical Genomics Domain has a new Topic called "Genetic Variation" which has a model that constrains the GeneticLoci and GeneticLocus models and is balloted in the normative track.
The roadmap of the Clinical Genomics Domain is to ballot more areas of the domain as normative and eventually aggregate all normative models into a revised Domain Information Model. The DIM available here contains the Genetic Variation model from September 2009 along with other portions of the core DSTU models in order to have a 'continuity of standards' in the Clinical Genomics Domain. This domain model will be continuously updated as the normative ballots of the various areas of genomics get finalized.
DIM Walk-Through
General Notes
* The Use of the 'id', 'code' and 'value' Attributes:
The use of these attributes in the various classes depends on the extent to which the data has been personalized and how different the results are from the known genome. It is also different in those classes that encapsulate raw genomic data. For example, in the IndividualAllele class, in the case that the patient's allele was fully sequenced and found to be slightly different from the one registered in GenBank or other reference databases, there is no external code to place in one of those attributes; rather, the IndividualAllele class is associated with the Sequence class, where the individual sequence could be placed in the value attribute. If it is indeed a new allele, temporary identifiers could be placed in the id attribute until it is registered externally. If, however, it is a known allele, then the 'value' attribute can be populated with the appropriate code from GenBank, for example. In this case there isn't much point in populating the Sequence class as it can be retrieved from GenBank, but for self-containment purposes in a specific implementation, the GenBank sequence could be copied and placed in the Sequence class. As for the 'id' attribute, it should be used to uniquely identify that specific instance (possibly using the LSID format). The 'code' attribute should identify the kind of data stored in the 'value' attribute, and the 'value' attribute should hold actual data, for example, a characteristic (e.g., heterozygous) or an external gene code from GenBank or dbSNP. In the 'encapsulating' classes (e.g., Sequence, Expression, etc.) the 'value' attribute should hold the bioinformatics markup itself. In the latter case, the code should hold an indication of the exact bioinformatics format used to populate the 'value' attribute.
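The attribute rules above can be illustrated with a minimal Python sketch. It is not part of the specification: the `Observation` class, the `describe_allele` helper, the `.seq` id suffix and all literal values are hypothetical and exist only to show which attribute carries what for a known allele versus a newly sequenced one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical stand-in for a DIM class carrying id, code and value."""
    id: Optional[str] = None     # uniquely identifies this instance
    code: Optional[str] = None   # says what kind of data 'value' holds
    value: Optional[str] = None  # a code, a characteristic, or raw markup


def describe_allele(instance_id, reference_code=None,
                    raw_markup=None, markup_format=None):
    """Return (allele, sequence) populated along the lines described above.

    reference_code: external code of a known allele (placed in 'value').
    raw_markup / markup_format: bioinformatics markup and its format name,
    placed in the 'value' and 'code' of an encapsulating Sequence.
    """
    allele = Observation(id=instance_id)
    sequence = None
    if reference_code is not None:
        # Known allele: the external reference code goes in 'value'.
        allele.value = reference_code
    if raw_markup is not None:
        # Encapsulating class: 'value' holds the markup, 'code' its format.
        # The '.seq' suffix is an illustrative id scheme, not a convention.
        sequence = Observation(id=instance_id + ".seq",
                               code=markup_format, value=raw_markup)
    return allele, sequence


# Known allele: external code only, no Sequence needed.
known, known_seq = describe_allele("allele-1",
                                   reference_code="REF-CODE-PLACEHOLDER")

# New allele: temporary id, no external code, sequence carried as markup.
novel, novel_seq = describe_allele("tmp-allele-2",
                                   raw_markup="<Sequence>AGCT</Sequence>",
                                   markup_format="BSML")
```

In this sketch a known allele may still carry a copied sequence for self-containment by passing both `reference_code` and `raw_markup`.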
* Vocabularies
All vocabularies presented in the model walk-through below should be considered an informative part of this ballot document; in general they were imported from the deprecated DSTU and are included here for illustration purposes. The ultimate goal is to have codes drawn from internationally recognized controlled vocabularies such as SNOMED and LOINC. For example, it is possible to use newly created LOINC codes in the area of genetic testing results for healthcare environments, developed within the HL7 v2 Implementation Guide effort. A few of those LOINC codes are presented below where appropriate, e.g., for DNA variation type, Overall interpretation and others. The use of these value sets is further constrained in the various Topics of the Clinical Genomics Domain as well as in implementation guides for specific realms or use cases.
Genome:
The Genome class is the highest entry point in terms of the genomic data collected about an individual. Presumably, one can represent the entire genome of a patient for example in this area of the model.
Associations:
The Genome class has two major associations that allow for representation of locus-specific data and other types of data that cannot be tied to specific loci. In our domain, a locus refers to a location on the genome that has the size of a gene.
Genetic Loci:
The 'Genetic Loci' portion of the DIM allows for representation of data relating to a set of loci along a genome. The set of loci could be of different types, for example, a haplotype (allele or SNP), a genetic profile, a biological pathway, a set of genetic test results which contains results of multiple genes, etc.
Genetic Loci Walk-Through:
Entry Point:
GeneticLoci
The entry point is a GeneticLoci class allowing the representation of the type of this loci group (e.g., allele haplotype like in tissue typing of HLA antigens) and an optional code for identifying the loci set (if available).
GeneticLoci Attributes:

LociChoice
A GeneticLoci instance consists of zero to many GeneticLocus CMETs or other GeneticLoci classes. This recursive structure allows the representation of a complex set of genetic loci as comprised of 'sub' genetic loci sets. The actual genetic/genomic data is represented through GeneticLocus instances at any level of nesting required.
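The recursive LociChoice structure can be pictured with a short Python sketch. The class shapes below are simplified assumptions (a single `code` or `name` field each) and `iter_loci` is a hypothetical helper; the point is only that GeneticLocus instances can be reached at any level of nesting.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class GeneticLocus:
    name: str  # illustrative label only, not a DIM attribute


@dataclass
class GeneticLoci:
    code: str  # type of this loci group (placeholder text here)
    # Zero to many members, each a GeneticLocus or a nested GeneticLoci.
    members: List[Union["GeneticLoci", GeneticLocus]] = field(default_factory=list)


def iter_loci(group):
    """Yield every GeneticLocus under a GeneticLoci, depth first."""
    for member in group.members:
        if isinstance(member, GeneticLoci):
            yield from iter_loci(member)
        else:
            yield member


# A loci set made of one direct locus and one nested 'sub' loci set.
panel = GeneticLoci("panel", [
    GeneticLocus("locus-A"),
    GeneticLoci("haplotype", [GeneticLocus("locus-B"), GeneticLocus("locus-C")]),
])
names = [locus.name for locus in iter_loci(panel)]
```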
The AssociatedObservation class associated with GeneticLoci is a place holder for various observations related to the set of loci that have been observed independently of the parent observation. This is a generic catcher for data that can be placed in any other class in this model. Population of the code & value attributes of this class is controlled by a vocabulary which is still under development.
In cases where there is an interpretation of the entire set of loci data, it is possible to use the interpretationCode attribute of the GeneticLoci class or populate the associated Phenotype model (CMET). Examples of overall interpretation codes can be demonstrated through the following LOINC answer lists. Note that since the interpretation code is a single attribute, the context of each answer list should be represented by the code attribute. For example, in the case of the first answer list below, "Genetic disease analysis" is the type of Genetic Loci observation while any of the codes on the answer list could be a valid interpretation, i.e., the overall result of the analysis.
LOINC "Genetic disease analysis overall interpretation" - 51968-6
LOINC "Genetic disease analysis overall carrier interpretation" - 53039-4
LOINC "Drug efficacy analysis overall interpretation" - 51964-5
LOINC "Drug metabolism analysis overall interpretation" - 51971-0
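As a rough sketch of how the code attribute supplies the context for a single interpretationCode, the Python fragment below keys the four answer-list contexts named above by their LOINC codes. The `loci_interpretation` helper, the dictionary layout of its result, and the answer value used in the example are assumptions; the actual answer-list members are not reproduced here.

```python
# The four answer-list contexts listed above, keyed by LOINC code.
INTERPRETATION_CONTEXTS = {
    "51968-6": "Genetic disease analysis overall interpretation",
    "53039-4": "Genetic disease analysis overall carrier interpretation",
    "51964-5": "Drug efficacy analysis overall interpretation",
    "51971-0": "Drug metabolism analysis overall interpretation",
}


def loci_interpretation(code, interpretation_code):
    """Pair a context code with an interpretation (hypothetical helper).

    'code' names the answer list in use; 'interpretation_code' is one
    answer from that list. Answers are not validated in this sketch.
    """
    if code not in INTERPRETATION_CONTEXTS:
        raise ValueError(f"unknown interpretation context: {code}")
    return {
        "code": code,
        "context": INTERPRETATION_CONTEXTS[code],
        "interpretationCode": interpretation_code,
    }


# 'ANSWER-PLACEHOLDER' stands in for a real answer-list code.
result = loci_interpretation("51968-6", "ANSWER-PLACEHOLDER")
```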
While the value attribute typically holds a common code of this set of loci, it can also hold raw genomic data for the entire set of loci. For example, if the set of loci is a gene expression assay, the value attribute can hold the part of the MAGE-ML XML that holds data on all loci.
Other Classes
GeneticReportDocument
A report document that summarizes the data of a genetic loci set could be associated with the GeneticLoci class and is of type DOCCLIN which represents in HL7 a CDA type (Clinical Document Architecture). It is possible to actually embed an entire CDA document in the text attribute of the GeneticReportDocument class or just point to this document through the id attribute. Other related reports (e.g., follow-ups, addenda) could be associated as well. For more details about the use of clinical documents, please refer to the CDA and the Medical Records specification in the HL7 V3 Ballot Package.
sequelTo
This is a recursive association to the GeneticReportDocument class, which allows the representation of several documents that relate to one another and to the Genetic loci of course. For example, an addendum or follow-up document to the first summary.
Participants
In general, a group of optional participants are associated with both the GeneticLoci and the GeneticTestOrder classes allowing the recording of participants of the order act as well as the results fulfilling that order:
recordTarget
The record target indicates whose medical record holds the documentation of this act (i.e., the order or the results). This is especially important when the subject of a service is not the patient himself. Note that the subject can be overridden in certain GeneticLocus instances. This could be useful when describing, for example, genomic data relating to various specimens (healthy and tumor tissues) or relating to a virus, as part of a genetic testing of a patient who carries this virus.
author
Author of the data.
performer
Performer of the observations.
verifier
Verifier of the observations.
informationRecipient
To whom the data should be sent.
Genetic Locus:
The GeneticLocus portion of the DIM describes data relating to a genetic locus, which we propose to be the basic unit of genomic information exchange in healthcare. This model is not meant to be a biological model; rather it is aimed at the needs of healthcare with the vision of personalized medicine in mind. Also it could facilitate the needs of clinical research conducted within the healthcare enterprises as well as the needs of clinical trials. This model is the result of the group effort to look for the commonalities in each genomic-oriented storyboard that we've initially explored (i.e., Tissue Typing, Cystic Fibrosis, BRCA and Pharmacogenomics). The entry class GeneticLocus might be further constrained by its main subject (e.g., human, animal, and viral) or by type of genomic data (e.g., DNA, Expression and Proteomics). Those types of constraining are presented in the various Topics of this HL7 Domain, e.g., the Genetic Variation Topic where both the Genetic Loci and Genetic Locus portions of this DIM were constrained to represent variation data.
The Genetic Locus area of the DIM evolved from the work on several use cases that involve genomic data: Tissue Typing, Cystic Fibrosis, BRCA, and Pharmacogenomics. For example, in the tissue typing use case for bone-marrow transplantation (BMT) we have identified the following: messages and documents being exchanged; tissue-typing observations, i.e., the individual tissue-typing observation and the matching observation which indicates the level of matching between two individual tissue-typing observations (e.g., patient and donor); and finally the individual genotype that describes a pair of HLA alleles. In this use case, the latter observations could be described by the Genetic Loci and Locus areas of this DIM.
Genetic Locus Main Characteristics:
The entry point to this area is a GeneticLocus observation which could be associated with a pair of alleles on paternal and maternal homologous chromosomes.
Entry Point and Locus/Gene/Allele Classes:
GeneticLocus
Important note: The name 'GeneticLocus' refers to ALL genomic data and aspects of a specific locus along a chromosomal or mitochondrial DNA.
The GeneticLocus class is the entry point for describing locus-level data, e.g., gene, genetic marker, small variation, etc. A genotype commonly stands for an allele pair, from paternal and maternal homologous chromosomes. However, in this model it could be that (1) only one allele is associated with the locus (in cases of insufficient data or interest in one allele only); (2) no alleles are associated with the locus in cases where the locus' alleles have not been determined but there is a need to represent data related to the locus such as expression data, variations and even sequences; and (3) multiple alleles are available (in case of tumor tissues where several acquired (somatic) variants are encountered).
GeneticLocus Attributes:

Table 1: GeneticLocus.value

reference
The reference class represents a significant relationship with another locus. Note that it is possible to either expand the associations of the referred allele, or just indicate its id, assuming that it is detailed elsewhere (and accessible using its id). The association class called reference has a typeCode attribute currently set to REFR but we are developing a new vocabulary with codes like FUNCTIONAL, PHYSICAL, SIGNALING, and METABOLIC_PATHWAY that will be described in subsequent ballot cycles.
The AssociatedObservation class associated with GeneticLocus is a place holder for various observations related to a locus, for example, a Copy Number value that represents the number of copies of this gene or allele. The class has a shadow associated with all other classes, thus providing a generic mechanism to hold associated observations controlled by vocabularies. Both GeneticLocus and IndividualAllele share the same vocabulary at this point but might be separated in future versions. The code attribute holds the type of Observation, e.g., COPY_NUMBER, and the value holds the actual result. Another example would be code=ZYGOSITY, where the value could then be either HOMOZYGOTE or HETEROZYGOTE. See tables 3 and 4 for more possible codes.
Table 3: AssociatedObservation.code

Table 4 lists the vocabulary from which codes are drawn to populate the value attribute. The abstract codes in this vocabulary are the codes from table 3 and thus these two vocabularies will be maintained in synch.
Table 4: AssociatedObservation.value

A new LOINC value set is dedicated to Allelic state and can be used here:
LOINC "Allelic state" - 53034-5
Place the code 53034-5 ("Allelic state") in the code attribute and assign one of the codes in the answer list into the value attribute of the AssociatedObservation class.
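The code/value pairing of AssociatedObservation can be sketched in Python as follows. Only COPY_NUMBER, ZYGOSITY, HOMOZYGOTE and HETEROZYGOTE come from the text above; the class layout, the `make_observation` helper and the rule that a copy number is a non-negative integer are assumptions made for illustration, and the full contents of tables 3 and 4 are not reproduced.

```python
from dataclasses import dataclass

# Placeholder vocabulary built only from the codes mentioned in the text.
CODED_VALUES = {"ZYGOSITY": {"HOMOZYGOTE", "HETEROZYGOTE"}}
NUMERIC_CODES = {"COPY_NUMBER"}


@dataclass(frozen=True)
class AssociatedObservation:
    code: str      # type of observation, e.g. COPY_NUMBER
    value: object  # the actual result


def make_observation(code, value):
    """Build an observation, rejecting values that do not fit the code."""
    if code in CODED_VALUES:
        if value not in CODED_VALUES[code]:
            raise ValueError(f"{value!r} is not a valid value for {code}")
    elif code in NUMERIC_CODES:
        # Assumption: a copy number is a non-negative integer.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{code} expects a non-negative integer")
    else:
        raise ValueError(f"unknown observation code: {code}")
    return AssociatedObservation(code, value)


zygosity = make_observation("ZYGOSITY", "HETEROZYGOTE")
copies = make_observation("COPY_NUMBER", 3)
```

The same shape would apply when the code attribute carries the LOINC code 53034-5 and the value is drawn from its answer list.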
Important note: The term 'Individual Allele' doesn't necessarily refer to a known variant of a gene (or any locus); rather, it refers to the patient data regarding the locus that might contain personal variations (e.g., rare SNPs with unknown significance). In addition, the individual allele could also be a wild type of that allele, i.e., no variations were found that could identify this allele as one of the known alleles.
The GeneticLocus class is associated with 0 to many alleles represented by the IndividualAllele class. The IndividualAllele class identifies the specific allele instance (using the id attribute) and optionally specifies its external code (if known) and the method by which it was identified.
IndividualAllele Attributes:

Table 5: IndividualAllele.value:

Table 6: IndividualAllele.methodCode:

AssociatedObservation (SHADOW )
This is a shadow of the AssociatedObservation class. Please refer to the description of that class in the context of GeneticLocus for more details.
reference
The reference class represents a related allele of a different locus, and still has significant interrelation with the source allele (this is a recursive association of IndividualAllele). See the equivalent class in association with GeneticLocus for more details.
Encapsulating Classes
Sequence
The Sequence class is a generalization of all types of genetic-related sequences (i.e., DNA, RNA, Protein), preferably encapsulating the raw sequencing results of the DNA, and the derived sequences of the resultant RNA and protein molecules. The Sequence class has a recursive relation that makes it possible to nest an RNA sequence within a DNA sequence, and a protein sequence within an RNA sequence. The relationship type is DRIV (derivation) as the nested Sequence classes are meant to be placeholders for sequences that were computed from the first Sequence class, which is the only "encapsulating" class in this path (by 'first' we mean the one that is associated directly with the IndividualAllele class or with the GeneticLocus class in case of non-allelic data sets).
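The derivation nesting described above (RNA within DNA, protein within RNA) might be modeled along these lines. This is a simplified sketch: the `molecule` labels, the `derive` method and the `derivation_chain` helper are hypothetical names, and the DRIV relationship is represented simply as a list of derived sequences.

```python
from dataclasses import dataclass, field
from typing import List

# Which molecule type may be nested (derived) within which.
ALLOWED_DERIVATION = {"DNA": "RNA", "RNA": "PROTEIN"}


@dataclass
class Sequence:
    molecule: str    # "DNA", "RNA" or "PROTEIN" (illustrative labels)
    value: str = ""  # raw markup for the first Sequence, computed data otherwise
    derived: List["Sequence"] = field(default_factory=list)  # DRIV targets

    def derive(self, child):
        """Nest a derived sequence, enforcing DNA -> RNA -> protein."""
        if ALLOWED_DERIVATION.get(self.molecule) != child.molecule:
            raise ValueError(
                f"{child.molecule} cannot be derived from {self.molecule}")
        self.derived.append(child)
        return child


def derivation_chain(seq):
    """List molecule types along the first derived sequence at each level."""
    chain = [seq.molecule]
    while seq.derived:
        seq = seq.derived[0]
        chain.append(seq.molecule)
    return chain


dna = Sequence("DNA", "<raw sequencing markup>")  # the encapsulating Sequence
rna = dna.derive(Sequence("RNA"))
rna.derive(Sequence("PROTEIN"))
chain = derivation_chain(dna)
```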
Sequence Attributes:

Table 7: Sequence.methodCode

AssociatedProperty
The AssociatedProperty class is a placeholder for various properties that relate to the parent class (e.g., the Sequence class), which are supposed to be extracted (bubbled-up) from the raw data encapsulated in the parent class. This class is basically a code-value pair allowing the association of multiple properties with the core observation class which sets the context of these property observations in terms of identification and time for example. See discussion about the differences between associated observations versus properties further on. Table 8 lists the vocabulary from which codes are drawn to populate the code attribute while table 9 lists the vocabulary from which codes are drawn to populate the value attribute. The two vocabularies are synchronized in the sense that table 8 codes are the abstract codes in table 9 and each of them defines the vocabulary (nested within the abstract code) used when that abstract code was selected to populate the code attribute.
Table 8: AssociatedProperty.code (associated with Sequence)

Table 9: AssociatedProperty.value (associated with Sequence)

SequenceVariation
The class SequenceVariation is a generalization of all variation types, i.e., in all molecules (DNA, RNA, Protein) and of all types within each molecule (e.g., in DNA: SNP, Mutation, large deletion, etc.).
SequenceVariation Attributes:

Table 10: SequenceVariation.interpretationCode

A more advanced value set for sequence variation interpretation is the following LOINC answer list. (Note that since the interpretation code is a single attribute, the context of each answer list should be represented by the SequenceVariation code attribute. See an example in the Genetic Loci interpretation code.)
LOINC "Genetic disease sequence variation interpretation" - 53037-8
Sequence variation interpretation related to drug efficacy is represented by the following LOINC answer list:
LOINC "Drug efficacy sequence variation interpretation" - 51961-1
Associations:
AssociatedProperty
The class AssociatedProperty is a place holder for various properties that relate to a sequence variation, for example, position, length, region, reference and more. It replaces the distinct observations we had in previous versions of the DSTU Genotype model. This class is basically a code-value pair allowing the association of multiple properties with the core variation class which sets the context of these property observations in terms of identification and time for example. See discussion about the differences between associated observations versus properties further on. Table 11 lists the vocabulary from which codes are drawn to populate the code attribute while table 12 lists the vocabulary from which codes are drawn to populate the value attribute. The two vocabularies are synchronized in the sense that table 11 codes are the abstract codes in table 12 and each of them defines the vocabulary (nested within the abstract code) used when that abstract code was selected to populate the code attribute.
Table 11: AssociatedProperty.code (associated with SequenceVariation)

Table 12: AssociatedProperty.value (associated with SequenceVariation)

A major challenge is to accurately identify the type of sequence variation represented here. A set of DNA sequence variation types is presented here by their LOINC codes. The LOINC code that identifies the following value set is 48019-4 representing the LOINC component "DNA sequence variation type". Thus, it is sufficient to place this LOINC code in the code attribute of the AssociatedProperty class (associated with the SequenceVariation class) and one of the codes from the list below in its value attribute.
LOINC "DNA sequence variation type" - 48019-4
Another example is the value set "Amino acid change type", LOINC code 48006-1. It is possible to place this LOINC code in the code attribute of the AssociatedProperty class (associated with the SequenceVariation class) and one of the codes from the list below in its value attribute.
LOINC "Amino acid change type" - 48006-1
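The synchronization between the code and value vocabularies of AssociatedProperty (each abstract code owning the values allowed when it is selected) could be checked with a sketch like the one below. The vocabulary entries are placeholders chosen for illustration; they are not the contents of tables 11 and 12 or of the LOINC answer lists, and the function names are hypothetical.

```python
# Each abstract code (usable in the 'code' attribute) owns the set of
# values allowed in the 'value' attribute. Placeholder entries only.
PROPERTY_VOCABULARY = {
    "TYPE": {"SNP", "MUTATION", "LARGE_DELETION"},
    "REGION": {"EXON", "INTRON"},
}


def allowed_codes(vocabulary=PROPERTY_VOCABULARY):
    """The code vocabulary is exactly the set of abstract codes."""
    return set(vocabulary)


def is_valid_property(code, value, vocabulary=PROPERTY_VOCABULARY):
    """True when 'value' belongs to the vocabulary nested under 'code'."""
    return value in vocabulary.get(code, ())


ok = is_valid_property("TYPE", "SNP")
mismatched = is_valid_property("TYPE", "EXON")  # value from another code's set
```

Keeping both vocabularies in one nested structure is one way to guarantee they stay synchronized, since the code list is derived from the value table rather than maintained separately.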
Expression
The class Expression is a generalization of all types of expression data (typically DNA-->RNA but also protein). Its code attribute identifies the type of expression data it carries. This class is one of the encapsulating classes, that is, it holds in its value attribute portions of relevant bioinformatics markup (e.g., MAGE-ML for gene expression data), complying with constrained schemas of the full-fledged markups. In such a case, the code attribute holds the exact reference to the contained bioinformatics schema which the value's content should comply with. Note that the association cardinality between this class and its source class IndividualAllele is zero to many. The idea here is to be able to represent gene expression over multiple experiments for the same allele under possibly various clinical environments and expression testing methods. If this association is traversed several times then it's mandatory to populate the id & effectiveTime attributes so that each object of this class will be distinguished clearly and identified uniquely.
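The rule that id and effectiveTime become mandatory once an allele has several Expression instances can be expressed as a small check. The sketch below is an assumption-laden illustration: the `Expression` fields, the string representation of effectiveTime and the `check_expressions` helper are hypothetical, not part of the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Expression:
    id: Optional[str] = None
    effective_time: Optional[str] = None  # date format is an assumption
    code: Optional[str] = None            # reference to the markup schema
    value: Optional[str] = None           # the encapsulated markup


def check_expressions(expressions):
    """Return a list of problems; an empty list means the set is acceptable.

    With zero or one Expression nothing is required. With several, each
    must have a unique id and an effectiveTime.
    """
    problems = []
    if len(expressions) <= 1:
        return problems
    seen = set()
    for index, expr in enumerate(expressions):
        if not expr.id:
            problems.append(f"expression {index}: missing id")
        elif expr.id in seen:
            problems.append(f"expression {index}: duplicate id {expr.id}")
        else:
            seen.add(expr.id)
        if not expr.effective_time:
            problems.append(f"expression {index}: missing effectiveTime")
    return problems


experiments = [Expression("e1", "2009-01-01"), Expression("e2", "2009-02-01")]
issues = check_expressions(experiments)
```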
Expression Attributes:

AssociatedProperty
The AssociatedProperty class is a place holder for various properties that relate to expression data, for example, normalized intensity, qualitative indication, p-value and more, which are supposed to be extracted (bubbled-up) from the raw expression data encapsulated in the Expression class. This class is basically a code-value pair allowing the association of multiple properties with the core expression class which sets the context of these property observations in terms of identification and time for example. See discussion about the differences between associated observations versus properties further on. Table 13 lists the vocabulary from which codes are drawn to populate the code attribute while table 14 lists the vocabulary from which codes are drawn to populate the value attribute. The two vocabularies are synchronized in the sense that table 13 codes are the abstract codes in table 14 and each of them defines the vocabulary (nested within the abstract code) used when that abstract code was selected to populate the code attribute.
Table 13: AssociatedProperty.code (associated with Expression)

Table 14: AssociatedProperty.value (associated with Expression)

Table 15: AssociatedProperty.methodCode (associated with Expression)

Other Classes and Proteomics
Polypeptide and DeterminantPeptide
The Sequence class could be associated with its resultant or corresponding polypeptides (represented by the Polypeptide class) as well as with determinant peptides if applicable (represented by the DeterminantPeptide class). Note that the Sequence class has a recursive relation and it is possible to nest an RNA sequence within a DNA sequence, and a protein sequence within an RNA sequence. The Polypeptide could then be associated with the protein sequences or directly with any of the above levels. Also, it is possible to associate the DeterminantPeptide with the Polypeptide class or directly with the Sequence class. Both classes (Polypeptide and DeterminantPeptide) could be associated with several instances of the Phenotype model.
The proteomics classes in this model represent protein data derived from the sequences (by means of computational biology) and are not intended to be direct observations of a protein. The latter could be represented as regular lab results (using the HL7 Lab specs), which could be referenced in the GeneticLocus instance as if they were phenotype observations.
A common case for the use of proteomics in this model could be as follows: checking whether an amino acid change would result from the variant; if so, whether the new amino acid is of a different size or charge state that would likely change the shape of the active region of the protein; how far the change is from the active site; whether the change is in a regulatory region; and so forth. These observations could then be associated with phenotypic data.
For example, consider the following case described in OMIM: "Despite the dramatic responses to EGFR inhibitors in patients with non-small cell lung cancer, most patients ultimately have a relapse. Kobayashi et al. (2005) reported a patient with EGFR-mutant, gefitinib-responsive, advanced non-small cell lung cancer who had a relapse after 2 years of complete remission during treatment with gefitinib. The DNA sequence of the EGFR gene in his tumor biopsy specimen at relapse revealed the presence of a second mutation {131550.0006}. Structural modeling and biochemical studies showed that this second mutation led to the gefitinib resistance." (OMIM *131550)
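The first checks in the use case above (does the variant change the encoded amino acid, and if so does the charge class change) can be sketched in Python as follows. This is an illustration under simplifying assumptions, not a method defined by this specification: it uses the standard genetic code, a coarse three-way charge classification, and an arbitrary codon pair that is not taken from the OMIM case.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Coarse side-chain charge classes (a simplification; histidine is only
# partially protonated at physiological pH).
POSITIVE = set("KRH")
NEGATIVE = set("DE")


def charge_class(amino_acid: str) -> str:
    """Classify a one-letter amino acid code; stop ('*') falls under 'neutral'."""
    if amino_acid in POSITIVE:
        return "positive"
    if amino_acid in NEGATIVE:
        return "negative"
    return "neutral"


def describe_change(ref_codon: str, var_codon: str) -> dict:
    """Compare the amino acids encoded by a reference and a variant DNA codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    var_aa = CODON_TABLE[var_codon.upper()]
    return {
        "ref_aa": ref_aa,
        "var_aa": var_aa,
        "amino_acid_changed": ref_aa != var_aa,
        "charge_changed": charge_class(ref_aa) != charge_class(var_aa),
    }


# Illustrative codon pair: GAG (glutamate) to GTG (valine).
change = describe_change("GAG", "GTG")
```

The remaining questions in the use case (distance from the active site, location in a regulatory region) need structural or annotation data and are outside this sketch.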
Polypeptide and DeterminantPeptide Attributes:

To Phenotype and Beyond...
Phenotype
The Phenotype CMET is meant to complement or replace the use of the interpretation codes present in all core classes of this model. While the interpretation code attribute can hold only a single code, the Phenotype model has the full expressiveness of the Clinical Statement model. Phenotype is a separate model for modularity reasons: it is possible to make changes in this model without changing any of the Topic models derived from the DIM. The entry point to this model is a choice box that has two 'stub' observations intended to distinguish between two basic types of phenotypes: observed phenotypes and interpretive phenotypes. While the former represents observations made in the subject, the latter is an interpretation based on some evidence. To actually represent a phenotype (whether observed or interpretive), the choice box is associated with the Clinical Statement CMET so that the full expressiveness of that model is available to represent phenotypic data.
pertinentInformation
This ActRelationship class (named pertinentInformation) represents the association of a genomic observation with a number of phenotypic observations. Its mandatory attribute typeCode holds the semantics of the type of this association. It is defined as <=PERT, which means that any code in the PERT sub-hierarchy of the HL7 ActRelationshipType Vocabulary is permitted here. Work is in progress to select appropriate codes from the HL7 ActRelationshipType Vocabulary as well as to add codes unique to genomics.
Miscellaneous Issues
Association types:
Association types (ActRelationship typeCode) are consistent with the following principles:
Table 16: reference.typeCode

Bioinformatics formats:
In general, we use bioinformatics formats in the model to encapsulate raw genomic data such as sequencing, expression and proteomic data. To enable the embedding of such data as received from labs that work with bioinformatics formats, it is possible to place specific XML portions into the Sequence and Expression value attributes (as well as into SequenceVariation). This encapsulation of 'foreign' markup is made possible by the use of the HL7 ED (Encapsulated Data) Data Type, which is defined as follows: "ED holds data that is primarily intended for human interpretation or for further machine processing which is outside the scope of HL7. ED includes unformatted or formatted written language, multimedia data, or structured information as defined by a different standard (e.g., XML-signatures.)"
The use of the XML bioinformatics markups is restricted: not all tags are allowed, only a subset that relates to a specific patient and includes the information pertinent to healthcare. The restrictions on those external XML standards are specified elsewhere, but a draft of a constrained BSML schema for sequencing data is presented in Appendix C. For more details about the rationale behind this mixture of HL7 and bioinformatics markup, see the section "Coexistence of HL7 Objects and Bioinformatics Markup".
Validation:
The use of external markup in HL7 messages requires that a receiver of an HL7 instance that contains a Genotype instance carry out a 'double-validation' process: the first step is to validate the instance against the HL7 message specification (of which the Genotype schema is a part), and the second step is to validate the content of those value attributes against their respective content models. The valid content models of the Sequence and Expression value attributes will be an integral part of the entire Genotype specification, but at this point they are still considered informative.
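The two-step process can be outlined with the following Python sketch. It only shows the control flow under stated assumptions: the element names (genotype, sequence, value, Bsml, Sequence) are placeholders with namespaces omitted, and the two validator functions are hypothetical stand-ins for real schema validation, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET


def double_validate(instance_xml, hl7_validator, content_validators):
    """Return a list of error strings; an empty list means both steps passed."""
    root = ET.fromstring(instance_xml)  # raises ParseError if not well-formed

    # Step 1: validate the whole instance against the HL7 message specification.
    errors = ["HL7: " + e for e in hl7_validator(root)]

    # Step 2: validate the foreign markup inside each value element against
    # the content model registered for its root tag.
    for value in root.iter("value"):
        for embedded in value:
            validator = content_validators.get(embedded.tag)
            if validator is None:
                errors.append("content: no content model for <%s>" % embedded.tag)
            else:
                errors.extend("content: " + e for e in validator(embedded))
    return errors


# Hypothetical stand-ins for the real validators.
def check_hl7(root):
    return [] if root.tag == "genotype" else ["unexpected root element"]


def check_bsml(element):
    return [] if element.find("Sequence") is not None else ["Bsml has no Sequence"]


SAMPLE = "<genotype><sequence><value><Bsml><Sequence/></Bsml></value></sequence></genotype>"
result = double_validate(SAMPLE, check_hl7, {"Bsml": check_bsml})
```

Collecting the errors from both steps, rather than stopping after the first, is a design choice of this sketch; a receiver could equally reject the instance as soon as the HL7 step fails.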
Associated Properties / Observations and the Harmonization Proposals:
In early versions of the DSTU models we coped with the reluctance of both the HL7 RIM Harmonization process and the HL7 Clinical Genomics group to nail down common attributes of genomic observations through the addition of new classes and attributes to the RIM, by elaborating on the SequenceVariation and Expression properties and creating two new Observation classes (SequenceVariationProperty and ExpressionProperty) to be placeholders for each of these properties. For example, the proposed 'length' property of a possible SequenceVariation new RIM class could be represented by an object of the SequenceVariationProperty class, which only has code and value attributes. The code will indicate that this observation describes the position of the variation and the value attribute holds the position itself. The assumption is that this observation is an integral part of the parent observation with the same effective time. It could be identified only by going through the source variation object. In contrast, we also had the LocusAssociatedObservation class, which is a placeholder for associated observations such as copy number, zygosity, dominancy and gene family. These observations are independent observations that do have an id, effective time and method code.
In later versions of the DSTU, the associated property/observation classes were consolidated into two classes: AssociatedProperty and AssociatedObservation. Instead of having specific class names (e.g., SequenceVariationProperty), all core classes now have these two generic classes coming off them. This makes the model simpler, but puts the burden on the parsing application to understand the context of each associated property/observation. The basic difference between associated properties and associated observations is that an associated property should have been (and eventually may be) part of the parent class attributes. It is an inherent part of the parent observation and thus does not have its own id, time stamp, method, performer, etc.; it inherits all of these attributes from its parent. An associated observation, in contrast, is an independent observation and a component of its parent class.
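The distinction between the two generic classes can be made concrete with a small Python sketch. The class and attribute names mirror the prose, but the field layout, the resolve_context helper and the sample values are assumptions made for illustration only; they are not defined by the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class AssociatedProperty:
    """Only code and value; identity and time come from the parent."""
    code: str
    value: str


@dataclass
class AssociatedObservation:
    """An independent observation with its own id, time and method."""
    id: str
    effective_time: str
    method_code: Optional[str]
    code: str
    value: str


@dataclass
class CoreObservation:
    """Stand-in for a core class such as SequenceVariation or Expression."""
    id: str
    effective_time: str
    properties: List[AssociatedProperty] = field(default_factory=list)
    observations: List[AssociatedObservation] = field(default_factory=list)


def resolve_context(
    parent: CoreObservation,
    item: Union[AssociatedProperty, AssociatedObservation],
) -> Tuple[str, str]:
    """Return the (id, effective_time) that applies to an associated item."""
    if isinstance(item, AssociatedObservation):
        return item.id, item.effective_time  # its own context
    return parent.id, parent.effective_time  # inherited from the parent


# Hypothetical sample values.
variation = CoreObservation(id="var-1", effective_time="20120501")
position = AssociatedProperty(code="POSITION", value="2369")
zygosity = AssociatedObservation(
    id="obs-7", effective_time="20120515", method_code=None,
    code="ZYGOSITY", value="HETEROZYGOUS",
)
variation.properties.append(position)
variation.observations.append(zygosity)
```

This is the burden the consolidation places on a parsing application: for a property it must walk back to the parent to find the applicable identifier and time, whereas an associated observation carries them itself.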
Coexistence of HL7 Objects and Bioinformatics Markup:
When exploring this model, one can identify the use of bioinformatics markup such as MAGE for gene expression and BSML for DNA sequencing. Also, a few of the HL7 classes, such as the property classes, overlap the elements of the bioinformatics markup, so that it is possible to find a SNP represented both in the Sequence class and in the AssociatedProperty class. The question then arises: what is the relationship between the two, and how do they coexist? The following are a few points to note about that issue:
Figure 3 shows a conceptual workflow in which the above coexistence takes place and is executed step-wise. Figure 4 shows an example taken from the sequencing type of data: the most clinically significant SNPs are bubbled up from the raw sequencing data and associated with clinical phenotypes. For an illustration of the latter scenario, see the sample on the EGFR (described in the samples appendix B), courtesy of the NHANES project carried out at the US CDC.
