Subsets with logical definition (is-a)
The first group of examples consists of subsets that also contain a logical definition based on "is-a" relationships, such as the medication codes value set or the body site value set.
For the medication codes value set, the definition is as follows:
This definition can easily be translated into Expression Constraint Language (ECL) as follows:
<< 410942007 |Drug or medicament (substance)| OR << 373873005 |Pharmaceutical / biologic product (product)| OR << 106181007 |Immunologic substance (substance)|
In the latest release available (20180131), this expression contains 28928 concepts.
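To make the semantics of `<<` concrete, here is a minimal Python sketch of how such a disjunction could be evaluated over an in-memory is-a hierarchy. The `PARENTS` map and all of its labels are hypothetical placeholders, not real SNOMED CT content; a real evaluator would load the is-a relationships (typeId 116680003 |Is a|) from an RF2 release or delegate to a terminology server.

```python
from collections import defaultdict

# Hypothetical toy is-a hierarchy (child -> set of parents).
# Labels are placeholders, not real SNOMED CT concepts.
PARENTS = {
    "substance": {"root"},
    "drug": {"substance"},
    "aspirin": {"drug"},
    "immunologic": {"substance"},
    "vaccine_antigen": {"immunologic"},
    "product": {"root"},
    "aspirin_tablet": {"product"},
}


def children_index(parents):
    """Invert the child -> parents map into a parent -> children map."""
    index = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            index[p].add(child)
    return index


def descendants_or_self(concept, children):
    """ECL '<<' (descendantOrSelfOf): the concept plus all its descendants."""
    result, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, ()))
    return result


def evaluate_or_of_descendants(focus_concepts, parents):
    """Evaluate '<< A OR << B OR ...' as the union of each '<<' result."""
    children = children_index(parents)
    members = set()
    for focus in focus_concepts:
        members |= descendants_or_self(focus, children)
    return members


if __name__ == "__main__":
    subset = evaluate_or_of_descendants(["drug", "product", "immunologic"], PARENTS)
    print(sorted(subset))
```

Each `<<` operand expands to its focus concept plus all of its descendants, and `OR` is a plain set union, so the three branches can be evaluated independently.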
Lists of codes
Several FHIR subsets are defined as plain lists of codes (such as body site relative location). The main problem with this approach is that such subsets may be incomplete or even wrong, because hand-picked terms can have several meanings (e.g. the same term can be used to represent a procedure and the tissue on which the procedure is performed). These errors are easily revealed by looking at graphs of where the concepts fall in the hierarchy. For body site relative location, the equivalent expression is the following:
419161000 OR 419465000 OR 51440002 OR 261183002 OR 261122009 OR 255561001 OR 49370004 OR 264217000 OR 261089000 OR 255551008 OR 351726001 OR 352730000
This graph shows where most of the concepts fall. In this case, all body site relative locations are qualifier values, which seems correct.
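These graphs essentially group the listed concepts by the top-level hierarchy they belong to. A minimal sketch of that grouping, assuming a hypothetical toy `PARENTS` map (child to parents) with illustrative labels, walks up each concept's ancestors and keeps those that sit directly under the root:

```python
from collections import Counter

ROOT = "root"

# Hypothetical toy hierarchy (child -> set of parents); labels are illustrative.
PARENTS = {
    "qualifier_value": {ROOT},
    "procedure": {ROOT},
    "relative_location": {"qualifier_value"},
    "left": {"relative_location"},
    "right": {"relative_location"},
    "upper": {"relative_location"},
    "biopsy": {"procedure"},
}


def ancestors_or_self(concept, parents):
    """The concept plus all of its transitive ancestors."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def top_level_hierarchies(concept, parents, root=ROOT):
    """Ancestors-or-self that sit directly under the root (usually exactly one)."""
    return {c for c in ancestors_or_self(concept, parents) if root in parents.get(c, ())}


def hierarchy_distribution(subset, parents, root=ROOT):
    """Count how many subset members fall under each top-level hierarchy."""
    counts = Counter()
    for concept in subset:
        counts.update(top_level_hierarchies(concept, parents, root))
    return counts


if __name__ == "__main__":
    print(hierarchy_distribution(["left", "right", "upper"], PARENTS))
```

A subset whose counts all land in one hierarchy, as here, matches the "all qualifier values" reading of the graph; counts spread across several hierarchies are the cue for a closer review.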
These visualizations become more useful the more concepts a subset has, as with the facility codes subset, which contains 79 concepts.
The graph for this subset can be interpreted as follows: of the 79 concepts in the subset, 94% (74 out of 79) are a 'site of care', 4 of the remaining ones are a 'community environment' and a single one is a 'hospital environment'. This kind of visualization raises some questions: Should that single code in the hospital environment subtree refer to itself and all of its allowed children? Could the subset be simplified to the expression "< 276339004 |Environment (environment)|", which contains all the environments known in SNOMED CT? Should the terms annotated in the original FHIR subset with "--OTHER--NOT LISTED" always be translated into a children-or-self operation?
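One way to approach the simplification question is to compare the explicit list with the members of the candidate expression. The sketch below evaluates `<` (descendants, excluding the focus concept itself) over a hypothetical toy hierarchy whose labels only loosely mimic the facility codes example, and reports what the expression would add or miss:

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> set of parents) loosely mimicking the
# facility codes example; labels are placeholders, not SNOMED CT concepts.
PARENTS = {
    "site_of_care": {"environment"},
    "clinic": {"site_of_care"},
    "pharmacy": {"site_of_care"},
    "community_env": {"environment"},
    "school": {"community_env"},
    "hospital_env": {"environment"},
    "operating_room": {"hospital_env"},
}


def descendants(concept, parents):
    """ECL '<' (descendantOf): all descendants, excluding the concept itself."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    result, stack = set(), list(children.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, ()))
    return result


def compare_list_with_expression(explicit_codes, expression_members):
    """Report how a hand-picked code list differs from an expression's members."""
    explicit, members = set(explicit_codes), set(expression_members)
    return {
        "covered": explicit & members,
        "missing_from_expression": explicit - members,
        "added_by_expression": members - explicit,
    }


if __name__ == "__main__":
    explicit = ["clinic", "pharmacy", "school", "hospital_env"]
    report = compare_list_with_expression(explicit, descendants("environment", PARENTS))
    for key, codes in report.items():
        print(key, sorted(codes))
```

In this toy data the expression would add the grouping concepts and `operating_room`, a child of the hospital environment concept; deciding whether such additions are acceptable is the human judgment the questions above call for.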
Validating subsets
One of the advantages of this approach is that the graphical representation also allows a quick review of the quality of the proposed subset. Take, for example, the specimen collection method subset, which can be defined with the expression constraint:
119295008 |Specimen obtained by aspiration| OR 413651001 |Bioptics| OR 360020006 |Extirpation - action| OR 430823004 |Examination of midstream urine specimen| OR 16404004 |Induced| OR 67889009 |Irrigation| OR 29240004 |Autopsy examination| OR 45710003 |Sputum| OR 7800008 |Punctate| OR 258431006 |Scrapings| OR 20255002 |Blushing| OR 386147002 |Smear procedure| OR 278450005 |Finger stick|
This subset contains 13 concepts and produces the following graph:
One thing that stands out immediately is that the focus concept (which can be interpreted as the lowest common ancestor) is the SNOMED CT root concept. This means that the concepts in the subset share no common ancestor other than the root, which is a sign that the subset definition has potential problems.
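The focus concept can be computed as the lowest common ancestor(s) of the subset members. Because SNOMED CT is a polyhierarchy there may be more than one, so this minimal sketch returns a set. It assumes the same kind of hypothetical toy `PARENTS` map with illustrative labels; getting back only the root is the warning sign described above.

```python
ROOT = "root"

# Hypothetical toy hierarchy (child -> set of parents); labels are illustrative.
PARENTS = {
    "procedure": {ROOT},
    "qualifier_value": {ROOT},
    "substance": {ROOT},
    "biopsy": {"procedure"},
    "irrigation": {"procedure"},
    "induced": {"qualifier_value"},
    "sputum": {"substance"},
}


def ancestors_or_self(concept, parents):
    """The concept plus all of its transitive ancestors."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def lowest_common_ancestors(subset, parents):
    """Common ancestors-or-self of all members that are not ancestors of another one."""
    if not subset:
        return set()
    common = set.intersection(*(ancestors_or_self(c, parents) for c in subset))
    return {
        a for a in common
        if not any(b != a and a in ancestors_or_self(b, parents) for b in common)
    }


if __name__ == "__main__":
    focus = lowest_common_ancestors(["biopsy", "induced", "sputum"], PARENTS)
    if focus == {ROOT}:
        print("Only the root is shared: review the subset definition")
```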
A more detailed analysis shows that:
- 5/12 concepts are in the procedure hierarchy, which seems fitting for collection methods
- 3/12 concepts are in the qualifier value hierarchy
- 2/12 concepts are in the specimen hierarchy
- 1/12 concept is in the substance hierarchy
- 1/12 concept is in the observable entity hierarchy
- The remaining (13th) concept is inactive and shouldn't be used
- The concept "45710003 |Sputum (substance)|", which refers to the substance, is used to refer to the method of collection. In this case, the concept "37705003 |Collection of sputum (procedure)|" seems far more fitting for the purpose of the subset
- The concept "20255002 |Blushing, function (observable entity)|" refers to an observable entity. The intended concept is probably "225063006 |Flushing cannula (procedure)|"
- Similarly, "258431006 |Scrapings (specimen)|" and "119295008 |Specimen obtained by aspiration (specimen)|" seem unfit for the purpose of the subset. "56757003 |Scraping (procedure)|" and "14766002 |Aspiration (procedure)|" should probably be used instead.
- The concept "386147002 |Smear procedure (procedure)|" was inactivated because it was ambiguous. Reviewing the code in the SNOMED CT Browser (see the RefSet tab) shows that it should be replaced by "448895004 |Sampling for smear (procedure)|" or "448938001 |Preparation of smear (procedure)|" (or both).
- Regarding the selected qualifier values, the correct approach is not to add the qualifiers themselves to the set, but to allow the procedures that use them. This can be expressed as the descendants of 71388002 |Procedure (procedure)| whose 260686004 |Method (attribute)| is any of the three qualifier values. Formally, the part of the expression covering these qualifier values would be:
< 71388002 |Procedure (procedure)| : 260686004 |Method (attribute)| = (16404004 |Induced (qualifier value)| OR 360020006 |Extirpation - action (qualifier value)| OR 7800008 |Punctate (qualifier value)|)
Note: even though Induced and Punctate are not currently used as the destination of any attribute in SNOMED CT, this expression constraint allows validating post-coordinated procedures that use these methods.
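A refinement such as `< Procedure : Method = (A OR B OR C)` keeps the descendants of the focus concept that have at least one Method relationship whose value is one of the listed concepts. Since no `<<` is applied to the values, this sketch matches them exactly. It is a minimal illustration over hypothetical toy data: `PARENTS`, `RELATIONSHIPS` and all labels are placeholders rather than SNOMED CT content, and role groups and cardinality are deliberately ignored.

```python
from collections import defaultdict

# Hypothetical toy data: labels are placeholders, not SNOMED CT identifiers.
PARENTS = {
    "procedure": {"root"},
    "induced_sputum_collection": {"procedure"},
    "needle_aspiration": {"procedure"},
    "irrigation": {"procedure"},
}
# Attribute relationships per concept: set of (attribute, value) pairs.
RELATIONSHIPS = {
    "induced_sputum_collection": {("method", "induced")},
    "needle_aspiration": {("method", "aspiration_action")},
}


def descendants(concept, parents):
    """ECL '<' (descendantOf): all descendants, excluding the concept itself."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    result, stack = set(), list(children.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, ()))
    return result


def refine(members, attribute, allowed_values, relationships):
    """Keep members with at least one `attribute` relationship whose value is
    exactly one of `allowed_values` (role groups and cardinality are ignored)."""
    return {
        c for c in members
        if any(attr == attribute and value in allowed_values
               for attr, value in relationships.get(c, ()))
    }


if __name__ == "__main__":
    allowed = {"induced", "extirpation_action", "punctate"}
    print(refine(descendants("procedure", PARENTS), "method", allowed, RELATIONSHIPS))
```

Under these assumptions, a post-coordinated procedure could be checked the same way by treating the refinements written in the expression as its relationships, which is why the constraint stays useful even for values not yet used in SNOMED CT.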