Mutalyzer HGVS Parser
Package to syntax check Mutalyzer HGVS variant descriptions and convert them into a dictionary model, so that the information in a description can be accessed programmatically.
Features:
- Accepts HGVS descriptions with multiple variants (one HGVS allele).
- Any description sub-part can be parsed and converted as well.
- Supports common deviations to the HGVS guidelines.
- Command line and library interfaces available.
Quick start
Parse and convert a description from the command line:
$ mutalyzer_hgvs_parser -c "NG_012337.1:c.20del"
{
  "reference": {
    "id": "NG_012337.1"
  },
  "coordinate_system": "c",
  "variants": [
    {
      "location": {
        "type": "point",
        "position": 20
      },
      "type": "deletion",
      "source": "reference"
    }
  ]
}
The to_model() function can be used for the same purpose:
>>> from mutalyzer_hgvs_parser import to_model
>>> model = to_model("NG_012337.1:c.20del")
>>> model['reference']
{'id': 'NG_012337.1'}
Please see ReadTheDocs for the latest documentation.
Installation
The software is distributed via PyPI and can be installed with pip:
pip install mutalyzer-hgvs-parser
Usage
This package provides a command line interface.
Syntax check
To check only whether a description can be parsed successfully:
$ mutalyzer_hgvs_parser 'NG_012337.1(SDHD_v001):c.274G>T'
Successfully parsed:
NG_012337.1(SDHD_v001):c.274G>T
Description model
To obtain the model of a description, add the -c flag.
$ mutalyzer_hgvs_parser -c 'NG_012337.1(SDHD_v001):c.274G>T'
{
  "reference": {
    "id": "NG_012337.1",
    "selector": {
      "id": "SDHD_v001"
    }
  },
  "coordinate_system": "c",
  "variants": [
    {
      "type": "substitution",
      "source": "reference",
      "location": {
        "type": "point",
        "position": 274
      },
      "deleted": [
        {
          "source": "description",
          "sequence": "G"
        }
      ],
      "inserted": [
        {
          "source": "description",
          "sequence": "T"
        }
      ]
    }
  ]
}
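Because the model is plain JSON, it can be consumed with standard tools. The following is a minimal sketch, not part of the package: it loads the output shown above (pasted into a string) and summarises each variant. The helper name summarize_variants is hypothetical, and the sketch only handles point locations like the one in this example.

```python
import json

# Output of `mutalyzer_hgvs_parser -c 'NG_012337.1(SDHD_v001):c.274G>T'`,
# copied from the example above.
MODEL_JSON = """
{
  "reference": {"id": "NG_012337.1", "selector": {"id": "SDHD_v001"}},
  "coordinate_system": "c",
  "variants": [
    {
      "type": "substitution",
      "source": "reference",
      "location": {"type": "point", "position": 274},
      "deleted": [{"source": "description", "sequence": "G"}],
      "inserted": [{"source": "description", "sequence": "T"}]
    }
  ]
}
"""


def summarize_variants(model):
    """Return one (type, position, deleted, inserted) tuple per variant.

    Hypothetical helper; only point locations are considered.
    """
    summary = []
    for variant in model.get("variants", []):
        location = variant["location"]
        # Variants without deleted/inserted parts yield empty strings.
        deleted = "".join(d.get("sequence", "") for d in variant.get("deleted", []))
        inserted = "".join(i.get("sequence", "") for i in variant.get("inserted", []))
        summary.append((variant["type"], location.get("position"), deleted, inserted))
    return summary


model = json.loads(MODEL_JSON)
print(model["reference"]["selector"]["id"])  # SDHD_v001
print(summarize_variants(model))  # [('substitution', 274, 'G', 'T')]
```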
Grammar start rule
By default, the Mutalyzer grammar is used, with description as the start (top) rule. It is, however, possible to choose a different start rule with the -r option.
$ mutalyzer_hgvs_parser -r variant '274G>T'
Successfully parsed:
274G>T
The -c flag can be used together with a different start rule.
$ mutalyzer_hgvs_parser -c -r variant '274G>T'
{
  "location": {
    "type": "point",
    "position": 274
  },
  "type": "substitution",
  "source": "reference",
  "deleted": [
    {
      "sequence": "G",
      "source": "description"
    }
  ],
  "inserted": [
    {
      "sequence": "T",
      "source": "description"
    }
  ]
}
Command Line Interface
Mutalyzer HGVS variant description parser.
usage: mutalyzer_hgvs_parser [-h] [-c] [-r R] [-g G] [-p] [-i I] [-v]
description
Positional Arguments
description | the HGVS variant description to be parsed
Named Arguments
-c | convert the description to the model (default: False)
-r | alternative start (top) rule for the grammar
-g | alternative input grammar file path (do not use with -c)
-p | raw parse tree, no ambiguity solving (default: False)
-i | save the parse tree as a PNG image (pydot required)
-v | show program’s version number and exit
Copyright (c) Mihai Lefter <M.Lefter@lumc.nl>
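The command can also be driven from a script. The sketch below is illustrative and not part of the package: build_command and run_to_model are hypothetical helper names, and only the -c and -r options listed above are used. It assumes that a successful run with -c prints nothing but the JSON model to standard output and that a failed run exits with a non-zero status; neither is stated on this page.

```python
import json
import subprocess


def build_command(description, convert=False, start_rule=None):
    """Build the argument list for the mutalyzer_hgvs_parser command."""
    command = ["mutalyzer_hgvs_parser"]
    if convert:
        command.append("-c")  # convert the description to the model
    if start_rule is not None:
        command.extend(["-r", start_rule])  # alternative start (top) rule
    command.append(description)
    return command


def run_to_model(description, start_rule=None):
    """Run the command with -c and decode its JSON output.

    Requires the package to be installed so the command is on PATH.
    Assumes stdout holds only the JSON model (see the lead-in).
    """
    completed = subprocess.run(
        build_command(description, convert=True, start_rule=start_rule),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)


print(build_command("274G>T", convert=True, start_rule="variant"))
# ['mutalyzer_hgvs_parser', '-c', '-r', 'variant', '274G>T']
```

Passing the arguments as a list avoids shell quoting issues with characters such as > in descriptions.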
Grammar
The derived EBNF grammar does not cover all the HGVS nomenclature recommendations. Currently, the focus is mostly on descriptions at the DNA level. Examples of descriptions that are not supported:
LRG_199t1:c.[2376G>C];[3103del]
LRG_199t1:c.2376G>C(;)3103del
NC_000002.12:g.pter_8247756delins[NC_000011.10:g.pter_15825272]
NC_000009.12:g.pter_26393001delins102425452_qterinv
NC_000011.10:g.1999904_1999946|gom
At the same time, the grammar allows for descriptions which are not HGVS compliant, but interpretable, in order to help users reach a normalized description. Examples:
LRG_1:g.20>T
LRG_1:g.20_40>70_80
LRG_1:g.20_23delAATG
NG_012337.1(NM_003002.2):274G>T
Library
The library provides a number of functions/classes to parse and convert descriptions.
The to_model() function
The to_model() function can be used to convert an HGVS description to a dictionary model.
>>> from mutalyzer_hgvs_parser import to_model
>>> model = to_model('NG_012337.1(SDHD_v001):c.274del')
>>> model['reference']
{'id': 'NG_012337.1', 'selector': {'id': 'SDHD_v001'}}
An alternative start rule for the grammar can be used.
>>> from mutalyzer_hgvs_parser import to_model
>>> model = to_model('274del', 'variant')
>>> model
{'location': {'type': 'point', 'position': 274}, 'type': 'deletion', 'source': 'reference'}
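Since the model is an ordinary dictionary, code can branch on its type key. The sketch below is a hypothetical helper, not part of the package: describe_variant renders a variant model back into a short HGVS-like string, and it covers only the deletion and substitution models with point locations that appear in this documentation.

```python
def describe_variant(variant):
    """Render a point-location variant model as a short HGVS-like string.

    Hypothetical helper: handles only 'deletion' and 'substitution'.
    """
    location = variant["location"]
    if location["type"] != "point":
        raise ValueError("only point locations are handled in this sketch")
    position = location["position"]
    if variant["type"] == "deletion":
        return f"{position}del"
    if variant["type"] == "substitution":
        deleted = variant["deleted"][0]["sequence"]
        inserted = variant["inserted"][0]["sequence"]
        return f"{position}{deleted}>{inserted}"
    raise ValueError(f"unsupported variant type: {variant['type']}")


# Model returned by to_model('274del', 'variant'), copied from above.
deletion = {
    "location": {"type": "point", "position": 274},
    "type": "deletion",
    "source": "reference",
}
print(describe_variant(deletion))  # 274del
```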
The parse() function
The parse() function can be used to check an HGVS description for syntactic correctness. Its output is a lark parse tree.
>>> from mutalyzer_hgvs_parser import parse
>>> parse("LRG_1:100del")
Tree('description', [Tree('reference', [Token('ID', 'LRG_1')]), Tree('variants',
[Tree('variant', [Tree('location', [Tree('point', [Token('NUMBER', '100')])]), Tree('deletion', [])])])])
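A parse tree like the one above can be walked recursively to pull out its tokens. The sketch below is an illustration under stated assumptions: it relies only on the data and children attributes of lark.Tree and the type and value attributes of lark.Token, and it uses namedtuple stand-ins with the same attribute names so that it runs without lark installed. The helper name collect_tokens is hypothetical.

```python
from collections import namedtuple

# Minimal stand-ins mirroring the attributes of lark.Tree (data, children)
# and lark.Token (type, value); real lark objects expose the same names.
Tree = namedtuple("Tree", ["data", "children"])
Token = namedtuple("Token", ["type", "value"])


def collect_tokens(node):
    """Return (type, value) pairs for every token in the tree, depth first."""
    if hasattr(node, "children"):
        pairs = []
        for child in node.children:
            pairs.extend(collect_tokens(child))
        return pairs
    return [(node.type, str(node.value))]


# The tree printed for parse("LRG_1:100del") in the example above.
tree = Tree("description", [
    Tree("reference", [Token("ID", "LRG_1")]),
    Tree("variants", [
        Tree("variant", [
            Tree("location", [Tree("point", [Token("NUMBER", "100")])]),
            Tree("deletion", []),
        ]),
    ]),
])
print(collect_tokens(tree))  # [('ID', 'LRG_1'), ('NUMBER', '100')]
```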
API documentation
Hgvs parser
Module for parsing HGVS variant descriptions.
class mutalyzer_hgvs_parser.hgvs_parser.AmbigTransformer(visit_tokens: bool = True)
Bases: lark.visitors.Transformer
class mutalyzer_hgvs_parser.hgvs_parser.FinalTransformer(visit_tokens: bool = True)
Bases: lark.visitors.Transformer
- variant(children)
- variant_predicted(children)
- variants(children)
class mutalyzer_hgvs_parser.hgvs_parser.HgvsParser(grammar_path=None, start_rule=None, ignore_white_spaces=True)
Bases: object
HGVS parser object.
Parameters:
- grammar_path (str) – Path to a different EBNF grammar file.
- start_rule (str) – Alternative start rule for the grammar.
- ignore_white_spaces (bool) – Whether to ignore white space in the description.
parse(description)
Parse the provided description.
Parameters: description (str) – An HGVS description.
Returns: A parse tree.
Return type: lark.Tree
status()
Print the parser’s status information.
class mutalyzer_hgvs_parser.hgvs_parser.ProteinTransformer(visit_tokens: bool = True)
Bases: lark.visitors.Transformer
- P_COORDINATE_SYSTEM(name)
- extension(children)
- extension_c(children)
- extension_n(children)
- frame_shift(children)
- p_deletion(children)
- p_deletion_insertion(children)
- p_duplication(children)
- p_equal(children)
- p_insert(children)
- p_inserted(children)
- p_insertion(children)
- p_length(children)
- p_location(children)
- p_point(children)
- p_range(children)
- p_repeat(children)
- p_repeat_mixed(children)
- p_repeat_number(children)
- p_substitution(children)
- p_variant(children)
- p_variant_certain(children)
- p_variant_predicted(children)
- p_variants(children)
- p_variants_certain(children)
- p_variants_predicted(children)
mutalyzer_hgvs_parser.hgvs_parser.parse(description, grammar_path=None, start_rule=None)
Parse the provided HGVS description, or a part of a description (e.g., a location or a variants list) if an appropriate alternative start_rule is provided.
Parameters:
- description (str) – Description (or description part) to be parsed.
- grammar_path (str) – Path to a different grammar file.
- start_rule (str) – Alternative start rule for the grammar.
Returns: Parse tree.
Return type: lark.Tree
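When many descriptions need a syntax check, the parser can be wrapped in a small loop. The sketch below is a hypothetical helper, not part of the package. It assumes that parse() raises an exception when a description cannot be parsed; the exception classes are not documented on this page, so the sketch catches Exception broadly. The parser is passed in as an argument, and a toy stand-in is used so the example runs without the package installed.

```python
def check_descriptions(descriptions, parser):
    """Map each description to None if it parses, else to the error text.

    `parser` is any callable that raises an exception on a syntax error,
    for example mutalyzer_hgvs_parser.parse (assumed behaviour).
    """
    results = {}
    for description in descriptions:
        try:
            parser(description)
        except Exception as error:  # exact exception classes are not assumed
            results[description] = str(error)
        else:
            results[description] = None
    return results


def toy_parser(description):
    """Stand-in parser for illustration; it only looks for a ':' separator."""
    if ":" not in description:
        raise ValueError("missing ':' separator")
    return description


print(check_descriptions(["LRG_1:100del", "100del"], toy_parser))
# {'LRG_1:100del': None, '100del': "missing ':' separator"}
```

With the package installed, the same loop would be called as check_descriptions(descriptions, parse) after importing parse from mutalyzer_hgvs_parser.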
Convert
Module for converting HGVS descriptions and lark parse trees to their equivalent dictionary models.
class mutalyzer_hgvs_parser.convert.Converter(visit_tokens: bool = True)
Bases: lark.visitors.Transformer
- AA(name)
- COORDINATE_SYSTEM(name)
- ID(name)
- INVERTED(name)
- OFFSET(name)
- OUTSIDE_CDS(name)
- P_SEQUENCE(name)
- SEQUENCE(name)
- UNKNOWN(name)
- conversion(children)
- deletion(children)
- deletion_insertion(children)
- description(children)
- description_dna(children)
- description_protein(children)
- duplication(children)
- equal(children)
- exact_range(children)
- extension(children)
- frame_shift(children)
- insert(children)
- inserted(children)
- insertion(children)
- inversion(children)
- length(children)
- location(children)
- point(children)
- range(children)
- reference(children)
- repeat(children)
- repeat_mixed(children)
- repeat_number(children)
- substitution(children)
- uncertain_point(children)
- variant(children)
- variant_certain(children)
- variant_predicted(children)
- variants(children)
- variants_predicted(children)
mutalyzer_hgvs_parser.convert.parse_tree_to_model(parse_tree)
Convert a parse tree to a nested dictionary model.
Parameters: parse_tree (lark.Tree) – Parse tree of an HGVS description.
Returns: Description dictionary model.
Return type: dict
mutalyzer_hgvs_parser.convert.to_model(description, start_rule=None)
Convert an HGVS description, or a part of it (e.g., a location or a variants list) if an appropriate alternative start_rule is provided, to a nested dictionary model.
Parameters:
- description (str) – HGVS description.
- start_rule (str) – Alternative start rule.
Returns: Description dictionary model.
Return type: dict
Contributors
Main developers:
- Jeroen F.J. Laros <J.F.J.Laros@lumc.nl> (Author initial version)
- Martijn Vermaat <martijn@vermaat.name> (Author initial version, maintainer 2011 - 2016)
- Mihai Lefter <m.lefter@lumc.nl> (Author current version, maintainer)
Other contributions by:
- Jonathan K. Vis <J.K.Vis@lumc.nl> (Architecture current version)
- Mark Santcroos <m.a.santcroos@lumc.nl> (Bug fix current version)
- Mark Kroon <m.kroon@lumc.nl> (Add feature initial version)
- Gerben Stouten <gstouten@gmail.com>
Find out who contributed:
git shortlog -s -e