Mutalyzer HGVS Parser

A package for syntax checking Mutalyzer HGVS variant descriptions and converting them into a dictionary model, so that the information in a description can be accessed programmatically.

Features:

  • Accepts HGVS descriptions with multiple variants (one HGVS allele).
  • Any description sub-part can be parsed and converted as well.
  • Supports common deviations from the HGVS guidelines.
  • Command line and library interfaces available.

Quick start

Parse and convert a description from the command line:

$ mutalyzer_hgvs_parser -c "NG_012337.1:c.20del"
{
  "reference": {
    "id": "NG_012337.1"
  },
  "coordinate_system": "c",
  "variants": [
    {
      "location": {
        "type": "point",
        "position": 20
      },
      "type": "deletion",
      "source": "reference"
    }
  ]
}

The to_model() function can be used for the same purpose:

>>> from mutalyzer_hgvs_parser import to_model
>>> model = to_model("NG_012337.1:c.20del")
>>> model['reference']
{'id': 'NG_012337.1'}
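Since the model is a plain nested dictionary, it can be traversed with ordinary dictionary and list operations. The following is a minimal sketch only: the dictionary is copied by hand from the command line output above so that the snippet runs without the package installed, and summarize() is a hypothetical helper, not part of the library.

```python
# Copied from the command line output above; in practice this would be the
# result of to_model("NG_012337.1:c.20del").
model = {
    "reference": {"id": "NG_012337.1"},
    "coordinate_system": "c",
    "variants": [
        {
            "location": {"type": "point", "position": 20},
            "type": "deletion",
            "source": "reference",
        }
    ],
}


def summarize(model):
    """Return one short line per variant in a description model (hypothetical helper)."""
    reference_id = model["reference"]["id"]
    lines = []
    for variant in model.get("variants", []):
        location = variant["location"]
        # Only point locations carry a top-level "position" in the examples
        # shown here; anything else is reported with a placeholder.
        position = location.get("position", "?")
        lines.append(
            f"{reference_id}: {variant['type']} at {location['type']} {position}"
        )
    return lines


print(summarize(model))
```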

Please see ReadTheDocs for the latest documentation.

Installation

The software is distributed via PyPI and can be installed with pip:

pip install mutalyzer-hgvs-parser

From source

The source is hosted on GitHub. To install the latest development version, use the following commands:

git clone https://github.com/mutalyzer/hgvs-parser.git
cd hgvs-parser
pip install .

Usage

This package provides a command line interface.

Syntax check

To check only whether a description can be parsed successfully:

$ mutalyzer_hgvs_parser 'NG_012337.1(SDHD_v001):c.274G>T'
Successfully parsed:
 NG_012337.1(SDHD_v001):c.274G>T

Description model

To obtain the model of a description, add the -c flag.

$ mutalyzer_hgvs_parser -c 'NG_012337.1(SDHD_v001):c.274G>T'
{
  "reference": {
    "id": "NG_012337.1",
    "selector": {
      "id": "SDHD_v001"
    }
  },
  "coordinate_system": "c",
  "variants": [
    {
      "type": "substitution",
      "source": "reference",
      "location": {
        "type": "point",
        "position": 274
      },
      "deleted": [
        {
          "source": "description",
          "sequence": "G"
        }
      ],
      "inserted": [
        {
          "source": "description",
          "sequence": "T"
        }
      ]
    }
  ]
}
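Because the -c output is JSON, it can be loaded with the standard json module and inspected. The sketch below uses the output shown above (reformatted compactly) and rebuilds the variant text from the deleted and inserted sequences. describe_substitution() is a hypothetical helper written for this illustration; it handles point substitutions only and is not how the package itself produces descriptions.

```python
import json

# The JSON printed by the -c example above, reformatted compactly.
cli_output = """
{
  "reference": {"id": "NG_012337.1", "selector": {"id": "SDHD_v001"}},
  "coordinate_system": "c",
  "variants": [
    {
      "type": "substitution",
      "source": "reference",
      "location": {"type": "point", "position": 274},
      "deleted": [{"source": "description", "sequence": "G"}],
      "inserted": [{"source": "description", "sequence": "T"}]
    }
  ]
}
"""


def describe_substitution(variant):
    """Rebuild 'POSITION<deleted>><inserted>' for a point substitution."""
    if variant["type"] != "substitution" or variant["location"]["type"] != "point":
        raise ValueError("this sketch only handles point substitutions")
    deleted = "".join(part["sequence"] for part in variant["deleted"])
    inserted = "".join(part["sequence"] for part in variant["inserted"])
    return f"{variant['location']['position']}{deleted}>{inserted}"


model = json.loads(cli_output)
print([describe_substitution(variant) for variant in model["variants"]])
```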

Grammar start rule

By default, the Mutalyzer grammar is used, with description as the start (top) rule. A different start rule can be chosen with the -r option.

$ mutalyzer_hgvs_parser -r variant '274G>T'
Successfully parsed:
 274G>T

The -c flag can be combined with a different start rule.

$ mutalyzer_hgvs_parser -c -r variant '274G>T'
{
  "location": {
    "type": "point",
    "position": 274
  },
  "type": "substitution",
  "source": "reference",
  "deleted": [
    {
      "sequence": "G",
      "source": "description"
    }
  ],
  "inserted": [
    {
      "sequence": "T",
      "source": "description"
    }
  ]
}

Parse tree representation

If pydot is installed, an image of the lark parse tree can be obtained with the -i option.

$ mutalyzer_hgvs_parser "274del" -r variant -i tree.png
Successfully parsed:
 274del
Parse tree image saved to:
 tree.png
[Image: parse tree representation.]
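Since the -i option depends on pydot, it can be useful to check for the package before asking for an image. A minimal sketch using only the standard library follows; pydot_available() is a hypothetical helper name.

```python
import importlib.util


def pydot_available():
    """Return True if the pydot package can be found in this environment."""
    return importlib.util.find_spec("pydot") is not None


if pydot_available():
    print("pydot found: the -i option should be usable")
else:
    print("pydot not found: install it before using -i")
```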

Command Line Interface

Mutalyzer HGVS variant description parser.

usage: mutalyzer_hgvs_parser [-h] [-c] [-r R] [-g G] [-p] [-i I] [-v]
                             description

Positional Arguments

description the HGVS variant description to be parsed

Named Arguments

-c convert the description to the model (default: False)
-r alternative start (top) rule for the grammar
-g alternative input grammar file path (do not use with -c)
-p raw parse tree (no ambiguity solving) (default: False)
-i save the parse tree as a PNG image (pydot required!)
-v show program’s version number and exit

Copyright (c) Mihai Lefter <M.Lefter@lumc.nl>
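When calling the tool from a script, the options listed above can be assembled into an argument list. The sketch below is an illustration under stated assumptions: build_command() is a hypothetical helper, and only the -c, -r and -i options documented above are covered.

```python
import shutil


def build_command(description, convert=False, start_rule=None, image_path=None):
    """Assemble an argument list for mutalyzer_hgvs_parser (hypothetical helper)."""
    command = ["mutalyzer_hgvs_parser"]
    if convert:
        command.append("-c")  # convert the description to the model
    if start_rule is not None:
        command += ["-r", start_rule]  # alternative start (top) rule
    if image_path is not None:
        command += ["-i", image_path]  # save the parse tree as a PNG image
    command.append(description)
    return command


command = build_command("274G>T", convert=True, start_rule="variant")
print(command)

# Report whether the executable is on the PATH before trying to run it.
if shutil.which(command[0]) is None:
    print("mutalyzer_hgvs_parser is not installed in this environment")
```

The resulting list can be passed to subprocess.run(command, capture_output=True, text=True). When -c is used, the captured standard output should be the JSON model shown in the examples above.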

Grammar

The derived EBNF grammar does not cover all HGVS nomenclature recommendations. Currently, the focus is mostly on descriptions at the DNA level. Examples of unsupported descriptions:

  • LRG_199t1:c.[2376G>C];[3103del]
  • LRG_199t1:c.2376G>C(;)3103del
  • NC_000002.12:g.pter_8247756delins[NC_000011.10:g.pter_15825272]
  • NC_000009.12:g.pter_26393001delins102425452_qterinv
  • NC_000011.10:g.1999904_1999946|gom

At the same time, the grammar accepts descriptions that are not HGVS compliant but are still interpretable, in order to help users reach a normalized description. Examples:

  • LRG_1:g.20>T
  • LRG_1:g.20_40>70_80
  • LRG_1:g.20_23delAATG
  • NG_012337.1(NM_003002.2):274G>T
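To find out which descriptions in a batch the grammar accepts, each one can be passed to a converter and any failure recorded. The sketch below is hedged: it assumes that a rejected description raises an exception (the exact exception classes are not assumed, so Exception is caught), and it uses a toy stand-in converter so that it runs without the package. check_descriptions() and toy_converter() are hypothetical names.

```python
def check_descriptions(descriptions, converter):
    """Split descriptions into accepted and rejected using the given converter."""
    accepted, rejected = {}, {}
    for description in descriptions:
        try:
            accepted[description] = converter(description)
        except Exception as error:  # exact exception classes are not assumed
            rejected[description] = str(error)
    return accepted, rejected


def toy_converter(description):
    # Stand-in for to_model, used only so this sketch runs on its own:
    # it rejects anything containing "[" and accepts everything else.
    if "[" in description:
        raise ValueError("unsupported syntax")
    return {"input": description}


accepted, rejected = check_descriptions(
    ["LRG_1:g.20>T", "LRG_199t1:c.[2376G>C];[3103del]"], toy_converter
)
print(sorted(accepted), sorted(rejected))
```

With the package installed, to_model can be passed as the converter in place of the stand-in.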

Library

The library provides functions and classes for parsing and converting descriptions.

The to_model() function

The to_model() function can be used to convert an HGVS description to a dictionary model.

>>> from mutalyzer_hgvs_parser import to_model
>>> model = to_model('NG_012337.1(SDHD_v001):c.274del')
>>> model['reference']
{'id': 'NG_012337.1', 'selector': {'id': 'SDHD_v001'}}

An alternative start rule for the grammar can be used.

>>> from mutalyzer_hgvs_parser import to_model
>>> model = to_model('274del', 'variant')
>>> model
{'location': {'type': 'point', 'position': 274}, 'type': 'deletion', 'source': 'reference'}

The parse() function

The parse() function can be used to check an HGVS description for syntactic correctness. Its output is a lark parse tree.

>>> from mutalyzer_hgvs_parser import parse
>>> parse("LRG_1:100del")
Tree('description', [Tree('reference', [Token('ID', 'LRG_1')]), Tree('variants',
[Tree('variant', [Tree('location', [Tree('point', [Token('NUMBER', '100')])]), Tree('deletion', [])])])])
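A parse tree like the one above can be walked recursively to collect its tokens. The sketch below relies only on lark trees exposing a children list and lark tokens being strings with a type attribute. So that it runs without lark installed, it rebuilds the tree above from two small stand-in classes, FakeTree and FakeToken, which are hypothetical and exist only for this illustration.

```python
def collect_tokens(node):
    """Return (type, value) pairs for every token below a lark-style node."""
    if hasattr(node, "children"):
        # Tree-like node: visit the children in order.
        pairs = []
        for child in node.children:
            pairs.extend(collect_tokens(child))
        return pairs
    # Leaf: a token is a string that carries a `type` attribute.
    return [(getattr(node, "type", None), str(node))]


class FakeToken(str):
    """Stand-in for a lark token: a str subclass with a `type` attribute."""

    def __new__(cls, type_, value):
        token = super().__new__(cls, value)
        token.type = type_
        return token


class FakeTree:
    """Stand-in for a lark tree: a rule name plus a list of children."""

    def __init__(self, data, children):
        self.data = data
        self.children = children


# Same shape as the output of parse("LRG_1:100del") shown above.
tree = FakeTree("description", [
    FakeTree("reference", [FakeToken("ID", "LRG_1")]),
    FakeTree("variants", [
        FakeTree("variant", [
            FakeTree("location", [FakeTree("point", [FakeToken("NUMBER", "100")])]),
            FakeTree("deletion", []),
        ]),
    ]),
])

print(collect_tokens(tree))
```

With the package installed, the tree returned by parse() can be passed to collect_tokens() directly.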

API documentation

Hgvs parser

Module for parsing HGVS variant descriptions.

class mutalyzer_hgvs_parser.hgvs_parser.AmbigTransformer(visit_tokens: bool = True)

Bases: lark.visitors.Transformer

class mutalyzer_hgvs_parser.hgvs_parser.FinalTransformer(visit_tokens: bool = True)

Bases: lark.visitors.Transformer

variant(children)
variant_predicted(children)
variants(children)
class mutalyzer_hgvs_parser.hgvs_parser.HgvsParser(grammar_path=None, start_rule=None, ignore_white_spaces=True)

Bases: object

HGVS parser object.

Parameters:
  • grammar_path (str) – Path to a different EBNF grammar file.
  • start_rule (str) – Alternative start rule for the grammar.
  • ignore_white_spaces (bool) – Whether to ignore white spaces in the description.
parse(description)

Parse the provided description.

Parameters: description (str) – An HGVS description.
Returns: A parse tree.
Return type: lark.Tree
status()

Print parser’s status information.

class mutalyzer_hgvs_parser.hgvs_parser.ProteinTransformer(visit_tokens: bool = True)

Bases: lark.visitors.Transformer

P_COORDINATE_SYSTEM(name)
extension(children)
extension_c(children)
extension_n(children)
frame_shift(children)
p_deletion(children)
p_deletion_insertion(children)
p_duplication(children)
p_equal(children)
p_insert(children)
p_inserted(children)
p_insertion(children)
p_length(children)
p_location(children)
p_point(children)
p_range(children)
p_repeat(children)
p_repeat_mixed(children)
p_repeat_number(children)
p_substitution(children)
p_variant(children)
p_variant_certain(children)
p_variant_predicted(children)
p_variants(children)
p_variants_certain(children)
p_variants_predicted(children)
mutalyzer_hgvs_parser.hgvs_parser.parse(description, grammar_path=None, start_rule=None)

Parse the provided HGVS description, or the description part, e.g., a location, a variants list, etc., if an appropriate alternative start_rule is provided.

Parameters:
  • description (str) – Description (or description part) to be parsed.
  • grammar_path (str) – Path to a different grammar file.
  • start_rule (str) – Alternative start rule for the grammar.
Returns: Parse tree.
Return type: lark.Tree

Convert

Module for converting HGVS descriptions and lark parse trees to their equivalent dictionary models.

class mutalyzer_hgvs_parser.convert.Converter(visit_tokens: bool = True)

Bases: lark.visitors.Transformer

AA(name)
COORDINATE_SYSTEM(name)
ID(name)
INVERTED(name)
OFFSET(name)
OUTSIDE_CDS(name)
P_SEQUENCE(name)
SEQUENCE(name)
UNKNOWN(name)
conversion(children)
deletion(children)
deletion_insertion(children)
description(children)
description_dna(children)
description_protein(children)
duplication(children)
equal(children)
exact_range(children)
extension(children)
frame_shift(children)
insert(children)
inserted(children)
insertion(children)
inversion(children)
length(children)
location(children)
point(children)
range(children)
reference(children)
repeat(children)
repeat_mixed(children)
repeat_number(children)
substitution(children)
uncertain_point(children)
variant(children)
variant_certain(children)
variant_predicted(children)
variants(children)
variants_predicted(children)
mutalyzer_hgvs_parser.convert.parse_tree_to_model(parse_tree)

Convert a parse tree to a nested dictionary model.

Parameters: parse_tree (lark.Tree) – Parse tree of an HGVS description.
Returns: Description dictionary model.
Return type: dict
mutalyzer_hgvs_parser.convert.to_model(description, start_rule=None)

Convert an HGVS description, or parts of it, e.g., a location, a variants list, etc., if an appropriate alternative start_rule is provided, to a nested dictionary model.

Parameters:
  • description (str) – HGVS description.
  • start_rule (str) – Alternative start rule.
Returns: Description dictionary model.
Return type: dict

Contributors

Main developers:

Other contributions by:

Find out who contributed:

git shortlog -s -e