Lectin Gene Family Expansion Mechanisms

Analysis of lectin gene family expansion through tandem and segmental duplication events in plant genomes using MCScanX and TBtools-II.
Author

Beaven Manjengwa

Published

May 1, 2026

Keywords

gene duplication, tandem duplication, segmental duplication, MCScanX, TBtools, plant lectins, gene expansion

Analysis of Lectin Gene Expansion Patterns

Objective(s) of this section

Analyze patterns of lectin gene family expansion through tandem and segmental duplication events to understand the evolutionary mechanisms driving lectin diversity. Expansion patterns differ between lectin families.

Methodology overview

  1. Map lectin genes to chromosomal positions using gene coordinates
  2. Identify and analyze gene duplication patterns: (1) tandem duplications and (2) whole-genome duplication (WGD) or segmental duplications
  3. Calculate duplication statistics and classify expansion mechanisms
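
As an illustration of step 1, the sketch below maps a set of lectin gene IDs to chromosomal positions. It assumes a GFF3-style annotation in which gene records carry an `ID=` attribute in column 9; GTF files use a different attribute syntax and would need a different parser. The function name `map_lectin_genes` and the example records are hypothetical.

```python
from collections import defaultdict


def map_lectin_genes(gff_lines, lectin_ids):
    """Return {chromosome: [(start, end, gene_id), ...]} for lectin genes.

    Assumes GFF3-style lines: tab-separated, feature type in column 3,
    and an ID=<gene_id> attribute in column 9.
    """
    positions = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep gene-level records only
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID")
        if gene_id in lectin_ids:
            positions[cols[0]].append((int(cols[3]), int(cols[4]), gene_id))
    # Sort genes along each chromosome by start coordinate
    return {chrom: sorted(genes) for chrom, genes in positions.items()}


# Hypothetical annotation records, for illustration only
example_gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=LecA;Name=lectinA",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=LecA.1;Parent=LecA",
    "chr1\tsrc\tgene\t5000\t6200\t.\t-\t.\tID=Other1",
    "chr2\tsrc\tgene\t300\t1500\t.\t+\t.\tID=LecB",
    "chr1\tsrc\tgene\t1200\t2000\t.\t+\t.\tID=LecC",
]
lectin_positions = map_lectin_genes(example_gff, {"LecA", "LecB", "LecC"})
```

The sorted per-chromosome lists can then serve as input for locating neighbouring lectin genes.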

Tools and databases

Software tools and databases that can be used in this analysis.
Tool             Tested version   Platform (OS)         Purpose
MCScanX [1]      1.0              Linux                 Duplication classification
TBtools-II [2]   2.466            Windows/Linux/macOS   Integrated toolkit for genomics analysis
PGDD             2.0              Web database          Plant genome duplication database

Analysis Workflows

TBtools-II approach

  1. Prepare input files - genome sequence (.fna) and structural annotation (.gtf/.gff) files
  2. Identify homologous gene pairs using BLASTp (E-value < 1e-10, top 5 matches)
  3. Run MCScanX package in TBtools-II with MCScan and Diamond BLAST dependencies (requires Java environment)
  4. Use MCScanX wrapper and duplicate_gene_classifier to classify duplication events
  5. Analyze resulting *.genetype.tab.xls file for tandem and WGD/segmental duplication types
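
The final step of this workflow can be sketched as a simple tally of duplication types for the lectin genes. The sketch assumes a tab-separated file with a gene ID in the first column and a numeric type code in the second, using the codes written by the standalone MCScanX duplicate_gene_classifier (0 singleton, 1 dispersed, 2 proximal, 3 tandem, 4 WGD/segmental). The layout of the TBtools-II *.genetype.tab.xls file may differ, for example by using text labels, so the parsing should be adapted to the actual file. Function names and example rows are hypothetical.

```python
from collections import Counter

# Type codes as written by MCScanX duplicate_gene_classifier
MCSCANX_TYPES = {
    0: "singleton",
    1: "dispersed",
    2: "proximal",
    3: "tandem",
    4: "WGD/segmental",
}


def tally_duplication_types(lines, lectin_ids):
    """Count duplication types among lectin genes.

    Assumes each line is '<gene_id>\t<numeric code>'; header or
    malformed lines are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        gene_id, code = parts[0], int(parts[1])
        if gene_id in lectin_ids:
            counts[MCSCANX_TYPES.get(code, "unknown")] += 1
    return counts


def as_percentages(counts):
    """Convert counts to percentages of all classified lectin genes."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical classifier output, for illustration only
example_rows = ["Gene\tType", "LecA\t3", "LecB\t3", "LecC\t4", "Other1\t1", "LecD\t0"]
lectins = {"LecA", "LecB", "LecC", "LecD"}
type_counts = tally_duplication_types(example_rows, lectins)
type_percent = as_percentages(type_counts)
```

Reporting both the raw counts and the percentages makes it easier to compare species with very different numbers of lectin genes.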

PGDD approach

  1. Define tandem duplications based on study-specific criteria (e.g., genes of same family on same chromosome with maximum number of intervening genes)
  2. Identify collinear blocks within the plant genome of interest using the embedded MCScanX, then filter the dataset to exclude gene pairs with Ks values > 1.0 to avoid saturation
  3. Search the filtered data for lectin genes to identify segmental duplications
  4. Calculate expansion statistics for each duplication type
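
The criteria in steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the PGDD implementation: it assumes the genes of one chromosome are available as an ordered list of IDs, that collinear gene pairs are available as (gene_a, gene_b, Ks) tuples, and that a segmental duplication within the family means both genes of a pair are lectins. The threshold `max_intervening` is study-specific and must be supplied by the user; all names and example values are hypothetical.

```python
def tandem_clusters(gene_order, lectin_ids, max_intervening):
    """Group lectin genes on one chromosome into tandem clusters.

    gene_order: gene IDs in positional order along the chromosome.
    Two consecutive lectin genes join the same cluster when at most
    max_intervening non-lectin genes lie between them.
    """
    clusters, current, last_idx = [], [], None
    for idx, gene in enumerate(gene_order):
        if gene not in lectin_ids:
            continue
        if last_idx is not None and idx - last_idx - 1 <= max_intervening:
            current.append(gene)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [gene]
        last_idx = idx
    if len(current) > 1:
        clusters.append(current)
    return clusters


def segmental_pairs(collinear_pairs, lectin_ids, max_ks=1.0):
    """Keep collinear pairs of two lectin genes with Ks <= max_ks."""
    return [
        (a, b, ks)
        for a, b, ks in collinear_pairs
        if ks <= max_ks and a in lectin_ids and b in lectin_ids
    ]


# Hypothetical inputs, for illustration only
lectin_ids = {"L1", "L2", "L3"}
chrom_order = ["g1", "L1", "g2", "L2", "g3", "g4", "g5", "L3"]
pairs = [("L1", "L3", 0.42), ("L2", "L9", 0.30), ("L2", "L3", 1.80)]

clusters = tandem_clusters(chrom_order, lectin_ids, max_intervening=1)
segmental = segmental_pairs(pairs, lectin_ids)
```

Rerunning `tandem_clusters` with several values of `max_intervening` gives a quick check of how sensitive the tandem counts are to the chosen threshold.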

Key Insights

  • Tandem duplication is the primary mechanism driving lectin gene family expansion across different species. Cucumber and rice show 76.8% and 62.8% tandem duplications respectively, while Arabidopsis shows tandem duplications spread across six lectin families.
  • However, other species, such as soybean, show no expansion, so no universal pattern can be expected either within families or at the genome-wide level

Key Limitations

  • Gene pairs with high Ks values (>1.0) must be excluded from the analysis because of the risk of saturation
  • Definition of tandem vs segmental duplications depends on arbitrary distance and gene number thresholds

Published Studies

Phaseolus species [3], rice (Oryza sativa) [4], Arabidopsis thaliana [5], cucumber (Cucumis sativus) [6], and soybean (Glycine max) [7]

References

1. Wang, Y. et al. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Research 40, e49 (2012).
2.
3.
4. Tsaneva, M., De Schutter, K., Verstraeten, B. & Van Damme, E. J. M. Lectin sequence distribution in QTLs from rice (Oryza sativa) suggest a role in morphological traits and stress responses. International Journal of Molecular Sciences 20, 437 (2019).
5. Eggermont, L., Verstraeten, B. & Van Damme, E. J. M. Genome-wide screening for lectin motifs in Arabidopsis thaliana. The Plant Genome 10, plantgenome2017.02.0010 (2017).
6. Dang, L. & Van Damme, E. J. M. Genome-wide identification and domain organization of lectin domains in cucumber. Plant Physiology and Biochemistry 108, 165–176 (2016).
7. Van Holle, S. & Van Damme, E. Distribution and evolution of the lectin family in soybean (Glycine max). Molecules 20, 2868–2891 (2015).