Installation

April 1, 2026

ENT3C is a method for quantifying the similarity of micro-C/Hi-C derived chromosomal contact matrices. It is based on the von Neumann entropy [1] and recent work on entropy quantification of Pearson correlation matrices [2]. For a contact matrix, ENT3C records the change in local pattern complexity of smaller Pearson-transformed submatrices along the matrix diagonal to generate a characteristic signal. Similarity between two contact matrices is quantified as the Pearson correlation between their respective entropy signals.

https://doi.org/10.1093/nargab/lqae076

Summary of ENT3C approach

  1. ENT3C loads the cooler files and looks for shared empty bins.

  2. ENT3C then takes the logarithm of the input matrix $\mathbf{M}$.

  3. Next, smaller submatrices $\mathbf{a}$ of dimension $n\times n$ are extracted along the diagonal of the input contact matrix $\mathbf{M}$.

  4. `nan` values in $\mathbf{a}$ are set to the minimum value in $\mathbf{a}$.

  5. $\mathbf{a}$ is transformed into a Pearson correlation matrix $\mathbf{P}$.

  6. $\mathbf{P}$ is transformed into $\boldsymbol{\rho}=\mathbf{P}/n$ to fulfill the conditions for computing the von Neumann entropy.

  7. The von Neumann entropy of $\boldsymbol{\rho}$ is computed as

    $S(\boldsymbol{\rho})=-\sum_j \lambda_j \log \lambda_j$

    where $\lambda_j$ is the $j$-th eigenvalue of $\boldsymbol{\rho}$.

  8. This is repeated for subsequent submatrices along the diagonal of the input matrix and stored in the entropy signal $\mathbf{S}_{M}$.

  9. Similarity $Q$ is defined as the Pearson correlation $r$ between the entropy signals of two matrices: $Q(\mathbf{M}_1,\mathbf{M}_2) = r(\mathbf{S}_{\mathbf{M}_1},\mathbf{S}_{\mathbf{M}_2})$.
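Steps 2 to 9 above can be sketched in a few lines of NumPy. This is a minimal illustration, not the ENT3C implementation: the function names are hypothetical, the shared-empty-bin filtering of step 1 is omitted, and two choices are assumptions of the sketch (treating `log(0)` like a missing value, and treating rows with undefined correlation as uncorrelated). The input matrices are random toy data.

```python
import numpy as np


def von_neumann_entropy(a):
    """Entropy of one n x n submatrix `a` taken from the log-transformed matrix."""
    a = np.array(a, dtype=float)
    # Step 4: nan entries are set to the minimum value of the submatrix.
    a[np.isnan(a)] = np.nanmin(a)
    # Step 5: Pearson correlation between the rows of a.
    with np.errstate(divide="ignore", invalid="ignore"):
        P = np.corrcoef(a)
    # Assumption of this sketch: constant rows (undefined correlation) count as uncorrelated.
    P = np.nan_to_num(P, nan=0.0)
    np.fill_diagonal(P, 1.0)
    # Step 6: rho = P / n has unit trace.
    rho = P / P.shape[0]
    # Step 7: eigenvalues of the symmetric matrix rho; 0 * log(0) is taken as 0.
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))


def entropy_signal(M, n, phi):
    """Steps 2, 3 and 8: slide an n x n window along the diagonal of log(M)."""
    M = np.asarray(M, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        logM = np.log(M)
    # Assumption of this sketch: log(0) = -inf is treated like a missing value.
    logM[~np.isfinite(logM)] = np.nan
    starts = range(0, M.shape[0] - n + 1, phi)
    return np.array([von_neumann_entropy(logM[s:s + n, s:s + n]) for s in starts])


def similarity(M1, M2, n, phi):
    """Step 9: Pearson correlation between the two entropy signals."""
    S1, S2 = entropy_signal(M1, n, phi), entropy_signal(M2, n, phi)
    return float(np.corrcoef(S1, S2)[0, 1])


# Toy symmetric "contact matrices" (random counts, not real Hi-C data).
rng = np.random.default_rng(0)
X = rng.poisson(5.0, size=(40, 40)).astype(float)
Y = rng.poisson(5.0, size=(40, 40)).astype(float)
M1, M2 = X + X.T, Y + Y.T

print(len(entropy_signal(M1, n=10, phi=5)), similarity(M1, M2, n=10, phi=5))
```

Because $\boldsymbol{\rho}$ has unit trace and non-negative eigenvalues, each entropy value lies between 0 (all rows perfectly correlated) and $\log n$ (all rows uncorrelated).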

Explanation of ENT3C

Exemplary depiction of the ENT3C derivation of the entropy signal $\mathbf{S}$ of two contact matrices $\mathbf{M}_1$ and $\mathbf{M}_2$. ENT3C was run with submatrix dimension $n=300$, window shift $\varphi=10$, and maximum number of data points in $\boldsymbol{S}$, $\Phi_{\max}=\infty$, resulting in $\Phi=147$ submatrices. For subsequent scaled Pearson-transformed submatrices, $\boldsymbol{\rho}_i$, along the diagonal of $\log{\boldsymbol{M}}$, ENT3C computes the von Neumann entropies $S(\boldsymbol{\rho}_1), S(\boldsymbol{\rho}_2), \ldots, S(\boldsymbol{\rho}_{\Phi})$. The resulting signal $\mathbf{S} = \langle S(\boldsymbol{\rho}_{1}), S(\boldsymbol{\rho}_{2}), \ldots, S(\boldsymbol{\rho}_{\Phi}) \rangle$ is shown in blue under the matrix. The first two ($\boldsymbol{\rho}_{1-2}$), middle ($\boldsymbol{\rho}_{73}$), and last two submatrices ($\boldsymbol{\rho}_{146-147}$) are shown.

Installation

Python (>=3.11)

  1. Create and activate a Python virtual environment:

    python3.11 -m venv .ent3c_venv
    
    source .ent3c_venv/bin/activate
    
  2. Install ENT3C:

    pip install ENT3C
    

Linux executable

A pre-built Linux executable is available in the Releases section (v.2.2.0).

  • Download the file and make it executable:

    chmod +x ./path/to/exe/ENT3C_exe

  • For global use, add the path to your ~/.bashrc file:

    export PATH="$PATH:/path/to/exe/"

Julia and Matlab versions

  • 💡 Note that the Python package or the executable is recommended.

  • compare_groups is currently not available in the MATLAB and Julia implementations.

  • The Matlab scripts are in the matlab_version_ENT3C directory.

  • The Julia scripts are in julia_version_ENT3C:

    • packages: DataFrames, BenchmarkTools, JSON, Printf, Plots, ColorSchemes, SuiteSparse, HDF5, NaNStatistics, Statistics, Combinatorics, CSV
    • For the Julia implementation, Ubuntu's hdf5-tools is also required.
    • Initial Julia set-up:
      • option for automatic global installation: --install-deps=yes (works with any Julia version).
      • predefined Julia environments for Julia versions 1.10.4 and 1.11.2 are defined in project_files/<v.v.v>/Manifest.toml and project_files/<v.v.v>/Project.toml
      • option to load these environments with --resolve-env=yes and --julia-version=<v.v.v>

Usage

  • CLI (python) usage:

    Usage:
    	ENT3C <command> --config=<path/to/config.json> [options]
    
    	Commands:
            get_entropy        Generates entropy output file <entropy_out_FN>.
            get_similarity     Generates similarity output file <similarity_out_FN> from <entropy_out_FN>.
            run_all            Generates <entropy_out_FN> and <similarity_out_FN>.
            compare_groups     Compare signal groups (requires --group1 and --group2 options)
    
    	Global Options:
            --config=<path>    Path to config JSON file (required for all commands)
    
    	<compare_groups> Options:
        	--group1=<GROUP>        First group name, must correspond to what comes before _BR* in config file.
        	--group2=<GROUP>        Second group name, must correspond to what comes before _BR* in config file.
    
    	Examples:
            ENT3C run_all --config=configs/myconfig.json
            ENT3C get_entropy --config=configs/myconfig.json
            ENT3C get_similarity --config=configs/myconfig.json
            ENT3C compare_groups --config=configs/myconfig.json --group1=H1-hESC --group2=K562
    
  • For the Linux executable, use: ENT3C_exe <command> --config=<path/to/config.json> [options]

  • Alternatively, run ENT3C from Python:

    import ENT3C
    
    ENT3C_OUT = ENT3C.run_get_entropy("config/myconfig.json")
    
    Similarity = ENT3C.run_get_similarity("config/myconfig.json")
    
    ENT3C_OUT, Similarity = ENT3C.run_all("config/myconfig.json")
    
    # group names as they appear before _BR* in the config file
    group1, group2 = "H1-hESC", "K562"
    EUCLIDEAN = ENT3C.run_compare_groups("config/myconfig.json", group1, group2)
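The calls above all read the same JSON config file. As a rough sketch, such a file can be assembled from the keys documented under Input Parameters. Every value below is a placeholder chosen for illustration (including the LNCaP file names and the weights name `weight`), and `group_names` is a hypothetical helper, not part of ENT3C; it only shows which labels would be valid for `--group1`/`--group2` given the `_BR%d` suffix convention.

```python
import json
import re
import tempfile
from pathlib import Path

# Keys follow the parameter list in this README; every value is a placeholder.
config = {
    "DATA_PATH": "DATA",
    "FILES": [
        "ENCSR079VIJ.BioRep1.40kb.cool", "G401_BR1",
        "ENCSR079VIJ.BioRep2.40kb.cool", "G401_BR2",
        "ENCSR346DCU.BioRep1.40kb.cool", "LNCaP_BR1",
        "ENCSR346DCU.BioRep2.40kb.cool", "LNCaP_BR2",
    ],
    "OUT_DIR": "OUTPUT",
    "OUT_PREFIX": "test",
    "Resolution": "40e3",
    "ChrNr": "2,4,6",
    "NormM": 0,
    "WEIGHTS_NAME": "weight",
    "SUB_M_SIZE_FIX": 500,
    "CHRSPLIT": None,  # serialized as null; only one of the two is set
    "phi": 1,
    "PHI_MAX": 1000,
}


def group_names(files):
    """Group labels: the <SHORT_NAME> entries with the _BR<number> suffix removed."""
    short_names = files[1::2]  # FILES alternates file name, short name
    return sorted({re.sub(r"_BR\d+$", "", name) for name in short_names})


# Write the config to a temporary directory so the sketch has no side effects.
config_path = Path(tempfile.mkdtemp()) / "myconfig.json"
config_path.write_text(json.dumps(config, indent=4))

print("config written to", config_path)
print("valid --group1/--group2 values:", group_names(config["FILES"]))
```

The resulting path could then be passed to `ENT3C.run_all(...)` or to `--config=` on the command line; the sketch stops short of calling ENT3C itself.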
    
    

Julia version

  • initial call for global package installation (see "initial julia set-up"):

    julia ENT3C.jl --config-file=config/config.test.json --install-deps=yes
    
  • after initialization:

    julia ENT3C.jl --config-file=config/config.json
    
  • alternatively, load the predefined environments for julia 1.10.4 or 1.11.2:

    julia ENT3C.jl --config-file=config/config.json --resolve-env=yes --julia-version=<v.v.v>
    

💡 Note: the Matlab and Julia implementations always generate both the entropy and the similarity dataframes.

Matlab version:

matlab -nodesktop -nosplash -nodisplay -r "ENT3C('config/config.json'); exit"

Input Parameters

  • All ENT3C parameters are defined in a .json configuration file, e.g. config/config.json. Examples can be found in the config directory.

  • Parameters defined in <config_file>:

    1. The main ENT3C parameter affecting the final entropy signal $S$ is the dimension of the submatrices, SUB_M_SIZE_FIX.

      • "SUB_M_SIZE_FIX": <integer> … fixed submatrix dimension.

        • SUB_M_SIZE_FIX can either be set directly or, alternatively, one can specify CHRSPLIT; in this case SUB_M_SIZE_FIX is computed internally so that the contact matrix is partitioned the desired number of times.

        PHI=1+floor((N-SUB_M_SIZE)./phi)

        where N is the size of the input contact matrix, phi is the window shift, and PHI is the number of evaluated submatrices (and consequently the number of data points in $S$).

      • "CHRSPLIT": <integer> … number of partitions into which an $N \times N$ contact matrix is split, which defines SUB_M_SIZE_FIX = floor(N/CHRSPLIT+0.5). If specified, then "SUB_M_SIZE_FIX": null; otherwise "CHRSPLIT": null.

    2. "DATA_PATH": </path/to/data> … input data path.

    3. input files in format: [<COOL_FILENAME>, <SHORT_NAME>]

      "FILES": [
      	"ENCSR079VIJ.BioRep1.40kb.cool",
      	"G401_BR1",
      	"ENCSR079VIJ.BioRep2.40kb.cool",
      	"G401_BR2"]
      
      • Any biological replicates must be indicated in <SHORT_NAME> using the suffix "_BR%d".

      • Note: ENT3C also takes mcool files as input.

    4. "OUT_DIR": "<desired_output_directory_name>" … output directory. OUT_DIR will be concatenated with OUTPUT/JULIA/ or OUTPUT/MATLAB/.

    5. "OUT_PREFIX": "<desired_output_prefix_>" … prefix for output files.

    6. "Resolution": "<integer,integer,...>", e.g. "40e3,100e3" … resolutions to be evaluated.

    7. "ChrNr": "<integer,integer,...>", e.g. "15,16,17,18,19,20,21,22,X" … chromosome numbers to be evaluated.

    8. "NormM": <0|1> … whether the input contact matrices are balanced. If NormM: 1, the balancing weights stored in the cooler are applied; ENT3C then expects the weights in the dataset /resolutions/<resolution>/bins/<WEIGHTS_NAME>.

    9. "WEIGHTS_NAME": "<name_of_weights>" … name of the dataset in the cooler containing the normalization weights.

    10. "phi": <integer> … window shift, i.e. the number of bins between consecutive submatrices.

    11. "PHI_MAX": <integer> … maximum number of submatrices, i.e. of data points in the entropy signal $S$. If set, $\varphi$ is increased until $\Phi \approx \Phi_{\max}$.
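The relations between N, SUB_M_SIZE_FIX, CHRSPLIT, phi, PHI and PHI_MAX given in this list can be checked with a small sketch. The function names and the matrix size `N = 6002` are hypothetical. The PHI_MAX rule shown here (raise phi one bin at a time until PHI no longer exceeds PHI_MAX) is an assumption about how "increased until $\Phi \approx \Phi_{\max}$" might be realized, not a description of ENT3C's code.

```python
import math


def sub_m_size_from_chrsplit(N, chrsplit):
    """SUB_M_SIZE_FIX = floor(N/CHRSPLIT + 0.5), as given in the parameter list."""
    return math.floor(N / chrsplit + 0.5)


def n_submatrices(N, sub_m_size, phi):
    """PHI = 1 + floor((N - SUB_M_SIZE) / phi)."""
    return 1 + (N - sub_m_size) // phi


def phi_for_phi_max(N, sub_m_size, phi, phi_max):
    """Assumed rule: raise phi one bin at a time until PHI <= PHI_MAX."""
    if phi_max < 1:
        raise ValueError("phi_max must be at least 1")
    while n_submatrices(N, sub_m_size, phi) > phi_max:
        phi += 1
    return phi


N = 6002  # hypothetical number of bins in the contact matrix
n = sub_m_size_from_chrsplit(N, chrsplit=12)
print("SUB_M_SIZE_FIX:", n, "PHI:", n_submatrices(N, n, phi=6))

phi = phi_for_phi_max(N, n, phi=6, phi_max=500)
print("phi after applying PHI_MAX=500:", phi, "PHI:", n_submatrices(N, n, phi))
```

A larger window shift trades resolution of the entropy signal for fewer eigenvalue decompositions, which is why capping PHI is expressed through phi rather than through the submatrix size.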

Output Files

  1. <OUT_DIR>/<OUT_PREFIX>_ENT3C_similarity.csv … contains all pairwise comparisons. The columns Sample1 and Sample2 contain the short names specified in FILES, and the column Q contains the corresponding similarity score. Example, OUTPUT/PYTHON/EvenChromosomes_NoWeights_ENT3C_similarity.csv:

    Resolution	ChrNr	Sample1		Sample2		Q
    40000		2		G401_BR1	G401_BR2	0.9978330002118974
    40000		2		G401_BR1	LNCap_BR1	0.4129094106283695
    40000		2		G401_BR1	LNCap_BR2	0.3049196919642929
    .			.		.			.			.
    .			.		.			.			.
    .			.		.			.			.
    
  2. <OUT_DIR>/<OUT_PREFIX>_ENT3C_OUT.csv … ENT3C output table. Example, OUTPUT/PYTHON/EvenChromosomes_NoWeights_ENT3C_OUT.csv:

    Name		ChrNr	Resolution	n	PHI	phi	binNrStart	binNrEnd	START	END			S
    G401_BR1	2		40000		500	918	6	0			499			0		20000000	3.7896426915562462
    G401_BR1	2		40000		500	918	6	6			505			240000	20240000	3.789044181663418
    G401_BR1	2		40000		500	918	6	12			511			480000	20480000	3.7918253959272032
    .			.		.			.	.	.	.			.			.		.			.
    .			.		.			.	.	.	.			.			.		.			.
    .			.		.			.	.	.	.			.			.		.			.
    

    Each row corresponds to an evaluated submatrix with the fields Name (the short name specified in FILES), ChrNr, Resolution, the submatrix dimension n, PHI=1+floor((N-SUB_M_SIZE)./phi), and the window shift phi. binNrStart and binNrEnd are the start and end bins of the submatrix, START and END are the corresponding genomic coordinates, and S is the computed von Neumann entropy.

    • Example of output generated for ENT3C get_entropy --config=config/myconfig.json:
      • EvenChromosomes_NoWeights_40kb_ENT3C_signals.pdf
      • Unbalanced 40 kb contact matrices for even chromosomes across 5 cell lines; SUB_M_SIZE_FIX was 500:
ENT3C python Output
  3. <OUT_DIR>/<OUT_PREFIX>_Eucl_<group1>vs<group2>.csv … Euclidean distance between the average z-scores of S over <group1> and <group2> (here group1=HFFc6, group2=G401), arranged in descending order of meanS_Euclidean.

    Resolution	ChrNr	START		END			meanS_Euclidean
    40000		6		62360000	82360000	3.3625023926723685
    40000		6		62120000	82120000	3.3546076641065095
    40000		6		61880000	81880000	3.3441925121710026
    .			.		.			.			.
    .			.		.			.			.
    .			.		.			.			.
    
    • Example of the first page of output generated for ENT3C compare_groups --config=config/myconfig.json --group1=HFFc6 --group2=G401
      • EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.pdf
ENT3C python Output
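The binNrStart, binNrEnd, START and END columns of the ENT3C_OUT table follow a simple pattern in the rows shown in this section. The sketch below reproduces those three rows under the assumption that bins are numbered from 0 and that no bins were removed before windowing; since ENT3C looks for shared empty bins first, real coordinates may deviate from this pattern. The function name is hypothetical.

```python
def window_rows(resolution, sub_m_size, phi, n_windows):
    """Yield (binNrStart, binNrEnd, START, END) for consecutive windows.

    Assumes bins are numbered from 0 and that no bins were dropped,
    so a genomic coordinate is simply bin index times resolution.
    """
    for i in range(n_windows):
        bin_start = i * phi
        bin_end = bin_start + sub_m_size - 1
        yield bin_start, bin_end, bin_start * resolution, (bin_end + 1) * resolution


# Parameters taken from the example table: 40 kb bins, n = 500, phi = 6.
rows = list(window_rows(resolution=40_000, sub_m_size=500, phi=6, n_windows=3))
for row in rows:
    print(*row, sep="\t")
```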

Data used in publication

Both Julia and MATLAB implementations (ENT3C.jl and ENT3C.m) were tested on Hi-C and micro-C contact matrices binned at 40 kb in cool format.

micro-C

| Cell line | Biological Replicate (BR) | Accession (Experiment set) | Accession (pairs) |
| --- | --- | --- | --- |
| H1-hESC | 1 | 4DNES21D8SP8 | 4DNFING6ZFD, 4DNFIBMG8YA3, 4DNFIMT4PHZ1, 4DNFI8GM4EL9 |
| H1-hESC | 2 | 4DNES21D8SP8 | 4DNFIIYUGYBU, 4DNFI89L17XY, 4DNFIXP9MVBU, 4DNFI2YHYAJO, 4DNFIULY29IQ |
| HFFc6 | 1 | 4DNESphiT3UBH | 4DNFIN7IIIY6, 4DNFIJZDEIZ3, 4DNFIYBTHGNA, 4DNFIK8UIB5B |
| HFFc6 | 2 | 4DNESphiT3UBH | 4DNFIF5F4HRG, 4DNFIK82YRNM, 4DNFIATCW955, 4DNFIZU6ADT1, 4DNFIKWV6BY2 |
| HFFc6 | 3 | 4DNESphiT3UBH | 4DNFIFJL4JIH, 4DNFIONHB78N, 4DNFIG1ZOVIM, 4DNFIPKVL9YI, 4DNFIJM966UR, 4DNFIV8JNJB8 |

Hi-C

| Cell line | Biological Replicate (BR) | Accession (Experiment set) | Accession (BAM) |
| --- | --- | --- | --- |
| G401 | 1 | ENCSR079VIJ | ENCFF649MAY |
| G401 | 2 | ENCSR079VIJ | ENCFF758WUD |
| LNCaP | 1 | ENCSR346DCU | ENCFF977XHB |
| LNCaP | 2 | ENCSR346DCU | ENCFF204XII |
| A549 | 1 | ENCSR444WCZ | ENCFF867DCM |
| A549 | 2 | ENCSR444WCZ | ENCFF532XBC |

  1. For the Hi-C data, BAM files were downloaded from the ENCODE data portal and converted into pairs files using the pairtools parse function [3]:

    pairtools parse --chroms-path hg38.fa.sizes -o <OUT.pairs.gz> --assembly hg38 --no-flip --add-columns mapq --drop-sam --drop-seq --nproc-in 15 --nproc-out 15 <IN.bam>

  2. For the micro-C data, pairs files of technical replicates (TRs) were merged with pairtools merge. E.g. for H1-hESC, BR1 (4DNES21D8SP8):

    pairtools merge -o <hESC.BR1.pairs.gz> --nproc 10 4DNFING6ZFDF.pairs.gz 4DNFIBMG8YA3.pairs.gz 4DNFIMT4PHZ1.pairs.gz 4DNFI8GM4EL9.pairs.gz

  3. 40 kb coolers were generated from the Hi-C/micro-C pairs files with the cooler cload pairs function [4]:

    cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 --assembly hg38 <CHRSIZE_FILE>:40000 <IN.pairs.gz> <OUT.cool>

References

  1. von Neumann, J. Thermodynamik quantenmechanischer Gesamtheiten. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1927, 273-291.
  2. Felippe, H., et al. Threshold-free estimation of entropy from a Pearson matrix. EPL, 141(3):31003, 2023.
  3. Open2C et al. Pairtools: from sequencing data to chromosome contacts. bioRxiv, 2023.
  4. Abdennur, N., and Mirny, L.A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics, 2020.