Step 1: Get the input data

TFdisc requires only a gene expression matrix and a list of transcription factors for the corresponding species. To speed up the analysis, 5000 highly variable genes and the transcription factors were selected, and the gene regulatory network was constructed from them. Because gene expression data are sparse, we also denoise the expression data and use the imputed data as the input to TFdisc. The main steps are:

1. Import libraries

[1]:
import pandas as pd
import numpy as np
import pickle
import warnings
warnings.filterwarnings("ignore")
import TFdisc as tc
import os

2. Import data

Here, we load the raw count gene expression matrix exported from Seurat or Scanpy, along with the transcription factor (TF) and highly variable gene (HVG) lists.

[2]:
wt_data = pd.read_csv("./data/wt_data.csv",index_col=0)
wt_data = wt_data.T
TF_list = list(pd.read_csv("./data/TF_list.csv",index_col=0).iloc[:,0])
HVG_list = list(pd.read_csv("./data/HVG_list.csv",index_col=0).iloc[:,0])
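
The introduction says the analysis is restricted to the highly variable genes plus the transcription factors, but the loading cell does not show that step. The following is a minimal sketch of one way to do it, assuming the matrix is cells x genes after the transpose and that gene names are the column labels. The toy data frame and the helper name `subset_genes` are illustrative and not part of TFdisc.

```python
import pandas as pd

def subset_genes(expr, hvg_list, tf_list):
    """Keep only the columns (genes) that are HVGs or TFs and present in expr."""
    wanted = set(hvg_list) | set(tf_list)
    # Iterate over expr.columns so the original gene order is preserved
    keep = [g for g in expr.columns if g in wanted]
    return expr.loc[:, keep]

# Toy cells x genes matrix standing in for wt_data (hypothetical values)
toy = pd.DataFrame(
    [[1, 0, 3, 2], [0, 5, 1, 0]],
    index=["cell1", "cell2"],
    columns=["Gata1", "Actb", "Sox2", "Hbb"],
)
toy_sub = subset_genes(toy, hvg_list=["Hbb", "Sox2"], tf_list=["Gata1", "Sox2", "Klf1"])
print(list(toy_sub.columns))
```

Genes that appear in a list but not in the matrix (here `Klf1`) are silently skipped, which may or may not be the behaviour you want for your own data.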

3. Construct the gene regulatory network

[7]:
grn_result = tc.grn.TF_grn(wt_data, TF_list)
running time :  273.91751742362976
[8]:
# Save the top 50 connections in the regulatory network.
with open("./data/alltop.pkl", "wb") as fh:
    pickle.dump(grn_result, fh)
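
To reuse a saved network in a later session, the pickle file can be read back with the standard library. This is a minimal round-trip sketch. The dictionary is only a placeholder, since the structure of the object returned by the network construction step is not shown in this tutorial, and a temporary directory is used instead of `./data`.

```python
import os
import pickle
import tempfile

# Placeholder object; the real structure of the GRN result is an assumption here
example_result = {"Gata1": ["Hbb", "Klf1"]}

# Write to a temporary location so the sketch does not touch ./data
path = os.path.join(tempfile.mkdtemp(), "alltop.pkl")
with open(path, "wb") as fh:
    pickle.dump(example_result, fh)

# Read the object back in binary mode
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```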

4. Data denoising process

The TFdisc package provides several methods for removing systematic technical noise, so you can choose the one that suits your needs.

If you are using a Seurat object, you can use the SAVER package for denoising, or you can apply KNN denoising.

[ ]:
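# R code; choose one of the two options below
# Option 1: KNN, average raw counts over each cell's nearest neighbours
imp <- seurat@graphs$RNA_nn %*% t(seurat@assays$RNA@counts)
row_sums <- rowSums(seurat@graphs$RNA_nn)
imp <- imp / row_sums
# Option 2: SAVER (requires library(SAVER))
imp <- saver(seurat@assays$RNA@data, size.factor = 1, estimates.only = TRUE)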
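
For readers working in Python, the KNN averaging in the R cell can be sketched with NumPy. This is an illustration of the same arithmetic (neighbour sum divided by neighbour count), not a TFdisc function. It assumes a cells x genes count matrix and a cells x cells 0/1 adjacency matrix in which each cell has at least one neighbour, typically including itself. The name `knn_average` and the toy values are hypothetical.

```python
import numpy as np

def knn_average(counts, adjacency):
    """Average each cell's counts over its neighbours.

    counts: cells x genes array; adjacency: cells x cells 0/1 matrix.
    """
    counts = np.asarray(counts, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    # Number of neighbours per cell, kept as a column for row-wise division
    row_sums = adjacency.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every cell needs at least one neighbour")
    # Sum counts over neighbours, then divide by the neighbour count
    return (adjacency @ counts) / row_sums

# Toy example: 3 cells, 2 genes; each cell is its own neighbour
counts = np.array([[2.0, 0.0], [4.0, 2.0], [0.0, 4.0]])
adjacency = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
smoothed = knn_average(counts, adjacency)
print(smoothed)
```

In this toy case the first cell becomes the mean of cells 1 and 2, giving `[3, 1]`. For real data a sparse adjacency matrix would be preferable to a dense one.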

If you are only using an expression matrix, you can use the denoising methods available within the TFdisc package.

[9]:
# saver
# Assumes imputation is a TFdisc submodule, accessed via the tc alias like grn
imp = tc.imputation.imp_SAVER(wt_data, 20)
[10]:
imp.to_csv('./data/ave_data.csv')