Step 2: Simulation using TFdisc

Using the imputed data and the gene regulatory network obtained in the first step, transcription factor (TF) perturbation simulations can be performed. The main steps are:

1. Import libraries

[5]:
# Standard library
import os
import pickle
import warnings

# Third-party packages
import numpy as np
import pandas as pd
import TFdisc as tc

# Silence warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

2. Import data

Here, we load the filtered transcription factor (TF) list and the highly variable gene (HVG) list, along with the imputed data and the gene regulatory network.

[2]:
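# Gene lists: the first column of each CSV holds the gene names
TF_list = list(pd.read_csv("./data/TF_list.csv", index_col=0).iloc[:, 0])
HVG_list = list(pd.read_csv("./data/HVG_list.csv", index_col=0).iloc[:, 0])

# Imputed expression data from Step 1, transposed before use
ave_data = pd.read_csv("./data/ave_data.csv", index_col=0)
ave_data = ave_data.T

# Gene regulatory network from Step 1
with open("./data/alltop.pkl", "rb") as f:
    grn_result = pickle.load(f)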
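
Before training, it can help to confirm that every gene in the two lists is actually present in the expression matrix. The sketch below uses made-up toy data and a hypothetical helper, `missing_genes`, which is not part of TFdisc. It assumes that genes are the columns of the matrix after transposing, which this tutorial does not state explicitly.

```python
import pandas as pd

def missing_genes(expr: pd.DataFrame, genes: list) -> list:
    """Return the entries of `genes` that are not columns of `expr`.

    Hypothetical helper for illustration only; not part of TFdisc.
    """
    present = set(expr.columns)
    return [g for g in genes if g not in present]

# Toy stand-ins for ave_data, TF_list and HVG_list (values are made up).
toy_expr = pd.DataFrame(
    {"Nkx2-1": [1.0, 0.0], "Sox2": [0.5, 2.0], "Sftpc": [3.0, 0.1]},
    index=["cell_1", "cell_2"],
)
toy_tfs = ["Nkx2-1", "Sox2", "Foxa2"]
toy_hvgs = ["Sftpc"]

# Same union of TF and HVG names that is later passed to the training step.
absent = missing_genes(toy_expr, sorted(set(toy_tfs) | set(toy_hvgs)))
print(absent)
```

With the real objects, the equivalent call would be `missing_genes(ave_data, list(set(TF_list) | set(HVG_list)))`, again under the assumption that genes are columns.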

3. Build predictive models

Construct a kernel ridge regression (KRR) model for each gene.

[6]:
tc.train_model.TF_model(ave_data,list(set(TF_list) | set(HVG_list)),
                     grn_result,save = None,method = "krr",verbose = True,
                     test_size=0.1, model_score=False)
Task Progress: 100%|███████████████████| 5644/5644 [2:51:37<00:00,  1.82s/items]
running time = 10297.27405333519
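
For orientation, kernel ridge regression predicts a target value as a kernel-weighted combination of the training samples. The following is a minimal NumPy sketch of that idea on synthetic data, not TFdisc's implementation. The RBF kernel, the `gamma` and `lam` values, the function names, and the 10% hold-out (chosen to mirror `test_size=0.1`) are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(X, y, lam=1e-3, gamma=0.5):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=0.5):
    """Predict new samples as kernel-weighted sums over the training samples."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Synthetic example: two "regulators" drive one "target gene".
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

# Hold out the last 10% of samples for evaluation.
n_test = 20
X_train, y_train = X[:-n_test], y[:-n_test]
X_test, y_test = X[-n_test:], y[-n_test:]

alpha = krr_fit(X_train, y_train)
pred = krr_predict(X_train, alpha, X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(f"hold-out RMSE: {rmse:.4f}")
```

In a per-gene setting, `X` would hold the expression of a gene's candidate regulators and `y` the expression of the gene itself; how TFdisc selects regulators from `grn_result` is not shown here.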

Construct a random forest (RF) model for each gene.

[10]:
tc.train_model.TF_model(ave_data,list(set(TF_list) | set(HVG_list)),
                     grn_result,save = None,method = "rf",verbose = True,
                     test_size=0.1, model_score=False)
Task Progress: 100%|███████████████████| 5644/5644 [8:32:07<00:00,  5.44s/items]
running time = 5.573473691940308

4. In silico TF perturbation simulation

[13]:
pre_data = tc.gen_model.combine_predict(ave_data,TF_list,HVG_list,grn_result,"./krr/",rf_premodel="./rf/",
                    TF="Nkx2-1",krr_time=5,rf_time=1,
                    core=30,matrix_err = 10000,min_matrix_err = 0.01)
[27]:
pre_data.to_csv('./data/Nkx2_1_perturb.csv')
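
As a conceptual illustration of an in silico knockout, the sketch below clamps one TF to zero and repeatedly re-predicts the remaining genes from the current state until the values stop changing. It is a toy under stated assumptions, not a description of what `combine_predict` does internally: the gene names, the hand-written linear "models", and the `max_iter` and `tol` stopping rule are all hypothetical and are not meant to correspond to `krr_time`, `rf_time`, `matrix_err`, or `min_matrix_err`.

```python
def simulate_knockout(state, models, tf, max_iter=10, tol=1e-6):
    """Clamp `tf` to zero, then re-predict every other modelled gene from the
    current state until the largest change between rounds falls below `tol`."""
    current = dict(state)
    current[tf] = 0.0
    for _ in range(max_iter):
        updated = dict(current)
        for gene, predict in models.items():
            if gene != tf:  # the perturbed TF stays clamped at zero
                updated[gene] = predict(current)
        delta = max(abs(updated[g] - current[g]) for g in current)
        current = updated
        if delta < tol:
            break
    return current

# Toy network: TF_A activates gene_B; gene_C depends on both TF_A and gene_B.
toy_state = {"TF_A": 1.0, "gene_B": 0.8, "gene_C": 0.6}
toy_models = {
    "gene_B": lambda s: 0.8 * s["TF_A"],
    "gene_C": lambda s: 0.5 * s["gene_B"] + 0.2 * s["TF_A"],
}

result = simulate_knockout(toy_state, toy_models, tf="TF_A")
print(result)
```

The effect reaches `gene_C` only in the second round, after `gene_B` has been updated, which is why the propagation is iterated rather than applied once.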