Step 2: Simulation using TFdisc

Using the imputed data and the gene regulatory network obtained in the first step, we can simulate transcription factor (TF) perturbations. The main steps are:

1. Import libraries

[5]:

import os
import pickle
import warnings

import numpy as np
import pandas as pd
import TFdisc as tc

warnings.filterwarnings("ignore")

2. Import data

Here, we load the filtered transcription factor (TF) and highly variable gene (HVG) lists, along with the imputed data and the gene regulatory network.

[2]:

# Filtered TF and highly variable gene (HVG) lists
TF_list = list(pd.read_csv("./data/TF_list.csv", index_col=0).iloc[:, 0])
HVG_list = list(pd.read_csv("./data/HVG_list.csv", index_col=0).iloc[:, 0])

# Imputed expression data, transposed after loading
ave_data = pd.read_csv("./data/ave_data.csv", index_col=0).T

# Gene regulatory network from the first step
with open("./data/alltop.pkl", "rb") as f:
    grn_result = pickle.load(f)
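
Before training, it can be worth confirming that every gene in the two lists actually appears in the expression table. The sketch below is not part of TFdisc: `check_inputs` is a hypothetical helper, the toy table stands in for `ave_data`, and it assumes that genes are the columns of the table after the transpose.

```python
import numpy as np
import pandas as pd

def check_inputs(expr, tf_list, hvg_list):
    """Return the genes from either list that are missing from expr's columns.

    Hypothetical helper; assumes genes are the columns of `expr`.
    """
    wanted = set(tf_list) | set(hvg_list)
    return sorted(wanted - set(expr.columns))

# Toy stand-in for ave_data: 4 samples x 3 genes.
rng = np.random.default_rng(0)
toy_expr = pd.DataFrame(rng.random((4, 3)), columns=["Nkx2-1", "GeneA", "GeneB"])

# "GeneC" is requested but absent from the toy table.
toy_missing = check_inputs(toy_expr, ["Nkx2-1"], ["GeneA", "GeneC"])
print(toy_missing)
```

With the real inputs, the equivalent call would be `check_inputs(ave_data, TF_list, HVG_list)`; an empty list means every requested gene has a column.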

3. Build predictive models

Construct a kernel ridge regression (KRR) model for each gene.

[6]:

tc.train_model.TF_model(ave_data, list(set(TF_list) | set(HVG_list)),
                        grn_result, save=None, method="krr", verbose=True,
                        test_size=0.1, model_score=False)

Task Progress: 100%|███████████████████| 5644/5644 [2:51:37<00:00,  1.82s/items]

running time = 10297.27405333519
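
To illustrate what a per-gene kernel ridge regression involves, here is a minimal NumPy sketch that fits one target gene from three regulators and scores it on a 10% hold-out, mirroring `test_size=0.1`. It is not TFdisc's implementation: the RBF kernel, the values of `alpha` and `gamma`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def krr_fit(X, y, alpha=0.1, gamma=1.0):
    """Solve (K + alpha * I) c = y for the dual coefficients c."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + alpha * np.eye(len(X)), y)

def krr_predict(X_train, dual_coef, X_new, gamma=1.0):
    """Predict as a kernel-weighted sum over the training samples."""
    return rbf_kernel(X_new, X_train, gamma) @ dual_coef

# Synthetic data: 200 samples, 3 regulators, one target gene (assumed values).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.01 * rng.standard_normal(200)

# Hold out 10% of the samples for evaluation.
n_test = int(0.1 * len(X))
perm = rng.permutation(len(X))
test_idx, train_idx = perm[:n_test], perm[n_test:]

dual_coef = krr_fit(X[train_idx], y[train_idx])
y_pred = krr_predict(X[train_idx], dual_coef, X[test_idx])

# Coefficient of determination on the held-out samples.
ss_res = ((y[test_idx] - y_pred) ** 2).sum()
ss_tot = ((y[test_idx] - y[test_idx].mean()) ** 2).sum()
krr_r2 = 1.0 - ss_res / ss_tot
print(f"held-out R^2 = {krr_r2:.3f}")
```

Repeating a fit like this once per gene is what makes the training loop above iterate over thousands of items.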

Construct a random forest (RF) model for each gene.

[10]:

tc.train_model.TF_model(ave_data, list(set(TF_list) | set(HVG_list)),
                        grn_result, save=None, method="rf", verbose=True,
                        test_size=0.1, model_score=False)

Task Progress: 100%|███████████████████| 5644/5644 [8:32:07<00:00,  5.44s/items]

running time = 5.573473691940308
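
For comparison, a random forest can capture threshold-like regulator effects that a smooth kernel model may blur. The sketch below uses scikit-learn's `RandomForestRegressor` on synthetic data with a 10% hold-out. Whether TFdisc relies on scikit-learn internally is an assumption here, and the data and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: the target switches on when regulator 0 exceeds 0.5,
# with a weaker linear contribution from regulator 1 (assumed values).
rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = np.where(X[:, 0] > 0.5, 1.0, 0.0) + 0.5 * X[:, 1]

# Hold out 10% of the samples, mirroring test_size=0.1.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_tr, y_tr)

rf_r2 = rf.score(X_te, y_te)            # R^2 on the held-out samples
importances = rf.feature_importances_   # one value per regulator, sums to 1
print(f"held-out R^2 = {rf_r2:.3f}")
print("most important regulator index:", int(np.argmax(importances)))
```

The feature importances give a rough view of which regulators drive each target, which can be compared against the edges in the gene regulatory network.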

4. In silico TF perturbation simulation

[13]:

pre_data = tc.gen_model.combine_predict(
    ave_data, TF_list, HVG_list, grn_result, "./krr/", rf_premodel="./rf/",
    TF="Nkx2-1", krr_time=5, rf_time=1,
    core=30, matrix_err=10000, min_matrix_err=0.01)
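
The general idea of an in silico perturbation can be sketched on a toy network: fix the TF at a new value, then repeatedly re-predict each target from its regulators so the change spreads downstream. This is a conceptual sketch only, not the algorithm behind `combine_predict`. The function `propagate_knockout`, its `n_iter` parameter, the dict-shaped network, and the lambda "models" are all hypothetical.

```python
import pandas as pd

def propagate_knockout(expr, grn, models, tf, n_iter=3):
    """Set `tf` to zero and propagate the change through the network.

    expr   : DataFrame, samples x genes
    grn    : dict mapping each target gene to its list of regulators (assumed shape)
    models : dict mapping each target gene to a callable that predicts it
    """
    sim = expr.copy()
    sim[tf] = 0.0  # the knockout itself
    for _ in range(n_iter):
        updated = sim.copy()
        for target, regulators in grn.items():
            if target == tf:
                continue  # keep the perturbed TF fixed
            # Predict from the previous iteration's values (synchronous update).
            updated[target] = models[target](sim[regulators])
        sim = updated
    return sim

# Toy chain: Nkx2-1 -> GeneA -> GeneB, with hand-written linear "models".
toy_expr = pd.DataFrame({"Nkx2-1": [1.0, 2.0], "GeneA": [2.0, 4.0], "GeneB": [1.0, 2.0]})
toy_grn = {"GeneA": ["Nkx2-1"], "GeneB": ["GeneA"]}
toy_models = {
    "GeneA": lambda df: 2.0 * df["Nkx2-1"],
    "GeneB": lambda df: 0.5 * df["GeneA"],
}

one_step = propagate_knockout(toy_expr, toy_grn, toy_models, tf="Nkx2-1", n_iter=1)
two_step = propagate_knockout(toy_expr, toy_grn, toy_models, tf="Nkx2-1", n_iter=2)
print(one_step)
print(two_step)
```

After one iteration only the direct target GeneA has responded; GeneB, two edges away, changes on the second pass. This is why simulations of this kind typically iterate more than once.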

[27]:

pre_data.to_csv('./data/Nkx2_1_perturb.csv')
