{ "cells": [ { "cell_type": "markdown", "id": "e915e75c-62aa-4e6a-891a-66172319df52", "metadata": {}, "source": [ "# Step2: Simulation using TFdisc" ] }, { "cell_type": "markdown", "id": "6cfb8ef6-ff98-4460-9ed0-e3c89bc4457e", "metadata": {}, "source": [ "Using the imputation data and gene regulatory networks obtained in the first step, transcription factor (TF) perturbation simulations can be performed. The main steps include:" ] }, { "cell_type": "markdown", "id": "953eac0c-9169-4666-915e-c0da2a560ec9", "metadata": {}, "source": [ "#### 1. Import library" ] }, { "cell_type": "code", "execution_count": 5, "id": "6181f876-3564-481f-9a3d-2b6a24cb3d8f", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import pickle\n", "import TFdisc as tc\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "import os" ] }, { "cell_type": "markdown", "id": "02e9bc33-522d-47d6-bad1-c4aa77889cdb", "metadata": {}, "source": [ "#### 2. Import data" ] }, { "cell_type": "markdown", "id": "8fa9a292-a024-49c4-ba09-9258aa4000d4", "metadata": {}, "source": [ "Here, we load the filtered transcription factor (TF) and highly variable gene lists,along with the imputation data and gene regulatory network." ] }, { "cell_type": "code", "execution_count": 2, "id": "6b7b523c-7376-4934-8221-714ac4bd27bb", "metadata": {}, "outputs": [], "source": [ "TF_list = list(pd.read_csv(\"./data/TF_list.csv\",index_col=0).iloc[:,0])\n", "HVG_list = list(pd.read_csv(\"./data/HVG_list.csv\",index_col=0).iloc[:,0])\n", "ave_data = pd.read_csv(\"./data/ave_data.csv\",index_col=0)\n", "ave_data = ave_data.T\n", "with open(\"./data/alltop.pkl\", \"rb\") as tf:\n", " grn_result = pickle.load(tf)" ] }, { "cell_type": "markdown", "id": "69fdc4ed-e2b2-4b03-9582-f37f401fda96", "metadata": {}, "source": [ "#### 3. Build predictive models" ] }, { "cell_type": "markdown", "id": "fb7404ee-6c0a-4f6b-b960-7dd64667964f", "metadata": {}, "source": [ "Construct kernel ridge regression model for each gene." ] }, { "cell_type": "code", "execution_count": 6, "id": "03c3368c-5f4b-439e-9c87-4460320aeaf5", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Task Progress: 100%|███████████████████| 5644/5644 [2:51:37<00:00, 1.82s/items]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "running time = 10297.27405333519\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "tc.train_model.TF_model(ave_data,list(set(TF_list) | set(HVG_list)),\n", " grn_result,save = None,method = \"krr\",verbose = True,\n", " test_size=0.1, model_score=False)" ] }, { "cell_type": "markdown", "id": "dfd2e326-2a14-4042-91d5-50eda034ac12", "metadata": {}, "source": [ "Construct Random Forest Model for each gene." ] }, { "cell_type": "code", "execution_count": 10, "id": "c15d3169-6d78-4d3b-a9b5-22677fed78c7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Task Progress: 100%|███████████████████| 5644/5644 [8:32:07<00:00, 5.44s/items]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "running time = 5.573473691940308\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "tc.train_model.TF_model(ave_data,list(set(TF_list) | set(HVG_list)),\n", " grn_result,save = None,method = \"rf\",verbose = True,\n", " test_size=0.1, model_score=False)" ] }, { "cell_type": "markdown", "id": "60242425-4f26-46d8-9954-4132b59525f8", "metadata": {}, "source": [ "#### 4. In silico TF perturbation simulation" ] }, { "cell_type": "code", "execution_count": 13, "id": "181f8bd2-9bb9-4cbd-a879-cd069fb66fe6", "metadata": { "tags": [] }, "outputs": [], "source": [ "pre_data = tc.gen_model.combine_predict(ave_data,TF_list,HVG_list,grn_result,\"./krr/\",rf_premodel=\"./rf/\",\n", " TF=\"Nkx2-1\",krr_time=5,rf_time=1,\n", " core=30,matrix_err = 10000,min_matrix_err = 0.01)" ] }, { "cell_type": "code", "execution_count": 27, "id": "a25eb68c-fcce-42c0-8cf5-034e53a632cf", "metadata": {}, "outputs": [], "source": [ "pre_data.to_csv('./data/Nkx2_1_perturb.csv') " ] }, { "cell_type": "code", "execution_count": null, "id": "0003f5f1-9148-4b77-97b9-c252749d8611", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:TFdisc]", "language": "python", "name": "conda-env-TFdisc-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }