{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e915e75c-62aa-4e6a-891a-66172319df52",
   "metadata": {},
   "source": [
    "# Step1: Get the input data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "187ab6de-bbb5-47c5-a613-a817ed88b322",
   "metadata": {},
   "source": [
    "For TFdisc, it requires only a gene expression matrix and a list of transcription factors for the corresponding species.\n",
    "To speed up the analysis, 5000 highly variable genes and transcription factors were selected for subsequent analysis.\n",
    "Based on this, the corresponding gene regulatory network was constructed\n",
    "Due to the sparsity of gene expression data, we will conduct data denoising process on the \n",
    "gene expression data and use the imputation data as the input of TFdisc.The main steps include:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc4edcd8-49d1-44e7-854e-27fa1fbd14f6",
   "metadata": {},
   "source": [
    "#### 1. Import library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6181f876-3564-481f-9a3d-2b6a24cb3d8f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import pickle\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "import TFdisc as tc\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fd02479-ef6b-4506-8c89-274af9b0d7f1",
   "metadata": {},
   "source": [
    "#### 2. Import data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5674ed05-75c3-4b7b-9baa-ea5ecc296567",
   "metadata": {},
   "source": [
    "Here, we load the raw counts gene expression matrix extracted from Seurat or Scanpy, along with the transcription factor (TF) and highly variable gene lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6b7b523c-7376-4934-8221-714ac4bd27bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "wt_data = pd.read_csv(\"./data/wt_data.csv\",index_col=0)\n",
    "wt_data = wt_data.T\n",
    "TF_list = list(pd.read_csv(\"./data/TF_list.csv\",index_col=0).iloc[:,0])\n",
    "HVG_list = list(pd.read_csv(\"./data/HVG_list.csv\",index_col=0).iloc[:,0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24332377-416b-4c48-917e-5f7aa45b4dcb",
   "metadata": {},
   "source": [
    "#### 3. Construct the gene regulatory network"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ec59f7ad-6a21-45c1-9208-2e0b466c4f17",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "running time :  273.91751742362976\n"
     ]
    }
   ],
   "source": [
    "grn_result = TFdisc.grn.TF_grn(wt_data,TF_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "058c2b07-e287-4c9a-9e3c-943d021533e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Save the top 50 connections in the regulatory network.\n",
    "with open(\"./data/alltop.pkl\", \"wb\") as tf:\n",
    "    pickle.dump(a,tf)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da3af5c3-2fad-4fa3-9709-0b6768dca94e",
   "metadata": {},
   "source": [
    "#### 4. Data denoising process"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbafde66-a401-482f-a524-b1ffeecfc46a",
   "metadata": {},
   "source": [
    "In this TFdisc package, we provide multiple methods to remove systematic technical noise, allowing for different choices based on your needs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbddd1de-10ab-44ea-a56c-92db80a1032c",
   "metadata": {},
   "source": [
    "If you are using a Seurat object, you can use the SAVER package for denoising, or you can apply KNN denoising."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8bbcfed8-e497-40bd-ad1a-11a7cfc0ad79",
   "metadata": {},
   "outputs": [],
   "source": [
    "# knn\n",
    "imp = seurat@graphs$RNA_nn %*% t(seurat@assays$RNA@counts)\n",
    "row_sums <- rowSums(seurat@graphs$RNA_nn)\n",
    "imp <- imp / row_sums\n",
    "# saver\n",
    "imp <- saver(seurat@assays$RNA@data, size.factor = 1,estimates.only = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3939551d-166a-4061-b131-6c4f45d7de92",
   "metadata": {},
   "source": [
    "If you are only using an expression matrix, you can use the denoising methods available within the TFdisc package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "1e666541-ab11-4055-bab0-4adcea1dc555",
   "metadata": {},
   "outputs": [],
   "source": [
    "# saver\n",
    "imp=imputation.imp_SAVER(wt_data,20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "80c14c65-3931-456d-938b-8ca04f92675a",
   "metadata": {},
   "outputs": [],
   "source": [
    "imp.to_csv('./data/ave_data.csv') "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:TFdisc]",
   "language": "python",
   "name": "conda-env-TFdisc-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}