Batch Analysis¶
Setup¶
%reload_ext autoreload
%autoreload 2
from neuro_py.process import batch_analysis
import pandas as pd
import numpy as np
Section 1: Define the analysis¶
Here, I'm defining the analysis in the notebook, but in a real project, you would define it in a separate .py file and import it here.
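For example, the import might look something like this (the module path my_project.analyses is a hypothetical placeholder):
# hypothetical: toy_analysis lives in my_project/analyses.py
from my_project.analyses import toy_analysis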
def toy_analysis(basepath, parameter_1=1, parameter_2=2):
    results = pd.DataFrame()
    results["basepath"] = [basepath]
    results["parameter_1"] = parameter_1
    results["parameter_2"] = parameter_2
    results["random_number"] = np.random.randint(0, 100)
    return results
For your project, you will have a .csv file with the basepaths you want to analyze. Here, I'm creating a DataFrame with the basepaths for the purpose of this notebook.
sessions = pd.DataFrame(dict(basepath=[
    r"U:\data\hpc_ctx_project\HP01\day_1_20240227",
    r"U:\data\hpc_ctx_project\HP01\day_2_20240228",
    r"U:\data\hpc_ctx_project\HP01\day_3_20240229",
]))
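In a real project, you would load these from disk instead; a minimal sketch, assuming a CSV with a basepath column (the file name and location are placeholders):
# hypothetical: read the session list from a CSV with a "basepath" column
sessions = pd.read_csv(r"U:\data\hpc_ctx_project\sessions.csv")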
You will need to define the path where you want to save the results of your analysis. It's useful to nest the analysis version in a subfolder (toy_analysis\toy_analysis_v1) to keep track of the different versions of your analysis.
save_path = r"Z:\home\ryanh\projects\hpc_ctx\toy_analysis\toy_analysis_v1"
Section 2: Run the analysis¶
Finally, you can run your analysis in batch mode. This will loop through the basepaths and save the results in the specified folder.
The batch_analysis function is a general-purpose runner: you pass it the function you want to run, the basepaths you want to analyze, and the save path. If your analysis fails partway through, running it again will start from where it left off.
There is a parallel option that you can set to True to run the analysis in parallel. This will speed up the analysis if you have multiple cores.
batch_analysis.run(
    sessions,
    save_path,
    toy_analysis,
    parallel=False,
    verbose=True,
)
100%|██████████| 3/3 [00:00<00:00, 3007.39it/s]
U:\data\hpc_ctx_project\HP01\day_1_20240227
U:\data\hpc_ctx_project\HP01\day_2_20240228
U:\data\hpc_ctx_project\HP01\day_3_20240229
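To take advantage of multiple cores, you could set parallel=True; a minimal sketch, assuming the same call signature as above:
# same call as above, but sessions are distributed across worker processes
batch_analysis.run(
    sessions,
    save_path,
    toy_analysis,
    parallel=True,
    verbose=False,  # per-session printing is less readable with interleaved workers
)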
Section 3: Load the results¶
There is a built-in loader that concatenates the results of the analysis into a single DataFrame.
results = batch_analysis.load_results(save_path)
results
|   | basepath | parameter_1 | parameter_2 | random_number |
|---|---|---|---|---|
| 0 | U:\data\hpc_ctx_project\HP01\day_1_20240227 | 1 | 2 | 34 |
| 1 | U:\data\hpc_ctx_project\HP01\day_2_20240228 | 1 | 2 | 30 |
| 2 | U:\data\hpc_ctx_project\HP01\day_3_20240229 | 1 | 2 | 66 |
Bonus: More complicated results¶
Your results won't always fit nicely into a single DataFrame. Sometimes you will have multiple data types to save. For example, you might have per-cell values in a DataFrame and also a PSTH for each cell. Your analysis can store both in a dictionary, and you then write a custom loader to read them back.
Define the analysis¶
import glob
import os
import pickle
def toy_analysis_2(basepath, parameter_1=1, parameter_2=2):
    results_df = pd.DataFrame()
    results_df["basepath"] = [basepath]
    results_df["parameter_1"] = parameter_1
    results_df["parameter_2"] = parameter_2
    results_df["random_number"] = np.random.randint(0, 100)

    # build a PSTH time axis: 1 ms bins over a -1 to 1 second window
    window_starttime, window_stoptime = [-1, 1]
    window_bins = int(np.ceil(((window_stoptime - window_starttime) * 1000)))
    time_lags = np.linspace(window_starttime, window_stoptime, window_bins)
    psths = pd.DataFrame(
        index=time_lags,
        columns=np.arange(1),
    )
    psths[:] = np.random.rand(window_bins, 1)

    # pack the multiple result types into a single dictionary
    results = {
        "results_df": results_df,
        "psth": psths,
    }
    return results
# custom loader
def load_results(save_path, verbose=False):
    # check if folder exists
    if not os.path.exists(save_path):
        raise ValueError(f"folder {save_path} does not exist")
    # get all the sessions
    sessions = glob.glob(save_path + os.sep + "*.pkl")
    results_df = []
    psths = []
    # iterate over the sessions
    for session in sessions:
        if verbose:
            print(session)
        # load the session
        with open(session, "rb") as f:
            results_ = pickle.load(f)
        if results_ is None:
            continue
        results_df.append(results_["results_df"])
        psths.append(results_["psth"])
    results_df = pd.concat(results_df, axis=0, ignore_index=True)
    psths = pd.concat(psths, axis=1, ignore_index=True)
    return results_df, psths
Run the analysis¶
save_path = r"Z:\home\ryanh\projects\hpc_ctx\toy_analysis\toy_analysis_v2"
batch_analysis.run(
    sessions,
    save_path,
    toy_analysis_2,
    parallel=False,
    verbose=True,
)
100%|██████████| 3/3 [00:00<00:00, 3008.11it/s]
U:\data\hpc_ctx_project\HP01\day_1_20240227
U:\data\hpc_ctx_project\HP01\day_2_20240228
U:\data\hpc_ctx_project\HP01\day_3_20240229
Load the results¶
results_df, psths = load_results(save_path)
display(results_df)
display(psths)
|   | basepath | parameter_1 | parameter_2 | random_number |
|---|---|---|---|---|
| 0 | U:\data\hpc_ctx_project\HP01\day_1_20240227 | 1 | 2 | 56 |
| 1 | U:\data\hpc_ctx_project\HP01\day_2_20240228 | 1 | 2 | 32 |
| 2 | U:\data\hpc_ctx_project\HP01\day_3_20240229 | 1 | 2 | 56 |
|   | 0 | 1 | 2 |
|---|---|---|---|
| -1.000000 | 0.190685 | 0.490553 | 0.248958 |
| -0.998999 | 0.078999 | 0.689063 | 0.40577 |
| -0.997999 | 0.094847 | 0.788747 | 0.966084 |
| -0.996998 | 0.287616 | 0.804512 | 0.846309 |
| -0.995998 | 0.723807 | 0.996373 | 0.850087 |
| ... | ... | ... | ... |
| 0.995998 | 0.023565 | 0.136486 | 0.120244 |
| 0.996998 | 0.298943 | 0.844828 | 0.227437 |
| 0.997999 | 0.514455 | 0.847778 | 0.782702 |
| 0.998999 | 0.975054 | 0.795339 | 0.898294 |
| 1.000000 | 0.122129 | 0.228904 | 0.168518 |
2000 rows × 3 columns
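From here you can work with the PSTHs directly; for example, averaging across cells (a minimal sketch; astype(float) guards against the object dtype that the toy PSTH frames are built with):
# hypothetical downstream step: average the PSTH across cells
mean_psth = psths.astype(float).mean(axis=1)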