DATA_SOURCE - Tripstoph's Digital Garden

# Protein Folding: data sources for RST application

**Purpose:** Document the external data sources, conversion rules, and column schemas for the RST protein-folding barrier-crossing pipeline. Raw data files are **not** committed; download them from the sources below.

---

## Sources

### 1. Feng et al. (2026): primary validation dataset

- **Paper:** Feng et al., "Transition-path times from single-molecule FRET," *Phys. Rev. Lett.* **136**, 108401 (2026).
- **Data:** 8 two-state proteins (Villin, WW domain, Protein A, λ repressor, gpW, ADA2h, Protein G, CspTm). τ_fold, τ_TP, N_residues, and contact_order (approximate) are taken from Fig. 5 and Fig. 2(b).
- **Raw data:** Zenodo [18354860](https://zenodo.org/record/18354860); GitHub [hoisunglab/FRET_TransitionPath](https://github.com/hoisunglab/FRET_TransitionPath).
- **Exported CSV:** `feng_etal_2026_protein_folding_data.csv` (place it in this folder).

**Conversion (τ_fold):** The folding time is derived from the relaxation rate $k$ reported at the denaturation midpoint (see paper). For a two-state protein, $k = k_f + k_u$, and at the midpoint the folding and unfolding rates are equal ($k_f = k_u$), so $k_f = k/2$ and

$$\tau_{\mathrm{fold}} = \frac{1}{k_f} = \frac{2}{k}.$$

The script expects τ_fold in ms and τ_TP in µs.

**contact_order:** Approximate values chosen to be consistent with the folding-rate trend. For PDB-derived contact order, use structure-analysis tools; see the script and [[Protein Folding - Code]] for column options.

---

### 2. Reference set (self-consistency)

- **File:** `builtin_reference_set.csv`
- **Content:** 8 proteins (Protein_A–H), same schema as Feng et al. Used for the published n(structure) coefficients and τ_ref calibration.
- **Note:** This is **not** an independent lab dataset; it only tests self-consistency. For external validation, use a second publication or lab dataset.
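The τ_fold conversion rule for the Feng et al. data can be sketched in a few lines of Python. This is a minimal sketch, assuming the midpoint relaxation rate $k$ is given in s⁻¹ (check the units in the paper or the Zenodo files before relying on it); the function name `tau_fold_ms_from_relaxation_rate` is hypothetical and not part of the pipeline script.

```python
def tau_fold_ms_from_relaxation_rate(k_per_s: float) -> float:
    """Return tau_fold in ms from a midpoint relaxation rate k (assumed in s^-1).

    Two-state kinetics: k = k_f + k_u, and k_f == k_u at the denaturation
    midpoint, so tau_fold = 1 / k_f = 2 / k.
    """
    if k_per_s <= 0:
        raise ValueError("relaxation rate k must be positive")
    # 2 / k is in seconds; the factor 1e3 converts to milliseconds,
    # the unit the pipeline expects for tau_fold.
    return 2.0e3 / k_per_s


if __name__ == "__main__":
    # Illustrative rate only, not a value from Feng et al.
    print(tau_fold_ms_from_relaxation_rate(1000.0))  # → 2.0
```

Keeping the unit conversion inside the function means the CSV column `tau_fold_ms` can be filled directly, without a separate seconds-to-milliseconds step.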
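If you derive contact order from a PDB structure instead of using the approximate values, one common definition is the relative contact order of Plaxco, Simons and Baker (1998): $\mathrm{RCO} = \frac{1}{L N}\sum_{(i,j)} |i - j|$, summed over the $N$ contacting pairs of an $L$-residue chain. The sketch below illustrates that formula under stated assumptions: it uses one coordinate per residue (e.g. Cα) rather than all heavy atoms, a 6 Å cutoff, and a hypothetical minimum sequence separation of 3, so its values will not exactly match published contact orders. Whether the pipeline's `contact_order` column uses this normalization is not stated here; check [[Protein Folding - Code]]. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_contacts(
    coords: Sequence[Coord],
    cutoff: float = 6.0,
    min_separation: int = 3,
) -> List[Tuple[int, int]]:
    """Return residue pairs (i, j), i < j, whose coordinates lie within `cutoff` Å.

    Assumptions: one coordinate per residue (e.g. C-alpha); pairs closer than
    `min_separation` along the chain are skipped (hypothetical default).
    """
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


def relative_contact_order(contacts: Sequence[Tuple[int, int]], n_residues: int) -> float:
    """Mean sequence separation of the contacts, divided by chain length."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    if not contacts:
        raise ValueError("no contacts; contact order is undefined")
    total_separation = sum(abs(j - i) for i, j in contacts)
    return total_separation / (n_residues * len(contacts))


if __name__ == "__main__":
    # Toy 4-residue chain: residue 3 folds back next to residue 0.
    coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    contacts = residue_contacts(coords)
    print(contacts)  # → [(0, 3)]
    print(relative_contact_order(contacts, len(coords)))  # → 0.75
```

Splitting contact detection from the RCO formula lets you swap in an all-heavy-atom contact list from a structure-analysis tool without touching the normalization.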

---

## Column schema

| Column | Units | Description |
|--------|-------|-------------|
| name | - | Protein identifier |
| tau_fold_ms | ms | Folding time |
| tau_TP_us | µs | Transition-path time |
| N_residues | - | Number of residues (optional) |
| contact_order | - | Structure descriptor (optional) |
| sequence | - | Amino-acid string (optional, for prediction) |
| tau_fold_err_ms | ms | Uncertainty in τ_fold (optional) |
| tau_TP_err_us | µs | Uncertainty in τ_TP (optional) |

**Sequence validation:** Allowed characters are the 20 standard amino acids plus B, Z, and X. Sequence length must be in [10, 500]. See [[Protein Folding - Code]].

---

## References

- Feng et al. (2026), *Phys. Rev. Lett.* **136**, 108401.
- Zenodo 18354860: https://zenodo.org/record/18354860
- GitHub: https://github.com/hoisunglab/FRET_TransitionPath
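The column schema above can be checked when a CSV is loaded. This is a minimal sketch using only the standard-library `csv` module. It assumes that the columns not marked optional (`name`, `tau_fold_ms`, `tau_TP_us`) are required, that numeric fields parse as floats, and that no unit conversion is needed because the column names already encode ms and µs. The function `load_folding_table` is hypothetical, not the pipeline's loader.

```python
import csv
import io
from typing import Dict, List, TextIO

# Required vs. optional columns, following the schema table.
REQUIRED_COLUMNS = ("name", "tau_fold_ms", "tau_TP_us")
NUMERIC_COLUMNS = (
    "tau_fold_ms", "tau_TP_us",          # required
    "N_residues", "contact_order",       # optional
    "tau_fold_err_ms", "tau_TP_err_us",  # optional
)

Row = Dict[str, object]


def load_folding_table(f: TextIO) -> List[Row]:
    """Read a folding CSV, check required columns, and parse numeric fields as float.

    Missing optional values become None. Units are taken as given by the column
    names (tau_fold_ms in ms, tau_TP_us in µs); nothing is converted.
    """
    reader = csv.DictReader(f)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows: List[Row] = []
    # start=2 because line 1 is the header (assumes no multi-line quoted fields)
    for line_no, raw in enumerate(reader, start=2):
        name = (raw.get("name") or "").strip()
        if not name:
            raise ValueError(f"line {line_no}: empty required column 'name'")
        row: Row = {"name": name}
        for col in NUMERIC_COLUMNS:
            text = (raw.get(col) or "").strip()
            if text:
                row[col] = float(text)
            elif col in REQUIRED_COLUMNS:
                raise ValueError(f"line {line_no}: empty required column {col!r}")
            else:
                row[col] = None
        row["sequence"] = (raw.get("sequence") or "").strip() or None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical row for illustration, not real data.
    demo = "name,tau_fold_ms,tau_TP_us,N_residues\nProtein_X,1.5,2.0,40\n"
    print(load_folding_table(io.StringIO(demo)))
```

Failing early on a missing required column gives a clearer error than a `KeyError` deep inside the fitting code.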
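The sequence-validation rule can be expressed as a small check. This is a sketch under assumptions: the length bounds [10, 500] are read as inclusive, and whitespace is dropped and letters are upper-cased before checking; the actual script may treat case or whitespace differently. The name `validate_sequence` is hypothetical.

```python
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues
AMBIGUITY_CODES = frozenset("BZX")  # B = Asx, Z = Glx, X = unknown
ALLOWED = STANDARD_AMINO_ACIDS | AMBIGUITY_CODES
MIN_LENGTH, MAX_LENGTH = 10, 500  # read as inclusive bounds


def validate_sequence(sequence: str) -> str:
    """Return the normalized sequence, or raise ValueError if it breaks the rules.

    Assumption: whitespace is removed and letters are upper-cased first.
    """
    seq = "".join(sequence.split()).upper()
    invalid = sorted(set(seq) - ALLOWED)
    if invalid:
        raise ValueError(f"invalid residue codes: {invalid}")
    if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
        raise ValueError(f"length {len(seq)} outside [{MIN_LENGTH}, {MAX_LENGTH}]")
    return seq


if __name__ == "__main__":
    print(validate_sequence("acdef ghikl"))  # → ACDEFGHIKL
```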