# Protein Folding — data sources for RST application
**Purpose:** Document external data sources, conversion rules, and column schemas for the RST protein folding barrier-crossing pipeline. Raw data files are **not** committed; download from the sources below.
---
## Sources
### 1. Feng et al. (2026) — primary validation dataset
- **Paper:** Feng et al., "Transition-path times from single-molecule FRET," *Phys. Rev. Lett.* **136**, 108401 (2026).
- **Data:** 8 two-state proteins (Villin, WW domain, Protein A, λ repressor, gpW, ADA2h, Protein G, CspTm). The values of τ_fold, τ_TP, N_residues, and contact_order (approximate) are taken from Fig. 5 and Fig. 2(b).
- **Raw data:** Zenodo [18354860](https://zenodo.org/record/18354860); GitHub [hoisunglab/FRET_TransitionPath](https://github.com/hoisunglab/FRET_TransitionPath).
- **Exported CSV:** `feng_etal_2026_protein_folding_data.csv` (place in this folder).
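**Conversion (τ_fold):** The folding time is derived from the relaxation rate $k$ reported at the denaturation midpoint. For two-state kinetics $k = k_f + k_u$, and at the midpoint $k_f = k_u$, so $k = 2k_f$ and $\tau_{\mathrm{fold}} = 1/k_f = 2/k$ (see paper). The script expects τ_fold in ms and τ_TP in µs.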
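
The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's own code: the function name is hypothetical, and it assumes the relaxation rate is given in s⁻¹, which this note does not state.

```python
def tau_fold_ms_from_relaxation_rate(k_per_s: float) -> float:
    """Return tau_fold in ms from a midpoint relaxation rate.

    Assumption: k_per_s is in s^-1 (the unit is not specified in this note).
    Uses tau_fold = 2 / k, valid at the denaturation midpoint where k_f = k_u.
    """
    if k_per_s <= 0:
        raise ValueError("relaxation rate must be positive")
    # 2 / k gives seconds; the factor 1e3 converts seconds to milliseconds.
    return 2.0e3 / k_per_s


# Illustrative input only, not a value from the dataset.
print(tau_fold_ms_from_relaxation_rate(1000.0))  # 2.0
```

If the rates are tabulated in another unit (for example ms⁻¹), rescale them to s⁻¹ before calling the function.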
**contact_order:** The values are approximate and chosen to be consistent with the folding-rate trend; they are not computed from structures. To use PDB-derived contact order instead, compute it with a structure analysis tool; see the script and [[Protein Folding - Code]] for column options.
---
### 2. Reference set (self-consistency)
- **File:** `builtin_reference_set.csv`
- **Content:** 8 proteins (Protein_A–H), using the same schema as the Feng et al. CSV. It is used for the published n(structure) coefficients and for τ_ref calibration.
- **Note:** This is **not** an independent lab dataset; it tests self-consistency. For external validation, use a second publication or lab dataset.
---
## Column schema
| Column | Units | Description |
|--------|-------|-------------|
| name | — | Protein identifier |
| tau_fold_ms | ms | Folding time |
| tau_TP_us | µs | Transition-path time |
| N_residues | — | Number of residues (optional) |
| contact_order | — | Structure descriptor (optional) |
| sequence | — | Amino-acid string (optional, for prediction) |
| tau_fold_err_ms | ms | Uncertainty in τ_fold (optional) |
| tau_TP_err_us | µs | Uncertainty in τ_TP (optional) |
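
A CSV following the schema above could be read and checked roughly as follows. This is a sketch using only the standard library, not the pipeline's loader: the function and constant names are hypothetical, and it assumes that columns not marked "(optional)" are required and that both times must be positive.

```python
import csv
import io

# Assumption: columns not marked "(optional)" in the schema table are required.
REQUIRED_COLUMNS = ("name", "tau_fold_ms", "tau_TP_us")
OPTIONAL_FLOAT_COLUMNS = ("contact_order", "tau_fold_err_ms", "tau_TP_err_us")


def _optional(raw, column, cast):
    """Return cast(value) for a non-empty cell, or None if the cell or column is absent."""
    text = (raw.get(column) or "").strip()
    return cast(text) if text else None


def read_folding_table(handle):
    """Parse an open CSV file object into a list of dicts keyed by schema column."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows = []
    for raw in reader:
        row = {
            "name": raw["name"].strip(),
            "tau_fold_ms": float(raw["tau_fold_ms"]),
            "tau_TP_us": float(raw["tau_TP_us"]),
            "N_residues": _optional(raw, "N_residues", int),
            "sequence": _optional(raw, "sequence", str),
        }
        # Assumption: a non-positive time indicates a data-entry error.
        if row["tau_fold_ms"] <= 0 or row["tau_TP_us"] <= 0:
            raise ValueError(f"non-positive time for {row['name']!r}")
        for column in OPTIONAL_FLOAT_COLUMNS:
            row[column] = _optional(raw, column, float)
        rows.append(row)
    return rows


# Placeholder values for illustration only, not measurements from any dataset.
example_csv = io.StringIO(
    "name,tau_fold_ms,tau_TP_us,N_residues\n"
    "example_protein,1.5,2.0,35\n"
)
rows = read_folding_table(example_csv)
print(rows[0]["name"], rows[0]["tau_fold_ms"], rows[0]["contact_order"])
```

Absent optional columns come back as `None`, so downstream code can tell "not provided" apart from a numeric zero.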
**Sequence validation:** Allowed characters are the one-letter codes of the 20 standard amino acids plus the ambiguity codes B, Z, and X. Sequence length must lie in [10, 500]. See [[Protein Folding - Code]].
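
The sequence rule can be expressed as a small check. The sketch below is an illustration and not the validator in [[Protein Folding - Code]]: the function name is hypothetical, and stripping whitespace and upper-casing the input before checking are assumptions.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# B (Asx), Z (Glx) and X (unknown) are the extra codes allowed by the rule.
ALLOWED_CHARACTERS = frozenset(STANDARD_AMINO_ACIDS + "BZX")
MIN_LENGTH, MAX_LENGTH = 10, 500


def validate_sequence(sequence: str) -> str:
    """Return the normalised sequence, or raise ValueError if it breaks the rule."""
    # Assumption: surrounding whitespace and lower case are tolerated on input.
    cleaned = sequence.strip().upper()
    if not MIN_LENGTH <= len(cleaned) <= MAX_LENGTH:
        raise ValueError(
            f"length {len(cleaned)} outside [{MIN_LENGTH}, {MAX_LENGTH}]"
        )
    invalid = sorted(set(cleaned) - ALLOWED_CHARACTERS)
    if invalid:
        raise ValueError(f"invalid characters: {invalid}")
    return cleaned


# Arbitrary illustrative string, not a sequence from the dataset.
print(validate_sequence("acdefghiklmn"))  # ACDEFGHIKLMN
```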
---
## References
- Feng et al., "Transition-path times from single-molecule FRET," *Phys. Rev. Lett.* **136**, 108401 (2026).
- Zenodo 18354860: https://zenodo.org/record/18354860
- GitHub: https://github.com/hoisunglab/FRET_TransitionPath