Personal Genome · A Three-Act Field Report

Waiting, Whiplash, Foundation:
I paid a few thousand RMB to sequence my own genome, then spent two weeks making sense of it

This isn't a tutorial. It's an honest field report. A regular person, not a biologist, paid a few thousand RMB for whole-genome sequencing, fought to get the raw data back, and used AI to analyze his own body step by step. Along the way: long stretches of waiting, a data-retrieval saga that ended in a formal complaint, a toolchain that kept breaking itself, and a research direction that kept swinging back and forth. What settled out at the end is a foundation that is traceable, auditable, and keeps compounding.

Sample: me · Run locally on a Mac Mini · GRCh38 · 46.7× · 4.61 M variants · Spring 2026

1 · The long wait: from paying up to the first usable result

This act ate half the time and effort of the whole thing. Most of it wasn't "analysis" — it was "waiting" and "keeping the tools from crashing."

1.1 I couldn't get my own data back — until I filed a complaint

The first hurdle wasn't technical. It was getting my own data. I'd paid the few-thousand-RMB sequencing fee, but asking BGI for the raw data stalled absurdly. I called customer service three times — nothing. As a last resort I emailed BGI Genomics' investor relations with a complaint. The result was telling: one complaint, and the data arrived the next day.

This little episode says something real about consumer genomics: you pay for the sequencing, but "getting your own raw data back" isn't supported by default — you have to fight for it. It's exactly why I later put "data portability" on my checklist for evaluating gene-testing products.

1.2 Files big enough to break your intuition; the machine runs for hours

Once the data arrived, the real grunt work turned out to be file size. The raw FASTQ is tens of GB; the aligned BAM is 53 GB; even the reference genome itself is 3 GB; and the largest annotation database I downloaded (dbNSFP) is 47 GB. Every step — alignment, variant calling — runs for hours.

I ran it all locally on a Mac Mini. One real hardware detail: I bolted on a cooling fan so it could run at full speed without throttling. Apple silicon auto-slows when it overheats to protect itself, which drags multi-hour jobs out even longer; the fan held the temperature down, effectively shortening the wait.

1.3 Setup: why I built 5 mutually-isolated "workrooms"

Bioinformatics tools don't "just work" once installed — they sabotage each other (there's a bloody lesson below). So I used conda to split tools into isolated environments, each holding only what it should:

genomics

Core: bwa (alignment), samtools/bcftools/tabix 1.21, SnpEff (annotation), mosdepth (coverage), Python+pandas

gatk

Variant calling, isolated: GATK4 4.6.1.0 + bundled openjdk 17 + python 3.11

qc

Quality control: verifybamid2 (checks for sample contamination)

ai4s

AI analysis: AlphaGenome and other model libraries

(Later I added a 5th, yhaplo311, for Y-chromosome haplogroup work.) Isolating environments isn't fussiness — it's discipline beaten into me by pain.

1.4 Annotation databases: tens of GB of "translation dictionaries," downloaded slowly

A variant table alone is useless; you need "dictionaries" to translate each variant into meaning. I pulled several large annotation databases, the largest being dbNSFP at 47 GB.

Add the official GATK reference-genome bundle (3 GB, plus a pile of indexes to build), and just "prepping the raw ingredients" ate most of the first night.

⚠️ 1.5 The time I lost 7 hours: one install quietly downgraded another tool

This is the most representative pitfall of the whole thing, worth telling in detail.

While running the automated pipeline, one step suddenly reported "BAM file corrupted." The BAM is the 53 GB core file; "corrupted" means the previous hours were wasted. I retried and re-ran, over and over, and burned roughly 7 hours unable to get past it.

The real culprit, finally: when I'd earlier installed the annotation tools (SnpEff/VEP), conda — to satisfy their dependencies — silently downgraded samtools from 1.21 to 0.1.19, a 2013-era version. My pipeline uses samtools quickcheck to quickly verify BAM integrity, but the old version doesn't have the quickcheck command at all. Command not found → pipeline mis-judges "BAM is broken" → endless retries. The file had been fine the whole time.

The fix: reinstall samtools 1.21, and pin samtools/htslib/bcftools to 1.21 in conda's pinned file so no future install is ever allowed to touch them again. Those 7 hours bought one iron rule — a toolchain isn't "install and go," it's "install, then defend it from tools sabotaging each other."
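The failure mode above comes from collapsing two different outcomes ("the tool is wrong" and "the file is wrong") into one. A minimal Python sketch of a guard that keeps them apart is below. It is not the pipeline actually used here: the function names and the three outcome labels are hypothetical, and it assumes that `samtools --version` prints a first line of the form `samtools 1.21`, as current releases do.

```python
import re
import subprocess

EXPECTED = (1, 21)  # the version this write-up pins; adjust to your own pin


def parse_samtools_version(text):
    """Return (major, minor) parsed from `samtools --version` output, or None."""
    match = re.search(r"^samtools (\d+)\.(\d+)", text, flags=re.MULTILINE)
    return (int(match.group(1)), int(match.group(2))) if match else None


def classify(version, quickcheck_returncode):
    """Separate 'the toolchain is wrong' from 'the BAM is suspect'."""
    if version is None or version < EXPECTED:
        return "toolchain_problem"  # says nothing about the BAM itself
    return "bam_ok" if quickcheck_returncode == 0 else "bam_suspect"


def check_bam(bam_path):
    """Probe the samtools version first, and only then trust quickcheck."""
    try:
        probe = subprocess.run(
            ["samtools", "--version"], capture_output=True, text=True
        )
    except FileNotFoundError:  # samtools is not on PATH at all
        return "toolchain_problem"
    version = parse_samtools_version(probe.stdout)
    if version is None or version < EXPECTED:
        return classify(version, quickcheck_returncode=None)
    result = subprocess.run(["samtools", "quickcheck", bam_path])
    return classify(version, result.returncode)
```

With a guard like this, a silently downgraded samtools would surface as `toolchain_problem` on the first attempt instead of triggering retries against a healthy file.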

1.6 A few more dependency pitfalls that crashed the pipeline mid-run

Pitfall | Symptom | Fix
GATK shoved aside | openjdk got bumped to 25, but GATK4 needs 17; it would not run in the main environment no matter what | Built a separate gatk env (GATK4 + bundled jdk17), hard-coded its path in scripts
VEP crashes on Apple silicon | The annotation tool VEP kept segfaulting on osx-arm64 | Dropped VEP, switched to SnpEff 5.4c
conda activation errors out | The script ran in strict mode (set -u); conda activation referenced an undefined variable and exited | Wrapped the conda-activation lines in set +u
macOS zcat won't read .gz | Apple's built-in zcat looks for the old .Z format, can't open .gz | Switched everything to gunzip -c

1.7 The actual pipeline: from reads to a variant table

Only once the environment was stable could the main flow run (each step taking hours):

  1. Alignment: use BWA-MEM to paste the tens-of-GB of short reads back onto the reference, piece by piece → sort → deduplicate → a 53 GB BAM
  2. Variant calling: GATK4's HaplotypeCaller, run 4-way in parallel per chromosome (ran all night; the VCF came out at 3:38 AM)
  3. Filtering + normalization: hard-filter SNPs and indels separately, then normalize (standardize the representation, 4.91 M → 5.00 M lines)
  4. On BQSR (base quality score recalibration) — after evaluating, I skipped it; this platform's quality offset is tiny, not worth re-running. Honestly recording "why I didn't do something" is part of the work too.
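The per-chromosome, 4-way parallel calling in step 2 and the normalization in step 3 can be sketched roughly as follows. This is an illustration under assumptions, not the script that was actually run: the file names are placeholders, the contig names assume the `chr1` … `chrM` naming of the GRCh38 bundle, and the hard-filtering step is left out. The `-R`, `-I`, `-L`, `-O` options of `gatk HaplotypeCaller` and `bcftools norm -f … -m -any` are standard options of those tools.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumed contig naming for GRCh38 (chr1..chr22, chrX, chrY, chrM).
CHROMS = [f"chr{n}" for n in range(1, 23)] + ["chrX", "chrY", "chrM"]


def haplotypecaller_cmd(ref, bam, chrom, out_dir):
    """Build one HaplotypeCaller command restricted to a single chromosome."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", ref, "-I", bam, "-L", chrom,
        "-O", f"{out_dir}/{chrom}.vcf.gz",
    ]


def norm_cmd(ref, vcf_in, vcf_out):
    """Left-align indels and split multi-allelic sites into separate lines."""
    return ["bcftools", "norm", "-f", ref, "-m", "-any",
            "-Oz", "-o", vcf_out, vcf_in]


def call_all(ref, bam, out_dir, workers=4, runner=subprocess.run):
    """Run the per-chromosome jobs with at most `workers` running at once."""
    cmds = [haplotypecaller_cmd(ref, bam, c, out_dir) for c in CHROMS]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so a failing job raises here.
        list(pool.map(lambda cmd: runner(cmd, check=True), cmds))
    return cmds
```

Splitting multi-allelic sites is also why the line count goes up rather than down during normalization: one site with two alternate alleles becomes two lines.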

1.8 Run done — now the "physical exam": the QC gate

You can't trust a fresh variant table; it has to pass QC first. A few key metrics all need to be green before going further:

4.61 M PASS variants · Ti/Tv ratio 2.02 (normal) · FREEMIX 0.0006 (no contamination) · mean depth 46.7× · hard-filter removal rate 6.2%
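A gate like this is easy to make explicit in code. The sketch below computes Ti/Tv from (ref, alt) pairs and checks a few metrics against limits. The limits are illustrative assumptions, not the thresholds used in this project, and `qc_gate` is a hypothetical helper name.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

# Illustrative limits only (assumptions): tune them for your own data.
LIMITS = {
    "titv": (1.9, 2.2),
    "freemix": (0.0, 0.02),
    "mean_depth": (30.0, float("inf")),
}


def titv_ratio(snvs):
    """Transition/transversion ratio for an iterable of (ref, alt) SNV pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def qc_gate(metrics, limits=LIMITS):
    """Return a list of failed checks; an empty list means all green."""
    failures = []
    for name, (low, high) in limits.items():
        value = metrics[name]
        if not (low <= value <= high):
            failures.append(f"{name}={value} outside [{low}, {high}]")
    return failures


# The values reported in this write-up pass these illustrative limits.
print(qc_gate({"titv": 2.02, "freemix": 0.0006, "mean_depth": 46.7}))
```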
Raw FASTQ: ~74 GB
Aligned BAM: 53 GB
Variants (VCF): 170 MB, 4.61 M variants
Usable conclusions: a few dozen, evidence-graded

↑ The whole process is a sharply converging funnel: 74 GB of raw reads, condensed layer by layer, ending as a few dozen plain-language, evidence-backed conclusions. Each layer down is hours of machine time and a pile of unavoidable pitfalls.

2 · After the VCF: three overseers, and a direction that kept swinging

Once raw data becomes a queryable variant table, the real hard part begins — not "how to compute," but "what to study."

The VCF is the watershed: from "a pile of big files you can only wait on" to "a variant table you can query anytime." But it immediately runs into a harder problem — direction. There's an infinite amount you could dig into; what should you actually dig? Over these two weeks the direction kept getting re-adjusted, driven by three "AI overseers" taking turns — each with its own strengths and blind spots.

2a · Phoning it in at first

It started out too easy — I just ran a few crowd-pleaser variants (cilantro, earwax, alcohol metabolism) and stitched together a trivia list. Looks fun, but it's just table lookup: these common variants show up on any test, which doesn't match the point of paying for whole-genome sequencing.

2b · Claude as overseer → producing real results a level deeper

Under Claude's prodding I got serious: using DeepMind's AlphaGenome model to predict how the metabolic variants I personally carry affect gene expression. The key validation: blind to any answer, the model independently ranked two of my known functional variants in the top percentile — meaning its predictions on my variants aren't just guessing. That's where the whole-genome data finally got put to real use.

2c · ChatGPT as overseer → lots of ideas, but overreaching outside view

Later I had ChatGPT play overseer. It produced a high-quality, elegantly-structured audit. But it overreached: it rattled off 15 research tracks and 8 priorities — months of work combined. It charged hard down the "how to make the project bigger" instinct, and skipped the one question that mattered most — should I really be pouring this much more time in right now? It conflated "could do" with "should do."

2d · Claude course-corrects → convergence, because it knew my real situation

Finally I had the Claude that understood my real context (startup, scarce attention) course-correct. It did what ChatGPT didn't — it cut: from 15 tracks down to 3 with clear priority, and pointed out that the truly scarce resource isn't the Nth finding the genome could still produce — it's my attention.

The biggest takeaway from this act: using different AIs to oversee each other is surprisingly useful — but you have to know each one's blind spots. One phones it in, one over-expands, one knows when to hit the brakes within your real constraints. The final call still has to be yours, because only you know what situation you're actually in.

Pitfalls that could flip the whole conclusion, hit while the direction swung

These don't burn time — they're more dangerous. The code runs perfectly fine; the conclusion is just wrong.

Coordinates mapped to the wrong chromosome

Reality: About half of the genomic coordinates I'd looked up were wrong: one variant was labeled on chromosome 5 when it's actually on 15; several others were cross-swapped. A "genotype" read off the wrong position is pure garbage.
Fix: Verify every variant's coordinates one by one against Ensembl by rsID, and set up positive controls: check against variants whose answer I already know; if those don't match, something's broken.
💡 Secondhand coordinates can't be trusted; verify independently. Controls are the only way to catch systematic errors.
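Both checks reduce to comparing two small dictionaries. A minimal sketch, with hypothetical helper names and placeholder rsIDs (the real control variants are not listed in this write-up):

```python
def normalize_genotype(genotype):
    """Treat 'G/A', 'A/G' and 'A|G' as the same unordered genotype."""
    return "/".join(sorted(genotype.replace("|", "/").split("/")))


def failed_controls(expected, observed):
    """Return (rsid, expected, observed) for every control that does not match.

    A control that is missing from `observed` counts as a failure.
    """
    failures = []
    for rsid, want in expected.items():
        got = observed.get(rsid)
        if got is None or normalize_genotype(got) != normalize_genotype(want):
            failures.append((rsid, want, got))
    return failures


def coordinate_mismatches(claimed, verified):
    """Return rsIDs whose secondhand (chrom, pos) disagrees with the verified one."""
    return sorted(r for r, coord in claimed.items() if verified.get(r) != coord)


# Placeholder data: a chromosome 5 vs 15 mix-up is caught immediately.
claimed = {"rs_example_1": ("5", 1000), "rs_example_2": ("1", 2000)}
verified = {"rs_example_1": ("15", 1000), "rs_example_2": ("1", 2000)}
print(coordinate_mismatches(claimed, verified))
```

The useful property is that any failure stops the analysis: if a known answer does not come out right, nothing downstream is read.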

Flip one strand and the whole reading reverses (the most dangerous)

Reality: Genes have a forward and a reverse strand; the genomic base can be the exact complement of what the literature states. I have a metabolic variant where counting straight from the literature gives "0 risk alleles," when I actually carry 1; flip the direction on the alcohol-metabolism variant and "can drink fine" becomes "one sip and I'm down."
Fix: Unify everything to the forward strand, and use protein amino-acid mapping (VEP) to confirm which base corresponds to which risk.
💡 Get one allele's risk direction wrong and the entire reading reverses. This is the single line worth framing on the wall.
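The "0 versus 1 risk alleles" effect is easy to reproduce. In the sketch below the alleles are made up for illustration and are not one of my actual variants. The function names are hypothetical, and the VCF genotype is assumed to be on the forward strand, as VCF genotypes are.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_forward(allele, reported_on_reverse):
    """Convert a literature allele to the forward strand if necessary."""
    return COMPLEMENT[allele] if reported_on_reverse else allele


def count_risk_alleles(genotype, risk_allele, reported_on_reverse):
    """Count risk alleles in a forward-strand genotype such as ('C', 'T')."""
    risk_forward = to_forward(risk_allele, reported_on_reverse)
    return sum(1 for allele in genotype if allele == risk_forward)


def is_strand_ambiguous(ref, alt):
    """A/T and C/G SNPs read the same on both strands, so the bases alone
    cannot tell you which strand a paper used."""
    return COMPLEMENT[ref] == alt


genotype = ("C", "T")  # forward strand, from the VCF (made-up example)
# Literature reports risk allele "A" on the reverse strand.
print(count_risk_alleles(genotype, "A", reported_on_reverse=False))  # naive: 0
print(count_risk_alleles(genotype, "A", reported_on_reverse=True))   # harmonized: 1
```

For strand-ambiguous SNPs, complementing cannot resolve anything by itself, which is one reason to confirm against the amino-acid change as well.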

A database "classification conflict" mistaken for "pathogenic"

Reality: The initial screen "flagged" 12 pathogenic variants, which was alarming. On inspection they were all marked "conflicting classifications" in ClinVar (submitters disagree); my rule had mistaken "conflicting" for "pathogenic." After fixing it, true pathogenic = 0.
💡 A database is a "submission archive," not a "verdict." Conflicting ≠ pathogenic.
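One plausible way such a rule goes wrong is a substring match: the word "pathogenic" sits inside "pathogenicity". The sketch below contrasts that with an exact match on the individual terms. It is an assumption about the mechanism, not the rule I actually wrote, and it assumes significance strings in the style of the ClinVar VCF `CLNSIG` field.

```python
import re

# Only these exact terms count as a pathogenic call in this sketch.
PATHOGENIC_TERMS = {"pathogenic", "likely_pathogenic"}


def naive_is_pathogenic(clnsig):
    """Buggy rule: substring match also fires on '..._of_pathogenicity'."""
    return "pathogenic" in clnsig.lower()


def is_pathogenic(clnsig):
    """Split the field into individual terms and require an exact match."""
    terms = {term.strip().lower() for term in re.split(r"[/|,]", clnsig)}
    return bool(terms & PATHOGENIC_TERMS)


label = "Conflicting_classifications_of_pathogenicity"
print(naive_is_pathogenic(label))  # True: a false alarm
print(is_pathogenic(label))        # False
```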

I overstated what the AI model said

Reality: The model said my variant "ranks #1 for expression impact in pancreas," and I turned that into "it mainly controls the pancreas." Actually the same variant has a bigger impact in the cerebellum, and scores across different genes can't be directly compared.
Fix: State it precisely: "ranks #1 among my variants, within the pancreas," not "biggest impact body-wide"; for cross-gene comparison use quantiles.
💡 An AI model gives "hypothesis leads," not "verdicts." Loosen the boundary and you overstate.
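Why quantiles help: a raw score only means something relative to the spread of scores for the same gene and tissue. The sketch below is a generic empirical-quantile helper with made-up numbers. It is not AlphaGenome's API or its scoring method.

```python
from bisect import bisect_right


def empirical_quantile(score, background):
    """Fraction of background scores that are less than or equal to `score`."""
    ordered = sorted(background)
    return bisect_right(ordered, score) / len(ordered)


# Made-up backgrounds: gene A's scores are all small, gene B's are large.
background_a = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
background_b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Raw scores suggest B (3.0) dwarfs A (0.09); quantiles say the opposite.
print(empirical_quantile(0.09, background_a))  # 0.9
print(empirical_quantile(3.0, background_b))   # 0.3
```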

I bet "eating late is worse" — the glucose data slapped me down on the spot

Reality: I carry the relevant risk gene and I'm a night owl, so I assumed CGM (continuous glucose monitoring) would surely prove "eating late = worse glucose." Across 16 days and 37 meals the measurements said the exact opposite: the later I ate, the smaller the response. Then I checked the food: my dinners were coconut chicken and boiled shrimp (low-carb), my lunches were omurice (high-carb). Control for calories and the time-of-day effect vanishes.
Fix: Honestly report the hypothesis as refuted; what I actually learned is "my habit of going low-carb at dinner reversed the genetic tendency."
💡 The most valuable fall: a hypothesis refuted by your own data is worth more than one confirmed. Personal observational data is extremely easy to confound with lifestyle.

3 · So what did my genome actually say?

After all that engineering, here's the payoff — in plain language. Caveat: this is research/education, not medical diagnosis.

First, the confidence: every conclusion passed positive controls 5/5 — I checked against 5 variants whose answer I already knew (e.g. that I don't flush when drinking), and all matched, before trusting anything downstream. That's the quality gate on the "genome scorecard."

🟢 Good cards I drew

  • APOE ε3/ε3: no high-risk ε4 for Alzheimer's — the most ordinary, most reassuring combination
  • Very low Lp(a) (3.74): an inherited cardiovascular-protection advantage, a good card by birth
  • Drug safety: several key drug-metabolism genes (clopidogrel, statins, thiopurines) all normal, 7/7 consistent with the BGI report
  • Highly homoplasmic mitochondria: 38 years old, no accumulated mitochondrial mutation burden

🔴 Cards to manage

  • PNPLA3 G/G (high-risk homozygous for fatty liver): matches the mild fatty liver found on my ultrasound
  • MTNR1B heterozygous (glucose): affects early-phase post-meal insulin secretion, in step with my pre-diabetic markers
  • Central obesity + impaired glucose tolerance: genes aren't destiny, but a sign to keep an eye on metabolism

🟣 Fun little traits (dinner-table material)

  • No alcohol flush (ALDH2 normal) + fast alcohol metabolism — but the WHOOP data shows my body quietly keeps the tab (see next act)
  • Slow caffeine metabolism + anxiety-prone type: genetically not friendly to caffeine
  • Bitter super-taster: I taste the bitterness in broccoli and black coffee stronger than most
  • Dry earwax + no body odor (ABCC11), a classic East Asian type
  • Paternal / maternal haplogroups: both mainstream East Asian lineages (precise IDs are family-identifying information, omitted from the public version)
  • Rh blood type: even blood type can be inferred from sequencing coverage, consistent with my medical records (molecular subtype omitted)

4 · Betting the genetic predictions against real body data

This is the part nobody else can write — I have real WHOOP (wearable) and CGM (continuous glucose) data to test whether the genes were actually right.

4.1 Glucose: an OGTT delivered a diagnostic-grade metabolic localization

I did an oral glucose tolerance test (OGTT): fasting glucose normal (5.53 mmol/L), insulin resistance very mild (HOMA-IR 1.48), but the 1-hour post-load value spiked to 9.91 mmol/L and the 2-hour value was still 8.47 mmol/L (impaired glucose tolerance). That localizes the problem to early-phase post-meal insulin secretion, which is exactly the direction my MTNR1B variant's mechanism points to.
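For readers who want to check the arithmetic, HOMA-IR is conventionally computed as fasting glucose (mmol/L) × fasting insulin (µU/mL) / 22.5, and the 2-hour OGTT value is read against the WHO cut-offs of 7.8 and 11.1 mmol/L. A small sketch is below. The fasting insulin value is not reported in this write-up, so the code only back-computes what the reported numbers imply, and the function names are hypothetical.

```python
def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uu_ml):
    """HOMA-IR = glucose (mmol/L) * insulin (uU/mL) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_uu_ml / 22.5


def implied_insulin(homa, fasting_glucose_mmol_l):
    """Invert the formula: the insulin level a given HOMA-IR implies."""
    return homa * 22.5 / fasting_glucose_mmol_l


def classify_2h_glucose(mmol_l):
    """WHO categories for the 2-hour value of a 75 g OGTT."""
    if mmol_l < 7.8:
        return "normal"
    if mmol_l < 11.1:
        return "impaired glucose tolerance"
    return "diabetes range"


# Reported: fasting glucose 5.53, HOMA-IR 1.48, 2-hour glucose 8.47.
print(round(implied_insulin(1.48, 5.53), 1))  # implied fasting insulin, uU/mL
print(classify_2h_glucose(8.47))
```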

4.2 Continuous glucose (CGM): 16 days, 37 meals — stable, but a dawn phenomenon

TIR (time in target range) 90.5% · mean glucose 6.40 mmol/L · coefficient of variation 15.7% (very stable) · dawn phenomenon (morning rise) +1.43 mmol/L

Daytime control is actually excellent (glucose above 10 only 0.3% of the time), but there's a clear "dawn phenomenon" — morning glucose gets automatically pushed up. That explains why my fasting glucose bounces around: the morning reading gets physiologically nudged upward.
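These summary numbers are simple functions of the raw readings. A minimal sketch is below, assuming the commonly used 3.9 to 10.0 mmol/L target range (the exact range behind the 90.5% figure is not stated here) and the population standard deviation for CV. The toy readings are made up.

```python
from statistics import mean, pstdev


def cgm_summary(readings, low=3.9, high=10.0):
    """Mean, coefficient of variation and time-in-range for glucose readings.

    `readings` are evenly spaced values in mmol/L, so the share of readings
    in range stands in for the share of time in range.
    """
    avg = mean(readings)
    n = len(readings)
    return {
        "mean": avg,
        "cv_pct": 100 * pstdev(readings) / avg,
        "tir_pct": 100 * sum(low <= r <= high for r in readings) / n,
        "above_pct": 100 * sum(r > high for r in readings) / n,
    }


print(cgm_summary([5.0, 6.0, 7.0, 11.0]))  # toy data, not my readings
```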

🍷 vs ☕ — the genes overestimated one, underestimated the other

🍷 Alcohol (WHOOP measured)
Gene says: "fine, no flush, can drink"
Recovery score: −13.0
HRV: −5.6 ms
Truth: gene underestimated the risk

☕ Caffeine (WHOOP measured)
Gene says: "danger: slow + anxious"
Recovery score: almost no effect
Sleep efficiency: slightly better, even
Truth: confounded observation; needs an n=1 experiment

"Your face not turning red doesn't mean your body isn't keeping the tab." Someone with normal ALDH2 has no flush warning, drinks feeling fine — but across 80 drinking days the WHOOP data records the cost every time, in recovery and HRV. The caffeine column tells the opposite cautionary tale: in observational data, coffee is tangled up with "busy good days," so you can't claim "coffee helps" from "days with coffee were better" — that's exactly the confounding only a randomized experiment can cut through.
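The randomized alternative is cheap to set up: decide in advance, by shuffle, which days get caffeine, so that "busy good days" cannot choose themselves. A minimal sketch of a balanced schedule is below. The 14-day length and the arm labels are arbitrary assumptions, and this is a plan for a future experiment, not something already run.

```python
import random


def randomized_schedule(n_days, seed=0):
    """Assign half the days to each arm in random order (balanced design)."""
    if n_days % 2:
        raise ValueError("n_days must be even for a balanced schedule")
    arms = ["caffeine"] * (n_days // 2) + ["no caffeine"] * (n_days // 2)
    random.Random(seed).shuffle(arms)  # seeded, so the plan is reproducible
    return arms


for day, arm in enumerate(randomized_schedule(14), start=1):
    print(day, arm)
```

Fixing the seed makes the plan auditable: the schedule can be written down before the first day and checked afterwards.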

Footnote: my chronotype genes lean neutral, but WHOOP measures my sleep midpoint at 4:15 AM — a thorough night owl. The faint genetic signal gets drowned out by the startup schedule. Another live example of "genes aren't destiny."

5 · The honest verdict: the barrier is high, but the foundation is worth it

This wasn't easy. But looking back, it's not a one-off report — it's a base that can keep growing.

The barrier really is high — and it's several barriers stacked

💰 Money

A few thousand RMB sequencing fee — the price of admission.

⏳ Data is hard to get

Three calls did nothing; a complaint to investor relations got it the next day.

🛠️ Building the environment

A fragile toolchain that sabotages itself; half the effort just goes to making code run — and it cost me 7 hours.

🎯 Not knowing what to study

The hardest part isn't technical — it's "what do I actually want to study," and you have to think the direction through yourself.

📖 A whole new language

VCF, strand, haplogroup, quantile… a flood of new terms is an entire new language you have to roughly understand before you can start.

🧠 You also have to guard against the AI

AI can do the work, but it'll phone it in, overstate, over-expand — you need judgment to keep watch.

But the sense of achievement and the compounding are real too

The biggest feeling at the end: it leaves behind a set of traceable, auditable files — for every conclusion you can trace which data it came from, what method, how confident, where it might be wrong. This isn't a read-once-and-toss report; it's a foundation.

And the compounding is high: with the base in place, lots of things can grow on top — re-run when new bloodwork comes in, draw on it to write articles, use it as a yardstick to evaluate some gene-testing product. Invest once, reuse for a long time.

5/5 positive controls (credibility discipline) · 43 auditable evidence records · 4 tiers of privacy publication grading · future reuse (foundation laid)

The content asset that settled out: a "failure-mode handbook"

These pitfalls can't be learned secondhand, because each was actually fallen into — which is exactly what makes them the most valuable content:

  1. Why rsID is more reliable than coordinates (coordinates get mapped to the wrong chromosome)
  2. Why a strand error flips the conclusion
  3. Why a database's "conflicting classification" can't be taken as pathogenic
  4. Why common variants aren't destiny (effect sizes are all small)
  5. Why an AI model is a hypothesis generator, not a verdict machine
  6. Why observational glucose data is easily confounded by food (I bet eating late was worse; got slapped down)
  7. Why statistics fool you easily when there's only one sample (you)
  8. Why short-read sequencing can't see certain genes (honestly label "can't measure")
  9. Why genome privacy is family privacy (what you publish is your parents' and child's information)
  10. Why "athletes" can have metabolic risk too

One sentence to sum up these two weeks: the barrier is high enough to turn people away, but once you're over it, what you get isn't a report — it's a foundation you can keep building on.

📖 Glossary

Plain-language explanations of the technical terms used in the text.

Whole-genome sequencing (WGS)
Reading nearly all ~3 billion bases of a person's DNA. Orders of magnitude more information than a cheap "gene-testing chip" (which reads only a few hundred thousand spots).
BGI
BGI Genomics, a large Chinese sequencing company — the sequencing provider and raw-data source for this project.
FASTQ
The raw file the sequencer spits out, recording short "reads" (typically around 100 to 150 bases each on short-read platforms) plus quality scores. The most raw and largest, tens of GB.
reads
The short DNA fragments (typically around 100 to 150 bases on short-read platforms) the sequencer reads at a time. Billions of reads stitched together make the full genome.
alignment
Pasting the mass of short reads back to their correct positions on the reference genome, like reassembling shredded paper into the original book.
BAM
The alignment result file, recording where each read landed on the genome. 53 GB in this project.
reference genome / GRCh38
A "standard human genome" template used as the ruler for alignment. GRCh38 (a.k.a. hg38) is the current mainstream version.
variant calling
Comparing your data to the reference template to find the positions where "you differ from the standard" (i.e. variants).
VCF
The variant result table, one line per position where "you differ from the standard." The core queryable data of the whole analysis.
sequencing depth (46.7×)
How many times, on average, each position of the genome was read. Higher is more reliable; 46.7× is a sufficient level for whole-genome analysis.
throttling
When a computer overheats it auto-lowers its speed to protect the hardware. A cooling fan avoids it, letting long jobs run at full speed.
conda
A tool for managing software environments — it can build multiple isolated "workrooms" so different tools' dependencies don't fight.
samtools / bcftools
The two most-used command-line tools for handling BAM/VCF files. Pinned to 1.21 in this project.
GATK4
One of the most widely used software suites for variant calling (from the Broad Institute). Version 4.6.1.0 here.
dbNSFP
A huge (47 GB) variant-annotation database aggregating many predictions of "how likely this variant is to be harmful."
ClinVar
A public archive of variant-disease relationships. Note it's a "submission archive," not an "authoritative verdict" — it has conflicting classifications.
VEP
A variant-annotation tool that tells you which amino acid a variant changes. Switched to SnpEff here because it crashed on Apple silicon.
segfault
A classic crash where a program illegally accesses memory — common on incompatible hardware/OS.
BQSR
Base quality score recalibration, an optional refinement step before variant calling. Skipped here after evaluation (too little benefit).
normalize
Standardizing variants into a canonical representation (left-aligned, multi-allelics split) for easier downstream lookup.
QC (quality control)
A "physical exam" after the run: check whether variant count, Ti/Tv, sample contamination, etc. are normal — if not, don't proceed.
Ti/Tv
A QC metric (transition/transversion ratio); the genome-wide normal is ~2.0–2.1, deviation signals a data problem.
FREEMIX
A sample-contamination metric. Near 0 means your sample isn't mixed with someone else's DNA. 0.0006 here — very clean.
PASS variants
Variants that pass the quality filter and are trustworthy. 4.61 million here.
rsID
A globally unique ID for each common variant (e.g. rs10830963), independent of coordinates — the most reliable "ID card" for lookup.
coordinates
A variant's position on the genome (which chromosome, which base). They shift between reference versions and are error-prone.
allele
The different base versions possible at one position (e.g. A or G). You get one from each parent, forming your genotype.
strand
DNA is double-stranded; the forward and reverse strands are complementary. The same variant reads as opposite bases on different strands — get it wrong and the risk direction reverses entirely.
Ensembl
An authoritative genome database; this project used its API to verify coordinates one by one.
positive control
Checking against variants whose answer is already known. If even the known ones come out wrong, the whole pipeline is broken.
AlphaGenome
DeepMind's AI model predicting how a DNA variant affects molecular-level properties like gene expression and splicing. A "hypothesis generator," not a clinical verdict.
quantile
Where a score ranks among all possible values (as a percentile). Use it — not raw scores — to compare impact across different genes.
haplogroup
A population-lineage branch defined by the Y chromosome (paternal) or mitochondria (maternal), used to trace ancestral migration.
OGTT
Oral glucose tolerance test. Drink glucose, then draw blood at intervals to see the glucose/insulin curve — a diagnostic-grade metabolic test.
CGM
Continuous glucose monitoring. A sensor stuck on your arm reads glucose every few minutes, showing the full-day curve.
HOMA-IR
An insulin-resistance index computed from fasting glucose and insulin. Under 2 counts as normal. 1.48 here.
TIR
Time in target range for glucose. Higher is better; measures how stable glucose is. 90.5% here.
chronotype
A person's innate tendency toward early-to-bed/early-to-rise vs. late (night owl).
WHOOP
A wearable that continuously tracks heart-rate variability (HRV), resting heart rate, sleep, recovery score, etc.