Programs are graphs.
flowg keeps code in the shared form every language already represents: a typed dataflow graph. Materialize it to any language, lower it to many silicons, and meter every operation in picojoules.
Every program is already a graph.
Keep it, and the wins compound.
A program is a network of operations connected by typed, directed data flows; every language is a surface over that structure. Keep the graph as the source of truth and the gains are concrete. Materialize to any language deterministically, port and edit without re-debugging, rule out whole classes of errors by construction, and route each operation to the lowest-energy silicon with a joule receipt for every one.
kinds (closed set)
(ownership semantics)
from one graph
per silicon
Real savings in time, energy, and cost.
"The graph is canonical. Text in any language is just one projection of it, produced by lookup, not guessed token by token." The flowg premise
A typed hypergraph you can run.
Nodes are computation primitives. Edges are resource-semantic wires. Rust's ownership, generalized: Move, Borrow, Stream, Feedback, and more. Each Apply node carries an operation with physical cost metadata. This is the IR, the same one the runtime executes.
// flowg graph: scale_and_sum (the graph is the program)
graph scale_and_sum(xs: tensor<f32>, k: f32) -> f32 {
  n0 Param xs
  n1 Param k
  n2 Apply mul (n0, n1)       // Custom("mul")
  n3 Apply reduce_sum (n2)    // → backend chosen by joules
  return n3
}
// wires carry ownership: n0 ──Move──▶ n2 ──Move──▶ n3
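
To make the "graph you can run" idea concrete, here is a minimal Python sketch of the same `scale_and_sum` graph as plain data, with a tiny evaluator and a check that no value is moved twice. It is an illustration under stated assumptions, not flowg's actual IR or runtime: the names `Node`, `Wire`, `OPS`, `check_moves`, and `run` are hypothetical, and the `n1 → n2` wire is assumed to be a Move because the listing above only shows the `n0 → n2 → n3` path.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    id: str
    kind: str            # "Param" or "Apply"
    op: str              # parameter name for Param, operation name for Apply
    inputs: tuple = ()   # ids of upstream nodes

@dataclass(frozen=True)
class Wire:
    src: str
    dst: str
    mode: str            # e.g. "Move" or "Borrow" (hypothetical subset)

# Hypothetical operation table; a real runtime would dispatch to a backend.
OPS = {
    "mul": lambda xs, k: [x * k for x in xs],
    "reduce_sum": lambda xs: sum(xs),
}

NODES = [
    Node("n0", "Param", "xs"),
    Node("n1", "Param", "k"),
    Node("n2", "Apply", "mul", ("n0", "n1")),
    Node("n3", "Apply", "reduce_sum", ("n2",)),
]
WIRES = [
    Wire("n0", "n2", "Move"),
    Wire("n1", "n2", "Move"),   # assumption: not shown in the original listing
    Wire("n2", "n3", "Move"),
]

def check_moves(wires):
    """Reject graphs in which one value is moved to more than one consumer."""
    moved = set()
    for w in wires:
        if w.mode == "Move":
            if w.src in moved:
                raise ValueError(f"{w.src} moved more than once")
            moved.add(w.src)

def run(nodes, wires, result, **params):
    """Evaluate nodes in listed order, which is assumed to be topological."""
    check_moves(wires)
    values = {}
    for n in nodes:
        if n.kind == "Param":
            values[n.id] = params[n.op]
        else:
            values[n.id] = OPS[n.op](*(values[i] for i in n.inputs))
    return values[result]
```

Calling `run(NODES, WIRES, "n3", xs=[1.0, 2.0, 3.0], k=2.0)` evaluates the graph directly, and a graph that moves the same value into two consumers is rejected before anything executes, which is one small instance of ruling out an error class by construction.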
One graph.
Many languages.
The same graph projects to source in many target languages. Deterministic: the same output on every machine, every run. Text becomes a view, not the source of truth.
fn scale_and_sum(xs: &[f32], k: f32) -> f32 {
    xs.iter().map(|x| x * k).sum()
}

def scale_and_sum(xs, k):
    return sum(x * k for x in xs)

// emitted WGSL compute shader (WebGPU)
@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) g: vec3<u32>) {
    out[g.x] = xs[g.x] * k;
}
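
One way to picture "produced by lookup" is a per-language template table that the projector walks the graph against. The sketch below is a hypothetical illustration, not flowg's emitter: `TEMPLATES`, `GRAPH`, `expr`, and `project` are invented names, and the templates cover only the two operations in this example. Because the output is a pure function of the graph and the table, projecting twice yields identical text.

```python
# Hypothetical template tables, one per target language.
TEMPLATES = {
    "python": {
        "ops": {"mul": "[x * {1} for x in {0}]", "reduce_sum": "sum({0})"},
        "types": {"tensor<f32>": "", "f32": ""},   # Python ignores types here
        "param": "{name}",
        "fn": "def {name}({params}):\n    return {body}\n",
    },
    "rust": {
        "ops": {
            "mul": "{0}.iter().map(|x| x * {1})",
            "reduce_sum": "{0}.sum::<f32>()",
        },
        "types": {"tensor<f32>": "&[f32]", "f32": "f32"},
        "param": "{name}: {ty}",
        "fn": "fn {name}({params}) -> f32 {{\n    {body}\n}}\n",
    },
}

# The scale_and_sum graph as plain data: id -> (kind, op or name, inputs).
GRAPH = {
    "name": "scale_and_sum",
    "params": [("xs", "tensor<f32>"), ("k", "f32")],
    "nodes": {
        "n0": ("Param", "xs", ()),
        "n1": ("Param", "k", ()),
        "n2": ("Apply", "mul", ("n0", "n1")),
        "n3": ("Apply", "reduce_sum", ("n2",)),
    },
    "result": "n3",
}

def expr(graph, lang, node_id):
    """Build the expression for a node by looking up its op template."""
    kind, op, inputs = graph["nodes"][node_id]
    if kind == "Param":
        return op
    args = [expr(graph, lang, i) for i in inputs]
    return TEMPLATES[lang]["ops"][op].format(*args)

def project(graph, lang):
    """Emit a full function definition in the requested language."""
    t = TEMPLATES[lang]
    params = ", ".join(
        t["param"].format(name=name, ty=t["types"][ty])
        for name, ty in graph["params"]
    )
    body = expr(graph, lang, graph["result"])
    return t["fn"].format(name=graph["name"], params=params, body=body)
```

`project(GRAPH, "python")` and `project(GRAPH, "rust")` each return source text for the same graph. The emitted code differs slightly in style from the hand-written projections above (for example, a list comprehension rather than a generator expression), which is a property of these toy templates and not a claim about flowg's output.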
Routed to the silicon that does it for the fewest joules.
Each operation carries a cost. flowg's placement pass picks the backend (CPU, Metal, AMX, CUDA, WebGPU, Wasm) that runs it for the fewest verifiable joules, with no default to any one device. Every run can produce an energy receipt.
Picojoule cost model · analytical prior refined by measured calibration.
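
A minimum-energy placement pass can be sketched as a lookup over a per-operation, per-backend cost table followed by a sum for the receipt. Everything below is an assumption for illustration: the function `place`, the table `COST_PJ`, and especially the picojoule figures, which are invented and not measurements. The sketch also ignores data-transfer cost between backends, which a real placement pass would need to account for.

```python
# Hypothetical per-element energy costs in picojoules. These numbers are
# made up for illustration; they stand in for a calibrated cost model.
COST_PJ = {
    "mul":        {"cpu": 4.0, "gpu": 1.5},
    "reduce_sum": {"cpu": 2.0, "gpu": 3.0},
}

def place(ops, n_elements, cost=COST_PJ):
    """Pick the cheapest backend per op and return (receipt, total_pj).

    Backends are visited in sorted name order so that ties always resolve
    the same way, keeping placement deterministic.
    """
    receipt = []
    for op in ops:
        backend = min(sorted(cost[op]), key=lambda b: cost[op][b])
        receipt.append((op, backend, cost[op][backend] * n_elements))
    total_pj = sum(pj for _, _, pj in receipt)
    return receipt, total_pj
```

With this toy table, `place(["mul", "reduce_sum"], 1000)` sends `mul` to the GPU entry and `reduce_sum` to the CPU entry, and the receipt lists one line per operation plus a total, mirroring the idea of a joule receipt for every operation.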
The shared target beneath the languages.
flowg is the graph every sibling language lowers into. They differ in surface; the target is identical. A program written in Joule runs a fused kernel on Metal without the language ever naming Metal.
Inspect a graph.
Build the toolchain, make a sample graph, and watch it run across devices with a joule receipt.
$ cargo install flowg
$ flowg make-sample scale_and_sum.fg
$ flowg inspect scale_and_sum.fg
$ flowg run scale_and_sum.fg
total: 8.42 μJ · placement: min-joule · determinism: strict