Curated collection of thoughts and builds centered around LLMs.
Why unified memory architecture is the only way to run 70B parameter models without a data-center budget.