Blog

Why a Private On-Prem LLM Beats $200/Seat AI Sprawl

On-prem AI · Ollama / vLLM · CAPEX vs OPEX · Data sovereignty

For most small and mid-size orgs, one well-spec’d GPU server running a private LLM is cheaper than buying $200-per-seat AI subscriptions for everyone — and your prompts never leave the network.

The default move when a team wants AI is to hand out per-seat SaaS subscriptions. At ~$200 per user per month, the bill scales linearly with headcount, and every prompt — including the sensitive ones — leaves your network. There is a quieter alternative: buy one capable GPU server, run an open-weight model on it, and serve the whole company.

The cost flip

The math is straightforward. For a 30-person org, a single workstation-class GPU box loaded with a strong open model (think Llama-3.x, Qwen, Mistral, or similar) often pays back in well under a year compared to the equivalent SaaS subscriptions. After that the marginal cost per user drops to near zero — power, maintenance, the occasional drive replacement.

30 users × $200 × 12 = $72,000 / year in SaaS AI fees alone.
One reasonable GPU server, with redundancy and quality storage, costs a fraction of that — and it’s a capital asset, not a recurring bill.
Add a second box for HA later if usage demands it; the cost still doesn’t scale per user.

The data sovereignty win

The cost flip is great. The bigger story for many regulated or privacy-sensitive orgs is sovereignty:

Prompts and responses never leave the LAN.
No third-party retention, no model training on your data, no jurisdictional drift.
Audit logs live in your systems, with the same access controls as the rest of your stack.
Disconnected sites — air-gapped labs, factories, remote offices — still get an AI assistant.

The stack I actually deploy

Serving: Ollama, vLLM, or llama.cpp depending on model size, concurrency, and how much GPU memory we have to play with.
Frontend: a hardened web UI (Open WebUI-class) behind nginx with TLS and IdP-backed sign-in, usually tied into Entra ID or AD DS.
Retrieval: a small vector store grounded in SharePoint libraries, file shares, ticket history, internal wikis, and code repos so answers are grounded in real company context.
Agents: agentic workflows via Claude Code, MCP servers, and custom orchestrators for ops, code, and back-office tasks.
Audit: rate limits, full prompt/response logging, and per-user policy controls — observability matches the rest of the security stack.

When the SaaS route still wins

It’s not a religion. The SaaS route still makes sense if:

You only need it for two or three power users.
You truly need the absolute frontier model on day one and can’t wait a release cycle.
You have zero appetite for owning the box, the model upgrades, or the GPU power profile.

For everyone else — and especially anyone with sensitive data, an actual headcount, or a finance team that knows what CAPEX means — running a private LLM on your own iron is the better trade.

Building a Lightweight, Self-Hosted IT Ticketing & KB Portal

osTicket · Ubuntu Server · LAMP · CAPEX-first

Designing a simple, reliable, and cost-effective ticketing system on repurposed hardware using Ubuntu Server, a LAMP stack, and osTicket — with a strong push toward self-help via a dedicated knowledge base.

I recently completed an IT consulting project where the goal was to create a simple, reliable, and cost-effective ticketing system for a small organization. The main requirement was to have a platform that could track tickets, store everything cleanly in a database, and lean heavily on a self-service knowledge base.

Why self-hosted?

Instead of choosing a SaaS service desk, I went with a fully self-hosted setup:

CAPEX over OPEX: the org preferred a one-time capital expense instead of ongoing subscription costs. Running the service in-house gave them full ownership with no recurring licensing fees.
Sized for a small team: with fewer than 40 office users, something like Jira Service Management would have been unnecessary overhead.

The stack

The solution runs on a repurposed workstation with Ubuntu Server and:

LAMP stack (Apache, MySQL/MariaDB, PHP).
osTicket as the core open-source ticketing system.
OAuth / SSO tied into the organization’s AD DS environment so users don’t need another password.
Automated email notifications and priorities aligned to their internal workflows.
Regular backups to the existing backup server.
UI customization to make the knowledge base prominent and the interface more user-friendly.

Result

Fully self-hosted and extremely cost-efficient.
Straightforward for employees to use.
Reliable for tracking and storing ticket history.
Structured around a strong self-help knowledge base (canned responses + KB articles).
Flexible enough to grow or be customized over time.
Free from SaaS lock-in or recurring charges.

It’s always satisfying to take older hardware, modern tools, and a clear set of requirements and turn them into something that delivers real value for a small team.

Beyond “Run SPMT”: SharePoint Online Migration in the Real World

SharePoint Online · SPMT · PowerShell · Power Automate

Migrating on-prem Windows file shares to SharePoint Online using SPMT is the easy part. The real work is backups, verification, least-privilege design, information architecture, and automation that actually sticks.

I recently migrated on-prem Windows Server file shares to SharePoint Online using the SharePoint Migration Tool (SPMT). It’s a solid tool, but it’s only one piece of the job.

Backups first

Before moving a single file, I took redundant backups: server snapshots, cloud backups, and offline copies. The rule was simple — rollback must always be an option.

Verify, don’t trust

Large shares (300GB+) would sometimes report Success while quietly skipping files. To catch that, I wrote PowerShell checks to:

Compare file counts and sizes between source and destination.
Re-upload missed content.
Re-run verification until everything matched 100%.

Least-privilege by design

Permissions were rebuilt instead of blindly copied. Department groups, security trimming, and GPO alignment were all designed with least privilege in mind — for example:

Accounting can see Payroll; Payroll is limited to the payroll group.
Sensitive libraries are split by ownership and security boundaries, not just by “nice-looking” site layout.

Information architecture that users can navigate

A central hub site acting as a company portal and launchpad into departmental sites.
Clear separation of sites vs. pages — sites for ownership and security, pages for content.
Navigation, naming, and library design focused on “can people find this in 3 clicks or less?”

Automation that sticks

Power Automate for approvals, notifications, and lifecycle tasks (e.g. document review reminders).
Scripts for repeatable checks, post-migration cleanups, and ongoing integrity checks.

Takeaway: migration is not “run a tool.” It’s planning, scripting, auditing, and permission modeling so content ends up complete, secure, and usable — not just “somewhere in SharePoint.”

Finding the Key: Notes from a Simple Reverse Engineering Lab

x86 · XOR decoding · Control-flow analysis · COMP325

A small “malware-style” assignment: tracing control flow from main, ignoring false paths, spotting an XOR loop in checkkey, and recovering the hidden string Ballyourbasearebelongtous.

For a COMP325 assignment I was given a stripped Linux binary with a simple goal: reverse engineer the key it expects and use that key to make the program accept my input. It was framed like a tiny piece of malware — confusing control flow, a couple of fake paths, and one real check.

Mapping the paths from `main`

Starting at main I mapped out the obvious branches and saw three candidate functions: foo, foobar, and checkkey. With a debugger and some breakpoints, it became clear that the interesting path was guarded by a comparison against the first character of my input:

cmp eax, 42h   ; 'B'
jz  short loc_8048629

When this comparison succeeded, execution jumped toward checkkey. That told me two things:

The key must start with the character 'B'.
The real validation logic lives after that jump, not in foo or foobar.

Spotting the XOR loop in `checkkey`

Inside checkkey there was a loop that loaded a byte, XORed it with 2Ah, and walked forward until it hit a zero byte. That looked exactly like an XOR-encoded string decode routine.

The encoded string in memory was:

"KFFSE_XHKYOKXOHOFEDM^E_Y"

Tracing the registers showed edx ending up with a pointer to this buffer before the XOR and compare logic. I manually decoded a few characters with 0x2A to confirm the professor’s partial hint, then used a quick Python script to XOR the full string.

The result was the classic phrase:

allyourbasearebelongtous

Putting the pieces together

At this point I had a plausible key fragment, but the program still rejected it. Going back to main, that earlier comparison against 0x42 made more sense: the first character has to be 'B' before the code even reaches checkkey.

Prepending 'B' to the decoded string gave the final key the binary was expecting:

Ballyourbasearebelongtous

What I took away from this

Don’t walk every basic block line-by-line. Map the high-level paths, then discard obviously fake or redundant branches.
Look for patterns: tight XOR loops around constant strings are usually doing some kind of simple decryption or obfuscation.
Trace the important registers (eax, edx, etc.) between blocks and set breakpoints where they change in interesting ways.
It’s fine to automate the boring parts. Once I knew the XOR mask, having a script decode the string was faster and less error-prone than doing it by hand.

Overall, the lab was a good reminder that reverse engineering is mostly about reducing the search space: follow the real path, ignore the noise, and let the control flow tell you where the actual checks live.