Feeding an Ollama LLM GitHub PHP Scripts

Supercharging a Custom Ollama LLM with 150,000+ Real PHP Examples from GitHub

As a developer working with large language models, I’ve always been fascinated by the potential of AI in automating software development. Recently, I embarked on an ambitious project: to train and optimize a custom Ollama-based LLM to generate complete, production-ready PHP scripts and WordPress plugins — not just code snippets, but entire automation workflows. Here’s how I’m doing it, and why it’s working surprisingly well.

Collecting Real-World PHP Examples from GitHub

To provide the model with up-to-date knowledge of how modern PHP is written, I began harvesting public PHP repositories using the GitHub API. By filtering for recently created and updated repositories, I’ve compiled a curated dataset of over 150,000 PHP files — with more added daily.

These files come from diverse real-world projects, which makes them ideal training and reference material. The process involves:

– Fetching repositories via the GitHub REST API
– Extracting `.php` files only
– Filtering out boilerplate, stubs, and duplicates
– Cleaning and indexing useful scripts for downstream processing

This ensures the examples reflect contemporary best practices and creative usage of the PHP language.
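
To make the pipeline concrete, here's a minimal PHP sketch of the harvesting loop. Treat it as a simplification, not the production script: it assumes a personal access token in a `GITHUB_TOKEN` environment variable, skips rate-limit handling, and uses a content hash plus a length threshold as crude stand-ins for the full boilerplate and duplicate filtering.

```php
<?php
// Sketch: harvest recently updated PHP files via the GitHub REST API.
// Assumes a token in GITHUB_TOKEN; rate-limit backoff is omitted for brevity.

function githubGet(string $url): array {
    $context = stream_context_create(['http' => [
        'header' => implode("\r\n", [
            'User-Agent: php-harvester', // the GitHub API rejects requests without one
            'Accept: application/vnd.github+json',
            'Authorization: Bearer ' . getenv('GITHUB_TOKEN'),
        ]),
    ]]);
    return json_decode(file_get_contents($url, false, $context), true);
}

@mkdir('corpus', 0777, true);
$seen = []; // content hashes, used to drop exact duplicates

// Fetch recently updated PHP repositories via the search endpoint.
$repos = githubGet('https://api.github.com/search/repositories?q='
    . urlencode('language:php pushed:>2025-01-01') . '&sort=updated&per_page=10');

foreach ($repos['items'] as $repo) {
    // List the complete file tree of the repository's default branch.
    $tree = githubGet(sprintf('https://api.github.com/repos/%s/git/trees/%s?recursive=1',
        $repo['full_name'], $repo['default_branch']));

    foreach ($tree['tree'] ?? [] as $node) {
        // Keep .php files only.
        if ($node['type'] !== 'blob' || !str_ends_with($node['path'], '.php')) {
            continue;
        }
        $code = file_get_contents(sprintf('https://raw.githubusercontent.com/%s/%s/%s',
            $repo['full_name'], $repo['default_branch'], $node['path']));

        // Crude stand-in for the real cleaning step: skip tiny stubs and duplicates.
        $hash = sha1($code);
        if (strlen($code) < 200 || isset($seen[$hash])) {
            continue;
        }
        $seen[$hash] = true;
        file_put_contents("corpus/{$hash}.php", $code);
    }
}
```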

Building Intelligence with Retrieval-Augmented Generation (RAG)

Rather than fine-tune the model, a process that requires substantial computing resources, I opted to use Retrieval-Augmented Generation (RAG). This technique enhances the LLM by dynamically feeding it relevant context at runtime.

Here’s how it works:

1. Embedding: Each script is converted into a vector using a text embedding model such as `nomic-embed-text`.
2. Indexing: The embedded scripts are stored in a vector database (e.g., Qdrant) along with metadata.
3. Retrieval: When a prompt is issued, such as _“Generate a plugin that logs failed WordPress logins”_, the system fetches matching examples from the database.
4. Context Injection: The retrieved examples are included in the model’s input, giving it real-world context to work with.
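
Steps 1 and 2 can be illustrated with a short sketch: each harvested script is embedded through Ollama's `/api/embeddings` endpoint and upserted into Qdrant over its REST API. This assumes both services are running locally on their default ports (11434 and 6333); the collection name `php_examples` is a placeholder of my own choosing, and chunking of scripts that exceed the embedding model's context window is omitted.

```php
<?php
// Sketch of steps 1 and 2: embed each harvested script with Ollama, then
// upsert it into Qdrant over REST. Assumes both services on default ports;
// "php_examples" is a placeholder collection name.

function postJson(string $url, array $body, string $method = 'POST'): array {
    $context = stream_context_create(['http' => [
        'method'  => $method,
        'header'  => 'Content-Type: application/json',
        'content' => json_encode($body),
    ]]);
    return json_decode(file_get_contents($url, false, $context), true);
}

// Create the collection once; nomic-embed-text produces 768-dimensional vectors.
postJson('http://localhost:6333/collections/php_examples',
    ['vectors' => ['size' => 768, 'distance' => 'Cosine']], 'PUT');

$id = 0;
foreach (glob('corpus/*.php') as $path) {
    $code = file_get_contents($path);

    // Step 1 - Embedding: convert the script into a vector.
    $vector = postJson('http://localhost:11434/api/embeddings', [
        'model'  => 'nomic-embed-text',
        'prompt' => $code,
    ])['embedding'];

    // Step 2 - Indexing: store the vector together with metadata.
    postJson('http://localhost:6333/collections/php_examples/points', [
        'points' => [[
            'id'      => $id++,
            'vector'  => $vector,
            'payload' => ['file' => basename($path), 'code' => $code],
        ]],
    ], 'PUT');
}
```

Storing the full source in the payload keeps retrieval simple: whatever Qdrant returns can be dropped straight into a prompt without a second lookup.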

This method keeps the Ollama LLM lightweight while keeping its output highly relevant.
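
On the query side, steps 3 and 4 come together like this. The `postJson()` helper is repeated from the indexing sketch above, and `codellama` is a stand-in for whatever generation model happens to be loaded in Ollama.

```php
<?php
// Sketch of steps 3 and 4: embed the user's request, retrieve the closest
// scripts from Qdrant, and inject them into the generation prompt.
// "codellama" is a stand-in for any Ollama generation model.

function postJson(string $url, array $body): array {
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: application/json',
        'content' => json_encode($body),
    ]]);
    return json_decode(file_get_contents($url, false, $context), true);
}

$request = 'Generate a plugin that logs failed WordPress logins';

// Step 3 - Retrieval: embed the request and search Qdrant for similar scripts.
$vector = postJson('http://localhost:11434/api/embeddings', [
    'model'  => 'nomic-embed-text',
    'prompt' => $request,
])['embedding'];

$hits = postJson('http://localhost:6333/collections/php_examples/points/search', [
    'vector'       => $vector,
    'limit'        => 3,
    'with_payload' => true,
])['result'];

// Step 4 - Context injection: prepend the retrieved code to the prompt.
$context = '';
foreach ($hits as $hit) {
    $context .= "// {$hit['payload']['file']}\n{$hit['payload']['code']}\n\n";
}

$answer = postJson('http://localhost:11434/api/generate', [
    'model'  => 'codellama',
    'prompt' => "Here are some real-world PHP examples:\n\n{$context}"
              . "Drawing on similar patterns, {$request}.",
    'stream' => false,
])['response'];

echo $answer;
```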

Results: A Smarter, More Capable PHP Assistant

Throughout this ongoing process, I’ve been testing the model periodically with fresh prompts and practical use cases. The outcome has been outstanding:

– The model generates full, functional plugins and scripts.
– It adapts to coding patterns found in real projects.
– It often improves on the examples by combining logic intelligently.

The LLM doesn’t just parrot code — it understands intent, thanks to the context-rich prompt engineering backed by the example database.

Next Steps and Vision

This is just the beginning. Here’s what I’m working on next:

– Scaling up the database to include 500,000+ examples
– Tagging and categorizing examples by framework (Laravel, Symfony, etc.)
– Building a web interface for querying, reviewing, and testing results
– Exploring lightweight fine-tuning for even tighter optimization

Final Thoughts

By combining Ollama’s flexible architecture with real-world data from GitHub, I’ve created a system that truly assists with PHP automation and plugin development. Whether you’re building a dashboard, writing cron jobs, or generating WordPress shortcodes, this enhanced LLM is proving to be a reliable coding companion.

Stay tuned for updates as the project evolves. If you’re interested in collaborating, exploring a similar approach, or just want to see the engine in action, feel free to reach out!
