Implementing ML Recommender Systems in Production
Recommender systems are everywhere — Netflix, Spotify, Amazon. But building one that works reliably in production is very different from training a model in a Jupyter notebook. Here's how I built an ML-powered job matching system at Anyskillz.
The Problem
Anyskillz is a freelance marketplace connecting UK businesses with developers. The challenge: with hundreds of developers and thousands of job postings, how do you surface the most relevant matches?
Manual searching wasn't scaling. Clients needed developers with specific skill combinations, and developers were missing relevant opportunities buried in long lists.
Choosing the Right Approach
Content-Based vs. Collaborative Filtering
For our use case, we went with a hybrid approach:
- Content-based filtering using skill matching and NLP on project descriptions
- Collaborative filtering using interaction data (applications, bookmarks, messages)
The cold-start problem was real — new developers had no interaction history. Content-based filtering handled these cases while collaborative filtering improved recommendations for active users.
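One way to blend the two signals is a weighted sum that falls back to pure content scores when a user has no interaction history. A minimal sketch of that fallback logic (the `alpha` and `min_interactions` values here are illustrative, not our production settings):

```python
def hybrid_score(content_score, collab_score, n_interactions,
                 alpha=0.5, min_interactions=5):
    """Blend content-based and collaborative scores.

    Developers below min_interactions get pure content-based scores,
    sidestepping the cold-start problem; active developers get a
    weighted mix of both signals.
    """
    if collab_score is None or n_interactions < min_interactions:
        return content_score  # cold start: content-based only
    return alpha * content_score + (1 - alpha) * collab_score
```

A new developer with zero interactions keeps their full content score, while an active one gets the blend.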
The Architecture
```python
# recommender.py
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class JobRecommender:
    def __init__(self):
        self.tfidf = TfidfVectorizer(
            stop_words='english',
            max_features=5000,
            ngram_range=(1, 2)
        )
        self.skill_weight = 0.6
        self.text_weight = 0.4

    def compute_skill_similarity(self, dev_skills, job_skills):
        """Jaccard similarity between skill sets."""
        dev_set = set(s.lower() for s in dev_skills)
        job_set = set(s.lower() for s in job_skills)
        if not dev_set or not job_set:
            return 0.0
        intersection = dev_set & job_set
        union = dev_set | job_set
        return len(intersection) / len(union)

    def compute_text_similarity(self, dev_profile, job_description):
        """TF-IDF cosine similarity between two texts.

        Note: this refits the vectorizer on each pair, which keeps the
        example simple; in production, fit once per batch instead.
        """
        tfidf_matrix = self.tfidf.fit_transform([dev_profile, job_description])
        return cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]

    def recommend(self, developer, jobs, top_k=10):
        """Get top-k job recommendations for a developer."""
        scores = []
        for job in jobs:
            skill_score = self.compute_skill_similarity(
                developer['skills'], job['required_skills']
            )
            text_score = self.compute_text_similarity(
                developer['bio'], job['description']
            )
            combined = (self.skill_weight * skill_score +
                        self.text_weight * text_score)
            scores.append((job['id'], combined))
        scores.sort(key=lambda x: x[1], reverse=True)
        return scores[:top_k]
```
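To see the weighted blend in action, here is a worked single developer/job pair, computed standalone with the same weights (the bios and skills are toy inputs):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

dev_skills = {'python', 'django', 'postgresql'}
job_skills = {'python', 'django', 'celery'}

# Jaccard: |intersection| / |union| = 2 / 4 = 0.5
skill_score = len(dev_skills & job_skills) / len(dev_skills | job_skills)

dev_bio = 'Backend developer building Django REST APIs.'
job_desc = 'Build a Django REST API for our marketplace.'
tfidf = TfidfVectorizer(stop_words='english')
m = tfidf.fit_transform([dev_bio, job_desc])
text_score = cosine_similarity(m[0:1], m[1:2])[0][0]

# Same 0.6 / 0.4 weighting as the recommender class
combined = 0.6 * skill_score + 0.4 * text_score
```

The shared "django" and "rest" tokens give a nonzero text score, and the skill overlap contributes the larger share, mirroring our weighting choice.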
Serving Recommendations via API
The recommender was exposed as a Django REST API endpoint, called by the frontend to power the "Recommended Jobs" section:
```python
# views.py
from rest_framework.views import APIView
from rest_framework.response import Response

from .models import Job                  # app-local model
from .serializers import JobSerializer   # app-local serializer
from .recommender import JobRecommender


class RecommendationsView(APIView):
    def get(self, request):
        developer = request.user.developer_profile
        active_jobs = Job.objects.filter(status='active')

        recommender = JobRecommender()
        recommendations = recommender.recommend(
            developer.to_dict(),
            [job.to_dict() for job in active_jobs],
            top_k=20
        )

        # Preserve ranking order: filter(id__in=...) alone returns
        # rows in database order, not recommendation order.
        job_ids = [r[0] for r in recommendations]
        jobs_by_id = {job.id: job for job in Job.objects.filter(id__in=job_ids)}
        ordered = [jobs_by_id[jid] for jid in job_ids if jid in jobs_by_id]
        return Response(JobSerializer(ordered, many=True).data)
```
Production Challenges
1. Latency
Computing recommendations on-the-fly was too slow for hundreds of developers × thousands of jobs. Solution: pre-compute recommendations in a background task (Celery) and cache results in Redis.
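The pattern is straightforward: a scheduled batch task scores everyone offline, and the API request becomes a cache read. A minimal sketch with a plain dict standing in for Redis (in production, `redis.set(key, value, ex=ttl)` and a Celery beat schedule; all names here are illustrative):

```python
import json
import time

CACHE = {}          # stand-in for Redis
CACHE_TTL = 3600    # seconds before a cached ranking goes stale


def precompute_recommendations(developers, jobs, recommend_fn, top_k=20):
    """Batch task (run on a schedule, e.g. Celery beat): score every
    developer offline and cache the ranked job ids."""
    for dev in developers:
        recs = recommend_fn(dev, jobs, top_k)
        expires = time.time() + CACHE_TTL
        CACHE[f"recs:{dev['id']}"] = (expires, json.dumps([jid for jid, _ in recs]))


def get_cached_recommendations(developer_id):
    """API-side read: O(1) lookup instead of scoring thousands of jobs."""
    entry = CACHE.get(f"recs:{developer_id}")
    if entry is None or entry[0] < time.time():
        return None  # miss or expired -> fall back to on-the-fly scoring
    return json.loads(entry[1])


# Example with a stubbed scorer standing in for JobRecommender.recommend
def fake_recommend(dev, jobs, top_k):
    return [(101, 0.9), (202, 0.5)][:top_k]

precompute_recommendations([{'id': 7}], jobs=[], recommend_fn=fake_recommend)
```

The TTL doubles as a staleness bound: a developer sees rankings at most an hour old, which was plenty for job matching.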
2. Feature Freshness
Skills and job descriptions change. We ran a nightly batch job to recompute the TF-IDF matrix and update cached recommendations.
3. Evaluation Metrics
We tracked:
- Click-through rate on recommended jobs (23% vs 8% for non-recommended)
- Application rate from recommendations (15% vs 5% baseline)
- Time to first application (reduced by 40%)
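The rates themselves are simple ratios over logged events. A sketch of how they fall out of an event log (the field names are illustrative):

```python
def rate(events, action):
    """Fraction of shown recommendations that led to the given action."""
    shown = sum(1 for e in events if e['shown'])
    acted = sum(1 for e in events if e['shown'] and e.get(action))
    return acted / shown if shown else 0.0


events = [
    {'shown': True, 'clicked': True,  'applied': True},
    {'shown': True, 'clicked': True,  'applied': False},
    {'shown': True, 'clicked': False, 'applied': False},
    {'shown': True, 'clicked': False, 'applied': False},
]
ctr = rate(events, 'clicked')       # 2 of 4 impressions clicked
app_rate = rate(events, 'applied')  # 1 of 4 impressions applied
```

Computing the same ratios over non-recommended listings gives the baseline for the comparisons above.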
4. Feedback Loop
We incorporated implicit feedback — clicks, application rates, time spent viewing — to continuously improve the collaborative filtering component.
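A common way to fold such signals into collaborative filtering is the confidence-weighting scheme from implicit-feedback matrix factorization: collapse the signals into a preference strength r and weight the observation by c = 1 + αr. A sketch (the per-signal weights here are illustrative, not tuned values):

```python
def implicit_confidence(clicks, applications, view_seconds, alpha=40.0):
    """Map implicit signals to a confidence weight c = 1 + alpha * r,
    where r is a weighted preference strength. An application counts
    far more than a click; viewing time contributes per minute.
    Signal weights are illustrative, not tuned values."""
    r = 1.0 * clicks + 5.0 * applications + view_seconds / 60.0
    return 1.0 + alpha * r
```

No interaction yields the baseline confidence of 1, so unobserved developer/job pairs are treated as only weakly negative rather than as hard rejections.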
Key Lessons
- Start simple. TF-IDF + cosine similarity is surprisingly effective and easy to debug.
- Pre-compute whenever possible. Real-time ML inference at scale is expensive.
- Measure everything. Without metrics, you can't prove your system works.
- Handle cold starts explicitly. New users need a different strategy than active ones.
- Cache aggressively. Recommendations don't need to be real-time; "fresh enough" is fine.
What I'd Do Differently
If I were building this today, I'd explore:
- Sentence transformers (like SBERT) for better semantic understanding
- A/B testing framework for systematic experimentation
- Feature stores for managing ML features across the pipeline
Building ML systems in production taught me that the model is the easy part. The hard parts are data quality, serving infrastructure, and monitoring.
Want to discuss ML in production? Let's connect — I geek out about this stuff.