Implementing ML Recommender Systems in Production
Recommender systems are everywhere — Netflix, Spotify, Amazon. But building one that works reliably in production is very different from training a model in a Jupyter notebook. Here's how I built an ML-powered job matching system at Anyskillz.
The Problem
Anyskillz is a freelance marketplace connecting UK businesses with developers. The challenge: with hundreds of developers and thousands of job postings, how do you surface the most relevant matches?
Manual searching wasn't scaling. Clients needed developers with specific skill combinations, and developers were missing relevant opportunities buried in long lists.
Choosing the Right Approach
Content-Based vs. Collaborative Filtering
For our use case, we went with a hybrid approach:
- Content-based filtering using skill matching and NLP on project descriptions
- Collaborative filtering using interaction data (applications, bookmarks, messages)
The cold-start problem was real — new developers had no interaction history. Content-based filtering handled these cases while collaborative filtering improved recommendations for active users.
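One way to blend the two signals is a weighted sum that falls back to pure content scores when a user has no interaction history. A minimal sketch of that fallback logic (the `alpha` and `min_interactions` values here are illustrative, not our production settings):

```python
def hybrid_score(content_score, collab_score, n_interactions,
                 alpha=0.5, min_interactions=5):
    """Blend content-based and collaborative scores.

    Developers below min_interactions get pure content-based scores,
    sidestepping the cold-start problem; active developers get a
    weighted mix of both signals.
    """
    if collab_score is None or n_interactions < min_interactions:
        return content_score  # cold start: content-based only
    return alpha * content_score + (1 - alpha) * collab_score
```

A new developer with zero interactions keeps their full content score, while an active one gets the blend.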
The Architecture
```python
# recommender.py
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class JobRecommender:
    def __init__(self):
        self.tfidf = TfidfVectorizer(
            stop_words='english',
            max_features=5000,
            ngram_range=(1, 2)
        )
        self.skill_weight = 0.6
        self.text_weight = 0.4

    def compute_skill_similarity(self, dev_skills, job_skills):
        """Jaccard similarity between skill sets."""
        dev_set = set(s.lower() for s in dev_skills)
        job_set = set(s.lower() for s in job_skills)
        if not dev_set or not job_set:
            return 0.0
        intersection = dev_set & job_set
        union = dev_set | job_set
        return len(intersection) / len(union)

    def compute_text_similarity(self, dev_profile, job_description):
        """TF-IDF cosine similarity between two texts.

        Note: this refits the vectorizer on each pair, which keeps the
        example simple; in production, fit once per batch instead.
        """
        tfidf_matrix = self.tfidf.fit_transform([dev_profile, job_description])
        return cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]

    def recommend(self, developer, jobs, top_k=10):
        """Get top-k job recommendations for a developer."""
        scores = []
        for job in jobs:
            skill_score = self.compute_skill_similarity(
                developer['skills'], job['required_skills']
            )
            text_score = self.compute_text_similarity(
                developer['bio'], job['description']
            )
            combined = (self.skill_weight * skill_score +
                        self.text_weight * text_score)
            scores.append((job['id'], combined))
        scores.sort(key=lambda x: x[1], reverse=True)
        return scores[:top_k]
```
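To see the weighted blend in action, here is a worked single developer/job pair, computed standalone with the same weights (the bios and skills are toy inputs):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

dev_skills = {'python', 'django', 'postgresql'}
job_skills = {'python', 'django', 'celery'}

# Jaccard: |intersection| / |union| = 2 / 4 = 0.5
skill_score = len(dev_skills & job_skills) / len(dev_skills | job_skills)

dev_bio = 'Backend developer building Django REST APIs.'
job_desc = 'Build a Django REST API for our marketplace.'
tfidf = TfidfVectorizer(stop_words='english')
m = tfidf.fit_transform([dev_bio, job_desc])
text_score = cosine_similarity(m[0:1], m[1:2])[0][0]

# Same 0.6 / 0.4 weighting as the recommender class
combined = 0.6 * skill_score + 0.4 * text_score
```

The shared "django" and "rest" tokens give a nonzero text score, and the skill overlap contributes the larger share, mirroring our weighting choice.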
Serving Recommendations via API
The recommender was exposed as a Django REST API endpoint, called by the frontend to power the "Recommended Jobs" section:
```python
# views.py
from rest_framework.views import APIView
from rest_framework.response import Response

from .models import Job                  # app-local model
from .serializers import JobSerializer   # app-local serializer
from .recommender import JobRecommender


class RecommendationsView(APIView):
    def get(self, request):
        developer = request.user.developer_profile
        active_jobs = Job.objects.filter(status='active')

        recommender = JobRecommender()
        recommendations = recommender.recommend(
            developer.to_dict(),
            [job.to_dict() for job in active_jobs],
            top_k=20
        )

        # Preserve ranking order: filter(id__in=...) alone returns
        # rows in database order, not recommendation order.
        job_ids = [r[0] for r in recommendations]
        jobs_by_id = {job.id: job for job in Job.objects.filter(id__in=job_ids)}
        ordered = [jobs_by_id[jid] for jid in job_ids if jid in jobs_by_id]
        return Response(JobSerializer(ordered, many=True).data)
```
Production Challenges
1. Latency
Computing recommendations on-the-fly was too slow for hundreds of developers × thousands of jobs. Solution: pre-compute recommendations in a background task (Celery) and cache results in Redis.
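The pattern is straightforward: a scheduled batch task scores everyone offline, and the API request becomes a cache read. A minimal sketch with a plain dict standing in for Redis (in production, `redis.set(key, value, ex=ttl)` and a Celery beat schedule; all names here are illustrative):

```python
import json
import time

CACHE = {}          # stand-in for Redis
CACHE_TTL = 3600    # seconds before a cached ranking goes stale


def precompute_recommendations(developers, jobs, recommend_fn, top_k=20):
    """Batch task (run on a schedule, e.g. Celery beat): score every
    developer offline and cache the ranked job ids."""
    for dev in developers:
        recs = recommend_fn(dev, jobs, top_k)
        expires = time.time() + CACHE_TTL
        CACHE[f"recs:{dev['id']}"] = (expires, json.dumps([jid for jid, _ in recs]))


def get_cached_recommendations(developer_id):
    """API-side read: O(1) lookup instead of scoring thousands of jobs."""
    entry = CACHE.get(f"recs:{developer_id}")
    if entry is None or entry[0] < time.time():
        return None  # miss or expired -> fall back to on-the-fly scoring
    return json.loads(entry[1])


# Example with a stubbed scorer standing in for JobRecommender.recommend
def fake_recommend(dev, jobs, top_k):
    return [(101, 0.9), (202, 0.5)][:top_k]

precompute_recommendations([{'id': 7}], jobs=[], recommend_fn=fake_recommend)
```

The TTL doubles as a staleness bound: a developer sees rankings at most an hour old, which was plenty for job matching.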
2. Feature Freshness
Skills and job descriptions change. We ran a nightly batch job to recompute the TF-IDF matrix and update cached recommendations.
3. Evaluation Metrics
We tracked:
- Click-through rate on recommended jobs (23% vs 8% for non-recommended)
- Application rate from recommendations (15% vs 5% baseline)
- Time to first application (reduced by 40%)
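The rates themselves are simple ratios over logged events. A sketch of how they fall out of an event log (the field names are illustrative):

```python
def rate(events, action):
    """Fraction of shown recommendations that led to the given action."""
    shown = sum(1 for e in events if e['shown'])
    acted = sum(1 for e in events if e['shown'] and e.get(action))
    return acted / shown if shown else 0.0


events = [
    {'shown': True, 'clicked': True,  'applied': True},
    {'shown': True, 'clicked': True,  'applied': False},
    {'shown': True, 'clicked': False, 'applied': False},
    {'shown': True, 'clicked': False, 'applied': False},
]
ctr = rate(events, 'clicked')       # 2 of 4 impressions clicked
app_rate = rate(events, 'applied')  # 1 of 4 impressions applied
```

Computing the same ratios over non-recommended listings gives the baseline for the comparisons above.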
4. Feedback Loop
We incorporated implicit feedback — clicks, application rates, time spent viewing — to continuously improve the collaborative filtering component.
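A common way to fold such signals into collaborative filtering is the confidence-weighting scheme from implicit-feedback matrix factorization: collapse the signals into a preference strength r and weight the observation by c = 1 + αr. A sketch (the per-signal weights here are illustrative, not tuned values):

```python
def implicit_confidence(clicks, applications, view_seconds, alpha=40.0):
    """Map implicit signals to a confidence weight c = 1 + alpha * r,
    where r is a weighted preference strength. An application counts
    far more than a click; viewing time contributes per minute.
    Signal weights are illustrative, not tuned values."""
    r = 1.0 * clicks + 5.0 * applications + view_seconds / 60.0
    return 1.0 + alpha * r
```

No interaction yields the baseline confidence of 1, so unobserved developer/job pairs are treated as only weakly negative rather than as hard rejections.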
Key Lessons
- Start simple. TF-IDF + cosine similarity is surprisingly effective and easy to debug.
- Pre-compute whenever possible. Real-time ML inference at scale is expensive.
- Measure everything. Without metrics, you can't prove your system works.
- Handle cold starts explicitly. New users need a different strategy than active ones.
- Cache aggressively. Recommendations don't need to be real-time; "fresh enough" is fine.
What I'd Do Differently
If I were building this today, I'd explore:
- Sentence transformers (like SBERT) for better semantic understanding
- A/B testing framework for systematic experimentation
- Feature stores for managing ML features across the pipeline
Building ML systems in production taught me that the model is the easy part. The hard parts are data quality, serving infrastructure, and monitoring.
Want to discuss ML in production? Let's connect — I geek out about this stuff.