ai-powered-recruitment-platform@crowdbotics:~/case-study
HR Technology / Recruiting · 8 months · 2020-2021

AI-Powered Recruitment Platform

@ Crowdbotics — Python Engineer

Transforming technical recruitment with AI-powered candidate matching and automated screening workflows

60% Faster Screening
85% Match Accuracy
10K+ Candidates Processed

$ cat PROBLEM.md

Manual Resume Screening Couldn't Scale

Crowdbotics received hundreds of applications for each technical role. Recruiters spent hours manually reviewing resumes and often missed qualified candidates because of non-standard formatting or keyword mismatches. Technical skill assessments also varied from one reviewer to the next.

Key Challenges:

  • 🔴 Recruiters spending 70% of time on initial resume screening
  • 🔴 Inconsistent skill assessment — different reviewers had different standards
  • 🔴 Keyword matching missing candidates who described skills differently
  • 🔴 No way to surface passive candidates from historical applications

$ cat SOLUTION.md

NLP-Powered Resume Analysis with Skill Inference

We built an AI system that goes beyond keyword matching, using NLP to understand skills, experience levels, and candidate-job fit with explainable scoring.

Technical Approach:

1. Semantic Resume Parsing

An NLP pipeline extracts skills, experience, and achievements from any resume format. It recognizes that 'Python backend' and 'Django REST APIs' indicate similar capabilities.

2. Skill Inference Engine

An ML model infers skills from project descriptions. A candidate who mentions 'built microservices' likely knows Docker, APIs, and cloud deployment, even if those skills are not explicitly listed.

3. Explainable Matching Scores

Every match score includes an explanation, such as 'Strong match because: 5 years Python, Django experience, similar industry background.' This lets recruiters trust the system and refine it.
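The idea of a score that carries its own explanation can be sketched as follows. This is a minimal illustration, not the platform's matching engine: the `MatchResult` type, the `score_candidate` function, the skill-to-years schema, and the requirement weights are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MatchResult:
    score: float  # fraction of weighted requirements met, 0.0 to 1.0
    reasons: list[str] = field(default_factory=list)


def score_candidate(candidate_skills: dict[str, float],
                    required_skills: dict[str, float]) -> MatchResult:
    """Weighted skill overlap with one human-readable reason per requirement.

    candidate_skills maps skill -> years of experience (hypothetical schema).
    required_skills maps skill -> weight of that requirement for the job.
    """
    total_weight = sum(required_skills.values())
    if total_weight == 0:
        return MatchResult(score=0.0, reasons=["Job lists no weighted requirements"])

    earned = 0.0
    reasons = []
    for skill, weight in required_skills.items():
        years = candidate_skills.get(skill)
        if years is None:
            reasons.append(f"Missing: {skill}")
            continue
        earned += weight
        reasons.append(f"Matched: {skill} ({years:g} years)")

    return MatchResult(score=earned / total_weight, reasons=reasons)


result = score_candidate(
    {"Python": 5, "Django": 3},
    {"Python": 2.0, "Django": 1.0, "Kubernetes": 1.0},
)
print(f"{result.score:.2f}", result.reasons)
```

Because the reasons are built in the same loop that computes the score, the explanation cannot drift from the number it explains.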

4. Automated Outreach Sequences

High-match candidates automatically receive personalized outreach. The system tracks engagement and adjusts messaging based on response patterns.

$ cat tech-stack.json

🚀 Core Technologies

Django

Core platform and business logic

Why: Rapid development with excellent ORM, admin interface for recruiter tools

spaCy / Transformers

NLP for resume parsing and skill extraction

Why: Industrial-strength NLP with custom model training support

PostgreSQL

Candidate and job data with full-text search

Why: Powerful text search and JSON support for parsed resume data

🔧 Supporting Technologies

Celery / Redis · scikit-learn · React

☁️ Infrastructure

AWS (EC2, RDS, S3) · Docker · GitHub Actions

$ cat ARCHITECTURE.md

The platform processes resumes through an NLP pipeline:

Resume Upload → Parser → Skill Extractor → Matcher → Recruiter Dashboard
      ↓           ↓             ↓             ↓
     S3       spaCy NLP     ML Model     PostgreSQL
                  ↓             ↓             ↓
             Structured     Inferred       Ranked
                Data         Skills      Candidates

System Components:

Resume Parser

Handles PDF, DOCX, and LinkedIn imports with format normalization

Skill Extraction Service

NLP pipeline identifying explicit and implicit skills

Matching Engine

Scores candidates against job requirements with explanations

Outreach Automation

Personalized email sequences for qualified candidates

$ man implementation-details

Building the Resume Parser

Resume parsing had to handle a wide range of formats:

Challenges:

  • PDFs with no text layer (scanned images only)
  • DOCX with custom formatting
  • Multi-column layouts
  • International formats (dates, education systems)

Approach:

  1. PDF extraction — PyPDF2 for text, Tesseract OCR for images
  2. Layout analysis — Detect sections (experience, education, skills)
  3. Entity extraction — spaCy NER for dates, companies, job titles
  4. Normalization — Standardize to internal schema
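Step 1 above implies a fallback: try the embedded text first and use OCR only when little or nothing comes back. Here is a minimal sketch of that decision, assuming the two extractors are passed in as plain callables. The function name, the `min_chars` cutoff, and the stub extractors are hypothetical; in practice the callables would wrap the PDF and OCR libraries.

```python
from typing import Callable


def extract_text_with_fallback(
    path: str,
    text_layer: Callable[[str], str],
    ocr: Callable[[str], str],
    min_chars: int = 200,  # assumed cutoff; tune against real resumes
) -> tuple[str, str]:
    """Return (text, method). Use OCR only when the text layer is nearly empty."""
    text = text_layer(path).strip()
    if len(text) >= min_chars:
        return text, "text_layer"
    # Scanned or image-only PDFs usually yield little or no embedded text.
    return ocr(path).strip(), "ocr"


# Stub extractors stand in for the real PDF and OCR libraries.
def fake_text_layer(path: str) -> str:
    return "" if path.endswith("scanned.pdf") else "Experience\n" + "x" * 300


def fake_ocr(path: str) -> str:
    return "Text recovered by OCR"


print(extract_text_with_fallback("resume.pdf", fake_text_layer, fake_ocr)[1])
print(extract_text_with_fallback("scanned.pdf", fake_text_layer, fake_ocr))
```

Returning the method alongside the text makes it easy to flag OCR-derived resumes for closer review, since OCR output tends to be noisier.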
class ResumeParser:
    def parse(self, file) -> ParsedResume:
        # Extract raw text (text layer first, OCR for image-only files)
        text = self.extract_text(file)

        # Split the text into named sections, e.g. 'experience', 'education'
        sections = self.segment_sections(text)

        # A resume may lack a section entirely, so default to empty text
        experience = sections.get('experience', '')
        education = sections.get('education', '')

        # Extract structured data
        return ParsedResume(
            experience=self.parse_experience(experience),
            education=self.parse_education(education),
            skills=self.extract_skills(text),
            inferred_skills=self.infer_skills(experience),
        )

Skill Inference from Project Descriptions

Candidates don’t always list every skill they have. We built a model to infer skills:

Training Data:

  • 10K resumes with recruiter-annotated skills
  • Mapping from project descriptions to actual skill usage

Model Architecture:

  • Fine-tuned BERT for skill classification
  • Multi-label output (candidate may have multiple skills)
  • Confidence scores for each inferred skill

Examples:

  • “Built REST APIs for mobile app” → infers: API Design, Backend, possibly Python/Node
  • “Deployed to AWS using containers” → infers: Docker, AWS, DevOps
  • “Led team of 5 engineers” → infers: Leadership, Project Management
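from typing import List

import torch


class SkillInferenceModel:
    def infer(self, project_description: str, threshold: float = 0.5) -> List[InferredSkill]:
        # Tokenize and encode, truncating text beyond the model's max length
        inputs = self.tokenizer(
            project_description, return_tensors="pt", truncation=True
        )

        # Run inference without tracking gradients
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Sigmoid gives an independent probability per skill (multi-label)
        probabilities = torch.sigmoid(outputs.logits)[0]

        # Return skills above the threshold, with confidence as a plain float
        return [
            InferredSkill(skill=self.skills[i], confidence=prob)
            for i, prob in enumerate(probabilities.tolist())
            if prob > threshold
        ]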
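The per-label sigmoid is what makes the output multi-label. A softmax would force the skills to compete for a single unit of probability, so at most one could clear a 0.5 threshold. The pure-Python comparison below illustrates this. The skill names and logit values are made up for the example and do not come from the trained model.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values: list[float]) -> list[float]:
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


SKILLS = ["Docker", "AWS", "DevOps", "Leadership"]  # hypothetical label set
logits = [2.0, 1.5, 0.3, -3.0]                      # made-up model outputs

sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)

picked_sigmoid = [s for s, p in zip(SKILLS, sigmoid_probs) if p > 0.5]
picked_softmax = [s for s, p in zip(SKILLS, softmax_probs) if p > 0.5]

print(picked_sigmoid)  # → ['Docker', 'AWS', 'DevOps']
print(picked_softmax)  # → ['Docker']
```

With sigmoid, each skill is judged on its own, so a description such as “Deployed to AWS using containers” can yield several skills at once.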

$ echo $RESULTS

60% Reduction in Screening Time

  • 60% Faster Screening: recruiters focus on qualified candidates only
  • 85% Match Accuracy: measured against recruiter decisions
  • 3x Candidate Throughput: more applications processed per recruiter
  • 10K+ Candidates Processed: system handling full pipeline volume
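The case study does not define how match accuracy was computed. One plausible reading of "measured against recruiter decisions" is simple agreement: the share of candidates where the system's shortlist decision matches the recruiter's. The sketch below shows that calculation under this assumption. The function name, the 0.7 threshold, and the sample data are all hypothetical.

```python
def match_accuracy(scores: list[float],
                   recruiter_advanced: list[bool],
                   threshold: float = 0.7) -> float:
    """Fraction of candidates where (score >= threshold) equals the recruiter's call."""
    if len(scores) != len(recruiter_advanced):
        raise ValueError("scores and recruiter decisions must align one-to-one")
    if not scores:
        raise ValueError("need at least one candidate")
    agreements = sum(
        (score >= threshold) == advanced
        for score, advanced in zip(scores, recruiter_advanced)
    )
    return agreements / len(scores)


# Made-up data: the system would shortlist candidates 1, 2, and 4;
# the recruiter advanced only candidates 1 and 2.
scores = [0.91, 0.82, 0.40, 0.75, 0.30]
decisions = [True, True, False, False, False]
accuracy = match_accuracy(scores, decisions)
print(f"{accuracy:.0%}")  # → 80%
```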

Additional Outcomes:

  • Recruiters reported higher quality initial conversations
  • Hiring managers appreciated consistent skill assessment
  • Historical candidate database became searchable asset

$ cat LESSONS_LEARNED.md

Explainability is Non-Negotiable for AI in HR

Recruiters rejected black-box scores. Adding explanations ('matched because of X, Y, Z') increased adoption from 30% to 95%.

Domain-Specific NLP Training is Worth It

Generic NLP models struggled with tech jargon. Fine-tuning on 5K labeled resumes significantly improved skill extraction accuracy.

Human-in-the-Loop Improves the Model

Recruiter feedback on match quality became training data. The system got better as recruiters used it more.
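As an illustration of how recruiter feedback could become training data, the sketch below turns confirm-or-reject verdicts on inferred skills into multi-label targets. The `SkillFeedback` record and `build_training_labels` function are assumptions made for the example. The source does not describe the actual feedback schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SkillFeedback:
    description: str  # project description shown to the recruiter
    skill: str        # skill the model inferred
    confirmed: bool   # True if the recruiter agreed with the inference


def build_training_labels(feedback: list[SkillFeedback]) -> dict[str, dict[str, int]]:
    """Group feedback into {description: {skill: 0 or 1}} multi-label targets.

    If the same (description, skill) pair appears more than once,
    later entries in the list overwrite earlier ones.
    """
    labels: dict[str, dict[str, int]] = defaultdict(dict)
    for item in feedback:
        labels[item.description][item.skill] = int(item.confirmed)
    return dict(labels)


feedback = [
    SkillFeedback("Deployed to AWS using containers", "Docker", True),
    SkillFeedback("Deployed to AWS using containers", "Kubernetes", False),
    SkillFeedback("Led team of 5 engineers", "Leadership", True),
]
labels = build_training_labels(feedback)
print(labels)
```

Rejected inferences are kept as explicit negatives (0) rather than dropped, since they tell the model which plausible-looking skills it over-predicts.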

$ cat README.md

Want Similar Results?

Let's discuss how I can help solve your engineering challenges.