URL Analysis¶
Analyze GitHub repositories directly by URL.
Overview¶
URL analysis lets you analyze any GitHub repository without consuming GitHub API rate limits. GreenMining clones the repository locally and extracts commit data with full diff information.
Benefits¶
- No GitHub API limits - Clone and analyze directly
- Full commit data - Access diffs, modified files, metrics
- Process metrics - Code churn, change set size, contributor count
- DMM metrics - Delta Maintainability Model scores
- Method-level analysis - Per-function complexity via Lizard
- Historical analysis - Analyze any date range
Python API¶
LocalRepoAnalyzer¶
The main class for URL-based analysis.
from greenmining.services.local_repo_analyzer import LocalRepoAnalyzer
analyzer = LocalRepoAnalyzer(
    clone_path="/tmp/greenmining_repos",  # Where to clone
    cleanup_after=True                    # Delete after analysis
)
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| clone_path | str | /tmp/greenmining_repos | Directory for cloning |
| cleanup_after | bool | True | Delete cloned repo after analysis |
Single Repository Analysis¶
Analyze a single repository.
from greenmining.services.local_repo_analyzer import LocalRepoAnalyzer
from datetime import datetime
analyzer = LocalRepoAnalyzer()
result = analyzer.analyze_repository(
    repo_url="https://github.com/pallets/flask",
    max_commits=200,
    since_date=datetime(2024, 1, 1),
    to_date=datetime(2024, 12, 31)
)
print(f"Total commits: {result['total_commits']}")
print(f"Green-aware: {result['green_aware_percentage']:.1f}%")
Multiple Repositories¶
repos = [
    "https://github.com/pallets/flask",
    "https://github.com/django/django",
    "https://github.com/fastapi/fastapi"
]
# Reuses the analyzer created in the previous example
for repo_url in repos:
    result = analyzer.analyze_repository(repo_url, max_commits=100)
    print(f"{result['repository']['name']}: {result['green_aware_percentage']:.1f}%")
analyze_repository() Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| repo_url | str | (required) | GitHub repository URL |
| max_commits | int | 1000 | Maximum commits to analyze |
| since_date | datetime | None | Start date filter |
| to_date | datetime | None | End date filter |
Return Value¶
{
    "repository": {
        "name": "flask",
        "url": "https://github.com/pallets/flask",
        "owner": "pallets",
        "clone_path": "/tmp/greenmining_repos/flask"
    },
    "total_commits": 200,
    "green_aware_count": 47,
    "green_aware_percentage": 23.5,
    "commits": [
        {
            "sha": "abc123...",
            "message": "Optimize template caching",
            "author": "developer",
            "date": "2024-03-15T10:30:00",
            "green_aware": True,
            "patterns": ["Cache Static Data"],
            "modified_files": 3,
            "insertions": 45,
            "deletions": 12,
            "dmm_unit_size": 0.85,
            "dmm_unit_complexity": 0.72,
            "dmm_unit_interfacing": 0.90
        },
        ...
    ],
    "pattern_distribution": {
        "Cache Static Data": 15,
        "Use Async Instead of Sync": 12,
        ...
    },
    "process_metrics": {
        "change_set": {"max": 25, "avg": 5.2},
        "code_churn": {"added": 5000, "removed": 2000},
        "contributors_count": 45
    }
}
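As a rough illustration of how this structure can be consumed, the sketch below filters commits by pattern and recomputes the green-aware percentage from the commit list. The `sample_result` dict is hand-written to mirror the shape above (it is not real GreenMining output), and `commits_with_pattern` and `green_percentage` are hypothetical helpers, not part of the library.

```python
from typing import Any

# Hand-written sample mirroring the documented return shape (not real output).
sample_result: dict[str, Any] = {
    "total_commits": 3,
    "commits": [
        {"sha": "a1", "green_aware": True, "patterns": ["Cache Static Data"]},
        {"sha": "b2", "green_aware": False, "patterns": []},
        {"sha": "c3", "green_aware": True, "patterns": ["Lazy Loading", "Cache Static Data"]},
    ],
}


def commits_with_pattern(result: dict[str, Any], pattern: str) -> list[str]:
    """Return the SHAs of commits tagged with the given pattern (hypothetical helper)."""
    return [c["sha"] for c in result["commits"] if pattern in c.get("patterns", [])]


def green_percentage(result: dict[str, Any]) -> float:
    """Recompute the green-aware share from the commit list; 0.0 if there are no commits."""
    commits = result["commits"]
    if not commits:
        return 0.0
    green = sum(1 for c in commits if c.get("green_aware"))
    return 100.0 * green / len(commits)


print(commits_with_pattern(sample_result, "Cache Static Data"))
print(f"{green_percentage(sample_result):.1f}%")
```

Recomputing the percentage this way is mainly useful as a sanity check against the `green_aware_percentage` field, or after filtering the commit list yourself.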
Complete Example¶
#!/usr/bin/env python3
"""Analyze Flask repository for green patterns."""
from datetime import datetime
from greenmining.services.local_repo_analyzer import LocalRepoAnalyzer
# Initialize analyzer
analyzer = LocalRepoAnalyzer(
    clone_path="/tmp/flask_analysis",
    cleanup_after=True
)
# Analyze repository
print("Analyzing Flask repository...")
result = analyzer.analyze_repository(
    repo_url="https://github.com/pallets/flask",
    max_commits=100,
    since_date=datetime(2024, 1, 1)
)
# Print summary
print(f"\n{'='*60}")
print("ANALYSIS RESULTS")
print(f"{'='*60}")
print(f"Repository: {result['repository']['name']}")
print(f"Total commits: {result['total_commits']}")
print(f"Green-aware: {result['green_aware_count']} ({result['green_aware_percentage']:.1f}%)")
# Top patterns (five most frequent)
print("\nTop Patterns:")
for pattern, count in sorted(
    result['pattern_distribution'].items(),
    key=lambda x: x[1],
    reverse=True
)[:5]:
    print(f" {pattern}: {count}")
# Sample green commits (first five)
print("\nSample Green Commits:")
green_commits = [c for c in result['commits'] if c['green_aware']]
for commit in green_commits[:5]:
    print(f" 🌱 {commit['message'][:60]}...")
    print(f" Patterns: {commit['patterns']}")
Output:
Analyzing Flask repository...
============================================================
ANALYSIS RESULTS
============================================================
Repository: flask
Total commits: 100
Green-aware: 23 (23.0%)
Top Patterns:
Cache Static Data: 8
Use Async Instead of Sync: 5
Lazy Loading: 4
Compress Transmitted Data: 3
Optimize Database Queries: 3
Sample Green Commits:
🌱 Implement response caching for static assets...
Patterns: ['Cache Static Data']
🌱 Add async support for request handling...
Patterns: ['Use Async Instead of Sync']
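If the per-commit records are needed in a spreadsheet, they can be flattened to CSV with the standard library. This is a minimal sketch, assuming each commit dict carries the fields shown under Return Value. `commits_to_csv` is a hypothetical helper and the sample row is made up for illustration.

```python
import csv
import io

# Columns to export; other keys in the commit dicts are ignored.
FIELDS = ["sha", "date", "green_aware", "patterns", "insertions", "deletions"]


def commits_to_csv(commits: list[dict]) -> str:
    """Serialize commit dicts to CSV text, joining pattern lists with ';'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for commit in commits:
        row = dict(commit)
        row["patterns"] = ";".join(commit.get("patterns", []))
        writer.writerow(row)
    return buffer.getvalue()


# Illustrative data only, shaped like an entry of result['commits'].
sample_commits = [
    {
        "sha": "abc123",
        "message": "Optimize template caching",
        "date": "2024-03-15T10:30:00",
        "green_aware": True,
        "patterns": ["Cache Static Data"],
        "insertions": 45,
        "deletions": 12,
    }
]
print(commits_to_csv(sample_commits))
```

With a real analysis, `commits_to_csv(result['commits'])` would produce one row per analyzed commit.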
Supported URL Formats¶
# HTTPS (recommended)
"https://github.com/owner/repo"
"https://github.com/owner/repo.git"
# SSH
"git@github.com:owner/repo.git"
# With branch (coming soon)
"https://github.com/owner/repo/tree/branch-name"
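To check a URL against the HTTPS and SSH forms above before starting a clone, a small parser can help. The sketch below is an assumption about how such validation might look, not GreenMining's own parsing. `parse_github_url` is a hypothetical helper, and it deliberately rejects the branch form, which is listed as coming soon.

```python
import re

# Accepts https://github.com/owner/repo[.git] and git@github.com:owner/repo[.git]
_GITHUB_URL = re.compile(
    r"^(?:https://github\.com/|git@github\.com:)"
    r"(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"
)


def parse_github_url(url: str) -> tuple[str, str]:
    """Return (owner, repo) for a supported GitHub URL, or raise ValueError."""
    match = _GITHUB_URL.match(url.strip())
    if match is None:
        raise ValueError(f"Unsupported GitHub URL: {url!r}")
    return match.group("owner"), match.group("repo")


for candidate in (
    "https://github.com/owner/repo",
    "https://github.com/owner/repo.git",
    "git@github.com:owner/repo.git",
):
    print(candidate, "->", parse_github_url(candidate))
```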
Commit Metrics¶
GreenMining extracts the following metrics for each commit:
Basic Commit Metrics¶
| Metric | Description |
|---|---|
| modified_files | Number of files changed |
| insertions | Lines added |
| deletions | Lines removed |
| files | List of modified file paths |
DMM Metrics (Delta Maintainability Model)¶
DMM metrics measure how a commit affects code maintainability on a 0-1 scale (higher is better).
| Metric | Range | Description |
|---|---|---|
| dmm_unit_size | 0-1 | Unit size maintainability: proportion of changed code units that remain within acceptable size thresholds |
| dmm_unit_complexity | 0-1 | Cyclomatic complexity impact: proportion of changed code units with acceptable complexity |
| dmm_unit_interfacing | 0-1 | Interface complexity: proportion of changed code units with manageable parameter counts |
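Because each score is a per-commit proportion, a simple way to compare repositories is to average the scores over the analyzed commits. The sketch below assumes commit dicts shaped like those under Return Value, and assumes that a missing score appears as `None` or as an absent key (the source does not confirm this). `average_dmm` is a hypothetical helper.

```python
from statistics import mean
from typing import Optional

DMM_KEYS = ("dmm_unit_size", "dmm_unit_complexity", "dmm_unit_interfacing")


def average_dmm(commits: list[dict]) -> dict[str, Optional[float]]:
    """Average each DMM score over commits that have one; None if no commit does."""
    averages: dict[str, Optional[float]] = {}
    for key in DMM_KEYS:
        # Skip commits where the score is missing or None (assumed representation).
        values = [c[key] for c in commits if c.get(key) is not None]
        averages[key] = mean(values) if values else None
    return averages


# Illustrative data only.
sample_commits = [
    {"dmm_unit_size": 0.8, "dmm_unit_complexity": 0.6, "dmm_unit_interfacing": 1.0},
    {"dmm_unit_size": 0.4, "dmm_unit_complexity": None},
]
print(average_dmm(sample_commits))
```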
Process Metrics¶
GreenMining tracks eight process metrics per repository:
| Metric | Description |
|---|---|
| change_set | Number of files changed per commit (max, avg) |
| code_churn | Lines added/removed over time |
| contributors_count | Unique contributors in the analysis period |
| commits_count | Total commits in the analysis period |
| contributors_experience | Average experience of contributors (commits to repo) |
| history_complexity | Normalized entropy of file change history |
| hunks_count | Number of contiguous changed blocks per file |
| lines_count | Total lines of code modified across all commits |
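To make the first two definitions concrete, the sketch below derives `change_set` and `code_churn` style figures from per-commit records. It assumes commit dicts with the `modified_files`, `insertions`, and `deletions` fields shown under Return Value. The helper names are hypothetical, and GreenMining's own computation may differ in detail.

```python
def change_set_stats(commits: list[dict]) -> dict[str, float]:
    """Max and average number of files changed per commit."""
    sizes = [c["modified_files"] for c in commits]
    if not sizes:
        return {"max": 0, "avg": 0.0}
    return {"max": max(sizes), "avg": sum(sizes) / len(sizes)}


def code_churn(commits: list[dict]) -> dict[str, int]:
    """Total lines added and removed across the given commits."""
    return {
        "added": sum(c["insertions"] for c in commits),
        "removed": sum(c["deletions"] for c in commits),
    }


# Illustrative data only.
sample_commits = [
    {"modified_files": 3, "insertions": 45, "deletions": 12},
    {"modified_files": 1, "insertions": 5, "deletions": 0},
]
print(change_set_stats(sample_commits))
print(code_churn(sample_commits))
```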
Method-Level Metrics¶
When method_level_analysis=True, GreenMining uses Lizard to extract per-function metrics:
| Metric | Description |
|---|---|
| methods_count | Number of methods analyzed in a commit |
| total_nloc | Total non-comment lines of code |
| total_complexity | Sum of cyclomatic complexity across all methods |
| max_complexity | Highest single-function complexity |
Each method entry includes:
| Field | Description |
|---|---|
| name | Function/method name |
| nloc | Non-comment lines of code |
| complexity | Cyclomatic complexity |
| token_count | Number of tokens |
| parameters | Number of parameters |
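The commit-level figures in the first table follow directly from the per-method entries in the second. The sketch below shows that relationship, assuming each method entry is a dict with the fields listed above. `summarize_methods` is a hypothetical helper and the sample entries are invented.

```python
def summarize_methods(methods: list[dict]) -> dict[str, int]:
    """Roll per-method entries up into the commit-level summary metrics."""
    complexities = [m["complexity"] for m in methods]
    return {
        "methods_count": len(methods),
        "total_nloc": sum(m["nloc"] for m in methods),
        "total_complexity": sum(complexities),
        # default=0 keeps the summary well defined for commits with no methods
        "max_complexity": max(complexities, default=0),
    }


# Illustrative data only.
sample_methods = [
    {"name": "render", "nloc": 20, "complexity": 4, "token_count": 150, "parameters": 2},
    {"name": "load", "nloc": 8, "complexity": 1, "token_count": 40, "parameters": 1},
]
print(summarize_methods(sample_methods))
```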
Configuration Parameters¶
URL analysis is configured through parameters passed to the LocalRepoAnalyzer constructor:
from greenmining.services.local_repo_analyzer import LocalRepoAnalyzer
analyzer = LocalRepoAnalyzer(
    max_commits=500,              # Maximum commits per repository
    cleanup_after=True,           # Remove cloned repos after analysis
    skip_merges=True,             # Skip merge commits
    energy_tracking=False,        # Enable energy measurement
    energy_backend="auto",        # Energy backend (rapl, codecarbon, cpu_meter, auto)
    method_level_analysis=False,  # Per-method complexity metrics
    include_source_code=False,    # Include source code before/after
)
Batch Analysis¶
Analyze multiple repositories efficiently:
from greenmining.services.local_repo_analyzer import LocalRepoAnalyzer
import json
repos = [
    "https://github.com/pallets/flask",
    "https://github.com/django/django",
    "https://github.com/fastapi/fastapi",
]
analyzer = LocalRepoAnalyzer(cleanup_after=True)
all_results = []
for url in repos:
    print(f"Analyzing {url}...")
    result = analyzer.analyze_repository(url, max_commits=100)
    all_results.append(result)
    print(f" ✓ {result['green_aware_count']}/{result['total_commits']} green-aware")
# Save combined results (default=str serializes datetimes and other non-JSON types)
with open("batch_results.json", "w") as f:
    json.dump(all_results, f, indent=2, default=str)
# Summary
print(f"\nTotal repositories: {len(all_results)}")
total_commits = sum(r['total_commits'] for r in all_results)
total_green = sum(r['green_aware_count'] for r in all_results)
print(f"Total commits: {total_commits}")
# Guard against division by zero when no commits were analyzed
green_share = total_green / total_commits * 100 if total_commits else 0.0
print(f"Total green-aware: {total_green} ({green_share:.1f}%)")
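In a longer batch, one failed clone would otherwise abort the whole loop. The sketch below shows one way to keep going and collect failures separately. `analyze_all` is a hypothetical wrapper that takes the analysis function as an argument, and `fake_analyze` is a stand-in used here so the example runs without cloning anything.

```python
from typing import Callable, Iterable


def analyze_all(
    urls: Iterable[str], analyze: Callable[[str], dict]
) -> tuple[list[dict], dict[str, str]]:
    """Run `analyze` on every URL, returning (results, failures keyed by URL)."""
    results: list[dict] = []
    failures: dict[str, str] = {}
    for url in urls:
        try:
            results.append(analyze(url))
        except Exception as exc:  # broad on purpose: one bad repo should not stop the batch
            failures[url] = f"{type(exc).__name__}: {exc}"
    return results, failures


def fake_analyze(url: str) -> dict:
    """Stand-in for a real analysis call; fails for URLs containing 'broken'."""
    if "broken" in url:
        raise RuntimeError("clone failed")
    return {"repository": {"url": url}, "total_commits": 10, "green_aware_count": 2}


results, failures = analyze_all(
    ["https://github.com/pallets/flask", "https://github.com/example/broken"],
    fake_analyze,
)
print(f"{len(results)} succeeded, {len(failures)} failed")
for url, reason in failures.items():
    print(f"  {url}: {reason}")
```

With a real analyzer, the callable could be `lambda url: analyzer.analyze_repository(url, max_commits=100)`.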
Troubleshooting¶
Clone Failures¶
# Increase timeout
analyzer = LocalRepoAnalyzer(clone_timeout=300) # 5 minutes
# Use SSH for private repos
result = analyzer.analyze_repository("git@github.com:org/private-repo.git")
Large Repositories¶
# Limit commits for large repos
result = analyzer.analyze_repository(
    repo_url="https://github.com/kubernetes/kubernetes",
    max_commits=500  # Limit for faster analysis
)
Disk Space¶
# Always clean up after analysis
analyzer = LocalRepoAnalyzer(cleanup_after=True)
# Or clean up manually
import shutil
shutil.rmtree("/tmp/greenmining_repos")
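It can also help to check free space before cloning a large repository. The sketch below uses only the standard library. `has_free_space` is a hypothetical helper, and the 1 GiB threshold is an arbitrary assumption, not a GreenMining requirement.

```python
import shutil
import tempfile


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


# The path must already exist; the system temp directory is used here as an example.
clone_root = tempfile.gettempdir()
one_gib = 1024 ** 3  # arbitrary threshold chosen for illustration

if has_free_space(clone_root, one_gib):
    print(f"Enough space under {clone_root}")
else:
    print(f"Less than 1 GiB free under {clone_root}; consider cleaning up first")
```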
Next Steps¶
- Energy Measurement - Measure energy during analysis
- Python API - Full API reference
- Configuration - All settings