Skip to content

GreenMining Library Reference

Description

GreenMining is an empirical Python library for Mining Software Repositories (MSR) in Green IT research. It analyzes GitHub repositories to detect green software engineering practices by matching commit messages and code changes against the Green Software Foundation (GSF) pattern catalog. The library supports energy measurement during analysis using Intel RAPL, CodeCarbon, or CPU utilization-based estimation, and provides statistical, temporal, and qualitative analysis capabilities.

Version: 1.1.9 License: MIT PyPI: greenmining Documentation: greenmining.readthedocs.io


File Tree

greenmining/
    __init__.py
    __main__.py
    __version__.py
    config.py
    gsf_patterns.py
    utils.py
    models/
        __init__.py
        repository.py
        commit.py
        analysis_result.py
        aggregated_stats.py
    services/
        __init__.py
        github_fetcher.py          (deprecated)
        github_graphql_fetcher.py
        commit_extractor.py
        data_analyzer.py
        data_aggregator.py
        local_repo_analyzer.py
        reports.py
    analyzers/
        __init__.py
        statistical_analyzer.py
        temporal_analyzer.py
        qualitative_analyzer.py
        code_diff_analyzer.py
        metrics_power_correlator.py
        power_regression.py
        version_power_analyzer.py
    energy/
        __init__.py
        base.py
        rapl.py
        cpu_meter.py
        codecarbon_meter.py
        carbon_reporter.py
    controllers/
        __init__.py
        repository_controller.py
    presenters/
        __init__.py
        console_presenter.py

Module Reference

greenmining/__init__.py

Top-level package entry point. Exposes the two main high-level API functions and the GSF pattern utilities.

Function Parameters Description
fetch_repositories() github_token, max_repos, min_stars, languages, keywords, created_after, created_before, pushed_after, pushed_before Search GitHub for repositories using GraphQL API v4. Returns a list of Repository objects matching the given filters.
analyze_repositories() urls, max_commits, parallel_workers, output_format, energy_tracking, energy_backend, method_level_analysis, include_source_code, ssh_key_path, github_token, since_date, to_date Clone and analyze multiple repositories from URLs using PyDriller. Supports energy measurement, method-level Lizard metrics, and source code extraction. Returns a list of RepositoryAnalysis objects.

Exports: Config, GSF_PATTERNS, GREEN_KEYWORDS, is_green_aware, get_pattern_by_keywords, fetch_repositories, analyze_repositories, __version__


greenmining/__main__.py

Allows running as python -m greenmining. Prints version and usage information.


greenmining/config.py

Configuration management supporting .env files, environment variables, and YAML configuration (greenmining.yaml).

_load_yaml_config(yaml_path)

Loads YAML configuration from file. Returns empty dict if the file does not exist or PyYAML is not installed.

class Config

Attribute Default Source Description
GITHUB_TOKEN required env GitHub personal access token
GITHUB_SEARCH_KEYWORDS ["microservices", ...] YAML/env Search keywords for repository discovery
SUPPORTED_LANGUAGES ["Java", "Python", "Go", ...] YAML/env Languages to filter
MAX_REPOS 100 env Maximum repositories to fetch
COMMITS_PER_REPO 50 YAML/env Maximum commits per repository
DAYS_BACK 730 YAML/env Analysis time window in days
SKIP_MERGES True YAML Skip merge commits
MIN_STARS 100 YAML/env Minimum stars filter
ENERGY_ENABLED False YAML/env Enable energy measurement
ENERGY_BACKEND "rapl" YAML/env Energy backend selection
CARBON_TRACKING False YAML/env Enable CO2 tracking
COUNTRY_ISO "USA" YAML/env Country for carbon intensity
OUTPUT_DIR ./data YAML/env Output directory for results
Method Description
__init__(env_file, yaml_file) Load configuration from environment and YAML. YAML values take precedence for supported options.
validate() Validate that all required configuration attributes are present.
_parse_repository_urls(urls_str) Parse comma-separated repository URLs from environment variable.

get_config(env_file)

Singleton factory that returns or creates a global Config instance.


greenmining/gsf_patterns.py

Contains the Green Software Foundation pattern catalog and keyword matching logic.

GSF_PATTERNS -- Dictionary of 124 green software patterns across 15 categories:

Category Count Examples
cloud 35+ Cache Static Data, Autoscaling, Serverless, Right-size Resources
web 15+ Lazy Loading, Minimize Data Transfer, Optimize Images
ai 15+ Model Quantization, Knowledge Distillation, Early Stopping
database 5+ Database Indexing, Query Optimization, Prepared Statements
networking/network 8+ Connection Pooling, gRPC Optimization, API Gateway
general 8+ Async Processing, Batch Processing, Memoization
resource 2 Resource Limits, Dynamic Resource Allocation
caching 2 Multi-Level Caching, Cache Invalidation Strategy
data 3 Data Deduplication, Efficient Serialization, Pagination
async 3 Event-Driven Architecture, Eliminate Polling, Reactive Streams
code 4 Algorithm Optimization, Code Efficiency, Garbage Collection Tuning
monitoring 3 Energy-Aware Monitoring, Performance Profiling, APM
microservices 4 Service Decomposition, Service Co-location, Graceful Shutdown
infrastructure 4 Minimal Container Images, Renewable Energy Regions, IaC

Each pattern has: name, category, keywords (list), description, sci_impact.

GREEN_KEYWORDS -- List of 332 keywords used for green awareness detection (e.g., "energy", "cache", "optimize", "serverless", "quantization").

Function Parameters Description
get_pattern_by_keywords(commit_message) commit_message: str Match a commit message against all GSF patterns. Returns list of matched pattern names.
is_green_aware(commit_message) commit_message: str Check if a commit message contains any green software keyword. Returns boolean.

greenmining/utils.py

Utility functions for file I/O, formatting, retry logic, and console output.

Function Parameters Description
format_timestamp(dt) dt: Optional[datetime] Format datetime as ISO 8601 string. Defaults to utcnow().
load_json_file(path) path: Path Load and parse a JSON file.
save_json_file(data, path, indent) data: dict, path: Path, indent: int Save data to JSON file, creating parent directories.
load_csv_file(path) path: Path Load CSV file as pandas DataFrame.
save_csv_file(df, path) df: DataFrame, path: Path Save DataFrame to CSV file.
estimate_tokens(text) text: str Estimate token count (len/4).
estimate_cost(tokens, model) tokens: int, model: str Estimate API cost based on Claude Sonnet 4 pricing.
retry_on_exception(max_retries, delay, exponential_backoff, exceptions) decorator args Decorator that retries a function on exception with configurable backoff.
colored_print(text, color) text: str, color: str Print colored text using colorama. Supported colors: red, green, yellow, blue, magenta, cyan, white.
handle_github_rate_limit(response) response Check for HTTP 403 and raise exception on rate limit.
format_number(num) num: int Format number with thousand separators.
format_percentage(value, decimals) value: float, decimals: int Format float as percentage string.
format_duration(seconds) seconds: float Format duration as human-readable string (e.g., "2m 30s").
truncate_text(text, max_length) text: str, max_length: int Truncate text with "..." suffix.
create_checkpoint(checkpoint_file, data) checkpoint_file: Path, data: dict Save checkpoint JSON for resumable operations.
load_checkpoint(checkpoint_file) checkpoint_file: Path Load checkpoint data if file exists.
print_banner(title) title: str Print formatted banner with decorators.
print_section(title) title: str Print section header with separator line.

Models

greenmining/models/repository.py

class Repository (dataclass)

Represents a GitHub repository with all metadata.

Field Type Description
repo_id int Sequential identifier
name str Repository name
owner str Repository owner/organization
full_name str owner/name format
url str HTML URL
clone_url str Git clone URL
language Optional[str] Primary programming language
stars int Star count
forks int Fork count
watchers int Watcher count
open_issues int Open issue count
last_updated str Last update ISO date
created_at str Creation ISO date
description Optional[str] Repository description
main_branch str Default branch name
topics list[str] Repository topics
size int Repository size in KB
archived bool Whether archived
license Optional[str] License key
Method Description
to_dict() Convert to dictionary.
from_dict(data) Class method: create from dictionary.
from_github_repo(repo, repo_id) Class method: create from PyGithub repository object.

greenmining/models/commit.py

class Commit (dataclass)

Represents a Git commit with metadata.

Field Type Description
commit_id str Commit SHA hash
repo_name str Repository full name
date str Commit date ISO string
author str Author name
author_email str Author email
message str Commit message
files_changed list[str] Modified file paths
lines_added int Lines added
lines_deleted int Lines deleted
insertions int Insertions count
deletions int Deletions count
is_merge bool Whether this is a merge commit
in_main_branch bool Whether commit is in main branch
Method Description
to_dict() Convert to dictionary.
from_dict(data) Class method: create from dictionary.
from_pydriller_commit(commit, repo_name) Class method: create from PyDriller commit object.

greenmining/models/analysis_result.py

class AnalysisResult (dataclass)

Represents the analysis result for a single commit.

Field Type Description
commit_id str Commit SHA
repo_name str Repository name
date str Commit date
commit_message str Full commit message
green_aware bool Whether commit is green-aware
green_evidence Optional[str] Evidence for green classification
known_pattern Optional[str] Matched GSF pattern name
pattern_confidence Optional[str] Confidence level (HIGH/MEDIUM/LOW)
emergent_pattern Optional[str] Novel pattern description
files_changed list Modified files
lines_added int Lines added
lines_deleted int Lines deleted

greenmining/models/aggregated_stats.py

class AggregatedStats (dataclass)

Holds aggregated analysis statistics.

Field Type Description
summary dict Overall summary statistics
known_patterns dict Pattern frequency data
repositories list[dict] Per-repository statistics
languages dict Per-language statistics
timestamp Optional[str] Aggregation timestamp

Services

greenmining/services/github_graphql_fetcher.py

class GitHubGraphQLFetcher

Fetches repositories and commits from GitHub using GraphQL API v4. Handles pagination and rate limiting.

Method Parameters Description
__init__(token) token: str Initialize with GitHub personal access token.
search_repositories(keywords, max_repos, min_stars, languages, created_after, created_before, pushed_after, pushed_before) see params Search for repositories matching criteria. Paginates automatically. Returns list of Repository objects.
get_repository_commits(owner, name, max_commits) owner: str, name: str, max_commits: int Fetch commit history for a specific repository. Returns list of commit dictionaries.
save_results(repositories, output_file) repositories: List[Repository], output_file: str Save repository list to JSON file.
_build_search_query(...) internal Build GitHub search query string with filters (stars, languages, dates).
_execute_query(query, variables) internal Execute a GraphQL query against GitHub API.
_parse_repository(node, repo_id) internal Parse GraphQL response node into Repository object.

greenmining/services/data_analyzer.py

class DataAnalyzer

Analyzes commits for green software patterns using GSF keywords and optional code diff analysis.

Method Parameters Description
__init__(batch_size, enable_diff_analysis) batch_size: int, enable_diff_analysis: bool Initialize with GSF patterns. Optionally enables CodeDiffAnalyzer for deeper code inspection.
analyze_commits(commits, resume_from) commits: list, resume_from: int Analyze a list of commit dictionaries. Returns analysis results with green awareness, matched patterns, confidence, and metadata.
save_results(results, output_file) results: list, output_file: Path Save analysis results to JSON with summary statistics.
_analyze_commit(commit) internal Analyze a single commit: check green awareness via is_green_aware(), match GSF patterns via get_pattern_by_keywords(), calculate confidence, and optionally run diff analysis.
_check_green_awareness(message, files) internal Check commit message and file names for green keywords.
_detect_known_pattern(message, files) internal Detect known green software pattern from message and file names.

greenmining/services/data_aggregator.py

class DataAggregator

Aggregates analysis results and generates summary statistics. Optionally integrates statistical and temporal analysis.

Method Parameters Description
__init__(enable_stats, enable_temporal, temporal_granularity) enable_stats: bool, enable_temporal: bool, temporal_granularity: str Initialize aggregator. When enable_stats=True, creates a StatisticalAnalyzer. When enable_temporal=True, creates a TemporalAnalyzer.
aggregate(analysis_results, repositories) analysis_results: list, repositories: list Compute summary, pattern analysis, per-repo stats, per-language stats. Optionally adds statistical analysis (trends, correlations, effect sizes) and temporal trend analysis.
save_results(aggregated_data, json_file, csv_file, analysis_results) see params Save aggregated data to JSON and detailed results to CSV.
print_summary(aggregated_data) aggregated_data: dict Print formatted summary tables to console using tabulate.
_generate_summary(results, repos) internal Calculate total commits, green count, percentage, repos with green commits.
_analyze_known_patterns(results) internal Count and rank detected GSF patterns with confidence breakdown.
_analyze_emergent_patterns(results) internal Collect novel patterns not in the GSF catalog.
_generate_repo_stats(results, repos) internal Compute per-repository green commit statistics.
_generate_language_stats(results, repos) internal Compute per-language green commit statistics.
_generate_statistics(results) internal Run temporal trends (Mann-Kendall), pattern correlations, effect sizes (Cohen's d), and descriptive statistics.

greenmining/services/local_repo_analyzer.py

The core analysis engine. Clones repositories from URLs and analyzes commits using PyDriller.

class MethodMetrics (dataclass)

Per-method analysis metrics extracted via Lizard integration through PyDriller.

Field Type Description
name str Method name
long_name str Fully qualified method name
filename str Source file name
nloc int Lines of code
complexity int Cyclomatic complexity
token_count int Token count
parameters int Parameter count
start_line int Start line number
end_line int End line number

class SourceCodeChange (dataclass)

Source code before/after a commit for refactoring detection.

Field Type Description
filename str File name
source_code_before Optional[str] Source code before the commit
source_code_after Optional[str] Source code after the commit
diff Optional[str] Unified diff
added_lines int Lines added
deleted_lines int Lines deleted
change_type str ADD, DELETE, MODIFY, or RENAME

class CommitAnalysis (dataclass)

Full analysis result for a single commit, including GSF patterns, DMM metrics, structural metrics, method-level analysis, source code, and energy data.

Field Type Description
hash str Commit SHA
message str Commit message
author / author_email str Author info
date datetime Author date
green_aware bool Green awareness flag
gsf_patterns_matched List[str] Matched GSF pattern names
pattern_count int Number of patterns matched
pattern_details List[Dict] Full pattern info (name, category, description, sci_impact)
confidence str high / medium / low
files_modified List[str] Modified file names
insertions / deletions int Line change counts
dmm_unit_size Optional[float] Delta Maintainability Model: unit size
dmm_unit_complexity Optional[float] DMM: unit complexity
dmm_unit_interfacing Optional[float] DMM: unit interfacing
total_nloc int Total lines of code across modified files
total_complexity int Total cyclomatic complexity
max_complexity int Maximum complexity of any modified file
methods_count int Total methods across modified files
methods List[MethodMetrics] Per-method metrics (when method_level_analysis=True)
source_changes List[SourceCodeChange] Source code changes (when include_source_code=True)
energy_joules Optional[float] Energy consumed (when energy_tracking=True)
energy_watts_avg Optional[float] Average power draw

class RepositoryAnalysis (dataclass)

Complete analysis result for a repository.

Field Type Description
url str Repository URL
name str Repository full name
total_commits int Total commits analyzed
green_commits int Green-aware commit count
green_commit_rate float Green commit percentage
commits List[CommitAnalysis] Per-commit analysis results
process_metrics Dict PyDriller process metrics
energy_metrics Optional[Dict] Energy measurement results

class LocalRepoAnalyzer

Method Parameters Description
__init__(clone_path, max_commits, days_back, skip_merges, compute_process_metrics, cleanup_after, ssh_key_path, github_token, energy_tracking, energy_backend, method_level_analysis, include_source_code, process_metrics, since_date, to_date) see params Initialize analyzer with all analysis options.
analyze_repository(url) url: str Clone and analyze a single repository. Handles authentication (HTTPS token injection, SSH key). Creates a fresh energy meter per repository for thread safety. Returns RepositoryAnalysis.
analyze_repositories(urls, parallel_workers, output_format) urls: List[str], parallel_workers: int, output_format: str Analyze multiple repositories sequentially or in parallel using ThreadPoolExecutor.
analyze_commit(commit) commit (PyDriller) Analyze a single PyDriller commit object. Extracts green awareness, GSF patterns, DMM metrics, structural metrics, optional method-level and source code data.
_compute_process_metrics(repo_path) internal Compute 8 PyDriller process metrics: ChangeSet, CodeChurn, CommitsCount, ContributorsCount, ContributorsExperience, HistoryComplexity, HunksCount, LinesCount.
_prepare_auth_url(url) internal Inject GitHub token into HTTPS URL for private repository access.
_setup_ssh_env() internal Configure SSH environment for private repository cloning.
_parse_repo_url(url) internal Parse owner and name from HTTPS or SSH GitHub URLs.
_extract_method_metrics(commit) internal Extract per-method Lizard metrics from modified files.
_extract_source_changes(commit) internal Extract source code before/after for each modified file.

greenmining/services/reports.py

class ReportGenerator

Generates comprehensive Markdown reports from aggregated analysis data.

Method Parameters Description
generate_report(aggregated_data, analysis_data, repos_data) see params Generate a full Markdown report with sections: Header, Executive Summary, Methodology, Results, Discussion, Limitations, Conclusion.
save_report(report_content, output_file) report_content: str, output_file: Path Write report to Markdown file.
_generate_header() internal Report title and metadata.
_generate_executive_summary(data) internal Key findings, percentage summaries, implications.
_generate_methodology(repos_data, analysis_data) internal Repository selection criteria, data extraction approach, analysis methodology (Q1/Q2/Q3).
_generate_results(data) internal Green awareness section, known patterns table, emergent patterns, per-repo analysis, statistics.
_generate_discussion(data) internal Interpretation, developer approaches, gap analysis, implications.
_generate_limitations() internal Sample bias, commit message limitations, scope limitations.
_generate_conclusion(data) internal Key findings, research question answers, recommendations.

Analyzers

greenmining/analyzers/statistical_analyzer.py

class StatisticalAnalyzer

Advanced statistical analysis using scipy and numpy.

Method Parameters Description
analyze_pattern_correlations(commit_data) commit_data: DataFrame Compute Pearson correlation matrix between pattern columns. Identifies significant pairs (
temporal_trend_analysis(commits_df) commits_df: DataFrame Monthly aggregation, Mann-Kendall trend test, optional seasonal decomposition (requires 24+ months), change point detection via rolling variance. Handles timezone-aware datetimes.
effect_size_analysis(group1, group2) group1: List[float], group2: List[float] Cohen's d effect size with magnitude classification (negligible/small/medium/large). Includes independent t-test for significance.
pattern_adoption_rate_analysis(commits_df) commits_df: DataFrame Analyze time-to-first-adoption, monthly adoption frequency, and pattern stickiness per pattern.

greenmining/analyzers/temporal_analyzer.py

class TemporalMetrics (dataclass)

Metrics for a specific time period: commit count, green count, green rate, unique patterns, dominant pattern, velocity.

class TrendAnalysis (dataclass)

Trend analysis results: direction (increasing/decreasing/stable), slope, R-squared, start/end rates, change percentage.

class TemporalAnalyzer

Analyze temporal patterns in green software adoption over configurable time granularities.

Method Parameters Description
__init__(granularity) granularity: str Initialize with time granularity: "day", "week", "month", "quarter", or "year".
group_commits_by_period(commits, date_field) commits: List[Dict], date_field: str Group commits into time periods based on granularity. Handles both datetime objects and ISO strings.
calculate_period_metrics(period_key, commits, analysis_results) see params Calculate TemporalMetrics for a single period.
analyze_trends(commits, analysis_results) commits: List[Dict], analysis_results: List[Dict] Full temporal analysis: period metrics, linear trend (least squares), cumulative adoption curve, velocity trend, and pattern evolution timeline.

greenmining/analyzers/qualitative_analyzer.py

class ValidationSample (dataclass)

Represents a single validation sample with commit data, detected patterns, and manual review fields.

class ValidationMetrics (dataclass)

Precision, recall, F1 score, and accuracy metrics.

class QualitativeAnalyzer

Framework for manual validation and inter-rater reliability assessment.

Method Parameters Description
__init__(sample_size, stratify_by) sample_size: int, stratify_by: str Initialize with sample size (default 30) and stratification method ("pattern" or "repository").
generate_validation_samples(commits, analysis_results, include_negatives) see params Generate stratified validation samples. 80% positive / 20% negative split for false-negative detection.
export_samples_for_review(output_path) output_path: str Export samples to JSON for manual review. Includes instructions for reviewers.
import_validated_samples(input_path) input_path: str Import manually validated samples from JSON. Updates sample statuses.
calculate_metrics() none Calculate precision, recall, F1, and accuracy from validated samples.
get_validation_report() none Generate comprehensive report: sampling info, metrics, error analysis (false positives/negatives), per-pattern accuracy.
get_inter_rater_reliability(samples_a, samples_b) two lists of ValidationSample Calculate Cohen's Kappa for inter-rater reliability with interpretation (slight/fair/moderate/substantial/almost perfect).

greenmining/analyzers/code_diff_analyzer.py

class CodeDiffAnalyzer

Analyze code diffs to detect green software patterns in actual code changes. Contains regex-based pattern signatures for 13 categories:

  • caching -- imports, annotations (@cache, @lru_cache), function calls, variable names
  • resource_optimization -- Kubernetes resource limits, Docker optimization
  • database_optimization -- indexes, query optimization, connection pooling
  • async_processing -- async/await, ThreadPoolExecutor, Celery
  • lazy_loading -- lazy, defer, dynamic import
  • serverless_computing -- AWS Lambda, Azure Functions, serverless frameworks
  • cdn_edge -- CloudFront, Cloudflare, edge caching
  • compression -- gzip, brotli, zstd, lz4
  • model_optimization -- quantization, pruning, ONNX, TensorRT
  • efficient_protocols -- HTTP/2, gRPC, protobuf
  • container_optimization -- Alpine, distroless, multi-stage builds
  • green_regions -- renewable energy regions
  • auto_scaling -- HPA, KEDA, scale-to-zero
  • code_splitting -- React.lazy, Suspense, dynamic import
  • green_ml_training -- early stopping, mixed precision, gradient checkpointing
Method Parameters Description
analyze_commit_diff(commit) commit: Commit (PyDriller) Analyze all modified files in a commit. Returns patterns detected, evidence (file:line), confidence score, and code metrics.
_detect_patterns_in_line(code_line) internal Match a single line against all pattern signatures.
_calculate_metrics(commit) internal Calculate lines added/removed, files changed, net lines, complexity change.
_calculate_diff_confidence(patterns, evidence, metrics) internal Confidence scoring: high (3+ patterns, 5+ evidence), medium (2+ patterns, 3+ evidence), low.
_is_code_file(modified_file) internal Check if file is code (.py, .java, .go, etc.) or Kubernetes manifest.

greenmining/analyzers/metrics_power_correlator.py

class CorrelationResult (dataclass)

Result of a metrics-to-power correlation: Pearson r/p, Spearman r/p, significance, strength classification.

class MetricsPowerCorrelator

Correlate code metrics (complexity, NLOC, churn) with power consumption measurements.

Method Parameters Description
__init__(significance_level) significance_level: float Initialize with p-value threshold (default 0.05).
fit(metrics, metrics_values, power_measurements) metrics: List[str], metrics_values: Dict, power_measurements: List[float] Compute Pearson and Spearman correlations for each metric against power data. Requires at least 3 data points. Computes feature importance (normalized absolute Spearman).
pearson (property) none Get Pearson correlation values for all metrics.
spearman (property) none Get Spearman correlation values for all metrics.
feature_importance (property) none Get normalized feature importance scores.
get_results() none Get all CorrelationResult objects.
get_significant_correlations() none Filter to only statistically significant results.
summary() none Generate summary with counts, correlations, feature importance, strongest positive/negative.

greenmining/analyzers/power_regression.py

class PowerRegression (dataclass)

A detected power regression: commit SHA, message, author, date, power before/after (watts), energy before/after (joules), percentage increase.

class PowerRegressionDetector

Detect commits that caused power consumption regressions by running a test command at each commit.

Method Parameters Description
__init__(test_command, energy_backend, threshold_percent, iterations, warmup_iterations) see params Initialize detector. Default: pytest tests/ -x, RAPL backend, 5% threshold, 5 iterations, 1 warmup.
detect(repo_path, baseline_commit, target_commit, max_commits) see params Iterate through commits from baseline to target. At each commit: checkout, run test command, measure energy. Flag commits where energy increased above threshold. Returns list of PowerRegression objects.

greenmining/analyzers/version_power_analyzer.py

class VersionPowerProfile (dataclass)

Power profile for a single version: version tag, commit SHA, energy (joules), power (watts avg), duration, iterations, energy standard deviation.

class VersionPowerReport (dataclass)

Complete power analysis report across versions: list of profiles, trend direction, total change %, most/least efficient versions.

Method Description
to_dict() Convert to dictionary.
summary() Generate human-readable summary string.

class VersionPowerAnalyzer

Measure and compare power consumption across software versions/tags.

Method Parameters Description
__init__(test_command, energy_backend, iterations, warmup_iterations) see params Initialize with test command and measurement settings. Default: 10 iterations, 2 warmup.
analyze_versions(repo_path, versions) repo_path: str, versions: List[str] Measure energy for each version (checkout, warmup, measure N iterations). Returns VersionPowerReport with trend analysis (increasing/decreasing/stable based on 5% threshold).

Energy

greenmining/energy/base.py

Core abstractions for energy measurement.

class EnergyBackend (Enum)

Supported backends: RAPL, CODECARBON, CPU_METER.

class EnergyMetrics (dataclass)

Energy measurement results.

Field Type Description
joules float Total energy consumed
watts_avg float Average power draw
watts_peak float Peak power draw
duration_seconds float Measurement duration
cpu_energy_joules float CPU-specific energy
dram_energy_joules float Memory energy
gpu_energy_joules Optional[float] GPU energy
carbon_grams Optional[float] CO2 equivalent in grams
carbon_intensity Optional[float] Grid carbon intensity (gCO2/kWh)
backend str Backend name
start_time / end_time Optional[datetime] Measurement timestamps
Property Description
energy_joules Alias for joules.
average_power_watts Alias for watts_avg.

class CommitEnergyProfile (dataclass)

Energy profile comparing a commit to its parent: energy_before, energy_after, energy_delta, energy_regression, regression_percentage.

class EnergyMeter (ABC)

Abstract base class for all energy measurement backends.

Method Description
is_available() Check if this backend works on the current system.
start() Begin energy measurement.
stop() Stop measurement, return EnergyMetrics.
measure(func, *args, **kwargs) Measure energy of a function call. Returns (result, EnergyMetrics).
measure_command(command, timeout) Measure energy of a shell command.
__enter__ / __exit__ Context manager support.

get_energy_meter(backend)

Factory function. Supported values: "rapl", "codecarbon", "cpu_meter", "cpu", "auto". Auto mode tries RAPL first (most accurate), falls back to CPU meter.


greenmining/energy/rapl.py

class RAPLEnergyMeter

Intel RAPL (Running Average Power Limit) energy measurement for Linux. Reads directly from /sys/class/powercap/intel-rapl.

Method Description
__init__() Discover available RAPL domains (package, core, dram, uncore).
is_available() Check if RAPL sysfs interface exists and is readable.
start() Record starting energy values for all domains.
stop() Calculate energy delta per domain. Handles 32-bit counter wrap-around. Returns EnergyMetrics with CPU, DRAM, and GPU (uncore) breakdowns.
get_available_domains() List discovered RAPL domain names.

greenmining/energy/cpu_meter.py

class CPUEnergyMeter

Cross-platform CPU energy estimation. Works on Linux, macOS, and Windows. Estimates power from CPU utilization percentage and TDP.

Power model: P = P_idle + (P_max - P_idle) * utilization where idle power is 30% of TDP.

Method Parameters Description
__init__(tdp_watts, sample_interval) tdp_watts: Optional[float], sample_interval: float Initialize. Auto-detects TDP from RAPL sysfs on Linux, or uses platform defaults (Linux: 65W, macOS: 30W, Windows: 65W).
is_available() none Always returns True (universal fallback).
start() none Begin measurement, prime psutil.
stop() none Calculate estimated energy from CPU utilization samples.

greenmining/energy/codecarbon_meter.py

class CodeCarbonMeter

Energy measurement with CO2 tracking via the CodeCarbon library. Provides carbon emissions in addition to energy data.

Method Parameters Description
__init__(project_name, output_dir, save_to_file) see params Initialize CodeCarbon tracker.
is_available() none Check if codecarbon package is installed.
start() none Create and start EmissionsTracker.
stop() none Stop tracker, extract energy (kWh to joules), emissions (kg to grams), and carbon intensity. Handles CodeCarbon v3.x Energy objects.
get_carbon_intensity() none Query current grid carbon intensity for the configured region.

greenmining/energy/carbon_reporter.py

CARBON_INTENSITY_BY_COUNTRY

Dictionary of average carbon intensity (gCO2/kWh) for 20 countries. Source: Electricity Maps, IEA.

CLOUD_REGION_INTENSITY

Dictionary of carbon intensity by cloud provider region for AWS (14 regions), GCP (9 regions), and Azure (8 regions).

class CarbonReport (dataclass)

Carbon emissions report: total energy (kWh), emissions (kg), carbon intensity, equivalents (tree-months, smartphone charges, km driven).

class CarbonReporter

Generate carbon footprint reports from energy measurements.

Method Parameters Description
__init__(country_iso, cloud_provider, region) see params Initialize with location. Cloud region intensity takes priority over country average.
generate_report(energy_metrics, analysis_results, total_joules) see params Generate CarbonReport from energy data. Converts joules to kWh, applies carbon intensity, calculates equivalents.
get_carbon_intensity() none Get the configured carbon intensity value.
get_supported_countries() static List supported country ISO codes.
get_supported_cloud_regions(provider) static List supported regions for a cloud provider.

Controllers

greenmining/controllers/repository_controller.py

class RepositoryController

Orchestrates repository fetching operations using the GraphQL fetcher and configuration.

Method Parameters Description
__init__(config) config: Config Initialize with Config object. Creates GitHubGraphQLFetcher.
fetch_repositories(max_repos, min_stars, languages, keywords, created_after, created_before, pushed_after, pushed_before) see params Fetch repositories via GraphQL, save to JSON file. Parameters default to Config values.
load_repositories() none Load previously fetched repositories from JSON file.
get_repository_stats(repositories) repositories: list[Repository] Compute statistics: total count, by-language breakdown, total/avg stars, top repository.

Presenters

greenmining/presenters/console_presenter.py

class ConsolePresenter

Handles formatted console output using tabulate and colorama.

Method Parameters Description
show_banner() static Display application banner.
show_repositories(repositories, limit) static Display repository table (name, language, stars, description).
show_commit_stats(stats) static Display commit statistics table.
show_analysis_results(results) static Display analysis results summary.
show_pattern_distribution(patterns, limit) static Display top N green patterns with counts and percentages.
show_pipeline_status(status) static Display pipeline phase status.
show_progress_message(phase, current, total) static Display progress percentage.
show_error(message) static Print error in red.
show_success(message) static Print success in green.
show_warning(message) static Print warning in yellow.