SPARQL Query Utilities

The GKC SPARQL module provides utilities for executing SPARQL queries against Wikidata, DBpedia, or any other SPARQL endpoint. It supports multiple input formats (raw queries, Wikidata URLs) and output types (JSON, DataFrames, CSV).

Features

  • Multiple Input Formats: Raw SPARQL queries or Wikidata Query Service URLs
  • Multiple Output Types: JSON objects, Python dictionaries, pandas DataFrames, CSV
  • Error Handling: Comprehensive error messages and custom exceptions
  • Custom Endpoints: Query any SPARQL endpoint, not just Wikidata
  • Pandas Integration: Optional pandas support for data analysis
  • Clean API: A class-based interface plus convenience functions

Installation

The SPARQL module is included in GKC. For optional pandas support:

pip install pandas

Quick Start

Basic Query

from gkc.sparql import SPARQLQuery

executor = SPARQLQuery()
results = executor.query("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
      }
    }
    LIMIT 5
""")
print(results)

Query from Wikidata URL

If you build a query in the Wikidata Query Service (WDQS), you can copy its URL from the browser and pass it straight to the executor's query method, which extracts the query for you. Use the full URL from the browser's address bar, not the short URL that WDQS generates. This also makes WDQS URLs a convenient way to share queries:

url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20..."
results = executor.query(url)

Convert to DataFrame

df = executor.to_dataframe("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
      }
    }
""")
print(df.head())

Export to CSV

query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5"
executor.to_csv(query, filepath="results.csv")

API Reference

SPARQLQuery Class

Main class for executing SPARQL queries.

Constructor

SPARQLQuery(
    endpoint: str = "https://query.wikidata.org/sparql",
    user_agent: str = "GKC-SPARQL/1.0",
    timeout: int = 30
)

Parameters:

  • endpoint: SPARQL endpoint URL (default: Wikidata)
  • user_agent: User agent string for HTTP requests
  • timeout: Request timeout in seconds
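
As a rough illustration of how the endpoint and user agent might end up in an HTTP request, the sketch below builds one by hand with the standard library. This is not GKC's implementation: build_request is a hypothetical helper, and the use of a GET request with a query URL parameter is an assumption based on the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

DEFAULT_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, endpoint=DEFAULT_ENDPOINT, user_agent="GKC-SPARQL/1.0"):
    """Hypothetical helper: build a SPARQL-protocol GET request."""
    # The SPARQL 1.1 Protocol carries the query text in the `query` parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    headers = {
        "User-Agent": user_agent,
        "Accept": "application/sparql-results+json",
    }
    return Request(url, headers=headers)


request = build_request("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5")
print(request.full_url)
# The timeout would apply when the request is sent, for example:
#   urllib.request.urlopen(request, timeout=30)
```

Nothing is sent over the network here; the sketch only shows where each setting would plausibly be used.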

Methods

query(query: str, format: str = "json", raw: bool = False) -> Any

Execute a SPARQL query and return the results.

Parameters:

  • query: SPARQL query string or Wikidata Query Service URL
  • format: Response format ('json', 'xml', 'csv', 'tsv')
  • raw: If False, parse JSON to dict; if True, return raw string

Returns: Query results (dict for parsed JSON; str for other formats or when raw is True)

Raises: SPARQLError if the query fails

Example:

executor = SPARQLQuery()
results = executor.query("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }")

to_dict_list(query: str) -> list[dict[str, str]]

Execute query and return results as list of dictionaries.

Returns: List of dicts with variable names as keys

Example:

results = executor.to_dict_list("SELECT ?item ?itemLabel WHERE { ... }")
for row in results:
    print(row)

to_dataframe(query: str) -> pd.DataFrame

Execute query and return results as pandas DataFrame.

Requires: pandas package

Returns: pandas DataFrame

Raises: SPARQLError if pandas is not installed

Example:

df = executor.to_dataframe("SELECT ?item ?itemLabel WHERE { ... }")
print(df.head())

to_csv(query: str, filepath: Optional[str] = None) -> str

Execute query and convert results to CSV.

Parameters:

  • query: SPARQL query
  • filepath: Optional file path to save results

Returns: CSV string

Example:

# Get CSV data
csv_data = executor.to_csv("SELECT ?item ?itemLabel WHERE { ... }")

# Save to file
executor.to_csv("SELECT ...", filepath="results.csv")

parse_wikidata_query_url(url: str) -> str (static)

Extract and decode SPARQL query from Wikidata Query Service URL.

Parameters:

  • url: Wikidata Query Service URL

Returns: Decoded SPARQL query string

Raises: SPARQLError if URL is invalid

Example:

url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20..."
query = SPARQLQuery.parse_wikidata_query_url(url)

normalize_query(query: str) -> str (static)

Normalize query string (extract from URL if needed).

Parameters:

  • query: SPARQL query string or Wikidata URL

Returns: Normalized query string

Example:

# Both work
query1 = SPARQLQuery.normalize_query("SELECT ?item WHERE { ... }")
query2 = SPARQLQuery.normalize_query("https://query.wikidata.org/#SELECT%20...")

Convenience Functions

execute_sparql(query: str, endpoint: str = DEFAULT_WIKIDATA_ENDPOINT, format: str = "json") -> Any

Quick function to execute a single query.

from gkc.sparql import execute_sparql

results = execute_sparql("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }")

execute_sparql_to_dataframe(query: str, endpoint: str = DEFAULT_WIKIDATA_ENDPOINT) -> pd.DataFrame

Quick function to execute query and return DataFrame.

from gkc.sparql import execute_sparql_to_dataframe

df = execute_sparql_to_dataframe("SELECT ?item ?itemLabel WHERE { ... }")

Exceptions

SPARQLError

Raised when a SPARQL query fails.

from gkc.sparql import SPARQLError

try:
    results = executor.query("INVALID SPARQL")
except SPARQLError as e:
    print(f"Query failed: {e}")

Input Formats

Raw SPARQL Query

query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
"""
results = executor.query(query)

Wikidata Query Service URL

# Share queries as URLs
url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20%7B%0A%09?item%20wdt:P31%20wd:Q146%20."

# Execute directly
results = executor.query(url)

# Extract query
query = SPARQLQuery.parse_wikidata_query_url(url)
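
To make the URL handling more concrete, here is a minimal standard-library sketch of how a query could be recovered from the fragment of a WDQS URL, and how raw queries and URLs might be told apart. The function names extract_query and normalize are hypothetical stand-ins, and the host check is an assumption; GKC's own parse_wikidata_query_url and normalize_query may behave differently.

```python
from urllib.parse import unquote, urlsplit


def extract_query(url):
    """Hypothetical stand-in: decode the query held in a WDQS URL fragment."""
    parts = urlsplit(url)
    # Assumption: only query.wikidata.org URLs with a fragment are accepted.
    if parts.netloc != "query.wikidata.org" or not parts.fragment:
        raise ValueError(f"Not a Wikidata Query Service URL: {url}")
    return unquote(parts.fragment)


def normalize(text):
    """Return raw SPARQL unchanged; decode it first when given a URL."""
    stripped = text.strip()
    if stripped.lower().startswith(("http://", "https://")):
        return extract_query(stripped)
    return text


url = (
    "https://query.wikidata.org/"
    "#SELECT%20?item%20WHERE%20%7B%20?item%20wdt:P31%20wd:Q146%20%7D"
)
print(normalize(url))
# prints: SELECT ?item WHERE { ?item wdt:P31 wd:Q146 }
```

The sketch raises ValueError to stay self-contained, whereas the library documents SPARQLError for invalid URLs.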

Output Formats

JSON (Default)

results = executor.query(query)
# Returns: {"head": {"vars": [...]}, "results": {"bindings": [...]}}
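
The documented format and raw parameters decide whether a dict or a string comes back. The sketch below mimics that rule on a canned response body. decode_response is a hypothetical helper, not part of GKC, and the media types listed are the standard ones for SPARQL results rather than values confirmed from the library's source.

```python
import json

# Standard media types for SPARQL query results (assumed, not read from GKC).
ACCEPT_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def decode_response(body, fmt="json", raw=False):
    """Hypothetical helper mirroring the documented format/raw behaviour."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"Unsupported format: {fmt!r}")
    if fmt == "json" and not raw:
        return json.loads(body)  # parsed into a dict
    return body  # left as the string the endpoint sent


sample = '{"head": {"vars": ["item"]}, "results": {"bindings": []}}'
parsed = decode_response(sample)
unparsed = decode_response(sample, raw=True)
print(type(parsed).__name__, type(unparsed).__name__)
# prints: dict str
```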

Dictionary List

results = executor.to_dict_list(query)
# Returns: [{"item": "Q1", "itemLabel": "One"}, ...]
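
For readers curious how the nested JSON structure relates to this flat list, the following sketch flattens a hand-written sample in the SPARQL JSON results layout. bindings_to_rows is a hypothetical helper; it keeps each value exactly as given, so it returns full entity URIs, and any shortening to IDs such as "Q1" would be an extra step that this sketch does not attempt.

```python
def bindings_to_rows(results):
    """Flatten SPARQL JSON results into one plain dict per row."""
    rows = []
    for binding in results["results"]["bindings"]:
        # Each bound variable is an object such as {"type": "uri", "value": "..."};
        # unbound variables are simply absent from the binding.
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


# Hand-written sample data, not real query output.
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"},
                "itemLabel": {"type": "literal", "xml:lang": "en", "value": "One"},
            }
        ]
    },
}
print(bindings_to_rows(sample))
```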

Pandas DataFrame

df = executor.to_dataframe(query)
# Returns: pandas DataFrame with results
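
Because pandas is optional, to_dataframe is documented to raise SPARQLError when it is missing. One common way to implement that kind of optional dependency is a deferred import, sketched below. The SPARQLError class here is a local stand-in so the example runs on its own, and require_module is a hypothetical helper rather than a GKC function.

```python
import importlib


class SPARQLError(Exception):
    """Stand-in for gkc.sparql.SPARQLError so the sketch runs on its own."""


def require_module(name):
    """Import an optional dependency, or fail with a helpful message."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise SPARQLError(
            f"{name} is required for this feature; try: pip install {name}"
        ) from exc


try:
    pd = require_module("pandas")
    print("pandas available:", pd.__version__)
except SPARQLError as err:
    print(err)
```

Deferring the import this way keeps the rest of the module usable when pandas is not installed.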

CSV

csv_data = executor.to_csv(query)
# Returns: "item,itemLabel\nQ1,One\n..."
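
A CSV string of this shape can be produced from row dictionaries with the standard csv module. The sketch below is one possible approach, not necessarily how to_csv works internally; rows_to_csv and the sample rows are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of row dicts to a CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows; the second label contains a comma to show quoting.
rows = [
    {"item": "Q1", "itemLabel": "One"},
    {"item": "Q2", "itemLabel": "Two, too"},
]
csv_text = rows_to_csv(rows, ["item", "itemLabel"])
print(csv_text)
```

Using the csv module rather than joining strings by hand means values containing commas or quotes are escaped correctly.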

Examples

Example 1: Find Cities with Large Populations

from gkc.sparql import SPARQLQuery

executor = SPARQLQuery()

query = """
SELECT ?item ?itemLabel ?population WHERE {
  ?item wdt:P31 wd:Q3624078 .
  ?item wdt:P1082 ?population .
  FILTER(?population > 5000000)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY DESC(?population)
LIMIT 10
"""

results = executor.to_dict_list(query)
for row in results:
    print(f"{row['itemLabel']}: {row['population']}")

Example 2: Data Analysis with DataFrame

import pandas as pd

from gkc.sparql import execute_sparql_to_dataframe

df = execute_sparql_to_dataframe("""
SELECT ?item ?itemLabel ?population WHERE {
  ?item wdt:P31 wd:Q3624078 .
  ?item wdt:P1082 ?population .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
""")

# Analyze with pandas
df['population'] = pd.to_numeric(df['population'], errors='coerce')
top_10 = df.nlargest(10, 'population')
print(top_10)

Example 3: Custom Endpoint

from gkc.sparql import SPARQLQuery

# Query DBpedia instead of Wikidata
executor = SPARQLQuery(endpoint="https://dbpedia.org/sparql")

query = """
SELECT ?resource ?label WHERE {
  ?resource rdf:type dbo:Animal .
  ?resource rdfs:label ?label .
  FILTER(LANG(?label) = 'en')
}
LIMIT 10
"""

results = executor.query(query)

Example 4: Error Handling

from gkc.sparql import SPARQLError, SPARQLQuery

executor = SPARQLQuery()

try:
    results = executor.query("INVALID SPARQL SYNTAX")
except SPARQLError as e:
    print(f"Error: {e}")

try:
    results = executor.query("https://invalid-url.com/#SELECT%20*")
except SPARQLError as e:
    print(f"URL parsing error: {e}")

Best Practices

  1. Use Wikidata URLs for Sharing: Share queries as Wikidata URLs for easy collaboration
  2. Handle Errors: Wrap queries in try-except blocks and catch SPARQLError
  3. Use DataFrames for Analysis: Convert to DataFrame for complex data analysis
  4. Set Timeout: Raise the timeout (default: 30 seconds) for complex queries
  5. Limit Results: Always use LIMIT to avoid huge result sets
  6. Cache Results: For repeated queries, cache the results
