SPARQL Query Utilities

The GKC SPARQL module provides utilities for executing SPARQL queries against Wikidata, DBpedia, or any other SPARQL endpoint. It supports multiple input formats (raw queries, Wikidata URLs) and output types (JSON, DataFrames, CSV).

Features

Multiple Input Formats: Raw SPARQL queries or Wikidata Query Service URLs
Multiple Output Types: JSON objects, Python dictionaries, pandas DataFrames, CSV
Error Handling: Descriptive error messages and a custom SPARQLError exception
Custom Endpoints: Query any SPARQL endpoint, not just Wikidata
Pandas Integration: Optional pandas support for data analysis
Clean API: Both a class-based interface and convenience functions

Installation

The SPARQL module is included in GKC. For optional pandas support:

pip install pandas

Quick Start

Basic Query

from gkc.sparql import SPARQLQuery

executor = SPARQLQuery()
results = executor.query("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
      }
    }
    LIMIT 5
""")
print(results)

Query from Wikidata URL

If you build a query in the Wikidata Query Service (WDQS) web interface, you can copy the URL from the browser's address bar and pass it directly to the executor's query method, which extracts and decodes the query. Use the full URL from the address bar, not the short URL generated by WDQS. This also makes WDQS URLs a convenient way to share queries:

url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20..."
results = executor.query(url)

Convert to DataFrame

df = executor.to_dataframe("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
      }
    }
""")
print(df.head())

Export to CSV

# `query` holds a SPARQL string, such as the one in the Basic Query example
executor.to_csv(query, filepath="results.csv")

API Reference

SPARQLQuery Class

Main class for executing SPARQL queries.

Constructor

SPARQLQuery(
    endpoint: str = "https://query.wikidata.org/sparql",
    user_agent: str = "GKC-SPARQL/1.0",
    timeout: int = 30
)

Parameters:
- endpoint: SPARQL endpoint URL (default: Wikidata)
- user_agent: User agent string for HTTP requests
- timeout: Request timeout in seconds

Methods

`query(query: str, format: str = "json", raw: bool = False) -> Any`

Execute a SPARQL query and return the endpoint's response.

Parameters:
- query: SPARQL query string or Wikidata Query Service URL
- format: Response format ('json', 'xml', 'csv', 'tsv')
- raw: If False, parse JSON to dict; if True, return raw string

Returns: Query results (dict for parsed JSON, str for the other formats or when raw is True)

Raises: SPARQLError if the query fails

Example:

executor = SPARQLQuery()
results = executor.query("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }")
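To make the `format` parameter more concrete, the sketch below shows how a request to a SPARQL endpoint could be assembled with the standard library. It is an illustration under assumptions, not GKC's actual implementation: `build_request` and `ACCEPT_TYPES` are hypothetical names, and the MIME types are the ones commonly used for SPARQL result formats. The request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed mapping from the `format` argument to an HTTP Accept header.
ACCEPT_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_request(query, endpoint="https://query.wikidata.org/sparql",
                  fmt="json", user_agent="GKC-SPARQL/1.0"):
    """Build (but do not send) a GET request for a SPARQL query."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"Unsupported format: {fmt!r}")
    # The query text travels URL-encoded in the `query` parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={
        "Accept": ACCEPT_TYPES[fmt],
        "User-Agent": user_agent,
    })


req = build_request("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }", fmt="csv")
print(req.get_header("Accept"))  # text/csv
```

Passing the resulting object to `urllib.request.urlopen(req, timeout=30)` would send it; the timeout value here mirrors the constructor default.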

`to_dict_list(query: str) -> list[dict[str, str]]`

Execute query and return results as list of dictionaries.

Returns: List of dicts with variable names as keys

Example:

results = executor.to_dict_list("SELECT ?item ?itemLabel WHERE { ... }")
for row in results:
    print(row)
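For readers curious about what this conversion involves, here is a minimal sketch of flattening the standard SPARQL JSON results structure into a list of dictionaries. `bindings_to_rows` is a hypothetical helper, not part of GKC, and it assumes values are kept verbatim (GKC may post-process them, for example by shortening entity URIs).

```python
def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to "".
        rows.append(
            {var: binding.get(var, {}).get("value", "") for var in variables}
        )
    return rows


sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {
            "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q146"},
            "itemLabel": {"type": "literal", "xml:lang": "en", "value": "house cat"},
        },
        # Second row has no label bound.
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q144"}},
    ]},
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row)
```

Filling unbound variables with an empty string keeps every row the same shape, which makes the result easier to load into a table.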

`to_dataframe(query: str) -> pd.DataFrame`

Execute query and return results as pandas DataFrame.

Requires: pandas package

Returns: pandas DataFrame

Raises: SPARQLError if pandas is not installed

Example:

df = executor.to_dataframe("SELECT ?item ?itemLabel WHERE { ... }")
print(df.head())

`to_csv(query: str, filepath: Optional[str] = None) -> str`

Execute query and convert results to CSV.

Parameters:
- query: SPARQL query
- filepath: Optional file path to save results

Returns: CSV string

Example:

# Get CSV data
csv_data = executor.to_csv("SELECT ?item ?itemLabel WHERE { ... }")

# Save to file
executor.to_csv("SELECT ...", filepath="results.csv")
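The "return a string, optionally save to a file" behaviour can be sketched with the standard `csv` module. This is an assumption about how such a conversion might look, not the library's code; `rows_to_csv` is a hypothetical name and the input is a list of dictionaries like the one returned by `to_dict_list`.

```python
import csv
import io


def rows_to_csv(rows, filepath=None):
    """Serialize a list of dicts to CSV text; optionally write it to a file."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # Column order follows the keys of the first row.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    text = buffer.getvalue()
    if filepath is not None:
        # newline="" stops the platform from translating line endings again.
        with open(filepath, "w", encoding="utf-8", newline="") as handle:
            handle.write(text)
    return text


csv_text = rows_to_csv([
    {"item": "Q1", "itemLabel": "One"},
    {"item": "Q2", "itemLabel": "Two, too"},
])
print(csv_text)
```

Using `csv.DictWriter` rather than joining strings by hand means values containing commas or quotes are escaped correctly.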

`parse_wikidata_query_url(url: str) -> str` (static)

Extract and decode SPARQL query from Wikidata Query Service URL.

Parameters:
- url: Wikidata Query Service URL

Returns: Decoded SPARQL query string

Raises: SPARQLError if URL is invalid

Example:

url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20..."
query = SPARQLQuery.parse_wikidata_query_url(url)
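WDQS stores the query in the URL fragment (the part after `#`), percent-encoded. The following sketch shows that decoding step using only the standard library. `decode_wdqs_url` is a hypothetical stand-in, and it raises `ValueError` where the real method is documented to raise `SPARQLError`.

```python
from urllib.parse import unquote, urlsplit


def decode_wdqs_url(url):
    """Return the SPARQL text stored in the fragment of a WDQS URL."""
    parts = urlsplit(url)
    if parts.netloc != "query.wikidata.org" or not parts.fragment:
        raise ValueError(f"Not a Wikidata Query Service URL with a query: {url}")
    # Everything after '#' is the percent-encoded query.
    return unquote(parts.fragment)


url = (
    "https://query.wikidata.org/"
    "#SELECT%20?item%20WHERE%20%7B%20?item%20wdt:P31%20wd:Q146%20%7D"
)
print(decode_wdqs_url(url))  # SELECT ?item WHERE { ?item wdt:P31 wd:Q146 }
```

Short URLs generated by WDQS carry no fragment, which is consistent with the note that only the full address-bar URL can be parsed.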

`normalize_query(query: str) -> str` (static)

Normalize query string (extract from URL if needed).

Parameters:
- query: SPARQL query string or Wikidata URL

Returns: Normalized query string

Example:

# Both work
query1 = SPARQLQuery.normalize_query("SELECT ?item WHERE { ... }")
query2 = SPARQLQuery.normalize_query("https://query.wikidata.org/#SELECT%20...")

Convenience Functions

`execute_sparql(query: str, endpoint: str = DEFAULT_WIKIDATA_ENDPOINT, format: str = "json") -> Any`

Quick function to execute a single query.

from gkc.sparql import execute_sparql

results = execute_sparql("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }")

`execute_sparql_to_dataframe(query: str, endpoint: str = DEFAULT_WIKIDATA_ENDPOINT) -> pd.DataFrame`

Quick function to execute query and return DataFrame.

from gkc.sparql import execute_sparql_to_dataframe

df = execute_sparql_to_dataframe("SELECT ?item ?itemLabel WHERE { ... }")

Exceptions

`SPARQLError`

Raised when a SPARQL query fails.

from gkc.sparql import SPARQLError

try:
    results = executor.query("INVALID SPARQL")
except SPARQLError as e:
    print(f"Query failed: {e}")

Input Formats

Raw SPARQL Query

query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
"""
results = executor.query(query)

Wikidata Query Service URL

# Share queries as URLs
url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20%7B%0A%09?item%20wdt:P31%20wd:Q146%20."

# Execute directly
results = executor.query(url)

# Extract query
query = SPARQLQuery.parse_wikidata_query_url(url)

Output Formats

JSON (Default)

results = executor.query(query)
# Returns: {"head": {"vars": [...]}, "results": {"bindings": [...]}}

Dictionary List

results = executor.to_dict_list(query)
# Returns: [{"item": "Q1", "itemLabel": "One"}, ...]

Pandas DataFrame

df = executor.to_dataframe(query)
# Returns: pandas DataFrame with results

CSV

csv_data = executor.to_csv(query)
# Returns: "item,itemLabel\nQ1,One\n..."

Examples

Example 1: Find Cities with Large Populations

from gkc.sparql import SPARQLQuery

executor = SPARQLQuery()

query = """
SELECT ?item ?itemLabel ?population WHERE {
  ?item wdt:P31 wd:Q3624078 .
  ?item wdt:P1082 ?population .
  FILTER(?population > 5000000)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY DESC(?population)
LIMIT 10
"""

results = executor.to_dict_list(query)
for row in results:
    print(f"{row['itemLabel']}: {row['population']}")

Example 2: Data Analysis with DataFrame

import pandas as pd

from gkc.sparql import execute_sparql_to_dataframe

df = execute_sparql_to_dataframe("""
SELECT ?item ?itemLabel ?population WHERE {
  ?item wdt:P31 wd:Q3624078 .
  ?item wdt:P1082 ?population .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
""")

# Analyze with pandas
df['population'] = pd.to_numeric(df['population'], errors='coerce')
top_10 = df.nlargest(10, 'population')
print(top_10)

Example 3: Custom Endpoint

from gkc.sparql import SPARQLQuery

# Query DBpedia instead of Wikidata
executor = SPARQLQuery(endpoint="https://dbpedia.org/sparql")

query = """
SELECT ?resource ?label WHERE {
  ?resource rdf:type dbo:Animal .
  ?resource rdfs:label ?label .
  FILTER(LANG(?label) = 'en')
}
LIMIT 10
"""

results = executor.query(query)

Example 4: Error Handling

from gkc.sparql import SPARQLError, SPARQLQuery

executor = SPARQLQuery()

try:
    results = executor.query("INVALID SPARQL SYNTAX")
except SPARQLError as e:
    print(f"Error: {e}")

try:
    results = executor.query("https://invalid-url.com/#SELECT%20*")
except SPARQLError as e:
    print(f"URL parsing error: {e}")

Best Practices

Use Wikidata URLs for Sharing: Share queries as Wikidata URLs for easy collaboration
Handle Errors: Always wrap queries in try-except blocks
Use DataFrames for Analysis: Convert to DataFrame for complex data analysis
Set Timeout: Raise the timeout parameter for complex queries (the default is 30 seconds)
Limit Results: Always use LIMIT to avoid huge result sets
Cache Results: For repeated queries, cache the results
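The caching advice can be sketched as a small in-memory wrapper. This is one possible approach under stated assumptions: `CachedRunner` and `fake_run` are hypothetical names, and `fake_run` stands in for a real call such as `executor.to_dict_list` so that the example runs without network access.

```python
class CachedRunner:
    """Memoize the results of a query-running callable by query text."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._cache = {}
        self.misses = 0  # number of times the wrapped callable was invoked

    def __call__(self, query):
        key = query.strip()  # ignore leading/trailing whitespace only
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._run_query(query)
        return self._cache[key]


def fake_run(query):
    # Placeholder for a real endpoint call; returns a deterministic result.
    return [{"query_length": str(len(query.strip()))}]


cached = CachedRunner(fake_run)
first = cached("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5")
second = cached("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5\n")
print(cached.misses)  # 1
```

The cache lives only for the lifetime of the process and never expires, so it suits short analysis sessions; data that changes often would need an expiry policy.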
