SPARQL Query Utilities
The GKC SPARQL module provides utilities for executing SPARQL queries against Wikidata, DBpedia, or any other SPARQL endpoint. It supports multiple input formats (raw queries, Wikidata URLs) and output types (JSON, DataFrames, CSV).
Features
- Multiple Input Formats: Raw SPARQL queries or Wikidata Query Service URLs
- Multiple Output Types: JSON objects, Python dictionaries, pandas DataFrames, CSV
- Error Handling: Comprehensive error messages and custom exceptions
- Custom Endpoints: Query any SPARQL endpoint, not just Wikidata
- Pandas Integration: Optional pandas support for data analysis
- Clean API: A class-based interface plus convenience functions
Installation
The SPARQL module is included in GKC. For optional pandas support:
pip install pandas
Quick Start
Basic Query
from gkc.sparql import SPARQLQuery
executor = SPARQLQuery()
results = executor.query("""
    SELECT ?item ?itemLabel WHERE {
        ?item wdt:P31 wd:Q146 .
        SERVICE wikibase:label {
            bd:serviceParam wikibase:language "en" .
        }
    }
    LIMIT 5
""")
print(results)
Query from Wikidata URL
Queries built in the Wikidata Query Service (WDQS) can be shared as URLs. Copy the full URL from the browser's address bar and pass it to the executor's query method, which extracts and decodes the query. Use the full URL from the address bar, not the short URL generated by WDQS.
url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20..."
results = executor.query(url)
Convert to DataFrame
df = executor.to_dataframe("""
    SELECT ?item ?itemLabel WHERE {
        ?item wdt:P31 wd:Q146 .
        SERVICE wikibase:label {
            bd:serviceParam wikibase:language "en" .
        }
    }
""")
print(df.head())
Export to CSV
# query is a SPARQL string (or WDQS URL) such as the ones above
executor.to_csv(query, filepath="results.csv")
API Reference
SPARQLQuery Class
Main class for executing SPARQL queries.
Constructor
SPARQLQuery(
    endpoint: str = "https://query.wikidata.org/sparql",
    user_agent: str = "GKC-SPARQL/1.0",
    timeout: int = 30
)
Parameters:
- endpoint: SPARQL endpoint URL (default: Wikidata)
- user_agent: User agent string for HTTP requests
- timeout: Request timeout in seconds
Methods
query(query: str, format: str = "json", raw: bool = False) -> Any
Execute a SPARQL query and return the results in the requested format.
Parameters:
- query: SPARQL query string or Wikidata Query Service URL
- format: Response format ('json', 'xml', 'csv', 'tsv')
- raw: If False, parse JSON to dict; if True, return raw string
Returns: Query results (a dict for JSON when raw is False; otherwise a str)
Raises: SPARQLError if the query fails
Example:
executor = SPARQLQuery()
results = executor.query("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }")
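For readers who want to see what a request to a SPARQL endpoint looks like on the wire, the following is a minimal sketch using only the standard library. It is not GKC's implementation: build_sparql_request and ACCEPT_TYPES are hypothetical names introduced here, and the sketch assumes the query is sent as a GET request with a query parameter, as the SPARQL 1.1 Protocol allows. The media types are the standard ones for SPARQL result formats.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Standard media types for the four result formats listed above.
ACCEPT_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_sparql_request(query, endpoint="https://query.wikidata.org/sparql",
                         user_agent="GKC-SPARQL/1.0", fmt="json"):
    """Hypothetical helper: build (but do not send) a SPARQL GET request."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"Unsupported format: {fmt!r}")
    # The query text travels URL-encoded in the 'query' parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    headers = {"Accept": ACCEPT_TYPES[fmt], "User-Agent": user_agent}
    return Request(url, headers=headers)


request = build_sparql_request("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1")
print(request.full_url)
print(request.get_header("Accept"))
```

Sending the request would be a matter of passing it to urllib.request.urlopen with a timeout argument, which corresponds to the timeout parameter of the constructor.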
to_dict_list(query: str) -> list[dict[str, str]]
Execute query and return results as list of dictionaries.
Returns: List of dicts with variable names as keys
Example:
results = executor.to_dict_list("SELECT ?item ?itemLabel WHERE { ... }")
for row in results:
    print(row)
to_dataframe(query: str) -> pd.DataFrame
Execute query and return results as pandas DataFrame.
Requires: pandas package
Returns: pandas DataFrame
Raises: SPARQLError if pandas is not installed
Example:
df = executor.to_dataframe("SELECT ?item ?itemLabel WHERE { ... }")
print(df.head())
to_csv(query: str, filepath: Optional[str] = None) -> str
Execute query and convert results to CSV.
Parameters:
- query: SPARQL query
- filepath: Optional file path to save results
Returns: CSV string
Example:
# Get CSV data
csv_data = executor.to_csv("SELECT ?item ?itemLabel WHERE { ... }")
# Save to file
executor.to_csv("SELECT ...", filepath="results.csv")
parse_wikidata_query_url(url: str) -> str (static)
Extract and decode SPARQL query from Wikidata Query Service URL.
Parameters:
- url: Wikidata Query Service URL
Returns: Decoded SPARQL query string
Raises: SPARQLError if URL is invalid
Example:
url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20..."
query = SPARQLQuery.parse_wikidata_query_url(url)
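A full WDQS URL carries the query in its fragment (the part after #) in percent-encoded form. The sketch below shows one way such a URL could be decoded with the standard library. It is an illustration under stated assumptions, not the module's actual code: decode_wdqs_url is a hypothetical name, and it raises ValueError where the real method is documented to raise SPARQLError.

```python
from urllib.parse import unquote, urlparse


def decode_wdqs_url(url):
    """Hypothetical helper: extract the SPARQL text from a full WDQS URL."""
    parsed = urlparse(url)
    # Assumption: only full query.wikidata.org URLs with a fragment are accepted.
    if parsed.netloc != "query.wikidata.org" or not parsed.fragment:
        raise ValueError(f"Not a WDQS URL containing a query: {url}")
    # The fragment holds the percent-encoded query; unquote() decodes it.
    return unquote(parsed.fragment)


url = ("https://query.wikidata.org/#SELECT%20%3Fitem%20WHERE%20%7B%20"
       "%3Fitem%20wdt%3AP31%20wd%3AQ146%20%7D")
print(decode_wdqs_url(url))
```

Short URLs generated by WDQS do not contain the query text, so a purely local decoder like this one cannot handle them.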
normalize_query(query: str) -> str (static)
Normalize query string (extract from URL if needed).
Parameters:
- query: SPARQL query string or Wikidata URL
Returns: Normalized query string
Example:
# Both work
query1 = SPARQLQuery.normalize_query("SELECT ?item WHERE { ... }")
query2 = SPARQLQuery.normalize_query("https://query.wikidata.org/#SELECT%20...")
Convenience Functions
execute_sparql(query: str, endpoint: str = DEFAULT_WIKIDATA_ENDPOINT, format: str = "json") -> Any
Quick function to execute a single query.
from gkc.sparql import execute_sparql
results = execute_sparql("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }")
execute_sparql_to_dataframe(query: str, endpoint: str = DEFAULT_WIKIDATA_ENDPOINT) -> pd.DataFrame
Quick function to execute query and return DataFrame.
from gkc.sparql import execute_sparql_to_dataframe
df = execute_sparql_to_dataframe("SELECT ?item ?itemLabel WHERE { ... }")
Exceptions
SPARQLError
Raised when a SPARQL query fails.
from gkc.sparql import SPARQLError
try:
    results = executor.query("INVALID SPARQL")
except SPARQLError as e:
    print(f"Query failed: {e}")
Input Formats
Raw SPARQL Query
query = """
    SELECT ?item ?itemLabel WHERE {
        ?item wdt:P31 wd:Q146 .
        SERVICE wikibase:label {
            bd:serviceParam wikibase:language "en" .
        }
    }
"""
results = executor.query(query)
Wikidata Query Service URL
# Share queries as URLs
url = "https://query.wikidata.org/#SELECT%20?item%20WHERE%20%7B%0A%09?item%20wdt:P31%20wd:Q146%20.%0A%7D"
# Execute directly
results = executor.query(url)
# Extract query
query = SPARQLQuery.parse_wikidata_query_url(url)
Output Formats
JSON (Default)
results = executor.query(query)
# Returns: {"head": {"vars": [...]}, "results": {"bindings": [...]}}
Dictionary List
results = executor.to_dict_list(query)
# Returns: [{"item": "Q1", "itemLabel": "One"}, ...]
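The relationship between the JSON structure and the dictionary list can be sketched as follows. In the standard SPARQL JSON results format, each binding maps a variable name to an object with a value field. The helper name bindings_to_rows and the sample data are made up for illustration, and this sketch keeps values exactly as the endpoint returns them (full entity URIs included); whether GKC shortens them is not something this sketch assumes.

```python
def bindings_to_rows(results):
    """Hypothetical helper: flatten SPARQL JSON results into a list of dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding; use "" as a placeholder.
        rows.append({var: binding.get(var, {}).get("value", "") for var in variables})
    return rows


# Made-up sample in the standard SPARQL JSON results layout.
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"},
         "itemLabel": {"type": "literal", "xml:lang": "en", "value": "One"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"}},
    ]},
}
print(bindings_to_rows(sample))
```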
Pandas DataFrame
df = executor.to_dataframe(query)
# Returns: pandas DataFrame with results
CSV
csv_data = executor.to_csv(query)
# Returns: "item,itemLabel\nQ1,One\n..."
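As a rough illustration of how a list of row dictionaries becomes a CSV string, and optionally a file, here is a sketch built on the standard csv module. The function rows_to_csv is hypothetical and is not part of GKC; it only mirrors the shape of to_csv (return the text, write it when a filepath is given).

```python
import csv
import io


def rows_to_csv(rows, fieldnames, filepath=None):
    """Hypothetical helper: serialize row dicts to CSV text, optionally saving it."""
    buffer = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    text = buffer.getvalue()
    if filepath is not None:
        # newline="" stops Python from translating the line endings again.
        with open(filepath, "w", newline="", encoding="utf-8") as handle:
            handle.write(text)
    return text


rows = [{"item": "Q1", "itemLabel": "One"}, {"item": "Q2", "itemLabel": "Two, too"}]
print(rows_to_csv(rows, ["item", "itemLabel"]))
```

Values that contain a comma are quoted automatically by the csv module, which is one reason to prefer it over joining strings by hand.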
Examples
Example 1: Find Cities with Large Populations
from gkc.sparql import SPARQLQuery
executor = SPARQLQuery()
query = """
    SELECT ?item ?itemLabel ?population WHERE {
        ?item wdt:P31 wd:Q3624078 .
        ?item wdt:P1082 ?population .
        FILTER(?population > 5000000)
        SERVICE wikibase:label {
            bd:serviceParam wikibase:language "en" .
        }
    }
    ORDER BY DESC(?population)
    LIMIT 10
"""
results = executor.to_dict_list(query)
for row in results:
    print(f"{row['itemLabel']}: {row['population']}")
Example 2: Data Analysis with DataFrame
import pandas as pd
from gkc.sparql import execute_sparql_to_dataframe
df = execute_sparql_to_dataframe("""
    SELECT ?item ?itemLabel ?population WHERE {
        ?item wdt:P31 wd:Q3624078 .
        ?item wdt:P1082 ?population .
        SERVICE wikibase:label {
            bd:serviceParam wikibase:language "en" .
        }
    }
""")
# Analyze with pandas
df['population'] = pd.to_numeric(df['population'], errors='coerce')
top_10 = df.nlargest(10, 'population')
print(top_10)
Example 3: Custom Endpoint
# Query DBpedia instead of Wikidata
executor = SPARQLQuery(endpoint="https://dbpedia.org/sparql")
query = """
    SELECT ?resource ?label WHERE {
        ?resource rdf:type dbo:Animal .
        ?resource rdfs:label ?label .
        FILTER(LANG(?label) = 'en')
    }
    LIMIT 10
"""
results = executor.query(query)
Example 4: Error Handling
from gkc.sparql import SPARQLError, SPARQLQuery
executor = SPARQLQuery()
try:
    results = executor.query("INVALID SPARQL SYNTAX")
except SPARQLError as e:
    print(f"Error: {e}")
try:
    results = executor.query("https://invalid-url.com/#SELECT%20*")
except SPARQLError as e:
    print(f"URL parsing error: {e}")
Best Practices
- Use Wikidata URLs for Sharing: Share queries as Wikidata URLs for easy collaboration
- Handle Errors: Always wrap queries in try-except blocks
- Use DataFrames for Analysis: Convert to DataFrame for complex data analysis
- Set Timeout: Adjust timeout for complex queries
- Limit Results: Always use LIMIT to avoid huge result sets
- Cache Results: For repeated queries, cache the results
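The caching advice above can be sketched with a small in-memory wrapper. This is one possible approach, not a GKC feature: make_cached_runner is a hypothetical helper, and fake_run stands in for a real query function such as executor.query so the example runs without network access.

```python
def make_cached_runner(run_query):
    """Wrap a query function so repeated query strings reuse earlier results."""
    cache = {}

    def cached(query):
        key = query.strip()  # ignore leading/trailing whitespace only
        if key not in cache:
            cache[key] = run_query(query)
        return cache[key]

    return cached


calls = []


def fake_run(query):
    """Stand-in for a real query function; records each call it receives."""
    calls.append(query)
    return {"head": {"vars": []}, "results": {"bindings": []}}


cached_query = make_cached_runner(fake_run)
cached_query("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5")
cached_query("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5")
print(len(calls))  # the second call is served from the cache
```

The cache lives only as long as the process and returns the same object on every hit, so callers that modify results should copy them first. Live data on the endpoint can change, so a cache like this suits short-lived scripts better than long-running services.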
Wikidata Resources
See Also
- gkc.wd - Low-level Wikidata entity access
- gkc.mapping_builder - Property mapping utilities