Reference
from_html(html, root_url=None, include_fallbacks=False)
Extract all favicons from a given HTML document.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
html | str | HTML to parse. | required |
root_url | Optional[str] | Root URL where the favicon is located. | None |
include_fallbacks | bool | Whether to include fallback favicons like | False |
Returns:
Type | Description |
---|---|
set[Favicon] | A set of favicons. |
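To make the idea behind from_html concrete, here is a minimal standard-library sketch of extracting icon links from HTML and resolving them against a root URL. It is not the library's implementation: IconLinkParser and find_icon_urls are hypothetical names, and matching any rel token that contains "icon" is an assumption of this sketch.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Collect href values of <link> tags whose rel mentions an icon."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        # Assumption: any rel token containing "icon" counts
        # (icon, shortcut icon, apple-touch-icon, ...).
        if href and any("icon" in token for token in rel_tokens):
            self.hrefs.append(href)


def find_icon_urls(html: str, root_url: Optional[str] = None) -> set:
    """Return the icon URLs found in html, made absolute when root_url is given."""
    parser = IconLinkParser()
    parser.feed(html)
    return {urljoin(root_url, h) if root_url else h for h in parser.hrefs}


page = (
    '<head><link rel="icon" href="/static/favicon.ico">'
    '<link rel="stylesheet" href="/s.css"></head>'
)
icons = find_icon_urls(page, root_url="https://example.com/blog/")
print(icons)
```

The stylesheet link is ignored, and the relative href is resolved against the root URL rather than returned as-is.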
from_url(url, include_fallbacks=False, client=None)
Extracts favicons from a given URL.
This function attempts to retrieve the specified URL, parse its HTML, and extract any
associated favicons. If the URL is reachable and returns a successful response, the
function will parse the content for favicon references. If include_fallbacks is True,
it will also attempt to find fallback icons (e.g., by checking default icon paths).
If the URL is not reachable or returns an error response, an empty set is returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL from which to extract favicons. | required |
include_fallbacks | bool | Whether to include fallback favicons if none are explicitly defined. Defaults to False. | False |
client | Optional[Client] | A custom client instance from | None |
Returns:
Type | Description |
---|---|
set[Favicon] | A set of |
from_duckduckgo(url, client=None)
Retrieves a website's favicon via DuckDuckGo's Favicon public API.
This function uses tldextract to parse the given URL and constructs a DuckDuckGo
favicon URL using the top-level domain. It then fetches the icon and populates a Favicon
object with any available metadata (e.g., width, height, and reachability).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The target website URL. | required |
client | Optional[Client] | A custom HTTP client to use for the request. | None |
Returns:
Type | Description |
---|---|
Favicon | A |
from_google(url, client=None, size=256)
Retrieves a website's favicon via Google's Favicon public API.
This function uses tldextract to parse the given URL and constructs a Google
favicon URL using the top-level domain. It then fetches the icon and populates a Favicon
object with any available metadata (e.g., width, height, and reachability).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The target website URL. | required |
client | Optional[Client] | A custom HTTP client to use for the request. | None |
Returns:
Type | Description |
---|---|
Favicon | A |
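Both from_duckduckgo and from_google are described as building a service URL from the site's domain. The sketch below shows that construction step only, with no network request. The endpoint patterns are the commonly used public ones and are an assumption here, not something this reference confirms. host_of, duckduckgo_icon_url and google_icon_url are hypothetical helpers, and the host is taken with urllib.parse rather than tldextract to stay within the standard library.

```python
from urllib.parse import urlencode, urlparse


def host_of(url: str) -> str:
    """Return the host of a URL (a simplified stand-in for tldextract)."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def duckduckgo_icon_url(url: str) -> str:
    # Assumed endpoint pattern for DuckDuckGo's icon service.
    return f"https://icons.duckduckgo.com/ip3/{host_of(url)}.ico"


def google_icon_url(url: str, size: int = 256) -> str:
    # Assumed endpoint pattern for Google's icon service; sz is the requested size.
    query = urlencode({"domain": host_of(url), "sz": size})
    return f"https://www.google.com/s2/favicons?{query}"


ddg = duckduckgo_icon_url("https://www.python.org/downloads/")
goog = google_icon_url("https://www.python.org/downloads/", size=64)
print(ddg)
print(goog)
```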
download(favicons, mode='all', include_unknown=True, sleep_time=2, sort='ASC', client=None)
Download previously extracted favicons.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
favicons | Union[list[Favicon], set[Favicon]] | List of favicons to download. | required |
mode | str | Select the strategy to download favicons. | 'all' |
include_unknown | bool | Whether to include images with no width/height information. | True |
sleep_time | int | Number of seconds to wait between requests to avoid being blocked. | 2 |
sort | str | Sort favicons by size in ASC or DESC order. Only used for mode | 'ASC' |
client | Optional[Client] | A custom client instance from | None |
Returns:
Type | Description |
---|---|
list[Favicon] | A list of favicons. |
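The include_unknown and sort parameters can be illustrated with a small ordering sketch. This is only an interpretation of the parameter descriptions above: Icon is a hypothetical stand-in for Favicon, order_icons is not a library function, and placing icons of unknown size after the sized ones is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Icon:
    """Hypothetical stand-in for the library's Favicon object."""
    url: str
    width: int = 0
    height: int = 0


def order_icons(icons, include_unknown=True, sort="ASC"):
    """Sort icons by pixel area; optionally keep icons whose size is unknown."""
    if sort not in ("ASC", "DESC"):
        raise ValueError(f"sort must be 'ASC' or 'DESC', got {sort!r}")
    known = [i for i in icons if i.width and i.height]
    unknown = [i for i in icons if not (i.width and i.height)]
    known.sort(key=lambda i: i.width * i.height, reverse=(sort == "DESC"))
    # Assumption: unknown-size icons go last, whatever the sort order.
    return known + unknown if include_unknown else known


candidates = [Icon("a.png", 32, 32), Icon("b.ico"), Icon("c.png", 16, 16)]
largest_first = [i.url for i in order_icons(candidates, sort="DESC")]
sized_only = [i.url for i in order_icons(candidates, include_unknown=False)]
print(largest_first, sized_only)
```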
guess_size(favicon, chunk_size=512, force=False, client=None)
Get the size of an image by requesting only its first bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
favicon | Favicon | The favicon object whose size should be guessed. | required |
chunk_size | int | Chunk size, in bytes, used to iterate over the image stream. | 512 |
force | bool | Try to guess the size even if the width and height are not zero. | False |
Returns:
Type | Description |
---|---|
Favicon | The Favicon object with updated width, height, reachable and http parameters. |
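Reading only the first bytes works because many image formats store their dimensions near the start of the file. As an illustration, the sketch below reads width and height from a PNG header. It is not how guess_size is implemented (this reference does not say), it covers PNG only, and png_size is a hypothetical helper.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(first_bytes: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from the start of a PNG stream, or None."""
    # Layout: signature (8 bytes), chunk length (4), chunk type "IHDR" (4),
    # then width (4) and height (4) as big-endian unsigned integers.
    if len(first_bytes) < 24 or not first_bytes.startswith(PNG_SIGNATURE):
        return None
    if first_bytes[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", first_bytes[16:24])
    return width, height


# Build just the first 24 bytes of a 48x32 PNG; no pixel data is needed.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 48, 32)
print(png_size(header))  # (48, 32)
```

Only 24 bytes are needed here, so a 512-byte chunk such as the default chunk_size would be more than enough for this format.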
guess_missing_sizes(favicons, chunk_size=512, sleep_time=1, load_base64_img=False, client=None)
Attempts to determine missing dimensions (width and height) of favicons.
For each favicon in the provided collection, if the favicon is a base64-encoded
image (data URL) and load_base64_img is True, the function decodes and loads
the image to guess its dimensions. For non-base64 favicons with missing or zero
dimensions, the function attempts to guess the size by partially downloading the
icon data (using guess_size).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
favicons | Union[list[Favicon], set[Favicon]] | A list or set of | required |
chunk_size | int | The size of the data chunk to download for guessing dimensions of non-base64 images. Defaults to 512. | 512 |
sleep_time | int | The number of seconds to sleep between guessing attempts to avoid rate limits or overloading the server. Defaults to 1. | 1 |
load_base64_img | bool | Whether to decode and load base64-encoded images (data URLs) to determine their dimensions. Defaults to False. | False |
Returns:
Type | Description |
---|---|
list[Favicon] | A list of |
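The base64 branch described above starts by turning a data URL back into raw image bytes. The sketch below shows that decoding step with the standard library. decode_data_url is a hypothetical helper, not part of the library. Reading dimensions from the decoded bytes would still need an image parser.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes


def decode_data_url(url: str) -> Tuple[str, bytes]:
    """Split a data URL into (media type, raw bytes)."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no comma separator")
    is_base64 = header.endswith(";base64")
    media_type = header[: -len(";base64")] if is_base64 else header
    # Non-base64 payloads are percent-encoded text.
    data = base64.b64decode(payload) if is_base64 else unquote_to_bytes(payload)
    # Simplification: an empty media type is reported as plain text.
    return media_type or "text/plain", data


svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16"/>'
data_url = "data:image/svg+xml;base64," + base64.b64encode(svg).decode("ascii")
media_type, raw = decode_data_url(data_url)
print(media_type, len(raw))
```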
check_availability(favicons, sleep_time=1, force=False, client=None)
Checks the availability and final URLs of a collection of favicons.
For each favicon in the provided list or set, this function sends a HEAD request
(or an optimized request if available) to check whether the favicon's URL is
reachable. If the favicon is reachable, its reachable attribute is updated to
True. If the request results in a redirect, the favicon's URL is updated to the
final URL.
A delay (sleep_time) can be specified between checks to avoid rate limits
or overloading the server.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
favicons | Union[list[Favicon], set[Favicon]] | A collection of | required |
sleep_time | int | Number of seconds to sleep between each availability check to control request rate. Defaults to 1. | 1 |
force | bool | Check availability again even if it has already been checked. | False |
client | Optional[Client] | A custom client instance from | None |
Returns:
Type | Description |
---|---|
list[Favicon] | A list of |
generate_favicon(url)
Generates a placeholder favicon as an SVG containing the first letter of the domain.
This function extracts the domain name from the provided URL using tldextract,
takes the first letter of the domain (capitalized), and embeds it into an SVG
image. The generated SVG is then loaded into a Favicon object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL from which to extract the domain and generate the favicon. | required |
Returns:
Type | Description |
---|---|
Favicon | A |
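The placeholder idea can be sketched in a few lines: take the first letter of the domain and embed it in an SVG string. This is an illustration under assumptions, not the library's code. placeholder_svg is a hypothetical name, the colours and layout are arbitrary, and the domain is approximated by stripping a leading "www." instead of using tldextract.

```python
from html import escape
from urllib.parse import urlparse


def placeholder_svg(url: str, size: int = 64) -> str:
    """Return an SVG string showing the capitalized first letter of the host."""
    host = urlparse(url).hostname or ""
    # Simplification: drop a leading "www." rather than parsing the domain properly.
    if host.startswith("www."):
        host = host[4:]
    letter = escape(host[:1].upper() or "?")
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 {size} {size}">'
        '<rect width="100%" height="100%" fill="#cccccc"/>'
        '<text x="50%" y="50%" dominant-baseline="central" text-anchor="middle" '
        f'font-size="{size // 2}" font-family="sans-serif">{letter}</text>'
        "</svg>"
    )


svg_text = placeholder_svg("https://www.python.org/about/")
print(svg_text)
```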
get_best_favicon(url, html=None, client=None, strategy=['content', 'duckduckgo', 'google', 'generate'], include_fallbacks=True)
Attempts to retrieve the best favicon for a given URL using multiple strategies.
The function iterates over the specified strategies in order, stopping as soon as a valid favicon is found:
- "content": Parses the provided HTML (if any) or fetches page content from the URL to extract favicons. It then guesses missing sizes, checks availability, and downloads the largest icon.
- "duckduckgo": Retrieves a favicon from DuckDuckGo if the previous step fails.
- "google": Retrieves a favicon from Google if the previous step fails.
- "generate": Generates a placeholder favicon if all else fails.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL for which the favicon is being retrieved. | required |
html | Optional[Union[str, bytes]] | Optional HTML content to parse. If not provided, the page content is retrieved from the URL. | None |
client | Optional[Client] | Optional HTTP client to use for network requests. | None |
strategy | list[str] | A list of strategy names to attempt in sequence. Defaults to ["content", "duckduckgo", "google", "generate"]. | ['content', 'duckduckgo', 'google', 'generate'] |
include_fallbacks | bool | check for fallbacks URL for | True |
Returns:
Type | Description |
---|---|
Optional[Favicon] | The best found favicon if successful, otherwise None. |
Raises:
Type | Description |
---|---|
ValueError | If an unrecognized strategy name is encountered in the list. |
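The ordered fallback described for get_best_favicon is a first-success chain over named strategies. The sketch below shows that control flow with stub handlers in place of real lookups. pick_first and the handler table are hypothetical, and validating every name before running anything is a choice of this sketch, since the reference does not say when the ValueError is raised.

```python
from typing import Callable, Dict, Optional, Sequence

Handler = Callable[[str], Optional[str]]


def pick_first(url: str, strategy: Sequence[str], handlers: Dict[str, Handler]) -> Optional[str]:
    """Run strategies in order and return the first non-None result."""
    # Sketch choice: reject unknown names up front, before any handler runs.
    for name in strategy:
        if name not in handlers:
            raise ValueError(f"unknown strategy: {name!r}")
    for name in strategy:
        result = handlers[name](url)
        if result is not None:
            return result
    return None


# Stub handlers: the first two pretend to find nothing.
handlers = {
    "content": lambda url: None,
    "duckduckgo": lambda url: None,
    "generate": lambda url: "placeholder for " + url,
}
best = pick_first("https://example.com", ["content", "duckduckgo", "generate"], handlers)
print(best)
```

Keeping the strategies in a name-to-callable table lets callers reorder or drop steps by editing only the list of names.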