Reference

from_html(html, root_url=None, include_fallbacks=False)

Extract all favicons from the given HTML.

Parameters:

Name Type Description Default
html str

HTML to parse.

required
root_url Optional[str]

Root URL where the favicon is located.

None
include_fallbacks bool

Whether to include fallback favicons like /favicon.ico.

False

Returns:

Type Description
set[Favicon]

A set of favicons.
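
To make the role of root_url and include_fallbacks concrete, here is a minimal standard-library sketch of this kind of extraction. It is not the library's implementation: IconLinkParser and find_icon_urls are hypothetical names, and matching any rel value that contains "icon" is an assumption made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Collect href values of <link> tags whose rel attribute mentions an icon."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        href = attributes.get("href")
        # Assumption: a substring match covers "icon", "shortcut icon"
        # and "apple-touch-icon".
        if href and "icon" in rel:
            self.hrefs.append(href)


def find_icon_urls(html, root_url=None, include_fallbacks=False):
    """Return the set of icon URLs referenced in the HTML (hypothetical helper)."""
    parser = IconLinkParser()
    parser.feed(html)
    # Relative hrefs can only be made absolute when a root URL is known.
    urls = {urljoin(root_url, href) if root_url else href for href in parser.hrefs}
    if include_fallbacks and root_url:
        urls.add(urljoin(root_url, "/favicon.ico"))
    return urls


page = '<head><link rel="icon" href="/static/icon.png"><link rel="stylesheet" href="/a.css"></head>'
print(find_icon_urls(page, "https://example.com/page", include_fallbacks=True))
```

Without root_url, the sketch returns hrefs exactly as written in the page, which is why a root URL matters for relative links.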

from_url(url, include_fallbacks=False, client=None)

Extracts favicons from a given URL.

This function attempts to retrieve the specified URL, parse its HTML, and extract any associated favicons. If the URL is reachable and returns a successful response, the function will parse the content for favicon references. If include_fallbacks is True, it will also attempt to find fallback icons (e.g., by checking default icon paths). If the URL is not reachable or returns an error response, an empty set is returned.

Parameters:

Name Type Description Default
url str

The URL from which to extract favicons.

required
include_fallbacks bool

Whether to include fallback favicons if none are explicitly defined. Defaults to False.

False
client Optional[Client]

A custom client instance from the reachable package to use for performing the HTTP request. If None, a default client configuration is used.

None

Returns:

Type Description
set[Favicon]

A set of Favicon objects found in the target URL's HTML.

from_duckduckgo(url, client=None)

Retrieves a website's favicon via DuckDuckGo's Favicon public API.

This function uses tldextract to parse the given URL and constructs a DuckDuckGo favicon URL using the top-level domain. It then fetches the favicon and populates a Favicon object with any available metadata (e.g., width, height and reachability).

Parameters:

Name Type Description Default
url str

The target website URL.

required
client Optional[Client]

A custom HTTP client to use for the request.

None

Returns:

Type Description
Favicon

A Favicon object containing favicon data.

from_google(url, client=None, size=256)

Retrieves a website's favicon via Google's Favicon public API.

This function uses tldextract to parse the given URL and constructs a Google favicon URL using the top-level domain. It then fetches the favicon and populates a Favicon object with any available metadata (e.g., width, height and reachability).

Parameters:

Name Type Description Default
url str

The target website URL.

required
client Optional[Client]

A custom HTTP client to use for the request.

None

Returns:

Type Description
Favicon

A Favicon object containing favicon data.
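
As a rough illustration of how a service favicon URL can be built from a site URL, the sketch below uses only the standard library. Several things here are assumptions rather than facts from this reference: the endpoint patterns are the ones commonly used for these two services, urllib.parse stands in for tldextract (so subdomains are kept as-is), and the helper names are hypothetical.

```python
from urllib.parse import urlencode, urlparse


def site_host(url):
    """Return the host part of a URL (simplified stand-in for tldextract)."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def duckduckgo_icon_url(url):
    # Assumed endpoint pattern for DuckDuckGo's icon service.
    return f"https://icons.duckduckgo.com/ip3/{site_host(url)}.ico"


def google_icon_url(url, size=256):
    # Assumed endpoint pattern for Google's favicon service; "sz" is the
    # requested icon size in pixels.
    query = urlencode({"domain": site_host(url), "sz": size})
    return f"https://www.google.com/s2/favicons?{query}"


print(duckduckgo_icon_url("https://www.python.org/downloads/"))
print(google_icon_url("https://www.python.org/", size=64))
```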

download(favicons, mode='all', include_unknown=True, sleep_time=2, sort='ASC', client=None)

Download previously extracted favicons.

Parameters:

Name Type Description Default
favicons Union[list[Favicon], set[Favicon]]

List or set of favicons to download.

required
mode str

Select the strategy to download favicons.
- all: download all favicons in the list.
- largest: only download the largest favicon in the list.
- smallest: only download the smallest favicon in the list.

'all'
include_unknown bool

Whether to include images with no width/height information.

True
sleep_time int

Number of seconds to wait between requests to avoid being blocked.

2
sort str

Sort favicons by size in ASC or DESC order. Only used when mode is all.

'ASC'
client Optional[Client]

A custom client instance from the reachable package to use for performing the HTTP request. If None, a default client configuration is used.

None

Returns:

Type Description
list[Favicon]

A list of favicons.
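
The interaction between mode, include_unknown and sort can be sketched without any network access. This is a simplified model under stated assumptions, not the library's code: Icon is a hypothetical stand-in for Favicon, "size" is taken to mean width times height, and an unknown size is represented by zero dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Icon:
    """Hypothetical stand-in for a Favicon; zero dimensions mean unknown size."""
    url: str
    width: int = 0
    height: int = 0


def select_for_download(icons, mode="all", include_unknown=True, sort="ASC"):
    """Return the icons that the given mode would download, in order."""
    candidates = [
        icon for icon in icons
        if include_unknown or (icon.width and icon.height)
    ]
    # Order by pixel area, smallest first; unknown sizes have area 0.
    by_area = sorted(candidates, key=lambda icon: icon.width * icon.height)
    if mode == "largest":
        return by_area[-1:]
    if mode == "smallest":
        return by_area[:1]
    if mode == "all":
        return by_area if sort == "ASC" else by_area[::-1]
    raise ValueError(f"unknown mode: {mode!r}")


icons = {Icon("a.png", 16, 16), Icon("b.png", 64, 64), Icon("c.ico")}
print([icon.url for icon in select_for_download(icons, mode="largest")])
print([icon.url for icon in select_for_download(icons, sort="DESC")])
```

In this model, mode smallest combined with include_unknown=True picks an icon of unknown size, which is one reason to guess missing sizes before selecting.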

guess_size(favicon, chunk_size=512, force=False, client=None)

Get the size of an image by requesting only its first bytes.

Parameters:

Name Type Description Default
favicon Favicon

the favicon object from which to guess the size.

required
chunk_size int

Size in bytes of each chunk read from the image stream.

512
force bool

Try to guess the size even if the width and height are already non-zero.

False

Returns:

Type Description
Favicon

The Favicon object with updated width, height, reachable and http parameters.
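
Reading only the first bytes works because common image formats store their dimensions near the start of the file. The sketch below shows this for PNG only, where the IHDR chunk places width and height at bytes 16 to 24. The function name is hypothetical, and how the library itself parses the partial download is not stated in this reference.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_prefix(prefix):
    """Return (width, height) from the first bytes of a PNG, or None."""
    # 8-byte signature + 4-byte chunk length + 4-byte chunk type + 8 bytes of size.
    if len(prefix) < 24 or not prefix.startswith(PNG_SIGNATURE):
        return None
    if prefix[12:16] != b"IHDR":
        return None
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", prefix[16:24])
    return width, height


# Build the first 24 bytes of a 32x48 PNG by hand to exercise the parser.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 32, 48)
print(png_size_from_prefix(header))  # (32, 48)
```

A 512-byte chunk, the default chunk_size, comfortably covers this 24-byte header.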

guess_missing_sizes(favicons, chunk_size=512, sleep_time=1, load_base64_img=False, client=None)

Attempts to determine missing dimensions (width and height) of favicons.

For each favicon in the provided collection, if the favicon is a base64-encoded image (data URL) and load_base64_img is True, the function decodes and loads the image to guess its dimensions. For non-base64 favicons with missing or zero dimensions, the function attempts to guess the size by partially downloading the icon data (using guess_size).

Parameters:

Name Type Description Default
favicons Union[list[Favicon], set[Favicon]]

A list or set of Favicon objects for which to guess missing dimensions.

required
chunk_size int

The size of the data chunk to download for guessing dimensions of non-base64 images. Defaults to 512.

512
sleep_time int

The number of seconds to sleep between guessing attempts to avoid rate limits or overloading the server. Defaults to 1.

1
load_base64_img bool

Whether to decode and load base64-encoded images (data URLs) to determine their dimensions. Defaults to False.

False

Returns:

Type Description
list[Favicon]

A list of Favicon objects with dimensions updated where they could be determined.
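
For favicons embedded as data URLs, no request is needed: the image bytes are already in the URL. A minimal sketch of the decoding step follows, assuming a base64-encoded payload. decode_data_url is a hypothetical helper, not part of the documented API.

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into (media_type, raw_bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


# Round-trip a few bytes through a data URL to show the shape of the result.
raw = b"\x89PNG\r\n\x1a\n"
url = "data:image/png;base64," + base64.b64encode(raw).decode("ascii")
media_type, data = decode_data_url(url)
print(media_type, len(data))
```

The decoded bytes can then be handed to any image parser to read the width and height.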

check_availability(favicons, sleep_time=1, force=False, client=None)

Checks the availability and final URLs of a collection of favicons.

For each favicon in the provided list or set, this function sends a HEAD request (or an optimized request if available) to check whether the favicon's URL is reachable. If the favicon is reachable, its reachable attribute is updated to True. If the request results in a redirect, the favicon's URL is updated to the final URL.

A delay (sleep_time) can be specified between checks to avoid rate limits or overloading the server.

Parameters:

Name Type Description Default
favicons Union[list[Favicon], set[Favicon]]

A collection of Favicon objects to check for availability.

required
sleep_time int

Number of seconds to sleep between each availability check to control request rate. Defaults to 1.

1
force bool

Check availability again even if it has already been checked.

False
client Optional[Client]

A custom client instance from the reachable package to use for performing the HTTP request. If None, a default client configuration is used.

None

Returns:

Type Description
list[Favicon]

A list of Favicon objects with updated reachable statuses and potentially updated URLs if redirects were encountered.

generate_favicon(url)

Generates a placeholder favicon as an SVG containing the first letter of the domain.

This function extracts the domain name from the provided URL using tldextract, takes the first letter of the domain (capitalized), and embeds it into an SVG image. The generated SVG is then loaded into a Favicon object.

Parameters:

Name Type Description Default
url str

The URL from which to extract the domain and generate the favicon.

required

Returns:

Type Description
Favicon

A Favicon instance populated with the generated SVG data.
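
A letter-based placeholder of this kind can be sketched with the standard library alone. The version below is an approximation under stated assumptions: it strips a leading "www." instead of using tldextract, and the function name, colours and layout are made up for illustration.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def placeholder_svg(url, size=64):
    """Return an SVG string showing the first letter of the URL's host."""
    host = urlparse(url).hostname or ""
    # Simplification: tldextract would separate subdomain, domain and suffix.
    if host.startswith("www."):
        host = host[4:]
    letter = host[:1].upper() or "?"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 {size} {size}">'
        '<rect width="100%" height="100%" fill="#cccccc"/>'
        f'<text x="50%" y="50%" dy=".35em" text-anchor="middle" '
        f'font-family="sans-serif" font-size="{size // 2}">{escape(letter)}</text>'
        "</svg>"
    )


print(placeholder_svg("https://www.python.org/about"))
```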

get_best_favicon(url, html=None, client=None, strategy=['content', 'duckduckgo', 'google', 'generate'], include_fallbacks=True)

Attempts to retrieve the best favicon for a given URL using multiple strategies.

The function iterates over the specified strategies in order, stopping as soon as a valid favicon is found:
- "content": Parses the provided HTML (if any) or fetches page content from the URL to extract favicons. It then guesses missing sizes, checks availability, and downloads the largest icon.
- "duckduckgo": Retrieves a favicon from DuckDuckGo if the previous step fails.
- "google": Retrieves a favicon from Google if the previous step fails.
- "generate": Generates a placeholder favicon if all else fails.

Parameters:

Name Type Description Default
url str

The URL for which the favicon is being retrieved.

required
html Optional[Union[str, bytes]]

Optional HTML content to parse. If not provided, the page content is retrieved from the URL.

None
client Optional[Client]

Optional HTTP client to use for network requests.

None
strategy list[str]

A list of strategy names to attempt in sequence. Defaults to ["content", "duckduckgo", "google", "generate"].

['content', 'duckduckgo', 'google', 'generate']
include_fallbacks bool

Whether to check fallback URLs when using the content strategy.

True

Returns:

Type Description
Optional[Favicon]

The best found favicon if successful, otherwise None.

Raises:

Type Description
ValueError

If an unrecognized strategy name is encountered in the list.
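
The ordered-fallback behaviour described for get_best_favicon can be modelled as a small dispatch loop. The sketch below is illustrative only: first_successful and the handlers mapping are hypothetical, and each handler is a stub that returns None to signal failure.

```python
def first_successful(url, strategy, handlers):
    """Try each named strategy in order and return the first non-None result."""
    for name in strategy:
        if name not in handlers:
            raise ValueError(f"Unknown strategy: {name!r}")
        result = handlers[name](url)
        if result is not None:
            return result
    return None


# Stub handlers: the first two fail, so the chain falls through to "google".
handlers = {
    "content": lambda url: None,
    "duckduckgo": lambda url: None,
    "google": lambda url: f"google icon for {url}",
    "generate": lambda url: f"generated icon for {url}",
}

order = ["content", "duckduckgo", "google", "generate"]
print(first_successful("https://example.com", order, handlers))
```

Because "generate" always produces something in this model, a None result is only possible when the strategy list leaves it out.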