Taxonomy of Web Page Genres

The notion of “web page genre” refers to the categories, or typologies, into which pages published on the Internet can be classified. The goal is to better understand the structure, function, and content of a page so that it can be identified and classified more easily. In this article, we will explore the taxonomies of web page and site genres: why they are useful, what principles they follow, and what major families can be distinguished. A web page genre is rarely fixed or clearly defined, but I will try to cover its main aspects.

Can we cite sports, cinema, or video games as a category?

No. Those are themes, not genres. A thematic classification asks: “What topic is covered?” A genre classification cares little about the subject and asks instead: “What does the user do on this page?” To define a page’s genre, we need to focus on the practical side of the user experience. Take cinema as a theme: does the user want to buy a movie, write a review on their own blog, or read what a community says about it on a forum? That gives us three examples of genres: e-commerce, blog, and forum.

Alright, but why should we care about it?

Classifying web pages by genre is useful to several kinds of digital actors. Search engines, for example, rely on this distinction to tell apart a scientific article, a sales page, or a product showcase, and so provide more relevant results (Madjarov et al., 2019). In advertising, it allows more precise ad targeting (Chaker, 2015), and it also facilitates archiving and academic research (Asheghi, 2015). Finally, by clarifying the nature of a page, it helps optimize the user experience and anticipate how the web evolves. For a company in particular, understanding these issues can help it structure its site better and improve its online visibility.

What are the categories of web page genres?

To name them, we must first define the genre of a web page. Easy, we started doing that earlier!

The scientific literature offers several possible definitions, and they are complementary rather than competing. They can be summarized in five aspects. For each one, I will give examples of existing classifications:

Genre by Function

A web page genre is defined by the function it fulfills for the user (e.g., e-commerce page, scientific article, discussion forum, blog). This approach treats genre as independent of thematic content: what matters is how the page is structured to meet a need (Chen & Choi, 2008; Rosso, 2008).

Stylistic and Formal Aspect

A genre is determined by stylistic and formal characteristics, such as word choice, layout, use of links, and media. This includes genres like homepages, FAQs, contact pages, or press pages, which are distinguished by their recurring forms (Santini, 2011; Mason et al., 2009).

User-Centered Taxonomy

To increase the chances of creating the right categories, some researchers rely on user feedback. In many studies, a web page genre is defined by user recognition: users assign labels to pages based on their expectations and experience. This approach relies on empirical studies in which participants classify pages according to predefined categories (Rosso, 2005; Montesi, 2010).

Genres and Sub-genres

In 2008, Chen & Choi proposed classifying genres at two levels of precision:

  • First-level genres, which correspond to broad categories of websites like e-commerce sites, forums, digital media, and newspapers.
  • Sub-genres, which are more specific, such as a product page or a buying guide, both being sub-genres derived from the e-commerce category.
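
The two-level idea can be sketched as a small data structure. This is only an illustration, not the classification published by Chen & Choi: the `TAXONOMY` dictionary and the `first_level_genre` helper are hypothetical names, and only the e-commerce sub-genres come from the examples above.

```python
from typing import Dict, List, Optional

# Hypothetical two-level taxonomy built from the examples in the text.
# Keys are first-level genres, values are their sub-genres.
# Empty lists mean the text gives no sub-genre example for that genre.
TAXONOMY: Dict[str, List[str]] = {
    "e-commerce": ["product page", "buying guide"],
    "forum": [],
    "digital media": [],
    "newspaper": [],
}


def first_level_genre(sub_genre: str) -> Optional[str]:
    """Return the first-level genre that contains sub_genre, or None."""
    for genre, sub_genres in TAXONOMY.items():
        if sub_genre in sub_genres:
            return genre
    return None


print(first_level_genre("buying guide"))  # e-commerce
print(first_level_genre("recipe"))        # None
```

Storing sub-genres under their parent keeps both levels of precision available: a page labeled “buying guide” can still be counted as e-commerce when only the broad category matters.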

A Hybrid and Dynamic Vision of Genre

Genres can also be hybrid. This can be handled by considering different levels of genre, or by allowing multiple classifications for a given page, so that a single page carries several genre or sub-genre labels. The names vary from study to study, but the principle remains the same. Below is the classification of 1,145 sites conducted by a research group.

Finally, web page genres evolve over time: because the web is a constantly changing space, new formats appear and existing genres hybridize (Santini, 2007; Vidulin et al., 2009). Below is another categorization proposal.
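
Separately from the published categorizations mentioned above, the hybrid and dynamic view itself can be illustrated in a few lines of code. This is a minimal sketch under stated assumptions: the `Page` class, its fields, and the example URL are all hypothetical and do not come from any of the cited studies. The point is simply that a page holds a set of genre labels rather than a single one, and that the set can change over time.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Page:
    """A page described by a set of genre labels (hypothetical model)."""
    url: str
    genres: Set[str] = field(default_factory=set)

    def is_hybrid(self) -> bool:
        # A page is hybrid when it carries more than one genre label.
        return len(self.genres) > 1


# A blog with an e-commerce module: two labels on the same page.
page = Page("https://example.com/blog", {"blog", "e-commerce"})
print(page.is_hybrid())  # True

# Genres are not fixed: a label can be added later as the page evolves.
page.genres.add("forum")
print(sorted(page.genres))  # ['blog', 'e-commerce', 'forum']
```

Using a set instead of a single label is what makes multi-label classification possible: nothing forces a choice between “blog” and “e-commerce” when the page is both.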

Conclusion

The notion of web page genre is far more than a simple label placed on content. It gives a finer understanding of what the user is looking for (buying a product, engaging with a community, reading a specialized article, and so on) and makes it possible to adapt the site’s structure accordingly. As we have seen, some genres are defined by function (e-commerce, blog, forum), others by form (FAQ, homepage, contact page), and one does not necessarily exclude the other. Since the web is constantly evolving, these classifications are inevitably in flux: new formats emerge, practices change, and it sometimes becomes challenging to draw a clear boundary between two genres. But this is precisely where the value of the taxonomy lies: it provides reference points to better organize information, even if that information remains fluid and hybrid. In other words, engaging with genre classification means equipping oneself to understand, anticipate, and shape the uses that define our digital landscape.

We will soon explore the tools and techniques used to classify web pages according to their genre!

References

Chen & Choi (2008)
Web Page Genre Classification

Rosso
What type of page is this? Genre as web descriptor (2005)
User-based Identification of Web Genres (2008)

Santini
Characterizing Genres of Web Pages: Genre Hybridism and Individualization (2007)
Automatic Identification of Genre in Web Pages: A New Perspective (2011)

Mason et al. (2009)
Classifying Web Pages by Genre: An n-Gram Approach

Montesi (2010)
Genre Analysis of Bookmarked Web Pages

Vidulin et al. (2009)
Multilabel Approaches to Web Genre Identification

Madjarov et al. (2019)
Web Genre Classification with Methods for Structured Output Prediction

Chaker (2015)
Enhanced and Combined Centroid-based Approach for Web Genre Categorization

Asheghi (2015)
Human Annotation and Automatic Detection of Web Genres

FAQ

What is a “web page genre”?
It is a way to classify a page based on its function and structure (blog, e-commerce, forum…) rather than by its theme (sports, cinema, etc.).

How does genre differ from thematic classification?
Thematic classification focuses on the subject (cinema, sports), while genre looks at what the user does on the page (buying, publishing an article, discussing).

Why should we care about genre classification?
It helps search engines better distinguish content, facilitates ad targeting, supports academic archiving, and improves user experience.

What are the main approaches to defining the genre of a web page?

  • By function (e-commerce, forum…)
  • Stylistic and formal aspects (layout, style)
  • User-centered classification (how users categorize the page)
  • Genres and sub-genres (broad categories + specific ones)
  • Hybrid and dynamic vision (a page can combine multiple genres and evolve over time)

Can multiple genres be mixed on a single page?
Yes. A single page can serve multiple functions (e.g., a blog with an e-commerce module).

Do genres evolve over time?
Yes, the web constantly changes, creating new formats or hybridizing existing genres.

What is the difference between “first-level genres” and “sub-genres”?

  • First-level genres: broad categories (e-commerce, blog, forum…).
  • Sub-genres: more specialized versions (product page, buying guide…).

How does genre classification impact SEO?
It helps search engines better understand your pages, improving result relevance and potentially increasing your visibility.

What are the technical challenges of implementing genre classification?
The main challenges are identifying relevant criteria (structure, style, function) and creating automated models capable of handling the variety and constant evolution of web pages.

Can multiple genres be combined to better promote a company’s products?
Yes. A site can include a showcase, a news blog, and an e-commerce section to inform, engage, and sell to customers, reflecting different genre types and meeting various needs.