Blueprint Technologies - Data information specialists

How to apply smart localization approaches to online searches

By Avelino López García

Most multilingual websites parse the end user's query in a non-English language, normalize the text and then map the term to an English-driven taxonomy, leaving room for errors in direct translations. Blueprint's Localization team examines better ways to maximize search results for non-English markets.

We are all used to going to a website and typing a few words to retrieve information, images or simply to find an item we want to buy. The process may seem straightforward for an end user, but there are hidden complexities that allow this type of website search to work well across languages. Non-English searches need many adaptations and dedicated localization approaches to handle complexities of morphology, writing systems and conceptual and cultural differences.

The search space is one of the most interesting and lesser-known areas of localization. It falls at the crossroads of search, taxonomies and translation, and it requires hybrid expertise in library science and translation to produce results relevant to the end user.

Localizing your search results

These complexities arise because most search systems are designed for English and must be customized for other languages. Most multilingual sites parse the end user’s query in a non-English language, then modify the query in certain ways (for instance, normalizing it to remove accent marks) to facilitate processing. After that, they map the normalized query term to an English-driven taxonomy in which keywords are localized into multiple languages. The match leads to the corresponding concept, which is tied to a unique identifier of the relevant asset. That asset is then retrieved and presented to the end user.
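The common setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables, concept IDs and asset IDs are hypothetical, and a real system would use a search index rather than in-memory dictionaries. Note that the localized keywords are stored in their normalized (accent-free) form so that they match the normalized query.

```python
import unicodedata


def normalize(query: str) -> str:
    """Lowercase, trim and strip accent marks, e.g. 'Niño' -> 'nino'."""
    decomposed = unicodedata.normalize("NFD", query.strip().lower())
    # Drop combining marks (Unicode category Mn) left after decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Hypothetical taxonomy: (language, normalized keyword) -> concept ID.
LOCALIZED_KEYWORDS = {
    ("es", "familia"): "C001",
    ("de", "familie"): "C001",
    ("es", "nino"): "C002",
}

# Hypothetical asset table: concept ID -> asset identifiers.
ASSETS_BY_CONCEPT = {
    "C001": ["img_101", "img_102"],
    "C002": ["img_205"],
}


def search(query: str, lang: str) -> list[str]:
    """Map a localized query to a concept, then to the assets tied to it."""
    concept_id = LOCALIZED_KEYWORDS.get((lang, normalize(query)))
    if concept_id is None:
        return []  # no localized keyword, so no concept and no assets
    return ASSETS_BY_CONCEPT.get(concept_id, [])


print(search("Niño", "es"))
print(search("Schultüte", "de"))
```

The second call returns nothing because the hypothetical taxonomy has no entry for the term, which is the kind of gap the examples below describe.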

In this common setup, the route going from the end user’s query, through the localized keywords and ending in the asset does not always work perfectly. This is because there are many conceptual, linguistic and cultural differences among languages. Here are a few examples of problem areas that often produce inaccurate results in an English-centric setup, even though they contain accurately translated keywords and good taxonomies.

Image: Girl with a Schultüte on her first day of school. Schultüte is a uniquely German/Czech concept that has no equivalent in English or in many other cultures and languages.

Conceptual issues: In Europe, an image search for “family” typically shows results with a couple of children; in parts of Africa and the Middle East it could show more children. But in China during the one-child policy period, you would expect, for the most part, to get search results with families with only one child. In other words, even a concept seemingly as simple as family can require adaptations for different countries or markets.

Missing concepts: Some cultures have unique concepts, like “Schultüte” in Germany. That is a large, colorful cone full of school supplies, sweets and little presents given to kids when they are about to start their very first day of school. This word has no equivalent in English or in many other cultures and languages, so an English-centric database will not have an entry for this concept, making it impossible to add localized keywords for it. This often clouds the search with wrong or irrelevant results.

Absence of synonyms or missing tags: English concepts often have multiple equivalents in a target language. For instance, Spanish users looking for a “puzzle” are going to interchangeably use the Spanish translations “puzle” and “rompecabezas.” Both terms can be mapped during a search by ensuring that the taxonomy includes both synonyms. But to fully leverage the translated synonyms in the taxonomy, every single asset must be tagged with the English keyword “puzzle.” Yet taxonomies are very large databases that grow continuously, so their localization is almost never complete. To compensate for any taxonomy shortcomings, it is common to pull more results by matching searches against the text in descriptions, captions, footnotes and other unstructured or free text associated with the assets.

Ambiguity issues: The last example of problematic searches is when a user includes homographs, or words with multiple meanings, like “bridge” in English. If the system lacks an effective disambiguation mechanism, these searches can pose a challenge for any language and return inaccurate results.

Localization and logic operators

From the perspective of the search engine, there is a set of best practices that can mitigate or solve many of those issues. First, ensure that all assets are tagged with English keywords. Second, make sure the keywords in the taxonomy are fully localized. Third, create language-specific concepts where needed. Fourth, have disambiguation prompts for the user to clarify their search. Last, and probably most effective, convert complex inputs from international users into Boolean searches. This is one of the most flexible and clever localization features I have used and experienced. It simply means transforming the original end user’s query at runtime into a compound search sequence that includes logical operators (AND, OR, NOT) as defined by the English mathematician George Boole in his book The Laws of Thought.

Boolean Search Logic

Boolean conversions can be leveraged to improve both recall and precision, which are the cornerstones of search metrics. Precision is the share of returned results that are relevant: higher precision means less noise from irrelevant output, often with fewer results. Recall is the share of relevant assets that are actually returned: higher recall means presenting a wider set of results, which is useful when you don’t have enough assets to show and want to drive users toward a related yet relevant set of results.
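For readers who want the two metrics in concrete terms, here is a minimal sketch using their standard definitions. The function name and the sample asset IDs are made up for illustration.

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant assets
    """
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four assets returned, three assets actually relevant.
returned = {"img_1", "img_2", "img_3", "img_4"}
relevant = {"img_1", "img_2", "img_9"}

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case, two of the four returned assets are relevant and two of the three relevant assets were found, so narrowing the query would tend to raise the first number and broadening it would tend to raise the second.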

This is how Boolean conversions work:

When users in mainland China search for “family,” they expect the results to be images of a small family with 1 or 2 children, but instead they receive images of large families, perhaps even some with the Octomom family. For a search engine to prevent that type of result, a Chinese localizer can convert the query for “family” into (family AND (“1_child” OR “2_children”)). Doing so ensures the results will show the kind of small family users in China expect to see.
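A market-specific conversion like this one could be sketched as follows. The nested-tuple representation of the Boolean expression, the locale codes, the rewrite table and the asset tags are all assumptions made for this example, not a description of any particular search engine.

```python
# A Boolean expression is either a tag (str) or a tuple: (operator, operand, ...).
def matches(expr, tags: set[str]) -> bool:
    """Evaluate a Boolean expression against the tags of one asset."""
    if isinstance(expr, str):
        return expr in tags
    op, *args = expr
    if op == "AND":
        return all(matches(arg, tags) for arg in args)
    if op == "OR":
        return any(matches(arg, tags) for arg in args)
    if op == "NOT":
        (arg,) = args  # NOT takes exactly one operand
        return not matches(arg, tags)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical per-market rewrites, keyed by (locale, concept).
REWRITES = {
    ("zh-CN", "family"): ("AND", "family", ("OR", "1_child", "2_children")),
}

# Hypothetical tagged assets.
ASSETS = {
    "img_1": {"family", "1_child"},
    "img_2": {"family", "8_children"},
    "img_3": {"family", "2_children"},
}


def search(concept: str, locale: str) -> list[str]:
    """Apply the market rewrite if one exists, else search the plain concept."""
    expr = REWRITES.get((locale, concept), concept)
    return sorted(asset for asset, tags in ASSETS.items() if matches(expr, tags))


print(search("family", "zh-CN"))
print(search("family", "en-US"))
```

With these sample tags, the rewritten query for the Chinese market leaves out the large-family image, while a market without a rewrite still gets every asset tagged “family.”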

Now consider users in Germany looking for a “Schultüte” image. Since the concept does not exist in English, this search may generate no results, or very few, based solely on free text matches (if those are activated). In that case, the database developers have to create a German-specific concept in the taxonomy. But a simpler and cleverer alternative is to transform this query on the backend into a German sequence equivalent to ((school AND cone) NOT “traffic_cone”). Note the use of “NOT” to exclude more common cones irrelevant to this search.
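One way to picture how such a sequence runs on the backend is with set operations over an inverted index, where AND becomes an intersection and NOT becomes a set difference. The index contents and the helper name below are hypothetical, and English tags stand in for the German equivalents.

```python
# Hypothetical inverted index: keyword -> IDs of the assets tagged with it.
INDEX = {
    "school": {"img_1", "img_2", "img_3"},
    "cone": {"img_2", "img_3", "img_4"},
    "traffic_cone": {"img_3", "img_4"},
}


def all_but(required: list[str], excluded: list[str]) -> set[str]:
    """Assets carrying every required keyword and none of the excluded ones."""
    if not required:
        return set()
    # AND: intersect the posting sets of all required keywords.
    result = set.intersection(*(INDEX.get(term, set()) for term in required))
    # NOT: remove anything tagged with an excluded keyword.
    for term in excluded:
        result -= INDEX.get(term, set())
    return result


# ((school AND cone) NOT "traffic_cone")
print(all_but(["school", "cone"], ["traffic_cone"]))
```

In this sample index, two assets carry both “school” and “cone,” and the exclusion removes the one that is also tagged as a traffic cone.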

A Spanish user searching for “puzzles” using one of its translations, “rompecabezas” or “puzle,” would find all the relevant assets only if both translations were in the localized taxonomy and the search also matched assets with either translation in free text. If not all the assets are tagged, if only one synonym was entered in the taxonomy or if the search does not expand to match free text, the search results could miss many assets. In this situation, the localizer can convert the queries for both “puzle” and “rompecabezas” into the Boolean compound (puzle OR rompecabezas). Then the results for both queries would include any assets containing either word, both in free text and in the taxonomy, casting the widest net to capture all relevant results.
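A rough sketch of this OR expansion, matching both taxonomy keywords and free text, might look like the following. The synonym group, the asset records and the simple whitespace tokenization of captions are assumptions for illustration only.

```python
# Hypothetical synonym groups maintained by the localizer.
SYNONYM_GROUPS = [{"puzle", "rompecabezas"}]

# Hypothetical assets with localized keywords and a free-text caption.
ASSETS = [
    {"id": "img_1", "keywords": {"puzle"}, "caption": "Niña montando un puzle"},
    {"id": "img_2", "keywords": set(), "caption": "Rompecabezas de madera sobre la mesa"},
    {"id": "img_3", "keywords": {"rompecabezas"}, "caption": ""},
    {"id": "img_4", "keywords": {"ajedrez"}, "caption": "Tablero de ajedrez"},
]


def expand(term: str) -> set[str]:
    """Turn one term into the OR of its whole synonym group, if it has one."""
    for group in SYNONYM_GROUPS:
        if term in group:
            return group
    return {term}


def search(query: str) -> list[str]:
    """Return assets matching any expanded term in keywords or caption words."""
    terms = expand(query.lower())
    results = []
    for asset in ASSETS:
        caption_words = set(asset["caption"].lower().split())
        if terms & asset["keywords"] or terms & caption_words:
            results.append(asset["id"])
    return results


print(search("puzle"))
print(search("rompecabezas"))
```

Both queries return the same assets here, including the one that has no keyword at all and is only reachable through its caption.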

Similarly, when a search uses homographs, like “game bridge,” you most likely want images of the card game. But free text can give you images of a deer or another type of game animal next to a road bridge. In that case, Boolean conversions can be used in any language to improve the precision of the search results. The localizer may convert end users’ queries like “game of bridge,” “bridge game” and “bridge card game” into ((bridge NOT (“dental_bridge” OR “road_bridge”)) AND “card_game”). The results would be more accurate than those of a blunt free-text search over descriptions, captions, footnotes, etc.
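The step of recognizing several phrasings as the same request can be sketched as a small rule table. Everything here is hypothetical: the stopword list, the rule keys and the function name are invented for the example, and the target expression is written with the exclusion grouped explicitly so that it parses unambiguously.

```python
# Words ignored when comparing phrasings (a tiny, hypothetical stopword list).
STOPWORDS = {"of", "the", "a"}

# Target Boolean sequence for the card game sense of "bridge".
BRIDGE_CARD_GAME = '((bridge NOT ("dental_bridge" OR "road_bridge")) AND "card_game")'

# Hypothetical rules: the set of meaningful words in a query -> Boolean sequence.
RULES = {
    frozenset({"bridge", "game"}): BRIDGE_CARD_GAME,
    frozenset({"bridge", "card", "game"}): BRIDGE_CARD_GAME,
}


def convert(query: str) -> str:
    """Return the Boolean sequence for a known phrasing, else the query as typed."""
    tokens = frozenset(
        word for word in query.lower().split() if word not in STOPWORDS
    )
    return RULES.get(tokens, query)


for q in ("game of bridge", "bridge game", "bridge card game", "road bridge"):
    print(q, "->", convert(q))
```

Because word order and stopwords are ignored, the first three phrasings share one conversion, while a query with no rule passes through unchanged.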

Boolean conversion is a very smart and extremely flexible mechanism for search localization. It helps queries in all languages overcome a wide range of issues and drastically improve both precision and recall, the top two measures in the search world. Knowing how Boolean conversions work to achieve better search results is also useful in other contexts. When our localization team at Blueprint was recently asked to localize tags for software products to enable filtering by keyword and browsing by facet, it was exactly this kind of hybrid expertise that informed our approach to provide high-quality localization.

The Blueprint Localization team is a hidden gem in the U.S. West Coast technology landscape. Its collective knowledge, especially the combination of linguistic, cultural and research expertise, is a game-changing differentiator for the localization industry. Are you interested in leveraging your technical and language skills as part of a talented and quality-driven team? Check out our Careers page to learn more. 

© 2024 Blueprint Technologies, LLC.
2600 116th Avenue Northeast, First Floor
Bellevue, WA 98004

All rights reserved.
