Talent IQ Data Dictionary

Introduction

This is your guide to understanding the structure and content of Zeki’s datasets. It provides detailed definitions, metadata, data types and the relationships between key entities—ensuring consistency, clarity and ease of use for all developers, analysts and partners working with our data.

At the heart of this is Talent IQ, Zeki’s proprietary system for assessing the quality of deep-tech talent within an organisation. Talent IQ utilises advanced scoring models to evaluate the innovation track record of individuals—what we call Talent Alpha—encompassing their technical influence, peer recognition and field-specific impact.

Talent IQ is offered in two tiers:

Talent IQ Core: essential scoring and insights for rapid assessment

Talent IQ Plus: enriched contextual data, deeper analytics and comparative benchmarks

Together, these layers enable a robust analysis of talent strength across sectors, regions and competitors.

Data Schema Overview

Datasets for Talent IQ Core

Talent IQ provides insights into companies' R&D focus, emerging centres of excellence and detailed information on their employees.

Company Transitions

Company Transitions contains information on transitions into and out of a set of base companies. The data consists of inflows, outflows and netflows.

Aggregation Dimensions: 

  • Company 

  • Gender 

  • Job category: R&D role categories including data specialists, hardware engineers, software engineers, systems engineers, research specialists and AI specialists 

  • Role seniority  

  • Year and month 

Key Features: 

  • Each row represents a unique combination of these dimensions 

  • The primary metric is the count of R&D Talent within each aggregation. 
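As a minimal sketch of the row structure described above, the snippet below models one row and checks that each combination of dimensions appears only once. The field names and sample values are hypothetical and chosen for illustration; the delivered files may label columns differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionRow:
    # Hypothetical field names; the delivered files may label columns differently.
    company: str
    gender: str
    job_category: str
    role_seniority: str
    year_month: str  # e.g. "2023/1"
    talent_count: int  # count of R&D Talent for this combination

    def key(self) -> tuple:
        """The combination of aggregation dimensions that should be unique per row."""
        return (self.company, self.gender, self.job_category,
                self.role_seniority, self.year_month)


def duplicate_keys(rows):
    """Return dimension combinations that appear on more than one row."""
    counts = Counter(row.key() for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Illustrative values only, not taken from Zeki data.
rows = [
    TransitionRow("Company A", "female", "AI specialists", "senior", "2023/1", 12),
    TransitionRow("Company A", "male", "AI specialists", "senior", "2023/1", 18),
]
```

An empty result from duplicate_keys(rows) indicates that every row is a unique combination of the dimensions.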

R&D Role Taxonomy 

Zeki has developed a proprietary role taxonomy for R&D in deep-tech sectors and maps job titles using custom machine learning (ML) algorithms. We create job title representations with a custom encoder to identify role clusters.  

Company Transitions: Inflows

Identify which companies are attracting top AI talent and where from. 

The dataset is typically organised at the company and month level, where each row corresponds to a base company, a previous company and a particular month. In its broadest configuration, it gives the base company and the total count of joiners from each previous company. The base company in the inflows file is denoted by ‘new’.

New Company  Previous Company  Year/Month  Count of Inflows (joiners) 
Company A  Company B  2023/1  30 
Company A  Company C  2023/1  10 
Company A  Company D  2023/1  12 

 
As shown in the table above, the key metric is the Count of Inflows (joiners) to Company A. One way to visualise this data is as a horizontal bar chart, with the Count of Inflows (joiners) on the X-axis and the Previous Companies on the Y-axis. This makes it easy to compare inflows to Company A from each previous company. The data can also be tracked over time by plotting Year/Month on the X-axis and the Count of Inflows (joiners) on the Y-axis, with each Previous Company as a separate line.
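The comparison described above can be sketched in a few lines. This example uses the three rows from the inflows table and ranks the previous companies by joiners; the tuple layout and the function name inflows_by_source are assumptions made for illustration, not part of the delivered files.

```python
from collections import defaultdict

# Rows copied from the inflows table: (new company, previous company, year/month, joiners).
inflows = [
    ("Company A", "Company B", "2023/1", 30),
    ("Company A", "Company C", "2023/1", 10),
    ("Company A", "Company D", "2023/1", 12),
]


def inflows_by_source(rows, new_company):
    """Sum joiners per previous company for one base ('new') company."""
    totals = defaultdict(int)
    for new, prev, _year_month, count in rows:
        if new == new_company:
            totals[prev] += count
    # Largest source first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = inflows_by_source(inflows, "Company A")
print(ranked)  # [('Company B', 30), ('Company D', 12), ('Company C', 10)]
```

The ranked list maps directly onto the bars of the chart, largest source first.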

Company Transitions: Outflows

Identify which companies are losing top AI talent and where they are going.  

In its broadest configuration, this dataset gives the base company and the total count of leavers to each new company as a time series. Every row describes a particular base company (e.g. Microsoft). The base company in the outflows file is denoted by ‘prev’.

Previous Company  New Company  Year/Month  Count of Outflows (leavers) 
Company A  Company B  2023/1  2,548 
Company A  Company C  2023/1  1,710 
Company A  Company D  2023/1  2,468 

 
As the table above shows, the key metric is the Count of Outflows (leavers) from Company A. The table can be visualised as a graph, with the Count of Outflows (leavers) on the X-axis and the New Companies on the Y-axis.
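Beyond raw counts, it can be useful to express each destination as a share of all leavers. The sketch below uses the rows from the outflows table and assumes that counts may arrive as text with thousands separators, as displayed there. The function names are hypothetical.

```python
# Rows copied from the outflows table: (previous company, new company, year/month, leavers).
outflows = [
    ("Company A", "Company B", "2023/1", "2,548"),
    ("Company A", "Company C", "2023/1", "1,710"),
    ("Company A", "Company D", "2023/1", "2,468"),
]


def parse_count(text):
    """Convert a count such as '2,548' to an integer."""
    return int(text.replace(",", ""))


def destination_shares(rows, prev_company):
    """Share of leavers from one base ('prev') company going to each new company."""
    totals = {}
    for prev, new, _year_month, count in rows:
        if prev == prev_company:
            totals[new] = totals.get(new, 0) + parse_count(count)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {new: n / grand_total for new, n in totals.items()}


shares = destination_shares(outflows, "Company A")
for new_company, share in shares.items():
    print(f"{new_company}: {share:.1%}")
```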

Company Transitions: NetFlows

Identify which companies are winning in the war for top AI talent. 

In its broadest configuration, this dataset gives the base company and an aggregated time series of yearly netflow counts. NetFlow is calculated as the difference between inflows and outflows each year. The base company in the netflows file is denoted by ‘prev’.

Previous Company  New Company  Year/Month  Count of NetFlow  
Company A  Company B  2023/1  10 
Company A  Company C  2023/1  12 
Company A  Company D  2023/1  14 

 
As the table above shows, the key metric is the Count of NetFlow by Year/Month for the base company (e.g. Company A). The table can be visualised as a graph, with Year/Month on the X-axis and the Count of NetFlow on the Y-axis.
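The calculation can be sketched as follows. This example assumes the sign convention of inflows minus outflows, so that a positive value means a company gained talent in the period; the source only states that NetFlow is the difference between the two. The input layout, the function name and the numbers are hypothetical.

```python
from collections import defaultdict


def net_flows(inflows, outflows):
    """Compute net flow per (company, period).

    Both arguments are iterables of (company, period, count).
    Assumed convention: net flow = inflows - outflows.
    """
    net = defaultdict(int)
    for company, period, count in inflows:
        net[(company, period)] += count
    for company, period, count in outflows:
        net[(company, period)] -= count
    return dict(net)


# Illustrative numbers only, not taken from Zeki data.
inflows = [("Company A", "2023", 52), ("Company A", "2024", 40)]
outflows = [("Company A", "2023", 38), ("Company A", "2024", 45)]

result = net_flows(inflows, outflows)
print(result)  # {('Company A', '2023'): 14, ('Company A', '2024'): -5}
```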

Datasets for Talent IQ Plus

This data focuses on individuals within a company who have a proven research track record, as identified and tracked by Zeki. These individuals are typically hired to drive innovation, leveraging their advanced skills in R&D.

Ways Data Can be Segmented

  • Top Areas of Expertise by Company 
  • Zeki Company Data 
  • Talent Attraction Indicators  
  • Zeki Intelligence Scores for Companies 
  • Role Category Counts 
  • Job Seniority Counts 
  • Professional Experience Level Counts 
  • Additional Scores

Top Areas of Expertise by Company 

Areas of expertise reflect the R&D background of an individual based on their research output over time. These areas showcase the specialised knowledge that professionals apply in their roles within a company. Additionally, they serve as an indicator of a company’s innovation focus, particularly in terms of hiring preferences.  

Each row in the dataset corresponds to a company, an area of expertise and a count, as shown in the table below. We identify more than 20 key areas of expertise derived from a comprehensive taxonomy of 4,500 areas of expertise created by Zeki. Each individual is assigned two areas of expertise, with the primary being the one with the highest count.

Company  Area of Expertise  Year/Month  Count Top R&D Talent 
Company B   Natural Language Processing  2010/1  92 
Company B   Deep Learning in Computer Vision  2011/1  26 
Company B   Human Action Recognition  2012/1  26 

This dataset closely follows the format of Top Areas of Expertise by Country as per the Talent Atlas Product. 
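To illustrate how an individual's areas of expertise might be ranked by count, here is a minimal sketch. It assumes the count refers to the number of research outputs tagged with each area, which the source does not specify. The function name and input layout are hypothetical, and this is not Zeki's actual assignment method.

```python
from collections import Counter


def top_two_areas(output_areas):
    """Return up to two areas of expertise, most frequent first.

    output_areas: one area label per research output (an assumed input layout).
    The first element of the result is treated as the primary area.
    """
    counts = Counter(output_areas)
    return [area for area, _count in counts.most_common(2)]


# Illustrative research history for one individual.
areas = [
    "Natural Language Processing",
    "Deep Learning in Computer Vision",
    "Natural Language Processing",
    "Human Action Recognition",
    "Deep Learning in Computer Vision",
    "Natural Language Processing",
]

print(top_two_areas(areas))
# ['Natural Language Processing', 'Deep Learning in Computer Vision']
```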

Zeki Company Data

These indicators evaluate whether a company is attracting talent from the most innovative companies or institutes globally with a proven track record of hiring R&D talent to push the boundaries of science and engineering innovation.

  1. Talent Attraction Indicators

    • magnificent_seven_count: count of individuals who have previously worked at a Magnificent Seven company
    • internship_magnificent_seven_count: count of individuals who have previously interned at a Magnificent Seven company
    • innovation_company_count: count of individuals who have previously worked for a top innovation company identified by Zeki
    • internship_innovation_company_count: as above, but for an intern position
    • top_research_institute_count: count of individuals who have previously worked at a top research institute identified by Zeki
    • interned_top_research_institute_count: as above, but for an intern position

  2. Zeki Intelligence Scores

    • frontier_innovation_2_yr_count: count of R&D talent at the company in the top 10 per cent of their field by impact of research published in the last two years
    • frontier_innovation_5_yr_count: count of R&D talent at the company in the top 10 per cent of their field by impact of research published in the last five years

  3. Role Category Counts

    • data_specialists_count
    • hardware_engineers_count
    • software_engineers_count
    • systems_engineers_count
    • research_specialists_count
    • ai_specialists_count

  4. Job Seniority Counts

    • leadership_count
    • senior_count
    • interns_count
    • students_count

  5. Professional Experience Level Counts

    • early_count
    • intermediate_count
    • advanced_count
    • expert_count
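Because the fields above are raw counts, comparing companies of different sizes usually means converting them to shares first. The sketch below does this for the role category counts, using the field names listed above. The record layout (a plain dictionary), the function name and the values are assumptions for illustration.

```python
# Field names as listed under Role Category Counts.
ROLE_FIELDS = [
    "data_specialists_count",
    "hardware_engineers_count",
    "software_engineers_count",
    "systems_engineers_count",
    "research_specialists_count",
    "ai_specialists_count",
]


def role_mix(record):
    """Return each role category as a share of the company's total across categories."""
    total = sum(record.get(field, 0) for field in ROLE_FIELDS)
    if total == 0:
        return {}
    return {field: record.get(field, 0) / total for field in ROLE_FIELDS}


# Illustrative values only, not taken from Zeki data.
company_record = {
    "data_specialists_count": 10,
    "hardware_engineers_count": 5,
    "software_engineers_count": 40,
    "systems_engineers_count": 15,
    "research_specialists_count": 10,
    "ai_specialists_count": 20,
}

mix = role_mix(company_record)
print(f"AI specialists: {mix['ai_specialists_count']:.0%}")  # AI specialists: 20%
```

The same approach applies to the job seniority and professional experience level counts.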

Supporting Datasets for All of Our Data

Country Mapping File

Zeki can provide insights on over 100 countries. This file provides additional variables per country and can be used to support Talent Atlas and Talent IQ datasets. 

  • country (categorical): 100+ countries are covered by this dataset 

  • country_code (categorical): two-letter country code (ISO 3166-1 alpha-2)

  • country_region (categorical): One of our geographical aggregation dimensions is region. We classify locations into 15 distinct geographical regions. 

We do not include China in our data. 

Company Mapping File

Zeki can provide insights on more than 30,000 companies globally. This file provides additional variables per company and can be used to support Talent Atlas and Talent IQ datasets.

  • zcid (categorical): unique Zeki company ID 

  • company (categorical): name of company 

  • company_hq_region (categorical): region of company's headquarters 

  • company_hq_country (categorical): country where the company's headquarters is located  

  • company_hq_country_code (categorical): country code of headquarters location  

  • company_hq_city (categorical): city of headquarters location

  • company_sector (categorical): company’s industry sector  

  • company_founded_year (numerical): year company was founded 

  • company_age (numerical): age of company in years 

  • company_employees (numerical): total number of employees at the company

  • company_employees_range (numerical): number of employees at company, grouped by range 

  • company_size_band (categorical): size of company 

  • exchange_name (categorical): stock exchange on which the company is listed

  • ticker (categorical): stock ticker symbol

  • magnificent_seven (categorical): indicates whether the company is one of the Magnificent Seven

  • top_innovation_company (categorical): companies identified by Zeki as Top Innovation Companies based on their innovation track record

  • top_research_institute (categorical): institutes identified by Zeki as the most prestigious research institutes
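A typical use of the mapping file is to attach company attributes to rows from another dataset. The sketch below assumes that the other dataset carries a zcid column to join on, which should be confirmed against the delivered files. The ID format, the sample values and the function name are hypothetical.

```python
# Hypothetical excerpt of the Company Mapping File.
company_mapping = [
    {"zcid": "Z001", "company": "Company A",
     "company_hq_country_code": "US", "company_sector": "Software"},
    {"zcid": "Z002", "company": "Company B",
     "company_hq_country_code": "GB", "company_sector": "Hardware"},
]

# Hypothetical dataset rows keyed by zcid (an assumed join column).
dataset_rows = [
    {"zcid": "Z001", "year_month": "2023/1", "count": 30},
    {"zcid": "Z999", "year_month": "2023/1", "count": 4},  # no mapping entry
]


def enrich(rows, mapping, fields):
    """Left-join mapping fields onto rows by zcid; unmatched rows get None."""
    by_id = {entry["zcid"]: entry for entry in mapping}
    enriched = []
    for row in rows:
        match = by_id.get(row["zcid"], {})
        enriched.append({**row, **{field: match.get(field) for field in fields}})
    return enriched


result = enrich(dataset_rows, company_mapping,
                ["company", "company_hq_country_code", "company_sector"])
```

Rows with no matching zcid are kept and filled with None, so gaps in the mapping are visible rather than silently dropped.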