socid_extractor Field Ontology

May 9, 2026 · View on GitHub

Standard field names for extraction schemes. When adding a new scheme, use existing names from this document. Create platform-specific fields (with a prefix) only when data does not fit any standard category.

Identity

FieldDescriptionAPI mapping examples
uidNumeric or string account ID on the platformid, user_id, pk, userid, steamid, did
usernameUnique login / handle (what appears in URL)login, screen_name, handle, slug, vanity
fullnameDisplay namename, displayName, display_name, full_name, screenname, personaname
idAlternative to uid (legacy, prefer uid)--

Rule: If the API returns name as a display name, map it to fullname. If name is a login/handle, map it to username.

Profile

FieldDescriptionAPI mapping examples
bioBiography / profile descriptiondescription, about, signature, status_message, tagline
imageAvatar / profile picture URLphoto, avatar, profile_pic_url, avatar_url, profileImageURL
image_bgBackground / banner image URLbanner, cover, bannerImageURL, cover_250_url
websitePersonal website URLurl, domain_url, external_url, blog_url
emailPublic email--
occupationJob title / professionjobTitle, role, work
companyEmployer / organizationcompany_name, organization, worksFor
interestsInterests / hobbies (free text)interest_names

Demographics

FieldDescriptionAPI mapping examples
genderGendersex
countryCountry (name)country_code (normalize to name)
country_codeCountry (ISO 3166-1 alpha-2 code)countryCode, country_code
cityCity--
locationLocation (free text, city+country)address
birthdayDate of birthbirth_date, dateOfBirth

Dates

FieldDescriptionAPI mapping examples
created_atRegistration / account creation datejoinedAt, registration, registration_date, history.joined, joined (unix), created_time (unix), createdAt
updated_atLast profile update datemodifyDate, updatedAt
latest_activity_atLast visit / activity / online timelast_online (unix), last_seen_at, active

Counters

FieldDescriptionAPI mapping examples
follower_countFollowers / subscribersfollowers, followers_count, followersCount, subscribers_count, numFollowers
following_countFollowing / subscriptionsfollowing, friends_count, followingCount, numFollowing
posts_countNumber of posts / publicationspostsCount, media_count, buzz_count, numPosts
last_posts_countNumber of recent posts visible on the first page (not the total)--
comments_countNumber of commentsnumPosts (Disqus counts comments as posts)
likes_countReceived likesnumLikesReceived, likes_count, heartCount
views_countViewsall_views_count, profile_views, profile_hits
videos_countNumber of videosvideos_total, nb_videos
photos_countNumber of photosphotoCount, photos_count
friends_countFriends (mutual relationship)totalFriends

Rule: Always use singular noun + _count: follower_count (not followers_count), video_count / videos_count.

Boolean flags

FieldDescriptionAPI mapping examples
is_verifiedVerified accountverified, isVerified, has_verified_badge
is_privatePrivate profileisPrivate, text_post_app_is_private
is_bannedBanned accountisBanned
is_deletedDeleted account--
is_suspendedSuspended accountsuspended
is_businessBusiness account--
is_employeePlatform employeeisEmployee, scratchteam

Rule: Boolean fields start with is_ and are stored as strings 'True'/'False'. Never use bare verified — always is_verified.

Common mistakes to avoid:

  • verifiedis_verified
  • joinedcreated_at
  • last_onlinelatest_activity_at
FieldDescriptionAPI mapping examples
linksList of external linkssocialLinks, websites
social_linksLinked social accounts (structured)--
twitter_usernameTwitter/X handletwitterUsername, twitter_screen_name
facebook_uidNumeric Facebook IDAlso extracted from graph.facebook.com/{id}/picture in avatar URLs
facebook_usernameFacebook username--
instagram_usernameInstagram handle--
telegram_usernameTelegram handle--
vk_usernameVK handle--
github_usernameGitHub username--
linkedin_usernameLinkedIn handlelinkedin_url (extract last path segment)

Platform-specific fields

Fields with unique meaning that have no standard equivalent. Use a platform prefix only for such fields:

# Correct -- unique platform-specific data:
youtube_channel_id    # UC... format, unique to YouTube
tiktok_id             # differs from username and uid
sec_uid               # TikTok-specific secondary UID
steam_id              # Steam64 ID
gaia_id               # Google Account ID
karma                 # Reddit/Habr/Lesswrong
heart_count           # TikTok
streak                # Duolingo

# Wrong -- should use standard names:
xvideos_user_id       # -> uid
xvideos_username      # -> username
chess_user_id         # -> uid
twitchtracker_username # -> username

API response mapping examples

JSON API (Bluesky)

'fields': {
    'uid': lambda x: x.get('did'),          # did:plc:... -> uid
    'username': lambda x: x.get('handle'),   # jay.bsky.team -> username
    'fullname': lambda x: x.get('displayName'),
    'bio': lambda x: x.get('description'),
    'image': lambda x: x.get('avatar'),
    'follower_count': lambda x: x.get('followersCount'),
    'following_count': lambda x: x.get('followsCount'),
    'posts_count': lambda x: x.get('postsCount'),
    'created_at': lambda x: x.get('createdAt'),
}

HTML regex (XVideos)

'regex': r'"id_user":(?P<uid>\d+),"username":"(?P<username>[^"]+)","display":"(?P<fullname>[^"]*)"'
         r'[\s\S]*?"sex":"(?P<gender>[^"]*)"'
         r'[\s\S]*?Country:</strong>\s*<span>(?P<country>[^<]*)</span>'
         r'[\s\S]*?Subscribers:</strong>\s*<span>(?P<follower_count>[^<]*)</span>'
         r'[\s\S]*?Signed up:</strong>\s*<span>(?P<created_at>[^(<]*)'

og:meta (Habr)

# og:title = "Name aka username" -> fullname + username
'regex': r'og:title" content="(?P<fullname>.+?) aka (?P<username>\w+)'

When to create a new field

  1. Check this document -- is there a standard equivalent?
  2. Does the data fit a standard field? -> Use it with mapping in the lambda.
  3. Is the data unique to the platform with no equivalent? -> Create with platform prefix.
  4. Is the data useful across multiple platforms but not listed here? -> Add it to this document as a new standard field.