John Snow Labs

Generative AI & Healthcare Infrastructure

John Snow Labs is a healthcare AI company providing an NLP platform and pre-trained models for healthcare and life science organizations.

About John Snow Labs

John Snow Labs is an award-winning healthcare AI company providing state-of-the-art software, language models, and data to help healthcare organizations rapidly adopt AI.

Leadership Team

Key executives and founders sourced from public LinkedIn profiles.

Veysel Kocaman, PhD

CTO

Healthcare AI leader, researcher, and technology executive operating at the intersection…

Kateryna Lee

Head of Marketing

I help B2B SaaS, HealthTech, and Life Sciences companies turn marketing into a predictable growth engine. As a Fractional CMO and founder of K Digital, I partner with CEOs and marketing leaders to architect multi-channel demand strategies that accelerate revenue, scale teams, and strengthen brand authority in complex, regulated markets.

Over the past decade, I’ve led global marketing organizations and built high-performing teams that delivered measurable impact:

  • +190% inbound MQL growth (2022), +148% (2023), and +33% YTD (2024) through integrated demand-generation frameworks.
  • +188% increase in strategic pipeline using ABM and account-orchestrated engagement programs.
  • +13-point improvement in MQL-to-SQL conversion by aligning marketing, sales, and product around shared metrics.

My experience spans SaaS, enterprise software, and IT across North America, Europe, and APAC, from scaling early-stage startups to driving growth for industry leaders in FinTech, HealthTech, EdTech, HR Tech, and MarTech.

At K Digital, my team and I operate as an extension of your leadership bench, combining CMO-level strategy with agency-grade execution to drive qualified pipeline, efficient spend, and brand visibility that converts.

Certified across all major HubSpot disciplines (Inbound, ABM, Content, Email, Social, Contextual, Growth-Driven Design, and Marketing Software). 100% project success rate. 4.9 / 5 average client rating. Learn more at kdigital.io

Mission & Approach

AI Approach

LLM, NLP, computer vision (Visual NLP), information extraction, clinical reasoning, active learning, human-in-the-loop annotation

Products & Solutions

JSL Vision-30B

model

30B flagship VLM for OCR and document understanding

30B flagship vision-language model for OCR and document understanding. Scores 0.689 on a schema-constrained JSON extraction benchmark. Provides guaranteed-valid JSON outputs via schema-aware decoding, with on-premise deployment for regulated workflows.

Deploy: on premise
Pricing: usage based

Pacific AI

platform

Healthcare AI Governance, Validation & Monitoring

Enterprise-grade AI governance, compliance, and risk management platform. Ensures healthcare AI deployments meet regulatory and audit requirements at scale with global policy coverage, automated risk governance, and continuous model monitoring.

Deploy: on premise
Pricing: enterprise

Terminology Server

software

Semantic mapping of medical phrases to standard or custom code systems, concept maps, and value sets

Semantic mapping solution for medical terminology that maps clinical phrases to standard or custom code systems, concept maps, and value sets. Enables standardization and interoperability of clinical data.

Deploy: on premise
Pricing: enterprise
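To make the idea of mapping a clinical phrase to several code systems concrete, here is a minimal sketch. It is not the Terminology Server API: the `CONCEPT_MAP` table, the `normalize` and `map_phrase` helpers, and every code value are hypothetical placeholders chosen only for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical concept map: normalized phrase -> {code system: code}.
# The code values are placeholders, not real RxNorm or NDC identifiers.
CONCEPT_MAP: Dict[str, Dict[str, str]] = {
    "metformin 500 mg": {
        "RxNorm": "RXNORM-PLACEHOLDER",
        "NDC": "NDC-PLACEHOLDER",
    },
}


def normalize(phrase: str) -> str:
    """Lowercase, collapse whitespace, and split a dose like '500mg' into '500 mg'."""
    text = re.sub(r"\s+", " ", phrase.strip().lower())
    return re.sub(r"(\d)(mg|ml|mcg|g)\b", r"\1 \2", text)


def map_phrase(phrase: str, system: str) -> Optional[str]:
    """Return the code for a phrase in one code system, or None if unmapped."""
    return CONCEPT_MAP.get(normalize(phrase), {}).get(system)


print(map_phrase("Metformin  500mg", "RxNorm"))
print(map_phrase("Metformin 500mg", "HCPCS"))
```

The same extracted phrase resolves to a different code per target system, and an unmapped system returns `None` instead of a guess. A production service would presumably use semantic matching, not an exact lookup on a normalized string.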

Martlet AI

software

Risk Adjustment and HCC Coding

AI-powered Hierarchical Condition Category coding that automates risk adjustment workflows with clinical-grade accuracy for health plans and providers. Features clinician-friendly gap capture, high-precision chart review, and mock audits with defensible packets. Independently validated and peer-reviewed.

Deploy: on premise
Pricing: enterprise

JSL Vision Structured-8B

model

8B model for structured document parsing and schema-constrained JSON extraction

An 8B vision-language model built for structured document parsing and schema-constrained JSON extraction. Runs on a single A10G (24 GB VRAM). Ranks #1 among on-premise models for JSON schema extraction with 0.730 accuracy. Ships with xgrammar structured generation for guaranteed-valid JSON outputs.

Deploy: on premise
Pricing: usage based
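As an illustration of what "valid JSON that conforms to a schema" means in practice, the sketch below checks a parsed output against a small subset of JSON Schema (`type`, `required`, `properties`). It is a post-hoc check written for this example, not xgrammar and not the vendor's decoding method. The `violations` helper, the `SCHEMA`, and the sample outputs are all hypothetical.

```python
import json
from typing import Any, Dict, List

# Maps a small subset of JSON Schema type names to Python types.
_TYPES = {"string": str, "number": (int, float), "integer": int,
          "boolean": bool, "object": dict, "array": list}


def violations(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return human-readable schema violations (empty list if conforming)."""
    errors: List[str] = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and expected in ("number", "integer")
        if not isinstance(value, _TYPES[expected]) or bool_as_number:
            return [f"{path}: expected {expected}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(violations(value[key], sub_schema, f"{path}.{key}"))
    return errors


# Hypothetical schema and two hypothetical model outputs.
SCHEMA = {
    "type": "object",
    "required": ["patient_name", "total"],
    "properties": {"patient_name": {"type": "string"}, "total": {"type": "number"}},
}
good = json.loads('{"patient_name": "Jane Doe", "total": 12.5}')
bad = json.loads('{"patient_name": "Jane Doe", "total": "12.5"}')

print(violations(good, SCHEMA))
print(violations(bad, SCHEMA))
```

Constrained decoding aims to make the second case impossible at generation time. A checker like this only detects the problem after the fact, which is why the two approaches are often used together.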

De-identification Solution

software

Anonymize and obfuscate tabular, free text, FHIR, PDF, DICOM, and SVS files with regulatory-grade accuracy

The most accurate clinical de-identification solution available. Anonymizes multimodal data including free text, FHIR, EHR, PDF, DICOM, and SVS files with regulatory-grade accuracy. Supports compliance requirements for protected health information.

Deploy: on premise
Pricing: enterprise
EHR: FHIR

Data Curation

software

Automate and scale the creation of accurate patient registries, cohorts, quality measures, and analytics

Automates patient registries, cohorts, and quality measures from clinical documents. Turns unstructured EHR text into structured, queryable data. Enables care gap identification and quality measure reporting.

Deploy: on premise
Pricing: enterprise

Generative AI Lab

Flagship platform

Implement human-in-the-loop workflows for regulatory-grade AI without coding

No-code platform for human-in-the-loop annotation and validation. Build regulatory-grade AI pipelines without writing code. Features active learning capabilities and enables teams to create compliant AI workflows for healthcare applications.

Deploy: on premise
Pricing: enterprise

Patient Journey Intelligence

Flagship platform

Automatically integrate multimodal, longitudinal, and messy clinical data into a unified OMOP data model

Secondary use platform that integrates multimodal, longitudinal clinical data into a unified, living OMOP data model. Delivers a complete, structured view of every patient's care pathway for real-world evidence generation.

Deploy: on premise
Pricing: enterprise

Healthcare NLP

Flagship model

3,000+ small language models for de-identification and data curation from clinical & biomedical text

Comprehensive library of 3,000+ pre-trained small language models for de-identification, named entity recognition (NER), assertion status detection, and relation extraction. Fast and deployable on commodity hardware. Proven at a scale of 2 billion patient notes.

Deploy: on premise
Pricing: enterprise

Medical LLM

Flagship model

Generative AI models excelling in clinical text summarization, information extraction, and question answering

Purpose-built medical large language models for scalable healthcare tasks including information extraction, clinical summarization, reasoning, and Q&A. Ranks #1 on 12 healthcare benchmarks vs. GPT-5.4, Gemini-3.1, and Claude-Opus-4.6. Designed for on-premise, HIPAA-compliant deployment.

Deploy: on premise
Pricing: enterprise

Customer Success Stories

Databricks Healthcare & Life Sciences customers

other

Delivering domain-specific Large Language Models and Natural Language Processing technologies for healthcare and life sciences applications

Read case study →

Pricing

Pro

Free Academic

free

Free for academic research

  • Open-source libraries
  • AI models
  • Software for academia

Hiring Activity

View Open Roles →

Product & Market

Product Stage

enterprise

Primary Market

health systems

Company Details

Founded: 2015
Headquarters: Lewes, DE, USA
Employees: 75
Stage: seed
Profile last updated: June 22, 2026

Social & Media

Latest Updates

John Snow Labs

Medical Terminologies in Generative AI Lab: From Entity to Standard Code

Clinical NLP extracts meaning from unstructured text. But in healthcare, extracted meaning isn’t useful until it speaks the same language as the systems that need to act on it. An NLP model identifies “metformin 500mg” in a discharge summary. A medication reconciliation system needs NDC codes. An insurance claim needs HCPCS. A research database needs RxNorm. The entity is correct. But without standardization, it’s unusable by the downstream systems that depend on coded clinical data. The followi

Read more →

Plain-Text OCR Benchmark: The Gap Between On-Prem and API Is Gone

In our first benchmark, we showed that JSL Vision OCR is the #1 grounded OCR model overall, beating every closed-source frontier system on the FUNSD dataset. This post answers a different question: plain-text OCR. Many enterprise workflows don’t need bounding boxes. Search indexing, clinical summarization, RAG pipelines, compliance audits, downstream LLM reasoning, all they need is clean, human-readable text in natural reading order. So how does JSL Vision (jsl_vision), our 30B-class flagshi

Read more →

John Snow Labs detects 54% more clinical PHI than OpenAI’s Privacy Filter, at 5.8× the speed on CPU

We benchmarked OpenAI Privacy Filter against a John Snow Labs de-identification pipeline on 381,959 tokens of real clinical text. The John Snow Labs pipeline reached 0.95 F1 on PHI detection vs. 0.55 for OpenAI Privacy Filter, with 0.98 recall vs. 0.64. It ran 5.8× faster on CPU. The label mapping was deliberately conservative: ambiguous clinical labels were not forced into OpenAI’s taxonomy. OpenAI recently released openai/privacy-filter, a permissively-licensed token-classification model for

Read more →
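The teaser quotes F1 and recall but not precision. As a worked check, precision can be backed out of the F1 definition, F1 = 2PR / (P + R). This sketch assumes both figures come from the same averaging scheme over the same label set, which the excerpt does not confirm, and the inputs are the rounded values quoted above. The `implied_precision` helper is a name introduced here for illustration.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, given F1 and recall R."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall values are inconsistent")
    return f1 * recall / denominator


# Rounded figures quoted in the teaser.
jsl_precision = implied_precision(f1=0.95, recall=0.98)
openai_precision = implied_precision(f1=0.55, recall=0.64)

# Relative recall advantage, i.e. how much more PHI is detected.
relative_recall_gain = 0.98 / 0.64 - 1

print(f"Implied precision (John Snow Labs): {jsl_precision:.3f}")
print(f"Implied precision (OpenAI Privacy Filter): {openai_precision:.3f}")
print(f"Relative recall gain: {relative_recall_gain:.1%}")
```

With these rounded inputs the relative recall gain comes out near 53%, close to the headline's 54%. The small difference is what one would expect if the headline was computed from unrounded counts.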

Schema-Constrained OCR: The Benchmark That Actually Matters in Production

In the first two posts of this series, we benchmarked OCR on two increasingly demanding tasks:

  • Grounded (BBox) OCR, reading text AND returning its coordinates
  • Image → Markdown OCR, plain-text extraction for RAG / search / summarization

This third post tackles the task that actually gets shipped to production in healthcare, insurance, and regulated workflows: schema-constrained JSON extraction. You give the model an image and a JSON Schema. You expect back valid JSON that conforms to that sch

Read more →

A 2026 Field Guide to Visual Document Processing

If you’ve shopped for an OCR model recently, you already know the problem: every vendor claims state-of-the-art accuracy, every benchmark uses a different dataset, and “VLMs can do OCR” is technically true of a dozen models. As a buying criterion, it’s nearly useless. We built JSL Vision and wanted to know exactly where it stands against the full field. This series documents that comparison: 20+ open-source and closed-source models, three production tasks, one consistent methodology. The short

Read more →

Data Accuracy Notice: Company information on HealthAI Central is compiled from public sources and updated regularly. While we strive for accuracy, details such as employee counts and product offerings may change. We recommend verifying critical information directly with the company before making business decisions.