
John Snow Labs
Generative AI & Healthcare Infrastructure
John Snow Labs is a healthcare AI company providing an NLP platform and pre-trained models for healthcare and life science organizations.
About John Snow Labs
John Snow Labs is an award-winning healthcare AI company providing state-of-the-art software, language models, and data to help healthcare organizations rapidly adopt AI.
Leadership Team
Key executives and founders sourced from public LinkedIn profiles.
I help B2B SaaS, HealthTech, and Life Sciences companies turn marketing into a predictable growth engine. As a Fractional CMO and founder of K Digital, I partner with CEOs and marketing leaders to architect multi-channel demand strategies that accelerate revenue, scale teams, and strengthen brand authority in complex, regulated markets.
Over the past decade, I’ve led global marketing organizations and built high-performing teams that delivered measurable impact:
- +190% inbound MQL growth (2022), +148% (2023), and +33% YTD (2024) through integrated demand-generation frameworks.
- +188% increase in strategic pipeline using ABM and account-orchestrated engagement programs.
- +13-point improvement in MQL-to-SQL conversion by aligning marketing, sales, and product around shared metrics.
My experience spans SaaS, enterprise software, and IT across North America, Europe, and APAC, from scaling early-stage startups to driving growth for industry leaders in FinTech, HealthTech, EdTech, HR Tech, and MarTech.
At K Digital, my team and I operate as an extension of your leadership bench, combining CMO-level strategy with agency-grade execution to drive qualified pipeline, efficient spend, and brand visibility that converts.
Certified across all major HubSpot disciplines (Inbound, ABM, Content, Email, Social, Contextual, Growth-Driven Design, and Marketing Software). 100% project success rate. 4.9 / 5 average client rating. Learn more at kdigital.io
Mission & Approach
AI Approach
LLM, NLP, computer vision (Visual NLP), information extraction, clinical reasoning, active learning, human-in-the-loop annotation
Products & Solutions
JSL Vision-30B
Model: 30B flagship VLM for OCR and document understanding
30B flagship vision-language model for OCR and document understanding. Scores 0.689 on schema-constrained JSON extraction benchmark. Provides guaranteed-valid JSON outputs via schema-aware decoding with on-premise deployment for regulated workflows.
Pacific AI
Platform: Healthcare AI Governance, Validation & Monitoring
Enterprise-grade AI governance, compliance, and risk management platform. Ensures healthcare AI deployments meet regulatory and audit requirements at scale with global policy coverage, automated risk governance, and continuous model monitoring.
Terminology Server
Software: Semantic mapping of medical phrases to standard or custom code systems, concept maps, and value sets
Semantic mapping solution for medical terminology that maps clinical phrases to standard or custom code systems, concept maps, and value sets. Enables standardization and interoperability of clinical data.
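As a rough illustration of what mapping a clinical phrase to a code system involves, the sketch below normalizes a phrase and looks it up in a small concept map. It is not the Terminology Server API: `CONCEPT_MAP`, `normalize`, `map_phrase`, the code-system names, and every code value are hypothetical placeholders chosen for this example.

```python
import re

# Hypothetical concept map. The system names and codes are placeholders,
# not real RxNorm or NDC values.
CONCEPT_MAP = {
    "metformin": {"EXAMPLE_RXNORM": "RX-0001", "EXAMPLE_NDC": "NDC-0001"},
    "lisinopril": {"EXAMPLE_RXNORM": "RX-0002", "EXAMPLE_NDC": "NDC-0002"},
}


def normalize(phrase: str) -> str:
    """Lowercase, drop a trailing-style dose such as '500mg', collapse whitespace."""
    text = phrase.lower()
    text = re.sub(r"\b\d+(\.\d+)?\s*(mg|mcg|g|ml)\b", "", text)
    return " ".join(text.split())


def map_phrase(phrase: str, code_system: str):
    """Return the placeholder code for a phrase, or None if it is unmapped."""
    entry = CONCEPT_MAP.get(normalize(phrase))
    return entry.get(code_system) if entry else None


print(map_phrase("Metformin 500mg", "EXAMPLE_RXNORM"))
print(map_phrase("unknown drug", "EXAMPLE_RXNORM"))
```

An exact-match dictionary is the simplest possible stand-in; a production terminology service would typically add synonym handling and semantic matching on top of this kind of lookup.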
Martlet AI
Software: Risk Adjustment and HCC Coding
AI-powered Hierarchical Condition Category coding that automates risk adjustment workflows with clinical-grade accuracy for health plans and providers. Features clinician-friendly gap capture, high-precision chart review, and mock audits with defensible packets. Independently validated and peer-reviewed.
JSL Vision Structured-8B
Model: 8B model for structured document parsing and schema-constrained JSON extraction
An 8B vision-language model built for structured document parsing and schema-constrained JSON extraction. Runs on a single A10G (24 GB VRAM). Ranks #1 among on-premise models for JSON schema extraction with 0.730 accuracy. Ships with xgrammar structured generation for guaranteed-valid JSON outputs.
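To make "schema-constrained" concrete, here is a minimal sketch of what it means for a model's output to conform to a schema. It checks conformance after the fact, whereas structured generation enforces it during decoding, so this is not how xgrammar or JSL Vision works internally. The `SCHEMA` fields and the `conforms` helper are assumptions made up for this example, and only a tiny subset of JSON Schema (required keys and basic types) is handled.

```python
import json

# Hypothetical schema for illustration; not taken from any John Snow Labs product.
SCHEMA = {
    "type": "object",
    "required": ["patient_name", "total_due"],
    "properties": {
        "patient_name": {"type": "string"},
        "total_due": {"type": "number"},
    },
}

# Mapping from the schema type names used above to Python types.
_TYPES = {"string": str, "number": (int, float), "object": dict}


def conforms(raw: str, schema: dict) -> bool:
    """Return True if raw parses as a JSON object matching the schema subset."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict):
        return False
    if any(key not in doc for key in schema["required"]):
        return False
    for key, rule in schema["properties"].items():
        if key in doc:
            value = doc[key]
            # bool is a subclass of int in Python; this subset has no boolean
            # type, so booleans are rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, _TYPES[rule["type"]]):
                return False
    return True


print(conforms('{"patient_name": "A. Example", "total_due": 12.5}', SCHEMA))
print(conforms('{"patient_name": "A. Example"}', SCHEMA))
```

With unconstrained generation, a check like this sits downstream and triggers retries on failure; the appeal of schema-aware decoding is that outputs failing such a check are not produced in the first place.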
De-identification Solution
Software: Anonymize and obfuscate tabular, free text, FHIR, PDF, DICOM, and SVS files with regulatory-grade accuracy
The most accurate clinical de-identification solution available. Anonymizes multimodal data including free text, FHIR, EHR, PDF, DICOM, and SVS files with regulatory-grade accuracy. Supports compliance requirements for protected health information.
Data Curation
Software: Automate and scale the creation of accurate patient registries, cohorts, quality measures, and analytics
Automates patient registries, cohorts, and quality measures from clinical documents. Turns unstructured EHR text into structured, queryable data. Enables care gap identification and quality measure reporting.
Generative AI Lab
Flagship platform: Implement human-in-the-loop workflows for regulatory-grade AI without coding
No-code platform for human-in-the-loop annotation and validation. Build regulatory-grade AI pipelines without writing code. Features active learning capabilities and enables teams to create compliant AI workflows for healthcare applications.
Patient Journey Intelligence
Flagship platform: Automatically integrate multimodal, longitudinal, and messy clinical data into a unified OMOP data model
Secondary use platform that integrates multimodal, longitudinal clinical data into a unified, living OMOP data model. Delivers a complete, structured view of every patient's care pathway for real-world evidence generation.
Healthcare NLP
Flagship model: 3,000+ small language models for de-identification and data curation from clinical & biomedical text
Comprehensive library of 3,000+ pre-trained small language models for de-identification, named entity recognition (NER), assertion status detection, and relation extraction. Fast and deployable on commodity hardware. Proven at 2 billion patient notes scale.
Medical LLM
Flagship model: Generative AI models excelling in clinical text summarization, information extraction, and question answering
Purpose-built medical large language models for scalable healthcare tasks including information extraction, clinical summarization, reasoning, and Q&A. Ranks #1 on 12 healthcare benchmarks vs. GPT-5.4, Gemini-3.1, and Claude-Opus-4.6. Designed for on-premise, HIPAA-compliant deployment.
Customer Success Stories
Databricks Healthcare & Life Sciences customers
other
Delivering domain-specific Large Language Models and Natural Language Processing technologies for healthcare and life sciences applications
Pricing
Pro
Free Academic
free
Free for academic research
- Open-source libraries
- AI models
- Software for academia
Product & Market
- Product Stage: enterprise
- Primary Market: health_systems
Company Details
- Founded: 2015
- Headquarters: Lewes, DE, USA
- Employees: 75
- Stage: seed
- Profile last updated: June 22, 2026
Latest Updates

Medical Terminologies in Generative AI Lab: From Entity to Standard Code
Clinical NLP extracts meaning from unstructured text. But in healthcare, extracted meaning isn’t useful until it speaks the same language as the systems that need to act on it. An NLP model identifies “metformin 500mg” in a discharge summary. A medication reconciliation system needs NDC codes. An insurance claim needs HCPCS. A research database needs RxNorm. The entity is correct. But without standardization, it’s unusable by the downstream systems that depend on coded clinical data. The followi

Plain-Text OCR Benchmark: The Gap Between On-Prem and API Is Gone
In our first benchmark, we showed that JSL Vision OCR is the #1 grounded OCR model overall, beating every closed-source frontier system on the FUNSD dataset. This post answers a different question: plain-text OCR. Many enterprise workflows don’t need bounding boxes. Search indexing, clinical summarization, RAG pipelines, compliance audits, downstream LLM reasoning: all they need is clean, human-readable text in natural reading order. So how does JSL Vision (jsl_vision), our 30B-class flagshi

John Snow Labs detects 54% more clinical PHI than OpenAI’s Privacy Filter, at 5.8× the speed on CPU
We benchmarked OpenAI Privacy Filter against a John Snow Labs de-identification pipeline on 381,959 tokens of real clinical text. The John Snow Labs pipeline reached 0.95 F1 on PHI detection vs. 0.55 for OpenAI Privacy Filter, with 0.98 recall vs. 0.64. It ran 5.8× faster on CPU. The label mapping was deliberately conservative: ambiguous clinical labels were not forced into OpenAI’s taxonomy. OpenAI recently released openai/privacy-filter, a permissively licensed token-classification model for
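The excerpt quotes F1 and recall but not precision. Since F1 is the harmonic mean of precision and recall, the implied precision can be recovered with a little arithmetic. The sketch below does this from the rounded figures above, so the results are approximations and not numbers reported by the benchmark; the helper names are made up for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P."""
    return f1 * recall / (2 * recall - f1)


# Rounded figures quoted in the excerpt (F1, recall) for each system.
jsl_precision = implied_precision(0.95, 0.98)
openai_precision = implied_precision(0.55, 0.64)

print(f"John Snow Labs implied precision: {jsl_precision:.2f}")
print(f"OpenAI Privacy Filter implied precision: {openai_precision:.2f}")
```

Under these assumptions the implied precision is roughly 0.92 for the John Snow Labs pipeline and roughly 0.48 for OpenAI Privacy Filter, which suggests the F1 gap comes from both missed PHI and false positives.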

Schema-Constrained OCR: The Benchmark That Actually Matters in Production
In the first two posts of this series, we benchmarked OCR on two increasingly demanding tasks:
- Grounded (BBox) OCR, reading text AND returning its coordinates
- Image → Markdown OCR, plain-text extraction for RAG / search / summarization
This third post tackles the task that actually gets shipped to production in healthcare, insurance, and regulated workflows: schema-constrained JSON extraction. You give the model an image and a JSON Schema. You expect back valid JSON that conforms to that sch

A 2026 Field Guide to Visual Document Processing
If you’ve shopped for an OCR model recently, you already know the problem: every vendor claims state-of-the-art accuracy, every benchmark uses a different dataset, and “VLMs can do OCR” is technically true of a dozen models. As a buying criterion, it’s nearly useless. We built JSL Vision and wanted to know exactly where it stands against the full field. This series documents that comparison: 20+ open-source and closed-source models, three production tasks, one consistent methodology. The short
Data Accuracy Notice: Company information on HealthAI Central is compiled from public sources and updated regularly. While we strive for accuracy, details such as employee counts and product offerings may change. We recommend verifying critical information directly with the company before making business decisions.
