Writings and Perambulations

Code Samples: https://gist.github.com/mutaku/81a6cf00717462686734d5ef9454a2e9

My expertise is in building and leading teams of Data Scientists, Machine Learning Engineers, Data Engineers, and Data Product experts of diverse experience levels, distributed across cultural and knowledge bases, to drive innovation, product development, and cutting edge, actionable research. I have a proven track record of mentoring, growing, executing, and building strategy around ambitious goals across the Data Science technical platform to provide business value.

My current research focuses on modernizing predictive pipelines and modeling capabilities using knowledge graphs, agentic AI, and analytical AI, with a strong emphasis on clinical implementation. I specialize in multimodal models, ranging from naive builds to fine-tuning foundation models and GraphRAG, aiming to provide clinicians with predictive assessment data, therapeutic planning strategies, and risk identification to improve patient outcomes and reduce fatigue. I am also developing an algorithm and model platform to identify and quantify cognitive complexity in patient interactions and clinicians’ daily workloads.

My collective experience in advanced AI, knowledge graph development, and multimodal modeling is strategically directed towards pioneering the creation of comprehensive patient digital twins for transformative impact in both clinical practice and research.

My core research endeavors are dedicated to revolutionizing predictive healthcare through the strategic integration of cutting-edge AI methodologies. Specifically, I focus on modernizing predictive pipelines and enhancing modeling capabilities by leveraging the synergistic power of knowledge graphs, agentic AI, and analytical AI. A paramount aspect of this work is its strong emphasis on clinical implementation, ensuring that theoretical advancements translate into tangible improvements in patient care.

My specialization lies in the domain of multimodal models. This encompasses a broad spectrum of approaches, ranging from the development of foundational, “naive” builds to the sophisticated fine-tuning of pre-trained foundation models. A particular area of expertise is GraphRAG, which combines the strengths of knowledge graphs with retrieval-augmented generation to enhance the accuracy and interpretability of predictive insights. The overarching goal of these efforts is to equip clinicians with robust predictive assessment data, actionable therapeutic planning strategies, and early risk identification capabilities. By doing so, we aim to significantly improve patient outcomes, optimize care pathways, and reduce the burden of clinician fatigue, thereby fostering a more efficient and effective healthcare system.

Beyond predictive modeling, I am also actively engaged in the development of an innovative algorithm and model platform designed to objectively identify and quantify cognitive complexity within patient interactions and the daily workloads of clinicians. This initiative seeks to provide a granular understanding of the demands placed on healthcare providers, paving the way for optimized workflows and improved clinician well-being.

Collectively, my extensive experience in advanced AI, sophisticated knowledge graph development, and nuanced multimodal modeling is strategically directed towards a pioneering vision: the creation of comprehensive patient digital twins. These digital twins are envisioned as dynamic, continuously evolving representations of individual patients, amalgamating diverse data streams to provide a holistic and real-time understanding of their health status. The ultimate aim of this transformative endeavor is to profoundly impact both clinical practice and medical research, ushering in an era of personalized, predictive, and proactive healthcare.

My background is in Quantitative Cell Biology and I have a PhD in Biochemistry and Molecular Biology. I spent much of my career researching novel cancer therapeutics through live cell imaging and machine learning. You can now find me developing and implementing machine learning platforms and algorithms for everything from optimization problems to personalized medicine and medicinal AI. I have researched and built novel recommenders, 24/7 production machine learning solutions, and received patents for personalization solutions in the consumer sales space and chemical profile recommenders.

Most recently, I have been working in the Biotechnology space where I have developed the long-term strategy and vision for Data Science, led a growing team of Data Scientists to develop novel algorithms in the predictive modeling and Bioinformatics spaces, and accelerated biological discovery. Moreover, I designed and implemented novel algorithms to detect differential modes of action in both natural products and whole microbe systems, and engineered a proprietary online learning platform to serve as an artificial intelligence decision guidance system for our extensive microbial collection and proprietary product discovery platform (Genesis), a digital twin. I rebuilt global research and enablement programs across domains for knowledge generation and product creation and delivery using cutting edge deep learning techniques on unstructured records, genomics, omics, and image data as the global lead for digital science discovery and delivery programs. Established, built, and implemented data governance and provenance at both the discovery and enterprise levels. Built multimodal knowledge graphs using LLM guidance to develop GraphRAG technologies in-house for production use in the global pipeline.

I am now pursuing my passion work in improving patient outcomes utilizing cutting edge, clinical AI. The company is in early inception phases, but we are already building out the next generation of clinical AI technology. I would love to discuss with you the exciting work we are doing, so please reach out!

Currently

Mayo Clinic (Rochester, MN) - Multiple roles spanning teams

My additional research focus is on modernizing predictive pipelines and modeling capabilities through the use of knowledge graphs, agentic AI, and analytical AI, with an emphasis on clinical implementation. I work primarily with multimodal models and solutions, from naive builds to fine-tuning foundation models and GraphRAG. My goal is to provide clinicians with predictive assessment data, therapeutic planning strategies, and risk identification and scoring to lower clinician fatigue and improve patient outcomes. As part of this work, I am currently developing an algorithm and model platform to identify cognitive complexity for each patient touchpoint and its summation into the clinician’s daily load. Please reach out to chat about my research, or any of the work below.

Owner/Creator and Team Lead - Knowledge Factory AI Platform - Rochester, MN

As leader of a highly skilled team, I’m developing a cutting-edge, robust data platform. Engineered for multi-modal data, it integrates seamlessly with diverse databases and data lakes, facilitating real-time ingestion into a sophisticated knowledge graph. Our core mission is to transform complex, disparate data into actionable insights, driving advancements across various domains.

We have achieved significant breakthroughs in developing proprietary algorithms and sophisticated workflows that adeptly manage multi-modal data, extracting value from both structured and unstructured sources. A cornerstone of our work lies in the creation, structuring, and iterative refinement of knowledge graphs, which are directly informed and enhanced by the outputs of our advanced AI-driven data processing. Our focus on optimizing data access and retrieval is crucial for both intricate agentic AI workflows, where autonomous systems require immediate access to information, and for advanced graph machine learning applications, including the development of highly accurate predictive models, sophisticated forecasting capabilities, and clinical digital twins in research and practice.

Our team has successfully engineered and deployed intelligent agents and powerful toolchains to streamline data retrieval for complex question-answering flows. We’ve also developed methodologies for neighborhood and shortest path summarization, invaluable for comprehensive documentation and establishing a single source of truth (SSOT).
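As a rough illustration only, and not the platform's actual implementation, the sketch below shows what neighborhood and shortest path extraction over a small knowledge graph could look like before the results are handed to a summarization step. The toy graph, its node names, and the function names are all hypothetical, and plain breadth-first search stands in for whatever graph engine is used in practice.

```python
from collections import deque

# Hypothetical toy knowledge graph stored as an undirected adjacency map.
GRAPH = {
    "patient": ["encounter", "medication"],
    "encounter": ["patient", "clinician", "diagnosis"],
    "medication": ["patient", "diagnosis"],
    "clinician": ["encounter"],
    "diagnosis": ["encounter", "medication", "guideline"],
    "guideline": ["diagnosis"],
}


def neighborhood(graph, start, hops):
    """Map every node within `hops` edges of `start` to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand beyond the requested radius
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def shortest_path(graph, source, target):
    """Return one shortest path as a list of nodes, or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def describe_path(path):
    """Render a path as plain text that a summarization step could consume."""
    return " -> ".join(path)


if __name__ == "__main__":
    print(neighborhood(GRAPH, "patient", 1))
    print(describe_path(shortest_path(GRAPH, "clinician", "guideline")))
```

In a sketch like this, the extracted subgraph or path text would be the input to a downstream summarizer; how that summarization is performed is outside what this example assumes.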

Currently, we’re deeply immersed in developing a robust execution graph for advanced agentic capabilities. This groundbreaking system will revolutionize operations and decision-making, from optimizing patient care to accelerating scientific research. It will enable our AI agents to perform complex, multi-step tasks, interact intelligently with various systems, and drive significant improvements in effectiveness and outcomes across critical Mayo Clinic domains.

Technical Lead and Lead Solution Architect (Transformation Hub) - Rochester, MN

Leading technical development of the Transformation Hub products and core infrastructure assets and partnering with product to realize needs through solutioning and execution

Collaborating tightly with product management to create a cohesive execution strategy and ways of work

Serving as the technical liaison between the Transformation Hub and other Mayo teams and initiatives

Designing and executing the overarching Transformation Hub technical and product architecture, which focuses on future state technologies and capabilities to bring the highest value to CDH and the Mayo Clinic through innovation acceleration, knowledge generation, analytics engines, artificial intelligence, data and model assets, and modular design for efficiency and scalability

Developing a Knowledge Hub utilizing data scraping and aggregation to build an advanced knowledge graph using an LLM and exposing these data through APIs, chat, document generation, and product integration through GraphRAG approaches - LLM-created multimodal knowledge graph supporting GraphRAG, Analytical Agents, and direct queries for institutional knowledge understanding, resource planning, and enablement opportunities

Developing an Intake and Triage web application that will utilize the Knowledge Hub and CDH Capabilities models through artificial intelligence to guide innovators and CDH customers/stakeholders to the quickest realization of CDH service - this system aims to further provide CDH with internal insights around gaps, opportunities, staffing, and the ability to build capacity balancing into the general Intake and Triage engine

Senior Analytics Architect (Solution Enablement) - Rochester, MN

Serving as a team leader for a group of analytics architects to provide solutioning to Center for Digital Health (CDH) and Enterprise needs

Developing an artificial intelligence system to provide self-service analytics, dashboard, and insights capabilities from Mayo Clinic data encompassing practice information to capabilities writ large

Experience

17 years of production Data Science/ML/AI experience and leadership

17 years of rigorous scientific training in academic programs - healthcare and precision medicine

11+ years solutioning ML/AI architecture

8+ years of Data Science and Machine Learning Engineering Leadership and Strategy Development at the Director or VP level

5 patents awarded or in final status for the development and application of novel applications of ML/AI

25+ member teams across disciplines in data and product

21 years of production Software Engineering experience

8 years of ML/AI and advanced analytics product management

Industry proven leader in Data Science and Machine Learning Innovation across the AI landscape

Previously

GLOBAL TRAITS DIGITAL SCIENCE LEAD, GLOBAL TRAITS DISCOVERY AND DELIVERY PROGRAM LEAD

BENCHLING RESEARCH PIPELINE TECHNICAL LEAD

Led Trait (gene, phenotype) Discovery and Delivery efforts within RD and IT Digital Science across the entire Syngenta global space

Created, built, and led the Generative AI program to automate and optimize gene delivery processes targeting specific traits (phenotypes) or gene expression objectives

Created and led the program to use multimodal modeling with structured and unstructured data, and foundation LLMs for knowledge generation, product discovery and delivery pipeline performance guidance, regulatory audits, and prompt-based experimental design

Owned the stack, spanning data model, application landscape, and AI research initiatives, from gene discovery to trait (phenotype) introgression, including building API integrations and MLOps and DevOps workflows

Team comprised business (Analysts, Architects, Delivery Managers, Scrum Masters), IT (Developers, QA, MLOps, DevOps), and science (SMEs, Product Owners, Wet Lab Researchers) domains - both domestic and distributed teams

Mentored junior team members with a focus on professional development and upskilling opportunities

Led Data Science efforts to research and deploy foundational Large Language Models (LLMs) for protein design to predict expression levels of various molecular biology constructs as a software workbench for bench researchers - work emphasized multimodal data and contextualization/fine tuning of foundation models

Built strategy and vision across discovery and delivery to streamline scientific application portfolio, create single source of truth data streams, and initiated foundational work to bring cutting edge ML/AI technologies to the scientific pipeline

Communicated strategy and work initiatives to secure funding and oversee resourcing of execution

Collaborated with stakeholders across research and business domains to ensure cooperative acceleration and growth

Previously

DIRECTOR OF DATA SCIENCE/ML/AI AND TECHNICAL LEAD/STRATEGIC VISION

Served as director of Data Science/Machine Learning/Artificial Intelligence strategy and innovation initiatives.

Created, built, deployed, then led the genomic AI program for disease prediction, novel mode of action identification, biomarker discovery, and understanding population variation from DNA to signaling pathways/omics.

Served as technical lead for the development and utilization of mixed effects models to explain the impact of environmental factors on phenotypic outcomes in the background of genomic models.

Led a growing team of Data Scientists working to drive biological research and accelerate product development.

Team size ranged from 7 to over 20, across disciplines of Data and Product.

Developed and drove 4 year strategy and vision for Data Science, Data Engineering, Bioinformatics, and Data Product for entire organization.

Onboarded artificial intelligence methodologies like Generative AI and Large Language Models for genomic information and protein variant creation.

Owned the stack, spanning data model, application landscape, and AI research initiatives, across all of ML/AI and Data Science, including building API integrations and MLOps and DevOps workflows

Advocated for and conducted learning around Data Science/ML/AI products and platforms for internal teams, the executive team and board members, and external partners and collaborators (customers).

Directly contributed to high impact publications and was invited to speaking engagements.

Served as Data Science liaison across the company, communicating strategy, accomplishments, initiatives, and best practices. Developed the long-term Data Science strategy and vision, hiring plan, technological innovation pipeline, and overall project management for the department.

Developed a proprietary predictive artificial intelligence decision guidance system to serve as a core to our proprietary natural and whole microbe product discovery platform - GENESIS, a digital twin technology in part combining genomics and geospatial data modeling. This digital twin technology has already generated a large corpus of actionable research, product leads, and accelerated screening paradigms to put the best leads in the field. Excitingly, my team was able to harness GENESIS to generate cross-indication predictive models to drive and accelerate lead identification across the research platform.

Led Data Science contributions to external manuscripts and internal white papers for research, Data Science, and SOPs.

Sat on a team of technical leaders that drove research initiatives across the company and served as a hive mind to solve challenges across domains. Identified and addressed gaps in research and Data Science.

Developed a novel algorithm to use interpretable machine learning model ensembles to traverse genomic annotations and drive mode of action discovery from small datasets. Designed and developed a predictive modeling platform to accelerate product discovery in indication screening. Designed and developed a cross-indication prediction platform to enable multi-target identification.

Two publications and multiple patentable IP technologies developed within first year.

Previously

DIRECTOR OF DATA SCIENCE/ML/AI AND PRINCIPAL, RESEARCH AND MACHINE LEARNING

Was responsible for envisioning and creating the Data Science function from scratch, which saw over 10x user growth and gains in key KPIs like LTV, spend, and user ratings through my creation of our AI-driven personalization program.

Created and maintained the core, patented algorithms behind the Firstleaf wine club using shallow, rules-based, and deep learning machine learning and AI technologies on both molecular data and marketing big data, and ran the DevOps and MLOps for the real-time, 24/7 AI platform.

Designed, built, deployed, then led a set of interpretable model algorithms to generate industry first user profiles built on billions of data points per user.

Designed, built, deployed, then led a data-driven product creation AI built on molecular and consumer profile data, optimized through MCMC parameterization, that could scope down to zip code level targets and was used both in standard creation workflows and in running regional wine clubs like the LA Times.

I built and led a local and distributed team of Data Scientists and Machine Learning engineers (The Research and Machine Learning Team) developing machine learning and AI platforms to drive real time recommendations, inform business strategy, and create/integrate with product design life cycle. This included the end-to-end development of the ML stack using both traditional ML and cutting edge deep learning techniques including computer vision, generative AI, and NLP, along with novel algorithm development.

Team size ranged from 5 to 15, across disciplines of Data and Product.

I was further responsible for the continual growth of the team, maintaining stakeholder communication, driving Data Science strategy across the organization, and directly working with C-level executives to maintain a vision and communicate strategy and execution plans aligned with business executives.

The Research and Machine Learning team was responsible for identifying and developing key Data Science and Machine Learning technologies for Firstleaf. We utilized cutting edge approaches to empower both internal company functions and customer facing products. The team was also responsible for design and implementation of the patented (developed technologies, co-wrote and secured patents) algorithms that drove the Firstleaf experience.

The team was responsible for initiating, driving, and executing on data science strategies across Marketing, Finance, Business Intelligence, and Wine Making functions – the Research and Machine Learning team was a full spectrum B2B and B2C solution within Firstleaf.

5 patents (3 awarded, 2 in final review), multiple interviews, blog posts, and department awards providing high visibility to intellectual property and achievements of team.

Previously

POSTDOCTORAL RESEARCH FELLOW (Biological Machine Learning and AI) - University of North Carolina, Lineberger Cancer Center

American Heart Association funded research fellow in AI-driven precision medicine

Identifying, modeling, and understanding noise in single cell signaling during stress responses focusing on utilization of live cell imaging and machine learning on big data

Built and ran Data Science and Data Engineering capabilities in the areas of computer vision, natural language processing, artificial intelligence, infrastructure, high performance computing, predictive modeling, and mathematical modeling

Built and maintained live cell imaging infrastructure, developed a suite of machine learning algorithms to understand noise in single cell signaling, wrote and secured two fellowships, directly contributed to high impact publications, was invited to speaking engagements, and mentored several graduate students and postdoctoral fellows.

Extensive experience working with time series signal data, from pipeline engineering and early data processing to complex machine learning algorithm development and implementation for decision-making and novel hypothesis generation

Expertise

Predictive Modeling | Medicinal AI | Machine Learning | Strategy and Vision Building | Resource Allocation Logistics | Algorithm Development | Technical Writing | Leadership | Mentorship | Team Building | Cross-departmental Collaboration | Data Science | Data Engineering | Artificial Intelligence | Generative AI | Python/Software Engineering | Cell Biology | Biophysics | Microscopy | Cancer Therapeutics | E-commerce | Manuscript and Grant Preparation | Patent Development | Bioinformatics | Microbiology | Digital Twins | Biotechnology | Precision Medicine | Multimodal Modeling | Fine-Tuning | AI Trust and Safety | Digital Product Management | AI Validation | Testing and Experimentation | B2B and B2C | Data and Analytics | Data Governance | Regulations and Compliance

LANGUAGES, TOOLING, INFRASTRUCTURE

Python | Amazon Web Services (AWS) | Pandas | NumPy | Scikit-learn | Matplotlib | Plotly | Tableau | SciPy | SQL | PyTorch | SymPy | Flask | Gunicorn | FastAPI | Bash | Linux | Git | GitHub | Jupyter | HTML | JavaScript | Microsoft Office | Google Suite | MATLAB | Full Stack Dev | APIs | Agile | Jira | Time Series Modeling | Simulations | A/B Testing | Multi-armed Bandits | Causal Inference | MLOps | DevOps | Generative AI | LLMs | NLP (OpenAI, Hugging Face, Vertex) | Multimodal | Fine-Tuning | Knowledge Graphs | Computer Vision | Snowflake | Databases | dbt | Cloud Computing and Infrastructures (multiple)

Patents

Systems and methods for labeling and distributing products having multiple versions with recipient version correlation on a per user basis

Method, system, and computer readable medium for labeling and distributing products having multiple versions with recipient version correlation on a per user basis

Systems and methods for controlling production and distribution of consumable items based on their chemical profiles

Using FI-RT to build wine classification models

Using FI-RT to generate wine shopping and dining recommendations

Selected Publications

Laura K. Potter, Matthew K. Martz*, Douglas Lawton*. *These authors contributed equally to the work. Ground Truthed Models to Inform Tangible Guides of Global Microbial Diversity Using Deep Neural Network Computer Vision. In preparation.

Yong Jun Goh*, Brody J. DeYoung, Nicholas C. Dove, Brant R. Johnson, Matthew K. Martz, Patrick Videau. AgBiome: Harnessing the Microbial World for Human Benefit. Trends in Biotechnology. 2023.

McCarter PC, Vered L, Martz MK, Errede BE, Dohlman HG, Elston TC. Temporal separation of opposing MAPK feedback loops leads to robust stress adaptation. In preparation.

Ramona Schrage, …, Matthew Martz, …, Evi Kostenis. The experimental power of FR900359 to study Gq-regulated biological processes. Nature Communications 6, Article number: 10156. 14 December 2015.

Michelle C Helms, Elda Grabocka, Matthew K Martz, Christopher C Fischer, Nobuchika Suzuki, Philip B Wedegaertner. Mitotic-dependent phosphorylation of leukemia-associated RhoGEF (LARG) by Cdk1. Cellular Signalling, Volume 28, Issue 1, January 2016, Pages 43-52.

Martz MK, Grabocka E, Beeharry N, Yen TJ, Wedegaertner PW. Leukemia-Associated RhoGEF (LARG) is a Novel RhoGEF in Cytokinesis and Required for the Proper Completion of Abscission. Mol. Biol. Cell September 15, 2013 vol. 24 no. 18 2785-2794.

Matthew Martz and Philip Wedegaertner: Faculty of 1000 Biology, 23 Jul 2010 F1000Prime.com/4242964#eval4039063

Carkaci-Salli N, Flanagan JM, Martz MK, Salli U, Walther DJ, Bader M, Vrana KE. Functional domains of human tryptophan hydroxylase 2 (hTPH2). J Biol Chem. 2006 Sep 22;281(38):28105-12. Epub 2006 Jul 24.

Here is a list of most recent posts:

  • 09 November - Summer of AI - An AgBiome Perspective

    This is an interview that served as the starting point for a podcast wherein we discussed Artificial Intelligence with an AgBiome perspective. The podcast went beyond what is below and looked at the broader societal perspective; I will link the podcast shortly.

  • 17 April - Notes on MLOps - One

    This is a short piece I wrote while at Firstleaf as a response to a really great article on the state of MLOps. I used several strong points in the article to articulate my thoughts on where we did things well and directions I would like to see us take.

  • 13 November - Python Generators and Comprehension

    Digging into generators and comprehension - from basics to implementation in a comprehensive tutorial. This is a walkthrough for beginners that will build up to real world examples.

  • 13 November - Dictionary Lookup - Exploring the Depths

    Exploring methods of performant Python dictionary lookups