About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
We’re looking for an Executive Assistant to join our team during an exciting phase of growth. In this role, you’ll support our executive team and work closely with cross-functional partners to advance business objectives while upholding our standards for excellence, collaboration, and impact.
What You’ll Do
Manage calendars, meetings, travel, and priorities for the executive team
Coordinate high-impact internal and external meetings (investors, partners, customers)
Draft emails, prepare briefing materials, and handle follow-ups
Help keep the leadership team organized and on top of key initiatives
Handle confidential information with discretion and professionalism
Support planning for company-wide events and off-sites
Who You Are
Required Qualifications
3+ years of experience in an EA or chief of staff role, ideally in a startup or tech environment
Strong written, verbal, and interpersonal skills
Ability to multitask, prioritize, and stay calm under pressure
Discretion and good judgment when handling sensitive information
A "no job too small" attitude and eagerness to help where needed
Comfort in a fast-paced, sometimes ambiguous environment
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please note it on your application.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
TensorWave is a GPU cloud infrastructure provider powering the most compute-intensive AI and machine learning workloads in the industry. We operate a growing fleet of data centers across the United States and are scaling aggressively to meet surging demand for GPU compute.
As the Data Center RMA & Inventory Specialist, you are the logistical backbone of our infrastructure operations. While our Technicians focus on the "racking and stacking," you ensure they have every component they need to succeed and that every failed part is tracked, returned, and replaced. You will manage the entire lifecycle of our hardware assets—from the moment a pallet arrives at the loading dock to the moment a decommissioned drive is securely destroyed. You will also serve as the primary point of contact for hardware vendors, coordinating on-site repairs and ensuring our maintenance contracts are fully utilized.
Responsibilities
Stock Control: Maintain a precision inventory of critical spares (SFPs, fiber jumpers, hard drives, RAM, power cables, and tools).
DCIM Accuracy: Ensure the Data Center Infrastructure Management (DCIM) database and ERP systems are updated in real time as assets move in and out of the "Ready for Use" (RFU) state.
Audit Leadership: Conduct weekly and monthly physical audits of the inventory cage and the data center floor to ensure 100% asset accuracy.
Procurement Support: Monitor "par levels" and alert the Data Center Manager when stock reaches reorder points to prevent project delays.
End-to-End RMA: Initiate and track Return Merchandise Authorizations (RMA) with vendors (e.g., Dell, HPE, Cisco, Arista).
Defective Media Handling: Manage the secure handling and tracking of failed storage media, ensuring compliance with media sanitization standards (e.g., NIST SP 800-88).
Shipping & Receiving: Manage loading dock operations, including unboxing, palletizing, and coordinating with carriers (FedEx, UPS, and freight carriers) for outgoing repairs.
On-Site Coordination: Act as the primary liaison for third-party vendor technicians. You will schedule their visits and ensure they have the necessary access to perform repairs.
Repair Oversight: Escort vendor technicians on the data center floor, ensuring they follow all safety, security, and "Clean Room" protocols while performing hardware swaps.
Quality Assurance: Verify that vendor repairs are completed to standard and that all replaced parts are correctly logged before the vendor leaves the site.
Tech Collaboration: Work closely with the 8 Data Center Technicians to identify hardware failure trends and ensure that "Dead on Arrival" (DOA) equipment is caught and processed immediately.
Basic Troubleshooting: Perform initial "health checks" on returned or refurbished gear to ensure it is functional before returning it to general stock.
Required Experience
2+ years in inventory management, logistics, or warehouse operations, preferably within a high-tech or data center environment.
Strong understanding of shipping/receiving procedures and RMA workflows.
Ability to identify various IT components (CPUs, DIMMs, SFP types, fiber connectors).
Proficiency with inventory management software and ticketing systems (e.g., ServiceNow, Jira, NetBox).
High level of organizational skill: you are the person who notices when a single SFP is missing from a box of 50.
Willingness to work shift schedules, including nights, weekends, and holidays, as part of a 24/7 coverage model.
Ability to operate a pallet jack and other warehouse equipment safely.
Ability to lift and move boxes up to 50 lbs.
Willingness to work in a loud, climate-controlled data center environment.
Preferred Experience
Experience in a customer-facing operations role at a cloud provider, managed services provider, or colocation facility.
Exposure to GPU infrastructure, HPC clusters, or AI/ML compute environments.
Direct experience managing hardware replacement workflows through enterprise portals such as Dell TechDirect, HPE Support Center, or Cisco CCO.
DCIM/ERP Expertise: Previous experience using Data Center Infrastructure Management (DCIM) tools (e.g., NetBox, Device42, Nlyte) or enterprise inventory systems (e.g., SAP, Oracle, or ServiceNow Asset Management).
Data Sanitization Standards: Working knowledge of NIST 800-88 or DoD 5220.22-M standards for secure data destruction and hard drive decommissioning.
5S/Lean Methodology: Experience implementing 5S (Sort, Set in order, Shine, Standardize, Sustain) or Lean principles within a warehouse or inventory "cage" environment to optimize space and workflow.
Technical Hardware Identification: Ability to visually distinguish between similar but incompatible components (e.g., SFP vs. QSFP, multi-mode vs. single-mode fiber, or different generations of DDR RAM).
Material Handling Certification: Current or previous certification in the operation of pallet jacks, stackers, or forklifts within a warehouse or loading dock setting.
Logistics & Compliance: Familiarity with international shipping requirements, including Commercial Invoices, AES filings, or ATA Carnets for moving hardware across borders.
Asset Tagging Strategy: Experience designing or maintaining a barcode/QR code system for rapid asset tracking and audit reconciliation.
What We Offer
Mission-driven company
Competitive salary
Stock options
100% paid Medical, Dental, and Vision insurance
Flexible PTO
Paid Holidays
401(k)
Parental Leave
Flexible Spending Account
Short Term Disability Insurance
Life and Voluntary Supplemental Insurance
Mental Health Benefits through Spring Health
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please contact accomodations@tensorwave.com.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
We are hiring a Video Editor who leads with story. Raw footage is just the starting point. Your role is finding the arc, the moment, the thing that makes someone actually watch to the end. You'll work across a range of formats: event recaps, interviews, social clips, brand films and more. The bar is the same across all of it: does it build the TensorWave voice, and does it meaningfully stand out in a crowded space?
What You’ll Do
Edit a mix of content types: brand films, event highlights, interview cuts, podcast video, social clips, and explainer content
Color grade LOG footage to a finished, cohesive look
Manage your own project files and deliverables with a clean, consistent file structure
Execute provided motion assets and build motion graphics from scratch in After Effects when needed (intermediate After Effects skills a plus)
Ingest, organize, and log footage from multi-camera shoots
Take notes and revisions efficiently without needing direction repeated
Hit deadlines, including short-turnaround deadlines for event recap content
Who You Are
Required Qualifications
5+ years of professional editing experience, with a portfolio that demonstrates real narrative instincts
Fluent in Premiere Pro and After Effects; DaVinci Resolve is a plus
Strong sense of narrative pacing: you're not just cutting on beats, you're building something real
Comfortable working with Sony FX-series camera footage, log formats, and basic color grading
Self-directed: you can take a brief, a folder of footage, and a deadline and return something that stands out
Clean communication and quick turnaround on revision feedback
Preferred Qualifications
Experience editing B2B or tech brand content
A background or interest in cinematography or production (you understand why shots were made the way they were)
Familiarity with lighting and on-set production (able to assist during shoots when needed)
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please contact accomodations@tensorwave.com.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
We are looking for a driven Graphic Designer to join our creative team at our HQ in Las Vegas, NV. At TensorWave, we move at the speed of ideas, and our visual brand needs to keep pace. In this role you will be a part of the team translating complex AI infrastructure concepts into engaging, high-impact design assets.
This is an entry-level role reporting to our senior creative leads. You’ll be an instrumental part of the team building our brand and digital presence, ensuring that every touchpoint a builder has with our brand is a seamless experience.
What You’ll Do
Execute Visual Storytelling: Collaborate with the creative team to develop and execute digital designs for social media platforms, including graphics, illustrations, and layouts.
Fuel Our Channels: Create social-first content that cuts through the noise to engage and attract the AI developer community.
Protect the Brand: Act as a steward of the TensorWave brand guidelines, ensuring consistency and high quality across every pixel and platform.
Iterate & Innovate: Stay ahead of design trends and tools to bring fresh, "frontier-tech" ideas to our brainstorming sessions.
Collaborate Under Pressure: Take direction from senior designers and adapt quickly to shifting project priorities, maintaining a "builder" mindset and a positive attitude.
Who You Are
You are a detail-oriented designer who thrives in a collaborative environment and isn't afraid to solve problems visually.
Required Qualifications
Education: Bachelor’s degree in Graphic Design (strongly preferred) or equivalent professional experience.
Professional Tenure: 1-2+ years of hands-on experience within a fast-paced, professional graphic design environment.
Technical Toolkit: High proficiency in Adobe Creative Suite, specifically Photoshop, Illustrator, and InDesign. Experience with Figma is a plus.
Design Fundamentals: A solid understanding of applying typography, layout, color theory, and brand standards to your work.
Operational Excellence: The ability to work on multiple projects simultaneously and to embrace feedback as a catalyst for growth.
Communication: Excellent written and verbal communication skills for working with stakeholders.
Continuous Learning: A proactive drive to master emerging tools and technologies while continuously expanding your technical expertise.
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please contact accomodations@tensorwave.com.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
We are hiring an AWS Cloud Engineer to design, provision, optimize, and support the AWS infrastructure powering our AMD GPU AI/HPC platform. This is a hands-on execution role — you'll work closely with Rust backend engineers, TypeScript developers, SREs, and platform teams to keep cloud infrastructure reliable, cost-efficient, and scalable. The goal is simple: reduce cloud bottlenecks and give our engineering teams a solid foundation to build on.
What You’ll Do
Own the full lifecycle of AWS infrastructure across dev, staging, production, and customer-facing environments — provisioning, scaling, monitoring, security, cost optimization, and decommissioning
Build and maintain Infrastructure-as-Code (Terraform, Pulumi, AWS CDK, CloudFormation)
Implement cloud patterns for high availability, auto-scaling, secure service communication, and customer environment provisioning
Build and maintain CI/CD workflows for cloud infrastructure and hosted services
Improve observability through metrics, logging, alerting, dashboards, and runbooks
Troubleshoot AWS networking, compute, storage, IAM, and deployment issues
Participate in incident response, post-incident reviews, and root cause analysis
Document architecture, operational processes, and best practices
Who You Are
Required Qualifications
5+ years in cloud infrastructure, DevOps, SRE, or platform operations
Hands-on AWS experience: VPCs, EC2, S3, IAM, CloudWatch, Route 53, load balancers, security groups, private networking
Proficiency with IaC tooling (Terraform strongly preferred)
Strong Linux fundamentals — networking, process management, storage, troubleshooting
Experience with CI/CD, Git-based workflows, and monitoring/alerting platforms
Clear communicator who can document infrastructure and collaborate across engineering teams
Preferred Qualifications
Experience with AI/ML, GPU, or HPC workloads
Kubernetes on AWS (EKS or self-managed)
Observability platforms: Prometheus, Grafana, Loki, OpenTelemetry, Datadog
AWS cost optimization: right-sizing, savings plans, lifecycle policies, tagging
Startup or high-growth infrastructure environment background
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please contact accomodations@tensorwave.com.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
We are seeking a highly skilled Staff BMC Developer to own the software lifecycle, configuration, integration, and long-term management of TensorWave’s custom Baseboard Management Controller systems.
This role will serve as the technical owner for our out-of-band management architecture across next-generation AI compute platforms. The primary focus will be writing software for, configuring, integrating, and maintaining Axiado 3000-series BMC modules, with specific emphasis on OpenBMC, Redfish, PLDM, MCTP, low-level hardware interfaces, and integration with AMD Universal Base Board architecture for high-density GPU platforms.
The Staff BMC Developer will work closely with infrastructure engineering, hardware provisioning, network engineering, platform engineering, AMD engineering, and OEM hardware vendors to ensure our server management layer is stable, secure, observable, automatable, and production-ready at fleet scale.
This role is not a general firmware support position. It is a senior technical ownership role responsible for the management-plane software foundation required to deploy, monitor, update, recover, and operate large-scale AI compute infrastructure.
What You’ll Do
BMC Software Development & Platform Ownership
Own the full software lifecycle for TensorWave's custom BMC systems: firmware development, image customization, board configuration, validation, release management, upgrade workflows, and recovery procedures.
Lead development and deployment for Axiado 3000-series BMC modules (Smart-SCM3002 / DC-SCM architectures).
Define BMC platform standards covering firmware builds, configuration management, access control, secure interfaces, and automation integration.
OpenBMC Development
Build, customize, maintain, and troubleshoot OpenBMC-based firmware for TensorWave-specific hardware platforms.
Integrate platform-specific sensors, inventory, and control paths; support board bring-up and hardware enablement.
Develop in C, C++, Python, and Shell within Yocto/BitBake and OpenBMC build systems.
Axiado & AMD UBB Integration
Lead integration and operationalization of Axiado BMC modules; configure, debug, and validate sensor, power, reset, and firmware update workflows.
Architect BMC integration with AMD Universal Base Board platforms supporting high-density MI-series GPU systems.
Coordinate across BMC, BIOS, GPU, NIC, and system firmware dependencies; work directly with AMD, Axiado, and OEM vendors.
Low-Level Hardware Protocols & Firmware Lifecycle
Debug and configure hardware communication interfaces: I2C, I3C, SPI, UART, PCIe, IPMI, PLDM, MCTP, and Redfish/DMTF APIs.
Design and maintain robust firmware update processes (BIOS, BMC, NIC, GPU, CPLD/FPGA), including PLDM-based workflows, rollback, version tracking, and fleet-scale rollout planning.
Implement and maintain Redfish APIs exposing power, thermal, inventory, sensor, and health data for automation and bare-metal provisioning.
Telemetry, Vendors & Cross-Functional Collaboration
Ensure platform health metrics (thermal, power, voltage, fan, GPU, system health) are exposed to infrastructure and monitoring systems — not trapped in firmware or vendor tools.
Serve as primary technical point of contact with AMD, Axiado, OEM vendors, and ODM platform teams; lead escalations and validate vendor fixes before production.
Partner closely with Infrastructure, Platform Engineering, DevOps, Network Engineering, Observability, and Security teams.
Who You Are
Required
Deep hands-on experience with BMC firmware development and OpenBMC (Yocto/BitBake).
Strong proficiency in C, C++, Python, and Shell scripting in embedded Linux environments.
Working knowledge of hardware protocols: I2C, I3C, SPI, UART, PCIe, IPMI, PLDM, MCTP.
Experience with Redfish/DMTF APIs and out-of-band server management.
Ability to work close to the metal — moving between firmware code, protocol traces, Linux diagnostics, and vendor documentation.
Strong cross-functional communication skills; comfortable leading technical conversations with silicon vendors, OEMs, and internal engineering teams.
Preferred
Experience with Axiado BMC modules or DC-SCM/Smart-SCM architectures.
Familiarity with AMD UBB platforms and MI-series GPU infrastructure.
Background in fleet-scale firmware lifecycle management.
Experience contributing to or maintaining upstream OpenBMC.
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please contact accomodations@tensorwave.com.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
The Infrastructure Engineer – DevOps, Kubernetes & Automation will support the infrastructure team by helping deploy, maintain, troubleshoot, and improve internal infrastructure automation and Kubernetes platform operations.
This role will work across Ansible, Kubernetes, Linux systems, Git-based workflows, CI/CD tooling, and internal platform services. The engineer will help convert manual operational work into repeatable automation, assist with production deployments, validate infrastructure changes, and contribute to the operational health of the environment.
This is a hands-on technical role. The ideal candidate has strong Linux fundamentals, practical automation experience, and a desire to grow into deeper Kubernetes, DevOps, and infrastructure engineering responsibilities.
What You’ll Do
Kubernetes Operations
Assist with the deployment, maintenance, and troubleshooting of Kubernetes clusters.
Support cluster lifecycle activities including node maintenance, configuration updates, upgrades, and validation.
Help investigate Kubernetes issues related to pods, services, networking, storage, ingress, certificates, and node health.
Work with senior engineers to improve Kubernetes deployment patterns, operational runbooks, and standard configurations.
Support platform services that run on Kubernetes where owned by the infrastructure team.
Ansible & Infrastructure Automation
Write, update, and maintain Ansible roles and playbooks.
Help standardize infrastructure automation across sites, clusters, and environments.
Execute controlled Ansible deployments using approved rollout patterns.
Validate idempotency, error handling, and safe rollback behavior where applicable.
Assist with inventory organization, group variables, host variables, and reusable role design.
Convert manual operational steps into repeatable automation.
DevOps & CI/CD Support
Support Git-based workflows for infrastructure code.
Assist with CI/CD pipeline improvements for infrastructure automation and deployment processes.
Help maintain deployment scripts, validation tooling, and operational utilities.
Contribute to automated testing and validation of infrastructure changes.
Support internal platform tooling used by the DevOps and infrastructure teams.
Linux Systems Operations
Troubleshoot Linux system issues related to services, networking, packages, storage, users, SSH, logs, and systemd.
Support Ubuntu-based infrastructure systems and GPU node operating environments.
Assist with baseline configuration, package management, service validation, and host-level remediation.
Help improve operational runbooks for common Linux and infrastructure support tasks.
Documentation & Operational Process
Document deployment procedures, troubleshooting steps, and operational standards.
Contribute to onboarding material for new engineers.
Maintain clear change notes and implementation records for infrastructure work.
Help improve consistency across runbooks, READMEs, and internal engineering documentation.
Who You Are
Required Qualifications
Linux system administration experience.
Basic to intermediate Kubernetes experience.
Practical Ansible experience.
Git workflow familiarity.
Understanding of CI/CD concepts.
Basic networking knowledge including DNS, routing, firewalls, subnets, and load balancing concepts.
Experience troubleshooting services using logs, systemd, command-line tools, and metrics.
Ability to read and modify YAML, shell scripts, and infrastructure configuration files.
Preferred Qualifications
Experience with Ubuntu server environments.
Experience with RKE2, Rancher, Cilium, or similar Kubernetes platforms.
Experience with Prometheus, Grafana, Loki, or other observability tools.
Experience with MAAS, PXE, bare metal provisioning, or data center infrastructure.
Experience supporting GPU, AI, HPC, or large-scale compute environments.
Familiarity with Python or Go for operational tooling.
Experience working in production infrastructure environments with change control or staged rollout practices.
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please contact accomodations@tensorwave.com.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
We’re looking for a Staff Database Engineer to join our team during an exciting phase of growth. In this role, you’ll be responsible for database architecture, database reliability, infrastructure-adjacent database platforms, performance engineering, observability, automation, operational maturity, incident response, and engineering leadership, working closely with cross-functional partners to support business objectives while upholding our standards for excellence, collaboration, and impact.
What You’ll Do
Database Architecture & Platform Ownership
Design and own database architecture for critical infrastructure and platform services, including PostgreSQL-backed internal platforms; Slurm accounting and operational databases; NetBox and infrastructure source-of-truth databases; custom internal applications and automation services; observability, inventory, and platform metadata systems; and future database-backed control plane services.
Define standard database patterns for high availability, replication, failover, backup and restore, point-in-time recovery, performance baselining, capacity planning, upgrade lifecycle management, access control, and operational security.
Establish database design standards for new internal platforms, including schema review, indexing strategy, query design, service ownership boundaries, and production readiness requirements.
PostgreSQL, MySQL, and Database Reliability
Operate and improve production database environments across PostgreSQL, MySQL, Percona, and adjacent systems.
Own the lifecycle of database systems, including provisioning, configuration, version upgrades, replication topology design, performance tuning, backup validation, disaster recovery testing, decommissioning, and documentation and runbook creation.
Troubleshoot and resolve production database issues involving query latency, lock contention, replication lag, storage I/O bottlenecks, connection exhaustion, poor indexing, schema design problems, database capacity constraints, and backup or restore failures.
Drive root cause analysis for database-related incidents and convert findings into durable engineering improvements.
Slurm, NetBox, and Infrastructure Data Systems
Serve as the senior database engineering owner for infrastructure-adjacent database platforms, including Slurm and NetBox.
For Slurm environments, support and improve database architecture related to SlurmDBD, accounting data, job history, reporting queries, performance and retention strategy, database scaling, backup and recovery, and long-term operational reliability.
For NetBox and source-of-truth systems, support PostgreSQL performance, database lifecycle planning, backup and restore validation, data integrity, schema-impact review, and integration patterns with automation systems.
Partner with DevOps, Infrastructure, MLOps, and Platform Engineering teams to ensure database-backed systems are designed to scale as the environment grows.
Performance Engineering & Observability
Build deep database observability beyond basic dashboards.
Develop and maintain visibility into query performance, execution plans, index usage, replication health, locking behavior, buffer/cache efficiency, storage latency, connection pool behavior, and OS-level database bottlenecks.
Use tools such as PostgreSQL native statistics, MySQL/Percona tooling, Prometheus, Grafana, PMM, query logs, slow query logs, eBPF/BCC or equivalent low-level profiling tools, and Linux performance tooling.
Create performance baselines and alerting standards for critical database platforms.
Identify recurring database failure patterns and build preventive monitoring, automation, and operational guardrails.
Automation, Standards, and Operational Maturity
Create database automation patterns that can be integrated with existing infrastructure tooling.
Partner with DevOps and Infrastructure Engineering to automate database provisioning, configuration standards, backup verification, health checks, replication checks, user and permission management, upgrade workflows, monitoring deployment, and runbook-driven recovery procedures.
Contribute database-specific modules, roles, or workflows to Ansible, CI/CD pipelines, or internal automation platforms where appropriate.
Define production database readiness standards for new services before they are promoted into critical environments.
Incident Response & Engineering Leadership
Act as the technical lead for major database incidents.
Own or support triage, root cause analysis, cross-team coordination, customer or stakeholder impact analysis, postmortems, corrective action plans, and long-term remediation.
Mentor L4 and L5 engineers on database operations, SQL troubleshooting, HA design, incident response, and performance analysis.
Provide senior technical review for database-impacting changes across infrastructure and platform teams.
Who You Are
Required Qualifications
8+ years of production database engineering, database administration, or database architecture experience
Strong hands-on experience with PostgreSQL in production environments
Strong hands-on experience with MySQL, Percona, or equivalent relational database platforms
Experience designing and operating highly available database systems
Experience with replication, failover, backup, restore, and disaster recovery validation
Deep SQL performance tuning experience, including execution plan analysis, index design, query rewriting, schema optimization, lock contention troubleshooting, and storage and I/O analysis
Strong Linux systems knowledge
Experience supporting production incidents and performing root cause analysis
Experience building or improving database monitoring and observability
Ability to work across infrastructure, DevOps, platform, and application engineering teams
Ability to define standards, influence architecture, and mentor other engineers without requiring direct management authority
Preferred Qualifications
Experience with SlurmDBD, Slurm accounting databases, or HPC/AI infrastructure database workloads
Experience with NetBox or other infrastructure source-of-truth platforms
Experience with Percona XtraDB Cluster, ProxySQL, or advanced MySQL/Percona architectures
Experience with PostgreSQL HA tooling and replication architectures
Experience with Prometheus, Grafana, PMM, Splunk, or similar observability platforms
Experience with eBPF/BCC, perf, strace, or other low-level Linux diagnostic tooling
Experience supporting databases for SaaS, cloud, HPC, AI infrastructure, or large multi-tenant platforms
Experience with MongoDB, Oracle, SQL Server, or other secondary database platforms
Experience with database automation using Ansible, Terraform, CI/CD systems, or internal tooling
Experience with zero-downtime migrations, major version upgrades, and production database consolidation
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please contact accomodations@tensorwave.com.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
TensorWave is building the next generation of GPU cloud infrastructure, and our Global Operations Center (GOC) is the backbone that keeps it running 24/7 across multiple data centers. As Lead Operations Engineer, you’ll be the GOC’s senior technical authority, bridging the gap between our frontline operations engineers and the engineering teams that build and maintain our platform.
You’re the person who makes the shift teams more effective: developing and validating the runbooks they execute, reviewing major incidents to drive systemic improvements, and working directly with engineering leads to build better alerting and identify tasks that can be safely pushed to the operations floor. You’ll work with the Head of Global Operations to own the operational maturity of the GOC and be the driving force behind turning reactive firefighting into proactive, repeatable operations.
What You’ll Do
Establish and enforce standards for runbook quality, including clear escalation criteria, rollback procedures, and customer impact assessments
Validate runbooks through tabletop exercises and live testing before releasing to shift teams
Continuously identify gaps in GOC processes, tooling, and coverage
Develop and implement new processes to address operational blind spots
Drive standardization of incident triage, escalation, and resolution workflows across all shifts
Build and maintain the GOC’s operational knowledge base
Own the full lifecycle of GOC runbooks: creation, peer review, validation, and retirement
Develop runbooks for common and safe operational tasks that shift teams can execute independently
Lead post-incident reviews for all major incidents, producing actionable findings
Track recurring issue patterns and drive root cause resolution with engineering teams
Maintain and report on incident metrics, identifying trends that require systemic fixes
Ensure lessons learned are incorporated into runbooks, alerting, and training
Serve as the primary technical liaison between the GOC and Engineering
Work with engineering leads to develop and refine alerting thresholds and reduce noise
Identify operational tasks that can be safely delegated to shift teams and work with engineering to build the tooling and guardrails to support that
Advocate for operational improvements in platform design and deployment processes
Provide operational input into change management and maintenance planning
Who You Are
Required Qualifications
3+ years in infrastructure operations, site reliability engineering, NOC/SOC, or a similar role in a data center, cloud, or managed services environment
Strong understanding of GPU compute infrastructure, Linux systems administration, and networking fundamentals
Demonstrated experience building runbooks, operational procedures, and incident management frameworks from scratch
Track record of leading post-incident reviews and driving measurable operational improvements
Ability to work across organizational boundaries; comfortable engaging with engineering leads, shift operators, and leadership alike
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, PagerDuty, or similar)
Strong written communication skills; you’ll be authoring documentation that shift teams rely on at 3 AM
Familiarity with ITIL or similar operational frameworks (pragmatic application, not certification worship)
Preferred Qualifications
Experience in GPU cloud, HPC, or AI/ML infrastructure operations
Background in both NOC and SOC functions or converged operations environments
Experience supporting enterprise customers with SLA-driven service commitments
Familiarity with hardware lifecycle management, RMA processes, and vendor coordination
Scripting/automation skills (Python, Bash) for building operational tooling
Experience scaling operations teams from early-stage to mature 24/7 coverage
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please note it in your application.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.
Read LessAbout TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.
About the Role
We are looking for a Technical Account Manager to independently own a portfolio of customer accounts and serve as the primary technical point of contact across the full customer lifecycle. Reporting into CX, our TAMs manage onboarding, adoption, expansion, and renewal for a mix of mid-market and enterprise customers running workloads on TensorWave’s infrastructure.
This role sits at the intersection of technical depth and relationship management. You will develop tailored success plans aligned to customer business objectives, lead escalation coordination during critical incidents, and partner with Sales to identify expansion opportunities. Beyond your own accounts, you will contribute to the broader TAM team through knowledge sharing, process improvement, and mentorship of junior peers. If you thrive in fast-paced environments and want to be a trusted advisor to customers building at the frontier of AI, this role is for you.
What You’ll Do
Independently own 5-10 customer accounts across the full lifecycle: onboarding, adoption, expansion, and renewal
Develop and maintain account success plans that align TensorWave capabilities to customer roadmaps
Lead quarterly business reviews (QBRs) with customer engineering and management stakeholders
Monitor account health metrics (utilization, ticket volume, CSAT) and take corrective action before issues escalate
Identify and triage technical risks (e.g., underutilization, configuration drift, pending hardware end-of-life) before they become escalations
Coordinate cross-functional responses to P1/P2 incidents across NOC, Infrastructure, and Engineering teams
Produce customer-facing technical documentation: architecture diagrams, runbooks, migration plans, and post-incident reports
Perform root cause analysis on infrastructure incidents using logs, metrics, and tracing tools
Partner with Sales on upsell and expansion discussions; provide technical qualification for new workloads
Identify qualified expansion opportunities and support commercial conversations with technical depth
Contribute to the internal knowledge base; flag gaps in tooling or process to leadership with proposed solutions
Mentor TAM peers on technical topics and account management best practices
Participate in internal enablement sessions and share learnings from customer engagements
Required Experience
3+ years of experience in a technical, customer-facing role (solutions engineering, technical account management, customer success engineering, or similar)
Demonstrated independent ownership of enterprise customer relationships with measurable retention and expansion outcomes
Experience with ticketing and incident tracking systems (e.g., PagerDuty, Jira, or equivalent)
Ability to diagnose and resolve networking issues independently (BGP peering, RDMA/RoCE, optical link performance)
Solid understanding of distributed ML training frameworks (PyTorch, JAX) and inference serving patterns
Working knowledge of data center operations: power, cooling, hardware lifecycle, and maintenance windows
Experience leading customer-facing business reviews and producing executive-level status communications
Excellent written and verbal communication skills, particularly when engaging with both technical engineers and business stakeholders
Preferred Experience
Experience in GPU cloud, HPC, or AI/ML infrastructure operations
Proficiency in Kubernetes administration, GPU workload scheduling, and multi-node cluster management
Familiarity with AMD GPU platforms (MI300X, MI325X, MI355X) and the ROCm software ecosystem
Background with WEKA or parallel file systems in high-performance compute environments
Experience with ITIL or structured incident management and change management frameworks
Hands-on experience with monitoring and observability platforms (e.g., Grafana, Prometheus, or similar)
Scripting or automation skills (Python, Bash) for building operational tooling or customer-facing utilities
Background in cloud infrastructure providers (AWS, GCP, Azure, or neocloud/GPU cloud providers)
What We Offer
Stock Options
100% paid Medical, Dental, and Vision insurance for Employees
Company Health Savings Account Contributions
100% paid Short Term and Long Term Disability Insurance for Employees
Life and Voluntary Supplemental Insurance Options
Other Insurance Options, such as Pet & Legal Insurance
Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
Flexible Spending Account
401(k)
Employee Assistance Program
Flexible PTO
Paid Holidays
Parental Leave
Other In-Office Perks
Equal Employment Opportunity
TensorWave is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of any protected status under applicable law.
Reasonable Accommodations
TensorWave provides reasonable accommodations in accordance with applicable laws. If you require accommodation during the hiring process, please contact accomodations@tensorwave.com.
Employment Eligibility
All offers of employment are contingent upon verification of identity and authorization to work in the United States, as required by law.
Background Checks
Where permitted by law, employment may be contingent upon the successful completion of a job-related background check.
Data Privacy Notice
By submitting an application, you acknowledge that TensorWave may collect, use, and retain your personal information for recruiting and employment-related purposes in accordance with applicable data privacy laws.