Job Description:
• Work on implementing a cutting-edge standard for AI networking, revolutionizing next-generation infrastructure for Generative AI training clusters.
• Design, implement, and test drivers for hardware acceleration, enabling distributed AI/ML applications.
• Collaborate with a diverse team of system/software architects, hardware designers, and system/test engineers.
• Collaborate with open-source communities.
Job Qualifications:
• BSc or MSc in computer science or computer engineering, or equivalent experience.
• 7+ years of experience in software development.
• Experience developing and running GPU-accelerated HPC or AI-related applications.
• Background in HPC or AI/ML cluster networking.
• Hands-on experience with Collective Communication Libraries (e.g., NCCL) and Libfabric.
• Solid knowledge of kernel programming and kernel drivers.
• Strong programming skills in C.
• Knowledge of and experience with networking and/or RDMA protocols (e.g., TCP/IP, RoCE).
• Familiarity with the PCIe protocol and virtualization technologies.
• Experience with the ARMv8 architecture.
• Contributions to HPC or AI/ML-related open-source projects.
Company Occupation:
High Tech
Company Size:
Medium (50 - 150)