Daily Papers

by AK and the research community

Feb 13

Balancing Pipeline Parallelism with Vocabulary Parallelism

Pipeline parallelism is widely used to scale the training of transformer-based large language models, and various works have been done to improve its throughput and memory footprint. In this paper, we address a frequently overlooked issue: the vocabulary layers can cause imbalanced computation and memory usage across pipeline stages, worsening pipeline bubbles and the memory bottleneck. To tackle this, we partition the vocabulary layers evenly across pipeline devices and group the computation into pipeline passes. To reduce the activation memory overhead, we propose several algorithms to reduce communication barriers within vocabulary layers. Additionally, we utilize a generalizable method to integrate Vocabulary Parallelism with existing pipeline schedules. By combining these techniques, our methods effectively balance the computation and parameter memory, with only a small constant activation memory overhead. Notably, when combined with activation memory-balanced schedules like V-Half, our approach achieves perfect balance in both memory and computation. Extensive evaluations demonstrate that our method achieves computation and memory balance regardless of the vocabulary size, resulting in a 5% to 51% improvement in throughput compared to naive approaches, while significantly reducing peak memory usage, especially for large vocabulary scenarios. Our implementation is open-sourced at https://github.com/sail-sg/VocabularyParallelism.

  • 4 authors
·
Nov 7, 2024
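
The abstract above describes partitioning the vocabulary layers evenly across pipeline devices. As a rough illustration of why such a partition is possible at all, the sketch below computes a softmax cross-entropy loss for one token when its logits are split into shards. It is not the paper's algorithm or the code in the linked repository: the function names (`shard_vocab`, `local_stats`, `sharded_cross_entropy`) are hypothetical, the "devices" are plain Python lists, and the two reduction steps merely stand in for the collective communication a real system would need.

```python
import math


def shard_vocab(logits, num_devices):
    """Split one token's logits into contiguous, near-equal shards (hypothetical helper)."""
    if not 1 <= num_devices <= len(logits):
        raise ValueError("num_devices must be between 1 and the vocabulary size")
    base, extra = divmod(len(logits), num_devices)
    shards, start = [], 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        shards.append(logits[start:start + size])
        start += size
    return shards


def local_stats(shard):
    """Per-shard work: the local maximum and the sum of exponentials shifted by it."""
    local_max = max(shard)
    return local_max, sum(math.exp(x - local_max) for x in shard)


def sharded_cross_entropy(logits, target, num_devices):
    """Cross-entropy for one token, combining per-shard statistics."""
    stats = [local_stats(shard) for shard in shard_vocab(logits, num_devices)]
    # Stand-in for a max-reduction across devices.
    global_max = max(local_max for local_max, _ in stats)
    # Stand-in for a sum-reduction: rescale each local sum to the global maximum.
    total = sum(s * math.exp(local_max - global_max) for local_max, s in stats)
    log_partition = global_max + math.log(total)
    return log_partition - logits[target]


def reference_cross_entropy(logits, target):
    """Unsharded baseline used to check the sharded result."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits)) - logits[target]
```

Each shard only needs its own slice of the logits plus two scalars from the other shards, which is what makes an even split of the vocabulary plausible; how the paper schedules and reduces these exchanges is not shown here.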

RUBIES: a complete census of the bright and red distant Universe with JWST/NIRSpec

We present the Red Unknowns: Bright Infrared Extragalactic Survey (RUBIES), providing JWST/NIRSpec spectroscopy of red sources selected across ~150 arcmin^2 from public JWST/NIRCam imaging in the UDS and EGS fields. RUBIES' novel observing strategy offers a well-quantified selection function: the survey is optimised to reach high (>70%) completeness for bright and red (F150W-F444W>2) sources that are very rare. To place these rare sources in context, we simultaneously observe a reference sample of the 2<z<7 galaxy population, sampling sources at a rate that is inversely proportional to their number density in the 3D space of F444W magnitude, F150W-F444W colour, and photometric redshift. In total, RUBIES observes ~3000 targets across 1<z_{phot}<10 with both the PRISM and G395M dispersers, and ~1500 targets at z_{phot}>3 using only the G395M disperser. The RUBIES data reveal a highly diverse population of red sources that span a broad redshift range (z_{spec}~1-9), with photometric redshift scatter and outlier fraction that are 3 times higher than for similarly bright sources that are less red. This diversity is not apparent from the photometric SEDs. Only spectroscopy reveals that the SEDs encompass a mixture of galaxies with dust-obscured star formation, extreme line emission, a lack of star formation indicating early quenching, and luminous active galactic nuclei. As a first demonstration of our broader selection function we compare the stellar masses and rest-frame U-V colours of the red sources and our reference sample: red sources are typically more massive (M_*~10^{10-11.5} M_⊙) across all redshifts. However, we find that the most massive systems span a wide range in U-V colour. We describe our data reduction procedure and data quality, and publicly release the reduced RUBIES data and vetted spectroscopic redshifts of the first half of the survey through the DJA.

  • 28 authors
·
Sep 9, 2024

Euclid. II. The VIS Instrument

This paper presents the specification, design, and development of the Visible Camera (VIS) on the ESA Euclid mission. VIS is a large optical-band imager with a field of view of 0.54 deg^2 sampled at 0.1" with an array of 609 Megapixels and spatial resolution of 0.18". It will be used to survey approximately 14,000 deg^2 of extragalactic sky to measure the distortion of galaxies in the redshift range z=0.1-1.5 resulting from weak gravitational lensing, one of the two principal cosmology probes of Euclid. With photometric redshifts, the distribution of dark matter can be mapped in three dimensions, and, from how this has changed with look-back time, the nature of dark energy and theories of gravity can be constrained. The entire VIS focal plane will be transmitted to provide the largest images of the Universe from space to date, reaching m_AB>24.5 with S/N >10 in a single broad I_E~(r+i+z) band over a six year survey. The particularly challenging aspects of the instrument are the control and calibration of observational biases, which lead to stringent performance requirements and calibration regimes. With its combination of spatial resolution, calibration knowledge, depth, and area covering most of the extra-Galactic sky, VIS will also provide a legacy data set for many other fields. This paper discusses the rationale behind the VIS concept and describes the instrument design and development before reporting the pre-launch performance derived from ground calibrations and brief results from the in-orbit commissioning. VIS should reach fainter than m_AB=25 with S/N>10 for galaxies of full-width half-maximum of 0.3" in a 1.3" diameter aperture over the Wide Survey, and m_AB>26.4 for a Deep Survey that will cover more than 50 deg^2. The paper also describes how VIS works with the other Euclid components of survey, telescope, and science data processing to extract the cosmological information.

  • 435 authors
·
May 22, 2024

Transcending Scaling Laws with 0.1% Extra Compute

Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model (e.g., PaLM) on a few more steps with UL2's mixture-of-denoiser objective. We show that, with almost negligible extra computational costs and no new sources of data, we are able to substantially improve the scaling properties of large language models on downstream metrics. In this paper, we continue training PaLM with UL2R, introducing a new set of models at 8B, 62B, and 540B scale which we call U-PaLM. Impressively, at 540B scale, we show an approximately 2x computational savings rate where U-PaLM achieves the same performance as the final PaLM 540B model at around half its computational budget (i.e., saving ~4.4 million TPUv4 hours). We further show that this improved scaling curve leads to 'emergent abilities' on challenging BIG-Bench tasks -- for instance, U-PaLM does much better than PaLM on some tasks or demonstrates better quality at much smaller scale (62B as opposed to 540B). Overall, we show that U-PaLM outperforms PaLM on many few-shot setups, i.e., English NLP tasks (e.g., commonsense reasoning, question answering), reasoning tasks with chain-of-thought (e.g., GSM8K), multilingual tasks (MGSM, TydiQA), MMLU and challenging BIG-Bench tasks. Finally, we provide qualitative examples showing the new capabilities of U-PaLM for single and multi-span infilling.

  • 16 authors
·
Oct 20, 2022

The Apache Point Observatory Galactic Evolution Experiment (APOGEE)

The Apache Point Observatory Galactic Evolution Experiment (APOGEE), one of the programs in the Sloan Digital Sky Survey III (SDSS-III), has now completed its systematic, homogeneous spectroscopic survey sampling all major populations of the Milky Way. After a three-year observing campaign on the Sloan 2.5-m Telescope, APOGEE has collected a half million high resolution (R~22,500), high S/N (>100), infrared (1.51-1.70 microns) spectra for 146,000 stars, with time series information via repeat visits to most of these stars. This paper describes the motivations for the survey and its overall design (hardware, field placement, target selection, operations) and gives an overview of these aspects as well as the data reduction, analysis and products. An index is also given to the complement of technical papers that describe various critical survey components in detail. Finally, we discuss the achieved survey performance and illustrate the variety of potential uses of the data products by way of a number of science demonstrations, which span from time series analysis of stellar spectral variations and radial velocity variations from stellar companions, to spatial maps of kinematics, metallicity and abundance patterns across the Galaxy and as a function of age, to new views of the interstellar medium, the chemistry of star clusters, and the discovery of rare stellar species. As part of SDSS-III Data Release 12, all of the APOGEE data products are now publicly available.

  • 78 authors
·
Sep 17, 2015

oMEGACat. VII. Tracing Interstellar and Intracluster Medium of ω Centauri using Sodium Absorptions

We investigate the foreground interstellar medium along the line of sight and intracluster medium of ω Centauri (ω Cen) by measuring the equivalent width of Na I D absorptions from MUSE observations. The large line-of-sight velocity difference between ω Cen and the foreground enables us to separate Na I D absorption contributed from atomic gas in the interstellar and intracluster medium. We find that small-scale substructures in the foreground Na I D distribution correlate with differential reddening derived from photometric methods. Using an empirical Na I D equivalent width-reddening relation, we determine an average reddening of E(B-V)=0.153±0.003 mag within the half-light radius of ω Cen. However, the Na I D-inferred differential reddening is significantly larger than photometric estimates. This is likely due to scatter in the Na I D-reddening relation. We find no evidence for intracluster atomic gas from spectra of horizontal branch stars, as there is no significant Na I D absorption at ω Cen's systemic velocity. Given this non-detection, we place the strongest upper limit to date on the intracluster atomic gas column density in ω Cen of ≲2.17×10^{18} cm^{-2}. We also estimate the ionized gas density from pulsar dispersion measure variations, which exceed the atomic gas limit by ~50 times. Nevertheless, the strong correlation between dispersion measure and foreground Na I D suggests that much or all of this ionized gas resides in the foreground. Given ongoing mass loss from bright giant stars, our findings imply that the intracluster gas accumulation timescale is short, and gas removal in the cluster is likely not tied to stripping as ω Cen passes through the Galactic disk.

  • 17 authors
·
Sep 30, 2025