Last October, Vivo made a noteworthy entrance into the European market with the X51, a slightly tweaked version of a premium smartphone already known in Asia as the X50. While the marketing emphasis is clearly on its innovative camera setup, the phone has more than one trick up its sleeve: powered by an AK4377A independent audio chip, the X51 promises to “immerse you in a world of music and sound.” When recording videos, it also features Autozoom and 3D Sound Tracking to “follow the sound coming from a chosen subject, no matter where it goes.”
We put the Vivo X51 through our rigorous DXOMARK Audio test suite to measure its performance both at recording sound using its built-in microphones, and at playing audio back through its single built-in speaker. In this review, we will break down how it fared in a variety of tests and several common use cases.
Audio specifications include:
- One side-firing speaker on the bottom
- Autozoom and 3D Sound Tracking technologies
- No headphone jack
About DXOMARK Audio tests: For scoring and analysis in our smartphone audio reviews, DXOMARK engineers perform a variety of objective tests and undertake more than 20 hours of perceptual evaluation under controlled lab conditions. This article highlights the most important results of our testing. Note that we evaluate both Playback and Recording using only the device’s built-in hardware and default apps.
With an overall score of 53, the Vivo X51 5G placed toward the lower end of our DXOMARK Audio protocol rankings, well behind the top-scoring phone to date, the Xiaomi Mi 10 Pro, with 76.
The phone’s Playback score is heavily weighed down by its single-speaker design and by its lack of both high- and low-end extension. These result in a midrange-focused rendering, nonexistent wideness, off-center left/right balance, and weak dynamics sub-attributes. While maximum volume is decent, minimum volume isn’t well tuned, causing a loss of intelligibility for dynamic content (such as classical music or movies). That said, the phone has one impressive skill: it does an excellent job of managing sonic artifacts, both spectral (distortion) and temporal (compression), especially from quiet to nominal volumes.
Audio recorded through the Vivo X51 5G’s microphones doesn’t fare much better. In recording, too, the frequency response is narrowed to a midrange-focused reproduction, which also affects the sound envelope. Spatial performance is slightly better in recording, thanks to good localizability and above-average directivity. However, stereo is inverted in life videos (landscape mode), and an elevated level of background noise impairs both the distance rendering and the signal-to-noise ratio. Finally, recordings made in loud environments exhibit compression and distortion, but otherwise the phone controls artifacts as effectively when recording audio as it does when playing it back.
The DXOMARK Audio overall score of 53 for the Vivo X51 5G is derived from its Playback and Recording scores and their respective sub-scores. In this section, we’ll take a closer look at these audio quality sub-scores and explain what they mean for the user.
Best: Xiaomi Mi 10 Pro (78)
Timbre tests measure how well a phone reproduces sound across the audible tonal range and take into account bass, midrange, treble, tonal balance, and volume dependency.
The Vivo X51 5G’s playback timbre performance is below average, with a tonal balance that lacks both high- and low-end extension. This results in a particularly midrange-focused sound. Classical music, pop rock, and movies fare slightly better than other genres, but still remain below average.
That said, compared to similar mono devices, the overall rendering is fairly clear.
Best: Asus ROG Phone 5 (76)
DXOMARK’s dynamics tests measure how well a device reproduces the energy level of a sound source, and how precisely it reproduces bass frequencies.
Sound played back through the X51 5G’s single speaker exhibits middling dynamics: attack is weak and lacks sharpness, and both bass precision and punch are impaired by the lack of low-end extension. At maximum volume, noticeable compression occasionally appears, which further weakens the punch.
Best: Asus ROG Phone 5 (78)
Sub-attributes for perceptual spatial tests include localizability, balance, distance, and wideness.
In the Spatial sub-category, the X51 5G ranks just one place above the lowest-scoring phone, the Honor 20 Pro. Because it is a mono device, the sound field has no wideness at all, and its left/right balance is drastically shifted to the right side of the device (in landscape mode).
Localizability of the sound sources is average when evaluated against similarly built devices, but quite poor in comparison to stereo smartphones. The distance rendering, however, is reasonably realistic.
Best: Realme X2 Pro (79)
Volume tests measure both the overall loudness a device is able to reproduce and how smoothly volume increases and decreases based on user input.
Playback volume isn’t one of the X51 5G’s strong suits either. Minimum volume doesn’t allow dynamic content such as classical music or movies to remain intelligible. Further, as shown in the graph above, volume could increase in a more consistent and natural manner (as is the case with LG’s V60 ThinQ 5G). Maximum volume is average for a mono device.
|Vivo X51 5G||71.3 dBA||70.4 dBA|
|Oppo Reno4 Pro 5G||75.4 dBA||72.1 dBA|
|LG V60 ThinQ 5G||75.3 dBA||71.8 dBA|
Best: Asus ROG Phone 5 (93)
Artifacts tests measure how much source audio is distorted when played back through a device’s speakers. Distortion can occur both because of sound processing in the device and because of the quality of the speakers.
In contrast to its other playback sub-attributes, the X51 5G delivers a stellar performance in artifacts management, tying with our best-scoring phone so far.
Except for occasional pumping at maximum volume, almost no type of artifact is noticeable — whether temporal (compression), spectral (distortion), or noise. The Vivo X51 fares best when playing games, where no user-induced artifact (such as possible occlusion of the speaker by the user’s fingers or palms) is encountered. Movie and music playback are not far behind, with excellent sub-scores as well.
Best: Asus ROG Phone 5 (85)
As with playback timbre, the overall recorded reproduction is focused on midrange frequencies and suffers from a lack of high-end extension in both selfie and life videos, which is particularly noticeable when filming in urban surroundings. In loud environments, bass and low-mids are also recessed (especially in selfie videos), which results in slightly nasal midrange frequencies.
That said, the X51 5G fares well in our meeting use case, thanks to a harmonious tonal balance.
Best: OnePlus 8 (78)
The X51 5G delivers a rather low signal-to-noise ratio due to elevated background noise. Recessed treble impairs the envelope reproduction, which in turn blunts plosives (sounds like “p” and “b”). When recording in loud environments, the envelope is also affected by noticeable distortion and compression.
Best: Asus ROG Phone 5 (78)
The X51 5G fares better when recording spatial attributes, thanks to decent localizability of the sound sources (albeit slightly impaired by the lack of high-end extension) and above-average directivity. Selfie videos, for instance, provide a particularly well-suited directivity pattern, which helps attenuate sound sources outside the field of view; the other side of that coin is that the sound field is consequently very narrow. The default setting for life videos, however, does not apply such directivity. (It can be enabled by switching from “surround recording” to the “front recording” or “back recording” option in the video app settings.)
Also note that stereo is inverted in life video recordings. Distance rendering is generally impaired by loud background noise, in which voices are easily drowned out, and by the lack of treble.
The 3D Sound Tracking technology clearly affects the spatialization of recorded videos. The target sound does indeed follow moving and speaking subjects: balance is quite precise, which helps with the localization of sound sources. That said, the feature is quite tricky to activate: besides requiring the user to tap on the subject to track it, the device doesn’t make it clear whether tracking is on or off. Additionally, depending on the recording, the balance between left and right channels sometimes seems to be inverted: when the subject is moving to the left, the sound occasionally goes… to the right.
Best: Huawei Mate 30 Pro 5G (88)
While the nominal loudness of recorded content is only average, the maximum level reachable without disturbing artifacts is good. This adds up to a decent recording volume performance for the Vivo X51 5G. Here are our test results, measured in LUFS (Loudness Units relative to Full Scale); as a reference, we expect loudness levels to be above -24 LUFS for recorded content:
| | Meeting | Life Video | Selfie Video | Memo |
|---|---|---|---|---|
| Vivo X51 5G | -29.2 LUFS | -24.1 LUFS | -20.5 LUFS | -23.7 LUFS |
| Oppo Reno4 Pro 5G | -30 LUFS | -22.4 LUFS | -20.8 LUFS | -23.5 LUFS |
| LG V60 ThinQ 5G | -24 LUFS | -15.6 LUFS | -15.4 LUFS | -19.2 LUFS |
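Because LUFS values are negative, “above -24 LUFS” means closer to zero, which can be counterintuitive when reading the table. The short Python sketch below, which is only an illustration and not part of the DXOMARK protocol, flags the use cases that fall short of the -24 LUFS reference. The loudness values are copied from the table above; the names `RECORDED_LOUDNESS_LUFS`, `REFERENCE_LUFS`, and `below_reference` are hypothetical, and treating a value exactly at -24 LUFS as meeting the reference is our own assumption.

```python
# Recorded loudness (LUFS) per use case, copied from the table above.
RECORDED_LOUDNESS_LUFS = {
    "Vivo X51 5G": {"Meeting": -29.2, "Life Video": -24.1, "Selfie Video": -20.5, "Memo": -23.7},
    "Oppo Reno4 Pro 5G": {"Meeting": -30.0, "Life Video": -22.4, "Selfie Video": -20.8, "Memo": -23.5},
    "LG V60 ThinQ 5G": {"Meeting": -24.0, "Life Video": -15.6, "Selfie Video": -15.4, "Memo": -19.2},
}

# Reference level from the review; values exactly at it are treated as
# acceptable here (an assumption, since the text says "above -24 LUFS").
REFERENCE_LUFS = -24.0


def below_reference(levels, reference=REFERENCE_LUFS):
    """Return the use cases whose loudness is lower (more negative) than the reference."""
    return [use_case for use_case, lufs in levels.items() if lufs < reference]


if __name__ == "__main__":
    for phone, levels in RECORDED_LOUDNESS_LUFS.items():
        for use_case in below_reference(levels):
            # A difference between two LUFS values is expressed in LU.
            shortfall = round(REFERENCE_LUFS - levels[use_case], 1)
            print(f"{phone}: {use_case} is {shortfall} LU below the reference")
```

Under these assumptions, the X51 5G misses the reference in the Meeting and Life Video use cases (the latter by only 0.1 LU), which is consistent with its loudness being described as only average.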
Best: Asus ROG Phone 5 (88)
When recording, too, the Vivo X51 5G does an excellent job at keeping sonic artifacts to a minimum… except in high-SPL scenarios. From soft to nominal volumes, both spectral and temporal artifacts are very well controlled, with only slight distortion and clipping on shouting voices. In loud environments, however, distortion and hissing become noticeable, especially on midrange-focused content.
Best: Apple iPhone XS Max (58)
The Vivo X51 5G’s background performance is quite limited by its tonal imbalance, which results in a perceivable loss of details, and therefore a lack of realism. That said, our experts noticed no artifacts during our background recording tests, except when filming selfie videos.
Despite being a high-end smartphone, the Vivo X51 5G can’t quite keep its promise of immersing the listener in a world of music and sound. Although it proves impressively effective at keeping sonic artifacts under control, its single-speaker design, significant tonal imbalance, and elevated background noise affect its timbre, spatial, and dynamics performances in both playback and recording.
- Tonal balance is fairly clear compared to similar mono devices.
- Almost no noticeable artifacts
- Lack of low- and high-end extension, leading to a midrange-focused sound reproduction.
- Poor spatial performance (especially for wideness and balance) due to the single-speaker design.
- Limited attack, punch, and bass precision due to midrange-focused tonal balance.
- Minimum volume isn’t well tuned, resulting in a lack of intelligibility for dynamic content.
- Good tonal balance in meeting use case
- Few noticeable artifacts overall
- Inverted stereo in life video
- Midrange-focused tonal balance and lack of high- and low-end extension
- Envelope affected by lack of treble, as well as by compression and distortion when recording in loud environments.
Marie Georgescu De Hillerin