Smartmontools - SMART data analysis tool for HDD, SSD, NVMe drives



smartctl controls the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most ATA/SATA and SCSI/SAS hard drives and solid-state drives (SSD/NVMe). The purpose of SMART is to monitor drive reliability, predict drive failures, and run various types of drive self-tests. smartctl also supports some functions unrelated to SMART.

View all drive information

smartctl -a /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-48-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital RE4 Serial ATA
Device Model: WDC WD5003ABYX-01WERA1
Serial Number: WD-WMAYP5108832
(...)

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   180   180   051    Pre-fail  Always       -       476055  (not very important)
  3 Spin_Up_Time            0x0027   139   138   021    Pre-fail  Always       -       4025
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       40  (IMPORTANT)
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4545
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0  (IMPORTANT)
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   105   082   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   160   160   000    Old_age   Always       -       40  (IMPORTANT)
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline      -       49  (IMPORTANT)
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   001   000    Old_age   Offline      -       0
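Because the attribute table has a fixed column order, it can be parsed line by line. The following is a minimal Python sketch, assuming the plain-text layout printed by `smartctl -a` (ID, name, hex flag, three normalized values, type, update mode, WHEN_FAILED, then a raw value that may contain spaces). `Attribute` and `parse_attributes` are hypothetical helpers, not part of smartmontools.

```python
import re
from typing import NamedTuple


class Attribute(NamedTuple):
    """One row of the 'Vendor Specific SMART Attributes with Thresholds' table."""
    id: int
    name: str
    flag: int
    value: int
    worst: int
    thresh: int
    type: str
    updated: str
    when_failed: str
    raw: str  # kept as text: the raw format is vendor-specific


# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# The last group takes the rest of the line, because RAW_VALUE may contain
# spaces, e.g. "38 (Min/Max 20/45)".
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x([0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*\S)\s*$"
)


def parse_attributes(text: str) -> list[Attribute]:
    """Return every attribute row found in smartctl's text output."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, banner or unrelated line
        (aid, name, flag, value, worst, thresh,
         typ, updated, when_failed, raw) = m.groups()
        attrs.append(Attribute(int(aid), name, int(flag, 16), int(value),
                               int(worst), int(thresh), typ, updated,
                               when_failed, raw))
    return attrs


if __name__ == "__main__":
    sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       40
194 Temperature_Celsius     0x0022   105   082   000    Old_age   Always       -       38
"""
    for a in parse_attributes(sample):
        print(a.id, a.name, a.value, a.thresh, a.raw)
    # 5 Reallocated_Sector_Ct 199 140 40
    # 194 Temperature_Celsius 105 0 38
```

The raw value is deliberately left as a string: its meaning and encoding differ between vendors, so any numeric interpretation belongs in a separate, attribute-specific step.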

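Not every drive reports all of the attributes above or all of those described below, and some drives report attributes that are not listed here. Each attribute has several values:

  • VALUE (current) – the current normalized value, computed by the vendor from the raw data; higher is better
  • WORST – the lowest normalized value recorded so far
  • THRESH (threshold) – the critical level set by the vendor; when VALUE drops to or below it, the attribute is considered failed (shown in the WHEN_FAILED column)
  • RAW_VALUE (data) – the raw, vendor-specific data, such as a sector count, hours, or temperature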
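To make the relationship between the columns concrete, here is a minimal Python sketch of how VALUE, WORST and THRESH combine into the WHEN_FAILED verdict. `FAILING_NOW` and `In_the_past` are the labels smartctl prints in that column; the function name is hypothetical, and treating a threshold of 0 as "never fails" is an assumption that matches attributes such as Power_Cycle_Count in the table above.

```python
def when_failed(value: int, worst: int, thresh: int) -> str:
    """Sketch of how VALUE, WORST and THRESH combine into a WHEN_FAILED verdict.

    Assumption: a threshold of 0 means the attribute is informational only
    and can never fail, so it is reported as "-" like a healthy attribute.
    """
    if thresh == 0:
        return "-"            # no threshold: nothing to compare against
    if value <= thresh:
        return "FAILING_NOW"  # current normalized value is at or below the threshold
    if worst <= thresh:
        return "In_the_past"  # the attribute reached the threshold at some point
    return "-"                # healthy


# Rows from the example table above (VALUE, WORST, THRESH)
print(when_failed(199, 199, 140))  # 5 Reallocated_Sector_Ct: prints "-"
print(when_failed(100, 253, 0))    # 10 Spin_Retry_Count: prints "-"
```

Note that a healthy verdict only means the normalized value is above the vendor threshold; the example drive still has 40 reallocated sectors in its raw data.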

Key parameters

The four most sensitive SMART parameters: a non-zero value in any of these fields increases the probability of disk failure within 60 days by the following factors, respectively ():

  • Error count (39x)
  • Reallocation count (14x)
  • Off-line reallocation count (21x)
  • Number of sectors “on probation” (16x)

SMART parameter description

  • 01 Raw Read Error Rate – Rate of errors when reading data from the disk surface. (NOT VERY IMPORTANT)
  • 02 Throughput Performance – Overall throughput (efficiency) of the disk. If the value of this attribute decreases, disk problems are likely approaching. This is not a critical error. Unfortunately, not every program monitors this value.
  • 03 Spin Up Time – The average time needed to spin up the platters (from 0 rpm to full speed). Depending on the disk model, the RAW value expresses this time in seconds or milliseconds. This is not a critical error, but long spin-up times indicate emerging problems with the disk's mechanical systems.
  • 04 Start/Stop Count – The RAW value of this attribute is the number of disk start/stop cycles. This is not a critical error, but combined with 09 Power-On Time and 10 Spin Retry Count it shows how the disk is being used. The number of drive start-ups should be close to the number of attempts to spin up the platters. Large differences between these values indicate a problem with the drive's power supply: the drive starts correctly but does not spin up the platters. The power supply is not necessarily the only cause.
  • 05 Reallocated Sectors Count – Number of reallocated sectors (also called "bads" or "bad sectors"). When a drive encounters a read/write/verification error, it marks the sector as reallocated and moves its data to a dedicated spare area (reserve area). This process is also called remapping, and the reallocated sectors are called remaps. This is why surface tests on modern disks usually do not show "bad blocks". A non-zero value is a critical error: it indicates that logical or physical bad sectors are forming, so reading this value tells us whether the disk has bad sectors. (IMPORTANT)
  • 06 Read Channel Margin – Margin of the read channel during data reading. The function of this attribute is not defined in the specification. It does not report a critical error.
  • 07 Seek Error Rate – Error rate of magnetic head seek operations. If the head-positioning mechanism or the servo is damaged, or the disk is affected by thermal expansion, the number of seek errors increases. A growing number of seek errors indicates deterioration of the disk surface and of the disk's mechanical subsystem. This is not a critical error, but it is worth monitoring.
  • 08 Seek Time Performance – Average performance of magnetic head seek operations. If the value of this attribute decreases, it is a sign of problems with the disk's mechanical subsystem.
  • 09 Power-On Time – The number of hours the drive has been powered on. The RAW value corresponds to the total number of hours (or minutes or seconds, depending on the manufacturer) the drive has worked. A drop in the normalized value to the critical level (threshold) indicates that the drive's rated MTBF (mean time between failures) is being used up. In practice, however, even when it reaches zero, this does not mean that the drive's service life is exhausted or that it will stop working.
  • 10 0A Spin Retry Count – Number of repeated attempts to spin up the platters. This attribute stores the total number of attempts to bring the platters up to full speed (counted only when the first attempt failed). A drop in this attribute's value is a sign of problems with the disk's mechanical subsystem. (IMPORTANT)
  • 11 0B Recalibration Retries – Number of recalibration retries (requests issued after the first attempt failed). A drop in this attribute's value is a sign of problems with the disk's mechanical subsystem.
  • 12 0C Device Power Cycle Count - This attribute specifies the total number of full disk power cycles.
  • 13 0D Soft Read Error Rate – Number of soft read errors that occur when reading data from the disk surface.
  • 14 0E G-Sense Error Rate – Rate of errors caused by shock. This attribute stores readings from the shock (g-force) sensor and gives the total number of errors caused by impacts (a dropped disk, improper installation, etc.).
  • 15 0F Power Cycle Count – How many times the disk was powered off; more precisely, the number of complete power-on/power-off cycles of the drive. Combined with 04 Start/Stop Count, it can point to a problem with the drive's power supply.
  • 193 C1 Load/Unload Cycle Count – Number of cycles in which the heads were moved to the parking area (landing zone) and back.
  • 194 C2 Temperature – Hard drive temperature. The RAW value of this attribute is the reading of the built-in temperature sensor (in degrees Celsius). The sensor is sometimes faulty, in which case this value is implausibly high or not reported at all.
  • 196 C4 Reallocation Event Count – Number of remap operations (transfers of data from a damaged sector to the spare area). The RAW value is the total number of attempts to transfer data from reallocated sectors to the spare area; both successful and unsuccessful attempts are counted. This is one of the most important parameters, because it tracks the remapping of each new bad sector as it appears. (IMPORTANT)
  • 197 C5 Current Pending Sector Count – Number of unstable sectors waiting to be remapped. When a read or write attempt fails, the disk marks the sector as unstable (pending). When the sector is next accessed, the disk either finds it usable and removes it from this count, or moves it to the spare area, which increases 05 Reallocated Sectors Count and creates a new bad sector. (This is one of the most critical errors and is worth monitoring.)
  • 198 C6 Off-line Uncorrectable Sector Count – Number of uncorrectable errors. The RAW value is the total number of uncorrectable errors when reading or writing sectors. A growing value indicates obvious defects of the disk surface and/or problems with the disk's mechanical subsystem. This is a critical error, indicating serious damage to the platter surface or impending failure of the disk mechanism. (IMPORTANT)
  • 199 C7 UltraDMA CRC Error Count – Total number of CRC errors in UltraDMA mode. The RAW value is the number of errors detected by the interface CRC check during UltraDMA transfers. High values may indicate damaged disk electronics, and they also show that the disk has problems communicating with the controller (disk ↔ motherboard). The most common causes of this error are a damaged data cable, a faulty power supply, damaged disk electronics, or damaged motherboard electronics.
  • 200 C8 Write Error Rate (Multi Zone Error Rate) – Write error rate: the total number of errors when writing sectors. The higher the RAW value, the worse the condition of the disk surface and/or mechanical subsystem. Together with 01 Raw Read Error Rate, it shows the condition of the platters, or more precisely, of the data recorded on their surface.
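Most attributes marked IMPORTANT above are counters whose raw value should stay at zero. A minimal Python sketch of such a check, assuming the raw values have already been extracted from `smartctl -A` output into a dict keyed by attribute ID. The ID set comes from the list above; `raw_count` and `critical_warnings` are hypothetical names. Attribute meanings and raw encodings vary between vendors, so this is a rough screen, not a verdict.

```python
import re

# Attribute IDs that the list above marks as IMPORTANT or most critical.
CRITICAL_IDS = {
    5: "Reallocated Sectors Count",
    10: "Spin Retry Count",
    196: "Reallocation Event Count",
    197: "Current Pending Sector Count",
    198: "Off-line Uncorrectable Sector Count",
}


def raw_count(raw: str) -> int:
    """Leading integer of a RAW_VALUE string; 0 if it does not start with a number."""
    m = re.match(r"\s*(\d+)", raw)
    return int(m.group(1)) if m else 0


def critical_warnings(raw_values: dict[int, str]) -> list[str]:
    """List critical attributes whose raw count is non-zero, in ID order."""
    warnings = []
    for attr_id, name in sorted(CRITICAL_IDS.items()):
        raw = raw_values.get(attr_id)
        if raw is not None and raw_count(raw) > 0:
            warnings.append(f"{attr_id} {name}: raw value {raw_count(raw)}")
    return warnings


# Raw values from the example table at the top of this page
example = {1: "476055", 5: "40", 10: "0", 196: "40", 197: "0", 198: "49"}
for w in critical_warnings(example):
    print(w)
# 5 Reallocated Sectors Count: raw value 40
# 196 Reallocation Event Count: raw value 40
# 198 Off-line Uncorrectable Sector Count: raw value 49
```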

Other parameters:

  • Disk Shift – Displacement of the platters relative to the spindle axis. The RAW value shows how far the platters have shifted; the unit of measurement is unknown. NOTE: Platter displacement is a possible result of a strong impact or fall. This is undoubtedly a critical error.
  • Loaded Hours – Wear of the magnetic head actuator caused by normal operation. Only the actuator's operating time counts.
  • Load/Unload Retry Count – Wear of the magnetic head actuator caused by repeated operations such as reading, writing, and head positioning. Only the time when the heads were in the operating state counts.
  • Load Friction – Wear of the magnetic head actuator caused by friction between the disk's mechanical parts. Only the time when the heads were in the operating state counts.
  • Load-in Time – Total operating time of the head actuator: the time during which the heads were loaded, i.e. in the operating position outside the parking area.
  • Torque Amplification Count – Number of attempts to spin up the disk platters.
  • GMR Head Amplitude – Amplitude of GMR head vibration during operation.
  • Head Flying Hours – Time the heads have spent in the operating (flying) position over the platters.
  • Read Error Retry Rate – Reading error rate.

SMART - Tests

Error log:

# smartctl -l error /dev/sdb
SMART Error Log Version: 1
No Errors Logged

Self-test log:

smartctl -l selftest /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-48-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 4546 -
# 2 Conveyance offline Completed without error 00% 4545 -
# 3 Extended offline Completed without error 00% 4544 -
# 4 Short offline Completed without error 00% 4543 -
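The self-test log can be checked automatically: every entry should report "Completed without error", and the LifeTime(hours) column can be compared with 09 Power_On_Hours (4545 in the table above) to see how recent the last test is. A minimal Python sketch, assuming the text layout shown above; `failed_self_tests` is a hypothetical helper. Tests that are still running or were aborted are also reported, since they did not complete without error.

```python
import re

# "# 1  Short offline  Completed without error  00%  4546  -"
# The test description and the status are kept together in one field because
# both may contain spaces; the last three columns are unambiguous.
LOG_ROW = re.compile(r"^#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def failed_self_tests(log_text: str) -> list[tuple[int, str, int, str]]:
    """Return (number, description+status, lifetime hours, first error LBA)
    for every log entry that did not complete without error."""
    bad = []
    for line in log_text.splitlines():
        m = LOG_ROW.match(line)
        if m and "Completed without error" not in m.group(2):
            bad.append((int(m.group(1)), m.group(2), int(m.group(4)), m.group(5)))
    return bad


log = """\
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 4546 -
# 2 Conveyance offline Completed without error 00% 4545 -
"""
print(failed_self_tests(log))  # []
```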

Selective self-test log:

smartctl -l selective /dev/sdb
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on pow
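The selective log above has five span slots (SPAN 1 to 5), all currently Not_testing. smartctl starts a selective self-test with `-t select,N-M`, where N and M are the first and last LBA of a span, and the option may be given up to five times. A minimal Python sketch that splits an LBA range into up to five contiguous spans and builds the matching argument list; `selective_spans` and `smartctl_select_args` are hypothetical names, and the 10-block range in the example is made up. The drive's real last LBA must be obtained separately.

```python
def selective_spans(first_lba: int, last_lba: int, count: int = 5) -> list[tuple[int, int]]:
    """Split the inclusive range [first_lba, last_lba] into `count` contiguous spans."""
    if not 1 <= count <= 5:
        raise ValueError("a selective self-test accepts 1 to 5 spans")
    if last_lba < first_lba:
        raise ValueError("last_lba must not be smaller than first_lba")
    total = last_lba - first_lba + 1
    count = min(count, total)  # never create empty spans
    spans = []
    start = first_lba
    for i in range(count):
        # Spread the remainder over the first spans so sizes differ by at most 1.
        size = total // count + (1 if i < total % count else 0)
        spans.append((start, start + size - 1))
        start += size
    return spans


def smartctl_select_args(spans: list[tuple[int, int]], device: str) -> list[str]:
    """Build an argument list with one '-t select,N-M' option per span."""
    args = ["smartctl"]
    for first, last in spans:
        args += ["-t", f"select,{first}-{last}"]
    args.append(device)
    return args


# Hypothetical 10-block device, split into 3 spans:
print(" ".join(smartctl_select_args(selective_spans(0, 9, 3), "/dev/sdb")))
# smartctl -t select,0-3 -t select,4-6 -t select,7-9 /dev/sdb
```

The sketch only produces non-overlapping N-M spans; smartctl also accepts overlapping spans and the `select,N+SIZE` form.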
