Ich möchte Dateipfade lesen, unabhängig davon, ob sie HDFS oder lokal sind. Derzeit übergebe ich die lokalen Pfade mit dem Präfix file: // und HDFS-Pfade mit dem Präfix hdfs: // und schreibe folgenden Code
Configuration configuration = new Configuration();
FileSystem fileSystem = null;
if (filePath.startsWith("hdfs://")) {
fileSystem = FileSystem.get(configuration);
} else if (filePath.startsWith("file://")) {
fileSystem = FileSystem.getLocal(configuration).getRawFileSystem();
}
Von hier aus verwende ich die APIs des Dateisystems, um die Datei zu lesen.
Können Sie mich bitte wissen lassen, ob es einen besseren Weg gibt?
Macht das Sinn,
public static void main(String[] args) throws IOException {
Configuration conf = new Configuration();
conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter the file path...");
String filePath = br.readLine();
Path path = new Path(filePath);
FileSystem fs = path.getFileSystem(conf);
FSDataInputStream inputStream = fs.open(path);
System.out.println(inputStream.available());
fs.close();
}
Sie müssen diesen Haken nicht setzen, wenn Sie diesen Weg gehen. Holen Sie sich das Dateisystem direkt von Path und tun Sie, was Sie möchten.
Sie können die FileSystem
folgendermaßen erhalten:
Configuration conf = new Configuration();
Path path = new Path(stringPath);
FileSystem fs = FileSystem.get(path.toUri(), conf);
Sie müssen nicht beurteilen, ob der Pfad mit hdfs://
oder file://
beginnt. Diese API erledigt die Arbeit.
Bitte überprüfen Sie das Code-Snippet unterhalb der Listendateien vom HDFS-Pfad. nämlich die Pfadzeichenfolge, die mit hdfs://
beginnt. Wenn Sie die Hadoop-Konfiguration und den lokalen Pfad angeben können, werden auch Dateien aus dem lokalen Dateisystem aufgeführt. nämlich die Pfadzeichenfolge, die mit file://
beginnt.
//helper method to get the list of files from the HDFS path
public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration, String hdfsPath,
boolean recursive)
{
//resulting list of files
List<String> filePaths = new ArrayList<String>();
FileSystem fs = null;
//try-catch-finally all possible exceptions
try
{
//get path from string and then the filesystem
Path path = new Path(hdfsPath); //throws IllegalArgumentException, all others will only throw IOException
fs = path.getFileSystem(hadoopConfiguration);
//resolve hdfsPath first to check whether the path exists => either a real directory or o real file
//resolvePath() returns fully-qualified variant of the path
path = fs.resolvePath(path);
//if recursive approach is requested
if (recursive)
{
//(heap issues with recursive approach) => using a queue
Queue<Path> fileQueue = new LinkedList<Path>();
//add the obtained path to the queue
fileQueue.add(path);
//while the fileQueue is not empty
while (!fileQueue.isEmpty())
{
//get the file path from queue
Path filePath = fileQueue.remove();
//filePath refers to a file
if (fs.isFile(filePath))
{
filePaths.add(filePath.toString());
}
else //else filePath refers to a directory
{
//list paths in the directory and add to the queue
FileStatus[] fileStatuses = fs.listStatus(filePath);
for (FileStatus fileStatus : fileStatuses)
{
fileQueue.add(fileStatus.getPath());
} // for
} // else
} // while
} // if
else //non-recursive approach => no heap overhead
{
//if the given hdfsPath is actually directory
if (fs.isDirectory(path))
{
FileStatus[] fileStatuses = fs.listStatus(path);
//loop all file statuses
for (FileStatus fileStatus : fileStatuses)
{
//if the given status is a file, then update the resulting list
if (fileStatus.isFile())
filePaths.add(fileStatus.getPath().toString());
} // for
} // if
else //it is a file then
{
//return the one and only file path to the resulting list
filePaths.add(path.toString());
} // else
} // else
} // try
catch(Exception ex) //will catch all exception including IOException and IllegalArgumentException
{
ex.printStackTrace();
//if some problem occurs return an empty array list
return new ArrayList<String>();
} //
finally
{
//close filesystem; not more operations
try
{
if(fs != null)
fs.close();
} catch (IOException e)
{
e.printStackTrace();
} // catch
} // finally
//return the resulting list; list can be empty if given path is an empty directory without files and sub-directories
return filePaths;
} // listFilesFromHDFSPath
Wenn Sie wirklich mit der Java.io.File-API arbeiten möchten, hilft Ihnen die folgende Methode, Dateien nur aus dem lokalen Dateisystem aufzulisten. Pfadangabe, die mit file://
beginnt.
//helper method to list files from the local path in the local file system
public static List<String> listFilesFromLocalPath(String localPathString, boolean recursive)
{
//resulting list of files
List<String> localFilePaths = new ArrayList<String>();
//get the Java file instance from local path string
File localPath = new File(localPathString);
//this case is possible if the given localPathString does not exit => which means neither file nor a directory
if(!localPath.exists())
{
System.err.println("\n" + localPathString + " is neither a file nor a directory; please provide correct local path");
//return with empty list
return new ArrayList<String>();
} // if
//at this point localPath does exist in the file system => either as a directory or a file
//if recursive approach is requested
if (recursive)
{
//recursive approach => using a queue
Queue<File> fileQueue = new LinkedList<File>();
//add the file in obtained path to the queue
fileQueue.add(localPath);
//while the fileQueue is not empty
while (!fileQueue.isEmpty())
{
//get the file from queue
File file = fileQueue.remove();
//file instance refers to a file
if (file.isFile())
{
//update the list with file absolute path
localFilePaths.add(file.getAbsolutePath());
} // if
else //else file instance refers to a directory
{
//list files in the directory and add to the queue
File[] listedFiles = file.listFiles();
for (File listedFile : listedFiles)
{
fileQueue.add(listedFile);
} // for
} // else
} // while
} // if
else //non-recursive approach
{
//if the given localPathString is actually a directory
if (localPath.isDirectory())
{
File[] listedFiles = localPath.listFiles();
//loop all listed files
for (File listedFile : listedFiles)
{
//if the given listedFile is actually a file, then update the resulting list
if (listedFile.isFile())
localFilePaths.add(listedFile.getAbsolutePath());
} // for
} // if
else //it is a file then
{
//return the one and only file absolute path to the resulting list
localFilePaths.add(localPath.getAbsolutePath());
} // else
} // else
//return the resulting list; list can be empty if given path is an empty directory without files and sub-directories
return localFilePaths;
} // listFilesFromLocalPath